

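1. School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen 518107, Guangdong
2. Guangdong Provincial Key Laboratory of Intelligent Transportation Systems, Shenzhen 518107, Guangdong
3. Science, Technology and Informatization Corps, Guangdong Provincial Public Security Department, Guangzhou 510050, Guangdong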
Received:09 February 2026,
Revised:26 March 2026,
Accepted:01 April 2026,
Online First:15 May 2026,
Li Xiying, Wu Hao, Pan Huayan, et al. Fine-grained detection method for special vehicles based on self-supervised vision models[J/OL]. Acta Scientiarum Naturalium Universitatis Sunyatseni, 2026, 1-11. DOI: 10.11714/acta.snus.ZR20260045.
This paper proposes a domain-adaptive detection framework based on self-supervised visual representations that combines general object detection with prototype-library matching. The method uses pre-trained DINOv3 features for matching and screening, so it can localize and distinguish target vehicles from only a few labeled samples and without fine-tuning. First, a multi-view synthesis module is placed ahead of the detector as data augmentation: it generates multi-angle samples that align with overhead surveillance scenes and compensate for the viewpoints missing in cross-domain settings. Second, an inter-class clustering and prototype matching method is designed: a clustering algorithm mines the modes of the data, and a prototype library of real samples covering multiple vehicle morphologies is built to handle large intra-class variation. On this basis, a joint global and local representation combines semantic and textural details from different network layers to achieve fine-grained discrimination of target vehicles. Experiments show that, under few-shot conditions, the method effectively overcomes the loss of detection performance caused by viewpoint domain shift. Compared with traditional approaches, it improves detection recall and significantly reduces false positives triggered by non-target vehicles, validating the effectiveness and robustness of the framework for special-vehicle monitoring.
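The prototype-library idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, the similarity measure, or any threshold, so spherical k-means, cosine similarity, and the 0.8 cutoff below are all assumptions, and the short hand-written vectors merely stand in for DINOv3 embeddings of vehicle crops. The function names `build_prototypes` and `match` are hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(a, b):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


def build_prototypes(features, k, iters=20, seed=0):
    """Cluster few-shot features into k prototypes (spherical k-means, assumed)."""
    rng = random.Random(seed)
    feats = [normalize(f) for f in features]
    centers = rng.sample(feats, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in feats:
            best = max(range(k), key=lambda j: cosine(f, centers[j]))
            groups[best].append(f)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous center
                mean = [sum(col) / len(g) for col in zip(*g)]
                centers[j] = normalize(mean)
    return centers


def match(candidate, prototypes, threshold=0.8):
    """Compare one candidate feature with every prototype.

    Returns (is_target, best_score); the threshold is an illustrative value.
    """
    c = normalize(candidate)
    score = max(cosine(c, p) for p in prototypes)
    return score >= threshold, score


# Toy data: two appearance modes of the same vehicle class.
support = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0.1, 0.9, 0]]
prototypes = build_prototypes(support, k=2)
accepted, accepted_score = match([1, 0.05, 0], prototypes)
rejected, rejected_score = match([0, 0, 1], prototypes)
```

Keeping one prototype per cluster rather than a single class mean is what lets a candidate match any one of several appearance modes. In this toy run the first candidate lies close to one mode and is accepted, while the second is orthogonal to both prototypes and is screened out.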