Dalian University of Technology Homepage Platform Management System, Zhang Pingping: Complementary and Contrastive Learning for Audio-Visual Segmentation


Multi-Modal Understanding and Generation for Object Tracking

Release Time: 2024-12-22

Indexed by: Journal Papers

Document Code: 471432

Date of Publication: 2025-05-26

Journal: IEEE Transactions on Circuits and Systems for Video Technology

Volume: 35

Issue: 5

Page Number: 4384-4396

ISSN: 1051-8215

Key Words: BLIP; image-to-text generation; multi-modal understanding; vision-language tracking
