Visual Object Tracking: An Evaluation Perspective

Xin Zhao · Shiyu Hu · Xu-Cheng Yin

July 2025 · Springer Nature

Ebook · 199 pages

About this ebook

This book delves into visual object tracking (VOT), a fundamental aspect of computer vision crucial for replicating human dynamic vision, with applications ranging from self-driving vehicles to surveillance systems. Despite significant strides propelled by deep learning, challenges such as target deformation and motion persist, exposing a disparity between cutting-edge VOT systems and human performance. This observation underscores the necessity to thoroughly scrutinize and enhance evaluation methodologies within VOT research.

Hence, the primary objective of this book is to equip readers with essential insights into dynamic visual tasks encapsulated by VOT. Beginning with the elucidation of task definitions, it integrates interdisciplinary perspectives on evaluation techniques. The book is organized into five parts, tracing the evolution of VOT from perceptual to cognitive intelligence, exploring the experimental frameworks utilized in assessments, analyzing the various agents involved, including tracking algorithms and human visual tracking, and dissecting evaluation mechanisms through both machine–machine and human–machine comparisons. Furthermore, it examines the trend toward crafting more human-like task definitions and comprehensive evaluation frameworks to effectively gauge machine intelligence.

This book serves as a roadmap for researchers aiming to grasp the bottlenecks in VOT capabilities and comprehend the gaps between current methodologies and human abilities, all geared toward advancing algorithmic intelligence. It also delves into the realm of data-centric AI, emphasizing the pivotal role of high-quality datasets and evaluation systems in the age of large language models (LLMs). Such systems are indispensable for training AI models while ensuring their safety and reliability. Utilizing VOT as a case study, the book offers detailed insights into these facets of data-centric AI research. Designed to cater to readers with foundational knowledge in computer vision, it employs diagrams and examples to facilitate comprehension, providing essential groundwork for understanding key technical components.

About the authors

Xin Zhao is a Professor at the School of Computer and Communication Engineering, University of Science and Technology Beijing. He received his Ph.D. degree from the University of Science and Technology of China (USTC) in 2013. His research interests include video analysis, performance evaluation, and protocol design, especially for object tracking tasks. He has published papers in international journals and conferences such as IJCV, IEEE TPAMI, IEEE TIP, IEEE TCSVT, CVPR, ICCV, NeurIPS, AAAI, and IJCAI. He regularly serves as a program committee member or peer reviewer for conferences and journals including IJCV, IEEE TPAMI, IEEE TIP, CVPR, ICCV, ECCV, ICML, NeurIPS, and ICLR. He is an Associate Editor of the journal Pattern Recognition. He helped organize the 3rd CVPR Workshop on Vision Datasets Understanding, co-located with CVPR 2024, and a tutorial co-located with ICIP 2024.

Dr. Shiyu Hu is a Research Fellow at Nanyang Technological University, Singapore. She received her PhD degree from the University of Chinese Academy of Sciences in January 2024. She has published over 20 research papers, including 5 first-author or corresponding-author publications in top-tier venues such as IEEE TPAMI, IJCV, and NeurIPS. She has open-sourced several research platforms, which have attracted over 400k visits from 130+ countries. She is also an active contributor to the academic community and has conducted several tutorials at ICIP, ICPR, and ACCV. Her current research focuses on computer vision and AI4Science.

Xu-Cheng Yin is a Professor at the School of Computer and Communication Engineering, University of Science and Technology Beijing. He received the B.Sc. and M.Sc. degrees in computer science from the University of Science and Technology Beijing, Beijing, China, in 1999 and 2002, respectively, and the Ph.D. degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences, Beijing, in 2006. He is currently a Full Professor and Director of the Pattern Recognition and Information Retrieval Lab, Department of Computer Science and Technology, University of Science and Technology Beijing. He was a Visiting Professor with the College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA, USA, on three occasions (January 2013 to January 2014, July 2014 to August 2014, and July 2016 to September 2016). He has published more than 100 papers in top-tier international journals and conferences.