This paper investigates the problem of gaze object prediction in single images. We propose an application-friendly network based on CLIP for gaze object prediction. First, to avoid domain bias, we use a shallow feature adapter that transfers pre-trained features to target-oriented ones. Second, we introduce a pooling attention block that exploits the joint representation of multimodal elements, reducing gaze point deviation. Third, we introduce a loss that measures prediction quality by comparing the distributions of the model's predicted heatmaps and the ground truth. Extensive experiments demonstrate the superior performance of our model compared to previous models. We will provide the method code at: https://github.com/fadaishaitaiyang/CCLIP.git.
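The abstract does not say which measure is used to compare the predicted and ground-truth heatmap distributions. As a minimal sketch of the general idea, the example below assumes a KL divergence between the two heatmaps after each is normalized to sum to 1. The function names `normalize` and `kl_heatmap_loss` and the `eps` smoothing term are hypothetical and are not taken from the paper or its repository.

```python
import math

def normalize(heatmap, eps=1e-12):
    """Flatten a 2D heatmap (list of rows) and rescale it to sum to ~1."""
    flat = [max(v, 0.0) for row in heatmap for v in row]  # clip negatives
    total = sum(flat) + eps  # eps guards against an all-zero heatmap
    return [v / total for v in flat]

def kl_heatmap_loss(pred, target, eps=1e-12):
    """KL(target || pred), treating both heatmaps as distributions.

    ASSUMPTION: KL divergence is an illustrative choice; the source only
    states that the loss compares the two heatmap distributions.
    """
    p = normalize(target, eps)
    q = normalize(pred, eps)
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Ground truth with all mass on one cell, and two candidate predictions.
target = [[0.0, 1.0], [0.0, 0.0]]
close_pred = [[0.1, 0.8], [0.05, 0.05]]
far_pred = [[0.8, 0.1], [0.05, 0.05]]

loss_close = kl_heatmap_loss(close_pred, target)
loss_far = kl_heatmap_loss(far_pred, target)
```

Under this assumption, a prediction that places more mass near the ground-truth gaze location gets a lower loss than one that places it elsewhere, and a prediction identical to the ground truth gets a loss of zero.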
Wed 30 Oct | Displayed time zone: Pacific Time (US & Canada)
16:00 - 17:30 | Asynchronous | Papers | at On Discord | For papers that are not scheduled to be presented during any session. Some of the authors will be available for Q&A in a Discord channel of the conference. Please check your welcome email for the Discord link.
16:00 15m Talk | Investigating Creation Perspectives and Icon Placement Preferences for On-Body Menus in Virtual Reality | Papers | Xiang Li (University of Cambridge), Wei He (The Hong Kong University of Science and Technology (Guangzhou)), Shan Jin (The Hong Kong University of Science and Technology (Guangzhou)), Jan Gugenheimer (TU Darmstadt, Germany), Pan Hui (The Hong Kong University of Science and Technology), Hai-Ning Liang (Xi'an Jiaotong-Liverpool University), Per Ola Kristensson (University of Cambridge)
16:15 15m Talk | A Virtual Reality Approach to Overcome Glossophobia among University Students | Papers | Aarav Balachandran (Indraprastha Institute of Information Technology Delhi), Prajna Vohra (IIIT Delhi), Anmol Srivastava (Indraprastha Institute of Information Technology)
16:30 15m Talk | Evaluating Typing Performance in Different Mixed Reality Manifestations using Physiological Features | Papers | Francesco Chiossi (LMU Munich), Yassmine El Khaoudi (LMU Munich), Changkun Ou (LMU Munich), Ludwig Sidenmark (University of Toronto), Abdelrahman Zaky (University of Konstanz), Tiare Feuchtner (University of Konstanz; Aarhus University), Sven Mayer (LMU Munich)
16:45 15m Talk | Towards Adapting CLIP for Gaze Object Prediction | Papers
17:00 15m Talk | Experimental Analysis of Freehand Multi-object Selection Techniques in Virtual Reality Head-Mounted Displays | Honorable Mention | Papers | Rongkai Shi (The Hong Kong University of Science and Technology (Guangzhou)), Yushi Wei (The Hong Kong University of Science and Technology (Guangzhou)), Xuning Hu (Xi'an Jiaotong-Liverpool University), Yu Liu (National University of Singapore), Yong Yue (Xi'an Jiaotong-Liverpool University), Lingyun Yu (Xi'an Jiaotong-Liverpool University), Hai-Ning Liang (Xi'an Jiaotong-Liverpool University)