Visual Studio 2008 Tutorial Video

Detection of Videos with Audio-Visual Inconsistency for Video Representation Learning

Abstract: Audio-visual alignment using video data is a conventional approach for the self-supervision of multi-modal representation learning. Nevertheless, the presence of background music, external ...

IEEE

Visual Evidence-aware for Object Hallucinations Rectification in LLM-based Video Captioning

Abstract: Recent neural models for video captioning are typically built using a framework that combines a pre-trained visual encoder with a large language model(LLM) decoder. However, large language ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results

Detection of Videos with Audio-Visual Inconsistency for Video Representation Learning

Visual Evidence-aware for Object Hallucinations Rectification in LLM-based Video Captioning

Trending now