Susan Liang

Hi there! I am a fourth-year Ph.D. student in the Computer Science Department at the University of Rochester. My advisor is Prof. Chenliang Xu. Before joining Prof. Xu's lab, I received my bachelor's degree in Computer Science from the University of Chinese Academy of Sciences, where I was fortunate to study and do research under the supervision of Prof. Shiguang Shan. I joined Prof. Shan's group in 2020 and worked there for one and a half years, enjoying an exciting research experience. I also worked closely with Prof. Ming-Hsuan Yang.

My research interests lie in Computer Vision and Deep Learning, especially vision-language models/agents, multi-modal learning, spatial audio generation, and audio-visual synthesis.

✉️ I am actively seeking US-based internships for Summer 2026 and China-based full-time opportunities for 2027. Feel free to reach out if you’re interested. My email address and phone number are available in my CV.

Fun fact: my Chinese name is 梁苏叁 (Liang, Su, San), so Susan is just the *pinyin* of my given name. People often assume I am female when they see my English name. There is an interesting clip about the pronunciation of Susan in the film Johnny English Reborn. :D


Publications

AdaTurn Teaser

AdaTurn: Scaling Visual Search Agents via Turn-Aware Dynamic Reasoning

Susan Liang, Chao Huang, Chenliang Xu

arXiv preprint, 2026

Coming soon...
OmniJudge Teaser

Omni-Judge: Can Omni-LLMs Serve as Human-Aligned Judges for Text-Conditioned Audio-Video Generation?

Susan Liang, Chao Huang, Filippos Bellos, Yolo Yunlong Tang, Qianxiang Shen, Jing Bi, Luchuan Song, Zeliang Zhang, Jason Corso, Chenliang Xu

arXiv preprint, 2026

Async VLM Teaser
CVPR 2026

Asynchronous Temporal Modeling with Two-Agent Framework for Streaming Dense Video Captioning

Yolo Yunlong Tang, Chao Huang, Susan Liang, Jing Bi, Yicheng Wang, Daiki Shimada, Chenliang Xu

The IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026

Coming soon...
TDMM-LM Teaser
CVPR 2026

Bridging Facial Understanding and Animation via Language Models

Luchuan Song, Pinxin Liu, Haiyang Liu, Zhenchao Jin, Yolo Yunlong Tang, Zichong Xu, Susan Liang, Jing Bi, Jason J. Corso, Chenliang Xu

The IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026

When Teaser
CVPR 2026

When to Think and When to Look: Uncertainty-Guided Lookback

Jing Bi, Filippos Bellos, Junjia Guo, Yayuan Li, Chao Huang, Yolo Y. Tang, Luchuan Song, Susan Liang, Zhongfei Mark Zhang, Jason J. Corso, Chenliang Xu

The IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026

CATV Teaser
AAAI 2026 🏆 Best Demonstration Award Runner-up

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting

Yunlong Tang, Jing Bi, Chao Huang, Susan Liang, Daiki Shimada, Hang Hua, Yunzhong Xiao, Yizhi Song, Pinxin Liu, Mingqian Feng, Junjia Guo, Zhuo Liu, Luchuan Song, Ali Vosoughi, Jinxi He, Liu He, Zeliang Zhang, Jiebo Luo, Chenliang Xu

The Association for the Advancement of Artificial Intelligence (AAAI), 2026

Reason Matters Teaser

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning

Jing Bi, Susan Liang, Xiaofei Zhou, Pinxin Liu, Junjia Guo, Yunlong Tang, Luchuan Song, Chao Huang, Ali Vosoughi, Guangyu Sun, Jinxi He, Jiarui Wu, Shu Yang, Daoan Zhang, Chen Chen, Lianggong Bruce Wen, Zhang Liu, Jiebo Luo, Chenliang Xu

arXiv preprint, 2025

VERIFY Teaser

VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity

Jing Bi, Junjia Guo, Susan Liang, Guangyu Sun, Luchuan Song, Yunlong Tang, Jinxi He, Jiarui Wu, Ali Vosoughi, Chen Chen, Chenliang Xu

arXiv preprint, 2025

IJCV Teaser
IJCV 2025

High-Quality Sound Separation Across Diverse Categories via Visually-Guided Generative Modeling

Chao Huang, Susan Liang, Yapeng Tian, Anurag Kumar, Chenliang Xu

International Journal of Computer Vision, 2025

ZeroSep Teaser
NeurIPS 2025

ZeroSep: Separate Anything in Audio with Zero Training

Chao Huang, Yuesheng Ma, Junxuan Huang, Susan Liang, Yunlong Tang, Jing Bi, Wenqiang Liu, Nima Mesgarani, Chenliang Xu

The Thirty-Ninth Annual Conference on Neural Information Processing Systems, Dec. 2025

Harness ViT Teaser
NeurIPS 2025

Harnessing the Computation Redundancy in ViTs to Boost Adversarial Transferability

Jiani Liu*, Zhiyuan Wang*, Zeliang Zhang*, Chao Huang, Susan Liang, Yunlong Tang, Chenliang Xu. (* indicates equal contribution)

The Thirty-Ninth Annual Conference on Neural Information Processing Systems, Dec. 2025

MM Perspective Teaser
NeurIPS 2025 D&B Track

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness

Yunlong Tang, Pinxin Liu, Mingqian Feng, ..., Susan Liang, ..., Luchuan Song, Zeliang Zhang, Chenliang Xu.

The Thirty-Ninth Annual Conference on Neural Information Processing Systems, Dec. 2025

Video LLM Post-Training Teaser
🔥🔥🔥 HOT

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

Yunlong Tang, Jing Bi, Pinxin Liu, ..., Susan Liang, ..., Han Liu, Jiebo Luo, Chenliang Xu.

arXiv preprint, 2025

PI-AVAS Teaser
ICCV 2025

π-AVAS: Can Physics-Integrated Audio-Visual Modeling Boost Neural Acoustic Synthesis?

Susan Liang, Chao Huang, Yunlong Tang, Zeliang Zhang, Chenliang Xu.

International Conference on Computer Vision, Oct. 2025

BinauralFlow Teaser
ICML 2025

BinauralFlow: A Causal and Streamable Approach for High-Quality Binaural Speech Synthesis with Flow Matching Models

Susan Liang, Dejan Markovic, Israel D. Gebru, Steven Krenn, Todd Keebler, Jacob Sandakly, Frank Yu, Samuel Hassel, Chenliang Xu, Alexander Richard.

Forty-second International Conference on Machine Learning, Jul. 2025

VIDCOMPOSITION Teaser
CVPR 2025

VIDCOMPOSITION: Can MLLMs Analyze Compositions in Compiled Videos?

Yunlong Tang, Junjia Guo, Hang Hua, Susan Liang, Mingqian Feng, Xinyang Li, Rui Mao, Chao Huang, Jing Bi, Zeliang Zhang, Pooyan Fazli, Chenliang Xu.

The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2025

AV Attack Teaser
ICLR 2025

Rethinking Audio-Visual Adversarial Vulnerability from Temporal and Modality Perspectives

Zeliang Zhang*, Susan Liang*, Daiki Shimada, Chenliang Xu. (* indicates equal contribution)

The Thirteenth International Conference on Learning Representations, Apr. 2025

AI Animation Survey Teaser

Generative AI for Cel-Animation: A Survey

Yunlong Tang, Junjia Guo, Pinxin Liu, Zhiyuan Wang, Hang Hua, Jia-Xing Zhong, Yunzhong Xiao, Chao Huang, Luchuan Song, Susan Liang, Yizhi Song, Liu He, Jing Bi, Mingqian Feng, Xinyang Li, Zeliang Zhang, Chenliang Xu.

arXiv preprint

Train Bias Teaser

Will the Inclusion of Generated Data Amplify Bias Across Generations in Future Image Classification Models?

Zeliang Zhang, Xin Liang, Mingqian Feng, Susan Liang, Chenliang Xu.

arXiv preprint

Scaling Concept Teaser

Scaling Concept with Text-Guided Diffusion Models

Chao Huang, Susan Liang, Yunlong Tang, Yapeng Tian, Anurag Kumar, Chenliang Xu.

arXiv preprint

DAVIS Teaser
ACCV 2024 🏆 Best Paper Honorable Mention

High-Quality Visually-Guided Sound Separation from Diverse Categories

Chao Huang, Susan Liang, Yapeng Tian, Anurag Kumar, Chenliang Xu.

17th Asian Conference on Computer Vision, Dec. 2024

AVEdit Teaser
ACCV 2024

Language-Guided Joint Audio-Visual Editing Via One-Shot Adaptation

Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu.

17th Asian Conference on Computer Vision, Dec. 2024

L2T Teaser
CVPR 2024

Learning to Transform Dynamically for Better Adversarial Transferability

Rongyi Zhu*, Zeliang Zhang*, Susan Liang, Zhuo Liu, Chenliang Xu. (* indicates equal contribution)

Conference on Computer Vision and Pattern Recognition, Jun. 2024

Text Attack Teaser
EACL 2024

Random Smooth-based Certified Defense against Text Adversarial Attack

Zeliang Zhang, Wei Yao, Susan Liang, Chenliang Xu.

Conference of the European Chapter of the Association for Computational Linguistics, Mar. 2024

Video LLM Survey Teaser
TCSVT 🔥🔥🔥 HOT

Video Understanding with Large Language Models: A Survey

Yunlong Tang*, Jing Bi*, Siting Xu*, Luchuan Song*, Susan Liang, Teng Wang, Daoan Zhang, Jie An, Jingyang Lin, Rongyi Zhu, Ali Vosoughi, Chao Huang, Zeliang Zhang, Feng Zheng, Jianguo Zhang, Ping Luo, Jiebo Luo, Chenliang Xu. (* indicates equal contribution)

IEEE Transactions on Circuits and Systems for Video Technology

AV-NeRF Teaser
NeurIPS 2023

AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis

Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu.

Conference on Neural Information Processing Systems, Dec. 2023

NACF Teaser
ICCV Workshop 2023

Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields

Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu.

International Conference on Computer Vision Workshops, Oct. 2023

UniCon Teaser
ACM MM 2021 Oral

UniCon: Unified Context Network for Robust Active Speaker Detection

Yuanhang Zhang*, Susan Liang*, Shuang Yang, Xiao Liu, Zhongqin Wu, Shiguang Shan, Xilin Chen. (* indicates equal contribution)

ACM International Conference on Multimedia, Oct. 2021

Education

University of Rochester Logo

University of Rochester, NY, USA

Ph.D. Computer Science

Sept. 2022 – Present

University of Chinese Academy of Sciences Logo

University of Chinese Academy of Sciences, Beijing, China

B.Eng. Computer Science

Sept. 2018 – Jul. 2022

Research Experiences

Baidu Logo

Qianfan Algorithm Team, Baidu AI Cloud, Beijing, China

Research Intern

May 2025 – Aug. 2025

Advisor: Daxiang Dong

Meta Logo

Reality Labs Research, Meta, PA, USA

Research Scientist Intern

May 2024 – Aug. 2024

Advisors: Dr. Dejan Markovic, Dr. Israel D. Gebru, and Dr. Alexander Richard

UC Merced Logo

Vision and Learning Lab, University of California - Merced, CA, USA

Research Intern

Sept. 2021 – Mar. 2022

Advisors: Prof. Ming-Hsuan Yang and Dr. Taihong Xiao

Tsinghua University Logo

Institute for AI Industry Research, Tsinghua University, Beijing, China

Research Intern

Jun. 2021 – Aug. 2021

Advisors: Dr. Yizhi Wang and Dr. Hao Xu

UCAS Logo

Visual Information Processing and Learning Group, Chinese Academy of Sciences, Beijing, China

Research Assistant

Feb. 2020 – Apr. 2021

Advisors: Prof. Shiguang Shan and Dr. Shuang Yang