Tianle Chen

01 — About

A question of evidence

I'm a Ph.D. student in Computer Science at Boston University, advised by Prof. Deepti Ghadiyaram in the Image & Video Computing Group. My research focuses on multimodal large language models, vision-language reasoning, robustness, uncertainty quantification, interpretability, and generative AI.

My recent work studies how multimodal models integrate conflicting information across different modalities — revealing modality biases and vulnerabilities such as cross-modal attacks. More broadly, I want to understand when models rely on the right evidence, how their attention and uncertainty reflect reasoning, and how training and evaluation can make these systems more faithful and reliable.

Previously, I earned B.S. degrees in Mathematics and Computer Science from The Ohio State University, where I worked on weakly supervised segmentation, long-tailed recognition, LiDAR-based 3D detection, wildfire segmentation, and multimodal video synthesis. My work has appeared at ECCV, ICLR, CVPR, and NeurIPS venues. In summer 2026 I'll be a Research Intern at Google Research.

Building multimodal AI that knows what evidence to trust.

02 — Research

What I work on

Diagnosing and improving how multimodal models allocate attention, estimate uncertainty, and integrate evidence across modalities.

Multimodal Reasoning & Integration

How do multimodal models decide which evidence to trust? I study whether MLLMs truly integrate complementary modalities — or over-rely on the most dominant or misleading signal.

Multimodal QA
Modality bias
Evidence conflict

Robustness & Cross-Modal Attacks

Seemingly natural cues — on-screen text, captions, or textual prompts — can mislead multimodal models even when the correct evidence is present. I study these attacks and their safety implications.

Adversarial cues
Typographic attacks
Content moderation

Interpretable Multimodal AI

Using attention shifts, attribution, and log-probability diagnostics, I ask whether a model's internal evidence allocation explains when it succeeds, fails, or becomes overconfident.

Attention analysis
Attribution
Effect size

Uncertainty Quantification for VLMs

Multimodal models should know not only what answer to give, but also when the evidence is insufficient, conflicting, or unreliable.

Calibration
Cross-modal uncertainty
Reliability

Generative AI & Object States

I improve text-to-image models' ability to represent fine-grained physical states — open, closed, full, broken, folded, or melted — via synthetic data and targeted fine-tuning.

Text-to-image
Synthetic data
Object states

Computer Vision Foundations

Earlier work building technical breadth: weakly supervised segmentation, long-tailed recognition, LiDAR 3D detection, wildfire segmentation, and topological losses.

Segmentation
3D detection
Augmentation

03 — Publications

Selected papers

* indicates equal contribution. Full list on Google Scholar.

CVPR 2026 Findings Multimodal

Some Modalities Are More Equal Than Others: Decoding and Architecting Multimodal Integration in MLLMs

Tianle Chen, Chaitanya Chakka, Arjun Reddy Akula, Xavier Thomas, Deepti Ghadiyaram

Controlled audio-video-text conflict benchmarks reveal that MLLMs exhibit strong modality biases rather than balanced multimodal reasoning.

Under Review Robustness

A Systematic Study of Cross-Modal Typographic Attacks on Audio-Visual Reasoning

Tianle Chen, Deepti Ghadiyaram

Semantic distractors injected through speech, on-screen text, and prompts can mislead audio-visual models even when correct evidence is present.

In Submission Uncertainty

VLM-UQBench: A Benchmark for Modality-Specific and Cross-Modality Uncertainties in Vision-Language Models

Chenyu Wang, Tianle Chen, H. M. Sabbir Ahmad, Kayhan Batmanghelich, Wenchao Li

A benchmark of human-labeled samples and synthetic perturbations to evaluate modality-specific uncertainty, calibration, and reliability in VLMs.

CVPR 2025 Workshop Generative

Improving Physical Object State Representation in Text-to-Image Generative Systems

Tianle Chen, Chaitanya Chakka, Deepti Ghadiyaram

An automatic synthetic-data pipeline plus fine-tuning improves performance on GenAI-Bench by 8.2% and on two new object-state datasets by 17% and 24%, respectively.

CVPR 2025 Multimodal

EEE-Bench: A Comprehensive Multimodal Electrical and Electronics Engineering Benchmark

Ming Li, Jike Zhong, Tianle Chen, Konstantinos Psounis

A multimodal benchmark probing reasoning over electrical and electronics engineering problems.

ICLR 2024 3D Vision

Pre-training LiDAR-based 3D Object Detectors through Colorization

Tai-Yu Pan, Chenyang Ma, Tianle Chen*, Cheng Perng Phoo, Katie Z. Luo, Yurong You, Mark Campbell, Kilian Q. Weinberger, Bharath Hariharan, Wei-Lun Chao

Colorization as a pretext task for self-supervised pre-training of LiDAR 3D detectors in autonomous driving.

arXiv

NeurIPS 2023 Workshop Segmentation

Segment Anything Model (SAM) Enhances Pseudo-Labels for Weakly Supervised Semantic Segmentation

Tianle Chen*, Zheda Mai*, Ruiwen Li, Wei-Lun Chao

SAM masks improve pseudo-label quality, narrowing the gap between image-level labels and dense masks.

arXiv Code

ECCV 2022 Segmentation

Learning with Free Object Segments for Long-Tailed Instance Segmentation

Cheng Zhang, Tai-Yu Pan, Tianle Chen*, Jike Zhong, Wenjin Fu, Wei-Lun Chao

Mines free object segments from web images and ranks high-quality masks to improve long-tailed instance segmentation.

Paper

AIAA SciTech 2026 Segmentation

Centralized Copy-Paste: Enhanced Data Augmentation for Wildland Fire Semantic Segmentation

Joon Tai Kim, Tianle Chen, Ziyu Dong, Nishanth Kunchala, Alexander Guller, Daniel Ospina Acero, Roger Williams, Mrinal Kumar

Extracts, centralizes, and pastes refined fire clusters to combat label scarcity and class imbalance in drone-based fire segmentation.

04 — Experience

Where I've worked

May – Aug 2026

Research Intern · Google Research

San Francisco Bay Area, CA

Robust multimodal LLMs: multimodal reasoning, evidence allocation, counterfactual evidence construction, and model behavior under conflicting or misleading cues. Exploring attention-aware supervision and self-distillation for better multimodal grounding.

Sep 2024 – Present

Research Assistant · BU Image & Video Computing

Boston, MA · Advisor: Dr. Deepti Ghadiyaram

Multimodal LLMs, vision-language reasoning, robustness, uncertainty, and generative AI: cross-modal conflict benchmarks, adversarial attacks, object-state generation, and modality-specific uncertainty.

Sep 2021 – May 2024

Research Assistant · OSU — ML as the Basis Lab

Columbus, OH · Advisor: Dr. Wei-Lun Chao

Weakly supervised segmentation with SAM, LiDAR-based 3D detection pre-training, long-tailed instance segmentation, and sensor-adaptation pipelines for 3D detection.

Aug 2023 – May 2024

Research Assistant · OSU Autonomy & Complex Systems Lab

Columbus, OH · Advisor: Dr. Mrinal Kumar

Semantic segmentation and data augmentation for wildland fire imagery, improving drone-based fire-monitoring perception.

Aug 2023 – May 2024

Researcher · OSU CS — ReelMaker & Talk-to-TikTok

Columbus, OH · Advisor: Dr. Arnab Nandi

Multimodal short-form video synthesis from papers and talks using PDF parsing, vision, speech-to-text, and LLMs. Most Innovative Project, HackOHI/O.

05 — Teaching

Teaching & mentoring

Teaching Fellow — CS 455: Computer Networks

Boston University · Spring 2025

Delivered lectures, led labs, and mentored students on routing, transport protocols, latency, and congestion control.

Teaching Assistant — CSE 1223 & CSE 2321

The Ohio State University · 2020–2024

Intro to Programming in Java; Foundations I: Discrete Structures.

I enjoy connecting formal concepts with intuitive examples, visual explanations, and hands-on implementation.

06 — Service & Honors

Community

Reviewer

CVPR · ICCV · ICLR · AAAI · WACV · COLM · TCSVT · IJCV

Student Academic Chair — BU AI Research Initiative

Oct 2024 – Aug 2025

Organized AIR Seminar programming and academic events.

Honors & Awards

Dean's List (all attended semesters) · HackOHI/O 11 Most Innovative Project Award.

07 — Toolkit

Skills

Research Areas

Multimodal LLMs · Vision-Language Reasoning · Robustness & Safety · Uncertainty Quantification · Interpretability · Generative AI · Semantic Segmentation · 3D Detection

Tools

PyTorch · TensorFlow · Hugging Face Transformers · OpenCLIP/CLIP · Qwen-VL/Qwen-Omni · LLaVA-style models · Gemini APIs · WandB · Docker · Git · AWS S3

Languages

Python · MATLAB · Java · C · R · SQL · LaTeX

Get in touch

Let's talk about multimodal AI.

Open to research collaborations, reviewing, and conversations about trustworthy multimodal systems.

tianle@bu.edu Scholar GitHub CV