VLM Robustness Benchmark | Dhwanil R. Chauhan

Vision-language models (VLMs) are increasingly deployed in systems that must interpret visual and linguistic inputs together, yet their robustness under the degraded conditions common in real-world environments remains largely uncharacterized. A model that performs well on clean benchmarks may fail unpredictably when camera quality drops, lighting shifts, or operator language is informal or domain-specific.

This benchmark provides a systematic framework for evaluating 20 VLMs under controlled conditions in which the visual and linguistic inputs are corrupted simultaneously. Key contributions include a novel text corruption module that simulates realistic degradation in operator-generated language, and a structured evaluation pipeline that enables fine-grained analysis of failure modes across corruption types and model architectures.
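As a rough illustration of what "controlled simultaneous corruption" could look like in practice, the Python sketch below perturbs a text prompt at a chosen severity and enumerates paired visual and text conditions. It is not the benchmark's actual module: the function names (`corrupt_text`, `condition_grid`), the corruption lists, the 1 to 3 severity scale, and the 10% per-level perturbation rate are all assumptions made for this example.

```python
import itertools
import random

# Hypothetical corruption names and severity scale, chosen for illustration;
# the benchmark's real conditions are not specified on this page.
VISUAL_CORRUPTIONS = ["gaussian_noise", "motion_blur", "low_light"]
TEXT_CORRUPTIONS = ["typo", "drop_vowels"]
SEVERITIES = [1, 2, 3]


def corrupt_text(prompt: str, kind: str, severity: int, seed: int = 0) -> str:
    """Apply a simple character-level corruption to a prompt (illustrative only)."""
    rng = random.Random(seed)  # seeded so each condition is reproducible
    rate = 0.1 * severity      # assumed: 10% perturbation chance per severity level
    chars = list(prompt)

    if kind == "typo":
        # Swap adjacent letters at randomly chosen positions.
        i = 0
        while i < len(chars) - 1:
            both_letters = chars[i].isalpha() and chars[i + 1].isalpha()
            if both_letters and rng.random() < rate:
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
                i += 2  # skip past the swapped pair
            else:
                i += 1
        return "".join(chars)

    if kind == "drop_vowels":
        # Drop some vowels to mimic informal shorthand, keeping word-initial letters.
        kept = []
        for j, c in enumerate(chars):
            word_start = j == 0 or not chars[j - 1].isalpha()
            if c.lower() in "aeiou" and not word_start and rng.random() < rate:
                continue
            kept.append(c)
        return "".join(kept)

    raise ValueError(f"unknown text corruption: {kind}")


def condition_grid():
    """Enumerate every paired (visual, text, severity) condition."""
    return list(itertools.product(VISUAL_CORRUPTIONS, TEXT_CORRUPTIONS, SEVERITIES))


if __name__ == "__main__":
    prompt = "Inspect the conveyor belt for misaligned parts"
    for visual, text, severity in condition_grid()[:3]:
        print(visual, text, severity, "->", corrupt_text(prompt, text, severity))
```

Seeding the random generator per call keeps each corrupted prompt reproducible, so every model in a comparison would see identical inputs for a given condition.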

I am the lead researcher: I conceived the benchmark, designed the evaluation framework, built the text corruption module, and am running the full evaluation across all 20 models.

This work is independent research, developed in collaboration with a co-architect based in San Jose. It addresses a fundamental reliability question that applies across industrial AI deployment, autonomous systems, and any multimodal pipeline operating in real-world conditions.

Status: In preparation — targeting IEEE TPAMI / IJCV.

(Chauhan & others, 2026)

References

2026

TPAMI
VLM Robustness Benchmark Under Simultaneous Multimodal Degradation

Dhwanil Chauhan and others

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2026

In Preparation — Targeting IEEE TPAMI / IJCV


The robustness of vision-language models under simultaneous visual and linguistic degradation, a common failure condition in real-world deployments, remains largely unstudied. We introduce a systematic benchmark that evaluates 20 VLMs under controlled, simultaneous multimodal corruption, featuring a novel text corruption module and a structured evaluation pipeline designed to characterize failure modes relevant to safety-critical and industrial deployment scenarios.
@article{chauhan2026vlmbenchmark,
  title   = {{VLM} Robustness Benchmark Under Simultaneous Multimodal Degradation},
  author  = {Chauhan, Dhwanil and others},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year    = {2026},
  url     = {https://dhwanil832.github.io/publications/},
  note    = {In Preparation: Targeting IEEE TPAMI / IJCV},
}