Air pollution causes over 1.6 million premature deaths annually in India. Yet, decision-makers face persistent barriers in turning diverse tabular data on air pollution, population, and funding into actionable insights. Existing tools demand technical expertise, offer shallow visualizations, or rely on static dashboards, leaving policy questions unresolved.
Large language models (LLMs) offer a potential alternative by translating natural-language questions into structured, multi-dataset analyses; however, their reliability for such domain-specific tasks remains unknown.
We present VayuBench, to our knowledge the first executable benchmark for air-quality analytics. It comprises 5,000 natural-language queries paired with verified Python code, spanning seven query categories over multiple real-world datasets: spatial, temporal, spatio-temporal, population-based, area-based, funding-related, and specific-pattern queries.
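To make the task concrete, the sketch below shows what a single benchmark item could look like: a natural-language query, its category, and reference pandas code that is executed against the underlying tables to produce the verified answer. The field names and the toy columns (city, year, pm25) are illustrative assumptions, not VayuBench's actual schema.

# A minimal sketch of a single VayuBench-style item. Field names
# ("query", "category", "reference_code") and the columns used below
# are illustrative assumptions, not the benchmark's actual schema.
import pandas as pd

example_item = {
    "query": "Which five cities had the highest average PM2.5 in 2023?",
    "category": "spatio-temporal",
    "reference_code": (
        "result = (df[df['year'] == 2023]"
        ".groupby('city')['pm25'].mean()"
        ".nlargest(5))"
    ),
}

# Toy data standing in for the real air-quality tables.
df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Patna", "Delhi", "Patna"],
    "year": [2023, 2023, 2023, 2022, 2023],
    "pm25": [180.0, 90.0, 150.0, 200.0, 160.0],
})

# Executing the reference code yields the verified answer for this item.
namespace = {"df": df}
exec(example_item["reference_code"], namespace)
print(namespace["result"])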
We evaluate 13 open-source LLMs under a unified, schema-aware protocol. While Qwen3-Coder-30B attains the strongest performance, frequent column-name and variable errors highlight risks for smaller models.
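The following is a minimal sketch of the kind of execution-based check such a protocol implies: model-generated code is run against the loaded tables, and common failure modes such as wrong column names or undefined variables are classified from the raised exceptions. The convention that answers land in a `result` variable, and the error taxonomy, are assumptions for illustration.

# A minimal sketch of an execution-based check, assuming each model
# response is a Python snippet that stores its answer in `result`.
import pandas as pd

def run_generated_code(code: str, tables: dict):
    """Execute model-generated code against the provided tables and
    classify common failure modes."""
    namespace = dict(tables)  # e.g. {"df": pollution_df, "funding": funding_df}
    try:
        exec(code, namespace)
        return "ok", namespace.get("result")
    except KeyError as err:        # wrong column name
        return f"column_error: {err}", None
    except NameError as err:       # undefined variable or table
        return f"variable_error: {err}", None
    except Exception as err:       # anything else (syntax, type, ...)
        return f"other_error: {type(err).__name__}", None

# Example: a snippet that references a misspelled column.
df = pd.DataFrame({"city": ["Delhi"], "pm25": [180.0]})
status, value = run_generated_code("result = df['pm2.5'].mean()", {"df": df})
print(status)  # column_error: 'pm2.5'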
To bridge evaluation with practice, we deploy VayuChat, an interactive assistant that delivers real-time, code-backed analysis for Indian policymakers and citizens. Together, VayuBench and VayuChat demonstrate a reproducible pathway from benchmark to verified execution to deployment, establishing the foundations for trustworthy LLM-driven decision support in environmental monitoring.
Distribution of 5,000 queries across categories: Spatial (48.82%), Spatio-Temporal (24.56%), Temporal (12.15%), Funding (4.42%), Population-Based (3.82%), Area-Based (3.72%), and Specific Pattern (2.52%).
VayuChat provides an intuitive interface with AI model selection, quick prompts, natural language query processing, and automated visualizations for air quality analytics.
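A rough sketch of the question-to-answer loop such an assistant could run is given below: the table schema is injected into the prompt, the LLM returns pandas/matplotlib code, and that code is executed so every answer is backed by the code that produced it. Here `generate_code` stands in for whichever LLM backend is used, and the prompt wording and variable conventions (`result`, `fig`) are assumptions.

# A minimal sketch of a schema-aware, code-backed query loop.
# `generate_code` is a placeholder for the actual LLM backend.
import pandas as pd
import matplotlib.pyplot as plt

def build_prompt(question: str, df: pd.DataFrame) -> str:
    # Schema-aware prompting: expose the table's columns and dtypes.
    schema = ", ".join(f"{c} ({t})" for c, t in df.dtypes.astype(str).items())
    return (
        "You are given a pandas DataFrame `df` with columns: "
        f"{schema}.\nWrite Python that answers: {question}\n"
        "Store the answer in `result` and any figure in `fig`."
    )

def answer(question: str, df: pd.DataFrame, generate_code) -> dict:
    code = generate_code(build_prompt(question, df))
    namespace = {"df": df, "pd": pd, "plt": plt}
    exec(code, namespace)  # code-backed: the answer is whatever executed
    return {"code": code, "result": namespace.get("result"),
            "figure": namespace.get("fig")}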
VayuBench and VayuChat address a critical gap in environmental analytics by making air quality data accessible to non-technical stakeholders. Our benchmark establishes rigorous evaluation standards for LLMs in domain-specific analytics, while our deployed system demonstrates the practical feasibility of LLM-powered decision support for policy-critical applications.
This work has been featured in the Indian Express, the Ahmedabad Mirror, and other media outlets.
@inproceedings{acharya2025vayubench,
  title={VayuBench and VayuChat: Executable Benchmarking and Deployment of LLMs for Multi-Dataset Air Quality Analytics},
  author={Acharya, Vedant and Pisharodi, Abhay and Pasi, Ratnesh and Mondal, Rishabh and Batra, Nipun},
  booktitle={CODS 2025},
  year={2025}
}