Papers
Research papers from arXiv and related sources
Real-time Win Probability and Latent Player Ability via STATS X in Team Sports
This study proposes a statistically grounded framework for real-time win probability evaluation and player assessment in score-based team sports, based on minute-by-minute cumulative box-score data...
Yasutaka Shimizu, Atsushi Yamanobe
Pyramid MoA: A Probabilistic Framework for Cost-Optimized Anytime Inference
Large Language Models (LLMs) face a persistent trade-off between inference cost and reasoning capability. While "Oracle" models (e.g., Llama-3-70B) achieve state-of-the-art accuracy, they are prohi...
Arindam Khaled
Conversational AI for Automated Patient Questionnaire Completion: Development Insights and Design Principles
Collecting patient-reported outcome measures (PROMs) is essential for clinical care and research, yet traditional form-based approaches are often tedious for patients and burdensome for clinicians....
David Fraile Navarro, Mor Peleg
Test-Time Computing for Referring Multimodal Large Language Models
We propose ControlMLLM++, a novel test-time adaptation framework that injects learnable visual prompts into frozen multimodal large language models (MLLMs) to enable fine-grained region-based visua...
Mingrui Wu, Hao Chen, Jiayi Ji, Xiaoshuai Sun, Zhiyuan Liu, Liujuan Cao, Ming-Ming Cheng, Rongron...
VIRAASAT: Traversing Novel Paths for Indian Cultural Reasoning
Large Language Models (LLMs) have made significant progress in reasoning tasks across various domains such as mathematics and coding. However, their performance deteriorates in tasks requiring rich...
Harshul Raj Surana, Arijit Maji, Aryan Vats, Akash Ghosh, Sriparna Saha, Amit Sheth
SPQ: An Ensemble Technique for Large Language Model Compression
This study presents an ensemble technique, SPQ (SVD-Pruning-Quantization), for large language model (LLM) compression that combines variance-retained singular value decomposition (SVD), activation-...
Jiamin Yao, Eren Gultepe
AI-Wrapped: Participatory, Privacy-Preserving Measurement of Longitudinal LLM Use In-the-Wild
Alignment research on large language models (LLMs) increasingly depends on understanding how these systems are used in everyday contexts, yet naturalistic interaction data is difficult to access du...
Cathy Mengying Fang, Sheer Karny, Chayapatr Archiwaranguprok, Yasith Samaradivakara, Pat Pataranu...
How Fast Can I Run My VLA? Demystifying VLA Inference Performance with VLA-Perf
Vision-Language-Action (VLA) models have recently demonstrated impressive capabilities across various embodied AI tasks. While deploying VLA models on real-world robots imposes strict real-time inf...
Wenqi Jiang, Jason Clemons, Karu Sankaralingam, Christos Kozyrakis
"How Do I ...?": Procedural Questions Predominate Student-LLM Chatbot Conversations
Providing scaffolding through educational chatbots built on Large Language Models (LLMs) has potential risks and benefits that remain an open area of research. When students navigate impasses, they ...
Alexandra Neagu, Marcus Messer, Peter Johnson, Rhodri Nelson
Quantum Maximum Likelihood Prediction via Hilbert Space Embeddings
Recent works have proposed various explanations for the ability of modern large language models (LLMs) to perform in-context prediction. We propose an alternative conceptual viewpoint from an infor...
Sreejith Sreekumar, Nir Weinberger
Statistical Confidence in Functional Correctness: An Approach for AI Product Functional Correctness Evaluation
The quality assessment of Artificial Intelligence (AI) systems is a fundamental challenge due to their inherently probabilistic nature. Standards such as ISO/IEC 25059 provide a quality model, but ...
Wallace Albertini, Marina Condé Araújo, Júlia Condé Araújo, Antonio Pedro Santos Alves, Marcos Ka...
Qualitative Coding Analysis through Open-Source Large Language Models: A User Study and Design Recommendations
Qualitative data analysis is labor-intensive, yet the privacy risks associated with commercial Large Language Models (LLMs) often preclude their use in sensitive research. To address this, we intro...
Tung T. Ngo, Dai Nguyen Van, Anh-Minh Nguyen, Phuong-Anh Do, Anh Nguyen-Quoc
Vichara: Appellate Judgment Prediction and Explanation for the Indian Judicial System
In jurisdictions like India, where courts face an extensive backlog of cases, artificial intelligence offers transformative potential for legal judgment prediction. A critical subset of this backlo...
Pavithra PM Nair, Preethu Rose Anish
VeriSoftBench: Repository-Scale Formal Verification Benchmarks for Lean
Large language models have achieved striking results in interactive theorem proving, particularly in Lean. However, most benchmarks for LLM-based proof automation are drawn from mathematics in the ...
Yutong Xin, Qiaochu Chen, Greg Durrett, Işil Dillig
ReqElicitGym: An Evaluation Environment for Interview Competence in Conversational Requirements Elicitation
With the rapid improvement of LLMs' coding capabilities, the bottleneck of LLM-based automated software development is shifting from generating correct code to eliciting users' requirements. Despit...
Dongming Jin, Zhi Jin, Zheng Fang, Linyu Li, XiaoTian Yang, Yuanpeng He, Xiaohong Chen
FeatureBleed: Inferring Private Enriched Attributes From Sparsity-Optimized AI Accelerators
Backend enrichment is now widely deployed in sensitive domains such as product recommendation pipelines, healthcare, and finance, where models are trained on confidential data and retrieve private ...
Darsh Asher, Farshad Dizani, Joshua Kalyanapu, Rosario Cammarota, Aydin Aysu, Samira Mirbagher Aj...
On the Semantic and Syntactic Information Encoded in Proto-Tokens for One-Step Text Reconstruction
Autoregressive large language models (LLMs) generate text token-by-token, requiring n forward passes to produce a sequence of length n. Recent work, Exploring the Latent Capacity of LLMs for One-St...
Ivan Bondarenko, Egor Palkin, Fedor Tikunov
Analyzing and Improving Chain-of-Thought Monitorability Through Information Theory
Chain-of-thought (CoT) monitors are LLM-based systems that analyze reasoning traces to detect when outputs may exhibit attributes of interest, such as test-hacking behavior during code generation. ...
Usman Anwar, Tim Bakker, Dana Kianfar, Cristina Pinneri, Christos Louizos
Context-Aware Mapping of 2D Drawing Annotations to 3D CAD Features Using LLM-Assisted Reasoning for Manufacturing Automation
Manufacturing automation in process planning, inspection planning, and digital-thread integration depends on a unified specification that binds the geometric features of a 3D CAD model to the geome...
Muhammad Tayyab Khana, Lequn Chen, Wenhe Feng, Seung Ki Moon
A Probabilistic Framework for LLM-Based Model Discovery
Automated methods for discovering mechanistic simulator models from observational data offer a promising path toward accelerating scientific progress. Such methods often take the form of agentic-st...
Stefan Wahl, Raphaela Schenk, Ali Farnoud, Jakob H. Macke, Daniel Gedon