Papers

Research papers from arXiv and related sources

Total: 4513, AI/LLM: 2483, Testing: 2030
AI LLM

Real-time Win Probability and Latent Player Ability via STATS X in Team Sports

This study proposes a statistically grounded framework for real-time win probability evaluation and player assessment in score-based team sports, based on minute-by-minute cumulative box-score data...

Yasutaka Shimizu, Atsushi Yamanobe

2602.19513 2026-02-23
AI LLM

Pyramid MoA: A Probabilistic Framework for Cost-Optimized Anytime Inference

Large Language Models (LLMs) face a persistent trade-off between inference cost and reasoning capability. While "Oracle" models (e.g., Llama-3-70B) achieve state-of-the-art accuracy, they are prohi...

Arindam Khaled

2602.19509 2026-02-23
AI LLM

Conversational AI for Automated Patient Questionnaire Completion: Development Insights and Design Principles

Collecting patient-reported outcome measures (PROMs) is essential for clinical care and research, yet traditional form-based approaches are often tedious for patients and burdensome for clinicians....

David Fraile Navarro, Mor Peleg

2602.19507 2026-02-23
AI LLM

Test-Time Computing for Referring Multimodal Large Language Models

We propose ControlMLLM++, a novel test-time adaptation framework that injects learnable visual prompts into frozen multimodal large language models (MLLMs) to enable fine-grained region-based visua...

Mingrui Wu, Hao Chen, Jiayi Ji, Xiaoshuai Sun, Zhiyuan Liu, Liujuan Cao, Ming-Ming Cheng, Rongron...

2602.19505 2026-02-23
AI LLM

VIRAASAT: Traversing Novel Paths for Indian Cultural Reasoning

Large Language Models (LLMs) have made significant progress in reasoning tasks across various domains such as mathematics and coding. However, their performance deteriorates in tasks requiring rich...

Harshul Raj Surana, Arijit Maji, Aryan Vats, Akash Ghosh, Sriparna Saha, Amit Sheth

2602.18429 2026-02-20
AI LLM

SPQ: An Ensemble Technique for Large Language Model Compression

This study presents an ensemble technique, SPQ (SVD-Pruning-Quantization), for large language model (LLM) compression that combines variance-retained singular value decomposition (SVD), activation-...

Jiamin Yao, Eren Gultepe

2602.18420 2026-02-20
AI LLM

AI-Wrapped: Participatory, Privacy-Preserving Measurement of Longitudinal LLM Use In-the-Wild

Alignment research on large language models (LLMs) increasingly depends on understanding how these systems are used in everyday contexts, yet naturalistic interaction data is difficult to access du...

Cathy Mengying Fang, Sheer Karny, Chayapatr Archiwaranguprok, Yasith Samaradivakara, Pat Pataranu...

2602.18415 2026-02-20
AI LLM

How Fast Can I Run My VLA? Demystifying VLA Inference Performance with VLA-Perf

Vision-Language-Action (VLA) models have recently demonstrated impressive capabilities across various embodied AI tasks. While deploying VLA models on real-world robots imposes strict real-time inf...

Wenqi Jiang, Jason Clemons, Karu Sankaralingam, Christos Kozyrakis

2602.18397 2026-02-20
AI LLM

"How Do I ...?": Procedural Questions Predominate Student-LLM Chatbot Conversations

Providing scaffolding through educational chatbots built on Large Language Models (LLM) has potential risks and benefits that remain an open area of research. When students navigate impasses, they ...

Alexandra Neagu, Marcus Messer, Peter Johnson, Rhodri Nelson

2602.18372 2026-02-20
AI LLM

Quantum Maximum Likelihood Prediction via Hilbert Space Embeddings

Recent works have proposed various explanations for the ability of modern large language models (LLMs) to perform in-context prediction. We propose an alternative conceptual viewpoint from an infor...

Sreejith Sreekumar, Nir Weinberger

2602.18364 2026-02-20
AI LLM

Statistical Confidence in Functional Correctness: An Approach for AI Product Functional Correctness Evaluation

The quality assessment of Artificial Intelligence (AI) systems is a fundamental challenge due to their inherently probabilistic nature. Standards such as ISO/IEC 25059 provide a quality model, but ...

Wallace Albertini, Marina Condé Araújo, Júlia Condé Araújo, Antonio Pedro Santos Alves, Marcos Ka...

2602.18357 2026-02-20
AI LLM

Qualitative Coding Analysis through Open-Source Large Language Models: A User Study and Design Recommendations

Qualitative data analysis is labor-intensive, yet the privacy risks associated with commercial Large Language Models (LLMs) often preclude their use in sensitive research. To address this, we intro...

Tung T. Ngo, Dai Nguyen Van, Anh-Minh Nguyen, Phuong-Anh Do, Anh Nguyen-Quoc

2602.18352 2026-02-20
AI LLM

Vichara: Appellate Judgment Prediction and Explanation for the Indian Judicial System

In jurisdictions like India, where courts face an extensive backlog of cases, artificial intelligence offers transformative potential for legal judgment prediction. A critical subset of this backlo...

Pavithra PM Nair, Preethu Rose Anish

2602.18346 2026-02-20
AI LLM

VeriSoftBench: Repository-Scale Formal Verification Benchmarks for Lean

Large language models have achieved striking results in interactive theorem proving, particularly in Lean. However, most benchmarks for LLM-based proof automation are drawn from mathematics in the ...

Yutong Xin, Qiaochu Chen, Greg Durrett, Işil Dillig

2602.18307 2026-02-20
AI LLM

ReqElicitGym: An Evaluation Environment for Interview Competence in Conversational Requirements Elicitation

With the rapid improvement of LLMs' coding capabilities, the bottleneck of LLM-based automated software development is shifting from generating correct code to eliciting users' requirements. Despit...

Dongming Jin, Zhi Jin, Zheng Fang, Linyu Li, XiaoTian Yang, Yuanpeng He, Xiaohong Chen

2602.18306 2026-02-20
AI LLM

FeatureBleed: Inferring Private Enriched Attributes From Sparsity-Optimized AI Accelerators

Backend enrichment is now widely deployed in sensitive domains such as product recommendation pipelines, healthcare, and finance, where models are trained on confidential data and retrieve private ...

Darsh Asher, Farshad Dizani, Joshua Kalyanapu, Rosario Cammarota, Aydin Aysu, Samira Mirbagher Aj...

2602.18304 2026-02-20
AI LLM

On the Semantic and Syntactic Information Encoded in Proto-Tokens for One-Step Text Reconstruction

Autoregressive large language models (LLMs) generate text token-by-token, requiring n forward passes to produce a sequence of length n. Recent work, Exploring the Latent Capacity of LLMs for One-St...

Ivan Bondarenko, Egor Palkin, Fedor Tikunov

2602.18301 2026-02-20
AI LLM

Analyzing and Improving Chain-of-Thought Monitorability Through Information Theory

Chain-of-thought (CoT) monitors are LLM-based systems that analyze reasoning traces to detect when outputs may exhibit attributes of interest, such as test-hacking behavior during code generation. ...

Usman Anwar, Tim Bakker, Dana Kianfar, Cristina Pinneri, Christos Louizos

2602.18297 2026-02-20
AI LLM

Context-Aware Mapping of 2D Drawing Annotations to 3D CAD Features Using LLM-Assisted Reasoning for Manufacturing Automation

Manufacturing automation in process planning, inspection planning, and digital-thread integration depends on a unified specification that binds the geometric features of a 3D CAD model to the geome...

Muhammad Tayyab Khan, Lequn Chen, Wenhe Feng, Seung Ki Moon

2602.18296 2026-02-20
AI LLM

A Probabilistic Framework for LLM-Based Model Discovery

Automated methods for discovering mechanistic simulator models from observational data offer a promising path toward accelerating scientific progress. Such methods often take the form of agentic-st...

Stefan Wahl, Raphaela Schenk, Ali Farnoud, Jakob H. Macke, Daniel Gedon

2602.18266 2026-02-20