Research

Papers

Research papers from arXiv and related sources

Total: 4694 | AI/LLM: 2583 | Testing: 2111
AI LLM

Ensembling Language Models with Sequential Monte Carlo

Practitioners have access to an abundance of language models and prompting strategies for solving many language modeling tasks; yet prior work shows that modeling performance is highly sensitive to...

Robin Shing Moon Chan, Tianyu Liu, Samuel Kiegeland, Clemente Pasti, Jacob Hoover Vigly, Timothy ...

2603.05432 2026-03-05
AI LLM

RelaxFlow: Text-Driven Amodal 3D Generation

Image-to-3D generation faces inherent semantic ambiguity under occlusion, where partial observation alone is often insufficient to determine object category. In this work, we formalize text-driven ...

Jiayin Zhu, Guoji Fu, Xiaolu Liu, Qiyuan He, Yicong Li, Angela Yao

2603.05425 2026-03-05
AI LLM

MobileFetalCLIP: Selective Repulsive Knowledge Distillation for Mobile Fetal Ultrasound Analysis

Fetal ultrasound AI could transform prenatal care in low-resource settings, yet current foundation models exceed 300M visual parameters, precluding deployment on point-of-care devices. Standard kno...

Numan Saeed, Fadillah Adamsyah Maani, Mohammad Yaqub

2603.05421 2026-03-05
AI LLM

Dissociating Direct Access from Inference in AI Introspection

Introspection is a foundational cognitive ability, but its mechanism is not well understood. Recent work has shown that AI models can introspect. We study their mechanism of introspection, first ex...

Harvey Lederman, Kyle Mahowald

2603.05414 2026-03-05
AI LLM

Building Enterprise Realtime Voice Agents from Scratch: A Technical Tutorial

We present a technical tutorial for building enterprise-grade realtime voice agents from first principles. While over 25 open-source speech-to-speech models and numerous voice agent frameworks exis...

Jielin Qiu, Zixiang Chen, Liangwei Yang, Ming Zhu, Zhiwei Liu, Juntao Tan, Wenting Zhao, Rithesh ...

2603.05413 2026-03-05
AI LLM

An Exploration-Analysis-Disambiguation Reasoning Framework for Word Sense Disambiguation with Low-Parameter LLMs

Word Sense Disambiguation (WSD) remains a key challenge in Natural Language Processing (NLP), especially when dealing with rare or domain-specific senses that are often misinterpreted. While modern...

Deshan Sumanathilaka, Nicholas Micallef, Julian Hough

2603.05400 2026-03-05
AI LLM

Judge Reliability Harness: Stress Testing the Reliability of LLM Judges

We present the Judge Reliability Harness, an open-source library for constructing validation suites that test the reliability of LLM judges. As LLM-based scoring is widely deployed in AI benchmarks...

Sunishchal Dev, Andrew Sloan, Joshua Kavner, Nicholas Kong, Morgan Sandler

2603.05399 2026-03-05
AI LLM

Harnessing Synthetic Data from Generative AI for Statistical Inference

The emergence of generative AI models has dramatically expanded the availability and use of synthetic data across scientific, industrial, and policy domains. While these developments open new possi...

Ahmad Abdel-Azim, Ruoyu Wang, Xihong Lin

2603.05396 2026-03-05
AI LLM

Legal interpretation and AI: from expert systems to argumentation and LLMs

AI and Law research has encountered legal interpretation in different ways, in the context of its evolving approaches and methodologies. Research on expert systems has focused on legal knowledge eng...

Václav Janeček, Giovanni Sartor

2603.05392 2026-03-05
AI LLM

Fusion-CAM: Integrating Gradient and Region-Based Class Activation Maps for Robust Visual Explanations

Interpreting the decision-making process of deep convolutional neural networks remains a central challenge in achieving trustworthy and transparent artificial intelligence. Explainable AI (XAI) tec...

Hajar Dekdegue, Moncef Garouani, Josiane Mothe, Jordan Bernigaud

2603.05386 2026-03-05
AI LLM

InfoFlow KV: Information-Flow-Aware KV Recomputation for Long Context

Retrieval-augmented generation (RAG) for long-context question answering is bottlenecked by inference-time prefilling over large retrieved contexts. A common strategy is to precompute key-value (KV...

Xin Teng, Canyu Zhang, Shaoyi Zheng, Danyang Zhuo, Tianyi Zhou, Shengjie Wang

2603.05353 2026-03-05
AI LLM

Building AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned

The landscape of AI coding assistance is undergoing a fundamental shift from complex IDE plugins to versatile, terminal-native agents. Operating directly where developers manage source control, exe...

Nghi D. Q. Bui

2603.05344 2026-03-05
AI LLM

Med-V1: Small Language Models for Zero-shot and Scalable Biomedical Evidence Attribution

Assessing whether an article supports an assertion is essential for hallucination detection and claim verification. While large language models (LLMs) have the potential to automate this task, achi...

Qiao Jin, Yin Fang, Lauren He, Yifan Yang, Guangzhi Xiong, Zhizheng Wang, Nicholas Wan, Joey Chan...

2603.05308 2026-03-05
AI LLM

STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks

Recent advances in large language models (LLMs) have enabled agentic systems for sequential decision-making. Such agents must perceive their environment, reason across multiple time steps, and take...

Elita Lobo, Xu Chen, Jingjing Meng, Nan Xi, Yang Jiao, Chirag Agarwal, Yair Zick, Yan Gao

2603.05294 2026-03-05
AI LLM

Knowledge Divergence and the Value of Debate for Scalable Oversight

AI safety via debate and reinforcement learning from AI feedback (RLAIF) are both proposed methods for scalable oversight of advanced AI systems, yet no formal framework relates them or characteriz...

Robin Young

2603.05293 2026-03-05
AI LLM

X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes

Large language models (LLMs) achieve promising performance, yet their ability to reason remains poorly understood. Existing evaluations largely emphasize task-level accuracy, often conflating patte...

Gao Tianxi, Cai Yufan, Yuan Yusi, Dong Jin Song

2603.05290 2026-03-05
AI LLM

A framework for assessing the capabilities of code generation of constraint domain-specific languages with large language models

Large language models (LLMs) can be used to support software development tasks, e.g., through code completion or code generation. However, their effectiveness drops significantly when considering l...

David Delgado, Lola Burgueño, Robert Clarisó

2603.05278 2026-03-05
AI LLM

Whispering to a Blackbox: Bootstrapping Frozen OCR with Visual Prompts

In the landscape of modern machine learning, frozen pre-trained models provide stability and efficiency but often underperform on specific tasks due to mismatched data distributions. This paper int...

Samandar Samandarov, Nazirjon Ismoiljonov, Abdullah Sattorov, Temirlan Sabyrbayev

2603.05276 2026-03-05
AI LLM

Oral to Web: Digitizing 'Zero Resource' Languages of Bangladesh

We present the Multilingual Cloud Corpus, the first national-scale, parallel, multimodal linguistic dataset of Bangladesh's ethnic and indigenous languages. Despite being home to approximately 40 m...

Mohammad Mamun Or Rashid

2603.05272 2026-03-05
AI LLM

VietJobs: A Vietnamese Job Advertisement Dataset

VietJobs is the first large-scale, publicly available corpus of Vietnamese job advertisements, comprising 48,092 postings and over 15 million words collected from all 34 provinces and municipalitie...

Hieu Pham Dinh, Hung Nguyen Huy, Mo El-Haj

2603.05262 2026-03-05