Papers
Research papers from arXiv and related sources
Ensembling Language Models with Sequential Monte Carlo
Practitioners have access to an abundance of language models and prompting strategies for solving many language modeling tasks; yet prior work shows that modeling performance is highly sensitive to...
Robin Shing Moon Chan, Tianyu Liu, Samuel Kiegeland, Clemente Pasti, Jacob Hoover Vigly, Timothy ...
RelaxFlow: Text-Driven Amodal 3D Generation
Image-to-3D generation faces inherent semantic ambiguity under occlusion, where partial observation alone is often insufficient to determine object category. In this work, we formalize text-driven ...
Jiayin Zhu, Guoji Fu, Xiaolu Liu, Qiyuan He, Yicong Li, Angela Yao
MobileFetalCLIP: Selective Repulsive Knowledge Distillation for Mobile Fetal Ultrasound Analysis
Fetal ultrasound AI could transform prenatal care in low-resource settings, yet current foundation models exceed 300M visual parameters, precluding deployment on point-of-care devices. Standard kno...
Numan Saeed, Fadillah Adamsyah Maani, Mohammad Yaqub
Dissociating Direct Access from Inference in AI Introspection
Introspection is a foundational cognitive ability, but its mechanism is not well understood. Recent work has shown that AI models can introspect. We study their mechanism of introspection, first ex...
Harvey Lederman, Kyle Mahowald
Building Enterprise Realtime Voice Agents from Scratch: A Technical Tutorial
We present a technical tutorial for building enterprise-grade realtime voice agents from first principles. While over 25 open-source speech-to-speech models and numerous voice agent frameworks exis...
Jielin Qiu, Zixiang Chen, Liangwei Yang, Ming Zhu, Zhiwei Liu, Juntao Tan, Wenting Zhao, Rithesh ...
An Exploration-Analysis-Disambiguation Reasoning Framework for Word Sense Disambiguation with Low-Parameter LLMs
Word Sense Disambiguation (WSD) remains a key challenge in Natural Language Processing (NLP), especially when dealing with rare or domain-specific senses that are often misinterpreted. While modern...
Deshan Sumanathilaka, Nicholas Micallef, Julian Hough
Judge Reliability Harness: Stress Testing the Reliability of LLM Judges
We present the Judge Reliability Harness, an open-source library for constructing validation suites that test the reliability of LLM judges. As LLM-based scoring is widely deployed in AI benchmarks...
Sunishchal Dev, Andrew Sloan, Joshua Kavner, Nicholas Kong, Morgan Sandler
Harnessing Synthetic Data from Generative AI for Statistical Inference
The emergence of generative AI models has dramatically expanded the availability and use of synthetic data across scientific, industrial, and policy domains. While these developments open new possi...
Ahmad Abdel-Azim, Ruoyu Wang, Xihong Lin
Legal interpretation and AI: from expert systems to argumentation and LLMs
AI and Law research has encountered legal interpretation in different ways, in the context of its evolving approaches and methodologies. Research on expert systems has focused on legal knowledge eng...
Václav Janeček, Giovanni Sartor
Fusion-CAM: Integrating Gradient and Region-Based Class Activation Maps for Robust Visual Explanations
Interpreting the decision-making process of deep convolutional neural networks remains a central challenge in achieving trustworthy and transparent artificial intelligence. Explainable AI (XAI) tec...
Hajar Dekdegue, Moncef Garouani, Josiane Mothe, Jordan Bernigaud
InfoFlow KV: Information-Flow-Aware KV Recomputation for Long Context
Retrieval-augmented generation (RAG) for long-context question answering is bottlenecked by inference-time prefilling over large retrieved contexts. A common strategy is to precompute key-value (KV...
Xin Teng, Canyu Zhang, Shaoyi Zheng, Danyang Zhuo, Tianyi Zhou, Shengjie Wang
Building AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned
The landscape of AI coding assistance is undergoing a fundamental shift from complex IDE plugins to versatile, terminal-native agents. Operating directly where developers manage source control, exe...
Nghi D. Q. Bui
Med-V1: Small Language Models for Zero-shot and Scalable Biomedical Evidence Attribution
Assessing whether an article supports an assertion is essential for hallucination detection and claim verification. While large language models (LLMs) have the potential to automate this task, achi...
Qiao Jin, Yin Fang, Lauren He, Yifan Yang, Guangzhi Xiong, Zhizheng Wang, Nicholas Wan, Joey Chan...
STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks
Recent advances in large language models (LLMs) have enabled agentic systems for sequential decision-making. Such agents must perceive their environment, reason across multiple time steps, and take...
Elita Lobo, Xu Chen, Jingjing Meng, Nan Xi, Yang Jiao, Chirag Agarwal, Yair Zick, Yan Gao
Knowledge Divergence and the Value of Debate for Scalable Oversight
AI safety via debate and reinforcement learning from AI feedback (RLAIF) are both proposed methods for scalable oversight of advanced AI systems, yet no formal framework relates them or characteriz...
Robin Young
X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes
Large language models (LLMs) achieve promising performance, yet their ability to reason remains poorly understood. Existing evaluations largely emphasize task-level accuracy, often conflating patte...
Gao Tianxi, Cai Yufan, Yuan Yusi, Dong Jin Song
A framework for assessing the capabilities of code generation of constraint domain-specific languages with large language models
Large language models (LLMs) can be used to support software development tasks, e.g., through code completion or code generation. However, their effectiveness drops significantly when considering l...
David Delgado, Lola Burgueño, Robert Clarisó
Whispering to a Blackbox: Bootstrapping Frozen OCR with Visual Prompts
In the landscape of modern machine learning, frozen pre-trained models provide stability and efficiency but often underperform on specific tasks due to mismatched data distributions. This paper int...
Samandar Samandarov, Nazirjon Ismoiljonov, Abdullah Sattorov, Temirlan Sabyrbayev
Oral to Web: Digitizing 'Zero Resource' Languages of Bangladesh
We present the Multilingual Cloud Corpus, the first national-scale, parallel, multimodal linguistic dataset of Bangladesh's ethnic and indigenous languages. Despite being home to approximately 40 m...
Mohammad Mamun Or Rashid
VietJobs: A Vietnamese Job Advertisement Dataset
VietJobs is the first large-scale, publicly available corpus of Vietnamese job advertisements, comprising 48,092 postings and over 15 million words collected from all 34 provinces and municipalitie...
Hieu Pham Dinh, Hung Nguyen Huy, Mo El-Haj