Papers
Research papers from arXiv and related sources
Conversational Learning Diagnosis via Reasoning Multi-Turn Interactive Learning
Learning diagnosis is a critical task that monitors students' cognitive state during educational activities, with the goal of enhancing learning outcomes. With advancements in language models (LMs)...
Fangzhou Yao, Sheng Chang, Weibo Gao, Qi Liu
AI-for-Science Low-code Platform with Bayesian Adversarial Multi-Agent Framework
Large Language Models (LLMs) demonstrate potential for automating scientific code generation but face challenges in reliability, error propagation in multi-agent workflows, and evaluation in domai...
Zihang Zeng, Jiaquan Zhang, Pengze Li, Yuan Qi, Xi Chen
Understanding and Mitigating Dataset Corruption in LLM Steering
Contrastive steering has been shown to be a simple and effective method for adjusting the generative behavior of LLMs at inference time. It uses examples of prompt responses with and without a trait to id...
Cullen Anderson, Narmeen Oozeer, Foad Namjoo, Remy Ogasawara, Amirali Abdullah, Jeff M. Phillips
Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use
Agentic language models operate in a fundamentally different safety regime than chat models: they must plan, call tools, and execute long-horizon actions where a single misstep, such as accessing f...
Aradhye Agarwal, Gurdit Siyan, Yash Pandya, Joykirat Singh, Akshay Nambi, Ahmed Awadallah
Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?
As large language models (LLMs) advance their mathematical capabilities toward the IMO level, the scarcity of challenging, high-quality problems for training and evaluation has become a significant...
Dadi Guo, Yuejin Xie, Qingyu Liu, Jiayu Liu, Zhiyuan Fan, Qihan Ren, Shuai Shao, Tianyi Zhou, Don...
Search for a massless particle beyond the Standard Model in the $Ξ^0\toΛ+ \text{invisible}$ decay
A search for a massless beyond-standard-model particle is performed in the decay $Ξ^{0}\toΛ+\text{invisible}$ using $(1.0087 \pm 0.0044)\times 10^{10}$ $J/ψ$ events collected with the BESIII detect...
BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, C. S. Akondi, R. Alibert...
Shared (Mis)Understandings and the Governance of AI: A Thematic Analysis of the 2023-2024 Oversight of AI Hearings
This paper investigates early legislative deliberations over Artificial Intelligence in the United States through a thematic analysis of the 2023-2024 Oversight of AI hearings held by the Senate Ju...
Rachel Leach
MoD-DPO: Towards Mitigating Cross-modal Hallucinations in Omni LLMs using Modality Decoupled Preference Optimization
Omni-modal large language models (omni LLMs) have recently achieved strong performance across audiovisual understanding tasks, yet they remain highly susceptible to cross-modal hallucinations arisi...
Ashutosh Chaubey, Jiacheng Pang, Mohammad Soleymani
Neuro-Symbolic Artificial Intelligence: A Task-Directed Survey in the Black-Box Models Era
The integration of symbolic computing with neural networks has intrigued researchers since the first theorizations of artificial intelligence (AI). The ability of Neuro-Symbolic (NeSy) methods to i...
Giovanni Pio Delvecchio, Lorenzo Molfetta, Gianluca Moro
Saarthi for AGI: Towards Domain-Specific General Intelligence for Formal Verification
Saarthi is an agentic AI framework that uses multi-agent collaboration to perform end-to-end formal verification. Even though the framework provides a complete flow from specification to coverage c...
Aman Kumar, Deepak Narayan Gadde, Luu Danh Minh, Vaisakh Naduvodi Viswambharan, Keerthan Kopparam...
Conditioned Activation Transport for T2I Safety Steering
Despite their impressive capabilities, current Text-to-Image (T2I) models remain prone to generating unsafe and toxic content. While activation steering offers a promising inference-time interventi...
Maciej Chrabąszcz, Aleksander Szymczyk, Jan Dubiński, Tomasz Trzciński, Franziska Boenisch, Adam ...
An Investigation Into Various Approaches For Bengali Long-Form Speech Transcription and Bengali Speaker Diarization
Bengali remains a low-resource language in speech technology, especially for complex tasks like long-form transcription and speaker diarization. This paper presents a multistage approach developed ...
Epshita Jahan, Khandoker Md Tanjinul Islam, Pritom Biswas, Tafsir Al Nafin
From Language to Action: Can LLM-Based Agents Be Used for Embodied Robot Cognition?
In order to flexibly act in an everyday environment, a robotic agent needs a variety of cognitive capabilities that enable it to reason about plans and perform execution recovery. Large language mo...
Shinas Shaji, Fabian Huppertz, Alex Mitrevski, Sebastian Houben
Agentic AI-based Coverage Closure for Formal Verification
Coverage closure is a critical requirement in the Integrated Chip (IC) development process and a key metric for verification sign-off. However, traditional exhaustive approaches often fail to achieve ful...
Sivaram Pothireddypalli, Ashish Raman, Deepak Narayan Gadde, Aman Kumar
Channel-Adaptive Edge AI: Maximizing Inference Throughput by Adapting Computational Complexity to Channel States
Integrated communication and computation (IC$^2$) has emerged as a new paradigm for enabling efficient edge inference in sixth-generation (6G) networks. However, the design of IC$^2$ technol...
Jierui Zhang, Jianhao Huang, Kaibin Huang
The Household Impact of Generative AI: Evidence from Internet Browsing Behavior
This paper studies the impact of generative AI on U.S. households' task allocation at home, using detailed Internet browsing data from a large sample of home devices between 2021 and 2024. Leveragi...
Michael Blank, Gregor Schubert, Miao Ben Zhang
APRES: An Agentic Paper Revision and Evaluation System
Scientific discoveries must be communicated clearly to realize their full potential. Without effective communication, even the most groundbreaking findings risk being overlooked or misunderstood. T...
Bingchen Zhao, Jenny Zhang, Chenxi Whitehouse, Minqi Jiang, Michael Shvartsman, Abhishek Charnali...
How to Model AI Agents as Personas?: Applying the Persona Ecosystem Playground to 41,300 Posts on Moltbook for Behavioral Insights
AI agents are increasingly active on social media platforms, generating content and interacting with one another at scale. Yet the behavioral diversity of these agents remains poorly understood, an...
Danial Amin, Joni Salminen, Bernard J. Jansen
UniSkill: A Dataset for Matching University Curricula to Professional Competencies
Skill extraction and recommendation systems have been studied from recruiter, applicant, and education perspectives. While AI applications in job advertisements have received broad attention, defic...
Nurlan Musazade, Joszef Mezei, Mike Zhang
The Science Data Lake: A Unified Open Infrastructure Integrating 293 Million Papers Across Eight Scholarly Sources with Embedding-Based Ontology Alignment
Scholarly data are largely fragmented across siloed databases with divergent metadata and missing linkages among them. We present the Science Data Lake, a locally-deployable infrastructure built on...
Jonas Wilinski