Research

Papers

Research papers from arXiv and related sources

Total: 4694 (AI/LLM: 2583, Testing: 2111)
AI LLM

Human or Machine? A Preliminary Turing Test for Speech-to-Speech Interaction

The pursuit of human-like conversational agents has long been guided by the Turing test. For modern speech-to-speech (S2S) systems, a critical yet unanswered question is whether they can converse l...

Xiang Li, Jiabao Gao, Sipei Lin, Xuan Zhou, Chi Zhang, Bo Cheng, Jiale Han, Benyou Wang

2602.24080 2026-02-27
TESTING

Ecological memory of hydrodynamic cues shapes growth and migration of motile microorganisms

Microorganisms live in inherently dynamic environments where fluctuations in biotic and abiotic factors shape their behaviour, physiology, and fitness. The concept of ecological memory: the lasting...

Narges Kakavand, Anupam Sengupta

2602.24073 2026-02-27
AI LLM

A Novel Hierarchical Multi-Agent System for Payments Using LLMs

Large language model (LLM) agents, such as OpenAI's Operator and Claude's Computer Use, can automate workflows but are unable to handle payment tasks. Existing agentic solutions have gained significant...

Joon Kiat Chua, Donghao Huang, Zhaoxia Wang

2602.24068 2026-02-27
AI LLM

Task Complexity Matters: An Empirical Study of Reasoning in LLMs for Sentiment Analysis

Large language models (LLMs) with reasoning capabilities have fueled a compelling narrative that reasoning universally improves performance across language tasks. We test this claim through a compr...

Donghao Huang, Zhaoxia Wang

2602.24060 2026-02-27
AI LLM

CIRCLE: A Framework for Evaluating AI from a Real-World Lens

This paper proposes CIRCLE, a six-stage, lifecycle-based framework to bridge the reality gap between model-centric performance metrics and AI's materialized outcomes in deployment. While existing f...

Reva Schwartz, Carina Westling, Morgan Briggs, Marzieh Fadaee, Isar Nejadgholi, Matthew Holmes, F...

2602.24055 2026-02-27
TESTING

Unsupervised Baseline Clustering and Incremental Adaptation for IoT Device Traffic Profiling

The growth and heterogeneity of IoT devices create security challenges where static identification models can degrade as traffic evolves. This paper presents a two-stage, flow-feature-based pipelin...

Sean M. Alderman, John D. Hastings

2602.24047 2026-02-27
AI LLM

Data Driven Optimization of GPU efficiency for Distributed LLM Adapter Serving

Large Language Model (LLM) adapters enable low-cost model specialization, but introduce complex caching and scheduling challenges in distributed serving systems where hundreds of adapters must be h...

Ferran Agullo, Joan Oliveras, Chen Wang, Alberto Gutierrez-Torre, Olivier Tardieu, Alaa Youssef, ...

2602.24044 2026-02-27
TESTING

Spatio-Temporal Garment Reconstruction Using Diffusion Mapping via Pattern Coordinates

Reconstructing 3D clothed humans from monocular images and videos is a fundamental problem with applications in virtual try-on, avatar creation, and mixed reality. Despite significant progress in h...

Yingxuan You, Ren Li, Corentin Dumery, Cong Cao, Hao Li, Pascal Fua

2602.24043 2026-02-27
AI LLM

RewardUQ: A Unified Framework for Uncertainty-Aware Reward Models

Reward models are central to aligning large language models (LLMs) with human preferences. Yet most approaches rely on pointwise reward estimates that overlook the epistemic uncertainty in reward m...

Daniel Yang, Samuel Stante, Florian Redhardt, Lena Libon, Parnian Kassraie, Ido Hakimi, Barna Pás...

2602.24040 2026-02-27
AI LLM

Designing AI Tutors for Interest-Based Learning: Insights from Human Instructors

Interest-based learning (IBL) is a paradigm of instruction in which educational content is contextualized using learners' interests to enhance content relevance. IBL has been shown to result in imp...

Abhishek Kulkarni, Sharon Lynn Chu

2602.24036 2026-02-27
TESTING

A Quality Framework for Testing Gravity with Wide Binaries: No Evidence for MOND

Wide binaries (WBs) offer a unique opportunity to test gravity in the low-acceleration regime, where modifications such as Milgromian dynamics (MOND) predict measurable deviations from Newtonian gr...

Stephen A. Cookson, Indranil Banik, Kareem El-Badry, Will Sutherland, Zephyr Penoyre, Charalambos...

2602.24035 2026-02-27
TESTING

GuardAlign: Test-time Safety Alignment in Multimodal Large Language Models

Large vision-language models (LVLMs) have achieved remarkable progress in vision-language reasoning tasks, yet ensuring their safety remains a critical challenge. Recent input-side defenses detect ...

Xingyu Zhu, Beier Zhu, Junfeng Fang, Shuo Wang, Yin Zhang, Xiang Wang, Xiangnan He

2602.24027 2026-02-27
AI LLM

Breaking the Illusion of Artificial Consensus: Clone-Robust Weighting for Arbitrary Metric Spaces

Independent media are central to democratic decision-making, yet recent technological developments, such as social media, pseudonymous identities, and generative AI, have made them more vulnerable ...

Damien Berriaud, Roger Wattenhofer

2602.24024 2026-02-27
TESTING

Cross-order induced behaviors in contagion dynamics on higher-order networks

Recent studies have shown that novel collective behaviors emerge in complex systems due to higher-order interactions. However, the way in which the structural correlations of these interactions sha...

Kaloyan Danovski, Sandro Meloni, Michele Starnini

2602.24023 2026-02-27
AI LLM

Steering and Rectifying Latent Representation Manifolds in Frozen Multi-modal LLMs for Video Anomaly Detection

Video anomaly detection (VAD) aims to identify abnormal events in videos. Traditional VAD methods generally suffer from the high costs of labeled data and full training; thus, some recent works have...

Zhaolin Cai, Fan Li, Huiyu Duan, Lijun He, Guangtao Zhai

2602.24021 2026-02-27
AI LLM

Interpretable Debiasing of Vision-Language Models for Social Fairness

The rapid advancement of Vision-Language models (VLMs) has raised growing concerns that their black-box reasoning processes could lead to unintended forms of social bias. Current debiasing approach...

Na Min An, Yoonna Jang, Yusuke Hirota, Ryo Hachiuma, Isabelle Augenstein, Hyunjung Shim

2602.24014 2026-02-27
TESTING

LeGend: A Data-Driven Framework for Lemma Generation in Hardware Model Checking

Property checking of RTL designs is a central task in formal verification. Among available engines, IC3/PDR is a widely used backbone whose performance critically depends on inductive generalizatio...

Mingkai Miao, Guangyu Hu, Wei Zhang, Hongce Zhang

2602.24010 2026-02-27
AI LLM

Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking

Jailbreak techniques for large language models (LLMs) evolve faster than benchmarks, making robustness estimates stale and difficult to compare across papers due to drift in datasets, harnesses, an...

Zhicheng Fang, Jingjie Zheng, Chenxu Fu, Wei Xu

2602.24009 2026-02-27
TESTING

Foundation World Models for Agents that Learn, Verify, and Adapt Reliably Beyond Static Environments

The next generation of autonomous agents must not only learn efficiently but also act reliably and adapt their behavior in open worlds. Standard approaches typically assume fixed tasks and environm...

Florent Delgrange

2602.23997 2026-02-27
TESTING

Large-scale portfolio optimization on a trapped-ion quantum computer

We present an end-to-end pipeline for large-scale portfolio selection with cardinality constraints and experimentally demonstrate it on trapped-ion quantum processors using hardware-aware decomposi...

Alejandro Gomez Cadavid, Ananth Kaushik, Pranav Chandarana, Miguel Angel Lopez-Ruiz, Gaurav Dev, ...

2602.23976 2026-02-27