Research

Papers

Research papers from arXiv and related sources

Total: 4513 · AI/LLM: 2483 · Testing: 2030
AI LLM

AndroWasm: an Empirical Study on Android Malware Obfuscation through WebAssembly

In recent years, stealthy Android malware has increasingly adopted sophisticated techniques to bypass automatic detection mechanisms and hinder manual analysis. Adversaries typically rely on obfusc...

Diego Soi, Silvia Lucia Sanna, Lorenzo Pisu, Leonardo Regano, Giorgio Giacinto

2602.18082 2026-02-20
AI LLM

Gradient Regularization Prevents Reward Hacking in Reinforcement Learning from Human Feedback and Verifiable Rewards

Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from Verifiable Rewards (RLVR) are two key steps in the post-training of modern Language Models (LMs). A common problem is reward hacking, where the ...

Johannes Ackermann, Michael Noukhovitch, Takashi Ishida, Masashi Sugiyama

2602.18037 2026-02-20
AI LLM

Decision Support under Prediction-Induced Censoring

In many data-driven online decision systems, actions determine not only operational costs but also the data availability for future learning -- a phenomenon termed Prediction-Induced Censoring (PIC...

Yan Chen, Ruyi Huang, Cheng Liu

2602.18031 2026-02-20
AI LLM

Towards More Standardized AI Evaluation: From Models to Agents

Evaluation is no longer a final checkpoint in the machine learning lifecycle. As AI systems evolve from static models to compound, tool-using agents, evaluation becomes a core control function. The...

Ali El Filali, Inès Bedar

2602.18029 2026-02-20
AI LLM

DeepSVU: Towards In-depth Security-oriented Video Understanding via Unified Physical-world Regularized MoE

In the literature, prior research on Security-oriented Video Understanding (SVU) has predominantly focused on detecting and localizing threats (e.g., shootings, robberies) in videos, while largel...

Yujie Jin, Wenxin Zhang, Jingjing Wang, Guodong Zhou

2602.18019 2026-02-20
AI LLM

Towards LLM-centric Affective Visual Customization via Efficient and Precise Emotion Manipulating

Previous studies on visual customization primarily rely on the objective alignment between various control signals (e.g., language, layout, and Canny edges) and the edited images, which largely ignore the...

Jiamin Luo, Xuqian Gu, Jingjing Wang, Jiahong Lu

2602.18016 2026-02-20
AI LLM

DeCEAT: Decoding Carbon Emissions for AI-driven Software Testing

The increasing use of language models in automated software testing raises concerns about their environmental impact, yet existing sustainability analyses focus almost exclusively on large language...

Pragati Kumari, Novarun Deb

2602.18012 2026-02-20
AI LLM

NIMMGen: Learning Neural-Integrated Mechanistic Digital Twins with LLMs

Mechanistic models encode scientific knowledge about dynamical systems and are widely used in downstream scientific and policy applications. Recent work has explored LLM-based agentic frameworks to...

Zihan Guan, Rituparna Datta, Mengxuan Hu, Shunshun Liu, Aiying Zhang, Prasanna Balachandran, Shen...

2602.18008 2026-02-20
AI LLM

Aurora: Neuro-Symbolic AI Driven Advising Agent

Academic advising in higher education is under severe strain, with advisor-to-student ratios commonly exceeding 300:1. These structural bottlenecks limit timely access to guidance, increase the ris...

Lorena Amanda Quincoso Lugones, Christopher Kverne, Nityam Sharadkumar Bhimani, Ana Carolina Oliv...

2602.17999 2026-02-20
AI LLM

Turbo Connection: Reasoning as Information Flow from Higher to Lower Layers

Complex problems, whether in math, logic, or planning, are solved by humans through a sequence of steps where the result of one step informs the next. In this work, we adopt the perspective that th...

Mohan Tang, Sidi Lu

2602.17993 2026-02-20
AI LLM

WorkflowPerturb: Calibrated Stress Tests for Evaluating Multi-Agent Workflow Metrics

LLM-based systems increasingly generate structured workflows for complex tasks. In practice, automatic evaluation of these workflows is difficult, because metric scores are often not calibrated, an...

Madhav Kanda, Pedro Las-Casas, Alok Gautam Kumbhare, Rodrigo Fonseca, Sharad Agarwal

2602.17990 2026-02-20
AI LLM

Mining Type Constructs Using Patterns in AI-Generated Code

Artificial Intelligence (AI) increasingly automates various software development tasks. Although AI has enhanced the productivity of development tasks, it remains unstudied whether AI ...

Imgyeong Lee, Tayyib Ul Hassan, Abram Hindle

2602.17955 2026-02-20
AI LLM

CUICurate: A GraphRAG-based Framework for Automated Clinical Concept Curation for NLP applications

Background: Clinical named entity recognition tools commonly map free text to Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs). For many downstream tasks, however, the clini...

Victoria Blake, Mathew Miller, Jamie Novak, Sze-yuan Ooi, Blanca Gallego

2602.17949 2026-02-20
AI LLM

Optimizing Graph Causal Classification Models: Estimating Causal Effects and Addressing Confounders

Graph data is becoming increasingly prevalent due to the growing demand for relational insights in AI across various domains. Organizations regularly use graph data to solve complex problems involv...

Simi Job, Xiaohui Tao, Taotao Cai, Haoran Xie, Jianming Yong, Xin Wang

2602.17941 2026-02-20
AI LLM

Analyzing LLM Instruction Optimization for Tabular Fact Verification

Instruction optimization provides a lightweight, model-agnostic approach to enhancing the reasoning performance of large language models (LLMs). This paper presents the first systematic comparison ...

Xiaotang Du, Giwon Hong, Wai-Chung Kwan, Rohit Saxena, Ivan Titov, Pasquale Minervini, Emily Allaway

2602.17937 2026-02-20
AI LLM

Operational Agency: A Permeable Legal Fiction for Tracing Culpability in AI Systems

Modern artificial intelligence (AI) systems act with a high degree of independence yet lack legal personhood, a paradox that fractures doctrines grounded in human-centric notions of mens rea and act...

Anirban Mukherjee, Hannah Hanwen Chang

2602.17932 2026-02-20
AI LLM

Memory-Based Advantage Shaping for LLM-Guided Reinforcement Learning

In environments with sparse or delayed rewards, reinforcement learning (RL) incurs high sample complexity due to the large number of interactions needed for learning. This limitation has motivated ...

Narjes Nourzad, Carlee Joe-Wong

2602.17931 2026-02-20
AI LLM

MIRA: Memory-Integrated Reinforcement Learning Agent with Limited LLM Guidance

Reinforcement learning (RL) agents often suffer from high sample complexity in sparse or delayed reward settings due to limited prior structure. Large language models (LLMs) can provide subgoal dec...

Narjes Nourzad, Carlee Joe-Wong

2602.17930 2026-02-20
AI LLM

Visual Anthropomorphism Shifts Evaluations of Gendered AI Managers

This research examines whether competence cues can reduce gender bias in evaluations of AI managers and whether these effects depend on how the AI is represented. Across two preregistered experimen...

Ruiqing Han, Hao Cui, Taha Yasseri

2602.17919 2026-02-20
AI LLM

Alignment in Time: Peak-Aware Orchestration for Long-Horizon Agentic Systems

Traditional AI alignment primarily focuses on individual model outputs; however, autonomous agents in long-horizon workflows require sustained reliability across entire interaction trajectories. We...

Hanjing Shi, Dominic DiFranzo

2602.17910 2026-02-20