Papers
Research papers from arXiv and related sources
AndroWasm: an Empirical Study on Android Malware Obfuscation through WebAssembly
In recent years, stealthy Android malware has increasingly adopted sophisticated techniques to bypass automatic detection mechanisms and hinder manual analysis. Adversaries typically rely on obfusc...
Diego Soi, Silvia Lucia Sanna, Lorenzo Pisu, Leonardo Regano, Giorgio Giacinto
Gradient Regularization Prevents Reward Hacking in Reinforcement Learning from Human Feedback and Verifiable Rewards
Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from Verifiable Rewards (RLVR) are two key steps in the post-training of modern Language Models (LMs). A common problem is reward hacking, where the ...
Johannes Ackermann, Michael Noukhovitch, Takashi Ishida, Masashi Sugiyama
Decision Support under Prediction-Induced Censoring
In many data-driven online decision systems, actions determine not only operational costs but also the data availability for future learning -- a phenomenon termed Prediction-Induced Censoring (PIC...
Yan Chen, Ruyi Huang, Cheng Liu
Towards More Standardized AI Evaluation: From Models to Agents
Evaluation is no longer a final checkpoint in the machine learning lifecycle. As AI systems evolve from static models to compound, tool-using agents, evaluation becomes a core control function. The...
Ali El Filali, Inès Bedar
DeepSVU: Towards In-depth Security-oriented Video Understanding via Unified Physical-world Regularized MoE
In the literature, prior research on Security-oriented Video Understanding (SVU) has predominantly focused on detecting and localizing threats (e.g., shootings, robberies) in videos, while largel...
Yujie Jin, Wenxin Zhang, Jingjing Wang, Guodong Zhou
Towards LLM-centric Affective Visual Customization via Efficient and Precise Emotion Manipulating
Previous studies on visual customization primarily rely on the objective alignment between various control signals (e.g., language, layout and canny) and the edited images, which largely ignore the...
Jiamin Luo, Xuqian Gu, Jingjing Wang, Jiahong Lu
DeCEAT: Decoding Carbon Emissions for AI-driven Software Testing
The increasing use of language models in automated software testing raises concerns about their environmental impact, yet existing sustainability analyses focus almost exclusively on large language...
Pragati Kumari, Novarun Deb
NIMMGen: Learning Neural-Integrated Mechanistic Digital Twins with LLMs
Mechanistic models encode scientific knowledge about dynamical systems and are widely used in downstream scientific and policy applications. Recent work has explored LLM-based agentic frameworks to...
Zihan Guan, Rituparna Datta, Mengxuan Hu, Shunshun Liu, Aiying Zhang, Prasanna Balachandran, Shen...
Aurora: Neuro-Symbolic AI Driven Advising Agent
Academic advising in higher education is under severe strain, with advisor-to-student ratios commonly exceeding 300:1. These structural bottlenecks limit timely access to guidance, increase the ris...
Lorena Amanda Quincoso Lugones, Christopher Kverne, Nityam Sharadkumar Bhimani, Ana Carolina Oliv...
Turbo Connection: Reasoning as Information Flow from Higher to Lower Layers
Complex problems, whether in math, logic, or planning, are solved by humans through a sequence of steps where the result of one step informs the next. In this work, we adopt the perspective that th...
Mohan Tang, Sidi Lu
WorkflowPerturb: Calibrated Stress Tests for Evaluating Multi-Agent Workflow Metrics
LLM-based systems increasingly generate structured workflows for complex tasks. In practice, automatic evaluation of these workflows is difficult, because metric scores are often not calibrated, an...
Madhav Kanda, Pedro Las-Casas, Alok Gautam Kumbhare, Rodrigo Fonseca, Sharad Agarwal
Mining Type Constructs Using Patterns in AI-Generated Code
Artificial Intelligence (AI) increasingly automates various parts of software development. Although AI has enhanced the productivity of development tasks, it remains unstudied whether AI ...
Imgyeong Lee, Tayyib Ul Hassan, Abram Hindle
CUICurate: A GraphRAG-based Framework for Automated Clinical Concept Curation for NLP applications
Background: Clinical named entity recognition tools commonly map free text to Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs). For many downstream tasks, however, the clini...
Victoria Blake, Mathew Miller, Jamie Novak, Sze-yuan Ooi, Blanca Gallego
Optimizing Graph Causal Classification Models: Estimating Causal Effects and Addressing Confounders
Graph data is becoming increasingly prevalent due to the growing demand for relational insights in AI across various domains. Organizations regularly use graph data to solve complex problems involv...
Simi Job, Xiaohui Tao, Taotao Cai, Haoran Xie, Jianming Yong, Xin Wang
Analyzing LLM Instruction Optimization for Tabular Fact Verification
Instruction optimization provides a lightweight, model-agnostic approach to enhancing the reasoning performance of large language models (LLMs). This paper presents the first systematic comparison ...
Xiaotang Du, Giwon Hong, Wai-Chung Kwan, Rohit Saxena, Ivan Titov, Pasquale Minervini, Emily Allaway
Operational Agency: A Permeable Legal Fiction for Tracing Culpability in AI Systems
Modern artificial intelligence (AI) systems act with a high degree of independence yet lack legal personhood, a paradox that fractures doctrines grounded in human-centric notions of mens rea and act...
Anirban Mukherjee, Hannah Hanwen Chang
Memory-Based Advantage Shaping for LLM-Guided Reinforcement Learning
In environments with sparse or delayed rewards, reinforcement learning (RL) incurs high sample complexity due to the large number of interactions needed for learning. This limitation has motivated ...
Narjes Nourzad, Carlee Joe-Wong
MIRA: Memory-Integrated Reinforcement Learning Agent with Limited LLM Guidance
Reinforcement learning (RL) agents often suffer from high sample complexity in sparse or delayed reward settings due to limited prior structure. Large language models (LLMs) can provide subgoal dec...
Narjes Nourzad, Carlee Joe-Wong
Visual Anthropomorphism Shifts Evaluations of Gendered AI Managers
This research examines whether competence cues can reduce gender bias in evaluations of AI managers and whether these effects depend on how the AI is represented. Across two preregistered experimen...
Ruiqing Han, Hao Cui, Taha Yasseri
Alignment in Time: Peak-Aware Orchestration for Long-Horizon Agentic Systems
Traditional AI alignment primarily focuses on individual model outputs; however, autonomous agents in long-horizon workflows require sustained reliability across entire interaction trajectories. We...
Hanjing Shi, Dominic DiFranzo