Research

Papers

Research papers from arXiv and related sources

Total: 4513 · AI/LLM: 2483 · Testing: 2030
AI LLM

Simplifying Outcomes of Language Model Component Analyses with ELIA

While mechanistic interpretability has developed powerful tools to analyze the internal workings of Large Language Models (LLMs), their complexity has created an accessibility gap, limiting their u...

Aaron Louis Eidt, Nils Feldhus

2602.18262 2026-02-20
AI LLM

Dual-Tree LLM-Enhanced Negative Sampling for Implicit Collaborative Filtering

Negative sampling is a pivotal technique in implicit collaborative filtering (CF) recommendation, enabling efficient and effective training by contrasting observed interactions with sampled unobser...

Jiayi Wu, Zhengyu Wu, Xunkai Li, Rong-Hua Li, Guoren Wang

2602.18249 2026-02-20
AI LLM

Reflections on the Future of Statistics Education in a Technological Era

Keeping pace with rapidly evolving technology is a key challenge in teaching statistics. To equip students with essential skills for the modern workplace, educators must integrate relevant technolo...

Craig Alexander, Jennifer Gaskell, Vinny Davies

2602.18242 2026-02-20
AI LLM

Thinking by Subtraction: Confidence-Driven Contrastive Decoding for LLM Reasoning

Recent work on test-time scaling for large language model (LLM) reasoning typically assumes that allocating more inference-time computation uniformly improves correctness. However, prior studies sh...

Lexiang Tang, Weihao Gao, Bingchen Zhao, Lu Ma, Qiao Jin, Bang Yang, Yuexian Zou

2602.18232 2026-02-20
AI LLM

[Re] Benchmarking LLM Capabilities in Negotiation through Scoreable Games

Large Language Models (LLMs) demonstrate significant potential in multi-agent negotiation tasks, yet evaluation in this domain remains challenging due to a lack of robust and generalizable benchmar...

Jorge Carrasco Pollo, Ioannis Kapetangeorgis, Joshua Rosenthal, John Hua Yao

2602.18230 2026-02-20
AI LLM

Art Notions in the Age of (Mis)anthropic AI

In this paper, I take the cultural effects of generative artificial intelligence (generative AI) as a context for examining a broader perspective of AI's impact on contemporary art notions. After t...

Dejan Grba

2602.18202 2026-02-20
AI LLM

Role and Identity Work of Software Engineering Professionals in the Generative AI Era

The adoption of Generative AI (GenAI) suggests major changes for software engineering, encompassing not only technical aspects but also human aspects of the professionals involved. One of these aspects is how ...

Jorge Melegati

2602.18190 2026-02-20
AI LLM

Computer Vision in Tactical AI Art

AI art comprises a spectrum of creative endeavors that emerge from and respond to the development of artificial intelligence (AI), the expansion of AI-powered economies, and their influence on cult...

Dejan Grba

2602.18189 2026-02-20
AI LLM

Capabilities Ain't All You Need: Measuring Propensities in AI

AI evaluation has primarily focused on measuring capabilities, with formal approaches inspired by Item Response Theory (IRT) being increasingly applied. Yet propensities - the tendencies of model...

Daniel Romero-Alvarado, Fernando Martínez-Plumed, Lorenzo Pacchiardi, Hugo Save, Siddhesh Milind ...

2602.18182 2026-02-20
AI LLM

SeedFlood: A Step Toward Scalable Decentralized Training of LLMs

This work presents a new approach to decentralized training, SeedFlood, designed to scale to large models across complex network topologies and achieve global consensus with minimal communication ov...

Jihun Kim, Namhoon Lee

2602.18181 2026-02-20
AI LLM

Can AI Lower the Barrier to Cybersecurity? A Human-Centered Mixed-Methods Study of Novice CTF Learning

Capture-the-Flag (CTF) competitions serve as gateways into offensive cybersecurity, yet they often present steep barriers for novices due to complex toolchains and opaque workflows. Recently, agent...

Cathrin Schachner, Jasmin Wachter

2602.18172 2026-02-20
AI LLM

Click it or Leave it: Detecting and Spoiling Clickbait with Informativeness Measures and Large Language Models

Clickbait headlines degrade the quality of online information and undermine user trust. We present a hybrid approach to clickbait detection that combines transformer-based text embeddings with ling...

Wojciech Michaluk, Tymoteusz Urban, Mateusz Kubita, Soveatin Kuntur, Anna Wroblewska

2602.18171 2026-02-20
AI LLM

FENCE: A Financial and Multimodal Jailbreak Detection Dataset

Jailbreaking poses a significant risk to the deployment of Large Language Models (LLMs) and Vision Language Models (VLMs). VLMs are particularly vulnerable because they process both text and images...

Mirae Kim, Seonghun Jeong, Youngjun Kwak

2602.18154 2026-02-20
AI LLM

The Statistical Signature of LLMs

Large language models generate text through probabilistic sampling from high-dimensional distributions, yet how this process reshapes the structural statistical organization of language remains inc...

Ortal Hadad, Edoardo Loru, Jacopo Nudo, Niccolò Di Marco, Matteo Cinelli, Walter Quattrociocchi

2602.18152 2026-02-20
AI LLM

Detecting Contextual Hallucinations in LLMs with Frequency-Aware Attention

Hallucination detection is critical for ensuring the reliability of large language models (LLMs) in context-based generation. Prior work has explored intrinsic signals available during generation, ...

Siya Qi, Yudong Chen, Runcong Zhao, Qinglin Zhu, Zhanghao Hu, Wei Liu, Yulan He, Zheng Yuan, Lin Gui

2602.18145 2026-02-20
AI LLM

Demonstrating Restraint

Some have claimed that the future development of powerful AI systems would enable the United States to shift the international balance of power dramatically in its favor. Such a feat may not be tec...

L. C. R. Patell, O. E. Guest

2602.18139 2026-02-20
AI LLM

Agentic Adversarial QA for Improving Domain-Specific LLMs

Large Language Models (LLMs), despite extensive pretraining on broad internet corpora, often struggle to adapt effectively to specialized domains. There is growing interest in fine-tuning these mod...

Vincent Grari, Ciprian Tomoiaga, Sylvain Lamprier, Tatsunori Hashimoto, Marcin Detyniecki

2602.18137 2026-02-20
AI LLM

Neurosymbolic Language Reasoning as Satisfiability Modulo Theory

Natural language understanding requires interleaving textual and logical reasoning, yet large language models often fail to perform such reasoning reliably. Existing neurosymbolic systems combine L...

Hyunseok Oh, Sam Stern, Youngki Lee, Matthai Philipose

2602.18095 2026-02-20
AI LLM

OODBench: Out-of-Distribution Benchmark for Large Vision-Language Models

Existing Vision-Language Models (VLMs) have achieved significant progress by being trained on massive-scale datasets, typically under the assumption that data are independent and identically distri...

Ling Lin, Yang Bai, Heng Su, Congcong Zhu, Yaoxing Wang, Yang Zhou, Huazhu Fu, Jingrun Chen

2602.18094 2026-02-20
AI LLM

Perceived Political Bias in LLMs Reduces Persuasive Abilities

Conversational AI has been proposed as a scalable way to correct public misconceptions and counter the spread of misinformation. Yet its effectiveness may depend on perceptions of its political neutrality. As LLM...

Matthew DiGiuseppe, Joshua Robison

2602.18092 2026-02-20