Papers
Research papers from arXiv and related sources
Simplifying Outcomes of Language Model Component Analyses with ELIA
While mechanistic interpretability has developed powerful tools to analyze the internal workings of Large Language Models (LLMs), their complexity has created an accessibility gap, limiting their u...
Aaron Louis Eidt, Nils Feldhus
Dual-Tree LLM-Enhanced Negative Sampling for Implicit Collaborative Filtering
Negative sampling is a pivotal technique in implicit collaborative filtering (CF) recommendation, enabling efficient and effective training by contrasting observed interactions with sampled unobser...
Jiayi Wu, Zhengyu Wu, Xunkai Li, Rong-Hua Li, Guoren Wang
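The abstract above describes negative sampling in implicit collaborative filtering: unobserved user-item pairs are sampled as negatives to contrast with observed interactions. As a minimal sketch of the baseline idea (uniform random sampling, not this paper's dual-tree LLM-enhanced method), with hypothetical names:

```python
import random

def sample_negatives(user_pos, n_items, k):
    """Uniformly sample k items the user has NOT interacted with.

    user_pos: set of item ids the user interacted with (observed positives)
    n_items:  total number of items in the catalog
    k:        number of negatives to draw

    This is the simplest baseline; real samplers weight candidates by
    popularity, hardness, or (as in the paper above) learned structure.
    """
    negatives = []
    while len(negatives) < k:
        candidate = random.randrange(n_items)
        if candidate not in user_pos:  # unobserved pairs act as negatives
            negatives.append(candidate)
    return negatives

# Example: a user who interacted with items {0, 3} in a 10-item catalog.
negs = sample_negatives({0, 3}, n_items=10, k=4)
```

Each (positive, negative) pair would then feed a pairwise loss such as BPR during training.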
Reflections on the Future of Statistics Education in a Technological Era
Keeping pace with rapidly evolving technology is a key challenge in teaching statistics. To equip students with essential skills for the modern workplace, educators must integrate relevant technolo...
Craig Alexander, Jennifer Gaskell, Vinny Davies
Thinking by Subtraction: Confidence-Driven Contrastive Decoding for LLM Reasoning
Recent work on test-time scaling for large language model (LLM) reasoning typically assumes that allocating more inference-time computation uniformly improves correctness. However, prior studies sh...
Lexiang Tang, Weihao Gao, Bingchen Zhao, Lu Ma, Qiao Jin, Bang Yang, Yuexian Zou
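The "thinking by subtraction" framing above builds on contrastive decoding, where token scores from a stronger (expert) distribution are adjusted by subtracting scores from a weaker (amateur) distribution. A minimal sketch of that standard mechanism (the paper's confidence-driven variant is not shown; logits and names here are illustrative):

```python
import math

def contrastive_scores(expert_logits, amateur_logits, alpha=1.0):
    """Score each token by expert log-prob minus amateur log-prob.

    Tokens the expert prefers but the amateur does not get boosted;
    this subtraction is the core of contrastive decoding.
    """
    def log_softmax(logits):
        m = max(logits)
        z = math.log(sum(math.exp(x - m) for x in logits)) + m
        return [x - z for x in logits]

    e = log_softmax(expert_logits)
    a = log_softmax(amateur_logits)
    return [ei - alpha * ai for ei, ai in zip(e, a)]

# Toy 3-token vocabulary: the expert favors token 0, the amateur token 2,
# so subtraction sharpens the preference for token 0.
scores = contrastive_scores([2.0, 0.5, 0.1], [0.1, 0.5, 2.0])
best = max(range(3), key=lambda i: scores[i])
```

In practice the subtracted distribution can come from a smaller model, a truncated prompt, or (per the title above) a confidence signal; only the combination rule is sketched here.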
[Re] Benchmarking LLM Capabilities in Negotiation through Scoreable Games
Large Language Models (LLMs) demonstrate significant potential in multi-agent negotiation tasks, yet evaluation in this domain remains challenging due to a lack of robust and generalizable benchmar...
Jorge Carrasco Pollo, Ioannis Kapetangeorgis, Joshua Rosenthal, John Hua Yao
Art Notions in the Age of (Mis)anthropic AI
In this paper, I take the cultural effects of generative artificial intelligence (generative AI) as a context for examining a broader perspective of AI's impact on contemporary art notions. After t...
Dejan Grba
Role and Identity Work of Software Engineering Professionals in the Generative AI Era
The adoption of Generative AI (GenAI) suggests major changes for software engineering, spanning not only technical aspects but also human aspects of the professionals involved. One of these aspects is how ...
Jorge Melegati
Computer Vision in Tactical AI Art
AI art comprises a spectrum of creative endeavors that emerge from and respond to the development of artificial intelligence (AI), the expansion of AI-powered economies, and their influence on cult...
Dejan Grba
Capabilities Ain't All You Need: Measuring Propensities in AI
AI evaluation has primarily focused on measuring capabilities, with formal approaches inspired by Item Response Theory (IRT) being increasingly applied. Yet propensities - the tendencies of model...
Daniel Romero-Alvarado, Fernando Martínez-Plumed, Lorenzo Pacchiardi, Hugo Save, Siddhesh Milind ...
SeedFlood: A Step Toward Scalable Decentralized Training of LLMs
This work presents a new approach to decentralized training, SeedFlood, designed to scale for large models across complex network topologies and achieve global consensus with minimal communication ov...
Jihun Kim, Namhoon Lee
Can AI Lower the Barrier to Cybersecurity? A Human-Centered Mixed-Methods Study of Novice CTF Learning
Capture-the-Flag (CTF) competitions serve as gateways into offensive cybersecurity, yet they often present steep barriers for novices due to complex toolchains and opaque workflows. Recently, agent...
Cathrin Schachner, Jasmin Wachter
Click it or Leave it: Detecting and Spoiling Clickbait with Informativeness Measures and Large Language Models
Clickbait headlines degrade the quality of online information and undermine user trust. We present a hybrid approach to clickbait detection that combines transformer-based text embeddings with ling...
Wojciech Michaluk, Tymoteusz Urban, Mateusz Kubita, Soveatin Kuntur, Anna Wroblewska
FENCE: A Financial and Multimodal Jailbreak Detection Dataset
Jailbreaking poses a significant risk to the deployment of Large Language Models (LLMs) and Vision Language Models (VLMs). VLMs are particularly vulnerable because they process both text and images...
Mirae Kim, Seonghun Jeong, Youngjun Kwak
The Statistical Signature of LLMs
Large language models generate text through probabilistic sampling from high-dimensional distributions, yet how this process reshapes the structural statistical organization of language remains inc...
Ortal Hadad, Edoardo Loru, Jacopo Nudo, Niccolò Di Marco, Matteo Cinelli, Walter Quattrociocchi
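The abstract above notes that LLMs generate text by probabilistic sampling from a distribution over tokens. A minimal sketch of the basic mechanism, temperature sampling from a softmax over logits (toy values; this is generic decoding, not the paper's analysis):

```python
import math
import random

def sample_token(logits, temperature=1.0):
    """Sample a token index from softmax(logits / temperature).

    Lower temperature concentrates mass on high-logit tokens;
    higher temperature flattens the distribution.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)                              # subtract max for stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = random.random()                          # inverse-CDF sampling
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

random.seed(0)
# Toy 3-token vocabulary; at temperature 0.7 the high-logit token dominates.
tok = sample_token([3.0, 1.0, 0.2], temperature=0.7)
```

Repeating this step token by token is what reshapes the output's statistical structure relative to human text, which is the phenomenon the paper above studies.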
Detecting Contextual Hallucinations in LLMs with Frequency-Aware Attention
Hallucination detection is critical for ensuring the reliability of large language models (LLMs) in context-based generation. Prior work has explored intrinsic signals available during generation, ...
Siya Qi, Yudong Chen, Runcong Zhao, Qinglin Zhu, Zhanghao Hu, Wei Liu, Yulan He, Zheng Yuan, Lin Gui
Demonstrating Restraint
Some have claimed that the future development of powerful AI systems would enable the United States to shift the international balance of power dramatically in its favor. Such a feat may not be tec...
L. C. R. Patell, O. E. Guest
Agentic Adversarial QA for Improving Domain-Specific LLMs
Large Language Models (LLMs), despite extensive pretraining on broad internet corpora, often struggle to adapt effectively to specialized domains. There is growing interest in fine-tuning these mod...
Vincent Grari, Ciprian Tomoiaga, Sylvain Lamprier, Tatsunori Hashimoto, Marcin Detyniecki
Neurosymbolic Language Reasoning as Satisfiability Modulo Theory
Natural language understanding requires interleaving textual and logical reasoning, yet large language models often fail to perform such reasoning reliably. Existing neurosymbolic systems combine L...
Hyunseok Oh, Sam Stern, Youngki Lee, Matthai Philipose
OODBench: Out-of-Distribution Benchmark for Large Vision-Language Models
Existing Vision-Language Models (VLMs) have achieved significant progress by being trained on massive-scale datasets, typically under the assumption that data are independent and identically distri...
Ling Lin, Yang Bai, Heng Su, Congcong Zhu, Yaoxing Wang, Yang Zhou, Huazhu Fu, Jingrun Chen
Perceived Political Bias in LLMs Reduces Persuasive Abilities
Conversational AI has been proposed as a scalable way to correct public misconceptions and counter the spread of misinformation. Yet its effectiveness may depend on perceptions of its political neutrality. As LLM...
Matthew DiGiuseppe, Joshua Robison