Papers
Research papers from arXiv and related sources
Adaptive Decoding via Test-Time Policy Learning for Self-Improving Generation
Decoding strategies largely determine the quality of Large Language Model (LLM) outputs, yet widely used heuristics such as greedy or fixed temperature/top-p decoding are static and often task-agno...
Asmita Bhardwaj, Yuya Jeremy Ong, Eelaaf Zahid, Basel Shbita
The Impact of Corporate AI Washing on Farmers' Digital Financial Behavior Response -- An Analysis from the Perspective of Digital Financial Exclusion
In the context of the rapid development of digital finance, some financial technology companies exhibit the phenomenon of "AI washing," where they overstate their AI capabilities while underinvesti...
Li Wenxiu, Wen Zhanjie, Xia Jiechang, Guo Jingqiao
The Spillover Effects of Peer AI Rinsing on Corporate Green Innovation
At a time when the phenomenon of 'AI washing' is quietly spreading, an increasing number of enterprises are using the label of artificial intelligence merely as a cosmetic embellishment in their an...
Li Wenxiu, Wen Zhanjie, Xia Jiechang, Guo Jingqiao
Statistical Testing Framework for Clustering Pipelines by Selective Inference
A data analysis pipeline is a structured sequence of steps that transforms raw data into meaningful insights by integrating multiple analysis algorithms. In many practical applications, analytical f...
Yugo Miyata, Tomohiro Shiraishi, Shunichi Nishino, Ichiro Takeuchi
TARo: Token-level Adaptive Routing for LLM Test-time Alignment
Large language models (LLMs) exhibit strong reasoning capabilities but typically require expensive post-training to reach high performance. Recent test-time alignment methods offer a lightweight al...
Arushi Rai, Qiang Zhang, Hanqing Zeng, Yunkai Zhang, Dipesh Tamboli, Xiangjun Fan, Zhuokai Zhao
Where are the Hidden Gems? Applying Transformer Models for Design Discussion Detection
Design decisions are at the core of software engineering and appear in Q&A forums, mailing lists, pull requests, issue trackers, and commit messages. Design discussions spanning a project's histor...
Lawrence Arkoh, Daniel Feitosa, Wesley K. G. Assunção
Reflection in the Dark: Exposing and Escaping the Black Box in Reflective Prompt Optimization
Automatic prompt optimization (APO) has emerged as a powerful paradigm for improving LLM performance without manual prompt engineering. Reflective APO methods such as GEPA iteratively refine prompt...
Shiyan Liu, Qifeng Xia, Qiyun Xia, Yisheng Liu, Xinyu Yu, Rui Qu
A Non-parametric Method for the Inference of Halo Occupation Distributions
The galaxy-halo connection traces processes by which galaxies form and evolve. The halo occupation distribution (HOD) describes the relationship between galaxies and their host dark matter haloes. ...
Jacob Kennedy, Eric Gawiser, Kartheik G. Iyer, L. Y. Aaron Yung
Lensing in the Blue III: Weak Lensing Shape Catalogs of 30 Merging Galaxy Clusters
We present the weak gravitational lensing dataset from the Super-pressure Balloon-Borne Imaging Telescope (SuperBIT), which imaged 30 galaxy clusters during its 45 night flight in April to May 2023...
Sayan Saha, Jacqueline E. McCleary, Spencer W. Everett, Maya Amit, Georgios N. Vassilakis, Emaad ...
TENSURE: Fuzzing Sparse Tensor Compilers (Registered Report)
Sparse Tensor Compilers (STCs) have emerged as critical infrastructure for optimizing high-dimensional data analytics and machine learning workloads. The STCs must synthesize complex, irregular con...
Kabilan Mahathevan, Yining Zhang, Muhammad Ali Gulzar, Kirshanthan Sundararajah
Contact Status Recognition and Slip Detection with a Bio-inspired Tactile Hand
Stable and reliable grasp is critical to robotic manipulations especially for fragile and glazed objects, where the grasp force requires precise control as too large force possibly damages the obje...
Chengxiao He, Wenhui Yang, Hongliang Zhao, Jiacheng Lv, Yuzhe Shao, Longhui Qin
Multi-material Direct Ink Writing and Embroidery for Stretchable Wearable Sensors
The development of wearable sensing systems for sports performance tracking, rehabilitation, and injury prevention has driven growing demand for smart garments that combine comfort, durability, and...
Lukas Cha, Ryman Hashem, Ria Prakash, Tanguy Declety, Wenze Zhang, Liang He
Interpretability without actionability: mechanistic methods cannot correct language model errors despite near-perfect internal representations
Language models encode task-relevant knowledge in internal representations that far exceeds their output performance, but whether mechanistic interpretability methods can bridge this knowledge-acti...
Sanjay Basu, Sadiq Y. Patel, Parth Sheth, Bhairavi Muralidharan, Namrata Elamaran, Aakriti Kinra,...
PeriphAR: Fast and Accurate Real-World Object Selection with Peripheral Augmented Reality Displays
Gaze-based selection in XR requires visual confirmation due to eye-tracking limitations and target ambiguity in 3D contexts. Current designs for wide-FOV displays use world-locked, central overlays...
Yutong Ren, Arnav Reddy, Michael Nebeling
Synthetic Data, Information, and Prior Knowledge: Why Synthetic Data Augmentation to Boost Sample Doesn't Work for Statistical Inference
The use of synthetic data to deidentify data and to improve predictive models is well-attested to. The augmentation of datasets using synthetically generated data is an alluring proposition: in the...
Reid Dale, Jordan Rodu, Mike Baiocchi
VISTA: Validation-Guided Integration of Spatial and Temporal Foundation Models with Anatomical Decoding for Rare-Pathology VCE Event Detection
Capsule endoscopy event detection is challenging because diagnostically relevant findings are sparse, visually heterogeneous, and embedded in long, noisy video streams, while evaluation is performe...
Bo-Cheng Qiu, Yu-Fan Lin, Yu-Zhe Pien, Chia-Ming Lee, Fu-En Yang, Yu-Chiang Frank Wang, Chih-Chun...
Can LLMs Reason Like Automated Theorem Provers for Rust Verification? VCoT-Bench: Evaluating via Verification Chain of Thought
As Large Language Models (LLMs) increasingly assist secure software development, their ability to meet the rigorous demands of Rust program verification remains unclear. Existing evaluations treat ...
Zichen Xie, Wenxi Wang
FaithSteer-BENCH: A Deployment-Aligned Stress-Testing Benchmark for Inference-Time Steering
Inference-time steering is widely regarded as a lightweight and parameter-free mechanism for controlling large language model (LLM) behavior, and prior work has often suggested that simple activati...
Zikang Ding, Qiying Hu, Yi Zhang, Hongji Li, Junchi Yao, Hongbo Liu, Lijie Hu
Sub-Yield Dynamics in Yield-Stress Materials
The mechanical response of yield-stress materials below the yield point remains a subject of debate. Two of the most widely used constitutive models for these materials offer fundamentally conflict...
Alice Woodbridge, Kasra Amini, Fredrik Lundell, Outi Tammisola, Anne Juel, Robert J. Poole, Cláud...
Unified Spatio-Temporal Token Scoring for Efficient Video VLMs
Token pruning is essential for enhancing the computational efficiency of vision-language models (VLMs), particularly for video-based tasks where temporal redundancy is prevalent. Prior approaches t...
Jianrui Zhang, Yue Yang, Rohun Tripathi, Winson Han, Ranjay Krishna, Christopher Clark, Yong Jae ...