Research Papers

arXiv · cs.AI, cs.LG, cs.CL · 20 papers

The Value Axis: Language Models Encode Whether They're on the Right Track

Nick Jiang, Isaac Kauvar, Jack Lindsey

We investigate whether language models internally track the value of their current trajectory, defined as the likelihood that their ongoing strategy will achieve their goals. Using synthetic, in-context reinforcement learning data, we construct a "va…

Context-Aware RL for Agentic and Multimodal LLMs

Peiyang Xu, Bangzheng Li, Sijia Liu +4

Large language models (LLMs) often fail when answering requires identifying a small but decisive piece of evidence within a long or complex context, such as a single line in a tool trace or a subtle detail in an image. We propose ContextRL, a context…

Exact Posterior Score Estimation for Solving Linear Inverse Problems

Abbas Mammadov, Ozgur Kara, Kaan Oktay +5

Diffusion and flow-based models learn powerful data priors by training a denoiser to reverse Gaussian corruption. To use this prior to solve a linear inverse problem, one needs to sample from the posterior, but the score that the prior provides is th…

cs.LG, cs.CV, stat.ML
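For orientation, here is the standard Bayes decomposition behind posterior sampling with a diffusion prior. The notation is assumed here rather than taken from the paper: x_t is the noised sample at time t, y the measurement, A the linear forward operator, and σ_y the level of additive Gaussian measurement noise. The trained denoiser supplies only the first term on the right; the second term involves an integral over the clean signal x_0 that is generally intractable, which is the gap that approximate posterior samplers work around.

```latex
\nabla_{x_t} \log p_t(x_t \mid y)
  = \nabla_{x_t} \log p_t(x_t) + \nabla_{x_t} \log p_t(y \mid x_t),
\qquad
p_t(y \mid x_t) = \int p(y \mid x_0)\, p(x_0 \mid x_t)\, \mathrm{d}x_0,
\qquad
p(y \mid x_0) = \mathcal{N}\!\left(y;\, A x_0,\, \sigma_y^2 I\right).
```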

Geometric Action Model for Robot Policy Learning

Jisang Han, Seonghu Jeon, Jaewoo Jung +7

Generalist robot policies must follow user instructions while reasoning about how objects, cameras, and robot actions interact in the 3D physical world. Recent vision-language-action models (VLAs) and video world-action models (WAMs) inherit strong s…

cs.RO, cs.CV, cs.LG

Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes

Tongyan Fang, Siyuan Huang, Naiyu Fang +6

When pretrained VLA policies are fine-tuned through online RL, each rollout episode produces only a single binary outcome (success or failure), yet the actor update requires per-transition supervision. Existing approaches commonly reduce this sparse…

Benchmarking LLM Agents on Meta-Analysis Articles from Nature Portfolio

Anzhe Xie, Weihang Su, Yujia Zhou +2

Meta-analysis is a demanding form of evidence synthesis that combines literature retrieval, PI/ECO-guided study selection, and statistical aggregation. Its structured, verifiable workflow makes it an ideal substrate for evaluating systematic scientif…

The Importance of Phase in Neural Representations: An Internal Oppenheim-Lim Test of Image Classifiers

Alper Yıldırım

Oppenheim and Lim (1981) showed that natural images stay recognizable when reconstructed from their Fourier phase alone, while the magnitude carries little of their identity. We ask whether trained image classifiers reproduce this asymmetry inside th…

cs.CV, cs.AI, cs.LG
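The classical Oppenheim-Lim experiment builds hybrid signals that take the Fourier phase from one input and the Fourier magnitude from another. Below is a minimal 1-D sketch in plain Python. The paper works with images and with a classifier's internal representations; the function names here are hypothetical, and the naive O(N²) DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns complex samples."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def hybrid(phase_source, magnitude_source):
    """Real signal with the Fourier phase of one input and the Fourier magnitude of the other."""
    phase_spec = dft(phase_source)
    mag_spec = dft(magnitude_source)
    combined = [abs(m) * cmath.exp(1j * cmath.phase(p)) for p, m in zip(phase_spec, mag_spec)]
    # For real inputs the combined spectrum is conjugate-symmetric up to rounding,
    # so the imaginary part of the inverse transform is negligible and is dropped.
    return [z.real for z in idft(combined)]
```

With equal inputs, `hybrid(x, x)` simply reconstructs `x`. The informative case is `hybrid(a, b)` for two different signals: the classical finding is that such hybrids tend to resemble `a`, the phase donor.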

Your Privacy My Cloak: Backdoor Attacks on Differentially Private Federated Learning

Xiaolin Li, Ning Wang, Ninghui Li +1

Prior research suggests that differential privacy (DP) inherently enhances the robustness of federated learning (FL) against backdoor attacks. In this paper, we challenge this assumption. Through an empirical analysis of two baseline attack strategie…

KVEraser: Learning to Steer KV Cache for Efficient Localized Context Erasing

Mufei Li, Shikun Liu, Dongqi Fu +5

Post-hoc context erasing over the KV cache is challenging because a local edit has a global consequence: once a span has been processed, its influence propagates into the cached states of all subsequent tokens. This issue arises naturally in long-con…

DEEPRUBRIC: Evidence-Tree Rubric Supervision for Efficient Reinforcement Learning of Deep Research Agents

Minghang Zhu, Chuyang Wei, Junhao Xu +3

Deep research agents synthesize long-form reports by searching and reasoning over retrieved evidence. Reinforcement learning with rubric-based rewards improves these agents by optimizing them against checkable criteria that translate report quality i…

HAMON: Passive Optical Sequence Mixing for Long-Horizon Forecasting

Alper Yıldırım

Simple linear and frequency-domain models remain surprisingly competitive in long-horizon time-series forecasting, and recent mechanistic evidence suggests that standard forecasting benchmarks may not require the dense superposed representations that…

cs.LG, cs.AI, cs.AR

ExpRL: Exploratory RL for LLM Mid-Training

Violet Xiang, Amrith Setlur, Chase Blagden +2

Sparse reward reinforcement learning (RL) has become a standard tool for improving LLM reasoning, but its success depends critically on the coverage present in the base model. In practice, models are often primed for RL through mid-training on…
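Why coverage matters can be made concrete under one assumption that the excerpt does not state: a group-based policy-gradient estimator (GRPO-style) that normalizes binary rewards within a group of G rollouts per prompt. Under that estimator, a prompt contributes a learning signal only when its group mixes successes and failures. If the base model solves a prompt with probability p, this happens with probability 1 - (1-p)^G - p^G, which collapses toward zero as p → 0. The function name below is hypothetical.

```python
def informative_group_probability(p: float, group_size: int) -> float:
    """Probability that G i.i.d. binary rollouts contain both a success and a failure.

    Assumes a GRPO-style estimator whose advantages are all zero when every rollout
    in the group receives the same reward (a hypothetical setting for illustration).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if group_size < 1:
        raise ValueError("group_size must be positive")
    all_fail = (1.0 - p) ** group_size
    all_succeed = p ** group_size
    return 1.0 - all_fail - all_succeed


if __name__ == "__main__":
    # Low base-model coverage leaves most groups uninformative.
    for p in (0.001, 0.01, 0.1, 0.5):
        print(p, round(informative_group_probability(p, 8), 4))
```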

Learning the Geometry of Data: A Mathematical Review of Shape Space Analysis

Gary P. T. Choi, Khanh Dao Duc, Shira Faigenbaum-Golovin +6

A central objective of machine learning is to identify structure and patterns in data. Advances in data acquisition have increasingly produced datasets whose observations possess rich geometric form, giving rise to shape spaces that encode variabilit…

math.ST, cs.LG, stat.ML

FusionRS: A Large-Scale RGB-Infrared Remote Sensing Dataset for Dual-Modal Vision-Language Foundation Models

Jiaju Han, Ben Zhang, Xuemeng Sun +6

Remote sensing vision-language models have advanced Earth observation understanding, but most existing work remains centered on RGB imagery, leaving the complementary information in infrared data underexplored. Infrared images provide distinctive cue…

TokenPilot: Cache-Efficient Context Management for LLM Agents

Buqiang Xu, Zirui Xue, Dianmou Chen +12

As LLM agents are deployed in long-horizon sessions, context accumulation drives up inference costs. Existing approaches utilize text pruning or dynamic memory eviction to minimize token footprints; however, their unconstrained sequence mutations alt…

cs.CL, cs.AI, cs.LG

Filtered Conformal Ellipsoids for Graph-Native Time Series

Yannick Limmer

Joint prediction sets for multivariate time series should control a single event while adapting to cross-coordinate dependence. We study filtered conformal ellipsoids: a frozen state-space filter emits a one-step predictive mean and covariance, and s…

cs.LG, math.ST, stat.ML
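A generic split-conformal version of this construction is sketched below. It assumes the nonconformity score is the squared Mahalanobis distance of the realized observation under the filter's predictive mean and covariance; the excerpt is cut off before the score is named. The procedure calibrates a radius on held-out one-step scores, then reports the ellipsoid of points within that radius. The code is restricted to 2-D in plain Python, and all names are hypothetical. The usual marginal coverage guarantee assumes exchangeable calibration scores, which time series satisfy only approximately.

```python
import math
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]
Cov2 = Tuple[Tuple[float, float], Tuple[float, float]]


def mahalanobis_sq(y: Vec2, mean: Vec2, cov: Cov2) -> float:
    """Squared Mahalanobis distance of y from mean under a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0 or a <= 0:
        raise ValueError("covariance must be positive definite")
    inv = ((d / det, -b / det), (-c / det, a / det))
    r = (y[0] - mean[0], y[1] - mean[1])
    return sum(r[i] * inv[i][j] * r[j] for i in range(2) for j in range(2))


def conformal_radius_sq(scores: Sequence[float], alpha: float) -> float:
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def in_ellipsoid(y: Vec2, mean: Vec2, cov: Cov2, radius_sq: float) -> bool:
    """Membership test for {y : (y - mean)^T cov^{-1} (y - mean) <= radius_sq}."""
    return mahalanobis_sq(y, mean, cov) <= radius_sq


if __name__ == "__main__":
    identity = ((1.0, 0.0), (0.0, 1.0))
    # Hypothetical calibration pairs (observation, predicted mean) under an identity covariance.
    calib = [((0.1 * i, 0.0), (0.0, 0.0)) for i in range(1, 11)]
    scores = [mahalanobis_sq(y, m, identity) for y, m in calib]
    r2 = conformal_radius_sq(scores, alpha=0.1)
    print(r2, in_ellipsoid((0.5, 0.5), (0.0, 0.0), identity, r2))
```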

Exploding and vanishing gradients in deep neural networks: the effect of residual connections

Vivek S Borkar

The well-known phenomenon of exploding and vanishing gradients in deep neural networks is analyzed using multiplicative ergodic theory. The effect of adding a residual connection is explained in this context. Specifically, a characterization of Liapu…
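In this framework the backpropagated gradient is a product of layer Jacobians, and its long-run exponential growth rate is the top Lyapunov exponent of that product: a positive exponent means exploding gradients and a negative one means vanishing gradients. The sketch below estimates that exponent for 2x2 Jacobians using the standard renormalized product applied to a generic start vector. It models a residual connection x ↦ x + f(x) by replacing each Jacobian J with I + J. This is a generic illustration with hypothetical names, not the characterization derived in the paper.

```python
import math
import random
from typing import List, Sequence

Matrix2 = List[List[float]]


def matvec(m: Matrix2, v: Sequence[float]) -> List[float]:
    """Multiply a 2x2 matrix by a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def top_lyapunov_exponent(jacobians: Sequence[Matrix2]) -> float:
    """Estimate (1/n) * log ||J_n ... J_1 v|| for the unit start vector v = (1, 0).

    Renormalising after every step avoids overflow/underflow; the summed
    log-norms equal the log-norm of the full product applied to v.
    """
    if not jacobians:
        raise ValueError("need at least one Jacobian")
    v = [1.0, 0.0]
    total = 0.0
    for m in jacobians:
        v = matvec(m, v)
        norm = math.hypot(v[0], v[1])
        if norm == 0.0:
            return -math.inf  # the product annihilated v: the gradient vanishes exactly
        total += math.log(norm)
        v = [v[0] / norm, v[1] / norm]
    return total / len(jacobians)


def with_residual(m: Matrix2) -> Matrix2:
    """Jacobian of x -> x + f(x) when f has Jacobian m, i.e. I + m."""
    return [[1.0 + m[0][0], m[0][1]], [m[1][0], 1.0 + m[1][1]]]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical layer Jacobians with small Gaussian entries.
    layers = [[[rng.gauss(0.0, 0.3) for _ in range(2)] for _ in range(2)] for _ in range(200)]
    print("plain   :", top_lyapunov_exponent(layers))
    print("residual:", top_lyapunov_exponent([with_residual(m) for m in layers]))
```

Comparing the two printed exponents shows how the identity term shifts the growth rate of the Jacobian product. Their signs depend on the assumed weight scale.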

ROVE: Unlocking Human Interventions for Humanoid Manipulation via Reinforcement Learning

Wei Xiao, Weiliang Tang, Yuying Ge +4

Human interventions provide crucial corrective signals for post-training Vision-Language-Action (VLA) models. However, enabling seamless humanoid interventions is a formidable systems challenge due to complex whole-body kinematics and dexterous-hand…

From Tokens to Policy: Causal and Interpretable Heterogeneous Treatment Effects Identification

Riccardo Cadei, Frank Otchere, Nyasha Tirivayi +3

Heterogeneous Treatment Effect (HTE) identification is crucial to explain the impact of an intervention and optimize our policies accordingly. Existing approaches trade expressivity for interpretability, but, if some active heterogeneity drivers are…

TuneJury: An Open Metric for Improving Music Generation Preference Alignment

Yonghyun Kim, Junwon Lee, Haiwen Xia +5

We introduce TuneJury, an open, instance-level pairwise reward model for text-to-music that predicts a music preference score from a text prompt and an audio clip. The released checkpoint is trained on publicly available human-preference labels cover…

cs.SD, cs.AI, cs.LG