Research Papers
arXiv · cs.AI, cs.LG, cs.CL · 20 papers
The Value Axis: Language Models Encode Whether They're on the Right Track
Nick Jiang, Isaac Kauvar, Jack Lindsey
We investigate whether language models internally track the value of their current trajectory, defined as the likelihood that their ongoing strategy will achieve their goals. Using synthetic, in-context reinforcement learning data, we construct a "va
Context-Aware RL for Agentic and Multimodal LLMs
Peiyang Xu, Bangzheng Li, Sijia Liu +4
Large language models (LLMs) often fail when answering requires identifying a small but decisive piece of evidence within a long or complex context, such as a single line in a tool trace or a subtle detail in an image. We propose ContextRL, a context
Exact Posterior Score Estimation for Solving Linear Inverse Problems
Abbas Mammadov, Ozgur Kara, Kaan Oktay +5
Diffusion and flow-based models learn powerful data priors by training a denoiser to reverse Gaussian corruption. To use this prior to solve a linear inverse problem, one needs to sample from the posterior, but the score that the prior provides is th
Geometric Action Model for Robot Policy Learning
Jisang Han, Seonghu Jeon, Jaewoo Jung +7
Generalist robot policies must follow user instructions while reasoning about how objects, cameras, and robot actions interact in the 3D physical world. Recent vision-language-action models (VLAs) and video world-action models (WAMs) inherit strong s
Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes
Tongyan Fang, Siyuan Huang, Naiyu Fang +6
When pretrained VLA policies are fine-tuned through online RL, each rollout episode produces only a single binary outcome (success or failure), yet the actor update requires per-transition supervision. Existing approaches commonly reduce this sparse
Benchmarking LLM Agents on Meta-Analysis Articles from Nature Portfolio
Anzhe Xie, Weihang Su, Yujia Zhou +2
Meta-analysis is a demanding form of evidence synthesis that combines literature retrieval, PI/ECO-guided study selection, and statistical aggregation. Its structured, verifiable workflow makes it an ideal substrate for evaluating systematic scientif
The Importance of Phase in Neural Representations: An Internal Oppenheim-Lim Test of Image Classifiers
Alper Yıldırım
Oppenheim and Lim (1981) showed that natural images stay recognizable when reconstructed from their Fourier phase alone, while the magnitude carries little of their identity. We ask whether trained image classifiers reproduce this asymmetry inside th
Your Privacy My Cloak: Backdoor Attacks on Differentially Private Federated Learning
Xiaolin Li, Ning Wang, Ninghui Li +1
Prior research suggests that differential privacy (DP) inherently enhances the robustness of federated learning (FL) against backdoor attacks. In this paper, we challenge this assumption. Through an empirical analysis of two baseline attack strategie
KVEraser: Learning to Steer KV Cache for Efficient Localized Context Erasing
Mufei Li, Shikun Liu, Dongqi Fu +5
Post-hoc context erasing over the KV cache is challenging because a local edit has a global consequence: once a span has been processed, its influence propagates into the cached states of all subsequent tokens. This issue arises naturally in long-con
DEEPRUBRIC: Evidence-Tree Rubric Supervision for Efficient Reinforcement Learning of Deep Research Agents
Minghang Zhu, Chuyang Wei, Junhao Xu +3
Deep research agents synthesize long-form reports by searching and reasoning over retrieved evidence. Reinforcement learning with rubric-based rewards improves these agents by optimizing them against checkable criteria that translate report quality i
HAMON: Passive Optical Sequence Mixing for Long-Horizon Forecasting
Alper Yıldırım
Simple linear and frequency-domain models remain surprisingly competitive in long-horizon time-series forecasting, and recent mechanistic evidence suggests that standard forecasting benchmarks may not require the dense superposed representations that
ExpRL: Exploratory RL for LLM Mid-Training
Violet Xiang, Amrith Setlur, Chase Blagden +2
Sparse reward reinforcement learning (RL) has become a standard tool for improving LLM reasoning, but its success depends critically on the coverage present in the base model. In practice, models are often primed for RL through mid-training on
Learning the Geometry of Data: A Mathematical Review of Shape Space Analysis
Gary P. T. Choi, Khanh Dao Duc, Shira Faigenbaum-Golovin +6
A central objective of machine learning is to identify structure and patterns in data. Advances in data acquisition have increasingly produced datasets whose observations possess rich geometric form, giving rise to shape spaces that encode variabilit
FusionRS: A Large-Scale RGB-Infrared Remote Sensing Dataset for Dual-Modal Vision-Language Foundation Models
Jiaju Han, Ben Zhang, Xuemeng Sun +6
Remote sensing vision-language models have advanced Earth observation understanding, but most existing work remains centered on RGB imagery, leaving the complementary information in infrared data underexplored. Infrared images provide distinctive cue
TokenPilot: Cache-Efficient Context Management for LLM Agents
Buqiang Xu, Zirui Xue, Dianmou Chen +12
As LLM agents are deployed in long-horizon sessions, context accumulation drives up inference costs. Existing approaches utilize text pruning or dynamic memory eviction to minimize token footprints; however, their unconstrained sequence mutations alt
Filtered Conformal Ellipsoids for Graph-Native Time Series
Yannick Limmer
Joint prediction sets for multivariate time series should control a single event while adapting to cross-coordinate dependence. We study filtered conformal ellipsoids: a frozen state-space filter emits a one-step predictive mean and covariance, and s
Exploding and vanishing gradients in deep neural networks: the effect of residual connections
Vivek S Borkar
The well-known phenomenon of exploding and vanishing gradients in deep neural networks is analyzed using multiplicative ergodic theory. The effect of adding a residual connection is explained in this context. Specifically, a characterization of Liapu
ROVE: Unlocking Human Interventions for Humanoid Manipulation via Reinforcement Learning
Wei Xiao, Weiliang Tang, Yuying Ge +4
Human interventions provide crucial corrective signals for post-training Vision-Language-Action (VLA) models. However, enabling seamless humanoid interventions is a formidable systems challenge due to complex whole-body kinematics and dexterous-hand
From Tokens to Policy: Causal and Interpretable Heterogeneous Treatment Effects Identification
Riccardo Cadei, Frank Otchere, Nyasha Tirivayi +3
Heterogeneous Treatment Effect (HTE) identification is crucial to explain the impact of an intervention and optimize our policies accordingly. Existing approaches trade expressivity for interpretability, but, if some active heterogeneity drivers are
TuneJury: An Open Metric for Improving Music Generation Preference Alignment
Yonghyun Kim, Junwon Lee, Haiwen Xia +5
We introduce TuneJury, an open, instance-level pairwise reward model for text-to-music that predicts a music preference score from a text prompt and an audio clip. The released checkpoint is trained on publicly available human-preference labels cover