Papers

Recent AI research papers with accessible summaries. Updated daily from arXiv, summarized for developers who don't read papers regularly.

1552 papers · 42 this month · 12 topics

Topics: Evaluation (40) · Training (34) · Efficiency (33) · Reasoning (30) · Agents (27) · Applications (22) · Multimodal (18) · Data (17) · Safety (13) · Architecture (11) · Alignment (7) · Scaling (5)

Jul 6 – Jul 12 (21)

OpenCoF: Learning to Reason Through Video Generation

Jul 9, 2026

Xinyan Chen, Ziyu Guo, Renrui Zhang et al.

Video generation can be a reasoning mechanism: training models on diverse temporal reasoning tasks and adding explicit reasoning tokens improves their ability to solve logical problems by generating step-by-step visual explanations.

OpenCoF introduces a dataset and a fine-tuned video model designed to teach AI systems to reason by generating sequences of video frames. Unlike text-based reasoning, this 'Chain-of-Frame' approach lets models unfold logical steps visually across time. The work shows that video models trained on diverse reasoning tasks with special reasoning tokens perform better on complex reasoning problems.

reasoning · multimodal · training

Score Accuracy Along the Forward Diffusion Does Not Certify Numerical Stability in Diffusion Sampling

Jul 9, 2026

Yiwei Zhou

Training diffusion models with low forward-marginal error doesn't guarantee stable sampling—you need additional safeguards like denoiser projection to ensure numerical stability and convergence of sample moments.

This paper reveals a critical gap in diffusion model training: a score function can have tiny errors on average (as measured during training) yet produce numerically unstable sampling with diverging moments. The author proves this theoretically and shows that projecting learned denoisers onto known data bounds fixes the problem.
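One simple form of denoiser projection, assuming the data are known to lie in a box [lo, hi]^d (for example, pixel values scaled to [-1, 1]): the Euclidean projection onto a box is elementwise clipping, applied here to the predicted clean sample inside a deterministic DDIM-style step. This is an illustrative sketch; the box bounds, the sampler, and the function names are assumptions, and the paper's exact projection may differ.

```python
import math
from typing import List, Sequence


def project_to_box(x0_hat: Sequence[float], lo: float = -1.0, hi: float = 1.0) -> List[float]:
    # Euclidean projection onto [lo, hi]^d is elementwise clipping.
    return [min(max(v, lo), hi) for v in x0_hat]


def ddim_step(x_t: Sequence[float], eps_hat: Sequence[float],
              abar_t: float, abar_prev: float,
              lo: float = -1.0, hi: float = 1.0) -> List[float]:
    """One deterministic (eta = 0) DDIM update with a projected denoiser.

    abar_t / abar_prev are cumulative signal levels (alpha-bar); requires abar_t < 1.
    """
    s_t, n_t = math.sqrt(abar_t), math.sqrt(1.0 - abar_t)
    # Denoiser: predicted clean sample from the noise prediction.
    x0_hat = [(x - n_t * e) / s_t for x, e in zip(x_t, eps_hat)]
    # Safeguard: keep the predicted sample inside the known data bounds.
    x0_hat = project_to_box(x0_hat, lo, hi)
    # Re-derive the noise estimate so it is consistent with the projected x0.
    eps_proj = [(x - s_t * x0) / n_t for x, x0 in zip(x_t, x0_hat)]
    s_prev, n_prev = math.sqrt(abar_prev), math.sqrt(1.0 - abar_prev)
    return [s_prev * x0 + n_prev * e for x0, e in zip(x0_hat, eps_proj)]
```

Without the clipping line, a badly scaled noise prediction can push the predicted clean sample far outside the data range at every step; the projection bounds that quantity regardless of how the score error is distributed.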

Jun 29 – Jul 5 (36)

LACUNA: A Testbed for Evaluating Localization Precision for LLM Unlearning

Jul 2, 2026

Matteo Boglioni, Thibault Rousset, Siva Reddy et al.

Current unlearning methods are imprecise at targeting the specific parameters where knowledge is stored, which leaves them vulnerable to attacks that resurface the data; precise localization matters more than output-level performance.

LACUNA is a new benchmark for testing whether LLM unlearning methods actually erase sensitive data from model parameters or just hide it. The researchers inject fake personal information into specific weights of language models, then check if unlearning methods successfully target those exact parameters.

safety · evaluation · training
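The idea of "targeting the exact parameters" can be made concrete with a simple score. The function below is a hypothetical metric, not LACUNA's: given an unlearning update `delta` (flattened parameter changes) and the indices where the fake facts were injected, it reports what fraction of the update's L1 mass lands on those target parameters.

```python
from typing import Iterable, Sequence


def localization_precision(delta: Sequence[float], target_idx: Iterable[int]) -> float:
    """Fraction of the update's absolute magnitude that falls on target parameters.

    1.0 means the unlearning update touched only the injected weights;
    values near 0 mean the change was spread elsewhere in the model.
    """
    total = sum(abs(d) for d in delta)
    if total == 0.0:
        return 0.0  # no update at all: nothing was localized
    on_target = sum(abs(delta[i]) for i in set(target_idx))
    return on_target / total
```

A method can score well on output-level forgetting while having low precision here, which is the failure mode the benchmark is designed to expose.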

Program-as-Weights: A Programming Paradigm for Fuzzy Functions

Jul 2, 2026

Wentao Zhang, Liliana Hotsko, Woojeong Kim et al.

Instead of calling large language models for every fuzzy task, you can compile a natural-language specification once into a tiny reusable neural artifact that runs locally and cheaply—shifting from per-input problem solving to one-time function compilation.

This paper introduces Program-as-Weights (PAW), a method to compile natural-language function specifications into small, locally executable neural adapters. A 4B-parameter compiler generates parameter-efficient adapters that run on a lightweight 0.6B-parameter interpreter, matching the performance of much larger models while using 50x less memory and running efficiently on consumer hardware such as an M3 MacBook.
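The shift from per-input calls to one-time compilation is a caching pattern at the application level. A minimal sketch, assuming a hypothetical `compiler` callable that stands in for PAW's compile step (in the paper, a 4B model emitting an adapter); here it simply returns a plain Python function, and `toy_compiler` is an illustrative placeholder, not PAW's API.

```python
from typing import Callable, Dict

# A compiled artifact: a cheap local function from input text to output text.
Artifact = Callable[[str], str]


class FuzzyFunctionRegistry:
    """Compile each natural-language spec once, then reuse the artifact."""

    def __init__(self, compiler: Callable[[str], Artifact]):
        self._compiler = compiler  # hypothetical stand-in for the expensive compile step
        self._cache: Dict[str, Artifact] = {}
        self.compile_calls = 0

    def get(self, spec: str) -> Artifact:
        if spec not in self._cache:
            self.compile_calls += 1  # the expensive step runs once per spec
            self._cache[spec] = self._compiler(spec)
        return self._cache[spec]


def toy_compiler(spec: str) -> Artifact:
    # Hypothetical placeholder: ignores the spec's wording and returns a keyword rule.
    negative = {"bad", "awful", "broken"}
    return lambda text: "negative" if negative & set(text.lower().split()) else "positive"
```

Every later call with the same spec reuses the cached artifact, so the per-input cost is only the cheap local function rather than a large-model call.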

Jun 22 – Jun 28 (32)

Second-Order KKT Guarantees for Bregman ADMM in Nonconvex and Non-Lipschitz Optimization

Jun 26, 2026

Shuang Li, Zhihui Zhu, Qiuwei Li

Bregman ADMM provably avoids saddle points and finds second-order stationary solutions for nonconvex problems without Lipschitz gradient requirements, making it applicable to polynomial and tensor optimization problems where standard methods fail.

This paper analyzes Bregman ADMM, an optimization algorithm for nonconvex problems with linear constraints, in a setting that does not require the standard smoothness (Lipschitz-gradient) assumptions.

training
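Bregman ADMM replaces some of the quadratic terms of standard ADMM with a Bregman divergence D_h(x, y) = h(x) - h(y) - <grad h(y), x - y>. A minimal sketch of the divergence itself: the Euclidean kernel recovers the usual 0.5 * ||x - y||^2 term, while the quartic kernel h(x) = 0.25 * ||x||^4 + 0.5 * ||x||^2, common in the non-Lipschitz (relative smoothness) literature, is assumed here for illustration and is not necessarily the kernel used in this paper.

```python
from typing import Callable, List, Sequence

Vec = Sequence[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def bregman_divergence(h: Callable[[Vec], float], grad_h: Callable[[Vec], List[float]],
                       x: Vec, y: Vec) -> float:
    # D_h(x, y) = h(x) - h(y) - <grad h(y), x - y>
    diff = [xi - yi for xi, yi in zip(x, y)]
    return h(x) - h(y) - dot(grad_h(y), diff)


# Euclidean kernel: D_h(x, y) = 0.5 * ||x - y||^2, the standard ADMM term.
def h_euclid(x: Vec) -> float:
    return 0.5 * dot(x, x)


def grad_euclid(x: Vec) -> List[float]:
    return list(x)


# Quartic kernel: grows fast enough to match polynomial objectives whose
# gradients are not globally Lipschitz.
def h_quartic(x: Vec) -> float:
    s = dot(x, x)
    return 0.25 * s * s + 0.5 * s


def grad_quartic(x: Vec) -> List[float]:
    s = dot(x, x)
    return [(s + 1.0) * xi for xi in x]  # gradient: (||x||^2 + 1) * x
```

For the same step from y = [1, 0] to x = [2, 0], the Euclidean divergence is 0.5 while the quartic one is 3.25: the quartic kernel penalizes large moves more strongly away from the origin, which is how a Bregman method can stay stable on quartic or tensor objectives.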

HPRO: Hierarchical Progressive Reward Optimization via Preference Extraction for Emotional Text-to-Speech

Jun 26, 2026

Sihang Nie, Xiaofen Xing, Rui Xing et al.

Separating content and emotion into distinct latent spaces during training prevents reward conflicts and enables better emotional control in TTS systems without sacrificing intelligibility.

This paper addresses emotional expressiveness in LLM-based text-to-speech with HPRO, a hierarchical reward optimization framework. HPRO separates emotional and semantic information to avoid conflicting gradients, then progressively aligns rewards at the frame, word, and sentence levels, improving emotional control while maintaining speech clarity.

training

Jun 15 – Jun 21 (11)

UNIEGO: Proxies as Mediators for Unified Egocentric Video Representation Learning

Jun 18, 2026

Wenhao Chi, Arkaprava Sinha, Dominick Reilly et al.

Using proxy models as intermediaries between diverse teachers prevents conflicting gradients and enables learning richer egocentric representations from heterogeneous knowledge sources—achieving better results than naive multi-teacher distillation.

This paper introduces UNIEGO, a unified egocentric video encoder trained through a novel multi-teacher distillation framework.

multimodal · training · architecture

Toward Calibrated Mixture-of-Experts Under Distribution Shift

Jun 18, 2026

Gina Wong, Drew Prinster, Suchi Saria et al.

Expert-level calibration alone isn't enough for soft-routed MoE models under distribution shift—you need to explicitly calibrate the routing mechanism's aggregate predictions to maintain trustworthy uncertainty estimates.

This paper studies how mixture-of-experts (MoE) models maintain calibrated predictions under distribution shift. The authors show that calibrating individual experts works for hard-routed models but fails for soft-routed ones, and propose an adversarial reweighting method to improve calibration across different routing mechanisms and data distributions.
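The distinction between calibrating each expert and calibrating the routed aggregate can be made concrete. A minimal sketch, assuming softmax-style gate weights and per-expert class probabilities: `soft_route` forms the soft-routed prediction, and `expected_calibration_error` (standard equal-width-bin ECE) scores that aggregate prediction rather than each expert. The paper's adversarial reweighting method is not implemented here.

```python
from typing import List, Sequence


def soft_route(gates: Sequence[float], expert_probs: Sequence[Sequence[float]]) -> List[float]:
    """Soft-routed MoE output: gate-weighted average of expert class probabilities."""
    n_classes = len(expert_probs[0])
    return [sum(g * p[c] for g, p in zip(gates, expert_probs)) for c in range(n_classes)]


def expected_calibration_error(confidences: Sequence[float], correct: Sequence[int],
                               n_bins: int = 10) -> float:
    """Equal-width-bin ECE: sum over bins of (bin fraction) * |accuracy - mean confidence|."""
    n = len(confidences)
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        acc = sum(ok for _, ok in b) / len(b)
        avg_conf = sum(c for c, _ in b) / len(b)
        ece += len(b) / n * abs(acc - avg_conf)
    return ece
```

To evaluate a soft-routed model, take the maximum of each `soft_route` output as the confidence and whether its argmax matches the label as `correct`; this measures the mixture's calibration directly, which is the quantity the summary says must be controlled under shift.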

Papers

SLORR: Simple and Efficient In-Training Low-Rank Regularization

Super Weights in LLMs and the Failure of Selective Training

Do You Need a Frontier Model as a Citation Verifier? Benchmarking Rubric LLMs for Deep-Research Source Attribution

Secure Decentralized Federated Learning via Gossip and Virtual Voting

Multi-Modal, Multi-Environment Machine Teaching for Robust Reward Learning

UltraX: Refining Pre-Training Data at Scale with Adaptive Programmatic Editing

Co-LMLM: Continuous-Query Limited Memory Language Models

From Noisy Traces to Root Causes: Structural Trajectory Analysis and Causal Extraction for Agent Optimization

Selective Timestep Weighting and Advantage-Based Replay for Sample-Efficient Diffusion RLHF

Agon: Competitive Cross-Model RL with Implicit Rival Grading of Reasoning

How Data Shapes RoPE Frequency Usage: From Positional Scale Matching to Length Generalization

Max Out GRPO Signal: Adaptive Trace Prefix Control for Hard Reasoning Problems

MedPMC: A Systematic Framework for Scaling High-Fidelity Medical Multimodal Data for Foundation Models

PeTeR: Post-Training Robustification of Probabilistic Circuits

Hierarchical Acoustic-Semantic Modeling: Modality Separation and Semantic Coherence for Full-Duplex SLMs

GraphBU: MILP Instance Generation with Graph-Native Block Units

Bridging Physical Reasoning and Task Generalization via Visual Action Outcome Reasoning Alignment

Weak-to-Strong Generalization via Direct On-Policy Distillation

Interpretable Human-Label-Free Deep Learning for Real-Bogus Classification with Uncertainty Quantification

DemoPSD: Disagreement-Modulated Policy Self-Distillation

Beyond Adam: SOAP and Muon for Faster, Label-Efficient Training of Machine Learning Interatomic Potentials

Controllable Sim Agents with Behavior Latents

Visually Grounded Self-Reflection for Vision-Language Models via Reinforcement Learning

Learning to Move Before Learning to Do: Task-Agnostic pretraining for VLAs

Neuron-Aware Data Selection for Annotation-Free LLM Self-Distillation

Understanding the Robustness of Distributed Self-Supervised Learning Frameworks Against Non-IID Data

Automated grading of Linux/bash examinations using large language models: a four-level cognitive taxonomy approach

WorldSample: Closed-loop Real-robot RL with World Modelling

Neuron-Aware Active Few-Shot Learning for LLMs

LIME: Learning Intent-aware Camera Motion from Egocentric Video

DecompRL: Solving Harder Problems by Learning Modular Code Generation

Transformer Geometry Observatory TGO-II: Representational Similarity Observatory

Is One Layer Enough? Training A Single Transformer Layer Can Match Full-Parameter RL Training

Language-Critique Imitation Learning from Suboptimal Demonstrations

AutoMem: Automated Learning of Memory as a Cognitive Skill

The State-Prediction Separation Hypothesis

Right in the Right Way: LM Training with Verifiable Rewards and Human Demonstrations

Decision-Aware Training for Sample-Based Generative Models

Introspective Coupling: Self-Explanation Training Tracks Behavioral Change Despite Fixed Supervision

QVal: Cheaply Evaluating Dense Supervision Signals for Long-Horizon LLM Agents

Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs

Generative Skill Composition for LLM Agents

FedLAB: Traceable Semantic Codebooks for Federated Multimodal Graph Foundation Learning

Scalable Behaviour Cloning on Browser Using via Skill Distillation

Self-Study Reconsidered: The Hidden Fragility of Learning from Self-Generated QA

Radial Suppression Accelerates Algorithmic Generalization: A Geometric Analysis of Delayed Generalization

LeVo 2: Stable and Melodious Song Generation via Hierarchical Representation Modeling and Progressive Post-Training

One-Step Gradient Delay is Not a Barrier for Large-Scale Asynchronous Pipeline Parallel LLM Pretraining

Pessimism's Paradox: Conservative Offline Training Amplifies Reward Hacking During Online Adaptation in Reasoning Models

DOPD: Dual On-policy Distillation

Optimization Dynamics Imprint Semantic Specificity in Contrastive Embedding Norms

Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent

C$^{2}$R: Cross-sample Consistency Regularization Mitigates Feature Splitting and Absorption in Sparse Autoencoders

How Width and Data Shape Generalization Scaling Laws in Quadratic Neural Networks

DanceOPD: On-Policy Generative Field Distillation

Reinforcement Learning without Ground-Truth Solutions can Improve LLMs

Autoregressive Boltzmann Generators

Empowering GUI Agents via Autonomous Experience Exploration and Hindsight Experience Utilization for Task Planning

Generative Models on Analog Hardware with Dynamics

Designing Reward Signals for Portable Query Generation: A Case Study in Industrial Semantic Job Search

Simulation-based inference for rapid Bayesian parameter estimation in epidemiological models: a comparison with MCMC

Effective Covariance Dynamics in Solvable High-Dimensional GANs

The Geometry of Updates: Fisher Alignment at Vocabulary Scale

CARVE: Content-Aware Recurrent with Value Efficiency for Chunk-Parallel Linear Attention

Hierarchical Muon: Tiled Newton-Schulz Updates for Efficient Muon Optimization

Paved with True Intents: Intent-Aware Training Improves LLM Safety Classification Across Training Regimes

Learning Action Priors for Cross-embodiment Robot Manipulation