Papers

Recent AI research papers with accessible summaries. Updated daily from arXiv, summarized for developers who don't read papers regularly.

1552 papers39 this month12 topics

All Evaluation 40 Training 34 Efficiency 33 Reasoning 30 Agents 27 Applications 22 Multimodal 18 Data 17 Safety 13 Architecture 11 Alignment 7 scaling 5

Jul 6 – Jul 12(20)

SLORR: Simple and Efficient In-Training Low-Rank Regularization

Jul 9, 2026

David González-Martínez, Shiwei Liu

You can make models significantly more compressible during training with a simple regularizer that costs less than 1% extra compute and doesn't require changing your model architecture or doing expensive matrix decompositions.

SLORR is a training-time regularization method that makes neural networks easier to compress using low-rank factorization. Unlike existing approaches, it works directly on weight matrices without requiring expensive computations, architectural changes, or cached data, adding minimal training overhead while improving how well compressed models perform.

trainingefficiency

The Illusion of Equivalency: Statistical Characterization of Quantization Effects in LLMs

Jul 9, 2026

Baha Rababah, Cuneyt Gurcan Akcora, Carson K. Leung

Standard accuracy metrics mask real behavioral divergence in quantized models—you need decision-level metrics to catch when quantized and base models disagree, even when both maintain similar overall performance.

This paper reveals that quantization (compressing LLMs to lower bit-widths) preserves accuracy metrics like perplexity but causes hidden behavioral changes. The authors introduce a new metric called correctness agreement to detect when quantized models make different predictions than base models, and analyze how quantization distorts attention weights differently across model layers.

Jun 29 – Jul 5(24)

Program-as-Weights: A Programming Paradigm for Fuzzy Functions

Jul 2, 2026

Wentao Zhang, Liliana Hotsko, Woojeong Kim et al.

Instead of calling large language models for every fuzzy task, you can compile a natural-language specification once into a tiny reusable neural artifact that runs locally and cheaply—shifting from per-input problem solving to one-time function compilation.

This paper introduces Program-as-Weights (PAW), a method to compile natural-language function specifications into small, locally-executable neural adapters. A 4B compiler generates parameter-efficient adapters that run on a lightweight 0.6B interpreter, matching the performance of much larger models while using 50x less memory and running efficiently on consumer hardware like MacBook M3.

efficiencytrainingapplications

Online Safety Monitoring for LLMs

Jul 2, 2026

Mona Schirmer, Metod Jazbec, Alexander Timans et al.

Simple threshold-based monitoring with statistical risk control can effectively catch unsafe LLM outputs in production without requiring complex sequential testing methods.

This paper presents a real-time safety monitoring system for LLMs that uses a verifier model to detect unsafe outputs at deployment time. The approach calibrates decision thresholds using risk control methods and proves competitive with more complex alternatives on reasoning and adversarial datasets.

Jun 22 – Jun 28(16)

Parameter Efficient Hybrid Transformer (PEHT) for Network Traffic Prediction via Dynamic Urban Congestion Integration

Jun 26, 2026

Abdolazim Rezaei, Mehdi Sookhak, Mahboobeh Haghparast

By combining parameter-efficient fine-tuning (LoRA) with multimodal fusion of urban context, you can build accurate traffic prediction models that use fewer trainable parameters without sacrificing performance.

This paper presents PEHT, a traffic prediction model that combines Transformers with urban mobility data to forecast cellular network demand. It uses LoRA to reduce parameters while a multimodal fusion strategy integrates congestion and mobility information, achieving better accuracy than existing methods on real telecom data.

efficiencymultimodalapplications

Learning Topology-Aware Representations via Test-Time Adaptation for Anomaly Segmentation

Jun 26, 2026

Ali Zia, Usman Ali, Abdul Rehman et al.

Using topological features (shape and connectivity patterns) during test-time adaptation significantly improves anomaly segmentation by preserving structural coherence that pixel-level methods miss, achieving 15% F1 improvement on standard benchmarks.

This paper introduces TopoTTA, a test-time adaptation framework for anomaly segmentation that uses topological data analysis (persistent homology) to preserve structural consistency in defect detection.

Jun 15 – Jun 21(22)

Toward Calibrated Mixture-of-Experts Under Distribution Shift

Jun 18, 2026

Gina Wong, Drew Prinster, Suchi Saria et al.

Expert-level calibration alone isn't enough for soft-routed MoE models under distribution shift—you need to explicitly calibrate the routing mechanism's aggregate predictions to maintain trustworthy uncertainty estimates.

This paper studies how mixture-of-experts (MoE) models maintain calibrated predictions under distribution shift. The authors show that calibrating individual experts works for hard-routed models but fails for soft-routed ones, and propose an adversarial reweighting method to improve calibration across different routing mechanisms and data distributions.

trainingevaluationefficiency

Multi-Task Bayesian In-Context Learning

Jun 18, 2026

Qingyang Zhu, Eric Karl Oermann, Kyunghyun Cho

You can train a transformer to act as a fast Bayesian predictor by treating prior information as part of the input context, achieving oracle-level accuracy orders of magnitude faster than traditional Bayesian methods.

This paper presents a method for training transformers to perform Bayesian inference quickly by learning from examples of prior distributions and target datasets. Instead of computing exact Bayesian predictions (which is slow), the model learns to map sequences of prior information and data directly to predictions, enabling fast uncertainty-aware inference that adapts to new priors at test time.

Jun 8 – Jun 14(18)

Persona-Pruner: Sculpting Lightweight Models for Role-Playing

Jun 12, 2026

Jinsu Kim, Jihoon Tack, Noah Lee et al.

You can shrink language models for specific character personas by 50%+ while keeping 93.8% of role-playing quality, making multi-NPC applications practical without sacrificing character consistency.

This paper introduces Persona-Pruner, a technique that creates lightweight language models optimized for specific character roles by identifying and preserving only the persona-relevant parts of a full model. Unlike standard pruning that indiscriminately removes parameters, this method maintains role-playing quality while reducing computational cost—useful for applications with many NPCs.

efficiencytrainingapplications

AdaSR: Adaptive Streaming Reasoning with Hierarchical Relative Policy Optimization

Jun 12, 2026

Junlong Tong, Wenqi Xu, Yingqi Fan et al.

Models can now learn to reason efficiently during streaming input instead of only after seeing everything, using fine-grained reward signals that separately optimize early thinking and final deliberation phases.

AdaSR enables language models to reason incrementally as data streams in (like audio or video), rather than waiting for complete input. It uses a new training method called Hierarchical Relative Policy Optimization to teach models when to think and how much computation to spend at each stage, balancing accuracy, speed, and efficiency.

Papers

Jul 6 – Jul 12(20)

SLORR: Simple and Efficient In-Training Low-Rank Regularization

The Illusion of Equivalency: Statistical Characterization of Quantization Effects in LLMs

Jun 29 – Jul 5(24)

Program-as-Weights: A Programming Paradigm for Fuzzy Functions

Online Safety Monitoring for LLMs

Jun 22 – Jun 28(16)

Parameter Efficient Hybrid Transformer (PEHT) for Network Traffic Prediction via Dynamic Urban Congestion Integration

Learning Topology-Aware Representations via Test-Time Adaptation for Anomaly Segmentation

Jun 15 – Jun 21(22)

Toward Calibrated Mixture-of-Experts Under Distribution Shift

Multi-Task Bayesian In-Context Learning

Jun 8 – Jun 14(18)

Persona-Pruner: Sculpting Lightweight Models for Role-Playing

AdaSR: Adaptive Streaming Reasoning with Hierarchical Relative Policy Optimization

Super Weights in LLMs and the Failure of Selective Training

Latent Memory Palace: Reasoning for Control as Autoregressive Variational Inference

LTM: Large-scale Terrain Model for Wildfire-prone Landscapes

A Practical Investigation of Training-free Relaxed Speculative Decoding

Resample or Reroute? Budget-Aware Test-Time Model Selection for Large Language Models

EdgeRefine: Privacy-Utility Balance for Graphs via Jaccard Sampling under Edge Differential Privacy

Secure Decentralized Federated Learning via Gossip and Virtual Voting

UltraX: Refining Pre-Training Data at Scale with Adaptive Programmatic Editing

Co-LMLM: Continuous-Query Limited Memory Language Models

The Key to Going Linear: Analysis-Driven Transformer Linearization

Breaking Database Lock-in: Agentic Regeneration of High Performance Storage Readers for Database Bypass

Selective Timestep Weighting and Advantage-Based Replay for Sample-Efficient Diffusion RLHF

PeTeR: Post-Training Robustification of Probabilistic Circuits

ELSA3D: Elastic Semantic Anchoring for Unified 3D Understanding and Generation

Graph Convolutional Attention: A Spectral Perspective on Graph Denoising and Diffusion

DepthWeave-KV: Token-Adaptive Cross-Layer Residual Factorization for Long-Context KV Cache Compression

FreqDepthKV: Frequency-Guided Depth Sharing for Robust KV Cache Compression in Long-Context LLM Inference

Weak-to-Strong Generalization via Direct On-Policy Distillation

Beyond Adam: SOAP and Muon for Faster, Label-Efficient Training of Machine Learning Interatomic Potentials

Combating Textual Noise and Redundancy: Entropy-Aware Dense Visual Token Pruning

Learning to Move Before Learning to Do: Task-Agnostic pretraining for VLAs

OrbitQuant: Data-Agnostic Quantization for Image and Video Diffusion Transformers

Neuron-Aware Data Selection for Annotation-Free LLM Self-Distillation

Understanding the Robustness of Distributed Self-Supervised Learning Frameworks Against Non-IID Data

Optimal Stabilizer Testing and Learning with Limited Quantum Memory

WorldSample: Closed-loop Real-robot RL with World Modelling

QFedAgent: Quantum-Enhanced Personalized Federated Learning for Multi-Agent Activity Recognition

Neuron-Aware Active Few-Shot Learning for LLMs

Fast Multi-dimensional Refusal Subspaces via RFM-AGOP

WattGPU: Predicting Inference Power and Latency on Unseen GPUs and LLMs

VisionAId: An Offline-First Multimodal Android Assistant for People with Visual Impairment, Featuring Personalized Object Retrieval

Is One Layer Enough? Training A Single Transformer Layer Can Match Full-Parameter RL Training

The State-Prediction Separation Hypothesis

TiRex-2: Generalizing TiRex to Multivariate Data and Streaming

QuasiMoTTo: Quasi-Monte Carlo Test-Time Scaling

FedLAB: Traceable Semantic Codebooks for Federated Multimodal Graph Foundation Learning

Radial Suppression Accelerates Algorithmic Generalization: A Geometric Analysis of Delayed Generalization

One-Step Gradient Delay is Not a Barrier for Large-Scale Asynchronous Pipeline Parallel LLM Pretraining

DOPD: Dual On-policy Distillation

C$^{2}$R: Cross-sample Consistency Regularization Mitigates Feature Splitting and Absorption in Sparse Autoencoders

Beyond the Hard Budget: Sparsity Regularizers for More Interpretable Top-k Sparse Autoencoders

Generative Models on Analog Hardware with Dynamics

Simulation-based inference for rapid Bayesian parameter estimation in epidemiological models: a comparison with MCMC

How Good Can Linear Models Be for Time-Series Forecasting?

Ribbon: Scalable Approximation and Robust Uncertainty Quantification

E-TTS: A New Embodied Test-Time Scaling Framework for Robotic Manipulation

The Geometry of Updates: Fisher Alignment at Vocabulary Scale

CARVE: Content-Aware Recurrent with Value Efficiency for Chunk-Parallel Linear Attention

Hierarchical Muon: Tiled Newton-Schulz Updates for Efficient Muon Optimization

Learning Action Priors for Cross-embodiment Robot Manipulation

SHERLOC: Structured Diagnostic Localization for Code Repair Agents

Tapered Language Models

Muown Implicitly Performs Angular Step-size Decay

Diffusion Models Adapt to Low-Dimensional Structure Under Flexible Coefficient Choices

Execution-State Capsules: Graph-Bound Execution-State Checkpoint and Restore for Low-Latency, Small-Batch, On-Device Physical-AI Serving

FlowEdit: Associative Memory for Lifelong Pronunciation Adaptation in Flow-Matching TTS

Marginal Advantage Accumulation for Memory-Driven Agent Self-Evolution

UltraQuant: 4-bit KV Caching for Context-Heavy Agents

Repurposing a Speech Classifier for Guided Diffusion-Based Speech Generation

Evolutionary Two-Stage Hyperparameter Optimization Strategies for Physics-Informed Neural Networks

HEPTv2: End-to-End Efficient Point Transformer for Charged Particle Reconstruction

On the Redundancy of Timestep Embeddings in Diffusion Models

UBP2: Uncertainty-Balanced Preference Planning for Efficient Preference-based Reinforcement Learning

Optimal scenario design for climate emulation