Papers

Recent AI research papers with accessible summaries. Updated daily from arXiv, summarized for developers who don't read papers regularly.

1552 papers · 29 this month · 12 topics

All Evaluation 40 Training 34 Efficiency 33 Reasoning 30 Agents 27 Applications 22 Multimodal 18 Data 17 Safety 13 Architecture 11 Alignment 7 Scaling 5

Jul 6 – Jul 12 (15)

MulTTiPop: A Multitrack Transcription Dataset for Pop Music

Jul 9, 2026

Nathan Pruyne, Benjamin Stoler, William Chen et al.

Automatic music transcription models still struggle with real-world pop music—the best model only achieves 38% Onset F1—suggesting this dataset will be valuable for developing better transcription systems.

MulTTiPop is a dataset of 572 pop music segments (3.5 hours) paired with multitrack MIDI transcriptions, spanning the 1930s to the 2000s. The authors created it by matching audio from existing datasets, manually aligning beats, and using tempo warping. They benchmark state-of-the-art transcription models and show significant room for improvement.

data, evaluation, applications
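For readers unfamiliar with the Onset F1 figure quoted for MulTTiPop, here is a minimal sketch of how such a score is commonly computed. It assumes a note counts as correct when its pitch matches and its onset falls within 50 ms of a reference onset, which is a common convention in transcription evaluation; the paper's exact protocol may differ. The function name `onset_f1` and the `(onset_seconds, midi_pitch)` note format are illustrative choices, not taken from the paper.

```python
from collections import defaultdict


def onset_f1(reference, predicted, tolerance=0.05):
    """Note-level onset F1 with greedy one-to-one matching per pitch.

    Notes are (onset_seconds, midi_pitch) tuples. The 50 ms default
    tolerance is an assumption, not a value confirmed by the paper.
    """
    ref_by_pitch = defaultdict(list)
    pred_by_pitch = defaultdict(list)
    for onset, pitch in reference:
        ref_by_pitch[pitch].append(onset)
    for onset, pitch in predicted:
        pred_by_pitch[pitch].append(onset)

    matched = 0
    for pitch, ref_onsets in ref_by_pitch.items():
        ref_onsets = sorted(ref_onsets)
        pred_onsets = sorted(pred_by_pitch.get(pitch, []))
        i = j = 0
        # Walk both sorted lists; each onset is used in at most one match.
        while i < len(ref_onsets) and j < len(pred_onsets):
            if abs(ref_onsets[i] - pred_onsets[j]) <= tolerance:
                matched += 1
                i += 1
                j += 1
            elif pred_onsets[j] < ref_onsets[i]:
                j += 1
            else:
                i += 1

    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical notes: two of four predictions land on a reference onset.
reference_notes = [(0.00, 60), (0.50, 62), (1.00, 64), (1.50, 65)]
predicted_notes = [(0.02, 60), (0.53, 62), (1.20, 64), (2.00, 67)]
score = onset_f1(reference_notes, predicted_notes)
print(f"Onset F1: {score:.2f}")
```

In this toy example precision and recall are both 0.5, so a 38% score indicates that well under half of the notes are being placed correctly.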

Using AI-based Learning Assistants in Higher Education: A Large-Scale Descriptive Analysis

Jul 9, 2026

Kristina Schaaff, Quintus Stierstorfer, Valerie Heckel

Large-scale log data shows AI learning assistants are already integrated into student routines, but usage varies substantially across demographics and study contexts—critical insights for designing inclusive educational AI.

This study analyzes real usage data from 77,543 students using Syntea, an AI learning assistant, to understand how different groups actually use educational chatbots. Unlike previous small surveys, this large-scale analysis reveals that usage patterns vary significantly by gender, age, study program, and other factors—providing concrete evidence for improving AI tutoring tools.

Jun 29 – Jul 5 (16)

Program-as-Weights: A Programming Paradigm for Fuzzy Functions

Jul 2, 2026

Wentao Zhang, Liliana Hotsko, Woojeong Kim et al.

Instead of calling large language models for every fuzzy task, you can compile a natural-language specification once into a tiny reusable neural artifact that runs locally and cheaply—shifting from per-input problem solving to one-time function compilation.

This paper introduces Program-as-Weights (PAW), a method to compile natural-language function specifications into small, locally executable neural adapters. A 4B compiler generates parameter-efficient adapters that run on a lightweight 0.6B interpreter, matching the performance of much larger models while using 50x less memory and running efficiently on consumer hardware such as a MacBook M3.

efficiency, training, applications

Reasoning LLM Improves Speaker Recognition in Long-form TV Dramas

Jul 2, 2026

Yuxuan Li, Lingxi Xie, Xinyue Huo et al.

Reasoning models can improve speaker identification in video by combining multiple modalities and contextual evidence, outperforming traditional audio-only approaches on challenging cases.

This paper tackles speaker recognition in long-form TV dramas by introducing DramaSR-532K, a large benchmark with 532K annotated dialogue lines, and DramaSR-LRM, a reasoning-based approach that combines audio, text, and visual information to accurately identify which character is speaking. The method works especially well on short utterances where voice alone isn't reliable.

Jun 22 – Jun 28 (24)

Agentic Hardware Design as Repository-Level Code Evolution

Jun 26, 2026

Cunxi Yu, Chenhui Deng, Nathaniel Pinckney et al.

Hardware design can be automated using agentic AI that evolves code repositories with built-in validation and state management, though current benchmarks don't capture the full complexity of production chip design.

HORIZON is an AI agent framework that automatically designs hardware by treating it as code evolution in a git repository. The system uses a Markdown specification to guide an agent loop that modifies Verilog code, tracks changes through git operations, and validates designs against acceptance criteria.

agents, architecture, applications

Parameter Efficient Hybrid Transformer (PEHT) for Network Traffic Prediction via Dynamic Urban Congestion Integration

Jun 26, 2026

Abdolazim Rezaei, Mehdi Sookhak, Mahboobeh Haghparast

By combining parameter-efficient fine-tuning (LoRA) with multimodal fusion of urban context, you can build accurate traffic prediction models that use fewer trainable parameters without sacrificing performance.

This paper presents PEHT, a traffic prediction model that combines Transformers with urban mobility data to forecast cellular network demand. It uses LoRA to reduce parameters while a multimodal fusion strategy integrates congestion and mobility information, achieving better accuracy than existing methods on real telecom data.
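To make the "fewer trainable parameters" claim in the PEHT summary concrete, the arithmetic behind LoRA can be sketched as follows. LoRA freezes a weight matrix of shape `d_out x d_in` and trains two low-rank factors of shapes `d_out x rank` and `rank x d_in`. The layer sizes and rank below are hypothetical and are not taken from the paper.

```python
def full_params(d_out, d_in):
    """Parameter count of a dense weight matrix, which LoRA keeps frozen."""
    return d_out * d_in


def lora_trainable_params(d_out, d_in, rank):
    """Parameter count of the two trainable low-rank factors B and A."""
    return rank * (d_out + d_in)


# Hypothetical projection layer; PEHT's real dimensions are not given here.
d_out, d_in, rank = 512, 512, 8
frozen = full_params(d_out, d_in)
trainable = lora_trainable_params(d_out, d_in, rank)
print(f"frozen: {frozen}, trainable: {trainable}, ratio: {trainable / frozen:.4f}")
```

The trainable share grows linearly with the rank, so the rank is the main knob for trading adapter capacity against parameter count.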

Jun 15 – Jun 21 (15)

Structuring and Tokenizing Distributed User Interest Context for Generative Recommendation

Jun 18, 2026

Ruizhong Qiu, Yinglong Xia, Dongqi Fu et al.

Combining graph-based user co-engagement patterns with semantic tokenization creates more accurate user interest representations for generative recommendation systems at scale.

This paper presents G2Rec, a framework that improves generative recommendation systems by better organizing user behavior and item information. It combines graph-based user interaction patterns with semantic tokenization to help recommendation models understand what users want next, without needing labeled user interests.

applications, architecture, data

FlowEdit: Associative Memory for Lifelong Pronunciation Adaptation in Flow-Matching TTS

Jun 18, 2026

Harshit Singh, Ayush Pratap Singh, Nityanand Mathur

You can add lifelong learning to frozen TTS models by storing pronunciation fixes in a memory network instead of updating weights—enabling fast adaptation to new proper nouns without retraining.

FlowEdit enables text-to-speech systems to learn and remember pronunciation corrections for proper nouns without retraining. It stores corrections as edits in a memory network, then retrieves and applies them at inference time, reducing pronunciation errors by 93% while keeping the original model frozen.

Jun 8 – Jun 14 (22)

Persona-Pruner: Sculpting Lightweight Models for Role-Playing

Jun 12, 2026

Jinsu Kim, Jihoon Tack, Noah Lee et al.

You can shrink language models for specific character personas by 50%+ while keeping 93.8% of role-playing quality, making multi-NPC applications practical without sacrificing character consistency.

This paper introduces Persona-Pruner, a technique that creates lightweight language models optimized for specific character roles by identifying and preserving only the persona-relevant parts of a full model. Unlike standard pruning that indiscriminately removes parameters, this method maintains role-playing quality while reducing computational cost—useful for applications with many NPCs.

efficiency, training, applications

Optimal Hidden-Target Learning for Online Inventory Optimization on General Convex Sets

Jun 12, 2026

Anthony Pineci, Yunzong Xu

A simple hidden-target-and-project strategy is provably optimal for inventory optimization with memory constraints, and viewing inventory as a one-dimensional queue dramatically simplifies the theoretical analysis.

This paper solves online inventory optimization—a practical problem where past inventory decisions constrain future actions—by maintaining a hidden target and projecting it onto feasible inventory levels. The method achieves optimal regret bounds on general convex capacity constraints, improving prior results and introducing a novel 'norm alignment' principle that simplifies the analysis.
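The hidden-target idea in the summary above can be illustrated with a one-dimensional toy. This is only one plausible reading of the summary, not the paper's algorithm: it assumes a newsvendor-style cost, a fixed step size, and a feasible set of the form `[leftover, capacity]`, where leftover stock cannot be discarded. All names and parameter values are hypothetical.

```python
def clamp(value, low, high):
    """Project a scalar onto the interval [low, high]."""
    return max(low, min(high, value))


def run_hidden_target(demands, capacity, holding_cost=1.0,
                      backorder_cost=3.0, step=0.5):
    """Toy 1-D hidden-target-and-project loop (illustrative assumptions only).

    The hidden target follows subgradient steps and ignores carried-over
    stock; the level actually played is the target projected onto the
    currently feasible interval [leftover, capacity].
    """
    target = 0.0    # hidden target, never constrained by leftover stock
    leftover = 0.0  # stock carried over from the previous round
    played = []
    for demand in demands:
        # Project the hidden target onto what is feasible this round.
        level = clamp(target, leftover, capacity)
        played.append(level)
        # Subgradient of h*max(level-demand, 0) + b*max(demand-level, 0).
        grad = holding_cost if level > demand else -backorder_cost
        # Update the hidden target, kept within the capacity set.
        target = clamp(target - step * grad, 0.0, capacity)
        leftover = max(level - demand, 0.0)
    return played


levels = run_hidden_target([0.0, 0.0, 0.0], capacity=10.0)
print(levels)
```

With zero demand, the target drifts back down after the second round, but the played level stays at the unsold stock. That gap between the target and the feasible level is what the projection step handles.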

Jun 1 – Jun 7 (8)

Agentopia: Long-Term Life Simulation and Learning in Agent Societies

Jun 5, 2026

Xintao Wang, Sirui Zheng, Hongqiu Wu et al.

Long-term multi-agent simulation can teach LLMs social intelligence—agents trained on years of simulated life experience show better understanding of human-like social behavior and role-playing tasks.

Agentopia simulates 100 AI agents living together for 10 simulated years, learning from social interactions and personal growth. The framework trains language models using a 'life reward' signal based on agent well-being, showing that agents develop realistic social behaviors and that this training improves the underlying model's ability to handle social reasoning tasks.

agents, training, applications

Twelve quick tips for designing AI-driven HPC workflows

Jun 5, 2026

Jamie J. Alnasir

AI workflows on HPC systems need different optimization strategies than traditional scientific computing: focus on containerization for portability, smart job scheduling, explicit feedback mechanisms, and I/O efficiency rather than just raw compute throughput.

This guide offers twelve practical strategies for running AI workloads efficiently on HPC clusters. It addresses the unique challenges of AI workflows—which are iterative and data-driven—compared to traditional scientific computing, covering containerization, job scheduling, feedback loops, and file I/O optimization to help researchers build scalable, reproducible AI pipelines.

Papers

ARDY: Autoregressive Diffusion with Hybrid Representation for Interactive Human Motion Generation

Pose-to-Biomechanics: Bridging 3D Human Pose Estimation and Biomechanical Attribute Prediction

LTM: Large-scale Terrain Model for Wildfire-prone Landscapes

MPFlow: Learning Budgeted Max-Flow Optimization on the Lightning Network with Deep Graph Reinforcement Learning

WebSwarm: Recursive Multi-Agent Orchestration for Deep-and-Wide Web Search

Accurate, Interdisciplinary and Transparent Structure-property Understanding with Deep Native Structural Reasoning

Breaking Database Lock-in: Agentic Regeneration of High Performance Storage Readers for Database Bypass

SkillCenter: A Large-Scale Source-Grounded Skill Library for Autonomous AI Agents

Rethinking Indic AI from a Lens of Cultural Heritage Preservation

The Large Cancer Assistant (LCA): A Model-Agnostic Orchestration Framework for Scalable Clinical Decision Support in Oncology

RSF-GLLM: Bridging the Semantic Gap in Multi-Hop Knowledge Graph QA via Recurrent Soft-Flow and Decoupled LLM Generation

Industry Classification of GitHub Repositories Using the North American Industry Classification System (NAICS)

From Fixed to Free Cameras: Calibration-Free View-Robust Vision-Language-Action Model

TestEvo-Bench: An Executable and Live Benchmark for Test and Code Co-Evolution

Will Scaling Improve Social Simulation with LLMs?

Reasoning effort, not tool access, buys first-try reliability in agentic code generation: an observational study

Automated grading of Linux/bash examinations using large language models: a four-level cognitive taxonomy approach

Q-GAIN: A Python Package for Machine Learning and Physically Informed Analysis Applications

Steerability via constraints: a substrate for scalable oversight of coding agents

Bringing Agentic Search to Earth Observation Data Discovery

Know Your Source: A Public Knowledge Store for Media Background Checks

HULAT2 at MER-TRANS 2026: Governed Multi-Agent Simplification for Spanish Easy-to-Read Generation

VisionAId: An Offline-First Multimodal Android Assistant for People with Visual Impairment, Featuring Personalized Object Retrieval

Are Performance-Optimization Benchmarks Reliably Measuring Coding Agents?

Optimal Resource Utilization for Autonomous Laboratory Orchestrators

PolicyGuard: From Organizational Policies to Neuro-Symbolic Compliance Review Engines

VLK: Learning Humanoid Loco-Manipulation from Synthetic Interactions in Reconstructed Scenes

Govern the Repository, Not the Agent: Measuring Ecosystem-Level Risk in AI-Native Software

Autoregressive Boltzmann Generators

Mapping Political-Elite Networks in Europe with a Multilingual Joint Entity-Relation Extraction Pipeline

Language-Based Digital Twins for Elderly Cognitive Assistance

LLM-Based Examination of Eligibility Criteria from Securities Prospectuses at the German Central Bank

A Multi-Fidelity Convolutional Autoencoder-Transfer Learning Framework for Guided-Wave-Based Damage Diagnosis Using Large Simulated and Limited Experimental Datasets

AI Healthcare Chatbots as Information Infrastructure: A Large-Scale Study of User-Reported Breakdowns

Designing Reward Signals for Portable Query Generation: A Case Study in Industrial Semantic Job Search

How Surprising Is Historical Italian to Language Models? Tokenization Tax, Comprehension Tax, and a Simple Mitigation

RSPC: A Benchmark for Modeling Stress and Psychiatric Conditions in Digitally Mediated Relationships using Psychiatrist Annotations

From Celebrities to Anyone: Characterizing AI Nudification Content, Technology, and Community Dynamics on 4chan

A Process Harness for Uplifting Legacy Workflows to Agentic BPM: Design and Realization in CUGA FLO

A cross-process welding penetration status prediction algorithm based on unsupervised domain adaptation in laser and TIG welding

AI translation of literary texts is "fine", but readers still prefer human translations

It's Complicated: On the Design and Evaluation of AI-Powered AAC Interfaces

Accuracy and Satisfaction in Multi-Turn LLM Dialogues for NFR Assessment

Large-Language-Model Discovery of Quantum LDPC Codes through Structured Concept Evolution

Semantic Browsing: Controllable Diversity for Image Generation

PsyBridge: A Hybrid Intelligent Framework for Multi-Dimensional Mental Health Assessment and Decision Support

EnterpriseClawBench: Benchmarking Agents from Real Workplace Sessions

TailorMind: Towards Preference-Aligned Multimodal Content Generation

AI Exposure Scores: what they measure, what they miss, and what comes next

Context-Aware Hierarchical Bayesian Modeling of IVF Laboratory Environmental Conditions

Multi-View Decompilation for LLM-Based Malware Classification

DataMagic: Transforming Tabular Data into Data Insight Video

Freeing the Law with LOCUS: A Local Ordinance Corpus for the United States

Reference-Driven Multi-Speaker Audio Scene Generation from In-the-Wild Priors

Data Intelligence Agents: Interpreting, Modeling, and Querying Enterprise Data via Autonomous Coding Agents

Risk Stratification for ICU Delirium using Pervasive Ambient Sensing Information

Correct Yourself, Keep My Trust: How Self-Correction and Social Connection Shape Credibility in Social Chatbots

ReproRepo: Scaling Reproducibility Audits with GitHub Repository Issues

Darshana Graph: A Parallel Commentary Corpus for Comparative Indian Philosophy, with Stylometric and Exploratory Graph Analyses

Analyzing and Encoding the Al-Mawrid Arabic-English Dictionary with the ISO Language Markup Framework and TEI Lex-0

RubricsTree: Scalable and Evolving Open-Ended Evaluation of Personal Health Agents across Health Memory and Medical Skills