For physics-based machine learning, learning representations in latent space (as JEPAs do) works better than optimizing pixel-level predictions, and generic self-supervised methods can be surprisingly effective on scientific tasks.
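A toy numpy sketch of the two objectives being contrasted. Everything here is illustrative: the encoder, predictor, and decoder are random linear maps standing in for learned networks, and the frames are random vectors rather than real physics data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: frames are flattened 16x16 "images" (256 pixels)
# and the latent space is 8-dimensional.
D_PIX, D_LAT = 256, 8
W_enc = rng.normal(scale=0.1, size=(D_LAT, D_PIX))   # encoder f
W_pred = rng.normal(scale=0.1, size=(D_LAT, D_LAT))  # latent predictor g
W_dec = rng.normal(scale=0.1, size=(D_PIX, D_LAT))   # decoder (pixel baseline only)

frame_t = rng.normal(size=D_PIX)      # current frame x_t
frame_next = rng.normal(size=D_PIX)   # next frame x_{t+1}

def pixel_loss(x_t, x_next):
    # Generative baseline: decode back to pixel space and score the
    # prediction pixel by pixel against the true next frame.
    x_hat = W_dec @ (W_pred @ (W_enc @ x_t))
    return np.mean((x_hat - x_next) ** 2)

def jepa_loss(x_t, x_next):
    # JEPA-style objective: predict the *embedding* of the next frame;
    # the target embedding is held fixed (stop-gradient) during training
    # in a real implementation.
    z_hat = W_pred @ (W_enc @ x_t)
    z_target = W_enc @ x_next
    return np.mean((z_hat - z_target) ** 2)

print(pixel_loss(frame_t, frame_next), jepa_loss(frame_t, frame_next))
```

The structural difference is where the error is measured: the pixel loss forces the model to reconstruct every pixel, while the latent loss only asks it to predict an abstract summary of the next state.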
This paper challenges the standard approach of training physics models to predict the next frame. Instead, it evaluates whether models learn useful representations by testing them on downstream scientific tasks, such as estimating a system's physical parameters.
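A minimal sketch of that kind of downstream evaluation, assuming a linear probe on frozen embeddings (a common protocol, not necessarily the paper's exact setup). The embeddings here are synthetic stand-ins that linearly encode a scalar physical parameter, e.g. a pendulum's length.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: one frozen embedding per trajectory, plus the true
# physical parameter that generated that trajectory.
N, D_LAT = 200, 8
params = rng.uniform(0.5, 2.0, size=N)  # e.g. pendulum length per trajectory
embeddings = (np.outer(params, rng.normal(size=D_LAT))
              + 0.01 * rng.normal(size=(N, D_LAT)))

# Closed-form least-squares linear probe (with a bias column): if the
# representation is good, the parameter should be linearly decodable.
X = np.hstack([embeddings, np.ones((N, 1))])
w, *_ = np.linalg.lstsq(X, params, rcond=None)
pred = X @ w
r2 = 1 - np.sum((pred - params) ** 2) / np.sum((params - params.mean()) ** 2)
print(round(r2, 3))
```

The probe's R² then serves as the downstream score: it measures how much of the physical parameter is recoverable from the representation, independent of pixel-level prediction quality.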