An optimization method that accounts for the geometry of the data distribution, often converging faster than standard gradient descent.
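The entry does not name the method. Assuming it refers to natural gradient descent, the sketch below shows one way the idea can look in practice: the ordinary gradient is multiplied by the inverse Fisher information matrix, so the step size adapts to the local geometry of the model's distribution. The function `fit_gaussian`, the toy data, the starting point, and the learning rate are all illustrative assumptions, not part of the original text.

```python
import random


def fit_gaussian(xs, steps=300, lr=0.1, natural=True):
    """Fit N(mu, sigma^2) to xs by minimizing the average negative log-likelihood.

    Hypothetical helper for illustration only. With natural=True the gradient
    is preconditioned by the inverse Fisher information (natural gradient);
    with natural=False it is plain gradient descent.
    """
    n = len(xs)
    mu, sigma = 0.0, 1.0  # arbitrary starting point (assumption)
    for _ in range(steps):
        mean_resid = sum(x - mu for x in xs) / n
        mean_sq = sum((x - mu) ** 2 for x in xs) / n

        # Euclidean gradient of the average negative log-likelihood.
        g_mu = -mean_resid / sigma**2
        g_sigma = 1.0 / sigma - mean_sq / sigma**3

        if natural:
            # The Fisher information of N(mu, sigma^2) in (mu, sigma)
            # coordinates is diag(1/sigma^2, 2/sigma^2); multiply the
            # gradient by its inverse.
            g_mu *= sigma**2
            g_sigma *= sigma**2 / 2.0

        # Note: the plain update has no safeguard keeping sigma positive;
        # it happens to stay positive for the toy data below.
        mu -= lr * g_mu
        sigma -= lr * g_sigma
    return mu, sigma


# Toy data whose scale is far from the starting point (mu=0, sigma=1).
rng = random.Random(0)
data = [rng.gauss(50.0, 10.0) for _ in range(1000)]

mu_nat, sigma_nat = fit_gaussian(data, natural=True)
mu_gd, sigma_gd = fit_gaussian(data, natural=False)
print("natural gradient:", mu_nat, sigma_nat)
print("plain gradient:  ", mu_gd, sigma_gd)
```

In this particular setup the natural-gradient update for `mu` reduces to moving a fixed fraction `lr` of the way toward the sample mean on every step, regardless of the current `sigma`, whereas the plain gradient on `mu` is scaled by `1/sigma**2` and can become very small. This is a toy case where the Fisher matrix is known in closed form; for larger models it typically has to be estimated or approximated.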