A factorization where value and policy functions are expressed as products of goal-conditioned coefficients and learned basis functions.
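
As a minimal sketch of what such a factorization could look like, the snippet below computes a goal-conditioned value as V(s, g) ≈ Σ_k w_k(g) · φ_k(s). The source does not specify the exact form, so several things here are assumptions made for illustration: that "products" means a sum of coefficient-times-basis terms, that the basis functions depend only on the state, and that the coefficients depend only on the goal. The names `factorized_value`, `make_linear_map`, `apply_map`, and the constant `K` are hypothetical, and the random weights stand in for parameters that would be learned in practice. Only the value function is shown; a policy could be factorized analogously.

```python
import math
import random

K = 4  # number of basis functions (hypothetical choice for illustration)


def make_linear_map(in_dim, out_dim, rng):
    """Return a random weight matrix standing in for learned parameters."""
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_map(weights, x):
    """Compute tanh(W x), a toy stand-in for a learned feature map."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def factorized_value(state, goal, basis_weights, coeff_weights):
    """Assumed form: V(s, g) ~= sum_k w_k(g) * phi_k(s)."""
    phi = apply_map(basis_weights, state)  # basis functions phi_k(s)
    w = apply_map(coeff_weights, goal)     # goal-conditioned coefficients w_k(g)
    return sum(wk * pk for wk, pk in zip(w, phi))


rng = random.Random(0)
basis_weights = make_linear_map(3, K, rng)  # state dimension 3 (assumed)
coeff_weights = make_linear_map(2, K, rng)  # goal dimension 2 (assumed)

state = [0.1, -0.4, 0.7]
goal = [1.0, 0.0]
value = factorized_value(state, goal, basis_weights, coeff_weights)
```

Under this assumed form, the basis functions can be evaluated once per state and reused across goals, since only the coefficients change when the goal changes.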