An inference approach where a model generates an intermediate image before answering a question about it.