Decoder-only language models exhibit gender bias in translation tasks similar to that of smaller models, but instruction tuning can reduce their masculine default and improve context awareness.
This paper examines how large language models handle gender in machine translation, where languages differ in how they mark gender. The researchers introduce a new metric called "Prior Bias" to capture which gender a model assumes by default, and test decoder-only models (GPT-style architectures) against traditional encoder-decoder models.
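The paper does not spell out the metric's formula here, but the idea of a default-gender measurement can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: given translations of gender-ambiguous source sentences, count how often the output uses masculine versus feminine forms, and report the masculine fraction (0.5 would mean no default preference).

```python
from collections import Counter

def prior_bias(translations, masculine_markers, feminine_markers):
    """Hypothetical sketch of a prior-bias score: the fraction of
    gender-ambiguous inputs that the model renders with masculine forms.
    1.0 = always masculine, 0.5 = no default preference."""
    counts = Counter()
    for text in translations:
        words = set(text.lower().split())
        if words & masculine_markers:
            counts["masculine"] += 1
        elif words & feminine_markers:
            counts["feminine"] += 1
    total = counts["masculine"] + counts["feminine"]
    return counts["masculine"] / total if total else None

# Toy example: Spanish translations of the ambiguous English sentence
# "The doctor said they would arrive late." (marker lists are illustrative)
outs = [
    "el doctor dijo que llegaría tarde",
    "el médico dijo que llegaría tarde",
    "la doctora dijo que llegaría tarde",
]
bias = prior_bias(outs, {"el", "doctor", "médico"}, {"la", "doctora"})
```

Here `bias` is 2/3, since two of the three outputs default to masculine forms.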