A component that converts images into a numerical representation that a language model can understand and process.