How did LLMs gain Vision?
Introduction
LLMs began as text-only chatbots, an artifact of their training paradigm: they learn to predict the next token (roughly, a word). Photos, videos, and other rich media are not made of words and were therefore naturally excluded from the train...
May 5, 2025 · 3 min read
