From Multimodal LLMs to Generalist Embodied Agents