Before my Ph.D., I was an Applied Science Intern at Amazon Robotics, working on 3D spatial reasoning.
Prior to that, I double-majored at Johns Hopkins in Computer Science and Mathematics.
I'm interested in pushing the boundaries of the vision capabilities of multimodal LLMs. Most of my
research involves post-training vision-language models (VLMs) for 3D visual reasoning and
tool use.
The best way to reach me is by email at dmarsili at caltech dot edu.