Damiano Marsili

I'm a Ph.D student at Caltech advised by Georgia Gkioxari and Pietro Perona.

Before my Ph.D, I was an Applied Science Intern at Amazon Robotics working on 3D spatial reasoning. Prior to that, I double-majored at Johns Hopkins in Computer Science and Mathematics.

Email / Resume / Google Scholar / Twitter / LinkedIn / Github

Research

I'm interested in pushing the boundaries of visual reasoning capabilities of multimodal LLMs. Most of my research involves post-training vision-language models (VLMs) for visual reasoning in 3D and tool-use.

I'm best reachable via email at dmarsili at caltech dot edu.

News

[Dec. 2025] Released TWIN!
[Dec. 2025] Released VALOR!
[Feb. 2025] VADAR was accepted to CVPR 2025!
[Sep. 2023] Started my Ph.D at Caltech working with Georgia Gkioxari and Pietro Perona.
[Jun. 2023] Started an internship at Amazon Robotics working on 3D Spatial Reasoning
[May 2023] Graduated with a double major in Computer Science and Mathematics.
[Aug. 2020] Started my undegraduate at Johns Hopkins.

Publications

	Same or Not? Enhancing Visual Perception in Vision-Language Models Damiano Marsili, Aditya Mehta, Ryan Lin, Georgia Gkioxari arXiv preprint, 2025 project page / arXiv / code / bibtex
	No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers Damiano Marsili, Georgia Gkioxari arXiv preprint, 2025 project page / arXiv / code / bibtex
	VADAR: Visual Agentic AI for Spatial Reasoning with a Dynamic API Damiano Marsili, Rohun Agrawal, Yisong Yue, Georgia Gkioxari CVPR, 2025 project page / arXiv / code / bibtex

Website Template credits: Jon Barron