Damiano Marsili

I'm a Ph.D student at Caltech advised by Georgia Gkioxari and Pietro Perona.

Before my Ph.D, I was an Applied Science Intern at Amazon Robotics working on 3D spatial reasoning. Prior to that, I double-majored at Johns Hopkins in Computer Science and Mathematics.

Email  /  Resume  /  Google Scholar  /  Twitter  /  LinkedIn  /  Github

Research

I'm interested in pushing the boundaries of the vision capabilities of multimodal LLMs. Most of my research involves post-training vision-language models (VLMs) for 3D visual reasoning and tool use.

The best way to reach me is by email at dmarsili at caltech dot edu.

News

  • [Feb. 2025] VADAR was accepted to CVPR 2025!
  • [Sep. 2023] Started my Ph.D at Caltech working with Georgia Gkioxari and Pietro Perona.
  • [Jun. 2023] Started an internship at Amazon Robotics working on 3D spatial reasoning.
  • [May 2023] Graduated with a double major in Computer Science and Mathematics.
  • [Aug. 2020] Started my undergraduate studies at Johns Hopkins.

Publications

Same or Not? Enhancing Visual Perception in Vision-Language Models
Damiano Marsili, Aditya Mehta, Ryan Lin, Georgia Gkioxari
In review, 2025
project page / arXiv / code / bibtex

No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers
Damiano Marsili, Georgia Gkioxari
In review, 2025
project page / arXiv / code / bibtex

VADAR: Visual Agentic AI for Spatial Reasoning with a Dynamic API
Damiano Marsili*, Rohun Agrawal*, Yisong Yue, Georgia Gkioxari
CVPR, 2025
project page / arXiv / code / bibtex


Website Template credits: Jon Barron