Sajjad Pakdamansavoji
I am an applied ML researcher and engineer with five years of experience building generative and multimodal systems. My work focuses on turning research on vision-language models (VLMs) and diffusion models into reliable, real-world applications.
At Huawei Noah's Ark Lab, I currently develop embodied-AI pipelines that bridge vision and action, integrating diffusion transformers (DiT) and vision transformers (ViT), together with 3D representations, into LLM, VLM, and VLA backbones. By fusing these components, I aim to strengthen physical grounding for complex spatial reasoning and object manipulation.
Previously, at the Vector Institute, I turned LLM and VLM research into agentic systems, architecting multi-step reasoning workflows and RAG pipelines that substantially improved retrieval accuracy and safety. To support the broader AI ecosystem, I also led workshops on parameter-efficient fine-tuning with LoRA and high-throughput deployment with vLLM (sketched below).
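As a flavor of what those workshops covered, here is a minimal LoRA sketch assuming the Hugging Face PEFT library; the base model and hyperparameters are illustrative, not the exact workshop configuration.

```python
# Minimal LoRA fine-tuning setup with Hugging Face PEFT.
# Model name and hyperparameters are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Inject low-rank adapters into the attention projections; only these
# small matrices are trained while the base weights stay frozen.
config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # Llama-style attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()        # typically well under 1% of the total
```

After training, the adapters can be merged back into the base weights and the result served at high throughput with vLLM's OpenAI-compatible server (e.g. `vllm serve <model>`).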
At Trans-Plan, I built 3D perception systems for urban environments. I led the distributed fine-tuning of ViT-based foundation models for open-vocabulary detection, segmentation, and tracking, improving multiple object tracking accuracy (MOTA) by 28% and reaching 92% accuracy on trajectory classification. I also developed a real-time 3D visualization GUI to enable human-in-the-loop model refinement.
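Trans-Plan's internal models are not public, so purely as an illustration of text-prompted open-vocabulary detection, here is a sketch using OWL-ViT (a ViT-based open-vocabulary detector) from Hugging Face Transformers; the image path and text queries are hypothetical placeholders.

```python
# Text-prompted open-vocabulary detection with OWL-ViT.
# Image path and query labels are hypothetical placeholders.
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

image = Image.open("intersection.jpg").convert("RGB")        # a traffic-camera frame
queries = [["a car", "a bus", "a pedestrian", "a cyclist"]]  # free-form text classes

inputs = processor(text=queries, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits and boxes into (score, label, box) triples in pixel coords.
target_sizes = torch.tensor([image.size[::-1]])
detections = processor.post_process_object_detection(
    outputs, threshold=0.3, target_sizes=target_sizes
)[0]
for score, label, box in zip(
    detections["scores"], detections["labels"], detections["boxes"]
):
    print(f"{queries[0][label]}: {score:.2f} at {box.tolist()}")
```

Because the classes are plain text, new object types can be added at inference time without retraining, which is what makes the open-vocabulary setting attractive for evolving urban datasets.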
My early work at AVIR involved engineering a multimodal translation system for Iranian Sign Language. I built a video-to-text pipeline combining MediaPipe landmark extraction with attention-based Bi-LSTM models, and optimized it to run at over 30 FPS for low-latency, real-world use.
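To make that architecture concrete, here is a PyTorch sketch of an attention-weighted Bi-LSTM over per-frame landmark features of the kind MediaPipe Holistic produces (33 pose plus 2 × 21 hand landmarks); the dimensions, attention form, and class count are illustrative rather than AVIR's exact model.

```python
# Attention-based Bi-LSTM over per-frame landmark features.
# Feature dimension assumes 75 pose+hand landmarks x (x, y, z) = 225.
import torch
import torch.nn as nn

class AttnBiLSTM(nn.Module):
    def __init__(self, feat_dim: int, hidden: int, n_classes: int):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)    # additive attention over time steps
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                       # x: (batch, frames, feat_dim)
        h, _ = self.lstm(x)                     # (batch, frames, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)  # per-frame attention weights
        ctx = (w * h).sum(dim=1)                # attention-weighted summary
        return self.head(ctx)                   # sign/gloss logits

model = AttnBiLSTM(feat_dim=225, hidden=128, n_classes=500)
logits = model(torch.randn(1, 30, 225))         # a one-second clip at 30 FPS
```

Attending over frames lets the classifier focus on the informative middle of a sign rather than the transition frames, which matters when the pipeline has to keep up with a live 30 FPS feed.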
I hold an MSc in Computer Science from York University and a BSc in Electrical Engineering from the University of Tehran. Along the way I have been recognized with the VISTA Scholarship, the Vector Scholarship in AI, the Lassonde Entrance Scholarship, and the Founder's Entrance Scholarship, as well as the IEEE ITSC Best Application Paper Award and the Academic Excellence Award from York University.