Sajjad P. Savoji

Structured QA Generation with Large Language Models - Vector Intern Talks

In this presentation, we'll dive into the exciting world of Large Language Models (LLMs) and how to make them excel in constrained generation tasks. We've put together simple yet powerful prompt-engineering guidelines to improve structured generation. Plus, we'll compare open-source models with their commercial peers to see which ones are best suited for this task. Get ready for a straightforward, insightful session on the latest in AI and language processing!
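As a rough illustration of what "constrained generation" can involve, here is a minimal sketch that checks whether a model's raw text output parses into a fixed question-answer structure. The schema (a JSON list of objects with `question` and `answer` keys) and the function name `parse_qa_output` are assumptions made for this example; they are not taken from the talk.

```python
import json

# Hypothetical schema: every QA item must have exactly these keys.
REQUIRED_KEYS = {"question", "answer"}


def parse_qa_output(raw: str) -> list[dict]:
    """Parse raw model output as a JSON list of QA objects.

    Raises ValueError if the output is not valid JSON or does not
    match the assumed schema.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc

    if not isinstance(data, list):
        raise ValueError("expected a JSON list of QA pairs")

    for i, item in enumerate(data):
        if not isinstance(item, dict) or set(item) != REQUIRED_KEYS:
            raise ValueError(f"item {i} does not match the schema")
        # Each field must be a non-empty string.
        if not all(isinstance(item[k], str) and item[k].strip() for k in REQUIRED_KEYS):
            raise ValueError(f"item {i} has an empty or non-string field")

    return data
```

A check like this could sit behind a retry loop: if parsing fails, the prompt is re-issued, possibly with the error message appended.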

Digital Booklet: An Innovative Approach to Showcasing Open-Source Projects - Vector Intern Talks

Experience the future of open-source project presentation with Digital Booklet, a ground-breaking platform that reimagines how we showcase and publish open-source projects, workshops, and tutorials. Departing from traditional git-repository templates focused solely on reproducibility, Digital Booklet centers its approach on enhancing reusability and user experience. Leveraging the power of CI/CD, Jupyter Book, and online computational resources like Google Colab, this project opens the doors to open-source demos for a diverse audience, spanning various levels of technical proficiency. Whether you're a seasoned pro or a newcomer, Digital Booklet is your gateway to a new era of innovation and accessibility in the open-source community.

Active Vision for Early Recognition of Human Actions

We propose a method for early recognition of human actions, one that can take advantage of multiple cameras while satisfying the constraints imposed by limited communication bandwidth and processing power. Our method considers multiple cameras and, at each time step, selects the best camera to use so that a confident recognition decision can be reached as soon as possible. We formulate the camera selection problem as a sequential decision process and learn a view selection policy using reinforcement learning.
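The sequential decision loop described above can be sketched roughly as follows. This is not the method from the paper: the multiplicative belief update, the confidence threshold, and the names `early_recognition` and `select_camera` are assumptions for illustration, and `select_camera` merely stands in for a policy that would be learned with reinforcement learning. The sketch assumes at least one time step and strictly positive class probabilities.

```python
from typing import Callable, Sequence


def early_recognition(
    frames: Sequence[Sequence[Sequence[float]]],
    select_camera: Callable[[list[float], int], int],
    threshold: float = 0.9,
) -> tuple[int, int]:
    """Pick one camera per time step and stop once the belief is confident.

    frames[t][c] holds the class probabilities that camera c reports at
    time step t. select_camera(belief, num_cameras) is a stand-in for the
    learned view selection policy. Returns (predicted_class, time_step).
    """
    num_classes = len(frames[0][0])
    belief = [1.0 / num_classes] * num_classes  # start from a uniform belief

    for t, views in enumerate(frames):
        cam = select_camera(belief, len(views))  # only one view is processed
        # Assumed update rule: multiply in the selected view and renormalize.
        belief = [b * p for b, p in zip(belief, views[cam])]
        total = sum(belief)
        belief = [b / total for b in belief]

        best = max(range(num_classes), key=belief.__getitem__)
        if belief[best] >= threshold:
            return best, t  # confident enough: decide early

    # Ran out of frames: return the best guess at the final step.
    return best, len(frames) - 1
```

Processing only the selected view at each step is what keeps bandwidth and compute bounded; a better policy reaches the threshold in fewer steps.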

ResNeSt: Split-Attention Networks

It is well known that feature-map attention and multi-path representation are important for visual recognition. In this paper, we present a modularized architecture that applies channel-wise attention on different network branches to leverage their success in capturing cross-feature interactions and learning diverse representations. Our design results in a simple and unified computation block, which can be parameterized using only a few variables. Our model, named ResNeSt, outperforms EfficientNet in the accuracy-latency trade-off on image classification.

Chained Tracker

Existing Multiple-Object Tracking (MOT) methods either follow the tracking-by-detection paradigm to conduct object detection, feature extraction, and data association separately, or integrate two of the three subtasks to form a partially end-to-end solution. Going beyond these sub-optimal frameworks, we propose a simple online model named Chained-Tracker (CTracker), which naturally integrates all three subtasks into an end-to-end solution (the first, as far as we know).

ISSUM Workshop

I presented part of my master's thesis at the ISSUM Workshop. Some of the details have been omitted for confidentiality. I will probably post a complete version once our publication is public.

Trackformer

A thorough description of the first tracker to follow a tracking-by-attention paradigm. This video explains the Trackformer model and its corresponding paper, Multiple Object Tracking with Transformers.

ISSUM Workshop Closing Event

I presented part of my master's thesis at the ISSUM Workshop Closing Event. Some of the details have been omitted for confidentiality. I will probably post a complete version once our publication is public.