Visual Learning and Recognition

Weakly Supervised Object Localization

Implements a weakly supervised object detector which utilizes only image-level annotations and no bounding box annotations on the PASCAL VOC 2007 dataset.

Visual Question Answering

In Visual Question Answering (VQA), given an image and a question about it, our goal is to select an answer from a large pool of possible answers. We implement a transformer based architecture which uses pre-trained ResNet18 and RoBERTa to featurize input images and text.

Generative Adversarial Networks

Trained GAN’s with various losses on the CUB 2011 Dataset to generate realistic-looking samples of these birds.

Vanilla GAN

Least Squares GAN

Wassertein GAN with Gradient Penalty