Deep Learning for Computer Vision

This session introduces students to deep learning approaches that have revolutionized computer vision tasks.

Students will examine the evolution of convolutional neural network (CNN) architectures, from early networks such as LeNet and AlexNet to deeper and more efficient models such as ResNet and EfficientNet. They will learn how to apply transfer learning by fine-tuning models pre-trained on large-scale datasets such as ImageNet for specialized image classification tasks in data analytics. The session also covers the mechanics and architectures of modern object detection methods, including the R-CNN family, the Single Shot MultiBox Detector (SSD), and the widely used YOLO (You Only Look Once) frameworks.

Special attention will be given to the structure of the YOLO model, its bounding box regression, and its advantages for real-time analytics. In parallel, students will explore a range of image augmentation strategies, such as geometric transformations, color jitter, and more advanced techniques like Cutout and Mixup, which help improve the robustness and generalization of deep learning models when labeled data is scarce.
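To make the two advanced augmentations named above more concrete, the following is a minimal NumPy sketch rather than a reference implementation. The function names `mixup` and `cutout`, the default `alpha=0.2`, and the zero fill value are illustrative assumptions, not something prescribed by the course.

```python
import numpy as np


def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two images and their one-hot labels (Mixup-style).

    alpha is an assumed default; it controls the Beta distribution
    from which the mixing weight lam is drawn.
    """
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # mixing weight in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2       # convex combination of images
    y = lam * y1 + (1.0 - lam) * y2       # same combination of labels
    return x, y, lam


def cutout(image, size, rng=None):
    """Zero out one square patch of side `size` (Cutout-style).

    The patch centre is sampled uniformly; the patch is clipped at the
    image border, so the masked area can be smaller than size * size.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    cy = int(rng.integers(0, h))
    cx = int(rng.integers(0, w))
    y0, y1 = max(cy - size // 2, 0), min(cy + size // 2, h)
    x0, x1 = max(cx - size // 2, 0), min(cx + size // 2, w)
    out = image.copy()                    # leave the input untouched
    out[y0:y1, x0:x1] = 0
    return out
```

In a training loop these functions would typically be applied per batch, with the mixed labels fed to a loss that accepts soft targets.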

Through practical implementation, students will train and evaluate deep detection models, compare their performance with classical methods, and gain valuable experience in state-of-the-art computer vision workflows.
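Evaluating a detection model usually starts from the overlap between a predicted box and a ground-truth box. As a small illustration, the sketch below computes intersection over union (IoU) in plain Python. The `(x1, y1, x2, y2)` corner format and the function name `iou` are assumptions chosen for the example; detection frameworks differ in the box format they use.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are assumed to be (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero if the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate boxes with zero area.
    return inter / union if union > 0 else 0.0
```

A prediction is commonly counted as correct when its IoU with a ground-truth box exceeds a chosen threshold, which is the basis for metrics such as average precision.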

Required Reading and Listening

Textbooks:

Hands-On Image Processing with Python
Mastering OpenCV 4 with Python
Python Image Processing Cookbook
Feature Extraction and Image Processing for Computer Vision, 4th Edition