An Introduction to Computer Vision

A Five-part Tutorial on Deep Learning

We’re delighted and honored to present this five-part series of tutorials on Computer Vision created by Stanford’s Andrej Karpathy. Computer Vision is ubiquitous now, with applications in search, image understanding, apps, mapping, medicine, drones, and self-driving cars. Core to many of these applications are visual recognition tasks such as image classification, localization, and detection. Recent developments in neural network (aka “deep learning”) approaches have greatly advanced the performance of state-of-the-art visual recognition systems. This series of articles provides a deep dive into the details of deep learning architectures, with a focus on learning end-to-end models for these tasks. The installments progress from image classification to convolutional neural networks. Through this series, you will gain a good understanding of how to implement, train, and debug your own neural networks. You’ll also learn how to set up the problem of image recognition, the learning algorithms (e.g., backpropagation), and practical engineering tricks for training and fine-tuning the networks.

Links to each of the series installments:

Part 1: Image Classification, Section 1

Part 2: Image Classification, Section 2

Part 3: Convolutional Neural Nets, Section 1

Part 4: Convolutional Neural Nets, Section 2

Part 5: Convolutional Neural Nets, Section 3

Helpful Background

The series assumes some proficiency in Python. If you lack this background, don’t despair! Take the quick crash course, and you’ll be well on your way. Python is a high-level, dynamically typed, multiparadigm programming language. Python code is often said to be almost like pseudocode, since it allows you to express very powerful ideas in very few lines of code while being very readable. It is a great general-purpose programming language on its own, but with the help of a few popular libraries (numpy, scipy, matplotlib) it becomes a powerful environment for scientific computing. Numpy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays. If you are already familiar with MATLAB, you might find this tutorial useful to get started with Numpy.
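To give a feel for the "almost like pseudocode" claim, here is a minimal sketch of quicksort in plain Python. It is only an illustration chosen for this introduction, not code taken from the series, and the function name quicksort is our own. It uses no libraries, so it runs with a standard Python 3 installation.

```python
def quicksort(values):
    """Return a new sorted list; the input list is left unchanged."""
    # Lists with zero or one element are already sorted.
    if len(values) <= 1:
        return list(values)

    # Pick the middle element as the pivot (an arbitrary, simple choice).
    pivot = values[len(values) // 2]

    # Split the input into three groups using list comprehensions.
    left = [v for v in values if v < pivot]
    middle = [v for v in values if v == pivot]
    right = [v for v in values if v > pivot]

    # Sort the two outer groups recursively and join the pieces.
    return quicksort(left) + middle + quicksort(right)


print(quicksort([3, 6, 8, 10, 1, 2, 1]))  # prints [1, 1, 2, 3, 6, 8, 10]
```

Because Python is dynamically typed, the same function also sorts strings or floats without any changes, as long as the elements can be compared with each other. This version favors readability over speed and memory use; for real work, the built-in sorted function is the better choice.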

Finally, for another perspective on the principles of computer vision, see Brandon Rohrer’s excellent post, Introduction to Convolutional Neural Networks.

About Andrej Karpathy

As of Summer 2016 I am a Research Scientist at OpenAI working on Deep Learning, Generative Models and Reinforcement Learning. Previously I was a Computer Science PhD student at Stanford, working with Fei-Fei Li. My research centered around Deep Learning and its applications in Computer Vision, Natural Language Processing, and their intersection. In particular, I was interested in fully end-to-end learning with Convolutional/Recurrent Neural Network architectures and recent advances in Deep Reinforcement Learning. Over the course of my PhD I squeezed in two internships at Google, where I worked on large-scale feature learning over YouTube videos, and one at DeepMind, where I worked on Deep Reinforcement Learning and Generative Models. Together with Fei-Fei, I designed and taught a new Stanford undergraduate-level class on Convolutional Neural Networks for Visual Recognition (CS231n), the first Deep Learning course offering at Stanford. On the side, I blog, tweet, and maintain several Deep Learning libraries written in Javascript (e.g., ConvNetJS, RecurrentJS, REINFORCEjs, t-sneJS). I am also sometimes jokingly referred to as the reference human for ImageNet (post :)), and I create those nice-looking conference proceedings LDA visualization pages each year (NIPS 2015 example). I also recently expanded on this with a site that lets you search and sort through 20,000+ Arxiv papers on Machine Learning over the last three years in the same nice format.