IV. Computer Vision

Almost all animal species use eyes; in fact, evolution has invented the eye many times over. Even very simple animals, bees for example, with brains comprising just 10⁶ neurons (compared to our 10¹¹), are able to perform complex and life-critical tasks such as finding food and returning it to the hive using vision. This is despite the very high biological cost of owning an eye: the complex eye itself, muscles to move it, eyelids and tear ducts to protect it, and a large visual cortex (relative to body size) to process its data.

Our own experience is that eyes are very effective sensors for recognition, navigation, obstacle avoidance and manipulation. Cameras mimic the function of an eye, and we wish to use them to create vision-based competencies for robots: to use digital images to recognize objects and navigate within the world.

Technological development has made it feasible for robots to use cameras as eyes. For much of the history of computer vision, dating back to the 1960s, electronic cameras were cumbersome and expensive and computer power was inadequate. Today we have CMOS cameras developed for cell phones that cost just a few dollars each, and personal computers come standard with massive parallel computing power. New algorithms, cheap sensors and plentiful computing power make vision a practical sensor today.

This part discusses the process of vision from start to finish: from the light falling on a scene, being reflected, gathered by a lens, turned into a digital image and processed by various algorithms to extract the information required to support the robot competencies listed above.

  1. Light & color
    • Spectral representation of light
    • Color, color spaces, color gamut, color constancy
    • What is white? White balance
    • Gamma
  2. Image formation
    • Perspective imaging, calibration
    • Fisheye, catadioptric & unified imaging
  3. Image processing
    • Acquiring images from files, cameras and the web
    • Monadic operations
    • Dyadic operations
    • Spatial operations: convolution, template matching, rank filtering
    • Morphology: image cleanup, skeletonization, hit-or-miss transform
    • Shape changing: cropping, resizing, warping, pyramids
  4. Image feature extraction
    • Region features: segmentation, thresholding, MSER, graph-based
    • Line features: Hough transform
    • Point features: Harris, SURF
  5. Using multiple images
    • Fundamental & essential matrix, estimation & RANSAC
    • Homographies
    • Dense stereo, rectification
    • ICP and plane fitting
    • Examples: perspective undistortion, mosaicing, image retrieval