Teaching machines to watch TV

by Karl Bates

Watching TV is one of the things that computers cannot do better than humans. They’re terrible at it, in fact. But Leslie Collins’ signal processing group in ECE is working to change that.

The “big data” potential is vast: Turning a moving picture into data that could be analyzed statistically in something approaching real time opens up all sorts of questions to ask of security cameras, Google Street View, environmental sensing images, just about anything visual.

Their interest in video analytics grew out of the Collins group’s work on landmine detection. Looking at a long-running stream of 3-D ground-penetrating radar scans recorded as a military vehicle drove along a bouncing and potentially mine-laden road in Iraq, assistant research professor Peter Torrione got to thinking, “It’s kind of a movie. We could probably get into video processing.” Unfortunately, the best algorithms developed so far take a couple of seconds per frame to make a statistical determination of whether there’s anything unusual in the visual data. Smooth video shot at 30 or 40 frames per second would quickly overwhelm such a system.
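To put rough numbers on that gap, here is a minimal back-of-the-envelope sketch. It assumes that "a couple of seconds" means exactly 2 seconds per frame and that the video runs at 30 frames per second. The function names are illustrative only and are not part of the group's software.

```python
def slowdown_factor(fps, seconds_per_frame):
    """How many times slower than real time the analysis runs."""
    return fps * seconds_per_frame


def backlog_after(video_seconds, fps, seconds_per_frame):
    """Frames still waiting to be analyzed after video_seconds of footage."""
    frames_arrived = fps * video_seconds
    frames_processed = video_seconds / seconds_per_frame
    return frames_arrived - frames_processed


# Assumed figures: 30 frames per second, 2 seconds of analysis per frame.
print(slowdown_factor(30, 2.0))    # analysis runs 60x slower than real time
print(backlog_after(60, 30, 2.0))  # unprocessed frames after one minute
```

Under those assumptions, one minute of footage produces 1,800 frames while only 30 are analyzed, so the backlog grows by 1,770 frames every minute.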

A big part of the problem for video analytics is that computers haven’t yet learned to recognize shapes. The state of the art in detecting a human is improving, but it can still be thrown off by lighting, objects in the way, or the camera angle, Torrione said. Show the computer a car in an arbitrary orientation, and it doesn’t have a clue.

With support from an industry partner, the group is now working to solve that in a step-wise fashion. Their proof of concept so far is a rather spectacular demonstration of a webcam and five PCs driving a red sports car at 130 miles per hour on a winding mountain road in the video game Need for Speed. With no more input than the video camera, the computers are interpreting what they see on the screen and sending controls to the PS3 game console to operate the virtual car’s steering, acceleration and brakes. It’s a remarkable advance for natural vision recognition.

The system makes these interpretations and decisions rapidly enough that the car doesn’t just go sailing off into oblivion, and it wins races against the gaming console most of the time.

--from Duke Engineering: Leading Research 2013