If you’ve ever filled out a CAPTCHA, you’ve probably done some of that work yourself. In theory, those tests are meant to verify that you’re human, but Google has started using them to collect data for other products too.
Typing out this blurry word could help the character recognition algorithm in Google Books.
[Image: a CAPTCHA asking users to ‘Identify the squares with the car’]
These skewed numbers are probably helping confirm an address in Google Street View. The most recent CAPTCHAs ask you to identify all the squares of a picture that have a car in them, at the same time that Google’s Waymo branch is trying to train self-driving cars. Even a simple task like setting a timer with Google Assistant can require an army of contractors manually annotating the data, as a recent Guardian investigation showed. Sometimes users do the labeling themselves.
Facebook has some of the best facial recognition data in the world because they already have dozens of pictures of your face. You added them yourself.
Multiply that across billions of users and it’s all the data you need to build a facial recognition system, which can then start automatically tagging your friends in the next set of pictures you upload. Suddenly, Facebook has one of the most advanced facial recognition systems in the world, and they didn’t have to pay a dime for it. When researchers at Google were trying to build a depth-sensing camera, they went even further.
What they really needed were a bunch of videos where mobile cameras explored static space from different angles.
But where would they find that?
Google downloaded 2,000 mannequin challenge videos, fed them into an algorithm, and a new kind of depth-sensing software was born. Think about it: every minute, 500 new hours of content are added to YouTube. If you’re training an AI, that’s a lot of videos to draw on. And there are no copyright restrictions on what you can use for training data. The same goes for websites, images, and Wikipedia pages. It’s all just there for the taking.
[Image: James Corden’s mannequin challenge]
This has been a huge driving force for the AI boom. These systems need lots of examples to recognize even the most basic patterns. That used to mean months of data entry, but now you can scrape everything you need from the internet in a matter of hours. And the people who made the mannequin challenge videos, they didn’t think they were encoding depth information. If the researchers hadn’t talked about their training system, it would feel like they’d done it all on their own.
The remarkable thing about AI systems is that even though they are built on a foundation of human intelligence, they regularly transcend that, and do something that surprises us or goes beyond what we thought was possible.
One fantastic example of this is the AlphaGo program, which was designed by DeepMind, Google’s AI lab in London. In 2016 and 2017 it played and beat the human champions of the ancient board game Go. One particularly famous moment is now known simply as move 37. It was a move so unusual, so counter to human expectations, that the match’s commentators thought it was a mistake. But it wasn’t. It was a beautiful play that completely undermined Lee’s match and led to AlphaGo winning the game. And it was something that humans couldn’t teach. It was something that the machine had learned by itself.
Yes, it started from a foundation of human intelligence, but it went beyond that. This, I think, is where people get so excited by AI. We are a long, long way away from building computers that are as flexibly intelligent and sophisticated as humans, but we can still build algorithms and systems that exceed human intelligence, even if only in very specific domains.
But that’s AI at its best. The flip side is when an app needs a description of what’s in a photo, and the photo-recognizing algorithm just doesn’t work. So you get a human being to fill it in, usually through a post on Mechanical Turk. That’s a very old trick, going all the way back to the machine that gave the site its name. The original Mechanical Turk was this guy, a master chess-playing robot.
This was hundreds of years before there was anything we would think of as a computer. The Turk could beat most chess players, playing so well that people thought it was a technological marvel. But really, it was just a trick. There was a human being inside, hiding under the table and directing the moves from below. It was a human being, dressed up like a machine. A trick no one had thought of until then. And as Amazon can tell you, the trick still works.