Humans helping computers read books

The headline on CNET was an eye-catcher: New tool screens spam, digitizes books.

The story is just as amazing. It’s about Carnegie Mellon University student Ben Maurer’s clever use of human pattern-recognition power in the digitization of text scanned from old books. And it takes advantage of the familiar “Captcha” – that bit of obscured text commonly used in online forms to prevent spammers from making automated submissions.

Every once in a while, computer software has trouble recognizing a word from scanned text, as in this example:


By using the reCAPTCHA system instead of an ordinary Captcha, a webmaster can not only ensure that humans, not spambots, are accessing a website, but also help the book digitization program.

The reCAPTCHA looks like this:


To ensure accuracy, each reCAPTCHA consists of two obscured words. One is the customary human-versus-machine test, where the computer already knows the right answer. If that word is interpreted correctly, the interpretation of the second obscured word is recorded. After three people have come up with the same interpretation, it is deemed correct.
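For the programmers in the audience, here is a rough Python sketch of that two-word check. It is only a toy model of the description above, not reCAPTCHA's actual code: the class name, the method names, the word ids, and the case-insensitive comparison are all my own assumptions. The only detail taken from the story is the threshold of three matching readings.

```python
from collections import Counter, defaultdict

AGREEMENT_THRESHOLD = 3  # from the story: three matching readings


class WordVerifier:
    """Toy model of the two-word check (all names here are hypothetical)."""

    def __init__(self, threshold=AGREEMENT_THRESHOLD):
        self.threshold = threshold
        self.votes = defaultdict(Counter)  # unknown word id -> reading -> count
        self.accepted = {}                 # unknown word id -> accepted reading

    def submit(self, control_answer, control_guess, unknown_id, unknown_guess):
        """Return True if the user passed the human test."""
        # The control word is the real test: its answer is already known.
        # Ignoring case and surrounding spaces is an assumption of this sketch.
        if control_guess.strip().lower() != control_answer.strip().lower():
            return False  # failed the test, so the second reading is discarded

        # Only readings from users who passed the test are recorded.
        if unknown_id not in self.accepted:
            reading = unknown_guess.strip().lower()
            self.votes[unknown_id][reading] += 1
            if self.votes[unknown_id][reading] >= self.threshold:
                self.accepted[unknown_id] = reading
        return True


verifier = WordVerifier()
for guess in ["upon", "apon", "upon", "upon"]:
    verifier.submit("morning", "morning", "scan-17", guess)
print(verifier.accepted)
```

In this run the stray reading "apon" is outvoted, and the unknown word is accepted as "upon" once a third person agrees.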

From the reCAPTCHA site:

About 60 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that’s not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day.
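
A quick back-of-the-envelope check of those figures, using only the two numbers quoted above (the variable names are mine):

```python
captchas_per_day = 60_000_000  # figure quoted from the reCAPTCHA site
seconds_each = 10              # "roughly ten seconds" per CAPTCHA

# Convert total seconds per day into hours per day.
hours_per_day = captchas_per_day * seconds_each / 3600
print(f"{hours_per_day:,.0f} hours per day")
```

That works out to roughly 167,000 hours, which squares with the site's "more than 150,000 hours" claim.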

This is a nice example of using the wisdom of crowds, networked computing, and the ability of humans to do something computers aren’t so good at.