
One for the money, two for the show

You have probably read in the press that Robert Soloway, aka the Spam King, was arrested on May 30th on charges of identity theft, money laundering and fraud via various channels. Despite this arrest, the spam is still flowing as if nothing had happened, so new anti-spam initiatives keep being announced. You might have noticed that I follow these trends a bit, and I want to share the latest with you: reCAPTCHA. The sub-slogan of this project is rather interesting: stop spam, read books (Digitizing Books One Word at a Time). Let's look at some background info on this project.

The other day, the University of Ghent announced that it has joined Google's Library Project. This is one of several projects that digitize physical books and, to make them searchable, transform the scans into text using OCR. The OCR transformation isn't flawless, so some of the text is misread. To fix those errors, a human must read the scanned text and correct the faulty transformation.

Here's where reCAPTCHA comes in. Instead of the classical CAPTCHA, where humans need to decipher randomly generated numbers or characters, an image of a word that the OCR read incorrectly is shown. A human must then interpret that image and enter the correct text. You may wonder how the computer can check whether a human has filled in the correct answer if it couldn't interpret the word in the first place. The nice thing about reCAPTCHA is that not one word but two are provided. The second word is one for which the system already knows the answer. If the human enters the second word correctly, the system assumes that the first word is also correct. The system will present the same unknown word to other people too in order to double-check. If all these people come up with the same answer, the system will know the correct interpretation of it. For the record, reCAPTCHA is currently helping the Internet Archive.

[Image: How does it work?]
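The two-word check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not reCAPTCHA's actual code: the class name `WordPool`, the `submit` method, the `agreement_needed` threshold and the word IDs are all hypothetical, and the rule "accept once enough trusted users have answered and all of them agree" is my own reading of "if all these people come up with the same answer".

```python
from collections import Counter, defaultdict


class WordPool:
    """Toy model of the two-word idea (hypothetical, not the real reCAPTCHA)."""

    def __init__(self, agreement_needed=3):
        # Assumption: how many trusted, unanimous answers resolve a word.
        self.agreement_needed = agreement_needed
        self.votes = defaultdict(Counter)  # unknown word id -> answer counts
        self.resolved = {}                 # unknown word id -> accepted text

    @staticmethod
    def _normalize(text):
        # Ignore case and surrounding whitespace when comparing answers.
        return text.strip().lower()

    def submit(self, control_answer, control_truth, unknown_id, unknown_answer):
        """Return True if the user passes the CAPTCHA.

        The user passes only if the control word (the one the system
        already knows) is typed correctly. Only then is the answer for
        the unknown word trusted and recorded as a vote.
        """
        if self._normalize(control_answer) != self._normalize(control_truth):
            return False

        guess = self._normalize(unknown_answer)
        counts = self.votes[unknown_id]
        counts[guess] += 1

        # Resolve the word once enough users answered and they all agree.
        unanimous = len(counts) == 1
        if unanimous and sum(counts.values()) >= self.agreement_needed:
            self.resolved[unknown_id] = guess
        return True
```

A user who mistypes the control word fails the check and leaves no vote behind, so bots guessing at random cannot pollute the digitized text. A user who passes contributes one vote towards the unknown word.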