From e-commerce websites to ticket booking portals, Captchas are everywhere. Challenging you to prove that you are not a robot based on different tests, Captchas have helped control spam traffic on the internet.
Captcha (Completely Automated Public Turing test to tell Computers and Humans Apart) is designed to tell humans and machines apart. In its early days, Captcha did this by showing randomly generated text in a distorted image. Reading such text was a hard problem for computers at the time, so the test stopped bots from leaving spam comments on websites or signing up for email accounts to send spam.
Because the test worked so well, Captcha grew in popularity, and millions of people around the globe were solving Captchas to create email accounts or access the content they needed. According to Dr. Luis von Ahn, one of Captcha's creators, people solved 200 million Captchas every day in 2006.
Recaptcha: Captcha for the digital era
Although Captcha was helping to protect the internet from bots, the human effort spent solving these puzzles was not being put to good use. This is where Recaptcha came in. Rather than using randomly generated words, Recaptcha used words from old books that were being digitised and that computers could not read.
Recaptcha worked by showing users a pair of words, both taken from a book being digitised. A computer could read one of the words using Optical Character Recognition (OCR) software, but not the other. The word that OCR could read is known as the control word: since its correct answer is already known, it is used to check that the user is human and, by extension, whether their answer for the unknown word can be trusted.
The system assumes that if the user types the control word correctly, they are human and their answer for the other word is probably correct too. It then stores that answer for the unknown word in a database known as the verification pool, which can hold several different answers for the same word. The unknown word is shown to other users, each time paired with a different control word, and their answers are compared with those already in the verification pool. The answer with the most matches is moved to the verified pool, and the word that OCR could not read has now been deciphered through Recaptcha.
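The control-word check and the voting described above can be sketched in a few lines of Python. This is a simplified illustration, not Recaptcha's actual algorithm: the class `VerificationPool`, its `submit` method and the rule that three matching answers are enough to verify a word (`min_agreement=3`) are hypothetical choices made for this example. Answers are compared case-insensitively after trimming whitespace.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class VerificationPool:
    """Hypothetical sketch of Recaptcha's control-word check and answer voting."""

    # Assumed rule: this many matching answers are needed to verify a word.
    min_agreement: int = 3
    # unknown word id -> how many users gave each transcription
    answers: dict = field(default_factory=dict)
    # unknown word id -> transcription accepted as correct
    verified: dict = field(default_factory=dict)

    def submit(self, control_word: str, control_answer: str,
               unknown_id: str, unknown_answer: str) -> bool:
        """Record one user's response; return True if the user passed as human."""
        # OCR already knows the control word, so it decides whether to trust the user.
        if control_answer.strip().lower() != control_word.lower():
            return False
        # Only answers from users who passed the control word are added to the pool.
        votes = self.answers.setdefault(unknown_id, Counter())
        votes[unknown_answer.strip().lower()] += 1
        # Move the most common answer to the verified pool once enough users agree.
        best, count = votes.most_common(1)[0]
        if count >= self.min_agreement:
            self.verified[unknown_id] = best
        return True


if __name__ == "__main__":
    pool = VerificationPool()
    pool.submit("morning", "morning", "word-17", "overlooks")
    pool.submit("castle", "castle", "word-17", "overlooks")
    pool.submit("river", "rivr", "word-17", "garbage")  # fails control word, ignored
    pool.submit("bridge", "bridge", "word-17", "overlooks")
    print(pool.verified)  # {'word-17': 'overlooks'}
```

Requiring several independent users to agree means that one careless or malicious answer cannot decide how a word is transcribed.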
This was a great innovation, and Google bought Recaptcha in 2009. Google used it to digitise books for its Google Books archive, and also to improve Google Street View, helping Google Maps locate addresses more accurately.
All this was great, but with advances in artificial intelligence and deep learning algorithms, computers were catching up with humans and could solve Captchas very efficiently. In fact, data from Captchas was used to train convolutional neural networks that reached an accuracy of 99.8 per cent. This was a problem: if computers could read distorted text images, text-based Captchas could no longer tell bots from humans, and something new was required.
Recaptcha V2: Easy on humans but not on bots
To address these flaws, Google came up with Nocaptcha Recaptcha, or Recaptcha version 2. This version did not ask users to type the text they saw in distorted images. Instead, all a user had to do was click a checkbox.
It seems strange, right? Earlier, you had to decipher text covered in lines and noisy backgrounds, and now you simply click a checkbox to show that you are not a robot. This works because, when you click the checkbox, Google runs an advanced risk analysis based on your activity on the internet to decide whether you are a bot.
Recaptcha V2 also came in an invisible version, in which the user does not click a checkbox at all; instead, the Recaptcha script runs when the user clicks a button on the website, such as a submit button.
If Google decides that the user is not a bot, it lets them through to the next page. Otherwise, the user is shown an image classification challenge that is very hard for bots. If the user solves it, they are then redirected to the next page.
Although Recaptcha V2 was an improvement over the original Recaptcha, it still added some friction for users. If, for some reason, Google's risk analysis decided that you were a bot, you had to find traffic lights in a grid of images, which is never a fun experience. This is where Recaptcha V3 comes into the picture.
Is Recaptcha V3 perfect?
Recaptcha V3 is the newest version of Recaptcha and shows no images or checkboxes to its users. It is entirely invisible and runs as a script in the background of a webpage, wherever the webmaster needs it: it can be loaded with the page or run when the user clicks a login button. Thanks to this smoother user experience, over 1.4 million websites use Recaptcha V3.
Every time this script runs, it generates a score for the visitor based on their behaviour on the website. The score ranges from 0.0 to 1.0: scores near 1.0 indicate human activity, while scores near 0.0 indicate a bot.
Based on this score, the webmaster decides what to do next. For suspicious activity, they can ask the user for two-factor verification or another form of identification before letting them through, or block the activity completely.
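On the webmaster's side, the page's script hands a token to the server, which sends it to Google's `siteverify` endpoint and receives JSON that includes `success`, `score` and `action` fields. The Python sketch below assumes that flow. `verify_token` and `decide` are hypothetical helper names, and the cut-offs (allow at 0.5 or above, block below 0.3, challenge in between) are illustrative assumptions that each site would tune, not values Recaptcha imposes. Calling `verify_token` needs a real secret key and network access, so the usage example feeds `decide` a hard-coded sample response instead.

```python
import json
import urllib.parse
import urllib.request

# Google's server-side verification endpoint for Recaptcha tokens.
SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def verify_token(secret_key: str, token: str) -> dict:
    """POST the token from the browser to siteverify and return the parsed JSON."""
    data = urllib.parse.urlencode({"secret": secret_key, "response": token}).encode()
    with urllib.request.urlopen(SITEVERIFY_URL, data=data, timeout=10) as resp:
        return json.load(resp)


def decide(result: dict, expected_action: str,
           allow_at: float = 0.5, block_below: float = 0.3) -> str:
    """Map a siteverify result to 'allow', 'challenge' or 'block' (assumed policy)."""
    # Invalid or expired tokens, or tokens issued for a different action, are rejected.
    if not result.get("success") or result.get("action") != expected_action:
        return "block"
    score = float(result.get("score", 0.0))
    if score >= allow_at:
        return "allow"      # close to 1.0: likely a human
    if score >= block_below:
        return "challenge"  # suspicious: ask for two-factor or another check
    return "block"          # close to 0.0: likely a bot


if __name__ == "__main__":
    # Sample response shaped like a v3 siteverify reply (no network call made here).
    sample = {"success": True, "score": 0.3, "action": "login",
              "hostname": "example.com"}
    print(decide(sample, "login"))  # challenge
```

Keeping a middle "challenge" band lets a borderline visitor prove themselves with a second check instead of being turned away outright.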
To generate this score, Google uses adaptive risk analysis. Because the algorithm is adaptive, it learns how humans interact with a particular website and assigns scores accordingly, which makes it hard for bots to mimic humans.
That being said, Recaptcha V3 is not unbreakable and can be bypassed using reinforcement learning.
Google encourages webmasters to embed the Recaptcha V3 script on multiple pages so that it can better analyse the traffic on the website. This raises the question of how Google analyses that traffic in the first place.
A demo website running the Recaptcha script lets you see the scores it generates for you. If you open the site in your regular web browser, where you are logged in to your Google account, you get a high score of 0.9. If, on the other hand, you use a VPN together with a browser that is not logged in to Google, your score can drop to 0.3. This suggests that the algorithm could be biased against users who use VPNs or adblockers to protect their privacy, giving them lower scores.
Not only this, Google told Fast Company that the script also sends “hardware and software information, including device and application data, back to Google for analysis and that the service is only used to fight spam and abuse”.
Although Google claims that data collected through Recaptcha is not used for advertising, it still collects data from users through scripts running in the background of websites.
Looking at Recaptcha V3, one can say that it is a double-edged sword. Google has done a lot to improve bot detection on the internet while creating a seamless experience for users, but it collects data and invades user privacy to provide that experience.
A tech enthusiast, driven by curiosity. A bibliophile who loves to travel. An Engineering graduate who loves to code and write about new technologies. Can’t sustain without coffee.
You can contact Nischay via email: [email protected]