Who Will Solve Google reCaptcha?
How do you fill up an inbox with spam? Create a contact form without a captcha!
Has Google reCaptcha, the anti-bot check on forms, ever gotten in your way? Has it ever stopped you from automatically gathering information from a website? Google's captcha can be worked around by a machine, but there are only a few options, and not all of them are suitable or even functional. There is a solution, however, and it might surprise you how unsurprising it is. So who will solve the Google captcha?
Almost everybody who has ever needed to automatically process data from a website or web application that has no public API has used a scriptable headless browser such as PhantomJS, CasperJS or Cypress. Everything works great until a reCaptcha (Google's captcha) shows up in the login dialogue. What now?
Ask Google for a solution and it suggests the following ways to try to work around reCaptcha:
- Misuse the audio challenge intended for blind people (yes, really).
- Switch proxy servers and hope that the first attempt does not trigger a pop-up titled “Select all squares that include a palm tree.”
- Use a human captcha solver (wait, what?!)
Let’s start by quickly going through the options and possible solutions.
1) Audio bypass
Brief description of the solution:
- We go to the login page.
- We fill out the login information.
- We click on captcha.
- From the DOM structure, we find the path to the audio file.
- We send the link to the audio file to our web service, which downloads the file, runs it through ReBreakCaptcha and returns the transcribed text, which we enter into the captcha.
- We send the form!
So where is the catch? When it was published in 2016, the ReBreakCaptcha project became famous around the world, and Google reacted to it. On March 3rd, 2017, the project's author posted a final update saying that the project could no longer decipher Google's audio challenges.
2) Proxy, VPN
… just… no. Using a proxy or VPN can reduce the risk of the captcha dialogue popping up, but it does not solve the problem itself. Moreover, this method requires a large number of private proxy servers or private VPNs.
3) Human Captcha Solver!
Honestly, I was surprised (as you probably are) by the word “human” in connection with a captcha solution. The name is no joke; it works exactly the way it sounds. The captcha is not solved by a neural network or a sophisticated algorithm, but by a real human sitting on the other side of the planet.
What is CAPTCHA really?
Let’s look back: the first form of captcha was “invented” in the year 2000. The term CAPTCHA is an acronym for “Completely Automated Public Turing Test to tell Computers and Humans Apart”. The goal is therefore clearly defined: tell computers and humans apart. It is worth mentioning that in the case of captcha, the roles are reversed compared with the original Turing test. The one evaluating the result is a computer, not a human.
Google has expanded the original purpose and is trying to use the side effects of the test as well. The results of reCaptcha are used to identify objects, recognise text (which can be used to read traffic signs), annotate pictures or translate books. A great example of this application is Google Translate. Take a picture of a piece of paper with some text on it using your phone, and Google will transcribe it into a text field. Unfortunately, probably nobody else has a dataset as extensive as Google's, and therefore we need to involve a human in our automated process.
2Captcha can help!
Let me introduce a service that can help you! 2Captcha is a service for automatically solving anti-bot tests. The individual captchas are solved by (surprise!) humans.
In case you are unhappy with your current job: for every 1000 solved reCaptchas, you can earn $1.02.
As we are quite happy with our jobs, we are far more interested in how much we pay for 1000 solved captchas. That would be $3.
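As a rough back-of-the-envelope sketch, the two prices above can be turned into a small calculator. The function names are made up for illustration, the rates are simply the figures quoted above (they may have changed since), and amounts are kept in cents to sidestep floating-point rounding.

```javascript
// Rates quoted in the article, expressed in cents per 1000 solved captchas.
const CUSTOMER_CENTS_PER_1000 = 300; // $3 paid by the customer
const WORKER_CENTS_PER_1000 = 102;   // $1.02 earned by the worker

// Hypothetical helper: amount in cents for `count` captchas at a given rate.
function centsFor(count, centsPer1000) {
  return Math.round((count * centsPer1000) / 1000);
}

// Hypothetical helper: format an amount in cents as a dollar string.
function formatDollars(cents) {
  return '$' + (cents / 100).toFixed(2);
}

const solved = 2500;
console.log(formatDollars(centsFor(solved, CUSTOMER_CENTS_PER_1000))); // $7.50
console.log(formatDollars(centsFor(solved, WORKER_CENTS_PER_1000)));   // $2.55
```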
Let’s get down to the code
It will not be entirely without work. To crack the captcha we will need, apart from the above-mentioned service, some JavaScript. Below, you can take a look at a sample of the source code. For those who do not like JavaScript or have had no coffee this morning, I will quickly summarize the process in bullet points afterwards. So here goes the code.
// API key
// https://2captcha.com/enterpage
const API_KEY = 'XXXX';

// Find the site key of the website
const googleSiteKey = document.getElementsByClassName('g-recaptcha')[0].getAttribute('data-sitekey');

// Helper parsing function: resolves to the body text on HTTP 200, otherwise to false
const extractTextFromResponse = response => response.status === 200 ? response.text() : false;

// Helper delay function: resolves after `value` milliseconds
const delay = value => new Promise(res => setTimeout(res, value));

// Function for sending the captcha we want to solve to the API.
// Resolves to the request id, or to false on failure.
async function sendCaptcha() {
  const captchaDataString = [
    'key=' + API_KEY,
    'method=userrecaptcha',
    'googlekey=' + googleSiteKey,
    'pageurl=' + encodeURIComponent(window.location.href), // the URL may itself contain ? or &
  ].join("&");

  return fetch('https://2captcha.com/in.php?' + captchaDataString)
    .then(payload => extractTextFromResponse(payload))
    .then(payload => {
      // A successful response has the form "OK|REQUEST_ID"
      if (!payload || payload.slice(0, 2) !== "OK") {
        console.error("Payload is not okay", payload);
        return false;
      }
      return payload.slice(3);
    })
    .catch(error => {
      console.error("Something went wrong", error);
      return false;
    });
}

// Function that waits and then asks whether the captcha has been solved.
// Resolves to the raw response text, or to false on failure.
async function pollResponse(requestId, waitTime) {
  await delay(waitTime); // Wait some time

  const dataStringRes = [
    'key=' + API_KEY,
    'action=GET',
    'id=' + requestId,
    'json=0'
  ].join("&");

  return fetch('https://2captcha.com/res.php?' + dataStringRes)
    .then(payload => extractTextFromResponse(payload))
    .catch(error => {
      console.error("Something went wrong", error);
      return false;
    });
}

// Start function
(async function () {
  // Get the request id of the current captcha
  const requestId = await sendCaptcha();
  if (!requestId) {
    return false;
  }

  // Wait for somebody to solve your captcha
  const counterLimit = 3;
  let waitTime = 20000; // 20 seconds before the first check
  for (let i = 0; i < counterLimit; i++) {
    const payload = await pollResponse(requestId, waitTime);
    if (payload === "CAPCHA_NOT_READY") {
      waitTime = waitTime / 2; // Wait half the time before the next check
      continue;
    }
    if (!payload || payload.slice(0, 2) !== "OK") {
      console.error("Captcha was not solved.", payload);
      return false;
    }
    // Save the token (the part after "OK|") into the response field
    document.getElementById("g-recaptcha-response").innerHTML = payload.slice(3);
    return true;
  }
  console.error("Captcha was not solved in time.");
  return false;
})();
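The query string in `sendCaptcha` is assembled by hand. As an alternative sketch (not the method used above), the standard `URLSearchParams` class can build it and takes care of percent-encoding values such as the page URL. The helper name `buildSubmitUrl` and the example values are made up for illustration; the parameter names are the ones from the code above.

```javascript
// Hypothetical helper: builds the in.php request URL with encoded parameters.
function buildSubmitUrl(apiKey, siteKey, pageUrl) {
  const params = new URLSearchParams({
    key: apiKey,
    method: 'userrecaptcha',
    googlekey: siteKey,
    pageurl: pageUrl,
  });
  return 'https://2captcha.com/in.php?' + params.toString();
}

// Example with placeholder values; note the ? and & inside the page URL.
const submitUrl = buildSubmitUrl(
  'XXXX',
  'SITE_KEY',
  'https://example.com/login?next=/home&lang=en'
);
console.log(submitUrl);
```

Without the encoding, the `&lang=en` part of the page URL would be read by the service as a separate parameter instead of as part of `pageurl`.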
And now the process. All we need is a few simple steps.
- We find the site key (public key) that the website uses to display the captcha. (The secret key, or private key, is used by the given page for encrypted communication with the Google server to validate whether the captcha was solved successfully.)
- We send this information to the API of the service http://2captcha.com and wait for the response.
- The response is in the format `STATUS|REQUEST_ID`.
- If `STATUS == 'OK'`, we wait 20 seconds and then ask whether one of the workers has already solved the captcha.
- The response is either `CAPCHA_NOT_READY` or `STATUS|RESPONSE_HASH`.
- If the response is `CAPCHA_NOT_READY`, we wait half as long and ask again, three times at most.
- If we receive a positive response, we save it to the DOM and submit the form.
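The response handling and the waiting pattern from the steps above can be sketched as two small pure functions, kept separate from any network or DOM access so they are easy to test. This is only an illustration under assumptions: the names `parseReply` and `waitSchedule` are made up, and treating every reply other than an `OK|`-prefixed one or `CAPCHA_NOT_READY` as an error is my reading of the format described above.

```javascript
// Hypothetical helper: classify a plain-text reply from the service.
function parseReply(text) {
  if (text === 'CAPCHA_NOT_READY') {
    return { state: 'pending', value: null };
  }
  if (typeof text === 'string' && text.startsWith('OK|')) {
    // Everything after "OK|" is the request id or the solved token.
    return { state: 'ok', value: text.slice(3) };
  }
  return { state: 'error', value: text };
}

// Hypothetical helper: waits in milliseconds, starting at `initial`
// and halving before each further attempt.
function waitSchedule(initial, attempts) {
  const waits = [];
  for (let i = 0, wait = initial; i < attempts; i++, wait /= 2) {
    waits.push(wait);
  }
  return waits;
}

console.log(parseReply('OK|12345'));         // { state: 'ok', value: '12345' }
console.log(parseReply('CAPCHA_NOT_READY')); // { state: 'pending', value: null }
console.log(waitSchedule(20000, 3));         // [ 20000, 10000, 5000 ]
```

Keeping the parsing apart from `fetch` means the same function can serve both the submit reply and the polling reply, since both use the `STATUS|VALUE` shape.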
Leave CAPTCHA for humans
In this article, we have shown two things: first, how easy it is to get around the ever-present Google reCaptcha, and second, that the best tool for the job is a human brain. The reverse Turing test is best handled by the one it was designed for: the human.