I'm trying to scrape a page that contains a reCAPTCHA widget using PhantomJS, but when I load the page there is no captcha image.
If I add an iframe element to the page, the image shows up. The weirdest thing is that the image only appears if the iframe has very specific content.
Here is the HTML I used to test (it's the standard code from the reCAPTCHA docs, plus the iframe element):
<form action="" method="post">
  <script type="text/javascript" src="http://www.google.com/recaptcha/api/challenge?k=6LfUUtMSAAAAAOBuPTWtMAnAu3l9AS-iHZb6iFpp&error=">
  </script>
  <noscript>
    <iframe src="http://www.google.com/recaptcha/api/noscript?k=6LfUUtMSAAAAAOBuPTWtMAnAu3l9AS-iHZb6iFpp&error=" height="300" width="500" frameborder="0"></iframe>
    <br>
    <textarea name="recaptcha_challenge_field" rows="3" cols="40">
    </textarea>
    <input type="hidden" name="recaptcha_response_field" value="manual_challenge">
  </noscript>
</form>
<iframe src="frame.html"></iframe>
The iframe points to frame.html, whose content is exactly this:
<a><img src='http://c'></a>
If you change the content of frame.html even slightly, you'll probably not get the captcha image.
The PhantomJS script that I used is this:
var url = 'http://127.0.0.1/php_api/recaptcha.html';
var page = require('webpage').create();
page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0';
page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Unable to access network');
    } else {
        // Runs inside the page context and returns the captcha image URL.
        var p = page.evaluate(function () {
            return document.getElementById('recaptcha_challenge_image').src;
        });
        console.log(p);
    }
    phantom.exit();
});
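One thing that may be worth ruling out is timing: if the widget is filled in only after the load callback fires, the image element might not exist yet when page.evaluate runs. Below is a minimal sketch for testing that assumption; I haven't confirmed it is the cause. The waitFor helper is hypothetical (it is not part of PhantomJS), and the 10-second timeout and 250 ms polling interval are arbitrary values.

```javascript
// Hypothetical helper (not part of PhantomJS): call `check` until it returns a
// truthy value, then pass that value to `onReady`. Calls `onTimeout` if
// `timeoutMs` elapses first. The first check happens immediately.
function waitFor(check, onReady, onTimeout, timeoutMs, intervalMs) {
    var first = check();
    if (first) {
        onReady(first);
        return;
    }
    var start = Date.now();
    var timer = setInterval(function () {
        var value = check();
        if (value) {
            clearInterval(timer);
            onReady(value);
        } else if (Date.now() - start >= timeoutMs) {
            clearInterval(timer);
            onTimeout();
        }
    }, intervalMs);
}

// This part only runs under PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.open('http://127.0.0.1/php_api/recaptcha.html', function (status) {
        if (status !== 'success') {
            console.log('Unable to access network');
            phantom.exit(1);
            return;
        }
        waitFor(
            function () {
                // Returns the image URL, or null while the element is missing.
                return page.evaluate(function () {
                    var img = document.getElementById('recaptcha_challenge_image');
                    return img ? img.src : null;
                });
            },
            function (src) {
                console.log(src);
                phantom.exit();
            },
            function () {
                console.log('Timed out waiting for the captcha image');
                phantom.exit(1);
            },
            10000, // assumed timeout in milliseconds
            250    // assumed polling interval in milliseconds
        );
    });
}
```

Unlike the script above, phantom.exit() is called only from the two callbacks here, so the process stays alive while polling. If this version still times out without the extra iframe, timing alone probably does not explain the behaviour.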
This is my first time using PhantomJS, so is there something I'm missing?