I have seen some captchas being decoded using JavaScript, PHP, etc. How do they do it?
For example, the very popular Megaupload site's captcha has also been
Take a look at PWNtcha
You can also read Breaking a Visual CAPTCHA
There are recognition services, such as 2captcha. Here is a PHP tool for solving captchas: https://github.com/jumper423/decaptcha/
I was involved in a project to circumvent Captcha images on the TicketMaster website about 8-9 years ago for a third-party ticket seller. When an event went on-sale, like a concert, our network of machines would use multiple credit cards and mailing addresses to buy any and every seat possible in the first 10 rows.
Rather than generating new captchas each time, TM had a limited pool of images they could re-use. We'd create a unique digital fingerprint (checksum) for each image, then simply attack it with some imaging tools (LEADTOOLS.com) to remove extraneous elements, enhance contrast, etc., and then use OCR tools. It was surprisingly effective.
We were able to crack a great number programmatically, and we'd store the ones we couldn't crack for human processing. Sometimes they'd have a pool of 20K images, so at first we'd get maybe 60-70% automatically, but eventually we'd get 100% success because we could identify the images our humans processed (offline) based on looking up their hash in our database. (That is, we could check a captcha image against our database based on the hash we created and if we already had the solution we could just submit the answer immediately.)
Occasionally, they'd flush and replace their pool of captcha images with a new set, but again, it would just take us a bit of time to get back up to a 100% rate. The fatal flaw with this particular system was that they recycled images, rather than programmatically generating new captcha images each time.
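A minimal sketch of the fingerprint lookup described above, in Python. The answer only says "checksum", so SHA-256 is an assumption here, and CaptchaCache, store and lookup are hypothetical names, not anything from the original system.

```python
import hashlib


class CaptchaCache:
    """Maps an image fingerprint to a solution that is already known."""

    def __init__(self):
        self._solutions = {}  # hex digest -> solved text

    @staticmethod
    def fingerprint(image_bytes: bytes) -> str:
        # Assumption: SHA-256 over the raw file bytes stands in for the "checksum".
        return hashlib.sha256(image_bytes).hexdigest()

    def store(self, image_bytes: bytes, solution: str) -> None:
        # Called once OCR or a human has produced the answer for this image.
        self._solutions[self.fingerprint(image_bytes)] = solution

    def lookup(self, image_bytes: bytes):
        # Returns the stored answer, or None if this image has not been seen.
        return self._solutions.get(self.fingerprint(image_bytes))


cache = CaptchaCache()
cache.store(b"raw bytes of a captcha image", "X7K2")
print(cache.lookup(b"raw bytes of a captcha image"))
print(cache.lookup(b"a different image"))
```

An exact hash like this only matches byte-identical files, so it works against a recycled pool but not against images that are generated fresh for each request.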
But the fact is, if the financial incentive to crack the captcha is high enough, it doesn't take much to create a distributed platform where low-wage unskilled workers can sit around earning pocket change to crack them all day.
Inside India's CAPTCHA solving economy http://www.zdnet.com/blog/security/inside-indias-captcha-solving-economy/1835
See:
OCR and Neural Nets in JavaScript
Here John Resig (creator of the jQuery JavaScript library) explains exactly how it is done.
I'm an image processing specialist and CAPTCHA decoder, and I've done many CAPTCHA-solving projects before.
OK, let's go through the CAPTCHA-solving steps!
Decoding any kind of CAPTCHA has 3 main steps:
Clear any noise from the CAPTCHA (using any image processing method).
Note for those fighting CAPTCHA decoding: If you want to have a good CAPTCHA, you should add stronger noise. Use a random noisy background with a color similar to that of the characters.
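As a rough illustration of the noise-clearing step, here is a sketch in Python that thresholds a grayscale image and then drops isolated specks. The image is a plain list of rows of 0-255 values so the example needs no imaging library; binarize, remove_specks and the threshold of 128 are assumptions chosen for illustration, not a fixed recipe.

```python
def binarize(gray, threshold=128):
    """Return a 0/1 grid: 1 where the pixel is darker than the threshold (ink)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def remove_specks(binary, min_neighbors=1):
    """Clear ink pixels with fewer than min_neighbors ink pixels around them."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    out = [row[:] for row in binary]
    for y in range(height):
        for x in range(width):
            if not binary[y][x]:
                continue
            neighbors = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        neighbors += binary[ny][nx]
            if neighbors < min_neighbors:
                out[y][x] = 0  # isolated dot: treat it as noise
    return out


gray = [
    [255, 255, 255, 255, 255],
    [255,   0,   0, 255, 255],
    [255,   0,   0, 255, 255],
    [255, 255, 255, 255,  10],  # a lone dark pixel far from the glyph
    [255, 255, 255, 255, 255],
]
cleaned = remove_specks(binarize(gray))
for row in cleaned:
    print(row)
```

This only handles sparse dot noise. A background whose color is close to the characters, as the note above recommends, defeats a single global threshold.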
Split the CAPTCHA into individual characters. This step is easy when the characters are separate and very hard when they're not.
Note for those fighting CAPTCHA decoding: If you want to have a good CAPTCHA, don't leave the characters separate! Make them overlap, and do NOT use different colors for the characters, because decoders can then split them very easily! (Most developers are unaware of this and think it's better to use a colorful CAPTCHA!) The best option is an overlapping string in black. For an experienced CAPTCHA decoder, it's not a problem to decode a colorful CAPTCHA! It's just beautiful and not useful! :) Use random curved lines which connect all the characters to each other.
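To show why separated characters are the easy case, here is a sketch in Python of one simple splitting approach: cut wherever a column contains no ink. It assumes the image is already a 0/1 grid (1 = ink); split_columns and crop are hypothetical helper names, and real decoders may use other methods such as connected components.

```python
def split_columns(binary):
    """Return (start, end) column ranges, end exclusive, that contain ink."""
    if not binary:
        return []
    width = len(binary[0])
    column_has_ink = [any(row[x] for row in binary) for x in range(width)]
    segments = []
    start = None
    for x, has_ink in enumerate(column_has_ink):
        if has_ink and start is None:
            start = x  # a character begins here
        elif not has_ink and start is not None:
            segments.append((start, x))  # a blank column ends the character
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


def crop(binary, start, end):
    """Cut columns start..end-1 out of every row."""
    return [row[start:end] for row in binary]


separate = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
]
print(split_columns(separate))

touching = [
    [1, 1, 1],
    [0, 1, 0],
]
print(split_columns(touching))
```

With a gap between the glyphs the function finds two segments. When the glyphs touch or a line connects them, everything comes back as one segment, which is the effect the note above relies on.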
After separation, we have a set of characters (we don't have a string yet, just images and pixels), and we should convert the character images into a string. But how?! There are several ways. If the characters are not rotated and have a fixed font and size (such as the freeglobes CAPTCHA), you can define a pattern set; your program should loop through the patterns to find the best match for each image. If the characters vary a lot and would need a large pattern set, you should use a "Neural Network" to recognize them. A neural network for CAPTCHA solving takes a character image, and we tell the network what this character is. For example, we give it an image of "A" and tell the NN: it's "A"! Then it will "LEARN" this character and save what it learned into a database. This procedure is called "TRAINING". So, when we give a trained network a new character, it will return the best match from its learning database. Usually decoder specialists use the CAPTCHA itself to train the neural network. Be careful! Using appropriate data for training can make or break your results.
Note for those fighting CAPTCHA decoding: If you want to have a good CAPTCHA, use any method which prevents a decoder from recognizing the characters, even with a neural network. Deform the characters randomly, use many fonts instead of one, rotate the characters as well, etc.
Finally, we concatenate all the single characters into one string and return it as the result.
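The pattern-set approach for fixed-font, unrotated characters can be sketched in Python as follows. It assumes every glyph has already been cropped and scaled to the same size as the patterns. The tiny 3x3 TEMPLATES set and the function names are invented for illustration, and counting differing pixels is just one possible definition of "best match".

```python
# Hypothetical 3x3 pattern set; a real one would hold one entry per character.
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "O": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
}


def pixel_distance(a, b):
    """Count the pixels that differ between two equally sized 0/1 grids."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def best_match(glyph, templates):
    """Return the label whose pattern differs from the glyph in the fewest pixels."""
    return min(templates, key=lambda label: pixel_distance(glyph, templates[label]))


def decode(glyphs, templates):
    """Recognize each glyph, then concatenate the labels into one string."""
    return "".join(best_match(glyph, templates) for glyph in glyphs)


noisy_l = [[1, 0, 0], [1, 0, 0], [1, 1, 0]]  # an "L" with one pixel missing
print(decode([TEMPLATES["I"], noisy_l, TEMPLATES["O"]], TEMPLATES))
```

Rotation, random deformation and multiple fonts all increase the pixel distance to every pattern, which is when a trained neural network becomes the more practical choice.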
Unfortunately, there is no fixed algorithm for solving every CAPTCHA; a new CAPTCHA needs new analysis and training. You can't make one CAPTCHA decoder that decodes all CAPTCHAs.
What should you know before starting:
1- Image processing fundamentals
2- General understanding of a Neural Network
3- Simple image processing functions (in any language)
For PHP:
imagecreate()
imagecreatetruecolor()
imagecolorat()
imagecolorsforindex()
imagesetpixel()
.
.
.
For .NET:
Bitmap type,
GetPixel()
SetPixel()
.
.
.
For JavaScript and HTML5:
You should know the Canvas very well.
Lastly, a note for those fighting CAPTCHA decoding: If you are wondering how someone can decode a CAPTCHA and want to prevent it from being decoded, you should first become a CAPTCHA decoder yourself or hire someone who knows the weaknesses and attack algorithms very well!
Hope to help! ;)