Breaking Image CAPTCHA for fun
Frank Tse, Nexusguard
Agenda
1 CAPTCHA and web services
2 General CAPTCHA breaking method
3 Alternative form
4 Analytic and optimized method
About us
§ We handle DDoS attacks every day
§ We face and fight bots every day
§ Research in cryptography, imaging and coding
§ Research into both attack and defence methods
CAPTCHA and web services
§ Puzzle for machine
CAPTCHA and web services
§ Puzzle for human
Our target “super star” today →
Security king?
[Figure: triangle with corners Security, Functionality and Ease of use, positioning Washing Machine, Smart Phone, Spaceship and T-shirt]
CAPTCHA in our eyes
[Figure: triangle with corners Security, Functionality and Ease of use, positioning Security Professionals, Programmer and End users]
Slide-to-fit Captcha
§ The good
– Similar to ‘slide-to-unlock’ type authentication
– It’s user-friendly, with a higher success rate
– Works fine with HTML5, without Flash
– I picked it because it responds to attackers
– Opportunity for advertisers and sponsors
§ The bad
– Heavy traffic load (~30 images)
– Easy to break by nature
– Single tier, single image transformation type
General CAPTCHA breaking method
§ Lock breaking
– Bypass
– Skill
– Brute force
http://paxtonlocksmithing.com/blog/2012/02/20/credit-cards-used-to-open-doors/
http://toool.nl/blackbag/images/itl2000.jpg
General CAPTCHA breaking method
§ CAPTCHA breaking
– Bypass
  – Alternative form
– Skill
  – OCR
  – Statistic
  – Curve-fitting (FFT)
  – Analytic
– Brute force
  – Database matching
  – Effective brute force
Some academic stuff
§ Fast Fourier Transform (FFT)
– Measures how blurred the image is
§ Histogram
– Distribution of data by frequency (photo lighting)
– Used to detect an artificial background
§ Longest path-finding
– The opposite of ‘shortest path’ by Dijkstra’s algorithm
– Used to detect how severely the image was twisted
Image CAPTCHA evolution
Ver 1: no padding, no blur
Ver 2: \00 padding, no blur
Ver 3: \00 padding, blur
Effectiveness of each attack method:
Attack Method        | Ver 1 | Ver 2 | Ver 3
Alternative Form     | Good  | Good  | Good
Simple Statistic     | Great | Poor  | Poor
Modified Statistic   | Great | Great | Poor
FFT                  | Great | Poor  | Poor
Analytic (Path, BG)  | Great | Great | Great
Alternative form
§ According to the W3C Web Content Accessibility Guidelines (WCAG 2.0), aka ISO/IEC 40500:2012
§ Guideline 1.1 Text Alternatives
§ 1.1.1 Non-text Content: All non-text content that is presented to the user has a text alternative that serves the equivalent purpose, except for the situations listed below. (Level A)
§ CAPTCHA: If the purpose of non-text content is to confirm that content is being accessed by a person rather than a computer, then text alternatives that identify and describe the purpose of the non-text content are provided, and alternative forms of CAPTCHA using output modes for different types of sensory perception are provided to accommodate different disabilities.
§ Attack on the weakest alternative form
Alternative form
• Google Voice API
• Pre-recorded female voice
• Indicates the direction of the correct image
• Slide right/left
• Slide slightly right/left
• You are on the right image
• Voice is very user-friendly
• Voice can be recognized by Google Speech-to-text and converted to text :)
Image File Size
[Chart: image file size in bytes (15000 to 33000) by image number (1 to 29) for Set1, Set2 and Set3]
Optimizing the algorithm
§ The key-space
– Traditional CAPTCHA: 1 out of ~36^n
– (0.00006% for brute force when n=4)
– Slide-to-fit: 5 out of 31
– 16% by blind brute force
– Correct image at the border (1-3 or 28-31) is about 7%
§ Use HTTP HEAD instead of GET
– Image size is included in the header
– Saves about 99% of bandwidth
§ Fetch only part of the image set
– With a minimum of 5 sample images, 95% of answers are correct
– All linear transformations can be solved by a shortcut
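A minimal sketch of the HEAD trick, under stated assumptions: the helper names `head_content_length` and `size_outlier` are hypothetical, the server is assumed to send a `Content-Length` header, and the correct image is assumed to be the one whose size sits farthest from the median of the set. The demo sizes are made up and the network helper is not called here.

```python
import statistics
import urllib.request


def head_content_length(url, timeout=5.0):
    """Ask for headers only and return the reported Content-Length, or None."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        value = resp.headers.get("Content-Length")
    return int(value) if value is not None else None


def size_outlier(sizes):
    """Index of the size farthest from the median (assumed to be the answer)."""
    med = statistics.median(sizes)
    return max(range(len(sizes)), key=lambda i: abs(sizes[i] - med))


# Offline demo with invented sizes; in practice each entry would come from
# head_content_length(image_url) for a sampled image.
demo_sizes = [21000, 21400, 20800, 30500, 21100]
print(size_outlier(demo_sizes))
```

Because only headers travel over the wire, the cost per image is a few hundred bytes rather than tens of kilobytes, which is where the bandwidth saving comes from.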
Image File Size with \00 Padding
[Chart: total size, image data and \00 padding in bytes (0 to 35000) by image number (1 to 29)]
Contrast Detection
§ Rule #1
– The contrast of an image drops when it is processed with lossy compression
§ Rule #2
– Contrast is calculated from the differences between adjacent image points
§ Rule #3
– Contrast does not depend on color
§ Rule #4
– An image with a higher contrast sum is usually sharper
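The rules above can be turned into a small scoring function. This is a sketch under assumptions, not the talk's implementation: the image is a list of rows of (r, g, b) tuples, colour is discarded by a plain channel average, and "adjacent" means the right-hand and lower neighbours. The names `to_gray` and `contrast_sum` are made up for the example.

```python
def to_gray(pixel):
    """Rule #3: ignore colour by averaging the three channels."""
    r, g, b = pixel
    return (r + g + b) / 3


def contrast_sum(image):
    """Rule #2: sum of absolute differences between adjacent points."""
    gray = [[to_gray(p) for p in row] for row in image]
    total = 0.0
    for y, row in enumerate(gray):
        for x, value in enumerate(row):
            if x + 1 < len(row):          # right-hand neighbour
                total += abs(value - row[x + 1])
            if y + 1 < len(gray):         # lower neighbour
                total += abs(value - gray[y + 1][x])
    return total


# Two tiny 2x4 test images: hard black/white stripes versus soft grey stripes.
sharp = [[(0, 0, 0), (255, 255, 255)] * 2 for _ in range(2)]
soft = [[(100, 100, 100), (150, 150, 150)] * 2 for _ in range(2)]
print(contrast_sum(sharp), contrast_sum(soft))
```

Per Rule #4, the candidate with the highest score would be the first guess for the undistorted image.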
Contrast Detection
[Figure: inspected images and their contrast]
Well, we give the correct image lower contrast with lossy JPEG compression
Image File Size with \00 Padding & reduced contrast
[Chart: total size, image data and \00 padding in bytes (0 to 35000) by image number (1 to 29)]
Image File Size with \00 Padding & reduced contrast
[Chart: image data and delta(image data) in bytes (-700 to 500) by image number (1 to 29)]
Well, we make ALL images a similar size by lossy JPEG compression with a target size
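# Generate a JPG file with a targeted file size
# jpg_size.py (assumption: Pillow is installed; usage: jpg_size.py TARGET_BYTES INPUT_IMAGE)
import io
import sys
from PIL import Image

def jpg_compress(img, quality):
    # Encode img as JPEG at the given quality and return the encoded bytes
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    return buf.getvalue()

target_size = int(sys.argv[1])
img = Image.open(sys.argv[2]).convert("RGB")  # input path argument is an assumption
jpg_ql = 0
jpg_qh = 100
ε = 200  # bytes
steps = 10
while steps > 0:
    # Binary search on quality: a higher quality gives a larger file
    current_quality = (jpg_ql + jpg_qh) // 2
    current_size = len(jpg_compress(img, current_quality))
    if abs(current_size - target_size) < ε: break
    if current_size > target_size: jpg_qh = current_quality
    if current_size < target_size: jpg_ql = current_quality
    steps -= 1
output = jpg_compress(img, current_quality)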
# Generate JPG files with randomized target file sizes
μ = 80000  # mean of target size, bytes
σ = 400    # standard deviation of target size, bytes
target_size[i] = int(random.gauss(μ, σ))  # normally distributed around μ
Compressed size / target size: 18.868K / 20K, 68.856K / 70K, 79.479K / 80K (original: 679.54K)
Target size 80K w/ sd 400
[Chart: image data size in bytes (79800 to 80400) by image number (1 to 29)]
Analytic
§ Solution #1
- The background
- The background needs to be filled when the image is twisted
- Complementary colors or patterns can be detected
§ Solution #2
- The boundary
- A twisted image has a longer boundary
§ Solution #3
- The differences
- Side images tend to converge to the original image
- Σ(|Δ(img[i] - img[i+1])|) converges to a minimum near the correct image
- The comparison uses all colour data
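Solution #3 can be sketched as follows. The details are assumptions for illustration: each image is a flat list of (r, g, b) tuples of equal length, the names `image_distance` and `neighbour_scores` are hypothetical, and each image is scored by the mean difference to its available neighbours so that border images are not favoured. The demo images are synthetic, with distortion growing quadratically away from image 3.

```python
def image_distance(a, b):
    """Sum of absolute per-channel differences (uses all colour data)."""
    return sum(abs(ca - cb)
               for pa, pb in zip(a, b)
               for ca, cb in zip(pa, pb))


def neighbour_scores(images):
    """Mean difference between each image and its immediate neighbours."""
    deltas = [image_distance(images[i], images[i + 1])
              for i in range(len(images) - 1)]
    scores = []
    for i in range(len(images)):
        near = deltas[max(i - 1, 0):i + 1]   # the deltas that touch image i
        scores.append(sum(near) / len(near))
    return scores


def make_image(offset):
    """Synthetic 4-pixel image; offset stands in for the amount of distortion."""
    return [(100 + offset,) * 3] * 4


images = [make_image(10 * (k - 3) ** 2) for k in range(7)]
scores = neighbour_scores(images)
best = min(range(len(scores)), key=scores.__getitem__)
print(best, scores)
```

The image with the lowest score is the candidate answer; on real data the minimum is expected to be less clean than in this synthetic set.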
Do You Have Any Questions?
Contact us at: contact@nexusguard.com