Monday, March 13, 2017

Monday tech: Testing and bypassing CAPTCHAs

Hello everyone
My name is Gil Cohen, and I'm the CTO of Comsec.
For more than a year now, I have been sending internal technical emails to Comsec's consultants once a week.

Now I'm starting a new tradition and publishing these posts on Comsec's blog.
This week's first post will talk about breaking and testing CAPTCHAs.
We all know the annoying CAPTCHAs that are intended to stop automated scripts from overloading web interfaces.
As pentesters, you need to check whether a CAPTCHA was implemented correctly or not. Based on my experience, I’ve written a CAPTCHA testing methodology made up of 8 different ways to bypass a CAPTCHA. Do the steps in the order they are written, since the first steps are easier to test and give a faster bypass.

Here are the 8 steps of CAPTCHA hacking:

  1. Client side CAPTCHA value: The first test is to check whether a poor CAPTCHA implementation transmits the correct CAPTCHA value to the client side.
    I’ve seen several examples of CAPTCHA picture names that contained the answer, CAPTCHA validation that was executed on the client side in JavaScript, and other funny client-related cases.

  2. Correctness flag (in the client side and the session): This step searches for poor implementations where the CAPTCHA correctness flag (a Boolean indicating whether the user guessed the CAPTCHA correctly or not) is transferred to the client, whether in the URL, in a cookie or anywhere else. In addition, there are buggy CAPTCHA implementations that don’t reset the correctness flag between guesses. You guess a CAPTCHA once, the server marks your session as a valid one and never checks any additional CAPTCHA value. Rare, but it can happen.
  3. CAPTCHA reuse: In this step you search for a CAPTCHA implementation that fails to correctly reset the correctness flag. Let’s say you have a contact-us page with a CAPTCHA in it, the CAPTCHA is loaded in a separate HTTP request (for the picture, a very common practice), and this other request initializes all of the server side variables.
    So you might be able to correctly guess one CAPTCHA value, intercept and drop any other CAPTCHA creation request, and just reuse the correct first value again and again and again.
    The bug is that the correctness flag should be set to “CAPTCHA was not guessed correctly” upon CAPTCHA validation, not (only) upon CAPTCHA creation.
    When the contact-us form is submitted, the server should validate it and immediately flag the current session as not CAPTCHA-valid, no matter what the result is.
    If this is not done right after the validation and is only done in the CAPTCHA creation request, you can reuse values.
  4. CAPTCHA Replay: This step is very similar to the previous one, except for the technical part: instead of reusing a CAPTCHA value in an additional request sent from the application’s GUI, you replay a valid request and send an identical copy of it again and again. Why do you need both steps? Because sometimes client values can change and affect this behavior, for example ViewState can change between requests. In rare cases you’ll see that CAPTCHA reuse is not working but replay is, so you have to test both.
  5. NULL/Empty CAPTCHA values: This nice step searches for poor handling of unexpectedly missing input. Let’s say you have a CAPTCHA value parameter that is sent from the client to the server, and you have client-side JavaScript code that validates that the user entered a value.
    What happens if the CAPTCHA value is empty? (You intercept the request and drop the value.) And what happens if you remove the CAPTCHA parameter name and value altogether? Can you bypass the CAPTCHA mechanism this way?
    In this step you first have to intercept and drop any CAPTCHA creation request (see CAPTCHA reuse), and then send an empty or missing value to check the behavior.
    There can be cases of NULL equals NULL (the CAPTCHA was never created and was never sent to the server), which evaluates to true. There is also the case that allowed me to bypass all of the CAPTCHAs of the Israeli government websites: a Boolean CAPTCHA correctness flag was mistakenly initialized to true, and when no CAPTCHA value was sent, the validation was never executed and the variable kept its initial true value, which allowed the CAPTCHA bypass.
  6. CAPTCHA Repository: In this step you check whether CAPTCHAs are generated dynamically and randomly, or whether there is a fixed repository of CAPTCHAs. If there is a repository, you can learn the value from other characteristics of the image, for example the value of a certain pixel or the hash of the binary data of the picture. I once found a repository in a gaming company’s system.
  7. CAPTCHA Reading: In this step you exploit accessibility features to bypass the CAPTCHA. If, for example, you have a feature that reads the CAPTCHA value aloud, you can test whether each character is always read in an identical form or not. In this example you can see that the values 0, 8, 2, 5 and 5 were read, and that the two fives are identical, meaning you can guess the correct value by inspecting the audio.
  8. CAPTCHA Optical Character Recognition: The last step, which you try if all other steps fail, is to pass the CAPTCHA image to an optical character recognition algorithm. If the CAPTCHA image is not complicated enough, you can try to extract the value from it.
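
To make the server side bugs behind steps 3 and 5 more concrete, here is a minimal Python sketch of what such a flawed validation might look like, next to one possible fix. Everything in it is an assumption for illustration only: the class names, the `captcha` parameter name and the way the answer is stored are hypothetical and are not taken from any real system I tested.

```python
import hmac
import secrets


class FlawedCaptchaSession:
    """Hypothetical session state that shows the bugs from steps 3 and 5."""

    def __init__(self):
        self.expected = None  # answer stored by the CAPTCHA creation request
        self.solved = True    # BUG: correctness flag is initialized to True

    def create_captcha(self):
        # Only the creation (image) request resets the server side state.
        self.expected = secrets.token_hex(3)
        self.solved = False
        return self.expected  # stands in for the rendered picture

    def submit_form(self, params):
        # BUG: validation is skipped when the parameter is missing, and
        # neither the flag nor the expected value is reset after validation.
        if "captcha" in params:
            self.solved = params["captcha"] == self.expected
        return self.solved


class FixedCaptchaSession:
    """One possible fix: every answer is single-use and missing input fails."""

    def __init__(self):
        self.expected = None

    def create_captcha(self):
        self.expected = secrets.token_hex(3)
        return self.expected

    def submit_form(self, params):
        # Consume the stored answer before comparing, whatever the result is.
        expected, self.expected = self.expected, None
        supplied = params.get("captcha")
        if expected is None or not supplied:
            return False  # no CAPTCHA was created, or no value was sent
        return hmac.compare_digest(supplied.encode(), expected.encode())


if __name__ == "__main__":
    flawed = FlawedCaptchaSession()
    print("flawed, no CAPTCHA created or sent:", flawed.submit_form({}))
    answer = flawed.create_captcha()
    print("flawed, first use:", flawed.submit_form({"captcha": answer}))
    print("flawed, reuse:", flawed.submit_form({"captcha": answer}))

    fixed = FixedCaptchaSession()
    print("fixed, no CAPTCHA created or sent:", fixed.submit_form({}))
    answer = fixed.create_captcha()
    print("fixed, first use:", fixed.submit_form({"captcha": answer}))
    print("fixed, reuse:", fixed.submit_form({"captcha": answer}))
```

In the flawed class, dropping the creation request and omitting the parameter leaves the flag at its initial true value, and a solved value keeps working as long as no new CAPTCHA is created. The fixed class throws the stored answer away on every submission, so both tricks fail.
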
In the next post I’ll discuss Google’s ReCAPTCHA 2.0.

Have a great week :)