Origins and Context
This particular idea is derived from a level in CyberFastTrack US, a CTF sponsored by the SANS Institute that I have been participating in recently. I can’t remember exactly which level, but it was somewhere in the middle of the competition. The concept is that we can pull ASCII text out of a compiled file with the strings command, and if the programmer carelessly left plaintext passwords or other sensitive data in their code, we can feed that output back into the same program as input arguments by piping it to the xargs command. The result is a dirty little redirection/word-list/cracking attack that is incredibly efficient. This code vulnerability is comparable to the author distributing an extremely short word list with their “secure” program.
Note: With the exception of Dan J. Bernstein’s open-source hashing algorithm, I wrote all of the example code in the screenshots below. Everything is C syntax, compiled with GCC 4.8.5.
Let’s Get Down in the Weeds…
Here’s What We’re Working With
The files observed in Figure 1.1 were created using Nano on CentOS 7. The *.c files are C code, and the respective *.elf counterparts are those same files compiled with GCC 4.8.5. crackme.elf is the vulnerable file we will exploit, and youcantcrackthis.elf has the same functionality but includes a simple hashing implementation.
Now in Figure 1.2, please observe the differences between the code samples. Specifically, note the implementation of Dan J. Bernstein’s open-source hashing algorithm, and the change from a strcmp of the command line argument against the plaintext “password123” to a comparison between that parameter and the hash of “password123.” If you are not familiar with C, the reason we use strcmp is that a C string is just a pointer to an array of characters, so comparing two strings with == would compare their addresses, not their contents. strcmp is a standard library function that walks both strings byte by byte and returns 0 if they are identical, or a negative or positive value depending on which string sorts first at the first differing byte. For that reason, the “if (strcmp(...) == 0)” pattern is what returns a successful login. This is the standard idiom for string equality in C. Please understand that this is a basic example application.
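For readers without the screenshots to hand, here is a minimal sketch of the kind of plaintext check described above. It is not the code from Figure 1.2: the function name check_plaintext is my own assumption, and only the strcmp pattern and the “password123” literal come from the text.

```c
#include <string.h>

/* Hypothetical helper (name assumed for illustration).
 * Returns 1 when the candidate matches the hardcoded plaintext
 * password and 0 otherwise. The literal "password123" is exactly
 * the kind of data that stays readable in the compiled binary. */
int check_plaintext(const char *candidate)
{
    /* strcmp returns 0 only when both strings hold identical bytes. */
    return strcmp(candidate, "password123") == 0;
}
```

In a real program this would typically be called as check_plaintext(argv[1]) after confirming that argc is at least 2.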
Lastly, take a moment to review the hashing function, isolated in djb_hashexample.*, which can be observed in Figure 1.3 below. This simply shows that our comparison is against the actual hash of “password123,” and that the function behaves like a real hash in the sense that obvious collisions aren’t present. Keep in mind that this is a fast, non-cryptographic hash designed for hash tables; it stands in here for illustration. The specific constants chosen by Bernstein, 33 and 5381, are the multiplier and the starting value: each step computes hash = hash * 33 + c for the next character c. They appear to have been chosen because they spread typical inputs well in practice. The update step has the same multiply-and-add shape as a Linear Congruential Generator, a class of algorithms that generate pseudo-random numbers. Further discussion can be found on StackOverflow: https://stackoverflow.com/questions/1579721/why-are-5381-and-33-so-important-in-the-djb2-algorithm
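As a rough companion to the description of the hash, the sketch below shows the commonly published form of the djb2 algorithm, which I am assuming is close to what Figure 1.3 contains. The helper check_hashed and its stored_hash parameter are hypothetical names of my own; the stored value would be computed ahead of time so that the plaintext never appears in the source.

```c
/* djb2 in its commonly published form: start at 5381, then for each
 * byte multiply the running value by 33 and add the byte. */
unsigned long djb2(const char *str)
{
    unsigned long hash = 5381;
    int c;

    while ((c = (unsigned char)*str++) != 0) {
        /* (hash << 5) + hash is hash * 33 */
        hash = ((hash << 5) + hash) + (unsigned long)c;
    }
    return hash;
}

/* Hypothetical check (names assumed): hash the candidate and compare
 * it with a hash that was precomputed from the real password. */
int check_hashed(const char *candidate, unsigned long stored_hash)
{
    return djb2(candidate) == stored_hash;
}
```

Passing the stored hash in as a number keeps the sketch free of any guessed hash value for “password123.”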
Where’s the Vulnerability?
Why is using plaintext in critical binary comparisons a HUGE issue? It comes down to how compilation works. When a compiler turns code into a binary executable, it takes all of the logic and keywords, such as if, else, while, int, and return, and turns them into binary opcodes for a given processor. For example, an if statement might become a comparison followed by a JE (jump if equal) instruction, opcode 0x74 for a short jump on an Intel x86 processor, along with the related addresses and data. It’s the related addresses and ASCII that we care about. A string literal is data, not logic, so the compiler stores it byte for byte in the binary; if the programmer uses the password in plaintext, it appears in the compiled binary in its ASCII representation. In Figure 2.1 you can see the comparison between the strings output of the non-hashed and hashed binaries. The pipe to pr is simply for column formatting.
Strings is a command included in most Linux distributions that extracts ASCII text from a compiled file. It ignores opcodes and pulls only what is human readable, i.e., runs of bytes that fall within the printable ASCII range, 32 (0x20) to 126 (0x7E); by default a run must be at least four characters long to be reported. Please note the appearance of “password123” on the unhashed side, and only the hashed value on the right. Hashing allows us to disguise the password value. Against the unhashed binary, an attacker could find the password here just by looking closely enough, and we can automate that search into an attack with xargs.
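To make the scanning idea concrete, here is a simplified sketch of what a strings-like tool does: walk a byte buffer and collect runs of printable ASCII that are long enough. This is not the real implementation of strings, which has more options and encodings; the names extract_strings, MIN_RUN, and MAX_STR are assumptions of mine.

```c
#include <stddef.h>
#include <string.h>

#define MIN_RUN 4   /* shortest run worth reporting (assumed default) */
#define MAX_STR 64  /* storage per extracted string, including the NUL */

/* Returns 1 for printable ASCII (0x20 to 0x7E), 0 otherwise. */
static int is_printable(unsigned char b)
{
    return b >= 0x20 && b <= 0x7E;
}

/* Copies each printable run of at least MIN_RUN bytes into out[],
 * truncating long runs to MAX_STR - 1 characters. Returns how many
 * strings were stored (at most max_out). */
size_t extract_strings(const unsigned char *buf, size_t len,
                       char out[][MAX_STR], size_t max_out)
{
    size_t count = 0;
    size_t start = 0;  /* index where the current run began */
    size_t i;

    for (i = 0; i <= len; i++) {
        size_t run, n;

        /* The end of the buffer is treated like a non-printable byte
         * so that a run reaching the end is still flushed. */
        if (i < len && is_printable(buf[i]))
            continue;

        run = i - start;
        if (run >= MIN_RUN && count < max_out) {
            n = run < MAX_STR - 1 ? run : MAX_STR - 1;
            memcpy(out[count], buf + start, n);
            out[count][n] = '\0';
            count++;
        }
        start = i + 1;  /* the next run can begin after this byte */
    }
    return count;
}
```

Run over the bytes of a compiled file, a scan like this surfaces any plaintext literal, such as a hardcoded password, alongside library and symbol names.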
Just to prove that both programs function identically, see the comparison in Figure 2.2.
Let’s Break it!
Now for the fun part. Let’s take a look at a dirty little trick to redirect this ASCII output back into the program as input. This effectively tests every string that the strings command outputs, one per line, by running each in its own instance of the example application. The strings syntax will not change, but please observe the xargs component. We pipe the data from strings into xargs, which splits it on whitespace and runs the following operation in a loop, passing the given number of tokens as arguments each time (one at a time for a single-password program like ours).
$ strings <filetobreak> | xargs -n <number of parameters> ./<filetobreak>
When xargs reaches the correct password, that invocation is equivalent to running the following command. In our case, when the input is “password123,” we gain authorized access. Note that with the hashed version, the plaintext never appears in the strings output, so every attempt is denied.
$ ./<filetobreak> <thepassword>
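The loop that xargs performs can be sketched in-process as follows. This is only an illustration of the logic, not how xargs is implemented: the check_fn type, demo_check, and try_candidates are hypothetical names, and demo_check stands in for launching the target binary with one argument. As far as I know, xargs does not stop at the first success but keeps feeding the remaining tokens, so the sketch tries every candidate as well.

```c
#include <stddef.h>
#include <string.h>

/* Signature of whatever decides if a candidate is accepted. */
typedef int (*check_fn)(const char *candidate);

/* Hypothetical stand-in for running the vulnerable program with a
 * single argument; returns 1 on a successful login. */
int demo_check(const char *candidate)
{
    return strcmp(candidate, "password123") == 0;
}

/* Tries every candidate in order, the way each token from strings is
 * handed to its own run of the program. Returns the index of the
 * first accepted candidate, or -1 if none was accepted. */
int try_candidates(const char *const *candidates, size_t count,
                   check_fn check)
{
    int first_hit = -1;
    size_t i;

    for (i = 0; i < count; i++) {
        if (check(candidates[i]) && first_hit < 0)
            first_hit = (int)i;
    }
    return first_hit;
}
```

The word list here is whatever the strings scan produced, which is why the attack needs so few attempts.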
Hashing or otherwise encrypting/encoding critical data is a vital component of code security. The best-case scenario is to never perform any sort of authentication client-side. If you can make secure SQL requests, or by literally any other means pull data from a database with integrity, you avoid this concern entirely. With that said, hex analysis, and the ASCII derived from it, is often an alarming component of the software fuzzing and pen-testing process. I would ultimately classify this attack, if you even want to call it that, as an appendage of fuzzing. It is the responsibility of no one but us to write secure code. Ultimately, remember: an attacker only has to find one vulnerability in an application; we have to patch them all.
Remain vigilant, and thank you for your time.
Patching and Mitigation Concepts
I’d like to finish with a quick consideration of mitigation concepts and a few patching ideas which have come to mind, aside from the obviously preferred method of avoidance.
(1) If we can’t pull requests from a server, say because one doesn’t exist, the next best solution is asymmetric encryption. While this is its own challenge, symmetric encryption could be defeated quite easily by an intelligent adversary. With symmetric encryption, the key is going to be embedded in the code, and it will likely show up in the strings output. It could then be used alongside some reverse-engineering to break the login form.
(2) Similar to how normal password security works, we could try to implement a timeout. The concern is that, again, everything is local. If the attacker performs dynamic forensics on the file, they can find out where the application is putting its temp files and just delete them. It’s a decent consideration but easily overcome.
(3) Xargs was the focus of this attack, but it really isn’t necessary if the attacker looks through the hex or the strings output and just notices your password or sensitive data. This would also counter the second mitigation strategy as the timeout and attempt-based lock would no longer be of any considerable relevance.