Foodie has eaten so much delicious food and decide to slim now >_<.

slimming

TL;DR: doit.py

The TAR archive contains two files; slimming which is an AMD64 ELF binary, and slimming_data which is more or less (but mostly less) random looking binary data.

Running the binary with no arguments prints a usage message:

1 $ ./slimming
2 Usage: slimming <input> <output>

And with arguments it converts a file to that random-y looking format:

1 $ echo foobar > /tmp/foo
2 $ ./slimming /tmp/foo /tmp/bar
3 Done!
4 $ phd /tmp/bar
5 00000000  30 6f 70 73  a9 43 c4 46  f3 30 d0 50  cc 33 fe 35  │0ops│·C·F│·0·P│·3·5│
6 00000010  c1 31                                               │·1│
7 00000012

After firing up IDA and looking at the binary for about five seconds I decide no, I’m going to blackbox this thing (It’s full of MMX ugliness).

So first off I create a python function to run a string through the program.

1 from pwn import *
2 
3 def blackbox(s):
4     write('/tmp/foo', s)
5     os.system('./slimming /tmp/foo /tmp/bar >/dev/null')
6     return read('/tmp/bar')

Tradition dictates that I must now trow A’s at it:

1 for n in range(1, 10):
2     s = 'A' * n
3     t = blackbox(s)
4     print s
5     print hexdump(t)

Output (formatted slightly):

 1 "A" ->
 2 00000000  30 6f 70 73  8e 43                                  │0ops│·C│
 3 "AA" ->
 4 00000000  30 6f 70 73  8e 43 ea 46                            │0ops│·C·F││
 5 "AAA" ->
 6 00000000  30 6f 70 73  8e 43 54 47                            │0ops│·CTG││
 7 "AAAA" ->
 8 00000000  30 6f 70 73  8e 43 54 47  dd 30                     │0ops│·CTG│·0│
 9 "AAAAA" ->
10 00000000  30 6f 70 73  8e 43 54 47  63 31                     │0ops│·CTG│c1│
11 "AAAAAA" ->
12 00000000  30 6f 70 73  8e 43 54 47  62 31                     │0ops│·CTG│b1│
13 "AAAAAAA" ->
14 00000000  30 6f 70 73  8e 43 54 47  62 31 f3 50               │0ops│·CTG│b1·P││
15 "AAAAAAAA" ->
16 00000000  30 6f 70 73  8e 43 54 47  62 31 4d 51               │0ops│·CTG│b1MQ││
17 "AAAAAAAAA" ->
18 00000000  30 6f 70 73  8e 43 54 47  62 31 4c 51               │0ops│·CTG│b1LQ││

There are few interesting points to notice already:

The last two points and a little bit of guessing is enough to de-obfuscate the output; First I guess that the output is xor’ed with a static stream before being written to the output file. Why xor? It always is.

So to recover the stream a known output is needed. Why is it that the first byte change by a lot and the second only by one bit? Since the program is doing compression it is natural to believe that it maintains some kind of dictionary of things that it has already seen; when it sees something it knows it just outputs a reference.

This model fits well with our data: The second byte changes from “literal” to “reference” and the first byte changes from the literal “A” to a reference. Also notice that references always encode strings longer that a single character (or else the last A would not be encoded as a literal).

Lastly the first pair will always be a literal since there will not yet be anything to reference.

I guess that “literal” is encoded as 0 — the actual value used internally in the program doesn’t matter.

The input strings that are of interest to me are those that first produce a new pair, that is “A”, “AA”, “AAAA”, “AAAAAAA”, …

Given the model I have guessed at, these strings must be encoded as:

There’s a pattern here: For n pairs the number of A’s encoded by the first n - 1 pairs is 1 + … + n - 1 = Tn-1 and the last pair then encodes a literal A.

Now we can recover the first, say 100, bytes of the xor stream:

1 stream = ''
2 for n in range(1, 51):
3     # T_(n-1) in closed form
4     m = (n * (n - 1)) / 2 + 1
5     t = blackbox('A' * m)
6     assert len(t) == 4 + n * 2
7     stream += xor(t[-2:], ('A', 0))
8 print hexdump(stream)

This is the result:

1 00000000  cf 43 ab 46  9c 30 b2 50  ad 33 8c 35  cb 31 98 4f  │·C·F│·0·P│·3·5│·1·O│
2 00000010  8d 31 8b 48  92 6d cc 4d  ba 64 cb 00  cf 43 ab 46  │·1·H│·m·M│·d··│·C·F│
3 00000020  9c 30 b2 50  ad 33 8c 35  cb 31 98 4f  8d 31 8b 48  │·0·P│·3·5│·1·O│·1·H│
4 00000030  92 6d cc 4d  ba 64 cb 00  cf 43 ab 46  9c 30 b2 50  │·m·M│·d··│·C·F│·0·P│
5 00000040  ad 33 8c 35  cb 31 98 4f  8d 31 8b 48  92 6d cc 4d  │·3·5│·1·O│·1·H│·m·M│
6 00000050  ba 64 cb 00  cf 43 ab 46  9c 30 b2 50  ad 33 8c 35  │·d··│·C·F│·0·P│·3·5│
7 00000060  cb 31 98 4f                                         │·1·O││
8 00000064

Looks like the stream is cyclic with a period of 28. I now define a more usable blackbox without the obfuscation:

1 def blackbox_inner(s):
2     stream = [207, 67, 171, 70, 156, 48, 178, 80, 173, 51, 140, 53, 203, 49,
3               152, 79, 141, 49, 139, 72, 146, 109, 204, 77, 186, 100, 203, 0]
4     t = blackbox(s)[4:]
5     return ordlist(xor(t, stream, cut = 'left'))

Running blackbox_inner('A' * 20) yields:

1 [65, 0, 255, 1, 254, 1, 254, 1]

Looks like the references are encoded as negative indexes into a dictionary of tokens.

Encoding “AAAAAAAAA” (nine A’s) must go something like this:

Of course the question now is “how is the table being build”. The token “AA” must have been added to the dictionary between outputting the 1st and 2nd pair.

Hypothesis: Each time a token is outputted a token consisting of itself and the letter just before it is added to the dictionary.

This model fits with the data, so I perform a few more tests to confirm or discard the hypothesis:

1 blackbox_inner('ABAB')
2   [65, 0, 66, 0, 255, 1]
3 blackbox_inner('ABABBAB')
4   [65, 0, 66, 0, 255, 1, 254, 1, 66, 0]

Oops, hypothesis discarded ☹. (The second “AB” would have added “BAB” to the dictionary, which was clearly not the case.)

Revised hypothesis: Each time a token is outputted a token consisting of the previous token and the first letter of the token being outputted is added to the dictionary.

Note that this model fits both the A-tests and the two tests above.

Testing:

1 blackbox_inner('ABABXABX')
2   [65, 0, 66, 0, 255, 1, 88, 0, 253, 1]

Yep, “ABX” (= “AB” + “X”) was in the dictionary.

Why is the reference 253? Because “BA” was added between the 2nd and 3rd pair:

1 blackbox_inner('ABABXABXBA')
2   [65, 0, 66, 0, 255, 1, 88, 0, 253, 1, 254, 1]

Two questions remain:

  1. What happens if the dictionary grows beyond 255 entries?
  2. Is the nth entry added to the dictionary before or after the nth pair is generated? That is, can a token reference itself?

The first one is easy to test:

1 s = ''.join(chr(x) for x in range(256)) + 'AAA'
2 blackbox_inner(s)
3   [..., 65, 0, 255, 2]

Aha, so the second component in the pairs is part of the dictionary index.

For the second question I design this test:

1 blackbox_inner('ABABABA')
2   [65, 0, 66, 0, 255, 1, 253, 1]

The last token added to the dictionary is “ABA” which is referenced in the last pair. This might seem a little odd; how would you know what the token is when you decompress? Actually it’s not a problem: when decompressing a pair which reference a token which was not yet added to the dictionary then the referenced token must be the one which are about to be added. That token is the previous outputted token plus the first letter of the current — but the first letter must then also be the first letter of the previous token!

In the example above, the last token references itself: the previous token was “AB”, so the last one must be “AB” + “AB”[0] = “ABA”.

I can now build a compression function:

 1 def compress_inner(s):
 2     d = []
 3     i = 0
 4     prev = None
 5     out = []
 6     while i < len(s):
 7         if prev:
 8             d.append(prev + s[i])
 9         j = i + 1
10         while j < len(s) and s[i:j+1] in d:
11             j += 1
12         token = s[i:j]
13         if len(token) > 1:
14             idx = d.index(token)
15             idxa = (255 - idx & 0xff)
16             idxb = (idx >> 8) + 1
17             out += [idxa, idxb]
18         else:
19             out += [ord(token), 0]
20         prev = token
21         i = j
22     return out
23 
24 def compress(s):
25     stream = [207, 67, 171, 70, 156, 48, 178, 80, 173, 51, 140, 53, 203, 49,
26               152, 79, 141, 49, 139, 72, 146, 109, 204, 77, 186, 100, 203, 0]
27     return '0ops' + xor(compress_inner(s), stream, cut = 'left')

And test it for correctness:

1 for _ in range(1000):
2     s = randoms(random.randint(1, 100))
3     assert blackbox(s) == compress(s)

And then, finally, a decompression function:

 1 def decompress_inner(xs):
 2     d = []
 3     prev = None
 4     out = ''
 5     for a, b in group(2, xs):
 6         print a, b
 7         print d
 8         if b == 0:
 9             token = chr(a)
10         else:
11             idxa = a
12             idxb = b
13             idx = (idxb - 1) << 8 | (255 - idxa)
14             if idx == len(d):
15                 token = prev + prev[0]
16             else:
17                 token = d[idx]
18         if prev:
19             d.append(prev + token[0])
20         prev = token
21         out += token
22     return out
23 
24 def decompress(s):
25     stream = [207, 67, 171, 70, 156, 48, 178, 80, 173, 51, 140, 53, 203, 49,
26               152, 79, 141, 49, 139, 72, 146, 109, 204, 77, 186, 100, 203, 0]
27     return decompress_inner(ordlist(xor(s[4:], stream, cut = 'left')))

Of course I test it too:

1 for _ in range(1000):
2     s = randoms(random.randint(1, 100))
3     assert decompress(compress(s)) == s

And the final step:

1 write('slimming_inner', decompress(read('slimming_data')))

The result, slimming_inner, is an extremely small 32bit ELF. It is the simplest crackme you have ever seen:

 1 $ objdump -dMintel slimming_inner
 2 slimming_inner:     file format elf32-i386
 3 
 4 
 5 Disassembly of section .text:
 6 
 7 08048054 <.text>:
 8  8048054:	83 ec 40             	sub    esp,0x40
 9  8048057:	ba 40 00 00 00       	mov    edx,0x40
10  804805c:	89 e1                	mov    ecx,esp
11  804805e:	bb 00 00 00 00       	mov    ebx,0x0
12  8048063:	b8 03 00 00 00       	mov    eax,0x3
13  8048068:	cd 80                	int    0x80
14  804806a:	eb 17                	jmp    0x8048083
15  804806c:	89 c2                	mov    edx,eax
16  804806e:	89 e1                	mov    ecx,esp
17  8048070:	bb 01 00 00 00       	mov    ebx,0x1
18  8048075:	b8 04 00 00 00       	mov    eax,0x4
19  804807a:	cd 80                	int    0x80
20  804807c:	b8 01 00 00 00       	mov    eax,0x1
21  8048081:	cd 80                	int    0x80
22  8048083:	31 c0                	xor    eax,eax
23  8048085:	8a 04 24             	mov    al,BYTE PTR [esp]
24  8048088:	83 e8 30             	sub    eax,0x30
25  804808b:	0f 85 b0 02 00 00    	jne    0x8048341
26  8048091:	31 c0                	xor    eax,eax
27  8048093:	8a 44 24 01          	mov    al,BYTE PTR [esp+0x1]
28  8048097:	83 e8 63             	sub    eax,0x63
29  804809a:	0f 85 a1 02 00 00    	jne    0x8048341
30  ...
31  8048309:	eb 00                	jmp    0x804830b
32  804830b:	c6 04 24 43          	mov    BYTE PTR [esp],0x43
33  804830f:	c6 44 24 01 6f       	mov    BYTE PTR [esp+0x1],0x6f
34  8048314:	c6 44 24 02 72       	mov    BYTE PTR [esp+0x2],0x72
35  8048319:	c6 44 24 03 72       	mov    BYTE PTR [esp+0x3],0x72
36  804831e:	c6 44 24 04 65       	mov    BYTE PTR [esp+0x4],0x65
37  8048323:	c6 44 24 05 63       	mov    BYTE PTR [esp+0x5],0x63
38  8048328:	c6 44 24 06 74       	mov    BYTE PTR [esp+0x6],0x74
39  804832d:	c6 44 24 07 0a       	mov    BYTE PTR [esp+0x7],0xa
40  8048332:	c6 44 24 08 00       	mov    BYTE PTR [esp+0x8],0x0
41  8048337:	b8 08 00 00 00       	mov    eax,0x8
42  804833c:	e9 2b fd ff ff       	jmp    0x804806c
43  8048341:	c6 04 24 4e          	mov    BYTE PTR [esp],0x4e
44  8048345:	c6 44 24 01 6f       	mov    BYTE PTR [esp+0x1],0x6f
45  804834a:	c6 44 24 02 2e       	mov    BYTE PTR [esp+0x2],0x2e
46  804834f:	c6 44 24 03 4e       	mov    BYTE PTR [esp+0x3],0x4e
47  8048354:	c6 44 24 04 6f       	mov    BYTE PTR [esp+0x4],0x6f
48  8048359:	c6 44 24 05 2e       	mov    BYTE PTR [esp+0x5],0x2e
49  804835e:	c6 44 24 06 2e       	mov    BYTE PTR [esp+0x6],0x2e
50  8048363:	c6 44 24 07 4e       	mov    BYTE PTR [esp+0x7],0x4e
51  8048368:	c6 44 24 08 6f       	mov    BYTE PTR [esp+0x8],0x6f
52  804836d:	c6 44 24 09 2e       	mov    BYTE PTR [esp+0x9],0x2e
53  8048372:	c6 44 24 0a 2e       	mov    BYTE PTR [esp+0xa],0x2e
54  8048377:	c6 44 24 0b 2e       	mov    BYTE PTR [esp+0xb],0x2e
55  804837c:	c6 44 24 0c 0a       	mov    BYTE PTR [esp+0xc],0xa
56  8048381:	c6 44 24 0d 00       	mov    BYTE PTR [esp+0xd],0x0
57  8048386:	b8 0d 00 00 00       	mov    eax,0xd
58  804838b:	e9 dc fc ff ff       	jmp    0x804806c

The flag is: 0ctf{AdD_15t_cHar_t0_7h3_eNd_wHEh_3xCept10n}


(To top)

Comments