Foodie has eaten so much delicious food and decide to slim now >_<.

slimming

TL;DR: doit.py

The TAR archive contains two files: slimming, which is an AMD64 ELF binary, and slimming_data, which is more or less (but mostly less) random-looking binary data.

Running the binary with no arguments prints a usage message:

And with arguments it converts a file to that random-y looking format:

After firing up IDA and looking at the binary for about five seconds I decide no, I’m going to blackbox this thing (it’s full of MMX ugliness).

So first off I create a python function to run a string through the program.
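A minimal version of such a helper might look like this (a sketch, not the original: the temp-file handling is mine, and the assumption that the binary takes an input path and an output path on the command line is based on the usage message):

```python
import os
import subprocess
import tempfile

def blackbox(data):
    """Run `data` through ./slimming and return the compressed output.

    Sketch only: I assume the binary is invoked as
    `./slimming <infile> <outfile>`.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in")
        dst = os.path.join(tmp, "out")
        with open(src, "wb") as f:
            f.write(data.encode() if isinstance(data, str) else data)
        subprocess.check_call(["./slimming", src, dst])
        with open(dst, "rb") as f:
            return f.read()

def hexpairs(blob):
    """Format bytes as space-separated hex for eyeballing."""
    return " ".join("%02x" % b for b in blob)
```

Throwing strings at it is then just `print(hexpairs(blackbox('A' * n)))` in a loop.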

Tradition dictates that I must now throw A’s at it:

Output (formatted slightly):

There are a few interesting points to notice already:

• The output always starts with “0ops”.

• Bytes 5 and 6 are 8e 43 in all my tests.

• The output grows by increments of two. This suggests that the output bytes should be interpreted as pairs.

• The output grows slower as more A’s are added. This suggests we’re dealing with a compression program (which also fits well with the name of the challenge).

• The output seems to be obfuscated on a per-byte basis, and the obfuscation is static. If it weren’t, the common prefix of the output would have changed between the different inputs.

• A given pair will change as the input grows. After a certain number of changes the pair stays the same. The second byte in the pair only changes the first time, and only its LSB is changed. The first byte changes a lot the first time, and then only a little on subsequent inputs.

The last two points and a little bit of guessing are enough to de-obfuscate the output. First I guess that the output is xor’ed with a static stream before being written to the output file. Why xor? It always is.

So to recover the stream a known output is needed. Why is it that the first byte changes by a lot and the second only by one bit? Since the program is doing compression it is natural to believe that it maintains some kind of dictionary of things it has already seen; when it sees something it knows, it just outputs a reference.

This model fits well with our data: The second byte changes from “literal” to “reference” and the first byte changes from the literal “A” to a reference. Also notice that references always encode strings longer than a single character (or else the last A would not be encoded as a literal).

Lastly the first pair will always be a literal since there will not yet be anything to reference.

I guess that “literal” is encoded as 0 — the actual value used internally in the program doesn’t matter.

The input strings that are of interest to me are those that first produce a new pair, that is “A”, “AA”, “AAAA”, “AAAAAAA”, …

Given the model I have guessed at, these strings must be encoded as:

• “A”: [Lit "A"]

• “AA”: [Lit "A", Lit "A"]

• “AAAA”: [Lit "A", Ref "AA", Lit "A"]

• “AAAAAAA”: [Lit "A", Ref "AA", Ref "AAA", Lit "A"]

There’s a pattern here: for n pairs, the first n - 1 pairs encode 1 + 2 + … + (n - 1) = T(n-1) A’s (the (n - 1)-th triangular number), and the last pair then encodes a literal A, so the string of interest has length T(n-1) + 1.

Now we can recover the first, say 100, bytes of the xor stream:
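The recovery can be sketched like this (helper names are mine; `blackbox` is the wrapper around ./slimming from earlier, taken as a parameter). Each A-string of interest pins down exactly two new bytes of the stream, because under the model its last pair must be the literal ('A', 0):

```python
def interesting_length(n):
    """Length of the A-string that first produces n pairs: T(n-1) + 1,
    where T(k) = k * (k + 1) // 2 is the k-th triangular number."""
    return (n - 1) * n // 2 + 1

def recover_stream(blackbox, npairs):
    """Recover the xor stream covering the first `npairs` pairs.

    For 'A' * interesting_length(n) the model predicts that the last pair
    is the literal ('A', 0), so xoring the last two output bytes with
    (0x41, 0x00) yields stream bytes 2(n-1) and 2(n-1)+1, counted from
    the start of the pair data (i.e. after the "0ops" magic)."""
    stream = bytearray(2 * npairs)
    for n in range(1, npairs + 1):
        out = blackbox("A" * interesting_length(n))
        stream[2 * (n - 1)] = out[-2] ^ ord("A")
        stream[2 * (n - 1) + 1] = out[-1] ^ 0
    return bytes(stream)
```

Fifty queries give 100 bytes. This also explains why bytes 5 and 6 never changed in the A-tests: there the first pair is always the literal ('A', 0), xored with the first two stream bytes.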

This is the result:

Looks like the stream is cyclic with a period of 28. I now define a more usable blackbox without the obfuscation:

Running blackbox_inner('A' * 20) yields:

Looks like the references are encoded as negative indexes into a dictionary of tokens.
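Getting those negative indexes out of the de-obfuscated bytes is presumably just a matter of reading the first byte of each pair as a signed 8-bit value (my assumption about how a cleaned-up blackbox would decode them):

```python
import struct

def as_signed(b):
    """A raw byte as a signed 8-bit value, e.g. 0xff -> -1, 0xfd -> -3."""
    return struct.unpack("b", bytes([b]))[0]

print(as_signed(0xFF), as_signed(0xFD))  # -1 -3
```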

Encoding “AAAAAAAAA” (nine A’s) must go something like this:

• 1st pair:
  • Is “AA” in the dictionary? No.
  • Output literal “A”: ('A', 0)
• 2nd pair:
  • Is “AA” in the dictionary? Yes.
  • Is “AAA” in the dictionary? No.
  • Output reference to “AA”: (-1, 0)
• 3rd pair:
  • Is “AA” in the dictionary? Yes.
  • Is “AAA” in the dictionary? Yes.
  • Is “AAAA” in the dictionary? No.
  • Output reference to “AAA”: (-2, 0)
• 4th pair:
  • Is “AA” in the dictionary? Yes.
  • Is “AAA” in the dictionary? Yes.
  • Is “AAAA” in the dictionary? No.
  • Output reference to “AAA”: (-2, 0)

Of course the question now is “how is the table being built?”. The token “AA” must have been added to the dictionary between outputting the 1st and 2nd pair.

Hypothesis: Each time a token is outputted, a token consisting of itself and the letter just before it is added to the dictionary.

This model fits with the data, so I perform a few more tests to confirm or discard the hypothesis:

Oops, hypothesis discarded ☹. (The second “AB” would have added “BAB” to the dictionary, which was clearly not the case.)

Revised hypothesis: Each time a token is outputted, a token consisting of the previous token and the first letter of the token being outputted is added to the dictionary.

Note that this model fits both the A-tests and the two tests above.

Testing:

Yep, “ABX” (= “AB” + “X”) was in the dictionary.

Why is the reference 253? Because “BA” was added between the 2nd and 3rd pair:
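Tracing the dictionary under the revised hypothesis takes only a couple of lines (a sketch of mine; the argument is the sequence of tokens the pairs encode). For the A/B test it shows both “BA” being the entry added between the 2nd and 3rd pair, and “ABX” landing at entry index 2, i.e. reference -3, i.e. the byte 253:

```python
def dict_trace(tokens):
    """Dictionary entries added under the revised hypothesis: before each
    token after the first is emitted, previous_token + token[0] is added."""
    dic, prev = [], None
    for tok in tokens:
        if prev is not None:
            dic.append(prev + tok[0])
        prev = tok
    return dic

# tokens for an input along the lines of "ABABXABX"
print(dict_trace(["A", "B", "AB", "X", "ABX"]))
# ['AB', 'BA', 'ABX', 'XA']
```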

Two questions remain:

1. What happens if the dictionary grows beyond 255 entries?
2. Is the nth entry added to the dictionary before or after the nth pair is generated? That is, can a token reference itself?

The first one is easy to test:

Aha, so the second component in the pairs is part of the dictionary index.

For the second question I design this test:

The last token added to the dictionary is “ABA”, which is referenced in the last pair. This might seem a little odd; how would you know what the token is when you decompress? Actually it’s not a problem: when decompressing a pair that references a token which has not yet been added to the dictionary, the referenced token must be the one that is about to be added. That token is the previously outputted token plus the first letter of the current one, but the first letter must then also be the first letter of the previous token!

In the example above, the last token references itself: the previous token was “AB”, so the last one must be “AB” + “AB”[0] = “ABA”.
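In code, the decoder-side rule is a single special case (a sketch; `dic` holds the decoder’s entries so far and `prev` the previously decoded token):

```python
def resolve(dic, prev, k):
    """Resolve a reference to dictionary entry k. If k points one past
    the end, the entry is the one about to be added: prev plus its own
    first character (add the 1st char to the end)."""
    if k == len(dic):
        return prev + prev[0]
    return dic[k]

print(resolve(["AB", "BA"], "AB", 2))  # ABA
```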

I can now build a compression function:
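A model of it in Python (a sketch: it emits abstract tokens rather than the binary’s exact pair bytes; references are negative ints, -(k + 1) for dictionary entry k, and the greedy longest match is my assumption):

```python
def compress(data):
    """Compress `data` into a list of tokens: a 1-character string for a
    literal, or a negative int referencing a dictionary entry."""
    dic = []      # entry k is referenced as -(k + 1)
    out = []
    prev = None   # previously emitted token, as a string
    i = 0
    while i < len(data):
        if prev is not None:
            # the new entry is added *before* the pair is generated,
            # so a token may reference itself
            dic.append(prev + data[i])
        # greedy longest dictionary match at i; fall back to a literal
        tok, ref = data[i], None
        for k, cand in enumerate(dic):
            if len(cand) > len(tok) and data.startswith(cand, i):
                tok, ref = cand, -(k + 1)
        out.append(ref if ref is not None else tok)
        prev = tok
        i += len(tok)
    return out

print(compress("A" * 9))  # ['A', -1, -2, -2]
```

This reproduces all the observations above, e.g. nine A’s become a literal, -1, -2, -2, matching the walkthrough.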

And test it for correctness:

And then, finally, a decompression function:
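A matching model of the decompression, with the same sketch conventions as the compressor model above; its only wrinkle is the self-referencing case:

```python
def decompress(tokens):
    """Invert the compressor model: tokens are 1-character strings
    (literals) or negative ints (references to dictionary entries)."""
    dic = []
    out = []
    prev = None
    for t in tokens:
        if isinstance(t, str):
            tok = t
        else:
            k = -t - 1
            if k == len(dic):
                # self-referencing token: it is the entry about to be
                # added, i.e. prev with its first character appended
                tok = prev + prev[0]
            else:
                tok = dic[k]
        if prev is not None:
            dic.append(prev + tok[0])
        out.append(tok)
        prev = tok
    return "".join(out)

print(decompress(["A", "B", -1, -3]))  # ABABABA
```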

Of course I test it too:

And the final step:
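The de-obfuscation half of that step can be sketched like this (that the stream is applied starting right after the magic is my assumption):

```python
def deobfuscate(blob, stream):
    """Strip the leading "0ops" magic and undo the cyclic xor stream
    (the recovered stream repeats with period 28)."""
    body = blob[4:]
    return bytes(b ^ stream[i % len(stream)] for i, b in enumerate(body))
```

The resulting pairs are then decoded and fed to the decompression function, and the output written out as slimming_inner.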

The result, slimming_inner, is an extremely small 32-bit ELF. It is the simplest crackme you have ever seen:

The flag is: 0ctf{AdD_15t_cHar_t0_7h3_eNd_wHEh_3xCept10n}
