Skip to content
Go back

MD5 Algorithm

Published:  at  07:20 PM

The MD5 algorithm is cryptographically broken and unsuitable for further use. This article only seeks to examine the internal algorithm of MD5.

MD5 (Message Digest Algorithm 5) is a widely used hash function producing a 128-bit hash value. It has been found to suffer from extensive vulnerabilities, the most infamously is Flame malware. The authors of the Flame malware used an MD5 collision to forge a Windows code-signing certificate.

1. The MD5 Algorithm

MD5 input a variable length message and output a fixed 128-bit length digest.

We have to turn the input message into a fix-length message, we utilize the fix-length message to generate the 128-bit digest.

1.1 Padding the Input

The input message is broken into chunks of 512-bit blocks (this means if message is shorter than 512-bit, it will be turned into 1 512-bit chunk, if message is longer, it will be turned into multiple 512-bit chunk). This step is aim to turn all the message (regradless of its length) into a new input which length becomes an integer multiple of 512-bit.

Here is the padding algorithm:

  1. Append single bit 1 to the tail of the message
  2. Add enough bit 0 so that the length of the message up to 64 bits fewer than a multiple of 512.
  3. The last 64 bits representing the length of the original message, modulo 2642^{64}, store in little-endian.

For example:

If our original message is a which is 01100001 both in ASCII and UTF-8, its length is 8, then we do the padding:

  1. Append 1.

    The message turn into 011000011.

  2. Add enough bit 0.

    The current message length is 9, we need to pad to length up to 512 * k - 64, the smallest k is 1, so we need to pad 512 * 1 - 64 - 9 = 439 bit 0 to the message. The message turn into 011000011 | 000000...(total num is 439).

    the | only represent the seperation, just for clearly see the message’s data format.

  3. Add original message length in little-endian to the last 64 bits.

    The original message length is 8 (0x08 0x00 0x00 0x00 0x00 0x00 0x00 0x00) in little-endian.

    The message turn into 011000011 | 000000...(total num is 439) | 0800000000000000.

Here we get the padded message (several chunks of 512-bit).

1.2 Initialize the State

MD5 alghorithm maintain a 128-bit state, which can be seperated into A, B, C, D. This 128-bit state is initialized as fix number:

All the values are stored in little-endian.

1.3 Process Message Blocks

We use the previous step (padding the input into chunks) as blocks, and process it sequentially.

For example, we have 8 512-bit blocks (B_0, B_1 to B_7), then we process it one by one. After every process step, the new state of ABCD becomes the start state of next process step (the first state is initialized).

1.4 Process Single Message Block

Every block have size of 512, which considered as 16 words (every word have length of 32).

Each block will undergo four similar processes, which we call ‘rounds’.

Each round will undergo sixteen similar basic operations (because there are 16 words, one by one).

So each block will go through 64 basic operations. Here we will introduce the detail of basic opeartions.

1.5 The Basic Opeartions

The basic operation is based on word (which is part of a block with length of 32).

For each word, we can use a function to generate new state:

Temp=B+LeftRotate(RoundFunction(B,C,D)+A+M[currentIndex(i)]+T(i),s[i])\begin{aligned} \text{Temp} &= B + \text{LeftRotate}\Bigl( \text{RoundFunction}(B, C, D) \\ &\quad + A + M[\text{currentIndex}(i)] + T(i), \, s[i] \Bigr) \end{aligned}

Where:

LeftRotate(Value,ShiftAmount)\text{LeftRotate}(Value, ShiftAmount) is shift the 32-bit ValueValue to the left by ShiftAmountShiftAmount bits.

textRoundFunc(B,C,D)text{RoundFunc}(B, C, D) can be different formulas based on ii:

currentIndex(i)={imod160i<16(5i+1)mod1616i<32(3i+5)mod1632i<487imod1648i<64\text{currentIndex}(i) = \begin{cases} i \bmod 16 & 0 \le i < 16 \\ (5i + 1) \bmod 16 & 16 \le i < 32 \\ (3i + 5) \bmod 16 & 32 \le i < 48 \\ 7i \bmod 16 & 48 \le i < 64 \end{cases}

So we get the new state:

New State={{Ai+1}={D}{Bi+1}={C}{Ci+1}={B}{Di+1}={Temp}\text{New State} = \begin{cases} \{A_{i+1}\} &= \{D\} \\ \{B_{i+1}\} &= \{C\} \\ \{C_{i+1}\} &= \{B\} \\ \{D_{i+1}\} &= \{\text{Temp}\} \end{cases}

Through the basic operations, we cloud know that the core of MD5 is according the state A, B, C and D and utilize message word, Specific RoundFuncRoundFunc, constant and rotation function to calculate a new TempTemp number, then we get a new state, new state will be used as next operation’s initial state.

After undergo all the blocks (in blocks, undergo all the rounds), we get the final state of A, B, C and D. By concatinate the A, B, C and D, we get the MD5 result.



Previous Post
Top-p in LLM
Next Post
Content Delivery Network