The MD5 algorithm is cryptographically broken and unsuitable for further use. This article only seeks to examine the internal algorithm of MD5.
MD5 (Message Digest Algorithm 5) is a widely used hash function producing a 128-bit hash value. It has been found to suffer from extensive vulnerabilities, the most infamously is Flame malware. The authors of the Flame malware used an MD5 collision to forge a Windows code-signing certificate.
1. The MD5 Algorithm
MD5 input a variable length message and output a fixed 128-bit length digest.
We have to turn the input message into a fix-length message, we utilize the fix-length message to generate the 128-bit digest.
1.1 Padding the Input
The input message is broken into chunks of 512-bit blocks (this means if message is shorter than 512-bit, it will be turned into 1 512-bit chunk, if message is longer, it will be turned into multiple 512-bit chunk). This step is aim to turn all the message (regradless of its length) into a new input which length becomes an integer multiple of 512-bit.
Here is the padding algorithm:
- Append single bit
1
to the tail of the message - Add enough bit
0
so that the length of the message up to 64 bits fewer than a multiple of 512. - The last 64 bits representing the length of the original message, modulo , store in little-endian.
For example:
If our original message is a
which is 01100001
both in ASCII and UTF-8, its length is 8, then we do the padding:
-
Append
1
.The message turn into
011000011
. -
Add enough bit
0
.The current message length is
9
, we need to pad to length up to512 * k - 64
, the smallestk
is 1, so we need to pad512 * 1 - 64 - 9 = 439
bit0
to the message. The message turn into011000011 | 000000...(total num is 439)
.the
|
only represent the seperation, just for clearly see the message’s data format. -
Add original message length in little-endian to the last 64 bits.
The original message length is
8
(0x08 0x00 0x00 0x00 0x00 0x00 0x00 0x00
) in little-endian.The message turn into
011000011 | 000000...(total num is 439) | 0800000000000000
.
Here we get the padded message (several chunks of 512-bit).
1.2 Initialize the State
MD5 alghorithm maintain a 128-bit state, which can be seperated into A
, B
, C
, D
. This 128-bit state is initialized as fix number:
- A =
0x67425301
- B =
0xEDFCBA45
- C =
0x98CBADFE
- D =
0x13DCE476
All the values are stored in little-endian.
1.3 Process Message Blocks
We use the previous step (padding the input into chunks) as blocks, and process it sequentially.
For example, we have 8 512-bit blocks (B_0
, B_1
to B_7
), then we process it one by one. After every process step, the new state of ABCD
becomes the start state of next process step (the first state is initialized).
1.4 Process Single Message Block
Every block have size of 512
, which considered as 16 words (every word have length of 32).
Each block will undergo four similar processes, which we call ‘rounds’.
Each round will undergo sixteen similar basic operations (because there are 16 words, one by one).
So each block will go through 64 basic operations. Here we will introduce the detail of basic opeartions.
1.5 The Basic Opeartions
The basic operation is based on word (which is part of a block with length of 32
).
For each word, we can use a function to generate new state:
- is the current num of operation (from
0
to63
). - is the current 512-bit block (from
input[0]
toinput[15]
). - is a fixed 32-bit constant number.
- is the predefined left rotate number.
Where:
is shift the 32-bit to the left by bits.
can be different formulas based on :
-
If :
-
If :
-
If :
-
If :
So we get the new state:
Through the basic operations, we cloud know that the core of MD5 is according the state A
, B
, C
and D
and utilize message word, Specific , constant and rotation function to calculate a new number, then we get a new state, new state will be used as next operation’s initial state.
After undergo all the blocks (in blocks, undergo all the rounds), we get the final state of A
, B
, C
and D
. By concatinate the A
, B
, C
and D
, we get the MD5 result.