Endianness | Meursault's Blog

In computer science, endianness is the order in which the bytes of a multibyte value are stored in memory or transmitted. The two common orders are big-endian (BE) and little-endian (LE). Bi-endianness is a feature provided by many computer architectures that enables endianness to be switched.

Why are there different types of endianness?

The main reason we need to distinguish big-endian from little-endian is that different systems and protocols order bytes differently. For instance, most network protocols send multibyte values in big-endian order. However, a PC may not store bytes in this order, so programmers have to convert between the two.

For a single byte (like a char), we don’t need to consider endianness. For multibyte data (2-byte, 3-byte, etc.), the order of the bytes follows the rules of the endianness in use.

Before we dive into the endianness rules, we need to know what the least and most significant bytes are.

1. Least Significant Byte

The least significant byte (LSB) is the byte containing the least significant bit.

For example, the least significant byte (LSB) of 0x12345678 is 0x78.

2. Most Significant Byte

The most significant byte (MSB) is the byte containing the most significant bit.

For example, the most significant byte (MSB) of 0x12345678 is 0x12.
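As a small illustration, both bytes of a 32-bit value can be picked out with a mask and a shift. This is only a sketch: the function names least_significant_byte, most_significant_byte, and check_significant_bytes are made up for this example, and it assumes a 32-bit value like the one used above.

```cpp
#include <cassert>
#include <cstdint>

// Lowest-order 8 bits of a 32-bit value (the least significant byte).
constexpr std::uint8_t least_significant_byte(std::uint32_t value) {
    return static_cast<std::uint8_t>(value & 0xFFu);
}

// Highest-order 8 bits of a 32-bit value (the most significant byte).
constexpr std::uint8_t most_significant_byte(std::uint32_t value) {
    return static_cast<std::uint8_t>(value >> 24);
}

// Hypothetical self-check using the example value 0x12345678.
inline void check_significant_bytes() {
    assert(least_significant_byte(0x12345678u) == 0x78);
    assert(most_significant_byte(0x12345678u) == 0x12);
}
```

Shifts and masks work on the numeric value, not on the memory layout, so these helpers give the same answer on big-endian and little-endian machines.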

3. Big-Endian

A big-endian system stores the most significant byte of a word at the smallest memory address and the least significant byte at the largest.

Big-Endian is consistent with the normal English reading order (Left-to-right).

If we want to store the number 0x12345678 in memory, here is how a big-endian system lays it out:

Low     -->      High
+----+----+----+----+
| 12 | 34 | 56 | 78 |
+----+----+----+----+

4. Little-Endian

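A little-endian system stores the least significant byte of a word at the smallest memory address and the most significant byte at the largest.

Little-endian is more like the reading order of Arabic or Hebrew (right-to-left).

For the previous example, the number is stored as: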

Low     -->      High
+----+----+----+----+
| 78 | 56 | 34 | 12 |
+----+----+----+----+
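The two layouts can also be reproduced in code by writing the bytes into an array, where index 0 stands for the lowest address. This is a minimal sketch, not tied to any particular platform: the names Bytes4, to_big_endian, to_little_endian, and check_layouts are assumptions made for this example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Four bytes in address order: index 0 is the lowest address.
using Bytes4 = std::array<std::uint8_t, 4>;

// Big-endian: most significant byte first.
inline Bytes4 to_big_endian(std::uint32_t value) {
    return Bytes4{{
        static_cast<std::uint8_t>(value >> 24),
        static_cast<std::uint8_t>(value >> 16),
        static_cast<std::uint8_t>(value >> 8),
        static_cast<std::uint8_t>(value)
    }};
}

// Little-endian: least significant byte first.
inline Bytes4 to_little_endian(std::uint32_t value) {
    return Bytes4{{
        static_cast<std::uint8_t>(value),
        static_cast<std::uint8_t>(value >> 8),
        static_cast<std::uint8_t>(value >> 16),
        static_cast<std::uint8_t>(value >> 24)
    }};
}

// Hypothetical self-check against the two diagrams.
inline void check_layouts() {
    assert((to_big_endian(0x12345678u) == Bytes4{{0x12, 0x34, 0x56, 0x78}}));
    assert((to_little_endian(0x12345678u) == Bytes4{{0x78, 0x56, 0x34, 0x12}}));
}
```

Because the bytes are produced with shifts rather than by reading memory, the result does not depend on the endianness of the machine running the code.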

5. Determine Endianness by Code

We can use a short C++ program and a pointer to determine the endianness of your device:

#include <iostream>

int main() {
    unsigned int i = 1; // Hex value: 0x00000001

    // Check the value of the first byte
    char *c = (char*)&i;

    if (*c == 1) {
        std::cout << "Little-endian" << std::endl;
    } else {
        std::cout << "Big-endian" << std::endl;
    }

    return 0;
}

For example, my Apple silicon Mac shows the following result:

$ ./endianness
Little-endian

6. Swap Endianness on Different Platforms

When another platform’s endianness differs from the current platform’s, byte swapping is required.

6.1 Swap 2 bytes

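#include <stdint.h> // fixed-width integer types; works in both C and C++

// Exchange the two bytes of a 16-bit value.
uint16_t swap_16bit(uint16_t value) {
    uint16_t swapped = 0;
    swapped |= (value & 0x00FF) << 8; // low byte moves to the high position
    swapped |= (value & 0xFF00) >> 8; // high byte moves to the low position
    return swapped;
}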

6.2 Swap 4 bytes

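#include <stdint.h> // fixed-width integer types; works in both C and C++

// Reverse the order of the four bytes of a 32-bit value.
uint32_t swap_32bit(uint32_t value) {
    uint32_t swapped = 0;
    swapped |= (value & 0x000000FF) << 24; // byte 0 moves to byte 3
    swapped |= (value & 0x0000FF00) << 8;  // byte 1 moves to byte 2
    swapped |= (value & 0x00FF0000) >> 8;  // byte 2 moves to byte 1
    swapped |= (value & 0xFF000000) >> 24; // byte 3 moves to byte 0
    return swapped;
}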

7. Byte Generation Strategy

Depending on the stage at which the data is generated, there are two strategies:

(Build stage) According to the target platform’s endianness, generate the data in the expected endianness.
(Load stage) If the current data’s endianness differs from that of the current platform, swap bytes while loading the data.
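
Both strategies can be sketched in a few lines. The code below is an illustration under stated assumptions, not a definitive implementation: host_is_little_endian, store_u32, and load_u32 are hypothetical helpers invented for this example, the data is assumed to be a single 32-bit value, and swap_32bit repeats the logic from section 6.2 in compact form so the block is self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// True when the host stores the least significant byte at the lowest address.
inline bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Same byte reversal as swap_32bit in section 6.2.
inline std::uint32_t swap_32bit(std::uint32_t value) {
    return ((value & 0x000000FFu) << 24) |
           ((value & 0x0000FF00u) << 8) |
           ((value & 0x00FF0000u) >> 8) |
           ((value & 0xFF000000u) >> 24);
}

// Build stage (hypothetical): write the value in the target's byte order.
inline void store_u32(std::uint32_t value, unsigned char* out,
                      bool target_is_little_endian) {
    if (target_is_little_endian != host_is_little_endian()) {
        value = swap_32bit(value);
    }
    std::memcpy(out, &value, sizeof(value));
}

// Load stage (hypothetical): read the bytes, then swap only if the
// data's byte order differs from the host's.
inline std::uint32_t load_u32(const unsigned char* bytes,
                              bool data_is_little_endian) {
    std::uint32_t value;
    std::memcpy(&value, bytes, sizeof(value));
    if (data_is_little_endian != host_is_little_endian()) {
        value = swap_32bit(value);
    }
    return value;
}

// Hypothetical self-check: a big-endian buffer loads to the same value
// on any host.
inline void check_load_store() {
    const unsigned char big_endian_data[4] = {0x12, 0x34, 0x56, 0x78};
    assert(load_u32(big_endian_data, false) == 0x12345678u);
}
```

With the build-stage approach the swap cost is paid once when the data is produced, while the load-stage approach pays it every time the data is read on a platform with the other byte order.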