In computer science, endianness is the order in which the bytes of a multibyte value are stored in memory or transmitted. It is usually either big-endian (BE) or little-endian (LE). Bi-endianness is a feature provided by many computer architectures that allows the endianness to be switched.
Why are there different types of endianness?
The main reason we need to define big-endian and little-endian is the variety of systems and protocols. For instance, most network protocols (including the TCP/IP headers) transmit multibyte values in big-endian order, also called network byte order. However, many PCs, such as x86 machines, store bytes in little-endian order, so programmers need to convert between the two.
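On POSIX systems, the usual way to convert a 32-bit value to network (big-endian) byte order is `htonl` from `<arpa/inet.h>`, with `ntohl` for the reverse direction. The sketch below assumes a POSIX platform (Windows offers the same functions through Winsock). It converts a value and copies the result into a byte array, so you can see that the bytes come out most significant first whatever the host's own order is. The helper name `to_network_bytes` is hypothetical.

```cpp
#include <arpa/inet.h>  // htonl, ntohl (POSIX)
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the bytes of `host_value` in the order they
// would be sent on the wire, i.e. network (big-endian) byte order.
std::array<unsigned char, 4> to_network_bytes(std::uint32_t host_value) {
    std::uint32_t net = htonl(host_value);  // host order -> network order
    std::array<unsigned char, 4> bytes{};
    std::memcpy(bytes.data(), &net, sizeof net);  // copy bytes in memory order
    return bytes;
}
```

On the receiving side, `ntohl` undoes the conversion, so `ntohl(htonl(x)) == x` holds on every host.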
For a single byte (like a char), we don’t need to consider endianness. Multibyte data (2-byte, 4-byte, etc.) is laid out in memory according to the endianness rules of the platform or format that stores it.
Before diving into the endian rules, we need to define the least and most significant bytes.
1. Least Significant Byte
The least significant byte (LSB) is the byte containing the least significant bit.
For example, the least significant byte of 0x12345678 is 0x78.
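In code, the least significant byte can be isolated with a bitwise AND against `0xFF`. Because this operates on the numeric value rather than on its memory layout, it gives the same answer on every platform. A minimal sketch, with the hypothetical function name `lsb`:

```cpp
#include <cstdint>

// Hypothetical helper: keep only the lowest 8 bits of the value.
// This does not depend on endianness because it works on the number itself.
std::uint8_t lsb(std::uint32_t value) {
    return static_cast<std::uint8_t>(value & 0xFFu);
}
```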
2. Most Significant Byte
The most significant byte (MSB) is the byte containing the most significant bit.
For example, the most significant byte of 0x12345678 is 0x12.
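For a 32-bit value, the most significant byte can be obtained by shifting right by 24 bits. As with any shift, this works on the value and not on how it sits in memory, so the result does not depend on the platform. A minimal sketch, assuming a 32-bit input; the function name `msb` is hypothetical:

```cpp
#include <cstdint>

// Hypothetical helper: move the top 8 bits of a 32-bit value down to the
// bottom and return them. The result is the same on any host.
std::uint8_t msb(std::uint32_t value) {
    return static_cast<std::uint8_t>(value >> 24);
}
```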
3. Big-Endian
A big-endian system stores the most significant byte of a word at the lowest memory address and the least significant byte at the highest.
Big-endian is consistent with the normal English reading order (left to right).
If we store the number 0x12345678 starting at some memory address, a big-endian system lays out its bytes as follows:
Low --> High
+----+----+----+----+
| 12 | 34 | 56 | 78 |
+----+----+----+----+
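To produce this layout on any host, a program can write the bytes explicitly with shifts instead of relying on the CPU's native order. The sketch below assumes a 32-bit value and a caller-provided buffer of at least 4 bytes; `store_be32` is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical helper: write `value` into out[0..3] in big-endian order,
// i.e. most significant byte at the lowest address. Only shifts are used,
// so the result is the same on little- and big-endian hosts.
void store_be32(std::uint32_t value, unsigned char out[4]) {
    out[0] = static_cast<unsigned char>(value >> 24);  // highest byte first
    out[1] = static_cast<unsigned char>(value >> 16);
    out[2] = static_cast<unsigned char>(value >> 8);
    out[3] = static_cast<unsigned char>(value);        // lowest byte last
}
```

For 0x12345678, the buffer ends up holding 12 34 56 78, matching the diagram.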
4. Little-Endian
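A little-endian system stores the least significant byte of a word at the lowest memory address and the most significant byte at the highest. Read from low to high addresses, the bytes appear reversed, which is somewhat like the right-to-left reading order of Arabic or Hebrew.
For the same example, 0x12345678 is stored as: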
Low --> High
+----+----+----+----+
| 78 | 56 | 34 | 12 |
+----+----+----+----+
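The little-endian layout can be produced explicitly in the same way, again without depending on the host's native order. This sketch assumes a 32-bit value and an output buffer of at least 4 bytes; `store_le32` is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical helper: write `value` into out[0..3] in little-endian order,
// i.e. least significant byte at the lowest address, on any host.
void store_le32(std::uint32_t value, unsigned char out[4]) {
    out[0] = static_cast<unsigned char>(value);        // lowest byte first
    out[1] = static_cast<unsigned char>(value >> 8);
    out[2] = static_cast<unsigned char>(value >> 16);
    out[3] = static_cast<unsigned char>(value >> 24);  // highest byte last
}
```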
5. Determine Endianness by Code
We can use a simple piece of C++ code and a pointer to detect your device’s endianness:
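#include <iostream>

int main() {
    unsigned int i = 1;  // Hex value: 0x00000001
    // Look at the byte stored at the lowest address of i.
    // Reading an object through a char pointer is allowed in C and C++.
    unsigned char *c = reinterpret_cast<unsigned char *>(&i);
    if (*c == 1) {
        // The least significant byte (0x01) comes first: little-endian.
        std::cout << "Little-endian" << std::endl;
    } else {
        // The most significant byte (0x00) comes first: big-endian.
        std::cout << "Big-endian" << std::endl;
    }
    return 0;
}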
For example, my Apple silicon Mac shows the following result:
$ ./endianness
Little-endian
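A variation on the same idea copies all four bytes of 0x12345678 out of memory, so the result can be compared directly with the two diagrams above. This sketch uses `std::memcpy` instead of a pointer cast; `memory_bytes` and `is_little_endian` are hypothetical names. On a little-endian machine such as the one above, the array should hold 78 56 34 12, and on a big-endian machine 12 34 56 78.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper: return the bytes of `value` in the order they are
// stored in this machine's memory (lowest address first).
std::array<unsigned char, 4> memory_bytes(std::uint32_t value) {
    std::array<unsigned char, 4> bytes{};
    std::memcpy(bytes.data(), &value, sizeof value);
    return bytes;
}

// Hypothetical helper: true when the lowest-addressed byte of 1 is 0x01,
// i.e. the least significant byte is stored first.
bool is_little_endian() {
    return memory_bytes(1u)[0] == 1;
}
```

If your compiler supports C++20, `std::endian::native` from `<bit>` can answer the same question at compile time.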
6. Swap Endianness Across Platforms
When data comes from a platform whose endianness differs from the current one, its bytes must be swapped before use.
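Conceptually, swapping endianness just reverses the order of a value's bytes. As a generic sketch (`byte_swap` is a hypothetical name, and it assumes `T` is an integer type), the bytes can be copied out, reversed with `std::reverse`, and copied back. The shift-based versions below do the same thing for fixed sizes.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical generic helper: reverse the byte order of an integer.
// The result is the same on little- and big-endian hosts.
template <typename T>
T byte_swap(T value) {
    static_assert(std::is_integral<T>::value, "byte_swap expects an integer type");
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));   // copy out the raw bytes
    std::reverse(bytes, bytes + sizeof(T));  // first <-> last, second <-> second-to-last, ...
    std::memcpy(&value, bytes, sizeof(T));   // copy the reversed bytes back
    return value;
}
```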
6.1 Swap 2 bytes
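#include <stdint.h>  // uint16_t

// Swap the two bytes of a 16-bit value, e.g. 0x1234 -> 0x3412.
uint16_t swap_16bit(uint16_t value) {
    uint16_t swapped = 0;
    swapped |= (value & 0x00FF) << 8;  // low byte -> high byte
    swapped |= (value & 0xFF00) >> 8;  // high byte -> low byte
    return swapped;
}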
6.2 Swap 4 bytes
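#include <stdint.h>  // uint32_t

// Reverse the four bytes of a 32-bit value, e.g. 0x12345678 -> 0x78563412.
// Byte 0 is the least significant byte, byte 3 the most significant.
uint32_t swap_32bit(uint32_t value) {
    uint32_t swapped = 0;
    swapped |= (value & 0x000000FF) << 24;  // byte 0 -> byte 3
    swapped |= (value & 0x0000FF00) << 8;   // byte 1 -> byte 2
    swapped |= (value & 0x00FF0000) >> 8;   // byte 2 -> byte 1
    swapped |= (value & 0xFF000000) >> 24;  // byte 3 -> byte 0
    return swapped;
}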
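The same pattern extends to 8-byte values. The sketch below is a hypothetical `swap_64bit`, written as a loop rather than eight hand-written masks: it moves byte `i` (counting from the least significant end) to position `7 - i`.

```cpp
#include <cstdint>

// Hypothetical extension to 8 bytes: byte 0 <-> byte 7, byte 1 <-> byte 6, ...
std::uint64_t swap_64bit(std::uint64_t value) {
    std::uint64_t swapped = 0;
    for (int i = 0; i < 8; ++i) {
        // Extract byte i, counting from the least significant end...
        std::uint64_t byte = (value >> (8 * i)) & 0xFFu;
        // ...and place it at the mirror-image position 7 - i.
        swapped |= byte << (8 * (7 - i));
    }
    return swapped;
}
```

GCC and Clang also provide `__builtin_bswap16`, `__builtin_bswap32`, and `__builtin_bswap64`, and C++23 adds `std::byteswap` in `<bit>`; compilers typically turn these into a single byte-swap instruction.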
7. Byte Generation Strategy
Depending on when the conversion happens, there are two strategies:
- (Build stage) Generate the data directly in the target platform’s endianness.
- (Load stage) If the current data’s endianness differs from that of the current platform, swap bytes while loading the data.
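The load-stage strategy can avoid an explicit "is this host little-endian?" check altogether by assembling each value from its bytes with shifts. The sketch below assumes the stored data is big-endian (as in network protocols) and that the caller guarantees `offset + 4 <= data.size()`; `load_be32` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: read a big-endian 32-bit value from `data` at `offset`.
// Shifting each byte into place yields the correct value on any host, so no
// separate swap step is needed. Assumes offset + 4 <= data.size().
std::uint32_t load_be32(const std::vector<unsigned char>& data, std::size_t offset) {
    return (static_cast<std::uint32_t>(data[offset]) << 24) |
           (static_cast<std::uint32_t>(data[offset + 1]) << 16) |
           (static_cast<std::uint32_t>(data[offset + 2]) << 8) |
           static_cast<std::uint32_t>(data[offset + 3]);
}
```

For little-endian data, the same approach works with the shift amounts reversed (0, 8, 16, 24).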