Reversing bit sequences

From WikiCoder

Reverse bits the obvious way

 unsigned int v;     // input bits to be reversed
 unsigned int r = v; // r will be reversed bits of v; first get LSB of v
 int s = sizeof(v) * CHAR_BIT - 1; // extra shift needed at end
 for (v >>= 1; v; v >>= 1)
 {   
  r <<= 1;
  r |= v & 1;
  s--;
 }
 r <<= s; // shift when v's highest bits are zero

Reverse bits in word by lookup table

 static const unsigned char BitReverseTable256[256] = 
 {
 #   define R2(n)     n,     n + 2*64,     n + 1*64,     n + 3*64
 #   define R4(n) R2(n), R2(n + 2*16), R2(n + 1*16), R2(n + 3*16)
 #   define R6(n) R4(n), R4(n + 2*4 ), R4(n + 1*4 ), R4(n + 3*4 )
    R6(0), R6(2), R6(1), R6(3)
 };
 unsigned int v; // reverse 32-bit value, 8 bits at time
 unsigned int c; // c will get v reversed

 // Option 1:
 c = (BitReverseTable256[v & 0xff] << 24) | 
    (BitReverseTable256[(v >> 8) & 0xff] << 16) | 
    (BitReverseTable256[(v >> 16) & 0xff] << 8) |
    (BitReverseTable256[(v >> 24) & 0xff]);

 // Option 2:
 unsigned char * p = (unsigned char *) &v;
 unsigned char * q = (unsigned char *) &c;
 q[3] = BitReverseTable256[p[0]]; 
 q[2] = BitReverseTable256[p[1]]; 
 q[1] = BitReverseTable256[p[2]]; 
 q[0] = BitReverseTable256[p[3]];

The first method takes about 17 operations, and the second takes about 12, assuming your CPU can load and store bytes easily.

Reverse the bits in a byte with 3 operations (64-bit multiply and modulus division)

 unsigned char b; // reverse this (8-bit) byte
 b = (b * 0x0202020202ULL & 0x010884422010ULL) % 1023;

The multiply operation creates five separate copies of the 8-bit byte pattern to fan-out into a 64-bit value. The AND operation selects the bits that are in the correct (reversed) positions, relative to each 10-bit groups of bits. The multiply and the AND operations copy the bits from the original byte so they each appear in only one of the 10-bit sets. The reversed positions of the bits from the original byte coincide with their relative positions within any 10-bit set. The last step, which involves modulus division by 2^10 - 1, has the effect of merging together each set of 10 bits (from positions 0-9, 10-19, 20-29, ...) in the 64-bit value. They do not overlap, so the addition steps underlying the modulus division behave like or operations.

Reverse the bits in a byte with 4 operations (64-bit multiply, no division)

 unsigned char b; // reverse this byte
 b = ((b * 0x80200802ULL) & 0x0884422110ULL) * 0x0101010101ULL >> 32;

The following shows the flow of the bit values with the boolean variables a, b, c, d, e, f, g, and h, which comprise an 8-bit byte. Notice how the first multiply fans out the bit pattern to multiple copies, while the last multiply combines them in the fifth byte from the right.

                                                                                        abcd efgh (-> hgfe dcba)
*                                                      1000 0000  0010 0000  0000 1000  0000 0010 (0x80200802)
-------------------------------------------------------------------------------------------------
                                            0abc defg  h00a bcde  fgh0 0abc  defg h00a  bcde fgh0
&                                           0000 1000  1000 0100  0100 0010  0010 0001  0001 0000 (0x0884422110)
-------------------------------------------------------------------------------------------------
                                            0000 d000  h000 0c00  0g00 00b0  00f0 000a  000e 0000
*                                           0000 0001  0000 0001  0000 0001  0000 0001  0000 0001 (0x0101010101)
-------------------------------------------------------------------------------------------------
                                            0000 d000  h000 0c00  0g00 00b0  00f0 000a  000e 0000
                                 0000 d000  h000 0c00  0g00 00b0  00f0 000a  000e 0000
                      0000 d000  h000 0c00  0g00 00b0  00f0 000a  000e 0000
           0000 d000  h000 0c00  0g00 00b0  00f0 000a  000e 0000
0000 d000  h000 0c00  0g00 00b0  00f0 000a  000e 0000
-------------------------------------------------------------------------------------------------
0000 d000  h000 dc00  hg00 dcb0  hgf0 dcba  hgfe dcba  hgfe 0cba  0gfe 00ba  00fe 000a  000e 0000
>> 32
-------------------------------------------------------------------------------------------------
                                            0000 d000  h000 dc00  hg00 dcb0  hgf0 dcba  hgfe dcba  
&                                                                                       1111 1111
-------------------------------------------------------------------------------------------------
                                                                                        hgfe dcba

Note that the last two steps can be combined on some processors because the registers can be accessed as bytes; just multiply so that a register stores the upper 32 bits of the result and the take the low byte. Thus, it may take only 6 operations.

Reverse the bits in a byte with 7 operations (no 64-bit)

 b = ((b * 0x0802LU & 0x22110LU) | (b * 0x8020LU & 0x88440LU)) * 0x10101LU >> 16;

Make sure you assign or cast the result to an unsigned char to remove garbage in the higher bits.

Reverse an N-bit quantity in parallel in 5 * lg(N) operations

 unsigned int v; // 32-bit word to reverse bit order
 // swap odd and even bits
 v = ((v >> 1) & 0x55555555) | ((v & 0x55555555) << 1);
 // swap consecutive pairs
 v = ((v >> 2) & 0x33333333) | ((v & 0x33333333) << 2);
 // swap nibbles ... 
 v = ((v >> 4) & 0x0F0F0F0F) | ((v & 0x0F0F0F0F) << 4);
 // swap bytes
 v = ((v >> 8) & 0x00FF00FF) | ((v & 0x00FF00FF) << 8);
 // swap 2-byte long pairs
 v = ( v >> 16             ) | ( v               << 16);

The following variation is also O(lg(N)), however it requires more operations to reverse v. Its virtue is in taking slightly less memory by computing the constants on the fly.

 unsigned int s = sizeof(v) * CHAR_BIT; // bit size; must be power of 2 
 unsigned int mask = ~0;         
 while ((s >>= 1) > 0) 
 {
  mask ^= (mask << s);
  v = ((v >> s) & mask) | ((v << s) & ~mask);
 }

These methods above are best suited to situations where N is large. If you use the above with 64-bit ints (or larger), then you need to add more lines (following the pattern); otherwise only the lower 32 bits will be reversed and the result will be in the lower 32 bits.

References