Our first exercise: "how would you compute the parity of a very large number of 64-bit words?"

We are invited to consider the brute-force O(n) solution, where n is the width of the word:

def parity(x):
    result = 0
    while x:
        result ^= x & 1
        x >>= 1
    return result

Parity is one if the number of 1s in the word is odd, else it is zero. Hence this function keeps a running total of the number of 1s mod 2, XOR-ing each bit in turn, starting from the least significant, into a result initialized to 0. Each time a 1 is encountered the result flips, until we run out of word.

The first optimization we are offered is a "bit-fiddling trick" - we can erase the lowest set bit in a word with x & (x - 1). By replacing one-op-per-bit with one-op-per-set-bit we reduce the complexity to O(k), where k is the number of set bits - a saving in the best and average cases.
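A minimal sketch of that version - my reading of the trick, not code from the book:

def parity(x):
    result = 0
    while x:
        x &= x - 1   # erase the lowest set bit
        result ^= 1  # one more set bit seen, so flip the parity
    return result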

Next we are invited to consider the possibilities of caching (as we are dealing with "a very large number" of words). We cannot cache the parity of every 64-bit integer - even at one bit per entry such a table would occupy 2^64 bits, on the order of two exabytes. However, as the parity computation is associative, we can cache the parity of all 16-bit subwords (a mere 65536 possibilities) and look up the results for a quarter of the word at a time, before checking the parity of those combined results.
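Populating the table is left to us; here's a minimal sketch that reuses the bit-clearing trick from above (the name PRECOMPUTED_PARITY matches the lookup code below):

PRECOMPUTED_PARITY = [0] * (1 << 16)
for i in range(1, 1 << 16):
    # i & (i - 1) clears i's lowest set bit, so its parity is already
    # in the table and differs from i's parity by exactly one flip
    PRECOMPUTED_PARITY[i] = PRECOMPUTED_PARITY[i & (i - 1)] ^ 1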

We do need to establish how to derive the indexes for our lookup table. Retrieving the 16 most significant bits is simple... we right-shift 48 places. The next 16 bits are not quite so easy - right-shifting 32 places leaves us with the 32 most significant bits, not the 16 we are interested in. The solution is to apply a mask that filters out everything but the 16 bits we are currently interested in. 65535 is a little unwieldy to write in binary, but we can declare our BIT_MASK to be 0xFFFF and bitwise-AND it with our shifted bits to get what we're after.

def parity(x):
    MASK_SIZE = 16
    BIT_MASK = 0xFFFF
    return (PRECOMPUTED_PARITY[x >> (3 * MASK_SIZE)] ^
            PRECOMPUTED_PARITY[(x >> (2 * MASK_SIZE)) & BIT_MASK] ^ 
            PRECOMPUTED_PARITY[(x >> MASK_SIZE) & BIT_MASK] ^
            PRECOMPUTED_PARITY[x & BIT_MASK])

This reduces our complexity from O(n) to O(n/L), where L is the width of the cached subwords - 16 in our case - once we have populated our lookup table, of course.

The final suggestion is to observe that the commutativity and associativity of XOR allow us to process multiple bits at once. The parity of a 64-bit word is equal to the parity of its top 32 bits XOR-ed with its bottom 32 bits; the parity of that 32-bit result is the parity of its top 16 bits XOR-ed with its bottom 16 bits, and so on. Essentially we halve the size of our problem on each pass, and to hopefully nobody's surprise end up with an O(log n) solution.

def parity(x):
    # each step XORs the top half of the remaining bits onto the bottom half
    x ^= x >> 32
    x ^= x >> 16
    x ^= x >> 8
    x ^= x >> 4
    x ^= x >> 2
    x ^= x >> 1
    # the parity of the whole word has accumulated in the lowest bit
    return x & 0x1
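One caveat worth adding: Python integers are arbitrary-precision, so this version assumes x actually fits in 64 bits - wider values won't be folded correctly. A quick sanity check, with inputs of my own choosing:

assert parity(0b1011) == 1  # three set bits - odd, so parity 1
assert parity(0b1001) == 0  # two set bits - even, so parity 0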

By utilizing these ideas individually or in combination we can surely calculate the parity of a 64-bit word in efficient ways we previously never thought possible, and surely nothing stands between us and fame and fortune now.