Rough Beast

Grifo Mecanico - Diego Mazzeo

Thursday, June 06, 2013

Time's Drillbit

Abstract: By the following scheme, information becomes predictable over longer and longer stretches of the future with greater and greater precision.

Shannon entropy, or data entropy, is the measure of uncertainty in a string of symbols. A fair coin toss has high entropy: in a series of tosses of a fair coin, with the number of heads and tails roughly equal, each subsequent outcome is never dependent on the history of tosses. If the coin is not fair, the history of previous tosses becomes a better and better forecaster of the result of the next toss. If in every series of 100 coin tosses there are between 53 and 57 heads, then there is roughly a 55% chance that any given toss comes up heads, with a variability of plus or minus 2%. If you throw 500 tosses of that coin and you have 250 heads and 250 tails, then this history is 'forcing' the ongoing series of tosses back toward the coin's bias: the running fraction of heads will drift back up toward 55%. Notice that the very next toss has precisely a 55% chance of heads, just as the very first toss had. But over a population of tosses the proportion of heads will "return to the mean".
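
As a quick illustration of that last point (a sketch of my own, not part of the original argument, using the 55% bias from the example above): each individual toss remains a fixed 55% bet, yet the running fraction of heads settles onto the mean as the population of tosses grows.

```python
import random

BIAS = 0.55  # probability of heads for the unfair coin in the example above

def running_fraction_of_heads(num_tosses: int, seed: int = 1) -> None:
    """Toss a biased coin and watch the running proportion return to the mean."""
    rng = random.Random(seed)
    heads = 0
    for i in range(1, num_tosses + 1):
        heads += rng.random() < BIAS
        if i in (10, 100, 1000, 10000, 100000):
            print(f"after {i:>6} tosses: fraction of heads = {heads / i:.3f}")

running_fraction_of_heads(100_000)
# Each toss is still a 55% bet, but the population-level fraction converges on 0.55.
```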

Card counting in blackjack is another example. Even with a multi-deck shoe, the variability in predictions about the frequency of face cards versus regular cards decreases as cards are dealt. In the limit, when only one card remains, we already know what that last card must be with 100% confidence: total predictability is the lowest entropy. But whether it is the first card dealt or the last, the information entropy gets lower as each card in the sequence appears, improving our odds of predicting the next possible hands.
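
A minimal sketch of the same effect in numbers (my own illustration, not from the original post), assuming a six-deck shoe and lumping the cards into 'face' (J, Q, K) versus everything else: as the dealt cards pin down what remains in the shoe, the Shannon entropy of the "is the next card a face card?" question falls to exactly zero by the time one card is left.

```python
import math
import random

def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a two-outcome distribution with P(face) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def deal_through_shoe(decks: int = 6, seed: int = 7) -> None:
    # 12 face cards (J, Q, K in four suits) and 40 other cards per 52-card deck.
    shoe = ["face"] * (12 * decks) + ["other"] * (40 * decks)
    random.Random(seed).shuffle(shoe)
    faces_left, cards_left = 12 * decks, 52 * decks
    for card in shoe:
        faces_left -= card == "face"
        cards_left -= 1
        if cards_left in (200, 100, 50, 10, 1):
            p = faces_left / cards_left
            print(f"{cards_left:>3} cards left: P(face) = {p:.3f}, "
                  f"entropy = {binary_entropy(p):.3f} bits")

deal_through_shoe()
# Knowledge of the dealt cards pins down the remaining composition; with one
# card left, the entropy of the face/other question is exactly zero.
```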

For packet switching machinery, some methods for lowering entropy already exist. The cache memories and branch predictors of the CPU complex lower the energy (at the same clock frequency) spent on predictable information in future calculations. In this way, the ability to forecast the behavior of the software helps forecast the kind of information coming next, lowering information entropy, or uncertainty, in the OS and user code. We have not (yet) considered how we can reduce entropy 'directly' in the Layer 1 and Layer 2 bit fiddling of enqueueing and dequeueing headers and payloads.
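
For concreteness, here is a sketch (my own, not from the original post) of the classic two-bit saturating-counter branch predictor, a textbook example of the kind of "guess about the future" machinery this paragraph refers to: a short history of outcomes is enough to make most guesses correct on a predictable branch.

```python
# Two-bit saturating counter: states 0-1 predict not-taken, states 2-3 predict
# taken; each outcome nudges the counter toward the corresponding saturated end.

class TwoBitPredictor:
    def __init__(self) -> None:
        self.state = 2  # start weakly "taken"

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

# A loop branch that is taken nine times and then falls through, repeated.
history = ([True] * 9 + [False]) * 10
predictor, hits = TwoBitPredictor(), 0
for taken in history:
    hits += predictor.predict() == taken
    predictor.update(taken)
print(f"correct predictions: {hits}/{len(history)}")  # most guesses are right
```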

Improving the manipulation of the information stream 'directly' is what I call a drill bit through time. The parts of the machine dedicated to 'moving the program counter' are the drill-head. Guesses about the next series of steps -- better guesses over a forward-looking series of predictions -- bring the future toward us: incrementally small but still important preparations, made now, for reading the incoming information stream at lower and lower energy per bit. While the drill-head gets better or worse at cache hit/miss and branch taken/not-taken guesses about the future, following the state of the art in hardware, standard software tricks keep improving the OS and the compiler. These good tricks are now in large part 'forced moves', as open source levels the playing field by giving everyone the same standard tool kit of good tricks.

This means that the physical entropy per bit of information (total watts: the watts spent moving bits plus the heat lost to friction) is lowered as a smaller and tighter sub-machine, or ensemble of such machines, attacks the problem of guessing what happens next in the bit stream. That is, the cost of making guesses about the bit stream's future content gets lower and lower as the ensemble of compute means gets better and better at purchasing chunks of the future for less. In physical entropy terms, we can lower the clock frequency and/or the voltage, and therefore the power, to achieve the same result. Our total energy of operation is lower for the same work (we ground away at the same sequence of bits) with lower entropy.
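
A rough back-of-the-envelope version of that last point, with illustrative numbers of my own choosing: CMOS dynamic power scales roughly as P ≈ C·V²·f, so if better guessing lets the same sequence of bits be ground through at a lower voltage and clock frequency, the total energy for the same work drops.

```python
def dynamic_power(c_eff: float, v_dd: float, f_hz: float) -> float:
    """Approximate CMOS dynamic power: P = C_eff * Vdd^2 * f."""
    return c_eff * v_dd ** 2 * f_hz

C_EFF = 1e-9         # effective switched capacitance in farads (illustrative)
WORK_CYCLES = 1e9    # the same sequence of bits ground through in both cases

# Baseline: no help from prediction, so run fast and hot.
p_base = dynamic_power(C_EFF, v_dd=1.0, f_hz=2e9)
e_base = p_base * (WORK_CYCLES / 2e9)      # energy = power * time

# With good guesses about the stream, the same work fits at lower V and f.
p_pred = dynamic_power(C_EFF, v_dd=0.8, f_hz=1e9)
e_pred = p_pred * (WORK_CYCLES / 1e9)

print(f"baseline energy : {e_base:.3f} J")
print(f"predicted energy: {e_pred:.3f} J")  # lower energy for the same work
```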

Think of a head-to-head competition between two variations of a drill bit through time. Drill Bit "A" and Drill Bit "B" each fit in the drill-head and perform otherwise identical work. "A" has more moving parts (or spreads the workload more poorly, or suffers the standard kinds of inefficiencies in computing machinery) than "B", so there is a cost in silicon and watts required to operate "A" w.r.t. "B".

"B" is more efficient than "A" because it has it has fewer transistors and circuit interconnections (we can cram more into each reticle)  and because it requires fewer total state bit transitions (smaller and less energy with energy per state transition held constant). Because of these two factors (and a rank order of other good tricks yielding diminishing investment returns) "B" purchases 'chucks of time' with less energy and therefore lower entropy.

We are getting better and better at drilling through the bedrock of time.

Tuesday, June 04, 2013

A Theory of Everything

What I believe but cannot prove is that constant life (or life that maintains homeostasis from cradle to grave, like mammals and birds) must move forward in time (in the direction of increasing entropy) because entropy production far from thermodynamic equilibrium is required to construct dissipative structures (and those structures upon structures upon structures) that need open thermodynamic systems (systems that exchange energy with layers of open systems in a hierarchy), like stars in galaxies shining on planets which grow quantum observers (or human minds in primate brains) who, by constructing quantum machines (like semiconductors), can deduce things that they believe but cannot always prove.