21 Jan 2002

The information paradigm: a lighthearted philosophical text enquiring into the nature of life and the informational reality it finds itself embedded within.

Ch.0: Layout.

What this text is meant to do: define the human experience as one arising fundamentally from humanity's nature as an information system. Many of us are being told that we live in the information age, yet if we ask a person what information actually is, we will often get a whole range of answers, each with its own nuance. It's the description and instructions on the label of a can of chicken soup. We send it to each other when we speak: through the air when we're nearby, via electromagnetic means when we use telephones, on flattened bits of dead trees when we use the postal service, or as changing patterns of light reflecting off our heads when we smile, pull a face or gesticulate at someone. It's the stuff which makes the content of a library different to the content of a high-rise apartment. It's the stuff which my computer messes up for me with greater speed and precision than I could previously manage with pens, paper and filing cabinets. It's the stuff our brains spend all day thinking about. It's the washing directions on our shirt label, which we no longer have because we ripped the label off after it constantly irritated our neck. It's NOT in the instruction booklet which came with our video recorder, which still thinks the time is midday: the guff inside VCR manuals is raw data, but it isn't information to a lot of people, because it is in a format they can't assemble into a useful relationship that would let them record The Matrix on channel 31 at 20:30 next Wednesday night while they watch Sneakers in the foreground. All of these answers are right in one sense or another.
What I tend not to hear is that information is the stuff which makes matter interesting; that information is the stuff in our DNA which makes us different to chickens, fungi or viruses; that information is the stuff our cells sit around all day acting upon and generating; that information transactions are what bind the ecosystem together; that information is the stuff in mathematical equations; that information is what flows amongst our economies, making it possible for them to work at all (as opposed to elections, which are largely irrelevant), and that in corporate economies money (an information-depleted scalar quantity) is what determines how ownership and control decisions are made; and that the behaviour and nature of information are fundamental to the way these systems work, fail, or evolve. I will attempt to cover some of these things not so much from an experimental point of view as from an experiential one. The personality we have is, after all, to some extent a product of our experiences, and this book is the product of a personality which thinks it is immersed in an information system we call daily life, and which tries to understand it in an information-systemic sense. Ch 1 starts at a mathematical level and proceeds to a molecular level to express some general insights about information, then goes on to define it in terms of concepts known to physics and chemistry. Ch 2 describes the total information content of a system. Ch 3 explains what information processing systems are and what they do, very simply: Feynman's example of compressing a syringe; rocks in or out of a bucket. Ch 4 describes information systems in general, and Ch 5 explains how life fits this general paradigm. Living systems are information processors, and their code, if one assumes it to be a partly finished product of a self-beneficial process of optimisation, should exhibit certain optimisations.
For example: the modularity of proteins (each protein performs one or very few specific functions); the parallelism of ribosomes; the error tolerance and intelligent failure modes of DNA; the modularity of the circulatory system (hence replaceable organs, although specific organ failures will kill you; distributed organs: blood, immune system). Ch 6 then explains the information systems we've built, which is useful for comparative reasons... we've decoded, emulated and obsoleted ourselves! Ch 7 discusses the consequences of this. Ch 8 explores the concept of death in information-systemic terms: massive information loss. Ch 9 describes what the logical extension of information-systemic approaches to living systems implies for getting off this planet (it must happen): humanity is only a step on the evolutionary road and, in the long term, is well doomed. Erosion of the Gaian codebase, and the extreme difficulty of rewriting it to replace extinct species. Note also that proteins represent modularity in programming, which has benefits in terms of maintainability and is part of the reason geneticists are able to tinker with it at all. I cannot think of a monolithic system in nature, though polycistronic genes, or genes with extensive post-processing, might represent remnants of the way an early minimalist information and energy metabolism used to run. Ch 10 explores the likelihood that information-systemic evolutionary paths elsewhere arrive at essentially similar end results of Darwinian selection.

Ch 0:

We humans conceive of ourselves in a number of different ways, depending on the culture we find ourselves immersed within. These cultures contain different tools which enable us to perform this feat of self-conception. Some of us conceive of ourselves in terms of our interactions with other people, or in terms of our roles within various forms of social organisation.
Others do so in terms of the religious or political beliefs which they have absorbed, or concluded from their analysis of various problems and their synthesis of various facts into a useful framework. Yet others might think of themselves in ways which would never occur to the rest of us, since we lack the interpretative framework needed to understand their particular point of view. These people become the architects of their own realities, and invite other people inside their personal models of the world to observe the outsiders' reactions. For the duration of this book I will consider myself such a person, and try to equip you with the tools to understand the universe in the manner I do. Methods of acquiring an identity vary in their effectiveness as tools for forming accurate knowledge of ourselves. Since we are complex organisms, and since we form complex societies and interact in complex ways with the complex ecosystem of which we are an integral part, that identity is likely to vary greatly across the population, and so will its effectiveness as a functional tool for relating to the world. I don't think there's anything intrinsically divisive about this propensity for difference. It's natural. There are many more ways to be different to something than to be the same as it, so difference should be expected. In fact such diversity of identity concepts (the Germans have a great word for it, Weltanschauung, meaning world-view) has considerable long-term advantages from an information completeness and robustness point of view. However, these differences sit on top of what I consider to be a magnificent and fundamental framework of similarities.
In much the same way as some people are at first shocked and then liberated on becoming aware that human beings are also animals, I was shocked and then liberated on arriving at the conclusion that, like many simple and complex devices, and in common with all other biological forms, human beings are, in a fundamental way, information systems. Not only do we consist of information, but our fundamental nature is defined by the laws of information: its nature and the means by which it is transformed. To many people, being told this is initially shocking, insulting and demeaning to their humanity. They interpret the position to mean we are "nothing more than computers". They are correct in making this statement, but they think that being a computer implies being without emotions, creativity or self-awareness, like those deterministic, electronic, data-processing devices which we currently mass-produce and which do their 'thinking' by pushing electrons through meticulously configured pieces of metal and contaminated rock. This is not the case at all. All living systems are computational in their fundamental nature. What they do, by various means, is compute how best to get themselves reproduced. Humans evolved in such a way as to possess computational infrastructure (brains) with processing power surplus to this basic requirement, and, with that requirement fulfilled, the spare processing effort could be directed to other tasks: for example, coming to an understanding of the basic nature of information, and of our own fundamental existence as information-processing entities. I have come to accept my status as an information system. I am not morose about it; in fact I'm rather jubilant to know that I'm at the end of millions of years of Darwinian improvements and increasing computational power. Attaining this awareness has been an illuminating journey.
It equipped me to cure myself of a disease with which I had been infected for nearly two decades (specifically, a very widespread strain of religion which had its origins in Rome about two millennia ago); it equipped me to understand why I age, why language is littered with logical operations and why maths is lossy. It has had some unusual influences on the way I understand things to be, but these have led to what I consider to be delightful insights about my place in the universe.

Ch.1 : Information in numbers.

Many people put mathematics on a kind of pedestal. They quite understandably pay homage to its astounding descriptive and predictive power. Whilst it is an astonishing and extraordinarily useful tool for thinking about the universe, and is significantly responsible for the rise of science, technology and engineering, it is nevertheless a language: a symbol system into which information can be embedded for transmission and storage. As such, one should find the language of mathematics obedient to the laws which govern the behaviour of information. One such law is that certain information transformations are lossy: once I perform them, some of the information present in the data I started with is no longer there in the finished product. Copying is an example of an information transformation, and it always carries some risk of errors in the copy, though certain steps can be taken to make the error rate arbitrarily low. If I take a colour road map and photocopy it on a black and white photocopier, I lose the ability to discriminate between the colours (which are now shades of grey). If some of the cartographic information happens to be encoded in the colours, I lose some of the cartographic information as well, and will be more likely to get lost if I use the copied map to navigate.
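The photocopier example can be sketched in a few lines. This is a minimal illustration that assumes a hypothetical photocopier which turns each colour into a grey level by simply averaging its red, green and blue values (real copiers weight the channels differently); the names `to_grey` and `MAP_COLOURS` and the legend entries are invented for the example.

```python
# Hypothetical colour-to-grey "photocopier": averages the three channels.
def to_grey(rgb):
    r, g, b = rgb
    return (r + g + b) // 3

# Invented map legend: feature -> RGB colour used to draw it.
MAP_COLOURS = {
    "motorway (red)": (255, 0, 0),
    "park (green)": (0, 255, 0),
    "river (blue)": (0, 0, 255),
    "building (grey)": (85, 85, 85),
}

greys = {name: to_grey(rgb) for name, rgb in MAP_COLOURS.items()}
print(greys)

# Four distinct colours go in; count how many distinct grey levels come out.
n_colours = len(set(MAP_COLOURS.values()))
n_greys = len(set(greys.values()))
print(n_colours, "colours ->", n_greys, "grey level(s)")
```

Because several inputs land on the same output, nothing you do to the grey copy alone can tell you which legend entry a given grey level came from.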
It also bears thinking about that the original colour map is itself a crude copy of the actual streets, parks and so on which it claims to represent: it leaves out individual trees, houses, potholes, and changes to the landscape made after the map was compiled and printed. Copying is not the only example of an information transformation. I can take a street map and extract specific kinds of information from it, but only at the expense of losing other information. I can count the streets and measure their lengths, then take an average, which will in all likelihood be a number different to the actual length of any of the streets. I also need to remember that a map is a *sample*: there will always be streets on it which run beyond the map boundary, so my average will suffer from sample bias, because on the map no straight street can be longer than the distance from one corner of the map to the opposite corner, even though the actual street might extend well beyond the map's edges. These losses and inaccuracies are artefacts of our handling of the available information, in this case sampling and averaging. This doesn't just apply to maps; it applies to many other systems. At the risk of being accused of having rocks in my head, I'll give another example here, one I used to have to think about when I trained as an explosives shotfirer in a picrite quarry. Say you build a machine to weigh a bunch of different rocks. It generates a list of weights and counts the rocks... which is why you build a machine to do it. You definitely don't want to count rocks all day yourself. Suppose it is an accurate, precise machine which will weigh rocks down to a hundredth of a gram, but can't handle rocks bigger than 100 grams. The machine will calculate an average, which means that, first, it adds all the measured weights up into one number, and during that process you lose all the information about each rock's individual weight.
Then it divides that total by the number of rocks it measured, which means it takes the summed generalisation about weight and distributes it evenly across all of the rocks. Taking an average is a mathematical way to derive a description of a certain kind of relationship between all the measured rocks: specifically, the relationship between what we know about their total mass and what we know about how many rocks there are. It makes sense to do this. You save subsequent computational effort by not having to write down the description of every rock to get a general idea of the properties of all of them, and you gain speed in acquiring information about a group of rocks, but at the cost of losing specific information about any particular rock. Also, trust me on this, the big guy on the weighbridge at the exit end of the quarry has no interest whatsoever in each specific rock sitting in your ute when you drive onto the weighbridge plate, nor does the bulldozer driver who smears them out over the topsoil before they are sprayed with bitumen to make a road. You can add information to the group of rocks, however. If you need a specific size, you filter them through a screen mesh with a specific, regular hole size, which adds size-specific information to whatever rocks made it through the screen (these rocks are smaller than these holes) and also adds information to whatever didn't make it through (these rocks are bigger than these holes). You pay more for screened rocks, not because each rock is any different after screening, but because you know more about all of them: they fit a particular size range. That matters, because this way you won't end up with lumps from oversized rocks sticking out of your road surface.
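Both halves of the argument above can be illustrated with invented numbers. This is a sketch under stated assumptions: the two piles of weights and the rock sizes are made up, and the 20 mm `HOLE_MM` screen is a hypothetical stand-in for a real mesh. It shows that two very different piles can share exactly the same average, while screening attaches a definite fact (a size bound) to every rock.

```python
# Two invented piles of rock, weights in grams.
pile_a = [27, 28, 27, 29, 28]
pile_b = [5, 60, 1, 50, 23]

avg_a = sum(pile_a) / len(pile_a)
avg_b = sum(pile_b) / len(pile_b)
print(avg_a, avg_b)  # identical averages, very different rocks

# Hypothetical screen with 20 mm holes; invented rock sizes in millimetres.
HOLE_MM = 20
sizes_mm = [12, 45, 8, 30, 19]
passed = [s for s in sizes_mm if s < HOLE_MM]     # now known: smaller than the holes
retained = [s for s in sizes_mm if s >= HOLE_MM]  # now known: not smaller than the holes
print("passed:", passed, "retained:", retained)
```

The average cannot be inverted to recover either pile, whereas the screen's output lists carry an extra, guaranteed property that the unsorted heap did not.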
You might notice the weighbridge operator asks you how much your truck weighs, so that the irrelevant weight of your ute can be removed from the measured weight of your ute AND its load of rocks together, before they bill you for the weight of the rock in the back. Of course, if they don't ask, you'll be cheated out of some money and some rock, since you'll also be paying for your truck's weight in rock without getting any rock for that part of your payment. Let's look at the average itself, which might say that each rock weighs 27.72883 grams. That's mathematically impeccable. In an informational sense, however, it's misleading. You don't learn anything about the relationship between the rocks beyond the first four digits, since the measurements were never made to that level of precision. The information is smeared across a lot of rocks, and an artefact of this smearing is that some of it becomes too finely smeared to be believable. You might think each of your rocks weighs at least 27 grams and at most 28 grams, right? Nope. Measurement sometimes throws up unusual values, and you lose information about them too when you take an average. In your measurements, a rogue chunk of styrofoam might weigh about 1 gram and a rogue chunk of railway track 91 grams, yet the average is still 27.72883 grams per chunk of stuff weighed. It is also valid to say that the rocks average 28 grams to the nearest gram. You don't want to throw away the .7 grams by truncating to 27, but rounding up adds about .27 grams of error into what you know about all the rocks, which isn't much since it's spread over many, many rocks. It also happens that, if you have weighed only a few rocks, it is very unlikely that ANY of them actually weighs in at the average weight.
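A hedged sketch of the weighbridge ideas, using invented numbers throughout: subtracting the ute's weight (the tare) to isolate the load, then averaging a handful of rock weights that include a stray chunk of styrofoam and a chunk of railway track. Rounding to two decimal places stands in for the hypothetical machine's hundredth-of-a-gram resolution.

```python
# Invented weighbridge readings in kilograms.
gross_kg = 2350       # ute plus load of rock
ute_tare_kg = 1600    # the ute on its own, as told to the operator
load_kg = gross_kg - ute_tare_kg
print("bill for", load_kg, "kg of rock")

# Invented rock weights in grams, to a hundredth of a gram,
# including a chunk of styrofoam (1.02) and of railway track (91.00).
weights_g = [27.41, 28.10, 1.02, 91.00, 26.95, 27.88, 28.33, 27.60]

average = sum(weights_g) / len(weights_g)
print("raw average:", average)                    # more digits than were ever measured
print("rounded to machine resolution:", round(average, 2))
print("lightest:", min(weights_g), "heaviest:", max(weights_g))  # invisible in the average
print("any rock exactly average?", any(w == average for w in weights_g))
```

Nothing in the single averaged figure reveals the 1 g and 91 g outliers; only the raw list does.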
You lose a lot of information when you take an average, which is why averages, and statistics in general, are notoriously abused in the media (people have written books about this, and it is the sort of abuse captured in the saying "lies, damned lies, and statistics"). The average person in the street does not exist. Some people who closely approximate the average person might exist. The millions of people in the street are discrete and different in countless ways, and to treat them as averages is not only to demean and ignore their diversity but also to fail to understand them. I'm going to dive in here and play with some numbers, but this is not meant to be a lesson in how to do maths; it's meant to demonstrate what I consider to be the information loss intrinsic to mathematical operations. When we wish to inform someone that, in our opinion, there are twenty-seven of some things of which we have a conceptual awareness, we say, or write, 27, not 27.000000000000, even though both are right. Likewise, we would not write our average rock weight as 27.728830000000. There's no point adding the zeros, because they don't give us any more information; and, as we mentioned before, everything beyond the precision our measuring tools actually achieved was untrustworthy anyway, so additional zeros would tell lies about our accuracy. Significant figures are a big problem for some people, because they are not told, when they are learning mathematics, that what they are dealing with is a symbolic language which describes relationships between quantities; since it is a language, it is an information transmission system, and therefore obeys the rules which determine the nature of information. We are generally taught how to *do* maths, not how to *understand* maths (in the language-instinct sense).
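Writing a number to the right number of significant figures can be mechanised. The helper below is an illustrative sketch, not a standard library function (the name `round_sig` is invented): it finds the position of the leading digit and rounds there, so the written figure claims no more precision than was chosen.

```python
import math

def round_sig(x: float, sig: int) -> float:
    """Round x to `sig` significant figures (hypothetical helper)."""
    if x == 0:
        return 0.0
    # Power of ten of the leading digit: 1 for 27.7 (tens), -2 for 0.0123.
    exponent = math.floor(math.log10(abs(x)))
    return round(x, sig - 1 - exponent)

print(round_sig(27.72883, 2))  # 28.0
print(round_sig(27.72883, 4))  # 27.73
```

Choosing `sig` is the informational decision: it should match the precision the measurement actually had, not the number of digits the division happened to produce.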
It takes a while before we intuitively understand that maths is a language, designed to describe the information contained in relationships between numbers, spaces, and other quantifiable things, and that while the quantities and operators which face us on the page do not mean anything in themselves, the meaningful relationships they describe for us are basically informational in nature. Mathematical equations are a concise, minimalist, unambiguous method for describing the information embedded in the relationships between numbers, and for describing the functions which govern these relationships: this number here relates, like so, to what that number does to this other number. The equation 6 = 2 x 3 informs you differently to 6 = 1 x 6. Similarly, y > 4 - 1 informs you differently to y > 5 - 2. However, most of our maths operations are lossy in the informational sense: the item you get by doing the maths contains, in itself, no information about how you obtained it. A trivial example of an information-preserving mathematical transformation is identity, where you do nothing to your number whatsoever. Another example is reciprocation, where you divide one by some number of your choosing (say, 4). The answer, 1/4, will give you back your original number when you reciprocate it again, and it has embedded within it symbolism suggesting that the number is itself the product of a reciprocation (it fits the form of any reciprocated number: 1 divided by some number, written in the answer itself). Of course, if you forget that the operation you performed was a reciprocation, then 0.25 will not tell you anything about how it was arrived at. The values represented by our numbers have, in themselves, no history.
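The reversible/irreversible distinction can be made concrete with Python's standard `fractions.Fraction`, which keeps values exact so no rounding muddies the comparison. Reciprocating twice returns the starting number; knowing only that some multiplication of two positive whole numbers gave 6 leaves several histories equally possible. This is an illustrative sketch only.

```python
from fractions import Fraction

x = Fraction(4)
once = 1 / x       # Fraction(1, 4)
twice = 1 / once   # back to Fraction(4, 1)
print(once, twice)

# Given only the product 6 of two positive whole numbers,
# list every ordered pair that could have produced it.
product = 6
pairs = [(a, product // a) for a in range(1, product + 1) if product % a == 0]
print(pairs)  # [(1, 6), (2, 3), (3, 2), (6, 1)]
```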
It is worth noting that if the answer to a mathematical question is less than one (1, remember, is a bit: a chunk of information which answers the simplest kind of question, a yes/no one where yes = 1 and no = 0) but more than zero, then our mathematical symbolism forces you to write the answer as a relationship between two pieces of quantity information, not as quantity information itself. A fraction such as 1/3 is a way of writing a relationship between two numbers, but it is not itself a measurable counting number. You could spend the rest of your life and waste a lot of paper writing it out in decimal, too: 0.3333333 (and so on). An example of a reversible logical operation is NOT: if you put in a 1, you get out a 0, and if you send a 0 into a NOT gate, you get out a 1. (AND isn't reversible: an AND gate outputs 1 only for the input pair (1, 1), but outputs 0 for any of three input pairs, (0, 0), (0, 1) and (1, 0), and a 0 says nothing about which of them it came from.) Again, if someone just tosses a 1 in your direction and says nothing about how that 1 was arrived at, the mere presence of the 1 isn't going to tell you that it exists because someone did a NOT(0) at some point in history. These are information-preserving transformations, in the sense that if you have the operation and an answer, you can regenerate the original conditions using only the function and the answer. There are even what might be called information-generating transformations, such as fanout, where a single bit is copied to produce two bits of data. Most of our operations, however, even the common ones we wrestle with every day, lose information. By way of example, let's take a really simple sum, 3 + 2 = 5, and dissect it for its information content.
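Before dissecting that sum, the NOT/AND distinction above can be checked with a short truth-table sketch: group every possible input by the output it produces. An operation is reversible exactly when each output has a single possible input. The helper name `preimages` is invented for the illustration.

```python
from itertools import product

def preimages(gate, n_inputs):
    """Map each output of `gate` to the list of input tuples that produce it."""
    table = {}
    for inputs in product((0, 1), repeat=n_inputs):
        table.setdefault(gate(*inputs), []).append(inputs)
    return table

not_table = preimages(lambda a: 1 - a, 1)
and_table = preimages(lambda a, b: a & b, 2)
print("NOT:", not_table)  # each output has exactly one input
print("AND:", and_table)  # output 0 has three candidate inputs
```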
Since many of us are totally rote-trained to do this sum, it might be best to do it in terms of zeros and ones, that is, in binary: a number system composed entirely of answers to the question "Is there a quantity (as opposed to a non-quantity or, put another way, to the absence of any quantity at all)?", to which the only possible answers are yes (1) and no (0). In any number system, the _base_ of that system is the number of different quantities which can be encoded in a single position of a number. We humans tend to use base ten, which means that in any position we can have up to ten different quantities: a digit can be empty of value (0) or can contain one of nine different quantities, 1, 2, 3, 4, 5, 6, 7, 8 or 9. You are all familiar with these symbols and the relationships between them, and you handle them every day, whenever you measure out quantities of ingredients before cooking them, for instance. The position in which you find a symbol is also important, and that importance is also determined by the base. When you write a series of digits from left to right, in base two or in base ten, the least significant digits are always at the right-hand end of the number. If a waiter makes an error with these digits when you pay your restaurant bill, they are the ones you do not worry about. It's when the waiter makes an error with the most significant (and usually most information-dense) digits at the left-hand end that you make a scene at the counter. As such, we use number systems which, as one reads them digit by digit from left to right, exhibit an information density which collapses exponentially. Digits on the right are piddlingly less significant than digits on the left. The rate of collapse is determined by the base (also known as the radix) of the number system employed.
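The positional rule just described can be sketched directly: each digit contributes its own value multiplied by the base raised to its position, counting positions from zero at the right-hand end. `positional_value` is an illustrative name, not a library function; Python's built-in `int(text, base)` does the same job and is used only to read individual digits.

```python
def positional_value(digits: str, base: int) -> int:
    """Evaluate a digit string in the given base, rightmost digit least significant."""
    total = 0
    # Walk the digits from the right; position p is worth base ** p.
    for position, ch in enumerate(reversed(digits)):
        total += int(ch, base) * base ** position
    return total

print(positional_value("1009", 10))  # 1009
print(positional_value("101", 2))    # 5
print(positional_value("FF", 16))    # 255
```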
The base-10 number 1009 has a big digit on the right, a 9, which by itself contains much more information than a 1; but of course the 1 on the left, boring and information-depleted as it is, encodes a great deal more by being in the very significant 10^3 position, to the left of three other digits. This also shows why 0 is an important number: it has no intrinsic quantity, but not only can it encode "nothing", it can encode it in specific places. In 1009, for example, it tells you there are no tens and no hundreds. Off the top of my head I cannot think of a number system whose digits encode a linear information distribution. Well, actually, I can, but I don't think I'd go so far as to call it a number system. By way of demonstration, I will sit here and encode the quantity 27 in this very primitive way: I will ask a very primitive question 27 times. That question concerns the basis of measurement of a quantity itself, that is, identity, which can be phrased as: something is always equal to itself. Identity is a rigorous benchmark; something either is, or is not, equal to something else. So the question, although it could be any yes/no question, is "Is this item a can of beer?" What this actually means in the physical world, where I might be counting identical beer cans, is a complex pattern recognition job of seeing, feeling and inspecting something and comparing it to my mental checklist of the properties typically exhibited by whatever I think is a beer can.

"Is this a can of beer?"
"Is this a can of beer?"
"Is this a can of beer?"
.
(21 more iterations of the same question)
.
"Is this a can of beer?"
"Is this a can of beer?"
"Is this a can of beer?"

Of course, in asking this question I ignore all the different varieties and states of beer can which I might observe, and throw all of that information away, keeping only the information about the presence or absence of a beer can.
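The one-mark-per-item encoding described here is usually called unary notation. As a quick illustration, the sketch below compares how many symbols the quantity 27 needs in unary and in a few positional bases: the tally grows in step with the quantity, positional notation only logarithmically.

```python
n = 27
unary = "1" * n                # one mark per beer can
binary = format(n, "b")        # positional, base 2
decimal = str(n)               # positional, base 10
hexadecimal = format(n, "X")   # positional, base 16

for name, text in [("unary", unary), ("binary", binary),
                   ("decimal", decimal), ("hex", hexadecimal)]:
    print(f"{name:>8}: {len(text):2d} symbols  {text}")
```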
I'm looking out across the floor of my kitchen after a party, and many items are strewn about the room. Suppose I draw a diagram detailing the location of every item I found which exhibited the identity of a can of beer, and for each identified beer can I write "1". It happens I get this:

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

If we condensed that down from four lines to one line (doing this, incidentally, reduces the distance your eyeballs need to travel to find all of the lone 1's later on, but we'd then know less about the distribution of beer cans across the floor), it would look like this:

11 1 1 11 1 1 11 1 111111 11 1 1 11 1 1 1 1

And if we then removed their clumpiness (which denudes us of even more information about the beer can distribution), it would look like this:

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

There's no hidden message. To extract 27 from the above, you just have to go to the tedious effort of counting up all the ones. When you count them and write 27, you encode this primitive quantity data, splattered around the diagrams above, into a neat, quick, dense symbolism which conveys a lot of rapidly readable information about quantity, but which has nothing to say about the distribution or clumpiness of the 27 beer cans we counted, or even that they were beer cans at all. So much for counting up to 27 and determining what information you've lost in acquiring that number. Let's do some maths. Translating "3 + 2 = 5" into the less familiar lexicon of binary, it becomes:

11 + 10 = 101

What have we done here, exactly? Well, for a start, in going from left to right and doing the addition, we've lost one binary bit of information: we had four bits on the left, and after doing the sum we have three bits on the right. If we did the operation in base 10, we'd have gone from two digits to one, losing one decimal digit, or about 3.32 bits of information.
All this, of course, excludes any accounting for the information lost when, in deriving the answer, we throw away not only a digit but also the operator. In the base 2 number system, which is the smallest practical positional base as far as I know, the original quantities, the three and the two, are encoded as answers to the following potentially infinite set of questions, with significance increasing from right to left:

Is there a 2^3 (2x2x2 = 8)? Is there a 2^2 (2x2 = 4)? Is there a 2^1 (2)? Is there a 2^0 (1)?

Notice that although the answer can still only ever be yes or no, the significance (if you like, the amount of information extracted) of each question doubles each time you move left (in base ten, of course, the significance becomes ten times greater each time you move left). That is, each question is twice as important as the one to its right. Notice also that it matters where the digits are relative to each other. Any digit encodes, in its position and quantity information, not only its own value as a digit but its value in the context of all the other adjacent digits. The answers are:

         8    4    2    1
For 3:   No   No   Yes  Yes
For 2:   No   No   Yes  No
For 5:   No   Yes  No   Yes

The numbers 2 and 3, represented in binary, contain the same quantity of information; that is, they each contain answers to the same questions. That the base-2 number "10" gives a NO answer to the question "Is there a 1?" doesn't mean it contains less information than the base-2 number "11". What we refer to as the _base_ of the number system is the value that actually imposes a given information density on a numeric symbol set, by virtue of the significance (amount of data represented by a given digit) it imposes on the positions of the symbols. For the numbers you and I usually work with, the significance increases exponentially as you read the digits in a number from right to left.
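The yes/no table can be generated mechanically by asking, for each place value, whether that power of two is present. The sketch below does that, and also repeats this text's digit-counting bookkeeping for 3 + 2 = 5 (four binary digits in, three out; one decimal digit worth log2(10), about 3.32, bits). That bookkeeping follows the text's own way of counting rather than a standard information-theoretic measure, and `answers` is an illustrative name.

```python
import math

def answers(n: int, width: int = 4) -> list[str]:
    """Answer 'Is there a 2**k?' for k = width-1 down to 0 (most significant first)."""
    return ["Yes" if (n >> k) & 1 else "No" for k in range(width - 1, -1, -1)]

print("place values:", [2 ** k for k in range(3, -1, -1)])  # [8, 4, 2, 1]
for n in (3, 2, 5):
    print(f"{n}:", answers(n))

# Digit count for 3 + 2 = 5 in binary: '11' + '10' -> '101'.
left_digits = len(format(3, "b")) + len(format(2, "b"))
right_digits = len(format(5, "b"))
print("binary digits:", left_digits, "->", right_digits)  # 4 -> 3

# One decimal digit can take 10 values, i.e. log2(10) bits.
print(f"one decimal digit = about {math.log2(10):.2f} bits")  # 3.32
```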
Base-2 has the most slowly growing significance per digit. Described in base ten, the significance of its twelfth digit reaches only 2048 primitive bits (and all twelve digits together can hold 4095):

   1     1     1     1     1     1     1     1     1     1     1     1
2048  1024   512   256   128    64    32    16     8     4     2     1

Base-10, which we all tend to use from day to day, has a massive increase in significance within the first twelve digits: its twelfth digit is worth a hundred billion (10^11) primitive bits of data or, put another way, the answers to a hundred billion yes/no questions.

100,000,000,000

Hexadecimal is even *more* information-dense, since each digit has not ten potential states but 16. Since mathematicians had already used up all the Greek letters they could get their hands on, Hex (as hexadecimal is often called) has a pretty familiar-looking symbol set: the first six letters were pinched off the alphabet and deployed thus: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, where F = 15 in familiar base-ten script. Incidentally, the reason Benford's digit distribution law works for the number systems we tend to use is precisely that we use number systems where, as you progress along the digits from right to left, the symbols encode progressively more information per symbol, in an exponential way. The Benford's Law equation has a log term to account for this exponentiation embedded within the way we assign significance to our digits.
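Two quick checks on the figures in this passage, offered as a sketch. First, the twelfth digit from the right is worth base**11 in any positional system, which Python's exact integers confirm for bases 2, 10 and 16. Second, Benford's law is usually stated as P(d) = log_n(1 + 1/d) for a leading digit d in base n; it assumes the quantities being described are spread evenly on a logarithmic scale. The function name `benford` is illustrative.

```python
import math

# Place value of the twelfth digit from the right: base ** 11.
twelfth_digit = {base: base ** 11 for base in (2, 10, 16)}
for base, value in twelfth_digit.items():
    print(f"base {base:>2}: twelfth digit worth {value:,}")

def benford(base: int) -> dict[int, float]:
    """Expected share of leading digit d (d = 1 .. base-1) under Benford's law."""
    return {d: math.log(1 + 1 / d, base) for d in range(1, base)}

print(benford(2))                # {1: 1.0}: every binary number starts with 1
print(round(benford(10)[1], 3))  # 0.301: about 30% of decimal numbers start with 1
```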
Benford's Law for a base-n number system, in which the most significant digit is on the far left of the number, gives the expected proportion of numbers whose leading symbol is d (for d = 1, 2, ..., n - 1) as:

$P(d) = \log_{n}\left(1 + \frac{1}{d}\right)$

This is interesting. If we only have 2 symbols in our number system, zeros and ones (1 holds place and quantity, whereas 0 holds place and the absence of quantity), the formula gives $P(1) = \log_{2}(1 + 1) = 1$: all we can expect is that our binary numbers will start with "1" 100 percent of the time, as all binary numbers do when used to count a quantity. There's no point starting with a zero: if you have one more unit than can be expressed by filling all the existing digits with 1, you write another 1 on the left and convert the existing 1's to zeros. You can't pack any more information into the existing number of digits, which is why you write the new (and most significant) 1. We use an exponentiating system of digital numeric quantity description, after all. Benford's law applies to all sorts of fractal systems which are self-similar across many scales, such as the lengths of tree branches, the catchment areas of stormwater drains, and the cross-sectional areas of the branching airways in the lungs (the bronchi and bronchioles). They don't know or care about their quantities; what happens is that mathematicians describe them using number systems with a particular kind of information distribution. Claude Shannon, whom history might eventually recognise as the father of information theory, did care about this sort of stuff, and noted in his landmark 1948 paper "A Mathematical Theory of Communication" that there are about 3.32 bits of information in a decimal digit, but I want to use a system where there is only one bit of information per digit. Anyway, we have these binary symbols and want to add them up, that is, collect them all as a single quantity instead of two quantities. At the most primitive level, we strip away the symbol significance and count the ones.
1 1 1  +  1 1

We are left with

1 1 1 1 1

What does the + do? It's an operator, something which describes an information transformation, which tells us how the information distribution will change. In this case, for the benefit of acquiring information about the primitive information of both groups when combined into a single group, you lose not only information about the original content of each group (three bits of primitive information in the first group, two bits of primitive information in the second group) but also information about how many groups there were, since there is nothing in the number 5 to suggest that it arrived as a result of a + operation previously performed on some other digits. The = symbol is another operator, which denotes equality, or identity. It's not a transformative operator, describing an information manipulation; instead it's a relational one, describing the relationship between the groups of information on each side of it. However, it is much abused in mathematics: we commonly write 3 + 2 = 5 when, in strict, information-preserving usage, we should only write 3 + 2 = 3 + 2. Order sometimes matters too, so writing 3 + 2 = 2 + 3 is right in the quantity sense but not if you're trying to preserve the order of the numbers. When order really matters, we tend to use brackets and do the things inside them first. The common usage of = has a different, very reductionist flavour, however: it implies information loss of one sort to obtain information of another sort. When you have a full equation such as 2 + 3 = 5, you have the direction of information loss encoded, as well as the identity of the primitives which made up the mathematical phrase that goes on to generate something you can encode in a 5.
As you might expect of a number system in which a single digit packs in a lot of information, gives no hint of whether other digits ever existed, and says nothing about where it came from, the simplest term, with the fewest symbols and operators (often on the right-hand side of the equals sign), is usually what we describe as the answer. Interestingly, when we expand something into a series with infinitely many terms, we write the most information-poor term on the left of the = and pile up a potentially infinite number of terms on the right of it. Going from the quantities 2 and 3, the plus operator compels you to arrive at 5; nothing in 5 compels you to go back to 3 and 2. Once you have the 5, if the 2 and 3 were later made unavailable to you, say during an audit, you would have no idea whatsoever how you got that 5. There's a huge and unknowable number of sums which will provide five bits of primitive quantity information, and you can't choose between any of them. For sums with many different numbers, this information-hiding in the summed term forms the basis of what is called the knapsack problem, which has been used in cryptography. Back to our simple sum. Suppose someone gave you a hint: they said some things were added to give you 5. It's a big clue. There is a significant amount of information embedded in the + (add) operator, which narrows down the possibility space to something smaller but still infinitely huge:

$a + b + \dots + c + d + x + y = 5$

But suppose they didn't specify how many times the operator was used, and just said, as above, "some things were added to give you 5". Hopeless! A squillion different things add up to five. For a simple operator like + you can narrow down the possibility space even more by asking how many times it was used. Oh, once, someone tells you. So $x + y = 5$.
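To get a feel for how fast the possibilities pile up, here is a small Python sketch that lists every ordered way of writing 5 as a sum. It assumes the terms are positive whole numbers; over the real numbers, as the text says, there are infinitely many. The generator `compositions` is a hypothetical helper written for this example.

```python
def compositions(total: int):
    """Yield every ordered way of writing `total` as a sum of positive integers."""
    if total == 0:
        yield ()
        return
    for first in range(1, total + 1):
        for rest in compositions(total - first):
            yield (first,) + rest


all_sums = list(compositions(5))
two_terms = [terms for terms in all_sums if len(terms) == 2]
print(len(all_sums))  # 16 ordered sums, counting the bare (5,)
print(two_terms)      # [(1, 4), (2, 3), (3, 2), (4, 1)]
```

Even under this restriction, knowing only that "some things were added" leaves 15 genuine sums; learning that + was used exactly once cuts the field to 4, and nothing in the 5 itself says which of those 4 it was.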
We can draw the relationship (which is a line, actually) of numbers x and y which, when subjected to the + operator, will add to make 5, losing their position on the line and condensing down to a quantity (five, as it happens) which embeds within it no particular fingerprint of its original components. Given the data above, that there were two terms and a + operator, we have enough information to know ALL of the pairs of numbers which, under the information loss of the + operator, leave five primitive bits of information as a residue, but we still don't know specifically what x and y actually are. You know they were 2 and 3, since you saw them a few paragraphs above, but you can't tell that from just looking at a 5. Just like money: you don't know what it did before it got to you. What do our numbers actually encode, then? I'll do this in reference to my last paycheck, which was $152. The data on my paycheck is written in a numeric symbol format which:

1) Encodes a vector (the direction in which you should read the digit positions). I would be happier if I were paid $251 than $152.

2) Encodes a significance coefficient defining the relationship between numbers in adjacent positions (this is called the _base_). In hexadecimal, where each digit is 16 times more significant than the one to its right, my paycheck is:

 $256   $16   $1
    1     5    2

which means I am paid $256 + $80 + $2 = $338 in base ten.

3) Encodes a number of symbols, each encoding chunks of quantity information itself. Convert all of the figures in your paycheck to, say, 1 or 0 for further clarification. $000 is a really unpleasant paycheck.
Furthermore, a paycheck for $152 written out in primitives as:

1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+
1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+
1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+
1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+
1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1

although mathematically impeccable and legally tenderable, is likely to irritate the crap out of your bank teller when you try to cash it.

4) Encodes an assumption that the digits shown really are all of them. That is to say, our number symbol system encodes zero redundancy and has no inbuilt error checking, which has given rise to the entire accountancy industry. When digits are inserted or removed, lots of significance data is misassigned. Zero was a big help when it was invented, because you could finally talk about quantities which did not really exist, but to which other numbers had a relationship, so you could write down digits and preserve their positional information, which in any case is partly related to the radix used.

5) Encodes an assumption about what base it's in. You can't tell the base from a single number; you can only estimate it by looking at many, many numbers and seeing how many different symbols turn up. Of course, if you already know it's written in base ten, then you know how the magnitudes work. If you don't, then you have to work it out. My $000152.000 paycheck can turn into $1520.00 or $15.2000 if we change the significance positioning of the digits. It can also suffer other mutations, such as losing digits and hence transmuting from $152 to $15 or $52. The behaviours exhibited by number systems are themselves artefacts of the behaviour of the information they actually encode. There are terrific jokes about the nature of informational errors in mathematics.
Teacher: "With 1.5 you remove the decimal point to make 15. Where is the decimal point now, Michael?"
Michael: "On the duster."
Teacher: "What is half of 8?"
Michael: "Two zeros, one on top of the other."
Its parallel in linguistic circles is:
Teacher: "Michael, B-R-I-X does not spell 'bricks'."
Michael: "Well, then, what DOES it spell?"
When we write a symbol like 324, we have a LOT of information encoded in that symbol set, which we are not consciously paying any attention to whatsoever when we do the math, since we learned how to _use_ it and forgot how to _understand_ it. Numbers and equations encode information, and information is what we're dealing with when we do the sums. Our mathematics is a symbol processing system which shows us quantities of information and also the relationships between quantities of information. Simple functions can lose the information you put into them, or add information to the answer which you did not expect. If you have a -ve number and square it, you will lose the -ve sign. On the other hand, if you take that squared number and square-root it, you could have started with either a +ve number or a -ve number, and you have no way to know other than your perceived reality that you tend not to see any -ve quantities of things around the place. Squaring loses information about sign. My final shot on this subject of information loss pertains to numbers on the Argand plane, which are referred to as imaginary, or complex, numbers. They embed within them some very strange relationships which are lost irreversibly when subjected to multiplication by themselves. The imaginary unit i is written out in full as $i = \sqrt{-1}$, and it does not have any real, quantifiable existence. You cannot buy a jar with i grams of olives in it. Why? Well, first, you cannot have -1 olives. When one puts a - sign in front of a number, one immediately understands it to be negatively relative to something else...
a -ve number is not a countable quantity; you can only infer it from what other quantities are missing somewhere else. If I have lost two kilos of body mass since last week, then my weight compared to last week's weight is -2kg. Second, you cannot square-root an olive without attracting the attention of people standing nearby... besides that, you'd have to first find something which, when multiplied by itself, gave rise to an olive. So this imaginary number, called i, has two pieces of information in it. First, it's a root, a number with a functional behaviour stuck on it; in this case the behaviour is that, when multiplied by itself, it generates -1. No real number behaves like that, but the relationship has to be coded this way given the constraints of our symbolism. Second, it's signed: the number under the root sign is negative, which implies certain behaviour when i is multiplied with, or added to, other numbers. Now if you saw a normal number like 3 and multiplied it by itself twice, that is, you cubed it, you'd get a nice number, 27.

3 x 3 x 3 = 27   (three cubed)

These normal numbers will produce what you started out with if you reverse the operation you just did. If you take 27 and cube-root it, you get 3 back, which is what you started out with. Note that you have to cube-root it; if you square-root 27 you get a little more than 5 (about 5.196), which is obviously not the 3 we started out with. But try cubing i. $i^2 = -1$, so $i^3 = i^2 \times i = -i$. OK, great. Now if you take the principal cube root of the $-i$ you just got, you get $\frac{\sqrt{3}}{2} - \frac{i}{2}$ (about 0.866 - 0.5i), not i. You have lost all that whacky root and sign information even though you kept your quantity (the distance from zero, still 1) intact and did the exact reverse of what you did when you cubed it. It appears, then, that what information you lose from a mathematical transformation depends in part on what kind of numbers you feed into it, and also on the information-transformative nature of the mathematical operator itself.
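Both kinds of loss can be checked numerically with Python's standard `math` and `cmath` modules. This is a sketch under one stated assumption: "the" cube root means the principal root, whose angle is the original angle divided by three. `principal_cube_root` and `all_cube_roots` are illustrative helpers, not library functions.

```python
import cmath
import math

# Squaring throws away the sign; math.sqrt only ever returns the non-negative root.
for x in (3, -3):
    print(x, x * x, math.sqrt(x * x))


def principal_cube_root(z: complex) -> complex:
    """Principal cube root: cube-root the modulus, divide the angle by three."""
    r, phi = cmath.polar(z)  # phi lies in (-pi, pi]
    return cmath.rect(r ** (1 / 3), phi / 3)


def all_cube_roots(z: complex) -> list[complex]:
    """All three cube roots, spaced a third of a turn apart."""
    r, phi = cmath.polar(z)
    return [cmath.rect(r ** (1 / 3), (phi + 2 * math.pi * k) / 3) for k in range(3)]


i = 1j
i_cubed = i * i * i                  # -i
back = principal_cube_root(i_cubed)  # about 0.866 - 0.5i, not i
print(i_cubed, back, abs(back))
print(any(abs(root - i) < 1e-12 for root in all_cube_roots(i_cubed)))  # True
```

i is still one of the three cube roots of -i; cubing simply throws away the information about which of the three you started from, and the principal-root convention picks a different one.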
Maybe this all sounds pretty useless. Mathematicians will scream at me and say, "Well yes, but how much information did you lose, smarty pants?" There is actually a calculus of information for all symbolic systems, including not just languages with alphabets made of symbols but also number systems made of symbols, such as mathematical sums and equations. Claude Shannon's work in 1948 enables us to quantify the information loss or gain in a mathematical sentence. In "A Mathematical Theory of Communication" he stated that:

"If the base 10 is used the units [for measuring information] may be called decimal digits. Since $\log_2 M = \frac{\log_{10} M}{\log_{10} 2} = 3.32 \log_{10} M$, a decimal digit is about 3 1/3 bits."

What this means is that if I write a number like 3783, then since it contains 4 digits, and each digit is a symbol from a number system with radix 10 (base ten), each digit contains 3.32 bits of information, so the whole number contains 4 x 3.32 bits, which amounts to a little more than 13 bits of information at the symbolism level. Since you can't store a fraction of a bit, lossless encoding of any four-digit decimal number needs a whole number of bits, so you'd round up and need 14 bits ($2^{14} = 16384$, enough to cover 0 to 9999). What it also means is that the digit 0, which by some conventions isn't even counted as a natural number, contains just as much information as any of the other digits whenever it is used in a radix-R system, though the information content of a zero varies with the radix R employed, like so:

$\frac{1}{\log_R 2} = \log_2 R$ bits of information per digit.

So if you use a 0 in base-2 it holds only one bit of information, in base-10 it holds 3.32 bits, in octal it encodes 3 bits, and in hexadecimal it encodes 4 bits. It also has the consequence that you cannot encode any information per symbol in a number system of radix R < 2: a radix-1 symbol carries $\log_2 1 = 0$ bits, so a symbol needs at least two possible states, and then carries at least one bit.
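A minimal Python sketch of the symbol-level accounting used here and in the next few paragraphs (the 3783, subtraction and 4095/FFF examples). It assumes, as the text does, that every symbol carries $\log_2 R$ bits for radix R, and it uses the exact $\log_2 10 \approx 3.3219$ rather than 3.32, so totals can differ from the prose in the second decimal place. Helper names are illustrative.

```python
import math


def bits_per_digit(radix: int) -> float:
    """Bits carried by one symbol drawn from `radix` possible symbols: log2(radix)."""
    return math.log2(radix)


def symbol_bits(text: str, radix: int = 10) -> float:
    """The text's symbol-level measure: number of symbols times bits per symbol."""
    return len(text) * bits_per_digit(radix)


for radix in (1, 2, 8, 10, 16):
    print(radix, round(bits_per_digit(radix), 2))  # radix 1 carries 0 bits

# 3783: about 13.29 bits of symbols; 14 bits cover every four-digit decimal
# number; the number 3783 itself fits in 12 bits.
print(round(symbol_bits("3783"), 2), math.ceil(math.log2(10 ** 4)), (3783).bit_length())

# Subtraction: bits in the symbols on the left minus bits in the answer.
for a, b in ((1_000_001, 1_000_000), (1001, 1000)):
    cost = symbol_bits(str(a)) + symbol_bits(str(b)) - symbol_bits(str(a - b))
    print(f"{a} - {b}: about {cost:.2f} bits")  # about 43.19 and 23.25

# Radix conversion: 4095 -> FFF loses symbol capacity; converting back restores it.
as_hex = format(4095, "X")
print(as_hex, int(as_hex, 16), round(symbol_bits("4095"), 2), symbol_bits(as_hex, 16))
```

The last line makes a point the prose only implies: the conversion to FFF loses symbol capacity, not quantity, since 4095 fits exactly in 12 bits either way.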
You therefore have quite a lot of information loss when dealing with primitive number operations using numbers with a lot of digits. Suppose in base ten I take 1,000,001 and remove a million from it. It's a sum a child can do, and gives a total of 1:

1000001 - 1000000 = 1

I start out with 7 x 3.32 bits of information in each seven-digit number, 14 x 3.32 (about 46.5) bits on the left in total, and end up with 3.32 bits of information on the right. So it cost about 43.2 bits of information to arrive at the 3.32 bits of information in the answer. If I took 1000 from 1001, I'd still get 1, but I'd start with only 2 x (4 x 3.32), or about 26.6 bits, and so lose only about 23.2 bits to get it. Whacky, huh! I figure this explains neatly why it takes longer to explicitly calculate large sums than small ones, even though it is apparent to humans immediately that when you take a million from a million and one you are left with one, just as when you take ten from eleven. But that's because we probably cheat and notice straight away that the two numbers differ by one, a pattern we've learned to look for since it has the payoff that you can crunch a couple of big numbers quickly. But hey, what's this fractions-of-a-bit stuff? How can you have a fraction of a bit, or a bit which doesn't know if it's 0 or 1? Well, I think it's what happens when you try to impose the information embedded in numbers based in one radix onto numbers based in a different radix. The offcuts of cookie dough don't cease to exist just because they don't fit into your cookie cutter. We're not specifying that we have to know what state the bit is in, just that it exists, and a fraction of a bit is like any other fraction: it implies an information quantity relationship. I don't know if a fractional bit could be 0 or 1. If we take the number 4095 in base ten and convert it to base 16, giving FFF (base 16), we have lost information here too.
It took 3.32 (bits per base-ten decimal digit) x 4 (digits) = 13.28 bits of information in the first number, and the second, FFF, has three hex digits (at four bits per digit) which contain only 12 bits. And of course we don't know anything about where the FFF might have originated if that's all we're given. However, what's nice about radix conversion is that if you then convert back to your original radix, you acquire all your original information again. I'll stop here, but this paradigm works all the way from simple algebra up to tensor calculus, Galois theory, phase spaces, relativity and even systems in which numbers are not even real (you saw that above for the cube rooting of i-cubed). Maths is a powerful, rigorous language; it allows us to cook up some pretty complicated sentences and then look to see what these sentences actually mean in the real world. Nonetheless, it's a language all the same, and what languages do is encode and transmit information, which includes 1) raw information: "there were six chickens"; 2) information about information: "there were six chickens last time we looked"; and 3) information about relationships between information: "there were six chickens last time we looked, which is one more chicken than the time before that." Sentences full of self-descriptive information become self-referential and complex; commonly, humans can apply about four levels of attribution to items referred to within a single sentence. We use languages so transparently that we forget to pay any attention to the ephemeral nature of information, the actual stuff they encode and describe. We sieve out interesting information and casually lose the rest. On a side note, I think it is possible to quantify the information content in mathematical sentences.
Such sentences, called equations, describe the relationships between numbers. Normal numbers, such as 0, 1, 3, 7 and 15, embed within them the answers to a series of yes/no questions, which is why binary numbers are most commonly used to represent them in a digital computing environment, where yes and no translate easily into the two states of a transistor, namely on or off. Each operator embeds within it a truth table which describes unambiguously the information the operator produces given the information it is fed. The truth table contains a certain number of primitive symbols describing pattern matching for the input and the nature of the subsequent output. You count *all* of the primitives in the truth table (the 0s and 1s all encode information in this case, since taking any of them away ruins the truth table), and the number you get is the primitive information content of the operator. These numbers are easily computed for Boolean logic operators, flip-flops and memory devices. For something like addition in binary the count can be quite large: although the act of adding a bunch of primitive data elements is pretty simple and logically undemanding, the + operator knows nothing about _how many_ times it will need to be invoked to complete a sum, so within the truth table for binary addition is a kind of escape clause whereby the quantities being added themselves determine how many times the addition operation takes place. Numbers define within their quantities the amount of information processing required to manipulate them mathematically. This is demonstrated straightforwardly with a cheap desktop calculator: it takes longer to do operations on large numbers than it does to do the same operations on small numbers. What's nice about these truth tables is that they don't change when you use them to transform other information by the methods they specify.
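A sketch of the truth-table counting described above, in Python. The counting rule (every 0 and 1 in the table counts) is the text's own measure, not a standard metric; the adder is ordinary ripple-carry addition, swapped in here to show the number of full-adder steps growing with the size of the operands. All names are illustrative.

```python
from itertools import product


def truth_table(fn, n_inputs: int) -> list[tuple]:
    """One row per input combination: the inputs followed by the outputs."""
    rows = []
    for bits in product((0, 1), repeat=n_inputs):
        out = fn(*bits)
        rows.append(bits + (out if isinstance(out, tuple) else (out,)))
    return rows


def primitive_count(table: list[tuple]) -> int:
    """The text's measure: every 0 and 1 in the table counts."""
    return sum(len(row) for row in table)


GATES = {
    "AND": (lambda a, b: a & b, 2),
    "XOR": (lambda a, b: a ^ b, 2),
    "half adder": (lambda a, b: (a ^ b, a & b), 2),  # (sum, carry)
    "full adder": (lambda a, b, c: (a ^ b ^ c, (a & b) | (c & (a ^ b))), 3),
}
for name, (fn, n) in GATES.items():
    print(name, primitive_count(truth_table(fn, n)))  # 12, 12, 16, 40


def ripple_add(x: int, y: int) -> tuple[int, int]:
    """Add two non-negative integers one bit at a time; return (sum, full-adder steps)."""
    result, carry, steps, position = 0, 0, 0, 0
    while x or y or carry:
        a, b = x & 1, y & 1
        result |= (a ^ b ^ carry) << position
        carry = (a & b) | (carry & (a ^ b))
        x, y, position, steps = x >> 1, y >> 1, position + 1, steps + 1
    return result, steps


print(ripple_add(3, 2))                   # (5, 3)
print(ripple_add(1_000_001, 1_000_000))   # (2000001, 21)
```

The table for a full adder never changes, but the number of times it has to be consulted does: three steps for 3 + 2, twenty-one for a million and one plus a million, one per bit of the answer.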
I dealt with rocks when I studied geology. They're actually pretty interesting on their own, since without even trying they record an awful lot of information about what was happening both before and during the period they were formed, and what happened to them afterwards (were they buried, heated, squeezed, sheared, melted, cooled slowly or quickly, what things lived in them, what was Earth's magnetic field doing at the time, etc.). You'd ordinarily never think the strata were stuffed so full of ancient history. Information embedded in rocks is fundamentally important for all sorts of reasons, mainly related to getting the minerals and energy which enable us to live a lifestyle with metals, mineral products, hydrocarbons, and various kinds of industrially useful and aesthetically captivating crystalline minerals. I also studied chemistry, mainly to avoid dealing with rocks... rocks didn't really grab me the way biochemistry did much later. The earth was certainly a data processing system, with plate tectonics and vulcanism and erosion and a hundred other processes which transformed, say, a few hundred million tonnes of old forest into a deposit of almost pure carbon, but it didn't seem to be going anywhere with it, or learning anything, or accumulating knowledge in a manner which led it to do things any differently later on. The processes of sedimentation, igneous activity and metamorphism, the three progenitors of information-bearing rock, had not changed for aeons. I couldn't convince myself that the rocks were learning anything new, or doing anything differently, even though they'd had a long, long time to try out novel chemistry. Chem was a difficult and interesting subject, but I took away from it more questions than answers.
I had this feeling, as I looked at the funny pictures on the blackboard, that aside from the electron-pushing, the generation of heat and much pronunciation of crunchy Germanic descriptive terminology, there was something else going on here. Reactions happened: stuff A and B were changed into stuff C, with unpleasant side-reaction product D sticking obstinately to the wall of the glassware and necessitating hours of cleaning later on. But there was something fundamentally different about the products with respect to the reagents. The relationships between their atoms were changed, and the behaviour of the product was in most cases entirely different to the behaviour of whatever materials had gone into making it. It was obvious to me that, when faced with a homologous series of molecules, say the aliphatic hydrocarbons (methane, ethane, propane, butane and so on), there was something more going on than the simple addition of a carbon and a pair of hydrogens as one progressed along the series. That something was the change in their information content. Not only were bigger molecules more complex, but the number of alternative configurations you could put them into grew extremely quickly as the number of atoms increased... the number of alternatively configured molecules that a specific molecule was NOT grew rapidly. Hmmmm. The main thing I didn't learn in chemistry was what we were doing in terms of changing the information content of the molecules themselves. Why was it that the more complex the molecule, the longer it took to unambiguously write down its name or depict it in my notes, and the longer it took to synthesise it? Chemists have many different ways of describing molecules, and most of them are a kind of shorthand.
You can take a molecule like mescaline (a phenylethylamine found in certain cacti) and, in its most information-denuded form, write symbols which describe its empirical formula:

C11H17NO3

But this could really describe lots of actual molecules, which would have the same number and kinds of atoms but bonded to each other differently. A more informative description, to a chemist who knows the shorthand, is:

(MeO)3PhEtNH2

which means there's an aromatic six-carbon ring, three methoxy groups stuck on it, and an aminated (not animated!) ethyl group also stuck on it. This is more useful but also potentially describes several molecules, depending on where things are stuck on the ring: there are six places for attachment on such a ring and four items to stick on it, so depending on how you shuffle them around you get quite a lot of variation. For a more useful description we turn either to a naming scheme designed to unambiguously describe the molecule, or to drawing pictures of it. IUPAC came up with a naming scheme to unambiguously describe molecules, and it's a mouthful to use. In this scheme mescaline is 2-(3,4,5-trimethoxyphenyl)ethanamine, more commonly written 3,4,5-trimethoxyphenethylamine. Given that description, a chemist can draw a picture which looks something like the actual molecule:

(add picture)

But that picture assumes certain things. Since the C-C bonds can be rotated about, then depending on when you look, you're apt to find the molecule spatially configured like this instead:

(add pictures of different conformational states)

You can use an ORTEP (Oak Ridge Thermal Ellipsoid Plot) diagram, which draws each atom as an ellipsoid showing where it will probably be at such-and-such a temperature. Interestingly, the hotter the molecule, the larger the ellipsoids, that is, the more diffuse the atoms' likely positions.

(show ORTEP picture for a molecule)

There is a reason for the increasing complexity of the description. Molecules, apart from containing atoms and energy, also contain information. The more complicated the molecule, the more information it contains.
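The "quite a lot of variation" on the ring can be counted. This Python sketch assumes the four substituents are the aminoethyl side chain plus three identical methoxy groups, with hydrogens on the two remaining ring positions, and treats two arrangements as the same molecule if a rotation or flip of the ring maps one onto the other. The helper names are illustrative.

```python
from itertools import permutations


def dihedral_variants(ring: tuple) -> list[tuple]:
    """All 12 rotations and reflections of a six-position ring."""
    variants = []
    for k in range(len(ring)):
        rotated = ring[k:] + ring[:k]
        variants.append(rotated)
        variants.append(rotated[::-1])
    return variants


def lowest_locants(ring: tuple) -> tuple:
    """Methoxy positions numbered from the side chain (position 1); lowest set wins."""
    return min(
        tuple(i + 1 for i, group in enumerate(v) if group == "OMe")
        for v in dihedral_variants(ring)
        if v[0] == "side chain"
    )


# One aminoethyl side chain, three methoxy groups, two ring hydrogens.
groups = ("side chain", "OMe", "OMe", "OMe", "H", "H")
labelled = set(permutations(groups))
distinct = {min(dihedral_variants(r)) for r in labelled}
print(len(labelled), len(distinct))  # 60 6
print(sorted(lowest_locants(r) for r in distinct))
```

Of the six distinct ring patterns, the 3,4,5 pattern is mescaline; the other five share both its empirical formula and its (MeO)3PhEtNH2 shorthand, which is exactly the ambiguity the fuller name removes.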
Additionally, if a molecule is very complicated, then not only does it come from a large family of possible related alternatives, but it is increasingly rare, insofar as it is one configuration amongst potentially millions of others very much like it but not exactly the same. Take, for example, the carbohydrates, the chemical superfamily whose members include the sugars, many of which are present in your body. Carbohydrates (as distinct from hydrocarbons; ever eaten a wax sandwich?) have hydrogen and oxygen in a 2:1 ratio. Now, not every molecule answering that description is a sugar: you could easily include acetic acid (the smelly stuff in vinegar), phloroglucinol or lactic acid (the stuff popularly blamed for making your muscles hurt when you exercise) in that description. By dint of being a sugar, a molecule is delineated as NOT being any of these other types of molecule. More generally, if you discover that you could assemble n distinct molecules from the atoms given in some particular empirical formula, then the molecule that you do create represents one chemical configuration amongst n possibilities. The actual molecule not only represents itself, but also represents the absence of all of the other possible molecules it might have been. The value of n becomes large quickly, and grows at rates which quickly outstrip any increase in the number and kind of atoms in a given empirical formula, because each additional atom you add to the empirical formula adds ever greater loads of permutations and combinations to the list of what the actual molecule could possibly have been instead of whatever it actually is.
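The growth in n is easy to see in the alkanes, the homologous series mentioned earlier. This Python sketch hard-codes the well-established counts of constitutional (structural) isomers of CnH2n+2 for n = 1 to 10, with stereoisomers ignored, and, anticipating the $\log_2$ measure developed later in the text, expresses each count in bits.

```python
import math

# Well-established counts of constitutional (structural) isomers of the
# alkanes CnH2n+2; stereoisomers are not counted.
ALKANE_ISOMERS = {1: 1, 2: 1, 3: 1, 4: 2, 5: 3, 6: 5, 7: 9, 8: 18, 9: 35, 10: 75}

for n, count in ALKANE_ISOMERS.items():
    bits = math.log2(count)  # 0 bits when the formula allows only one structure
    print(f"C{n}H{2 * n + 2}: {count:2} isomers, {bits:.2f} bits per molecule")
```

Towards the end of this range each extra CH2 unit roughly doubles the count, which by the text's measure is close to one extra bit per carbon.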
The number of possible molecules you can build out of a specific set of specific numbers of different atoms corresponds to a number of configuration-specific information states, and each actual configuration carries an information value that grows with the logarithm of the number of possible isomeric configurations obtainable from the empirical formula. Consider a simple molecule like bromine (not *too* closely... it's a red-brown liquid at room temperature which gives off an acrid-smelling red-brown vapour). It has a simple empirical chemical formula, 2(Br), and the constituents of this formula describe two states, Br2 and 2Br, since if you biff the bromine molecule (Br2) with enough energy it will fly apart into two bromine atoms (2Br). The reverse also applies; there is an equilibrium between the two processes. The empirical formula describes only the quantities of each type of atom, and says nothing about whether they're bonded or not. You can say that the pair of bromine atoms in the empirical formula can occupy one or the other of two states: chemically bonded or not chemically bonded. Between them, these two states give one binary bit of information, much like a 1 or 0 in an electronic data system, or like a light bulb (on or off). We say binary bit because we have two states to talk about for the pair of bromine atoms. If we had a different system with three states, that would be a ternary system, and a system with four states would be a quaternary system. For a system with ten states we have a decimal system, like our numerical system, where any one of ten symbols (one state of ten possible states) fills each digit position. Even if we restrict ourselves to a specific empirical formula and specific connections between the atoms, there are configurations which differ only in certain aspects of their symmetry despite all the atoms being connected together the same way.
This includes the diastereomers, the enantiomers, and certain kinds of conformational isomers (for example, what are called the chair and boat conformations of cyclohexane, each of which is, chemically speaking, just as cyclohexane-ish as the other). As for enantiomers, these molecules are made of atoms identically connected up, except you discover that there are two types of molecule: when you separate them out into their specific handedness (left- or right-handed forms), each will rotate plane-polarised light in the opposite direction, and their crystals will be mirror images of each other. The description might change, and the spatial configuration might change, and in a setting without any handedness of its own it is still, chemically, the same molecule. Put it into a handed (chiral) reaction setting, however, and one form may react differently from its mirror image, or not at all. There are also molecules which, when you do chemical reactions, act as if they were two different molecules: their chemical bonding description actually changes in real time into something else for a little while, then changes back into the usual state (this is called tautomerism). One needs progressively more symbols to describe the more complicated molecules accurately; that is, one needs more information per message. If we think back a little to the two-state bromine system (where the states were association or dissociation), then since each bromine molecule can store information by the presence or absence of a covalent bond between its constituent atoms, any Br2 can store one bit of information. It has one actual state out of two possible states, both of which have empirical formula 2(Br).
Information theoretician Claude Shannon gave the following relationship for the bitwise (two-state) information content of each state in a ten-state system:

$\log_2 M = \frac{\log_{10} M}{\log_{10} 2} = 3.32 \log_{10} M$

This 3.32 is the reciprocal of $\log_{10} 2$; more generally, converting to bits from any other logarithm base means dividing by the logarithm of 2 in that base. In bromine's case, the presence or absence of the bond signifies a 0 or 1 bit. In more complicated systems, sets of isomers should exhibit conserved numbers of bonds, if we can assume that chemical information is stored in chemical bonding. Shannon's information description can be generalised to systems with potentially huge numbers of states. The number of molecular configurations physically permitted to a bunch of atoms in an empirical formula is some number; call it p. p therefore represents a radix, which determines how many possible states are permitted per symbol in a symbol system where a physical molecule is considered a symbol. (For example, hexadecimal has 16 symbols and is a radix-16 number system, where each symbol must be one of 16 possible symbols; the number of bits in a hexadecimal digit is 4 because $\frac{\log_{10} 16}{\log_{10} 2} = 4$.) So, referring in base 2 to the molecular information content, we have b base-2 bits per molecular configuration state:

$b = \log_2 p$

As p, the number of possible molecules which could have been configured out of the atoms in an empirical formula, increases, the quantity of information in a molecule increases, though the number of available isomeric "states" has to double before you can store one more bit per molecule (it needs to double because we're describing the system using bits, which have two states).
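A Python sketch of $b = \log_2 p$, including values of p that are not powers of two (10 is the ethanol example used later in the text; 75 happens to be the number of structural isomers of decane, C10H22), and a check of the doubling rule. The helper name is illustrative.

```python
import math


def bits_per_configuration(p: int) -> float:
    """b = log2(p): bits carried by one actual configuration out of p possible ones."""
    return math.log2(p)


for p in (2, 10, 16, 75, 1024):
    print(f"p = {p:4}: b = {bits_per_configuration(p):.2f} bits")

# Doubling p always buys exactly one more bit.
for p in (10, 75, 1024):
    print(p, round(bits_per_configuration(2 * p) - bits_per_configuration(p), 12))
```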
Some relationships are given below:

Possible isomers (p)    Bits per actual isomer (b)
         2                        1
         4                        2
         8                        3
        16                        4
        32                        5
        64                        6
       128                        7
       256                        8
       512                        9
      1024                       10

Fairly obviously, the number of possible isomers equals 2 raised to the number of bits stored in a given isomer ($p = 2^b$). Since one normally thinks in terms of what a molecule is, rather than what else it could have been within the constraints dictated by the atoms from which it is made, this relationship is generally not obvious. In any chemical system, complex molecules are more and more improbable, and can contain more and more information as their complexity increases. This gives rise to some interesting things. First, given a quantity of molecules, you can say how much information you have in the material. If I have a mole of ethanol, I have 6.022 x 10^23 molecules of ethanol. Ethanol's empirical formula is C2H6O. If provided with just these atoms, there are many configurations by which chemistry could be made to covalently satisfy all of the atoms in this formula, and ethanol occupies only one of these possible chemical states. In the set of ten alternatives below I have generated what I call whole compositional isomers. These include the isomers in the usual sense, but also sets of more than one molecule which can be created given only the atoms in the empirical formula, and which as a group possess the same number of bonds as ethanol. In the set below I have tried to ensure that every atom is bonded to at least one other atom, so there's no opportunity to lose the information contained by the presence of a chemical bond between two atoms. I might have counted wrongly, but I think there are 10 such states, and I write them here in order of increasing number of discrete molecules per state.
Interestingly, as we move into more and more fragmented states, more of the molecules which make them up are gaseous and rather lacking in possible alternative configurations, and some are highly active chemical species.

One molecule per state:
  ethanol, CH3CH2OH (8 covalent bonds in total)
  methoxymethane (dimethyl ether), CH3OCH3 (8 bonds)

Two molecules per state:
  methane + formaldehyde (8 bonds)
  ethylene oxide + dihydrogen (8 bonds)
  ethylene (ethene) + dihydrogen monoxide (8 bonds)
  vinyl alcohol (CH2=CH-OH) + dihydrogen (8 bonds)
  acetaldehyde + dihydrogen (8 bonds)
  (vinyl alcohol and acetaldehyde are tautomers)

Three molecules per state:
  ethyne (HC≡CH) + dihydrogen + dihydrogen monoxide (8 bonds)
  ethynol (acetylene alcohol) + two dihydrogens (8 bonds)
  ketene (CH2=C=O) + two dihydrogens (8 bonds)

For the sake of example, if I have ten possible compositional isomeric states, [p is 10, so the information content b in bits per actual molecule is:

$b = \log_2 10 \approx 3.32$ bits

Note that this only applies to ethanol and dimethyl ether, the two single-molecule states! So 3.32 bits x Avogadro's number is an enormous amount of data, about 2.0 x 10^24 bits. Dividing that by 8 gives 2.5 x 10^23 bytes, and dividing by 1024 successively to push it into units with which people can grapple, this is 2.38 x 10^17 megabytes, 2.33 x 10^14 gigabytes, or 2.27 x 10^11 terabytes per mole of ethanol. Ethanol has a molecular weight of (2 x 12.01) + (6 x 1.008) + (1 x 16.00) = 46.07 grams per mole. So ethanol's information density is:

$\frac{2.27 \times 10^{11}\ \text{terabytes/mol}}{46.07\ \text{g/mol}} \approx 4.93 \times 10^{9}$ terabytes per gram,

or about 5.2 x 10^15 megabytes per gram.

One of the things about this configuration is that the information is highly redundant: you could reconfigure into other molecules, or deconfigure into constituent atoms, almost all of the molecules in your mole of ethanol, and still not lose the information intrinsic to a single molecule of ethanol. It is instructive to note that to move a megabyte down the phone lines local to me here in Sydney costs 20c at local rates. I should therefore be charged about 1 x 10^18 dollars (5.2 x 10^18 megabytes at 20c each) to get a kilogram of ethanol delivered to me, if I ignore the information content of the label, the lid, and the bottle it comes in. It also means that if the delivery takes one hour, the information (about 5.4 x 10^24 bytes) came across at about 1.2 x 10^22 bits per second (there are 3600 seconds in an hour), which is some seventeen orders of magnitude faster than the 56,000 bits per second I can get through the phone line. As far as performance for price is concerned, a bottle of vodka beats an optic fibre. My personal view on this is that 20c a megabyte is way, way too much. Doing the information calculus for dimethyl ether gives us the same information content, since it is a one-molecule, conventional chemical isomer of ethanol. If ethanol came in L and R forms, these forms would have equal information content too, though each would represent a different state, so [p would need to be adjusted to count both. As is, given DME or EtOH, we have the same number of molecules and the same number of isomers, so the information content is the same. Of course, the heats of formation for ethanol and dimethyl ether, and their respective densities, tell us something else too: first, the cost in joules per bit of encoding information in either of these two molecules, and second, the information density of each. Nice as it is to be able to say these things, I'm not convinced that constitutional isomers are really the whole picture. If we assume that a pair of electrons involved in a covalent bond is indistinguishable from any other pair of electrons involved in a covalent bond, and then relax the requirement to keep 8 bonds, [p becomes considerably larger, since we get many additional states, though considerably more primitive ones.
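The ethanol arithmetic above can be checked with a short Python sketch. It uses the constants quoted in the text (Avogadro's number as 6.023 x 10^23, 20c per megabyte, a 56,000 bit/s modem) and, as the text does, treats megabytes, gigabytes and terabytes as successive powers of 1024 bytes.

```python
import math

AVOGADRO = 6.023e23          # value used in the text
MOLAR_MASS_ETHANOL = 46.07   # g/mol
COST_PER_MB = 0.20           # dollars per megabyte, the phone-line rate quoted
MODEM_BPS = 56_000           # bits per second through the phone line

bits_per_molecule = math.log2(10)            # ten bond-conserved states
bits_per_mole = bits_per_molecule * AVOGADRO
bytes_per_mole = bits_per_mole / 8

tb_per_mole = bytes_per_mole / 1024**4
tb_per_gram = tb_per_mole / MOLAR_MASS_ETHANOL
bytes_per_gram = bytes_per_mole / MOLAR_MASS_ETHANOL

# One kilogram of ethanol, delivered in one hour
bytes_per_kg = bytes_per_gram * 1000
cost_per_kg = bytes_per_kg / 1024**2 * COST_PER_MB
bits_per_second = bytes_per_kg * 8 / 3600

print(f"{bits_per_mole:.3g} bits/mol, {tb_per_mole:.3g} TB/mol")
print(f"{tb_per_gram:.3g} TB/g")
print(f"${cost_per_kg:.3g} per kg, {bits_per_second:.3g} bit/s")
print(f"{math.log10(bits_per_second / MODEM_BPS):.1f} orders of magnitude above the modem")
```

Only the 10-state, bond-conserved count is used here; a larger [p would scale every figure up by the ratio of the logarithms.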
If we also relax the requirement for the atoms to exist in the sort of chemically configured states we tend to find in nature, we can decompose this even further, down to a system totally denuded of any chemical information whatsoever, and on the way we get even more states. Some of these states are fleetingly present in various kinds of transitional chemical environments (flames, interstellar space, reaction intermediates). Many of these states are degenerate and chemically indistinguishable from each other. The way to count these is not obvious: although it would be easy to group, say, hydrogen atoms into pairs using standard combinatorial maths (6C2 tells us we can make 15 different 2-hydrogen-atom groups given 6 hydrogen atoms), they're all chemically identical. I'll give you a taste of these below.

1 state with no bonds (no chemical information; see also below):
  C C O H H H H H H
We can express this as 9C9 = 1 state with no chemical information and 9 members; however, the members are defined as disordered. This is fortunate, since there are 9P9 = 362,880 ways to arrange these linearly, but to do so would imply order, and these atoms distinctly lack any order, so we must ignore those arrangements when counting information content. The problem is made more tractable since we can consider only the atoms involved in bonds, and ignore those not involved.

5 chemically distinct states with 1 bond:
  O-H    [+ unordered atoms H H H H H C C]
  H-H    [+ unordered atoms C C H H H H O]
  C-H    [+ unordered atoms C H H H H H O]
  C-C    [+ unordered atoms H H H H H H O]
  C-O    [+ unordered atoms C H H H H H H]
(Note: these are all the unordered pairs that can be drawn from C, O and H, each counted once because O-H = H-O, minus O-O, since there is only one oxygen.)

At least 15 states with 2 bonds:
  H-H H-H    [unordered C C H H O]: note this is 1 state, not 3. The other ways of distributing two bonds between four hydrogens contain the same information as the one indicated.
  C-H H-H        [unordered H O H H C]
  C-H O-H        [unordered H H H H C]
  O-H H-H        [unordered C C H H H]
  C-C H-H        [unordered H H H H O]
  C-C H-O        [unordered H H H H H]
  C-H C-H        [unordered H H H H O]
  C=O            [unordered C H H H H H H]
  H-C-H          [unordered C H H H H O] (singlet methylene)
  H-C-H          [unordered C H H H H O] (triplet methylene)
  C-O-H          [unordered H H H H H C]
  C-C-H          [unordered H H H H H O]
  H-O-H          [unordered C C H H H H]
  C=C            [unordered H H H H H H O]
  C-O-C          [unordered H H H H H H]

I can think of at least 41 states with 3 bonds, but there are certainly others:
  C-O C-H H-H    [unordered H H H]
  C-O H-H H-H    [unordered H H C]
  H-H H-H H-H    [unordered C C O]
  H-H H-H O-H    [unordered C C H]
  C-H H-O-H      [unordered H H H C]
  C-H C-H H-H    [unordered O H H]
  C-H C-H O-H    [unordered H H H]
  C-H H-H        [unordered C H H H O]
  C-H O-H H-H    [unordered C H H]
  H-C-H O-H      [unordered H H H C] (singlet methylene)
  H-C-H O-H      [unordered H H H C] (triplet methylene)
  H-C-H H-H      [unordered H H C O] (singlet methylene)
  H-C-H H-H      [unordered H H C O] (triplet methylene)
  H-C-H C-H      [unordered H H H O] (singlet methylene)
  H-C-H C-H      [unordered H H H O] (triplet methylene)
  H-C-H C-O      [unordered H H H H] (singlet methylene)
  H-C-H C-O      [unordered H H H H] (triplet methylene)
  C-C H-O-H      [unordered H H H H]
  C-C O-H H-H    [unordered H H H]
  C-C H-H H-H    [unordered H H O]
  C-O-C          [unordered H H H H H H]
  C-O-C H-H      [unordered H H H H]
  C-C=O          [unordered H H H H H H]
  C-C-O H-H      [unordered H H H H]
  C-C-H H-O      [unordered H H H H]
  C-C-H H-H      [unordered H H H O]
  H-C-C-H        [unordered H H H H O]
  C=O H-H        [unordered H H H H C]
  C=O C-H        [unordered H H H H H]
  C=C H-H        [unordered H H H H O]
  C=C O-H        [unordered H H H H H]
  H-C=O          [unordered H H H H H C]
  H-C-O H-H      [unordered H H H C]
  H-C=O          [unordered H H H H H C]
  C-O-C-H        [unordered H H H H H]
  C-C-O-H        [unordered H H H H H]
  C≡C            [unordered H H H H H H O]
  H--C--H        [unordered H H H O C]
     |
     H
  C---C          [unordered H H H H H H]
   \ /
    O

And I'm not even going to try it for 4, 5, 6 or 7 bonds; I know I won't even get close. So on we go until we fill the total number of states. [p is tractably calculable, at least for smallish molecules. There's a mathematical construct called a generating function which gives the number of possible states in a system, though it doesn't tell you what they all are. The main danger in using it is that it includes some chemical configurations which cannot exist, such as a C bonded to another C with four covalent bonds (one has to subtract these silly states out of the system after the generating function has done most of the work).
For ethanol, wherein we have from zero to eight (inclusive) bonds distributed amongst two carbon atoms with a valence of up to four, one oxygen atom with a maximum valence of two, and six monovalent hydrogen atoms, the generating function looks like so:

$(1 + x + x^2 + x^3 + x^4)^2 \times (1 + x + x^2) \times (1 + x)^6$

where the first factor covers the 2 carbons (alone, or with one to four bonds), the second covers the oxygen (alone, or with one or two bonds), and the third covers the 6 hydrogens (bonded or not bonded). This expands to an atrocious polynomial of the 16th degree. Adding up the coefficients of the terms whose powers are less than or equal to the number of bonds (in this case, up to and including 8) should give us the precursor to the [p we seek, for subsequent insertion into

$b = \log_2 [p$

It should be noted that the following combinations are excluded here from the list of compositional isomers, for the reasons given:

0) Ions. Their electrostatic bonds are defined as not covalent in character.
1) Various permutations on a set containing 2H2 with a very strained cyclic, C=C double-bonded epoxide (8 bonds in total). It might be synthesised by reduction of a cyclic acetylene oxide, if one exists. I found no mention of such a molecule ("ethyne monoxide?") anywhere.
2) 3H2 and an extremely unlikely triple-bonded pair of carbons with a bridging oxygen (ethyne monoxide?). It has 8 bonds, but I doubt it could ever form, and I found no mention of it.
3) 2H2, H2O and an extremely unlikely pair of carbon atoms bonded quadruply to each other (8 bonds in total): carbon is tetrahedral, and when it is triple bonded, the remaining bond on each carbon points directly away from the other carbon, with the atoms and existing bonds in between, so a fourth bond between them cannot form.

The two next most highly configured molecules which atomically add up to the empirical formula for ethanol, namely formaldehyde and methane, will reveal to us how much more information there is in an ethanol molecule than in formaldehyde and methane taken as a pair of chemically discrete entities.
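The generating function for ethanol's atoms can be expanded mechanically with a small Python sketch (plain coefficient lists, no algebra library). One caveat worth flagging: each covalent bond uses one valence unit on each of two atoms, so in this polynomial an exponent counts valence units used rather than bonds, and the atoms are treated as labelled. The sketch therefore prints both the sum for powers up to 8 that the text describes and the full sum; neither has had the impossible configurations subtracted.

```python
def poly_mul(a, b):
    """Multiply two polynomials stored as coefficient lists (index = power of x)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            out[i + j] += ca * cb
    return out

def poly_pow(p, n):
    """Raise a coefficient-list polynomial to a non-negative integer power."""
    result = [1]
    for _ in range(n):
        result = poly_mul(result, p)
    return result

carbon = [1, 1, 1, 1, 1]   # 1 + x + x^2 + x^3 + x^4
oxygen = [1, 1, 1]         # 1 + x + x^2
hydrogen = [1, 1]          # 1 + x

gen = poly_mul(poly_mul(poly_pow(carbon, 2), oxygen), poly_pow(hydrogen, 6))

degree = len(gen) - 1
total = sum(gen)          # the polynomial evaluated at x = 1
up_to_8 = sum(gen[:9])    # coefficients of x^0 to x^8

print("degree:", degree)
print("coefficients:", gen)
print("sum of all coefficients:", total)
print("sum for powers <= 8:", up_to_8)
```

The coefficient list is symmetric about x^8, which is a quick sanity check on the expansion.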
It's actually quite revealing. Now, this sounds like a contradiction, but it isn't. One might think that since their atoms and (8) bonds add up to enough bits and pieces to make an ethanol molecule, a formaldehyde molecule and a methane molecule should together possess the same information content as a molecule of ethanol. But of course a methane and a formaldehyde, each on its own, have less information, since the range of compositional isomers for each of these molecules, taken separately, is smaller. How much less is interesting!

Methane (CH4) is one of 10 configurational alternatives if we relax the constraint about bond number preservation:

4 bonds:
  CH4
3 bonds:
  H-C-H H-H
  H-C-H  H
    |
    H
2 bonds:
  H-H H-H C
  H-C-H H H    (singlet methylene)
  H-C-H H H    (triplet methylene)
  H-H C-H H
1 bond:
  C H H H-H
  C-H H H H
0 bonds:
  C H H H H

Formaldehyde (CH2O) likewise represents one of 12 states open to the atoms which comprise it:

4 bonds:
  H
   \
    C=O
   /
  H
3 bonds:
  H-C-O-H
  H-C=O H
2 bonds:
  H-C O-H
  C-O H-H
  C=O H H
  C-O-H H
  H-C-H O
1 bond:
  C-H H O
  H-H C O
  H H C-O
0 bonds:
  H H C O

If we constrain each molecule to its original number of bonds, [p = 1, and therefore formaldehyde and methane each have a chemical information content of $b = \log_2 1 = 0$ bits per molecule! These molecules are amongst the most informationally restricted we can have without resorting to radicals, ions, elements or subatomic particles.
It's worth noting that exactly these sorts of information-restricted molecules, with few atoms, few bonds and few alternative configurations available to them (for example NH3, CH4, H2 and H2O), were the precursors used in the Miller-Urey experiment, published in 1953, and in its many later variants. These showed that energising such extremely simple molecules for a week or so, with anything from electric discharges, electromagnetic radiation (ultraviolet, visible, infrared, X or gamma!) and acoustic shock waves to energised fragments of atoms such as alpha particles or electrons, gives rise to all sorts of complex molecules. When energy was pumped into the Miller-Urey system, it enabled these simple molecules to pool their collective information space and thereby, out of necessity, concatenate, combine and become more complex, embedding more information into fewer, more complex molecules, by dint of each of these more complex molecules existing as one of several alternatives instead of as a molecule with no alternative configuration. The situation on a primordial earth would have been more complicated, bringing with it the presence of far more elements, including some metals (metal ions are commonly known for their tendency to catalyse chemical reactions). Returning from that digression, we have our two molecules, methane and formaldehyde, which, when combined, would recreate the information space populated by ethanol and all the bond-conserved isomeric alternatives to it. In this case, however, the methane and formaldehyde are informationally and chemically on their own, and therefore occupy, collectively, two chemically discrete states (one state available to each), NOT one of ethanol's ten possible isomeric states. If methane and formaldehyde were taken together we'd be talking about their combined information content, which is not the same as the plain sum of their individual information contents. We notice that twice 0 does not give us back ethanol's 3.32 bits!
So we can calculate an information differential between the information content of one ethanol molecule and that of two of ethanol's alternative states taken separately. Ethanol has 3.3219 bits per molecule. Given bond constraints,

$b_{\text{ethanol}} - (b_{\text{methane}} + b_{\text{formaldehyde}}) = 3.3219 - (0 + 0) = 3.3219$ bits.

So there are 3.3219 more bits of information in an ethanol molecule than in a chemically uncombined methane and formaldehyde molecule. If some magic reaction combined the methane and formaldehyde to form ethanol or dimethyl ether, that reaction would have increased the information content of the system by this number of bits, because the number of possible compositional isomeric configurations has increased from one (for each molecule on its own) to 10. The specific product molecule is "more informed" because it is less likely against its new, larger backdrop of possible configurations. Does this make sense? Well, no, because I have considered only the bond-conserved compositional isomers of ethanol. Really, the information difference lies in the difference in the total [p available to all of the molecules. If I knew that huge [p for ethanol in its entirety (from the sum of all those states and more which I couldn't work out), I could calculate b for it; [p for formaldehyde is 12, [p for methane is 10, and [p for ethanol is at least 40. Let's also look at the information change for ethanol when it is totally information depleted, that is, deconfigured down to its constituent atoms, as would happen if we heated it to some obscene temperature at which no covalent bonds could exist: plasma temperatures. This technique is used in certain pollution disposal technologies, where complex molecules are totally denuded of chemical information by feeding them slowly into the gas jet of an induction plasma torch, which has a flame temperature in the vicinity of 10,000 kelvin.
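The information differential is just a difference of logarithms, as a minimal Python sketch shows. The names `bits` and `information_differential` are hypothetical helpers for illustration. The relaxed [p for ethanol is left out because the text gives only a lower bound for it.

```python
import math
from typing import Sequence

def bits(num_states: int) -> float:
    """b = log2([p) for one molecule with num_states alternatives."""
    return math.log2(num_states)

def information_differential(p_product: int, p_parts: Sequence[int]) -> float:
    """Bits in the combined product minus the summed bits of the separate parts."""
    return bits(p_product) - sum(bits(p) for p in p_parts)

# Bond-constrained counts from the text: ethanol 10, methane 1, formaldehyde 1
delta = information_differential(10, [1, 1])
print(f"bond-constrained differential: {delta:.4f} bits")

# Relaxed counts (bond number not conserved) quoted in the text
print(f"methane, [p = 10: {bits(10):.3f} bits")
print(f"formaldehyde, [p = 12: {bits(12):.3f} bits")
```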
I want to avoid talking about combustion in this example because I don't want to add new oxygen atoms to the total number of available information states in the system; I simply want to talk about totally stripping a molecule back to its component atoms with no heed to the satisfaction of their valency requirements. Normally in a plasma torch the atoms recombine into things like oxides when the exhaust gas cools down. Place one ethanol molecule in such a device and heat the bejeezus out of it, and you can look spectrometrically into the plasma and observe that there are only signatures for elements (I will ignore ions here): the two carbons, six hydrogens and one oxygen, which between them have a total of one mono-atomic chemical state available, namely

  C C H H H H H H O (gas)

They can store no configuration information chemically because, under these conditions, the bonds which would encode such information cannot survive. We can write a pretty denuded chemical description of this system: we have ironed out any possibility of isomers or molecules, and there are only atoms, with no chemical relationships between them. There is one possible state, so [p = 1:

  2 carbons, 1 oxygen, 6 hydrogens

The chemical information content of this system is zero, since $b = \log_2 1 = 0$. As soon as you let it cool down it'll start to form polyatomic products; if you do this in an inert environment (say, argon, or a vacuum), the system is necessarily constrained to produce only the things in the list of states open to ethanol's atoms. If you do this simultaneously to n ethanol molecules, you have to account for the possible interaction of the atomised components of all the additional ethanol molecules, and your empirical formula will be (C2H6O)n, since you'll have a much wider range of configurations available to your system. We can use this mono-atomic dissociated state as an information-free reference against which to compare the information content of the simple products.
Looking at the energy change in going from these monatomic elements to ethanol will give you a figure describing how much energy it took to store some number of bits in a molecule. (This can be worked out from ethanol's heat of formation.) Other things being equal, lighter atoms with higher valences will have greater numbers of configurational states open to them, so materials made of them will tend to have a higher information content. This is good: it means complex, chemically based organisms don't have to be really heavy. Life as we observe it seems to be made mainly of elements which have high ratios of valence to atomic number. It also explains why carbon chemistry is a naturally rich enough informational platform for complex chemicals and living systems to evolve on. Looking at the periodic table in an information-systemic manner tells us a lot about the chemical information content of the allotropes of elements on their own. Sulfur can exist elementally as S4, as S8 rings, or as polymers of various lengths, whereas oxygen can only exist as O2 and O3 with its chemical bonding characteristics satisfied. The information content in these cases is determined more by how many atoms you have than by how many ways they can combine. For a bulk metal, where the chemistry exists in a sea of distributed chemical bonding (a "gas" of electrons), it's hard to say how many bonds are shared by which atoms, and in any case you have lots of metal atoms, so [p is very large. So for these I would like to look at the information content of their nuclei, by looking at their isotopes. In this sense, even light elements contain information. Hydrogen exists with no, one or two neutrons in its nucleus, helium has one or two neutrons (so can encode one bit), lithium can have 3 or 4 neutrons (so can encode one bit), and so on.
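Returning to the energy cost of storing bits mentioned above, here is a rough Python sketch of the calculation. It assumes approximate gas-phase standard enthalpies of formation at 298 K (quoted for illustration; check them against a thermochemical table before relying on them) and assumes that the bits stored are the 3.32 bits per molecule of the bond-constrained count. Note that forming ethanol from free atoms releases this energy rather than consuming it, so the sketch reports the magnitude of the energy change per bit.

```python
import math

AVOGADRO = 6.022e23  # per mole

# Approximate standard enthalpies of formation, gas phase, 298 K, in kJ/mol.
# Illustrative values only; verify against a reference before use.
DHF_ATOM = {"C": 716.7, "H": 218.0, "O": 249.2}
DHF_ETHANOL_GAS = -234.8

# Enthalpy needed to take one mole of gaseous ethanol apart into free atoms
atomisation_kj = (2 * DHF_ATOM["C"] + 6 * DHF_ATOM["H"] + DHF_ATOM["O"]
                  - DHF_ETHANOL_GAS)

bits_per_molecule = math.log2(10)  # assumption: bond-constrained count from the text
joules_per_molecule = atomisation_kj * 1000 / AVOGADRO
joules_per_bit = joules_per_molecule / bits_per_molecule

print(f"atomisation enthalpy: {atomisation_kj:.1f} kJ/mol")
print(f"energy per molecule:  {joules_per_molecule:.3g} J")
print(f"energy per bit:       {joules_per_bit:.3g} J/bit")
```

Counting more states in [p would spread the same energy over more bits and lower the per-bit figure.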
In contrast, beryllium (Z = 4), fluorine (Z = 9), aluminium (Z = 13), phosphorus (Z = 15), scandium (Z = 21), manganese (Z = 25), cobalt (Z = 27), arsenic, yttrium, niobium, rhodium, iodine, cesium, praseodymium, terbium, holmium, thulium and gold each have only one stable isotope, so with their stable forms you cannot encode information at a nuclear level. None of these stable isotopes can be used for radioisotope dating, which extracts information from the proportions of isotopes in a rock, unless used in conjunction with the presence of other elements which have decayed to these mono-isotopic elements. Let's see what we might encode in the gas xenon. Back in 1962 we thought it had 9 isotopes, all of which are stable (though these days we know many more isotopes exist, with varying half-lives). Someone tells me they'll send me some chemically constrained information, and sends me a tube of gas. Then I discover with my scientific instruments that they have generously provided me with one atom of, say, xenon-124. How much information was encoded with one atom of inert gas? It's 1962, and we know there are 8 other possible kinds of xenon in existence (some more probable than others), and I have been sent one of them. I don't know which one of them, but that doesn't matter. I have been sent 1 xenon atom which can be any 1 of 9 possible isotopes, so [p = 9 and the information content is:

$b = \log_2 9 \approx 3.17$ bits

If I did this experiment in 1962, our noble-gas cheapskate sent me 3.17 bits, but I don't know what any of this means beyond the fact that I was sent 1 particular xenon atom and not any other kind of xenon atom. If I wind the clock forward to 2001, when we know that many more (say 16) isotopes of xenon exist than _we knew_ existed in 1962, then surprisingly, without even doing anything to the xenon atom in my tube of gas, the information content of that atom has increased!
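The nuclear bookkeeping in the last two passages comes down to b = log2 of the number of isotopes an atom could have been. A tiny Python sketch, using the counts given in the text (the 2001 xenon figure of 16 is the text's own round number); `isotope_bits` is an illustrative name.

```python
import math

def isotope_bits(num_isotopes: int) -> float:
    """Bits carried by knowing which of num_isotopes alternatives an atom is."""
    return math.log2(num_isotopes)

# Only the number of known alternatives matters in this counting
examples = {
    "helium (2 stable isotopes)": 2,
    "lithium (2 stable isotopes)": 2,
    "gold (1 stable isotope)": 1,
    "xenon, isotopes known in 1962": 9,
    "xenon, the text's 2001 figure": 16,
}
for label, count in examples.items():
    print(f"{label}: {isotope_bits(count):.2f} bits")
```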
The xenon atom gained information because, outside of that tube, more discoveries have been made about the number of isotopes of xenon, so, as far as our knowledge about the specific xenon atom in the tube is concerned, the probability of its being any specific isotope has fallen, making the fact that it _is_ a specific isotope a more useful thing to know (if I were to measure it again, the measurement would be trickier, because I'd have more isotope alternatives to choose from). [p has grown to 16, so b = 4 bits. Our xenon-gas cheapskate has taught us something interesting: you can learn more about some systems without even directly interacting with them. Above we mentioned that it is possible to calculate the change in information content by calculating the difference in information content between the products and the reactants. Let's do this for a slightly more complex molecule. Take a simple reaction where you start with a bunch of identical monomers and polymerise them using something like a Ziegler-Natta catalyst. With polymers, I might have mentioned, you get a range of products, with a distribution of molecular weights centred around some average; you don't generally get identical molecular weights in the products, because the chain-length extension of a given growing oligomer is partly random in nature. Generally, length increases as polymerisation reaction time increases, because you permit more monomers to add themselves to the ends of the nascent chains. The molecule I'll use here for exemplar purposes is a relative of polyvinyl chloride (strictly, since the monomer I'll use is dichloroethylene rather than vinyl chloride, it is a poly(dichloroethylene)). Chemically, the polymer description is for an average unit length of n monomers. For common homopolymers you might have n = 20,000; these are very long molecules. However, as I mentioned in the example about rocks and information loss in averaging, this average ignores the actual length of each polymer.
It might be that if you actually measure the lengths, you discover that the smallest polymer is only 15,000 units long, some are actually 25,000 units long, and in between you have all the other possible n-lengths of polymer. We should state that there's no n = 1 (monomer) product left unreacted, and we will also assume that the polymers are totally straight-chain linear products, ignoring tacticity and any funny branched or cyclised products. In addition, since the monomers are two-carbon units, each of which lengthens the polymer backbone by two covalent C-C bonds, all of the molecules, regardless of n, will have even numbers of C and Cl atoms in them! We started off with millions of monomers, with only a few chemical configurations available to them (for example, only their own cis- or trans- stereospatial configurations). Say we used dichloroethylene, which has two possible configurational isomers (because the halogen atoms and hydrogens can't rotate about a double bond), shown below:

  Cl    Cl          Cl    H
    \   /             \   /
     C=C               C=C
    /   \             /   \
   H     H           H     Cl
     cis-              trans-

Even if we only use _one_ of these in the reaction, say the cis- form, there are still two configurational states open to this particular bunch of atoms (which has the empirical formula C2H2Cl2), so the information content encodable by them is one bit, represented as 0 or 1 by the presence of the cis- or trans- configuration. For this example I will ignore all the other bond-conserved states, like (acetylene + Cl2) or (chloroacetylene + HCl). We do the reaction, totally eliminating the monomer by chemically incorporating it into polymer molecules, and create 10,000 new possible configurations in which the polymerised monomer componentry can exist (n ranged from 15,000 to 25,000, remember?). We have just changed [p for this system from 2 to 10,000, a gain of 9,998 states.
In effect we have generated a number system (based on the length of the molecule) with a set of ten thousand numerals available to us! So long as we only pay attention to the backbone chain length (and ignore the squillions of different crinkly spatial configurations and stereospatial varieties of each monomer addition which would in real life be attainable by the polymer), and especially ignore all the possible isomers of a single polymer molecule, the information content in bits for a given polymer, short or long, in this system is:

$b = \log_2 10000 \approx 13.29$ bits per molecule.

If this system were numerical in nature rather than chemical, it would be amenable to run-length encoding; it is a very uncompressed information storage system! If we compare it to information polymers like DNA, which explicitly encode information in changes in their chemical sequence, using n possible monomers in sets of m to give n^m possible states per codon, this polymer spread is a system with only one kind of monomer (n = 1), in which the state is carried not by sequence but by length: any of 10,000 lengths, explicitly. For comparison, DNA uses n = 4 and m = 3 and only encodes a maximum of 64 states per trimer (codon), then uses lots of trimers (trinucleotides). To encode 64 states in base 2 would need 6 bits, or in base 1 (that is, using length alone as your code) would need 64 primitive entities (we can't call these entities bits here because they only have one state!), all of different lengths. Because DNA uses a quaternary instead of a binary system (four possible symbols, A, T, G or C, instead of two, which you can guess are 0 and 1), each monomer carries 2 bits, so a codon encodes its 6 bits of binary data in three monomers: half as many symbols as a binary code would need. Length-only encoding is far more wasteful still: the 13.29 bits carried by our example polymer's length would fit in seven nucleotides, against the polymer's 15,000 to 25,000 units.

So how much information is there in a mole of this polymer if we assume, for simplicity, that the molecules are all 10,000 units long? (In our example the chains actually run from 15,000 to 25,000 units, which would make them heavier still.) That is, each molecule has 10,000 sets of C2H2Cl2, and whereas in real life you get syndiotactic or atactic versions of the polymer, in this case I will simply assume they're all chained together in isotactic form, like so:

    Cl  Cl  Cl  Cl  Cl  Cl
    |   |   |   |   |   |
----C---C---C---C---C---C---
    |   |   |   |   |   |
    H   H   H   H   H   H
  |   n   |   n   |   n   |

Since C2H2Cl2 has a molecular weight of (12 x 2) + (35.45 x 2) + (1 x 2) = 96.9, a polymer with 10,000 of these units weighs 969,000 AMU, which is a pretty heavy molecule. A mole of these weighs a little under a tonne (969 kg) and possesses the following information content:

13.29 bits per molecule x 6.023 x 10^23 molecules = roughly 8.0 x 10^24 bits
  = 9.54 x 10^17 megabytes
  = 9.32 x 10^14 gigabytes
  = 9.10 x 10^11 terabytes per mole

To get this down to a bytes-per-gram figure, we divide by the molar mass in grams:

$\frac{9.10 \times 10^{11}\ \text{terabytes/mol}}{969{,}000\ \text{g/mol}} \approx 9.4 \times 10^{5}$ terabytes per gram.

That is a lot of information per molecule, but per gram it is actually far less than for ethanol: ethanol stores 3.32 bits per 46.07 g/mol, while the polymer stores 13.29 bits per 969,000 g/mol, so ethanol packs roughly 5,000 times as many bits into each gram. What the polymer gains is information per molecule: about four times as much as an ethanol molecule, carried in a single chain. The change in bitwise information content, Δb(polymer), from monomers to polymers in this system is:

$\Delta b_{\text{polymer}} = \log_2 10000 - \log_2 2 \approx 12.29$ bits

where the 2 is [p for dichloroethylene. Interestingly, if you have a putative monomer with only one chemical state open to it, this implies that you can't polymerise it: the log term on the right becomes $\log_2 1 = 0$, so the monomer has no configurational information of its own to contribute. Anything you could encode using such a (putative) monomer would have to be done by specifying the number of monomer molecules. A monomer can therefore be defined as a molecule which has enough alternative configurations available to it to enable it to be polymerised.
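The polymer figures, and the per-gram comparison with ethanol, can be reproduced with a short Python sketch. It adopts the text's simplifying assumption that every chain is 10,000 units long, with 10,000 possible lengths and 2 monomer states, and uses the same powers-of-1024 terabytes as before; `tb_per_gram` is an illustrative helper.

```python
import math

AVOGADRO = 6.023e23                        # value used in the text
UNIT_MASS = 2 * 12 + 2 * 35.45 + 2 * 1     # C2H2Cl2 repeat unit, g/mol (96.9)
UNITS_PER_CHAIN = 10_000                   # simplifying assumption from the text
LENGTH_STATES = 10_000                     # possible chain lengths
MONOMER_STATES = 2                         # cis- and trans-dichloroethylene

def tb_per_gram(bits_per_molecule: float, molar_mass: float) -> float:
    """Terabytes (1024**4 bytes) of configurational information per gram."""
    bytes_per_mole = bits_per_molecule * AVOGADRO / 8
    return bytes_per_mole / 1024**4 / molar_mass

polymer_bits = math.log2(LENGTH_STATES)
polymer_mass = UNIT_MASS * UNITS_PER_CHAIN
ethanol_bits, ethanol_mass = math.log2(10), 46.07

polymer_density = tb_per_gram(polymer_bits, polymer_mass)
ethanol_density = tb_per_gram(ethanol_bits, ethanol_mass)
delta_b = math.log2(LENGTH_STATES) - math.log2(MONOMER_STATES)

print(f"polymer: {polymer_bits:.2f} bits/molecule, {polymer_mass:,.0f} g/mol")
print(f"polymer: {polymer_density:.3g} TB/g; ethanol: {ethanol_density:.3g} TB/g")
print(f"ethanol is ~{ethanol_density / polymer_density:,.0f}x denser per gram")
print(f"delta-b on polymerisation: {delta_b:.2f} bits")
```

Using the average length of 20,000 units instead would double the molar mass and halve the polymer's per-gram density again.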
This explains why you can't make polymers with backbone atoms other than polyvalent atoms without going to some pretty extreme lengths. We can say the change in information content per additional possible polymer length is

$\Delta b(n+1) = \log_2 (n+1) - \log_2 n = \log_2 \left(1 + \frac{1}{n}\right)$

Each additional monomer adds one more possible length to the system, so as n increases the total b keeps increasing, but ever more slowly: Δb(n+1) shrinks towards zero as n grows. So straight-chain homopolymers are inherently information rich, even if they are, from a configurational point of view, linear and boring.

----------------------

Let's look at two information-rich heteropolymers, namely polypeptides and polynucleotides, which have significant information-handling roles in living systems such as ourselves. We will ignore [p for the individual monomers, which I'm sure are massive and unwieldy, and instead look at [p for the encoded _sequences_ on each. Given our new tools, we can compare them for information density. An unmodified peptide fresh off the ribosome (or, for that matter, fresh off the peptide synthesiser) has 20 possible monomers: the 20 standard amino acids. For a given length n, this means that a peptide has $20^n$ possible states. Plugging various values of n into this, we get:

n    states
1    20
2    400
3    8,000
4    160,000
5    3,200,000
6    64,000,000
7    1,280,000,000

which is pretty huge: there are more than one and a quarter billion possible heptapeptides. If we look at DNA, we see there's a different information content, since if we ignore the codon system and look at it simply as a chain of monomers, we get a system with only 4 states per monomer, which gives us, for a polymer of length n, $4^n$ possible states:

n    states
1    4
2    16
3    64
4    256
5    1,024
6    4,096
7    16,384

which is nowhere near the more than one and a quarter billion possible heptapeptides.
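The shrinking increment Δb(n+1) for a homopolymer can be seen directly in a minimal Python sketch (`delta_b` is an illustrative name): each extra possible length still adds information, but less and less of it.

```python
import math

def delta_b(n: int) -> float:
    """Extra bits when the number of possible chain lengths grows from n to n + 1."""
    return math.log2(n + 1) - math.log2(n)

for n in (1, 10, 100, 10_000):
    print(n, round(delta_b(n), 6))
```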
For heteropolymers of length n we can compare information content using $b = \log_2 [p$, and in this case constrain [p to the radix of the system raised to the power n, where the radix, on this planet, is 20 for peptides and 4 for nucleotides. The nucleotidyl numbers are nice, clean powers of 2. Ribosomes and tRNA do radix conversion. Wow!

length   [p(oligopeptide)   b(oligopeptide)   [p(oligonucleotide)   b(oligonucleotide)
  1                    20         4.32                     4                  2
  2                   400         8.64                    16                  4
  3                 8,000        12.97                    64                  6
  4               160,000        17.29                   256                  8
  5             3,200,000        21.61                 1,024                 10
  6            64,000,000        25.93                 4,096                 12
  7         1,280,000,000        30.25                16,384                 14

These numbers tell us that, for example, a heptapeptide can be any one of 1,280,000,000 heptapeptides, and to encode in bits the same quantity of information as is encoded in this 7-amino-acid peptide you'd need 30.253496664 binary bits. It's nice to know that 2^30.253496664 gives us back the number of possible peptides of length 7. Similarly, a heptanucleotide encodes 14 bits' worth of information; 2^14 tells you how many such encodings you can make in a 7-mer strand of DNA, which is 16,384. Polypeptides therefore, length for length, contain much more information than polynucleotides. So how can we fit the information for a polypeptide into a polynucleotide? Well, we cheat a little bit, and use more nucleotides for each state encoded in the polypeptide. Proteins are significantly shorter, in monomer units, than the genes which encode them. The number of possible states in the peptide chain calls the shots as regards how much DNA is required to encode it, and hence how the genetic code will be built.
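The table above, and the question of how many nucleotides a peptide's information needs, can be sketched in Python. `nucleotides_needed` computes an information-theoretic lower bound (the fewest nucleotides m with 4^m at least 20^n), which is an assumption of this sketch and not how the real triplet code works; it uses exact integer powers to avoid rounding trouble.

```python
import math

PEPTIDE_RADIX = 20      # standard amino acids
NUCLEOTIDE_RADIX = 4    # A, C, G, T

for length in range(1, 8):
    p_pep = PEPTIDE_RADIX ** length
    p_nuc = NUCLEOTIDE_RADIX ** length
    print(f"{length}  {p_pep:>13,}  {math.log2(p_pep):6.2f}  "
          f"{p_nuc:>7,}  {math.log2(p_nuc):5.1f}")

def nucleotides_needed(peptide_length: int) -> int:
    """Fewest nucleotides whose 4**m states cover all 20**n peptides of this length."""
    target = PEPTIDE_RADIX ** peptide_length
    m = 0
    while NUCLEOTIDE_RADIX ** m < target:
        m += 1
    return m

for n in (1, 2, 7):
    print(f"{n} residues: at least {nucleotides_needed(n)} nt (triplet code uses {3 * n})")
```

The gap between the bound and 3n is the room the triplet code spends on redundancy and stop signals.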
If in evolutionary history there was a time when there were only 8 amino acids in the code, we would have a system where a polypeptide had a very restricted number of available states, and a given polypeptide of length n would have $8^n$ states:

n    states
1    8
2    64
3    512
4    4,096
5    32,768
6    262,144

You could then encode all the possible amino acids using only 2 DNA bases, because two bases give 4^2 = 16 combinations, comfortably more than the 8 states per amino acid. As it currently works, we use the following code system: $20^n$ mapped onto $4^{3n}$, where a polypeptide of length n is encoded in a polynucleotide of length 3n; we know this as "The Genetic Code." It is one of an enormous number of possible ways to pull off the task, and it is something of a mystery as to why it ended up the way it did. For example, it's very clumpy, that is, amino acids tend to be encoded by similar codons, though there's no mathematical reason why they need to be distributed in this way. Here it is (adapted from Henderson's Dictionary of Biological Terms, 11th edn, which I note erroneously assigns AAA and AAG to asparagine, omitting lysine; stop codons are shown as ---):

TTT phe   TCT ser   TAT tyr   TGT cys
TTC phe   TCC ser   TAC tyr   TGC cys
TTA leu   TCA ser   TAA ---   TGA ---
TTG leu   TCG ser   TAG ---   TGG trp

CTT leu   CCT pro   CAT his   CGT arg
CTC leu   CCC pro   CAC his   CGC arg
CTA leu   CCA pro   CAA gln   CGA arg
CTG leu   CCG pro   CAG gln   CGG arg

ATT ile   ACT thr   AAT asn   AGT ser
ATC ile   ACC thr   AAC asn   AGC ser
ATA ile   ACA thr   AAA lys   AGA arg
ATG met   ACG thr   AAG lys   AGG arg

GTT val   GCT ala   GAT asp   GGT gly
GTC val   GCC ala   GAC asp   GGC gly
GTA val   GCA ala   GAA glu   GGA gly
GTG val   GCG ala   GAG glu   GGG gly

It is interesting that the most information-rich and energy-expensive amino acids encoded by DNA are things like tryptophan, and these also tend to have low redundancy in the code. I arrange them below in what I consider to be increasing order of [p; I notice that S is divalent in all these cases (a thiol in cysteine, a thioether in methionine), so it can be treated as O for [p purposes.
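The redundancy of each amino acid can be counted straight from the codon table with a Python sketch. It encodes the standard genetic code as the usual 64-letter string in TCAG order, with one-letter amino acid codes (W = tryptophan, R = arginine, K = lysine, and so on) and '*' for stop codons, instead of the three-letter names used above.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (first base slowest); '*' marks a stop codon
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

# How many codons map to each amino acid (its redundancy in the code)
redundancy = Counter(CODE.values())
for aa, count in sorted(redundancy.items(), key=lambda kv: kv[1]):
    print(aa, count)
```

Running it confirms, for instance, that AAA and AAG both give lysine and that arginine, leucine and serine each have six codons.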
amino acid      formula      weight (g/mol)  codons (redundancy)
glycine         C2H5NO2               75.07  4
alanine         C3H7NO2               89.09  4
serine          C3H7NO3              105.09  6
aspartic acid   C4H7NO4              133.10  2
asparagine      C4H8N2O3             132.12  2
cysteine        C3H7NO2S             121.16  2
threonine       C4H9NO3              119.12  4
proline         C5H9NO2              115.13  4
valine          C5H11NO2             117.15  4
glutamic acid   C5H9NO4              147.13  2
methionine      C5H11NO2S            149.21  1
glutamine       C5H10N2O3            146.15  2
leucine         C6H13NO2             131.17  6
isoleucine      C6H13NO2             131.17  3
lysine          C6H14N2O2            146.19  2
histidine       C6H9N3O2             155.16  2
arginine        C6H14N4O2            174.20  6
phenylalanine   C9H11NO2             165.19  2
tyrosine        C9H11NO3             181.19  2
tryptophan      C11H12N2O2           204.23  1

I have my suspicions that there might once have been a system of 16 amino acids, with 16^n states for a chain of length n, which is still tremendously diverse, and which could have been comfortably encoded in a DNA system using only two bases per amino acid, since 4^2 = 16. All other things being equal, copying DNA would then have taken only 2/3 of the time it currently takes, and only 2/3 of the resources and energy; genes would be 2/3 the size of current ones, and such a system would be 3/2 times as fast to read as the current one. But there would have been certain problems, insofar as, with no redundancy in the code, any error in the DNA would necessarily mess up the protein which it encodes. A living system attempting an evolutionary transition from a 2-position genetic system to a 3-position genetic system would face a catastrophe, as the change would necessarily introduce massive numbers of (frame shift) errors into the resulting proteins. However, it is interesting to note that although two-thirds of the amino acids originally encoded might bear no relationship to what is now read, one-third would be read as they originally were, provided that in the new system the third base position was ignored.
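The one-third (33 percent) figure can be checked with a small simulation. This is a sketch under stated assumptions: the gene is a hypothetical random string of bases laid down as 2-base codons, the new reader starts at the same first base and reads whole triplets while ignoring each triplet's third base, as described above. Only the reading frame matters, not the particular bases, and the helper names are invented for illustration.

```python
import random

def original_codons(gene: str) -> list[str]:
    """How the old 2-position system reads the gene: consecutive doublets."""
    return [gene[i:i + 2] for i in range(0, len(gene) - 1, 2)]

def new_reading(gene: str) -> list[tuple[int, str]]:
    """How a 3-position reader sees the gene if it ignores the third base of each
    triplet: (start offset, the two bases actually read) for every whole triplet."""
    return [(i, gene[i:i + 2]) for i in range(0, len(gene) - 2, 3)]

def fraction_preserved(gene: str) -> float:
    """Fraction of the original doublets that are still read, at their original
    position, by the triplet reader. A triplet lands on an old codon only when its
    start offset is even, i.e. a multiple of both 2 and 3."""
    old = original_codons(gene)
    kept = sum(1 for start, read in new_reading(gene)
               if start % 2 == 0 and read == old[start // 2])
    return kept / len(old)

random.seed(1)  # hypothetical gene; any sequence gives the same answer
gene = "".join(random.choice("ACGU") for _ in range(600))  # 300 doublets
print(len(original_codons(gene)), "old codons,", len(new_reading(gene)), "triplets")
print("fraction of old codons preserved:", fraction_preserved(gene))
```

The new reader also yields fewer residues: 600 bases give 300 old codons but only 200 triplets, and only every second triplet starts on an old codon boundary, which is where the one-third comes from.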
It happens that in the present system there are eight amino acids (valine, alanine, threonine, leucine, serine, proline, glycine and arginine) that have at least one fourfold, position-3-invariant family of codons, and almost all of the rest, except for methionine and tryptophan, arise from a translation system which treats the third base as if it had only two values... so you might say that in several cases the third base IS ignored (Francis Crick noticed this and proposed the wobble hypothesis to explain its workings). In a system where there is significant chemical or physical similarity between amino acid side chains, or where only a few residues are critical for enzymatic function, this might not be an insurmountable transition.

What kind of 4-base, 2-position-per-codon system might be a precursor to the system currently employed? Since we can only be confident about amino acids for which there is at least 4-fold redundancy, I think it might look like this (first base down the side, second base across the top):

     U     C     A     G
U    ?     Ser   ?     ?
C    Leu   Pro   ?     Arg
A    ?     Thr   ?     ?
G    Val   Ala   ?     Gly

What do we notice? No aromatic amino acids (trp, tyr, phe and his), no amino acids containing sulfur (met and cys), and asn, asp, glu and gln are also gone. Observe that, with the exception of arginine, all of these amino acids are at the cheap end of town with regard to their [p. Perhaps life in such a system lacked freely available, information-rich molecules to incorporate into itself and had to function with this restricted set. OK, fine, but I suspect this is mainly due to something else: in the system above, most of the amino acids whose codons are rich in U and A in our existing system are not encoded at all. Nor are there start or stop codons (ATG; TAA, TAG, TGA), which are disproportionately endowed with T and A. Under such a regime, perhaps gene expression switching systems had yet to be implemented, and it was more beneficial to any replicator simply to be in a constitutively active state, able to take immediate advantage of whatever bases happened to appear.
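Both the codon counts and the fourfold families can be read straight off the standard genetic code. The sketch below writes the code in the common compact form (one-letter amino acid codes, codons ordered UUU, UUC, UUA, UUG, UCU and so on, with * for stop). Counting codons per amino acid gives the redundancy of each (six each for Leu, Ser and Arg, for example), and keeping only the 2-base prefixes whose meaning ignores the third base gives the hypothetical precursor table, with ? for unassigned prefixes. It is an illustration of the argument, not a model of how any real precursor code worked.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in compact form: one letter per codon, codons ordered
# UUU, UUC, UUA, UUG, UCU, ... ("*" marks a stop codon).
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AA)}

# Number of codons assigned to each amino acid (and to stop).
redundancy = Counter(CODE.values())

# Two-base prefixes whose amino acid does not depend on the third base at all.
fourfold: dict[str, str] = {}
for first, second in product(BASES, repeat=2):
    meanings = {CODE[first + second + third] for third in BASES}
    if len(meanings) == 1:
        fourfold[first + second] = meanings.pop()

print(sorted(redundancy.items(), key=lambda kv: (kv[1], kv[0])))
# The derived 2-position table: first base down the side, second across the top.
print("    " + "  ".join(BASES))
for first in BASES:
    print(first + "   " + "  ".join(fourfold.get(first + second, "?") for second in BASES))
```

The eight prefixes that survive are exactly the families listed in the text (S, L, P, R, T, V, A, G in one-letter code).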
If we then assume the genetic code once operated without the benefit of A and U, that is, instead of the 4-base, 2-position system above we operate a 2-base, 2-position system, we get this:

     C     G
C    Pro   Arg
G    Ala   Gly

It is noteworthy that in making an evolutionary forward transition from this 2-base, 2-bases-per-codon system to the 4-base, 2-bases-per-codon system proposed earlier, which includes U and A, we add no frameshift errors, so adding new bases generates a code system which is backwards compatible with the previous one. This holds true if additional pairs of bases are added. Given that many nitrogenous heterocyclic ring systems exist other than the purines and pyrimidines currently used in DNA and RNA (for example pyrazine, benzimidazole, indole, quinoline, imidazole and piperazine), and given that a transition from a 2-position system to a 3-position system is an informationally very error-prone step, why do we not instead have a DNA system which operates using six bases, say A, T, C, G, X and Y, in 2 positions? We'd get 36 possible codons (below), which is more than enough for the 20 amino acids we encode in the present system, and any existing codons from a 2-position system would keep their original meaning:

     A    T    G    C    X    Y
A    AA   AT   AG   AC   AX   AY
T    TA   TT   TG   TC   TX   TY
G    GA   GT   GG   GC   GX   GY
C    CA   CT   CG   CC   CX   CY
X    XA   XT   XG   XC   XX   XY
Y    YA   YT   YG   YC   YX   YY

I think the answer lies in error tolerance. In making a transition from a 2-position system to a 3-position system, evolution would select harshly against those systems complex enough and developed enough to be susceptible to the significant numbers of errors such a transition would introduce, so any organism, or for that matter any molecular replicator, successfully making such a transition would bring with it a significant tolerance for errors.
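A toy check of the backwards-compatibility argument, using the text's hypothetical codes: the 2-base table is carried over unchanged into the 4-base, 2-position table, so a gene written only in C and G translates identically under both, because adding bases never moves a codon boundary. The tables, the sample gene and the `translate` helper are illustrative assumptions.

```python
from itertools import product

# Hypothetical 2-base, 2-position code from the text.
OLD = {"CC": "Pro", "CG": "Arg", "GC": "Ala", "GG": "Gly"}
# Hypothetical 4-base, 2-position precursor: the old table unchanged,
# plus the fourfold families that need U or A.
NEW = {**OLD, "UC": "Ser", "CU": "Leu", "AC": "Thr", "GU": "Val"}

def translate(gene: str, table: dict[str, str], width: int = 2) -> list[str]:
    """Read the gene in non-overlapping codons of the given width ('?' = unassigned)."""
    return [table.get(gene[i:i + width], "?")
            for i in range(0, len(gene) - width + 1, width)]

gene = "CCGGCGGC"  # a hypothetical gene written only in C and G
print(translate(gene, OLD))
print(translate(gene, NEW))  # identical: a bigger alphabet shifts no frames

# A six-base alphabet (A, T, G, C plus hypothetical X and Y) in 2 positions.
SIX_BASE_CODONS = ["".join(p) for p in product("ATGCXY", repeat=2)]
print(len(SIX_BASE_CODONS), "codons")
```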
It would not do this deliberately, of course, but it would nevertheless exhibit the property as an accidental artefact of the way it was encoded. Error tolerance is a significant advantage if you're a replicating data system competing in sheer profligacy against systems which lack it. In addition, the 3-position, 4-base system has, on average, about 3-fold redundancy, in comparison with no redundancy at all for a 4-base, 2-position system. A 2-position, 4-base system is inadequate for 20 amino acids; and even if it did successfully function with 15 amino acids and a single stop codon, it would be very brittle to errors: due to the total lack of redundancy in the code, any error in the DNA would certainly give rise to an error in the peptide. Changing from a 4-base 2-position system to a 4-base 3-position system also neatly avoids the problem of having to evolve any new genes or biochemical pathways for synthesising new kinds of nucleotides; any existing software (genes) for this purpose will continue to be adequate. After all, living systems have a considerable materials and energy investment in the wetware in which they run their software. As it is, though, we have 20 states per amino acid and are constrained to fitting these into a system with only 4 states per base. What happens, as has been mentioned, is that living systems use more than one nucleotide per amino acid. We need at least 20 states in whatever system we use to encode the protein. One base per codon gives only 4 states and two bases per codon give only 16, so we need three bases per codon, which gives us 64 states, more than 3 times the code space we need. In fact we could encode 32 amino acids with almost double redundancy (50% error tolerance).
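The codon-length arithmetic generalises to any number of states. This sketch finds the smallest codon length k for which 4^k is at least the number of states to be encoded, using integer arithmetic to avoid floating-point rounding, and reports the average number of codons available per state; `min_positions` is a hypothetical helper name, and stop codons are ignored for simplicity.

```python
def min_positions(n_states: int, radix: int = 4) -> int:
    """Smallest codon length k with radix**k >= n_states, in integer arithmetic."""
    k, capacity = 0, 1
    while capacity < n_states:
        k += 1
        capacity *= radix
    return k

for n_states in (8, 16, 20, 32):
    k = min_positions(n_states)
    print(f"{n_states} states: {k} bases per codon, {4 ** k} codons, "
          f"{4 ** k / n_states:.1f} codons per state on average")
```

The 20-state row is the present code (about 3.2 codons per amino acid), the 16-state row is the hypothetical doublet code with no redundancy at all, and the 8-state row shows why 8 amino acids would fit in two bases.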
This compression from DNA to peptide is lossy: given a peptide, we can think of several possible DNA sequences which would encode it, since the genetic code is degenerate (several trimers, or codons as they are called, encode the same amino acid). However, the peptide system is very brittle. It has no error tolerance at all. DNA read as 4-radix, 3-position codons has significant error tolerance as a consequence of the number of states it can be in versus the number of states actually encoded for in the system. 64 DNA states encode 20 amino acids, which means that on average each amino acid has about three codons encoding it (64/20 is slightly more than 3), so from the peptide's point of view there is roughly triple redundancy. It is curious to note that nature has not chosen to spread this redundancy around equally across the 20 different kinds of amino acid, so some are better protected from errors at the DNA level than others. Another question arises. If proteins are so much more information-rich than DNA, why not store the genetic code in protein format? We'll probably never know the answer, though it is known that we can damage DNA as we presently know it and can some of the time expect that the encoded protein data is not functionally changed, or that such damage can be repaired, and these advantages do not accrue to proteins: damage a protein and it stays damaged. I can't see any reason why protein chemistry couldn't be the basis of some kind of long-term molecular information storage system. A living system could conceivably get away with a single-stranded homopeptide with information encoded on it by, say, the state of chemical modification of the residues themselves (for example, which sugars, isoprenyl groups or other prosthetic groups are stuck on the peptide backbone); however, that's obviously not the way nature played it out.
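The lossiness of the DNA-to-peptide mapping can be put into numbers: the count of distinct coding sequences for a peptide is the product of the codon counts of its residues. This is a minimal sketch using the standard code in compact one-letter form, excluding the stop codon and ignoring start-codon or codon-usage constraints; if all synonymous codons are treated as equally likely, log2 of that count is the information discarded by translation. The `encodings` name is illustrative.

```python
import math
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ("*" = stop).
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codons_per_aa: dict[str, int] = {}
for _codon, aa in zip(product(BASES, repeat=3), AA):
    codons_per_aa[aa] = codons_per_aa.get(aa, 0) + 1

def encodings(peptide: str) -> int:
    """Number of distinct coding sequences (without a stop codon) for a peptide
    written in one-letter amino acid codes."""
    return math.prod(codons_per_aa[aa] for aa in peptide)

for pep in ("MW", "LSR", "MAGIC"):
    n = encodings(pep)
    print(pep, n, "coding sequences;", round(math.log2(n), 2), "bits lost in translation")
```

"MW" has a single possible coding sequence, because methionine and tryptophan each have only one codon, while "LSR" has 6 x 6 x 6 = 216.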
Why it turned out that DNA was the information storage molecule and not peptides, we'll also probably never know, but it's obvious that encoding 4 bases on a sugar-phosphate backbone was a logistically simpler feat than encoding 20 different monomers into a backbone, since a DNA system needs fewer synthesis pathways to operate than would an equivalently powerful protein-based encoding system in the early stages of evolution. In DNA, you only need the pathways required to synthesise and polymerise four bases, a phosphorylated sugar, and their precursors; with proteins you need equivalent molecular and information infrastructure, but for twenty different amino acids, some of which (say, histidine, tyrosine and tryptophan) are structurally of similar complexity to DNA's heterocyclic bases. This brings us to the observation that the simplest way of encoding one protein in another protein is just to copy the existing one. Proteins are fundamental to the process of copying DNA and RNA and making other proteins, so why couldn't it just be implemented that way? There is an immediate possible disadvantage: as soon as a lone primordial protein stumbled across the configuration required to catalyse the assembly of other proteins like it, all the available raw materials for protein synthesis would be consumed in the manufacture of more of this protein, probably to the exclusion of proteins capable of doing anything else. Wouldn't it? Well, no, unless this protein was absolutely error-free in its reproduction of itself, and this is very unlikely. The error-prone nature of replicating information systems guarantees variation over long periods of time, which in turn guarantees evolution.
Which is a good thing; otherwise there might be planets covered in megatonnes of identical self-replicating molecules, autocatalysed into existence by the happenstance appearance of the first of their ilk, and no planets anywhere upon which spacefaring life could evolve to find other planets so afflicted. The same problem applies to (tautologically named) nucleotide-based replicators. Of course, in speaking about nucleotides and peptides I reveal a kind of information-polymer centrism which is quite rampant throughout biochemistry. There are catalogues stuffed full of cofactors, vitamins and other smallish molecules which, as far as we know, do not exhibit the grand skill of evolutionary adaptation over time, but which are just as important to its operation. Lipids, ions, porphyrins, ketones, in fact anything which isn't part of the sugar/phosphate/heterocycle data storage engine could be put into this category. From a macromolecular point of view, one might consider the purpose of nucleotide-based life to be getting itself replicated by employing proteins and a bunch of other molecules. That is a fair comment; it seems to do this very well. However, one could postulate an equivalent small-molecule point of view, stating that DNA and living biological replicators do their reproduction, adaptation and evolution-through-time trick simply for the purpose of keeping this library of small molecules extant. This argument mirrors the wry observation that from the farmer's point of view, cultivated maize exists to sustain the farmer, but from the cultivated crop's point of view, the farmer exists simply to assist the propagation of certain cultivars of maize. To say any of these things is to be correct but also to miss the point. The point is, these systems exist to embody information transmission and transformation. They operate in tandem on behalf of the information encoded within them. The end to which they operate is another subject.
-------------

There is another way to speak about this. So far I have talked about bits per molecule, given the range of possible molecules. What about the number of molecules you need per bit? I think there is a reciprocal relationship between entropy and information here. If we consider log2 [p as the bits per molecule, then it might be useful to ask about molecules per bit. That is, instead of asking how many bits I can encode in one molecule given the range of permutations available to it, I can ask how many molecules I need to encode a given number of bits. We could encode on our ethanol system a maximum of 3.32 bits, so we might need only 1/3.3219 (about 0.30) of an ethanol molecule to encode a binary bit. On the other hand, to encode the same binary bit we'd need only 1/13.32 (about 0.075) of a polydichloroethylene polymer in the system discussed above. This describes an information density per molecule, which might be useful for comparing molecules and their information content.

We can apply this to 4) an enzyme and 5) an isomerase. Does an isomerase (such as glucose-6-phosphate isomerase, which interconverts glucose-6-phosphate and fructose-6-phosphate) actually change the information in a molecule? No, it does NOT change the quantity of information in a substrate molecule (since [p(substrate) and [p(product) should be equal), but it does change the actual information itself. Bonds have moved around, after all, so we have changed the state of the system, though not the bitwise quantity of information it carries. But didn't the enzyme add information to the reaction? It did, of course, and this information was reclaimed when the reaction was finished. It is interesting to map the information content of the system as we watch the isomerisation occur:

Course of reaction           | delta([p) w.r.t. enzyme      | delta([p) w.r.t. substrate
-----------------------------+------------------------------+----------------------------
[p(subst) + [p(enz)          | 0                            | 0
=> [p(subst + enz)           | [p(subst) + [p(enz + subst)  | [p(enz) + [p(enz + subst)
(since [p(subst) = [p(prod)) |                              |
=> [p(enz) + [p(prod)        | 0                            | 0

Note that [p(substrate + enzyme) is massively larger than [p for either the enzyme or the substrate alone. What this also means is that, for the period during which the substrate is bound to the enzyme, both [p(substrate) and [p(enzyme) are temporarily expanded by astronomical numbers of new states. If we ignore their combined [p for a moment, then the increase in [p for the substrate is much larger than the increase for the enzyme: the substrate gains access to the huge [p suite intrinsic to the enzyme, whereas the enzyme only gains access to the much smaller [p intrinsic to the substrate. The same calculus can be applied to a real system, say, ethanol and an enzyme which dehydrogenates (oxidises) it, turning it into an aldehyde. Such an enzyme exists and is called alcohol dehydrogenase (ADH):

    H H                  H H
    | |                  | |
  H-C-C-O-H  --ADH-->  H-C-C=O  +  2 H      (leaves out NAD+ and NADH)
    | |                  |
    H H                  H

Tautomerism is interesting and informationally related to isomerase function, insofar as it happens without requiring any additional information from an enzyme. It can also be dealt with by the Shannon approach. If a molecule has [p states permitted to it, instead of saying that only one of these states can be occupied, we can say that more than one of these states can be occupied and calculate the information content accordingly.
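The reciprocal view and the counting rule just described are easy to put into numbers. This sketch takes the bit values quoted above for ethanol (3.3219) and the polymer (13.32) as given, and models "more than one state occupied" in the simplest possible way, by merging the simultaneously occupied states into a single state. That merging rule is an assumption chosen to match the acetaldehyde calculation that follows in the text, not a general result of Shannon's theory; the function names are illustrative.

```python
import math

def bits(n_states: int) -> float:
    """b = log2([p): information content when exactly one of n_states is occupied."""
    return math.log2(n_states)

def molecules_per_bit(bits_per_molecule: float) -> float:
    """The reciprocal view: how much of one molecule is needed to carry one bit."""
    return 1 / bits_per_molecule

def bits_with_merged_states(n_states: int, n_occupied_together: int) -> float:
    """Treat the states that are occupied simultaneously as one indistinguishable
    state, leaving n_states - n_occupied_together + 1 usable states."""
    return math.log2(n_states - n_occupied_together + 1)

print(molecules_per_bit(3.3219))   # ethanol, roughly 0.30 of a molecule per bit
print(molecules_per_bit(13.32))    # polydichloroethylene, roughly 0.075
print(bits(8), bits_with_merged_states(8, 2))  # 8 states, 2 of them merged
```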
It happens that acetaldehyde exhibits keto-enol tautomerism, spontaneously changing into vinyl alcohol (right), which turns back into acetaldehyde (left):

  H H             H-O
  | |               |
O=C-C-H   <=>   H-C=C-H
    |             |
    H             H

We proceed as usual. First, determine the possible permutations of the empirical formula (in this case, C2H4O) which possess the same number of bonds as the tautomers. As far as I can tell, they are:

The keto form, acetaldehyde (7 bonds)
The enol form, vinyl alcohol (7 bonds)
A cyclic, single-bonded 2-carbon ring with a bridging oxygen and two hydrogens on each carbon, i.e. ethylene oxide (7 bonds)
Acetylene + water (7 bonds)
A cyclic, double-bonded 2-carbon ring with a bridging oxygen, + 1 dihydrogen (7 bonds)
A cyclic, triple-bonded 2-carbon ring with a bridging oxygen, + 2 dihydrogens (7 bonds)
Methanol + CH2 (7 bonds)
Ethynol + dihydrogen (7 bonds)

There are other configurations available to these atoms, but they lose or gain a bond somewhere, so I'll discount them for this molecule:

Methane + carbon monoxide (6 bonds)
Formaldehyde + CH2 (6 bonds)

This time we have to treat one of the molecules as two, since it can be considered to effectively occupy two states at room temperature:

States occupied = 2
Possible states = 8

and calculate from there. Ignoring the tautomerism, the information content of this molecule would be:

b = log2 [p = log2 8 = 3 bits

However, the tautomer occupies 2 of the 8 states, so we actually have only 7 states available in which to store information (if we consider the two occupied states indistinguishable or quantum-mechanically indeterminate):

b = log2 7 = 2.807 bits

Another consequence of information theory is that complicated molecules have greater information content and are hence more difficult to make than simple ones, which is something any synthetic organic chemist will tell you...
as the complexity of the product increases, yields go down, synthesis times go up, the number of unavoidable wasteful side reactions increases, reactions which will do specifically what you want and only that (and not something else to your precious intermediate) become more difficult to choose, and so on. This is what astounds me about living things: they routinely, with great specificity and efficiency, synthesise insanely complex molecules which humans cannot. To assemble polymers, polymer scientists and nature both exploit modularity: they have a bunch of monomers (ethylene, amino acids, nucleotides, etc.) lying around, and predetermined ways to assemble them, so they only have to change the numbers and kinds of monomer to increase the polymer's information content enormously, rather than find a specific way to synthesise each polymer. These tools are invariably catalysts, about which I will have more to say later. Rigorously deriving [p for a given molecule is something I'll leave to the hard-core chemical maths heads, but it comes down to the sum of all the possible chemical bonding and physical configurations of the atoms in a molecule or group of molecules, given only an empirical formula and conserving the number of bonds present in the molecule in which you are interested. There is some software from Germany, MOLGEN, which will generate all the possible structures for a given formula, but it does not calculate compositional isomers (sets of separate molecules sharing the formula). Given that even small numbers of atoms can combine to produce enormous numbers of different molecules under the constraint that they all be in the one molecule at a time, removing this constraint, as is done when counting compositional isomers to determine the information content of a molecule, would generate a much larger space of possibilities.
One other thing: the products of complete combustion, such as HCl, CO2, H2O, NO2 and so on, when looked at individually and in terms of their ability to hold information in the chemical sense, are very denuded in comparison with the molecules from which they originated. Combustion is not an information-preserving transformation in the way that isomerisation is. Another interesting thing to note is that the more complicated the molecule, the more closely it approaches what physicists call a black body, a theoretical object which absorbs all of the radiation which falls onto it. This is because the molecule embodies within its structure more and more ways to absorb energy. What it does with that energy once it has absorbed it depends on several things. It might re-radiate it at a different frequency, or it might actually vibrate itself to pieces (which of course changes the molecule and its ability to absorb any more energy). What it won't do is reflect it with no change. Simple molecules tend to ignore most of the radiation thrown at them, and this actually helps characterise them. Big, complicated molecules become harder and harder to characterise: trying to get an IR signature from a protein crystal is possible but not very informative, insofar as the protein is made up of many similar amino acids all giving very similar signals, preventing you from knowing much about which specific part of the molecule was responsible for which part of the signal. Large homopolymers have to be treated statistically, by their average molecular weight. It is an even nastier job for heteropolymers such as peptides or DNA, though certain kinds of reactions have now been developed which enable you to know which parts are where. Something else is worth noting here, and that is that we can finally get a grip on what it really means when we speak of entropy.
Looking in the thermodynamics texts for a decent definition of this has not turned up many satisfying entries, so I'll stick my neck out a bit and postulate one. When we combust (oxidise) a block of carbon, we take a system which has two total possible configurations. Take a block of coal, which is chemically carbon and which is also a solid; that is, its atoms are localised, so we know a lot about the location of the carbon atoms in the block (they don't randomly disperse themselves around the room like gas molecules do). Take a gas, oxygen, which is almost invariably found in the dioxygen form, O2. We start in a system where we have 1) information that the carbon atoms are localised, 2) information that there are delocalised carbon atoms, and 3) information that there are delocalised dioxygen molecules. Chemically this system has two states on one side, where we have C and O2. There is also phase information, which we can ignore for this example, but it is worth explaining what phase information is. Suppose you have a mole of water molecules. In the vapour phase (treated as an ideal gas at atmospheric pressure), they would take up about 24.5 litres of space at 25 degrees C. If this space was in a great big syringe and you did work on that vapour by pressing the syringe plunger until the total volume was halved (to 12.25 l), then you would have raised the information content of that gas, because you now know twice as precisely where each molecule is: the molecules have lost 12.25 litres of space they could possibly be in! Further pressure and cooling would convert it back to 18 grams of liquid, a total of 0.018 litres, which compared with water vapour is a very concentrated deposit of water indeed. What has changed? Phase: solids, liquids and gases are all different states of matter, characterised by how well we know the locations of their constituent particles.
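The syringe example can be made quantitative under the usual idealisations: an ideal gas, constant temperature, and positional uncertainty counted by volume alone. Halving the volume available to each molecule gains exactly one bit of positional information per molecule, which agrees with the textbook entropy decrease of nR ln(V1/V2) for isothermal compression once one bit is identified with k ln 2 of entropy. The vapour-to-liquid figure is only a crude estimate, since molecules in a liquid are not free to roam their volume independently. The constants are CODATA values; the function name is illustrative.

```python
import math

R = 8.314462618      # molar gas constant, J/(mol K)
N_A = 6.02214076e23  # Avogadro constant, 1/mol

def bits_per_molecule(v_before: float, v_after: float) -> float:
    """Positional information gained by each molecule when the volume it can
    occupy shrinks from v_before to v_after (any consistent units)."""
    return math.log2(v_before / v_after)

# Halving the syringe volume: one bit per molecule, so N_A bits for the mole.
halved = bits_per_molecule(24.5, 12.25)
print(halved, "bit per molecule;", halved * N_A, "bits in total")
# The matching ideal-gas entropy decrease for 1 mol, n R ln(V1/V2), in J/K.
print(R * math.log(24.5 / 12.25))
# Vapour (24.5 l) to liquid (0.018 l), using only the volume ratio from the text.
print(bits_per_molecule(24.5, 0.018))
```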
Making the transition from gas to liquid, we know more about where the molecules are, because they're now localised in a smaller volume of liquid, though we don't know where they are relative to each other, because molecules in liquids are free to move relative to one another. Making the next transition to a solid, we increase the information content still further by fixing the molecules next to each other in time and space; their positional relationship is no longer randomly changing all the time, as it was in the Brownian maelstrom of the liquid phase. Back to the carbon and oxygen: expose one to the other, heat them up (making them vibrate with increasing violence), and eventually they get over the threshold required for them to react. Combustion is an information-lossy reaction: if you take some of its products, you can't say much about what they were before the combustion happened. Combustion in some circumstances is also incomplete, which means that not only do you not convert all the carbon to oxide, but there are two oxides you can make. You also produce soot, which is simply uncombusted carbon, finely granulated. On the product side of the reaction we have information that the carbon atoms have delocalised, in the following ways: the carbon has fragmented into small chunks of soot (some of which are fullerenes); it has oxidised into CO and delocalised as a gas; and it has oxidised into CO2 and delocalised as a gas. The oxygen molecules were delocalised, in the gas state, to begin with, so you haven't really lost any information about where they are, though we have combined them with other atoms, so there will be an information change there. We have also opened up a range of positional and chemical possibilities to the carbon atoms. They can now be dispersed around the room, incorporated into fullerenes or soot, or given two new possible states to occupy in a chemical sense: partly oxidised to CO, or totally oxidised to CO2.
All the atoms concerned, therefore, chemically speaking, have the opportunity to occupy new states as a result of the reaction; the carbon especially so. The oxygen is also given new states it can occupy, in various stoichiometric combinations with the carbon. We had two states, C(solid) and O2(gas), to begin with; now we have C(solid), C(many fullerene and soot configurations), CO and CO2. We assume incomplete combustion, in which the oxygen is used up, so oxygen has had one possible state taken away from it: the opportunity to exist as a free dioxygen molecule. In this entropy-increasing reaction, the total number of configurational states on the product side is larger than the number on the reactant side. It can contain more information because [p is larger, but extracting that information will be harder, since you have to extract it from more possible states and hence be able to differentiate between them, which may require time and energy. In this example there are, as it happens, many more possible states, since we turned a solid into a gas, and a gas is a system with a lot of possible states. In addition, it is very difficult to coerce the system back into a configuration where it has only the original, smaller number of states available to it. You'd have to separate any intermingled carbon and oxygen, fuse single oxygen atoms into pairs, and condense the carbon back into whatever form it was in before you burnt it. Even done with perfect efficiency, this would take at least the amount of energy which was liberated during the combustion, and it would also represent a decrease in the total number of states the system could occupy. In a numerical sense, this is a radix reduction. Put another way, most processes in the universe increase the entropy: they increase the number of possible states available to the universe, though of course only a few of these states are ever occupied. The universe likes to increase the number of configurations which it can choose to be in.
To add entropy to a system, therefore, is to increase the number of configurational states it could possibly occupy. Naturally a system can only occupy one state or group of states at a given instant, but the number of other possible states it could occupy will be much larger after you add entropy, so-called disorder, to it. To add to the number of possible states is to raise the total information content of the particular state which the system *does* occupy, because doing so makes each possible state more informative in terms of the others which it doesn't occupy. Adding energy to or subtracting it from a system might change the state that the system is in (thereby changing the actual state), but it might not change the number of states which the system _can_ occupy, so the total information content of the specific state is not changed. In a numerical setting, you lose no information by converting from one radix to another, but you do change the entropy per symbol: converting a decimal value from base 10 to base 16 preserves your number, but the entropy per digit changes from 3.32 to 4 bits per symbol. Consequently you can describe larger quantities with fewer symbols. In a chemical system, changing the system's energy might change the configuration of a specific molecule (by invoking, say, a conformational change), but until you reach an activation threshold and bring about a chemical change, the molecule's particular information content (b) with respect to its possible information content (determined by [p, above) doesn't change. When the reaction occurs, [p will change, so b will change. In the ethanol/plasma torch/inert exhaust chamber example above, the entropy of the system doesn't change, because the total number of states available to the atoms in the ethanol does not change. Energy in that system is being absorbed or emitted to change the information in the bonding configuration, not to add to it or subtract from it.
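The radix point is easy to demonstrate. This minimal sketch converts an integer into base-10 and base-16 digit lists, confirms that the round trip recovers the same number (no information lost), and compares the digit counts against the per-digit capacity log2(radix). The helper names are illustrative.

```python
import math

def to_base(value: int, radix: int) -> list[int]:
    """Digits of a non-negative integer in the given radix, most significant first."""
    if value == 0:
        return [0]
    digits = []
    while value:
        value, d = divmod(value, radix)
        digits.append(d)
    return digits[::-1]

def from_base(digits: list[int], radix: int) -> int:
    """Inverse of to_base: rebuild the integer from its digits."""
    value = 0
    for d in digits:
        value = value * radix + d
    return value

n = 1_000_000
for radix in (10, 16):
    digits = to_base(n, radix)
    assert from_base(digits, radix) == n  # the number itself is preserved
    print(f"base {radix}: {len(digits)} digits at {math.log2(radix):.2f} bits per digit")
```

The same quantity needs seven decimal digits but only five hexadecimal ones, because each hexadecimal digit can carry more information.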
If the configuration states are different then there will be an energy change, obviously, but we should not confuse this with the energy change accompanying the information change associated with the combustion of ethanol, which not only reconfigures the atoms in the ethanol but also increases their entropy by providing more states for them to be in, making available the possibility of interactions with additional atoms of oxygen. Carbon dioxide is a notable absentee from the plasma-torch list of possible products, simply because a single molecule of ethanol does not contain enough oxygen to make such a product from the available atoms. The stoichiometrically complete combustion of ethanol is:

C2H6O + 3 O2 -----> 2 CO2 + 3 H2O

We've added six divalent atoms to the mix, enormously increasing the possible configurations available to the atoms on the left side of the equation. I'm not even going to try to work out [p for the system on the left, but it is very much larger than for ethanol on its own. You could do the same reaction with two ozone molecules instead of three dioxygen molecules, for starters, and any molecules or combinations of molecules satisfying the empirical formula C2H6O7 (including a range of what might be considered incomplete combustion products, like aldehydes, CO and hydrogen) all contribute to this new chemical possibility space. Here's an example, oxidised to the maximum extent:

    OH    OH
    |     |
 HO-C--O--C-OH
    |     |
    OH    OH

---

So much for information content. Something else I noticed was that, as we got closer to unambiguously describing actual molecules, something funny happened: we could only speak about them in terms of what we think we can _probably_ know. We couldn't know exactly where the atoms are, but we could know where the atoms were _likely_ to be; we could not know where the electrons actually were, but we could know what the probability distribution of an electron was...
you tended to find it hanging around on this part of the molecule more often than on that part. See the ORTEP picture above? Those ellipsoids show how atoms relate to each other, and where an atom _might be_, but NOT where it _is_ in absolute space. Whaddya mean, probability distribution? In plain English, the probability distribution of, say, your prize pair of sharp kitchen scissors generally includes any place where some member of your family had a cutting job they wanted to do and then dropped the scissors on completing their task, plus wherever a person might currently be doing a cutting job with the scissors. You don't know *which* place, but you know intuitively, as you leave the kitchen to go looking for the scissors, that they might be in the laundry, where someone needed to cut an unravelling thread on their clothing; they might be in the bathroom next to an incriminating pile of toenail offcuts; or they might be with someone somewhere in the garden, currently being used to chop the sex organs off innocent flowering plants. P(scissors) = p(bathroom) + p(laundry) + p(garden). When you don't know where they are, you have an idea of where they might be, but to get any more specific than that you need to look in each of the places. But it can be more complicated than that... a person with the scissors might move around and take the scissors with them. If you have a lot of flowers, and historically you've found the scissors in the garden more often than not, the probability distribution for the scissors is biased mainly towards the garden. When it's winter and the flowers are gone, the probability distribution might become biased back towards the laundry or somewhere else.
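The scissors can be turned into a small worked example. The three p terms are probabilities of mutually exclusive places, so they must add up to 1, and a standard way to score how uncertain you are about the scissors is the Shannon entropy of that distribution, in bits. The tallies below are invented for illustration; the point is only that a distribution biased towards one place carries less uncertainty than an even spread over all three (log2 3, about 1.58 bits).

```python
import math

def normalise(weights: dict[str, float]) -> dict[str, float]:
    """Turn rough 'how often have I found them there' tallies into probabilities."""
    total = sum(weights.values())
    return {place: w / total for place, w in weights.items()}

def location_entropy(dist: dict[str, float]) -> float:
    """Shannon entropy, in bits, of where the scissors might be."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical tallies of where the scissors have turned up in the past.
summer = normalise({"bathroom": 1, "laundry": 1, "garden": 6})
winter = normalise({"bathroom": 1, "laundry": 3, "garden": 0})
assert math.isclose(sum(summer.values()), 1.0)  # the p terms add up to 1

for season, dist in (("summer", summer), ("winter", winter)):
    best_guess = max(dist, key=dist.get)
    print(f"{season}: look in the {best_guess} first; "
          f"uncertainty {location_entropy(dist):.2f} bits")
```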
When my bicycle was stolen, its probability distribution became P(bicycle) = p(somewhere in Australia), and worse, p(somewhere in Australia) carried no implication that the bike was still all in one piece, rather than dispersed as a bunch of parts (wheels, frame, chainrings, pedals, cranks, headlights), each with its own probability distribution. Scientists don't say their bicycle is lost; they say it is "delocalised". In that sense, they don't know exactly where it is, but have a bit of an idea. My bicycle, in technical terms, was permanently delocalised. Since "somewhere in Australia" is a very large place, I could potentially spend my lifetime searching for it. Bicycles are made of squillions of atoms, so it is reasonable to talk about them in fairly broad terms: a bicycle's location can be specified easily without going to idiotically precise lengths of description. "It's chained to a pole on the corner of street X and street Y" narrows it down to four bits of footpath, and you're fine provided that you can recognise a bicycle and it's the only one chained to a pole there. This problem becomes thornier when you chain up to a pole to which many other bicycles are chained, because there are several things which fit the description. It then becomes a matter of specifying the bike - it's the one with a plastic chicken head on the handlebars - and hopefully the other bikes lack plastic chicken heads on _their_ handlebars. There is a potentially endless quantity of information you could include to help someone discriminate a specific bike from hundreds of others. However, when we get down to the quantum-mechanical level, where things are difficult to see because they are so tiny and changeable, we can only speak in progressively less specific terms about progressively more specific things, because the parts are all pretty much identical.
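How much description does singling out one bike take? A rough yardstick, assuming all the candidate bikes are equally likely to be yours, is log2 of the number of candidates, because each well-chosen yes/no question can at best halve the pile. A small Python sketch; the rack sizes are hypothetical.

```python
import math


def bits_to_single_out(n_candidates: int) -> float:
    """Information, in bits, needed to pick one item from n equally likely candidates."""
    return math.log2(n_candidates)


def questions_needed(n_candidates: int) -> int:
    """Worst-case yes/no questions if each question at best halves the pile."""
    return math.ceil(math.log2(n_candidates)) if n_candidates > 1 else 0


# Hypothetical scenes: a lonely pole, a small cluster, a crowded bike rack.
for n in (1, 4, 300):
    print(f"{n} bikes: {bits_to_single_out(n):.2f} bits, {questions_needed(n)} questions")
```

A feature only your bike has, such as the plastic chicken head, can settle the matter in a single question when it happens to be unique, which is why distinctive labels are so efficient.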
For example, electrons are subatomic particles which are common to all matter, and they can be induced to pop off into an evacuated space if you persuade them with a sufficiently strong electrical field, heat the material up to a certain temperature, or bash 'em with certain kinds of light, for example ultraviolet. One electron looks the same as the next, though they can be in different states: always negatively charged, but spin up or spin down, with this or that velocity or direction. There is a limit to how much information you can encode on a single electron, since there's really nowhere to put very much of it. When these particles were first discovered, their behaviour was thought of in much the same terms as that of billiard balls: round things with mass, speed and the tendency to move according to the presence of certain kinds of fields (gravity perturbs the movement of billiard balls, whereas magnetic and electric fields bend beams of electrons, as they still do in, amongst other places, just about every television set in the world today). But then something else was noticed. They'd sometimes behave like a wave. People couldn't figure out which one it was, and they arrived at the conclusion that how it behaved was mostly determined by the nature of the experiments you performed on it. You could not say of an electron, "it is at this position" AND "it has this velocity", but you could say of an electron, "this electron is *here*, *now*, but I don't know if it is actually moving at all" OR "this electron is moving at *this* speed, but I can't put my finger on exactly *where* it is exactly *now*." You could not simultaneously claim awareness of both aspects of its behaviour. This was in distinct contrast to a billiard ball, which some people can observe as it rolls across the green felt surface and intuitively know about its behaviour well enough to enable the existence of snooker championships.
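The electron-versus-billiard-ball contrast can be put in numbers with the usual form of the uncertainty relation, Δx·Δp ≥ ħ/2, rearranged to give the smallest velocity spread allowed once the position is pinned down. A Python sketch; the ball mass (0.15 kg) and the two position spreads are assumed round figures, not measurements.

```python
HBAR = 1.054571817e-34  # reduced Planck constant, J*s


def min_velocity_spread(mass_kg: float, position_spread_m: float) -> float:
    """Smallest velocity uncertainty allowed by dx * dp >= hbar / 2, with p = m * v."""
    return HBAR / (2 * mass_kg * position_spread_m)


ELECTRON_MASS = 9.1093837015e-31  # kg
BALL_MASS = 0.15                  # kg, assumed round figure for a snooker ball

electron = min_velocity_spread(ELECTRON_MASS, 1e-10)  # pinned to about an atom's width
ball = min_velocity_spread(BALL_MASS, 1e-6)           # located to within a micrometre
print(f"electron: at least {electron:.1e} m/s of velocity blur")
print(f"snooker ball: at least {ball:.1e} m/s of velocity blur")
```

Pin an electron down to roughly an atom's width and its velocity becomes uncertain by hundreds of kilometres per second; locate a snooker ball to a micrometre and the blur is far too small ever to notice, which is why snooker championships work.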
Snooker with electrons would be a shitfight, not just because they repel each other and stick to an electrically neutral cue, but also because you could never prove you'd pocketed one in the corner pocket anyway: for it to be pocketed its velocity in the pocket must be zero, yet if you know its position exactly (it's supposedly in the pocket, after all) you cannot prove its velocity is zero, so it might possibly still be floating around on the table somewhere. The information we can and cannot extract about the electron is not just data per se; it also encodes the relationships between those data. In this case, the more accurately we know the velocity of the electron (that is, how its position changes during some period of time), the less accurately we know where it actually is. There's only so much you can know about what a single electron is doing. Our brains are used to dealing with big fat chunks of matter which have average values and group behaviours with which we can grapple... your dog, for example, will not quantum-mechanically tunnel into the next room, although, according to some tricky branches of mathematics, there exists a chance that it could spontaneously do this if you waited for a very long time, since the stuff of which the dog is made (subatomic particles) can do this when individuated and placed into a position where it can exhibit this property. To tunnel from room to room, dogs have to resort to bulk methods, using their paws to raise to certainty (and hence make into reality) the probability that a bunch of dirt will go away and no longer represent a barrier to them. Photons - discrete chunks of energy moving at the speed of light - confused the hell out of us. We were so accustomed to being able to chop things into smaller and smaller pieces to find more and more finely graded information. It worked for a while, but it had to end somewhere.
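For the dog, a crude estimate will do. A common approximation for the chance of tunnelling through a rectangular barrier is T ≈ exp(-2κL), with κ = sqrt(2m(V - E))/ħ. The Python sketch below returns log10(T), because T itself underflows to zero for anything dog-sized. Treating a 20 kg dog as a single particle that is 1 J short of getting over a 10 cm wall is an assumption made purely for illustration; a real dog is a messy composite of particles, so this is a cartoon rather than a calculation of anything a dog does.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J*s
EV = 1.602176634e-19    # joules per electronvolt


def log10_tunnelling_probability(mass_kg, energy_shortfall_j, barrier_width_m):
    """Rough log10 of the chance of tunnelling through a rectangular barrier.

    Uses the approximation T ~ exp(-2 * kappa * L), with
    kappa = sqrt(2 * m * (V - E)) / hbar, and returns log10(T) directly
    so that astronomically small values do not underflow.
    """
    kappa = math.sqrt(2 * mass_kg * energy_shortfall_j) / HBAR
    return -2 * kappa * barrier_width_m / math.log(10)


electron = log10_tunnelling_probability(9.1093837015e-31, 1 * EV, 1e-9)  # 1 eV short, 1 nm gap
dog = log10_tunnelling_probability(20.0, 1.0, 0.1)  # hypothetical dog, 1 J short, 10 cm wall
print(f"electron: about 10^{electron:.1f}")
print(f"dog: about 10^{dog:.2e}")
```

The electron gets through roughly once in tens of thousands of attempts; the dog's odds have an exponent with more than thirty digits in it.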
That epistemological brick wall is the intrinsic nature of information. Eventually, the best answer we could get about a system is not "How much is a photon a wave, and how much a particle?" but a simple "Yes, it is a wave" or "Yes, it is a particle". This is a discrete answer, an answer containing the smallest possible quantity of information - a bit. Nature leaves it up to the investigator to interpret the answer. The primary thing to remember is that the photon is in this case acting as a carrier of information about the photon emission source, and whatever you do to it between emitter and detector will change the nature of the information it carries. As it happens, the photon can behave as both particle and wave. You can use it to encode information, and it will exhibit either of these behaviours, each behaviour being a consequence of what you are watching for when you, or some kind of detector, try to look at it. The double-slit experiment is named the wrong way - it should be called the multiple-identical-photons-treated-the-same-way-at-different-rates experiment. Young in 1801 (two hundred years ago) did an experiment, the double-slit experiment, in which a zillion photons emitted from a point source passed through space to a barrier where they passed through, or failed to pass through, two narrow, closely spaced parallel slits. Some of them continued past the slits and made a pattern on a screen beyond. Instead of a pair of slit-shaped illuminations, there was a wave-like interference pattern from photons reaching the screen via two paths (one slit or the other). Wow! But when the experiment was later repeated with the photons emitted one at a time from the source, they would gradually build up the same interference pattern. How could they do that? They were discrete lumps, emitted at separate times, and they knew nothing about each other, right? Yes, yes and no.
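The one-photon-at-a-time build-up can be mimicked with a short simulation. This is a sketch under stated assumptions, not a model of photon physics: it takes the idealised far-field two-slit intensity (a cos² fringe term times a single-slit sinc² envelope, with an assumed slit width of 0.2 times the slit spacing) as the probability distribution for where each photon lands, and draws landing spots one at a time by rejection sampling. Screen positions are measured in fringe periods, so bright fringes sit at whole numbers and dark ones at halves.

```python
import math
import random


def two_slit_intensity(y, width_ratio=0.2):
    """Relative far-field intensity at screen position y (in fringe periods).

    cos^2 gives the two-slit fringes; the sinc^2 factor is the single-slit
    envelope, with slit width / slit spacing = width_ratio. Peak value is 1.
    """
    beta = math.pi * width_ratio * y
    envelope = 1.0 if beta == 0 else (math.sin(beta) / beta) ** 2
    return math.cos(math.pi * y) ** 2 * envelope


def one_photon(rng, half_width=3.0):
    """Draw one landing position by rejection sampling against the intensity."""
    while True:
        y = rng.uniform(-half_width, half_width)
        if rng.random() < two_slit_intensity(y):
            return y


def build_up(n_photons, bins=24, half_width=3.0, seed=1801):
    """Histogram of landing positions, adding photons strictly one at a time."""
    rng = random.Random(seed)
    counts = [0] * bins
    for _ in range(n_photons):
        y = one_photon(rng, half_width)
        index = min(int((y + half_width) / (2 * half_width) * bins), bins - 1)
        counts[index] += 1
    return counts


for n in (10, 100, 5000):
    print(n, build_up(n))
```

With ten photons the histogram looks like random speckle; with thousands, the fringes emerge, even though each simulated photon is drawn independently of all the others.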
A photon is a teensy self-propagating disturbance in the local electric and magnetic field strength of a region. Photons are started by all sorts of things, like transitions in atomic nuclei (these produce gamma photons) or the oscillation of an electric field (depending on how fast the oscillation is, this ranges anywhere from radio waves and microwaves up to light). The electric and magnetic parts operate in push-pull mode and at right angles to each other - a changing magnetic field generates an electric field, and a changing electric field regenerates the magnetic field. In a travelling photon the two fields rise and fall together, at right angles to each other and to the direction of travel, and the way they are crossed defines the direction in which the photon travels. Hence a photon is the unit of information propagation via electromagnetic radiation, and it carries information about the circumstances under which it was produced. But how much information does a photon carry? A photon in most circumstances has one frequency, which is determined by the energy packed into it when it was generated, and the relationship between frequency and energy was deduced by Planck, when he came up with: E = hv, where v is the frequency of the photon, E is the energy of the photon, and h is Planck's constant, a tiny amount measured in joule-seconds (about 6.626 x 10^-34 J s). High-frequency photons, with short wavelengths, have more energy than ones with low frequencies and long wavelengths. We assume W to be the bandwidth of a fixed-frequency photon and that it is necessarily equal to one, since the photon only has, and can only have, one frequency. The interesting bit is the signal-to-noise component, S/N. A photon will either be signal or noise, never both. If the photon is "signal" (that is, meaningful), then its S/N ratio is undefined, since the noise term N is zero. If the photon is "noise", then its S/N ratio is zero, since the signal term S is 0 and dividing zero by any nonzero value gives zero.
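Planck's relation is easy to try out. In the Python sketch below the frequency is obtained from the wavelength via v = c / wavelength; the three wavelengths are arbitrary examples picked to span radio, visible light and gamma rays.

```python
H = 6.62607015e-34   # Planck's constant, J*s
C = 299_792_458.0    # speed of light, m/s


def photon_energy(wavelength_m: float) -> float:
    """E = h * v, with the frequency v obtained from v = c / wavelength."""
    return H * C / wavelength_m


examples = [("radio (1 m)", 1.0), ("green light (530 nm)", 530e-9), ("gamma (1 pm)", 1e-12)]
for name, wavelength in examples:
    print(f"{name}: {photon_energy(wavelength):.2e} J")
```

A green-light photon carries a few times 10^-19 J; the gamma photon, with a wavelength about half a million times shorter, carries about half a million times more.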
A photon has a total Shannon channel capacity of

$$C = W \log_2\left(1 + \frac{S}{N}\right)$$

which, written with base-10 logarithms, becomes

$$C = W \, \frac{\log_{10}(1 + S/N)}{\log_{10} 2}$$

For a signal photon with bandwidth W = 1 you get

$$C = 1 \times \frac{\log_{10}(1 + 1/0)}{\log_{10} 2}$$

Strictly, 1/0 is unbounded, so the formula runs off towards infinity rather than settling on a tidy answer; Shannon's formula was built for channels with at least some noise in them. The sensible reading for a single detection event is that it answers one yes/no question, photon or no photon, which is worth log2 2 = 1 bit, so C = 1. On the other hand, for a noise photon with bandwidth W = 1 you get

$$C = 1 \times \frac{\log_{10}(1 + 0/1)}{\log_{10} 2} = \frac{\log_{10} 1}{\log_{10} 2} = 0$$

which is zero. Fair enough. It's noise; by definition it carries no information in which you might be interested. This has interesting consequences for interference. Interference was the name given to the pattern Young saw on the screen in the double-slit experiment two centuries ago, and to get it you had to meet some interesting requirements. The light had to be from the same source (coherent: synchronised in emission time or phase), the light had to be monochromatic (all of one frequency, therefore its bandwidth was equal to one), the intersection angles had an upper limit, and the photons couldn't be plane-polarised at right angles to each other; in other words, you had to configure them so they all had much the same information in them. The two weird things observed were these:
1) if you took away one slit, you got a standard radiant emission pattern and no interference;
2) if you did it one photon at a time, you got the fringes anyway.
Let's look at this experiment in an information systemic manner. What the slits do is perform a logical operation on the information in the photon stream. We can construct a system of logic gates which reproduces the photonic logic of Young's apparatus, the logic which gave rise to his interference pattern. That pattern was, interestingly, a pattern of zeros and ones, where 1 = detectable photon and 0 = no detectable photon. Here's how it works. The light source makes photons. The inverse-square law says how they should propagate, which they do radially from the light source. No changes are imposed on the photons when they do this.
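Taking the formula at face value in code shows where it gives a clean answer and where it does not. The Python sketch below assumes the text's convention of W = 1 for a single-frequency photon; `shannon_capacity` and `binary_entropy` are illustrative helper names. The binary entropy function gives the information in a single yes/no detection (photon or no photon), which peaks at exactly one bit when both outcomes are equally likely.

```python
import math


def shannon_capacity(bandwidth, signal, noise):
    """C = W log2(1 + S/N); infinite when there is signal but no noise at all."""
    if noise == 0:
        return math.inf if signal > 0 else 0.0
    return bandwidth * math.log2(1 + signal / noise)


def binary_entropy(p):
    """Bits of information in a yes/no detection that comes up 'yes' with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


print(shannon_capacity(1, 0, 1))  # noise photon: 0.0
print(shannon_capacity(1, 1, 0))  # idealised noiseless signal photon: inf
print(binary_entropy(0.5))        # fair photon/no-photon detection: 1.0
```

Plugged in literally, a noise photon gives zero and a perfectly noiseless signal photon sends the formula off to infinity, while a fair yes/no detection carries exactly one bit.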
Spatially, though, the inverse-square law represents a logical operation known as a fanout. If the inverse-square law is correct, then the same information is being dispersed across a larger and larger area. Intensity is a function of photons per unit time. If you do it one photon at a time, you'll naturally have to wait a long time until you have enough photons to do all the logic and give you your interference pattern. Naturally you lose some intensity as you increase the distance between the detector and source, but that doesn't matter as long as the photons are detectable (that is, have a carrying capacity of at least one bit per whatever period of time you run your detector). Fanouts take one bit and make lots of identical bits, all carrying the same information but with less intensity. These fanned-out photons eventually reach the barrier with the slits in it. We can conceive of this as a huge wall of single-input, single-output logic gates. Most of these are NOT gates; some of them (the slits) are identity gates. If we have one slit only, a photon goes through and keeps on fanning out as required by the inverse-square law. It eventually reaches the screen, which I consider to be a bunch of AND gates.

Single-slit logic, key:
    |  = NOT
    :  = IDENTITY
    D  = AND

For the same reason, it also makes sense to have a bottle labelled "vanilla" on the kitchen shelf rather than . Learning all the precise chemical names for everything is a waste of time and effort if a simpler and adequate naming system exists... plus you get funny looks going into milk bars and asking for things which sound like instructions for manufacturing illegal drugs.
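The fanout's cost in waiting time can be estimated. Assuming an isotropic point source (the emission rate and detector area below are made-up round numbers), the photons spread over a sphere of area 4πr², so a detector of area A catches a fraction A/(4πr²) of them. Dividing the number of photons you want by that arrival rate gives the time you have to wait.

```python
import math


def photons_per_second_at_detector(source_rate, distance_m, detector_area_m2):
    """Isotropic source: photons fan out over a sphere of area 4 * pi * r^2."""
    return source_rate * detector_area_m2 / (4 * math.pi * distance_m ** 2)


def seconds_to_collect(n_photons, source_rate, distance_m, detector_area_m2):
    """Waiting time to accumulate n_photons at the detector."""
    return n_photons / photons_per_second_at_detector(source_rate, distance_m, detector_area_m2)


# Hypothetical feeble source: 1e6 photons per second, 1 cm^2 detector.
for r in (0.1, 1.0, 10.0):
    rate = photons_per_second_at_detector(1e6, r, 1e-4)
    wait = seconds_to_collect(1000, 1e6, r, 1e-4)
    print(f"r = {r} m: {rate:.3g} photons/s, {wait:.3g} s for 1000 photons")
```

Doubling the distance quarters the arrival rate and quadruples the wait, which is the inverse-square law seen from the detector's side.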
It is nevertheless important that the label is sufficiently discriminating: four jars each labelled "Vomeronasal vanilloid receptor VR-1 adherent" are, in a sense, very informative to a biochemist, but any one of them could contain zingerone, the spiky flavour you know from ginger beer; vanillin, the flavour molecule of vanilla (synthetic or natural, it is chemically the same molecule); capsazepine, a synthetic material designed to stop the VR-1 receptor from telling the nervous system about anything it detects; or capsaicin, the stuff the police spray on you when you attend a street protest about capitalism, and also the active principle of jalapeño chilli peppers.

---------Chem: catalysts.

The next thing which intrigued me about chemistry was catalysts. We were told that they enabled a reaction to proceed more easily. Same reaction, same products, same yields, just more easily and quickly. That is, the reaction would have blindly proceeded anyway if you hit it with enough pressure, heat, or whatever, but catalysts assisted a reaction: you didn't have to heat 'em up as much or blast them with radiation in the microwave oven. Catalysts somehow knew how to make it work. And they didn't get consumed in the course of a reaction (or if they did, they were recreated in equal amounts). The materials were simple metals like platinum or nickel, or, in the case of some organic reactions, simple compounds like sodium hydroxide or ammonium acetate. The catalysts assist millions of reactions per second. They are used over and over again to convert molecules from one format to another, in much the same way as a programmable calculator might add millions of pairs of numbers without its addition circuitry being worn out by the process. The clue lay in biochemistry. Living things are stuffed full of catalysts specific to various biotransformative reactions, and living systems could not operate if these catalysts didn't do their work. I attended a lecture by John Barrow which filled me in.
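The "more easily and quickly" part can be illustrated with the Arrhenius equation, k = A exp(-Ea/RT), a standard empirical rate law swapped in here rather than taken from the lecture. In the Python sketch below a catalyst is modelled, as an assumption, purely as a lower activation energy with the same prefactor; the barrier heights of 100 and 60 kJ/mol are invented for illustration.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)


def rate_constant(prefactor, activation_energy_j_per_mol, temperature_k):
    """Arrhenius equation: k = A * exp(-Ea / (R * T))."""
    return prefactor * math.exp(-activation_energy_j_per_mol / (R * temperature_k))


# Hypothetical barriers: 100 kJ/mol uncatalysed, 60 kJ/mol with a catalyst, same A.
uncatalysed = rate_constant(1e13, 100e3, 298.15)
catalysed = rate_constant(1e13, 60e3, 298.15)
print(f"speed-up at room temperature: {catalysed / uncatalysed:.2e}")
```

Because the catalyst lowers the barrier for the reverse reaction by the same amount, both directions speed up equally and the equilibrium yields stay put, which fits "same products, same yields, just more quickly".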
An enzyme, he explained, uses various means to attract specific molecules to specific positions on the enzyme, where these molecules - called substrates - are then spatially oriented relative to each other. They then have their electron distributions deformed by various means; once sufficiently distorted, the substrates react chemically with each other, forming a product molecule which is then released to the surroundings. The catalyst reverts to its original format and lures in another ignorant couple of substrate molecules for the purpose of informing them how to react with one another. Catalysts quite literally tell molecules how to go about reacting. They add positional and configurational information to the reagents, information which predisposes the reagents to undergo a reaction they'd otherwise have to figure out by blind chance in a random Brownian soup of precursors. Catalysis = information provision to a chemical reaction. I mentioned [p earlier, which has to do with an empirical formula and the number of ways you could combine all the atoms in such a formula to produce different molecules. However, [p only takes into account the information embedded in chemical bonding. Another interesting thing to know about a molecule would be the informational equivalent of its Hamiltonian. A Hamiltonian expresses the energy of a system... how it vibrates, rotates and translates, and how far above the ground state its electrons are energised. It might be useful to generate, for all the [p members of a given empirical formula, a larger description including not only the number of ways the atoms could be bonded, but also the ways all the constitutional isomers could be energised, twisted and stereospatially configured, how all their electron spins could be configured, how electrons could be dispersed over conjugated pi-clouds, how rings could be stacked, etc.
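A tiny corner of [p can actually be counted. For acyclic saturated hydrocarbons (alkanes, CnH2n+2), the distinct molecules correspond to the distinct carbon skeletons, that is, trees in which no carbon has more than four carbon neighbours. The Python sketch below grows skeletons one carbon at a time and throws away duplicates using a canonical string for each tree (the AHU encoding, rooted at the tree's centre). It ignores stereoisomers, rings and double bonds, so it counts only constitutional isomers of alkanes; the bits column is simply log2 of the count.

```python
import math


def canonical(adjacency):
    """Canonical string for an unlabeled tree: AHU encoding rooted at the tree's centre."""
    n = len(adjacency)
    if n == 1:
        return "()"
    # Strip leaves layer by layer until one or two centre vertices remain.
    degree = [len(neighbours) for neighbours in adjacency]
    leaves = [v for v in range(n) if degree[v] == 1]
    remaining = n
    while remaining > 2:
        remaining -= len(leaves)
        next_leaves = []
        for leaf in leaves:
            for u in adjacency[leaf]:
                degree[u] -= 1
                if degree[u] == 1:
                    next_leaves.append(u)
        leaves = next_leaves

    def encode(v, parent):
        # A vertex's code wraps the sorted codes of its children in brackets.
        children = sorted(encode(u, v) for u in adjacency[v] if u != parent)
        return "(" + "".join(children) + ")"

    return min(encode(centre, None) for centre in leaves)


def alkane_skeletons(n_carbons, max_bonds=4):
    """Distinct carbon skeletons (trees, at most 4 C-C bonds per carbon) for CnH2n+2."""
    trees = {"()": [[]]}  # a single carbon: methane
    for _ in range(n_carbons - 1):
        grown = {}
        for adjacency in trees.values():
            for v in range(len(adjacency)):
                if len(adjacency[v]) < max_bonds:
                    # Attach a new carbon to carbon v.
                    new = [list(neighbours) for neighbours in adjacency] + [[v]]
                    new[v].append(len(adjacency))
                    grown.setdefault(canonical(new), new)
        trees = grown
    return list(trees.values())


for n in range(1, 9):
    count = len(alkane_skeletons(n))
    print(f"C{n}H{2 * n + 2}: {count} isomer(s), {math.log2(count):.2f} bits")
```

The counts come out as 1, 1, 1, 2, 3, 5, 9 and 18 for one to eight carbons, so butane's two isomers are worth exactly one bit.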
Even for small molecules this would be a biiig number, but it would represent the information-carrying capacity of an actual individual molecule. Again, if one were to express this bitwise using Shannon's work, one could ascribe to a molecule a number of bytes of storage capacity. Note that this is different from a Hamiltonian, which says nothing about configuration but a lot about energy state. When you form chemical bonds, energy flows out of the atoms being bonded, and breaking those bonds again requires putting the same amount of energy back in. The quantity of energy involved represents, in part, the energy exchanged in fixing the spatial configuration of that bond, and of the atoms involved in forming it, in the newly formed molecule. information density = information number/mol.wt. In general, lighter molecules with higher valences have more configurational info in them. Also: how much information does a catalyst provide to a reaction? Catalysis is, with respect to the catalyst, an example of a reversible computation, insofar as the catalyst reverts to its initial state after the chemical transformation of the relevant reagents is achieved. 2H2 + O2 ---> 2H2O is catalysed on Pd. Also: yeast has about 12 megabases of DNA and exists, as far as brewers are concerned, to convert sugar into ethanol. This would seem to be a very large amount of information in which to encode the instructions for doing the conversion. However, the yeast is interested in encoding how to make more yeasts and how to run all the other aspects of its metabolism, which, as a byproduct of its so doing, enables it to make ethanol at all. Theoretical limits to information density: a black hole stores a lot of information, though not in a format one can readily read, its only externally observable parameters being its mass, charge and spin. Bits are the answers to the simplest possible questions. Proposal:
    Binary systems: yes, no
    Ternary systems: yes, no, maybe
    Quaternary systems: yes, no, maybe, your question is stupid.
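The yes/no/maybe proposal and the "bytes of storage" idea meet in one line of arithmetic: an answer drawn from k equally likely options carries log2 k bits. DNA's four-letter alphabet is a quaternary system, so each base pair can hold at most 2 bits. The Python sketch below assumes a baker's yeast genome of about 12 million base pairs (a commonly quoted round figure) and gives an upper bound only, since real genomes are far from random.

```python
import math


def bits_per_answer(alphabet_size: int) -> float:
    """Information in one answer drawn uniformly from `alphabet_size` options."""
    return math.log2(alphabet_size)


systems = [
    ("binary (yes, no)", 2),
    ("ternary (yes, no, maybe)", 3),
    ("quaternary (yes, no, maybe, stupid question)", 4),
]
for name, k in systems:
    print(f"{name}: {bits_per_answer(k):.3f} bits per answer")

# DNA is a quaternary code (A, C, G, T), so each base pair holds at most 2 bits.
YEAST_BASE_PAIRS = 12_000_000  # assumed round figure for baker's yeast
capacity_bits = YEAST_BASE_PAIRS * bits_per_answer(4)
print(f"yeast genome: at most {capacity_bits / 8 / 1e6:.1f} megabytes")
```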
Gases/elements contain very little information. Contaminants encode information. See: forensics. Elements require lots of energy to purify them from raw materials, which are full of complicated contaminants (zone refining, smelting, etc)... energy is required to _change_ the information content of a system, that is, to increase *or* decrease it. Increasing temperature (i.e., increasing the intrinsic kinetic energy) raises the probability of an information change. (Free air is a misnomer... it is the product of a complex bioprocess.) Proteins are highly information-enriched. Ch.2 : What *is* information? How does it relate to time and energy? Energy is that physical property of a system required to change its information content. All information changes exhibit energy changes. Corollary: energy-depleted systems preserve information very well... cryogenic storage of cells. Note that wherever there is information processing or transmission, energy infrastructure will be closely proximal to it: electricity in phone wires, or light in fibre optics; also the energy required to drive fluid computing, or valves, or Babbage's engine, or brains (note the special self-preserving energy metabolism of brains: they will not metabolise lipids, since this would damage the functionality of the organ, which is mainly made up of myelin, phospholipids, etc etc etc). The closest evidence of their interrelatedness: look at the DNA and RNA bases, A, T, G, C and U, and at the energy-carrying nucleotide tri- and diphosphates, ATP and ADP; AMP is primarily a signalling molecule which, amongst other tasks, tells the cell to make more ATP. These molecules, in which the functional programming of the cell is encoded, are intimately involved in the energy metabolism of the cell as well. The DNA synthesis machinery depends on this: polymerases need the high-energy, information-depleted monomers dATP, dGTP, dCTP and dTTP to make information-enriched, energy-depleted xxxAGCTxxx.
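One established result that fits "energy is required to change the information content" is Landauer's principle: erasing one bit of information in a system at temperature T dissipates at least kT ln 2 of energy, where k is Boltzmann's constant. It is narrower than the claim in the text (it bounds erasure, not every change), and it is swapped in here as a related idea rather than as the author's own argument. The Python sketch below evaluates the bound at a few temperatures.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def landauer_limit(temperature_k: float) -> float:
    """Minimum energy, in joules, to erase one bit at the given temperature: k T ln 2."""
    return K_B * temperature_k * math.log(2)


for label, t in [("body temperature", 310.0), ("liquid nitrogen", 77.0), ("liquid helium", 4.2)]:
    print(f"{label}: {landauer_limit(t):.2e} J per bit erased")
```

The same thermal energy scale, kT, sets how hard random jostling hits stored states, which is one way to read the remark about cryogenic storage.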
The bases provide the information handling through H-bonding pattern recognition; the sugar-phosphate end is intimately involved in energy metabolism. I suspect it is not a coincidence that the information and energy metabolisms of organisms are so intimately tied. Energy is used to perform "work" - that is, changing the information state of a system. Remember Fosp's example about doing work on a magnetic material with a magnet (magnetising it) by moving the magnet. The product of the total energy and total information content of a system is conserved (good to prove this somehow). Time within a system is measured in terms of information change. If dI = 0 then dT = 0 for that system, since no change has occurred in the information in that system and therefore no change in time can be measured. An interesting consequence of the elapsing of time is that measuring instruments gradually go out of calibration. A field is a volume of spacetime where there exists the tendency for objects within that field to undergo changes in various aspects of their information state... their velocity, orientation (etc). Storage methods: a change in some configuration, rate of change of configuration. Energy raises the probability of information changing (activation threshold of reactions)... which is why cooling things down makes them less noisy and less prone to rotting/DNA degradation. Where does the information go when you burn a hydrocarbon? Simple hydrocarbons don't encode much info; C2H6 and C3H8 can each only be one molecule. C4H10 can be CH3CH2CH2CH3 or CH3CH(CH3)CH3, i.e. it can encode two states. Information cannot be created or destroyed. It can be rendered unable to be extracted from a system using other information. There is a point beyond which that information is so difficult to retrieve that it is, in effect, irretrievable. It still exists but cannot be used to, say, catalyse a reaction. Once a signal is submerged below the noise floor, it becomes unreadable. See XOR/random noise.
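The XOR remark can be made concrete with a one-time pad. XOR-ing a message with an equally long run of random bytes produces something indistinguishable from noise, yet the message has not gone anywhere: XOR-ing again with the very same noise brings it straight back. Without that noise, the scrambled bytes are unreadable. A minimal Python sketch; `xor_bytes` is an illustrative helper, and `secrets.token_bytes` from the standard library supplies the random bytes.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the matching byte of key (same length required)."""
    if len(key) != len(data):
        raise ValueError("key must be as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"the scissors are in the garden"
noise = secrets.token_bytes(len(message))  # uniformly random bytes
scrambled = xor_bytes(message, noise)

print(scrambled)                    # looks like noise
print(xor_bytes(scrambled, noise))  # b'the scissors are in the garden'
```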
Embodiment of information in everyday things. We don't even think about it, since we're so used to tools, but they all embody information about _how_ to do a job. A knife embodies in its shape information about how to concentrate a lot of force exerted on an area (the handle) onto an edge (the blade). A bookshelf embodies information about how to take a volume and convert it into several volumes, each with its own area. Bookshelves and skyscrapers partition a volume into a usable set of areas. Resistors tell currents how much heat to add to their surroundings. Quote Peter Pedals (Rainbow Power Company handbook): a bicycle is the perfect transducer between human energy output and typical transport loads. Passive data processing: a lens embodies information about how to bend light a certain way. Your face embodies information about what quality of genes you have. A towel catalyses the dispersion of water molecules. Materials are full of information. Cells, microprocessors. The food you eat is full of energy, sure, but also loads of information embedded within the configuration of the atoms which make it up. Try eating the elemental atoms and see how well you go. A yummy dinner of CHONPS and trace elements. Yecch. The information content of a mathematical equation: the usage of the minimum amount of information to describe a process or relationship. Hence, is an equation the ultimate form of data compression? Are the rules determining the minimum form of an equation actually those which describe how much information there can be in a system? How does one know there are too many significant figures? Disinformation: rogue signals (the noise floor and how we raised it with hormone-mimicking pesticides; growth cycle genes turned on all the time) and lies. Random noise and the surprising difficulty of generating it... there's theoretically always some information embedded in a signal.
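The "equation as data compression" question, and the difficulty of telling noise from signal, can both be poked at with an ordinary compressor. This swaps in a practical yardstick (zlib's DEFLATE compression from the Python standard library) for the idealised notion of a shortest description, so treat the numbers as rough. Ten thousand bytes generated by a one-line rule shrink to a small fraction of their size, while ten thousand bytes from the operating system's random source barely shrink at all.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the data after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


n = 10_000
# Bytes produced by a short "equation": x^2 mod 251 for x = 0, 1, 2, ...
from_equation = bytes((x * x) % 251 for x in range(n))
# Bytes from the operating system's random source.
from_noise = os.urandom(n)

print("equation:", compressed_size(from_equation), "bytes")
print("noise:   ", compressed_size(from_noise), "bytes")
```

The rule itself, `(x * x) % 251` for x from 0 to 9999, is shorter still than the compressed output, which is the sense in which an equation can be the ultimate compression.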
A truly random noise source in a transmitted signal at least gives away that whoever transmits it has a clue about the true nature of randomness. Entropy: the energy cost of extracting information. ------------------------------------------------------- For any transaction involving information there is a relationship between how much energy is required to drive it, whether or not it can be reversed, and how fast the information process actually happens. Entropy per bit can be increased in two ways: lower the signal strength or increase the noise. You are a forensic investigator and, by virtue of your job, motivated to solve the puzzle of a suspected murder. At the scene, a particularly disreputable wine-tasting establishment, helpful witnesses mention that the victim had, unsurprisingly enough, a number of wine glasses on the table, each full of a different wine. A wine glass on the table is pretty simple to describe accurately... to make a description of the wine and the glass, take a sample of the glass and chemically describe it, and also sample the wines and describe them through the use of a good liquid chromatogram. Good instrumentation exists to do this. To remove any prejudice in the wine taster, the bottles were all carefully stripped of their labels, their contents poured into identical flutes, and the bottles then meticulously washed prior to recycling - no association between bottles and content remains in the kitchen. The glasses were placed on the table near identification labels which bore no indication of which wine was which or where it came from. The wine taster died tragically at the wine tasting, amidst sips. A suspected assailant is a disgruntled vintner, not there at the time, and the theory is that the vintner poisoned the plonk which the wine taster was expected to taste. The forensic team's job is to confirm or refute this theory, but there can be no idle speculation, for there's a long stretch in the slam awaiting anyone pinned with the motive, method and opportunity.
During the victim's mortal thrashing about, his current glass, along with several others he used and others he hadn't, is knocked off the table; they shatter on the crusty carpet tiles, and dozens of kinds of wine splash about and soak into the gritty weave. Things need to be ruled out: was the wine poisoned? If so, with what, and which was the fatal sip? Or did the poor chap die of something else? Due to the messy circumstances, deducing these becomes hideously complicated. The forensic scientists come along and spend a LOT of time and energy mapping the stains and sampling the fragments and splashed wines, attempting to determine whether, amongst the years of historically and recently accumulated gunk in the carpet, these fragments of glass belong to the victim's critical glass or to other innocuous ones; what sort of wine it was (one has to determine whether the victim was drinking the wine rather than spitting it out); and whether there was any poison, and if so, in which glasses. Their measurements and analyses now have a lot more noise in them... the other things in the carpet, other identical glasses, several kinds of spilled wine admixed on the floor, bits of grit and sediment and things which the wine has absorbed from the underlay all make the extraction of meaningful data from the measurements much more difficult. The critical information is still there; the wine flavour molecules and glass fragments and poison molecules have not gone away, but they cannot be associated at all, since there have been so many other synchronous spillages of wine and breakages of similar glass, and this makes it hard to discriminate a suspicious flute from amongst all of their combined detritus. The meaningful signal is drowned out by meaningless signals, and no amount of energy and investigation and sensitive equipment will ever help you recover it.
It is at this stage that the forensics chief holds her head high, goes to court and tells the jury that the entropy of the available pieces of information exceeded their information value; that is, for each bit of useful information they wanted to know, there were several other available bits of information they didn't want to know, and, because those bits were identical, no way to differentiate between them. This is why the best place to hide a specific needle from someone else is not in a haystack, but in a pile of other, almost identical needles. I say almost identical because, if you have a pile of needles identical to the needle you're looking for, picking any of them will do. This introduces the concept of redundancy. I worked at a second-hand record shop which had thousands of 45RPM vinyl records. As I found them they were, essentially, random, and then I sorted them into alphabetical order by surname of artist and placed them on the shelves in the dingy back room. My employer's three-year-old daughter had a different but no less valid concept of order, however. She would take apart an entire shelf and sort it according to something else which I would never figure out until she told me. One day she took all the 45s in the F's-by-artist section and arranged them by the colours of the paper jackets in which they were wrapped. A week later she arranged them by some arbitrary measure called the prettiness of the patterns on the label, and I didn't figure that out until she told me later. But of course, I tried to sort it out based on the words printed on the label, so I'd sort the increasingly familiar records and replace them on the shelf, becoming increasingly adept at spotting small patches of the original F-alphabetical-by-artist sorting which the child had not quite removed. That was my main way of sorting records. That's what I'd do while I worked there, after all.
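The record shop's problem can be put in numbers. A stack of n distinguishable records can be in any of n! orders, so saying which order it is in takes log2(n!) bits, whether that order is alphabetical, by jacket colour or by prettiness. The same count gives the well-known lower bound for sorting by comparisons: any comparison sort needs at least ceil(log2(n!)) yes/no comparisons in the worst case. A Python sketch, with stack sizes picked arbitrarily:

```python
import math


def bits_to_specify_order(n_records: int) -> float:
    """log2(n!): information needed to say which of the n! orderings a stack is in."""
    return math.log2(math.factorial(n_records))


def comparisons_lower_bound(n_records: int) -> int:
    """ceil(log2(n!)), computed exactly with integers: any comparison sort needs this many."""
    return (math.factorial(n_records) - 1).bit_length()


for n in (10, 100, 300):
    print(n, round(bits_to_specify_order(n)), comparisons_lower_bound(n))
```

Every one of the n! arrangements costs the same to produce; it is only finding out which one you are looking at that costs information.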
Then one day I came in and she said she'd rearranged them, and I spent half an hour looking at them trying to deduce the order, and then she told me, as she ran out the back door with a grin on her face, that she hadn't rearranged them. The existing alphabetical order was staring me in the face and I had no idea. Now, any possible stack of 45RPM records takes just the same amount of energy to be sorted from one sequence into another specific sequence as it does to be sorted into _any_ other sequence, regardless of whether that sequence happens to contain some specific sequential order or not. I know this from having sorted millions of almost (but not quite) identical records. You just wouldn't know _what_ the order was until you spent more energy and time leafing through the records and trying to deduce it. The energy and time expenditure here to extract information was a result of the fact that I didn't know exactly how to find what I was looking for. Ch 3: Total information content of a system. Hamiltonians describe, for, say, a tennis ball, all of the energy that the tennis ball has due to spinning, being compressed, oscillating after being hit with a racquet, and speeding across the court. What about the information the tennis ball has? A tennis ball is a carrier of information (throw a tennis ball at someone and see if they think so)... information about where it is, how it moves, what it weighs, how incident light should reflect off it, and how much resistance it should provide when the dog chews it. Limits to what information can be known. Information is itself the currency with which transactions about the nature of information must be performed. Recursive self-reference is an artefact of this: when you ask a question about information systems, or the nature of information itself, you will necessarily get an answer encoded in some kind of information system.
Ultimately all enquiries into epistemology crash against this "nature of information" barrier, in much the same way as classical determinism (the idea that once you knew a system's initial state you could know exactly what it would do in the future) crashed against the uncertainty principle, which states that you can know one of a pair of complementary quantities, such as a particle's position or momentum, more precisely only at the expense of knowing the other less precisely. See also Douglas Hofstadter's "Metamagical Themas" for some great quines and self-referential sentences. Consequences of Turing and Gödel. Gödel undecidability is symbolically represented by Gödel's incompleteness theorem, but the actual undecidability is embedded in the nature of the information it describes. A Turing machine will never run long enough to prove that a proposition is undecidable; what it can do is show that a proposition is decidable, by halting with a proof or a disproof, so a proposition not yet shown to be decidable can, by implication, only provisionally be considered undecidable. The Turing test faintly annoys me, because it does not test intelligence as an intrinsic, self-defining function of the system suspected of exhibiting the intelligence. It tests two other things: how well one system can convince another system of its intelligence, and how poorly defined intelligence is for the purposes of the Turing test. Because the Turing test assesses one system in terms of what another system thinks is intelligent, it has a sting in the tail: you can turn it around and prove the examiner is not self-aware or intelligent. In a very Zen kind of way, and as anyone who teaches can tell you, just as an examination paper tests the students, so too the students test the examination paper, and, for that matter, their teachers too.
Turing's test examines the susceptibility of one system (which considers itself intelligent) to being persuaded to attribute whatever it considers intelligence to another, quite possibly very different, system; it does not test the intelligence of the system in terms of its own intelligence, and hence there is no need for self-awareness in the system under test. To pass the Turing test, therefore, there's no need for an intelligent system, only an algorithm complex and adaptive enough to fool the examiner. I would fail this test, for example, if I interacted, in the Turing test setting, with deterministic, uncreative machines (e.g. my 80486 computer running an old operating system called DOS, or my programmable Hewlett-Packard calculator) and then concluded that they think: I'd have been fooled into attributing creative thinking and self-awareness to a data processing system which in fact lacked these properties. Yes, they might be capable of processing data in complex ways, but no, they're not intelligent or self-aware. The Turing test embodies anthropocentrism within it: a human has to decide if another system possesses at least human intelligence. Exactly which human should make the decision has never been specified, and this is instructive. I observe that human children, who are deemed to be intelligent, are often convinced that simple, unintelligent, microprocessor-driven devices (for example, stuffed toys with onboard voice synthesisers) are alive and conversing with them. I also observe that some people believe they are interacting with an intelligent, empathic computing device when in fact they are simply typing sentences into ELIZA, a mindless, keyword-driven canned-response program running on my aforementioned 80486 computer, whose microprocessor is renowned in some circles for its architectural stupidity even when compared to other unintelligent microprocessors.
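The ELIZA point can be made concrete. Below is a minimal sketch of a keyword-driven canned-response program; the rules and replies are hypothetical illustrations, not Weizenbaum's original DOCTOR script. It produces plausible-sounding replies while representing nothing at all about what the sentence means.

```python
import re

# Hypothetical keyword rules, checked in order; the first match wins.
# These are illustrative only and are not taken from the original ELIZA script.
RULES = [
    (re.compile(r"\b(mother|father)\b", re.IGNORECASE), "Tell me more about your family."),
    (re.compile(r"\bi feel\b", re.IGNORECASE), "Why do you feel that way?"),
    (re.compile(r"\bcomputers?\b", re.IGNORECASE), "Do computers worry you?"),
]
DEFAULT_REPLY = "Please go on."


def respond(sentence: str) -> str:
    """Pick a canned reply by surface keyword matching alone.

    No model of meaning is built: the program never represents what the
    sentence is about, it only scans for the patterns listed in RULES.
    """
    for pattern, reply in RULES:
        if pattern.search(sentence):
            return reply
    return DEFAULT_REPLY


if __name__ == "__main__":
    for line in ["I feel nobody listens to me",
                 "My mother never calls",
                 "The weather is nice today"]:
        print("> " + line)
        print(respond(line))
```

Everything that looks like empathy lives in the choice of canned strings; the "conversation" is a handful of pattern comparisons deep.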
A further consequence of this is that several kinds of humans fail the Turing test, including very young children, very old adults, certain kinds of dyslexic persons and persons with various sorts of impaired brain function (including the people staggering past my premises on their way home from the pub). Another artefact of the Turing test as proposed is perhaps that the only person fit to deliver it is the kind of person who thought of it in the first place, and we are now without the benefit of his existence. So instead I propose another, more general and hopefully less speciesist test, which humans have also passed, and which depends on nothing other than the native intelligence and self-awareness of the system under test. I'll call it the Neuroanatomy Student Complaint test, since this is the circumstance in which the test appeared and was quite unconsciously passed by neuroanatomy students everywhere, including the ones who fail the course. I observe that humans have big and complex brains, and that these have become big and complex enough to simultaneously exhibit behaviours such as scientific enquiry and self-awareness; that is, human brains study the workings of human brains. Human brains doing this for the first time tend to voice the observation that neuroanatomy is a mentally taxing and rigorous subject, which is correct. Lecturers and tutors then point out that if human brains were simpler and therefore easier to study, we wouldn't be intelligent enough to study our neuroanatomy. Put another way, simple brains aren't smart enough to study their own function or learn their own structure. I suspect that once information processors attain some threshold of complexity and ability which equips them to understand their own implementation, they can be considered intrinsically intelligent.
So, provided it was never explicitly programmed to say it, I await the day when, say, a room full of rack-mounted, parallel CPUs running some sort of distributed, evolving, genetic learning algorithm, or perhaps a fistful of nanotechnological goo, reports that "studying my own functional processes creates considerable load and may not be tractably calculable", and thereby demonstrates that it is self-aware, and aware of the size of the task it faces in studying its own intellectual infrastructure. Humans are the only species known to exhibit this awareness. The next level of intelligence, if there is one, will not complain about the difficulty of studying its own structure, since that structure will be fundamentally simpler (not an evolutionary throw-together of serendipitous molecular-level workarounds and make-dos, as ours is) and consequently more efficient. Studying it will therefore be easier for two reasons: the system under study will be simpler, because it will be partly of its own design, and the system studying it will be more powerful. The threshold for the Neuroanatomy Student Complaint test will vary between intelligent systems but will, if quantified, never go below a certain value. The Halting problem is known to people who ask the following question: "why is something always in the last place you look?" The answer, mundanely enough, is that finding the thing you were looking for is one of the conditions which might arise, and if it does, you consciously decide to end your search, because the item is not lost any more. I guess you could go on searching all the other places in the house where it might have been, even though you know where it is, but that would be kind of silly. Unless you suspected that the one you had found was a duplicate. Newcomb's Paradox explained.
Ch 4: Some basic behavioural properties of information technology. Information processing. What it is, in general.... information transformations.
copying, comparison, Boolean logic. Why life is active (dynamic, self-evolving) rather than passive and static (it could be argued that a lens processes information, but a lens, by any definition, is not really alive). Storage vs processing. I/O. Why it's good to have redundancy. Y2K: why we didn't crash at midnight on 31/12/1999 (errors never exceeded our capacity to repair them, and didn't erode our capacity to repair them beyond the critical threshold, either). Some basic problems with error-catastrophe agents when used against viruses. How to get around obligate pathogens in the long term. Why it's good to have diversity of OSes and plant strains (robustness). Thermodynamic inevitability of Microsoft's demise, from an open-source point of view (more open-source programmers than Microsoft programmers, and the similar requirements of OSes everywhere). What they're selling is the ability to run apps. Waste heat. Reversibility. Minimum cost of information transactions (ref: Landauer, Shannon, Feynman).
Ch 5: Information in living systems. Genetic code: in general, the amino acids most enriched in information are the ones with the least coding redundancy (degeneracy). Also look at one-base mutations: do they generally decrease the amount of energy required to synthesise the peptide or the amino acid? Cells as chemical Turing machines. Affymetrix. G-proteins. Promoters. Parallelism of function (e.g. hundreds of thousands to millions of ribosomes per eukaryotic cell). Interprocess (intercell) communication. Peptides. Hormones. Bandwidth and processivity and why they are allocated where they are (eyes, nerve tracts, neocortex, penis). Bandwidth is allocated where lots of data needs to go over evolutionary time. Visual system bandwidth vs auditory system bandwidth vs olfactory system bandwidth. It never fails to astound me, when I'm listening to really good music on really good hi-fi gear, just how much information I'm getting off the platter...
fingerprints rippling on guitar strings, transformer hum in the guitar amps, and so on. How my hearing is failing (less sensitivity to signal, more endogenous noise). Vision amazes me even more. What bandwidth is allocated to things like the gut? Fibre tract bandwidth between Broca's and Wernicke's areas, between the hemispheres, etc. The old channel, still used: whereas nerves are speedy and narrowcast, blood is slower and broadcast. Blood transports hormones, peptides, materials and energy, and even acts as a heatsink, dumping the waste heat generated by the information processing tasks performed by various organs. CSF and lymph are other fluids by which signal peptides, ions and other information-bearing small molecules are routed hither and yon. If one tries to conceive of the total information throughput of a cross-section of a large artery, it dwarfs the channel capacity of even large neurological data pipes such as the corpus callosum or the spinal cord, simply because of the *trillions* of bits of information intrinsic to the configuration of the materials being fed through it. The astounding throughput of diffusion-limited enzymes, and what they do (they spatially configure reagent molecules, distort them electrostatically, set up the conditions required for the reaction, then let the products diffuse away). Throughput of nuclear pores for RNA; throughput and bidirectionality of DNA replication (DNA polymerases in eukaryotes copy roughly 0.5-5 kilobases per minute, slow compared to prokaryotes, but with 25,000 replicons working in parallel, 330x10^6 bases can be copied in less than three minutes at the upper end of that rate). The fact that these proteins execute the bulk of the information processing load required to operate a living system, and do it very, very fast... turnover numbers for some proteins (e.g. catalase: 40 million peroxide molecules broken down per second!).
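Two of the quantitative notes in this chapter outline can be checked with a few lines of Python. This is a sketch: it uses the standard genetic code (NCBI translation table 1) to count how many codons specify each amino acid, which is the degeneracy the genetic code note refers to, and attaches to each one crude, hypothetical information measure that assumes all 64 codons are equally likely. It then redoes the replication arithmetic under the stated assumption that all 25,000 replicons copy at one constant rate.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# bases in the order T, C, A, G, the third position varying fastest;
# '*' marks the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa
        for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Degeneracy: how many codons map to each amino acid (stop codons excluded).
DEGENERACY = Counter(aa for aa in CODE.values() if aa != "*")


def surprisal_bits(amino_acid: str) -> float:
    """-log2(n/64): bits conveyed by seeing this amino acid, ASSUMING all 64
    codons are equally likely (a crude, hypothetical measure; real codon
    usage is strongly biased)."""
    return -math.log2(DEGENERACY[amino_acid] / 64)


def replication_minutes(genome_bases: float, replicons: int,
                        bases_per_minute_per_replicon: float) -> float:
    """Time to copy a genome if every replicon copies at the same constant
    rate in parallel (ignores origin timing, fork stalls, etc.)."""
    return genome_bases / (replicons * bases_per_minute_per_replicon)


if __name__ == "__main__":
    for aa, n in sorted(DEGENERACY.items(), key=lambda item: item[1]):
        print(f"{aa}: {n} codon(s), {surprisal_bits(aa):.2f} bits")
    print(replication_minutes(330e6, 25_000, 5_000))  # upper end of quoted rate
    print(replication_minutes(330e6, 25_000, 500))    # lower end of quoted rate
```

Methionine and tryptophan, with one codon each, come out at 6 bits under this measure, while leucine, serine and arginine, with six codons each, come out lowest. On the replication side, 5 kilobases per minute gives about 2.6 minutes, whereas 0.5 kilobases per minute gives about 26 minutes, so the "under three minutes" figure depends on the upper end of the quoted rate.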
Also: throughput of porins and pumps doing passive and active transport, and the computational nature thereof (moving an ion from one side of a membrane to the other constitutes a state change from 0|1 -> 1|0, where | is the membrane). Error tolerance (cf. PCs and OSes generally). Error correction. Xeroderma pigmentosum is a disease which demonstrates the importance of fixing errors introduced into the DNA of a given cell. The existence of the proteins which carry out this repair can certainly be taken as evidence that living systems have been selected partly on the basis of their ability to correct externally induced errors in their own DNA. The disease manifests itself in several ways, for example as extreme sensitivity to the DNA-damaging ultraviolet part of the electromagnetic spectrum. Free radical scavenging, correlation with mammalian aging, and what happens when errors aren't corrected (cancer, apoptosis, the Hayflick limit). Why we have a CNS AND a distributed nervous system. Why a spherical processor and a data bus (spine?) in chordates (ref: heat dissipation, Conway's law, bandwidth)? Why no sensory neurons inside a brain (to avoid noise?). Note: brains as we know them are only one possible solution to the problem of acquiring a Darwinian-probe processor (which happens to be able to be self-aware). Others may exist. Organs: task-optimised information processors. Livers, lungs and muscle are all the same basic stuff, but optimised for specific information tasks. E.g. muscle changes the positional information state of the organism; the heart, for example, continuously changes the positional information of the contents of the blood. James Lovelock was, to my knowledge, the first to refer to kidneys as information processing organs, in his groundbreaking work Gaia. He's absolutely correct, of course. A quantifiable energy cost is incurred in running a kidney.
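Landauer's principle, one of the references listed for the minimum cost of information transactions, puts a floor under that energy cost: erasing one bit at temperature T dissipates at least k_B T ln 2. A minimal sketch, assuming a body temperature of about 310 K and, hypothetically, that every bit processed is also erased; it gives a lower bound only, and real biological costs sit above it.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI value of the Boltzmann constant
BODY_TEMPERATURE_K = 310.0        # about 37 degrees Celsius (assumption)


def landauer_joules_per_bit(temperature_k: float) -> float:
    """Minimum energy dissipated when one bit is erased: k_B * T * ln(2)."""
    return BOLTZMANN_J_PER_K * temperature_k * math.log(2)


def landauer_floor_watts(bits_per_second: float, temperature_k: float) -> float:
    """Lower bound on the power needed to erase bits at the given rate."""
    return bits_per_second * landauer_joules_per_bit(temperature_k)


if __name__ == "__main__":
    e_bit = landauer_joules_per_bit(BODY_TEMPERATURE_K)
    print(f"{e_bit:.3e} J per bit at {BODY_TEMPERATURE_K} K")
    # Hypothetical workload: 20 million bits per second.
    print(f"{landauer_floor_watts(20e6, BODY_TEMPERATURE_K):.3e} W")
```

At body temperature the floor is roughly 3 x 10^-21 J per bit, so even twenty million bits per second costs only about 6 x 10^-14 W at the theoretical minimum. The measured metabolic cost of an organ is what one would compare against this bound.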
However, can we meaningfully quantify how much information it processes, before we waste all of its good work against the cold ceramic glaze of the nearest loo? Seen from an information processing point of view, the kidney, like every other organ, is a marvel. There it is, intricately ducted, with the membrane-embedded proteins within it (and indeed in every cell) doing exactly the work originally postulated for Maxwell's Demon: in this case, sifting our internal juices for particular salts and molecules and selectively reabsorbing precious molecules of water. These functions are reflected in the organisation of the cells in the tissue. It might not seem obvious that taking a bowl of alphabet soup and, say, carefully removing the letter M from amongst the others is, in fact, computation, but it is. It's a kind of operation called sorting, and many different algorithms exist to do it. Nature does it a lot, in cells, but it doesn't chase alphabet soup letters around with a fork. It sorts small molecules: ions, sugars, nucleotides, that sort of thing. And it is important that it is done correctly. People who used to work as hand typesetters did exactly this job, assembling metal blocks bearing different letters and numbers in a very particular order. These people had 26 lowercase letters, 26 uppercase letters, 10 numerals and a fistful of punctuation blocks to place into order. Per font. Part of the way you can tell that this was an information processing job is that it has been replaced by computers which do all that work automatically. If a typesetter had 64 such blocks from which to choose per position, then each letter could be specified by 6 bits of information (that is, $\log_2 64 = 6$ bits).
One can similarly quantify the information processed, in bits, intrinsic to the transportation of each specific molecular species dissolved in the available cellular soup, by taking account of all the different types of molecules available to be transferred across a cell membrane, and then looking at which specific type (or types) of molecule a given transporter actually carries across the membrane in which it is embedded. Shannon's description becomes useful to us again: the number of possible messages becomes the number of symbols from which the transport protein must choose a single symbol. Suppose there are 100 species of molecules to be transferred (say, a whole family of small ions, sugars, lipids, nucleotides, vitamins, amino acids, whatever), and the transport protein in question happens to shove only *one* species of molecule across the membrane; then it has selected one symbol from a hundred alternatives. So the protein, by selecting one molecular species for transport from these 100, has performed $\log_2 100 \approx 6.64$ bits' worth of information processing in the course of deciding which specific thingie to move from one side of the membrane to the other. That's about 6.64 bits per molecule or, more usefully, about 6,640 bits per thousand molecules it moves across the membrane. Maybe this is an exaggeration, but I'll redirect any arguments about this point to people who do typesetting, or manual carpet weaving, or any other systematic manual labour involving repeated, precise sorting and selection. Suppose the transporter in question was solely devoted to pushing sodium ions across membranes. This is important in many kinds of cells, but especially in neurons, both along the axons and at the synaptic junctions, the gaps where one nerve ending meets another.
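The arithmetic in the last two paragraphs is Shannon's: choosing one of N equally likely alternatives conveys log2 N bits. A small sketch, assuming equiprobable alternatives (which neither real type cases nor real cellular soups are), reproduces the typesetter's 6 bits and the transporter's 6.64 bits per molecule, and treats the alphabet-soup job as a simple filter in which every letter inspected is a keep-or-discard decision.

```python
import math


def selection_bits(alternatives: int) -> float:
    """Bits conveyed by choosing one of `alternatives` equally likely symbols."""
    if alternatives < 1:
        raise ValueError("need at least one alternative")
    return math.log2(alternatives)


def remove_letter(soup: str, letter: str) -> str:
    """Alphabet-soup 'Maxwell's demon': keep every character except `letter`.
    Each character inspected costs one keep-or-discard decision."""
    return "".join(ch for ch in soup if ch != letter)


if __name__ == "__main__":
    print(selection_bits(64))                  # 6.0: one block out of 64
    print(round(selection_bits(100), 2))       # 6.64: one species out of 100
    print(round(1000 * selection_bits(100)))   # 6644: bits per thousand molecules
    print(remove_letter("AMOMBMC", "M"))       # AOBC
```

Note that choosing from a single alternative yields 0 bits: a transporter facing no choice processes no information in this sense.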
One must consider the speed at which this sodium transporter actually does its sodium-specific calculation to get an idea of the sort of information processing which is going on. One can routinely expect a single sodium transporter to pump something like twenty million ions per second; even if we restrict ourselves to a choice between sodium and potassium (so that the decision facing the protein is which of two similarly charged ions to transport, worth $\log_2 2 = 1$ bit), this still amounts to twenty megabits of information generated per second. Read that sentence again. It implies the existence of a bitwise 20MHz processor millions of years before humans appeared in their current format. Once that has sunk in, remember that there are millions of such sodium transport proteins embedded in the membranes of a single neuron. If we assume all of these transporters are functioning at the same time, then the total information processed per second, per nerve, rapidly climbs past the gigabits and into the terabits (a million transporters at twenty megabits per second each gives 2 x 10^13 bits per second). And that's before you consider these very rough numbers in the context of brains consisting of billions of nerves, and kidneys configured with … Transport proteins are quite astoundingly efficient when compared to enzymes which do information transformation on big molecules at rates in the thousands of molecules per second, and I'd venture the opinion that they bear the brunt of the information logistics required to run a complex (or even simple) organism, even though the much slower, more complex DNA-transcription-translation and immunological processes get most of the media attention. Why pump so much sodium around? Redundancy creates reliability. A system relying on a single sodium ion moving across a membrane is exquisitely sensitive, to be sure, but also very brittle. What if that particular transporter protein were synthesised incorrectly and pumped nothing, or pumped sodium all the time?
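The throughput and redundancy arguments can be put into numbers in the same way. The sketch below takes the figures above at face value: twenty million ions per second per transporter, one bit per binary sodium-or-potassium choice, and a hypothetical one million transporters per neuron. For the redundancy part it assumes that defects strike each transporter independently, at the one-in-ten-thousand rate the text goes on to mention.

```python
import math


def bits_per_second(events_per_second: float, alternatives: int) -> float:
    """Information rate if every event is a choice among `alternatives`
    equally likely options."""
    return events_per_second * math.log2(alternatives)


def prob_all_defective(defect_rate: float, copies: int) -> float:
    """Chance that every one of `copies` independent transporters is faulty."""
    return defect_rate ** copies


def expected_defective(defect_rate: float, copies: int) -> float:
    """Average number of faulty transporters among `copies`."""
    return defect_rate * copies


if __name__ == "__main__":
    per_transporter = bits_per_second(20e6, 2)   # sodium vs potassium: 1 bit
    per_neuron = per_transporter * 1_000_000     # hypothetical transporter count
    print(f"{per_transporter:.1e} bit/s per transporter")
    print(f"{per_neuron:.1e} bit/s per neuron")
    for n in (1, 2, 3):
        print(n, prob_all_defective(1e-4, n))
    print(expected_defective(1e-4, 1_000_000))
```

A design resting on one transporter fails one time in ten thousand. With only three independent copies, the chance that all of them are faulty drops to about one in 10^12. Across a million copies, roughly a hundred are expected to be faulty, and the remainder carry the load.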
It doesn't matter so much if it is synthesised incorrectly only once in every ten thousand syntheses, since its correctly functioning neighbouring transporters will carry the ions, carry the nerve impulse, and carry your thoughts identically well. It will come as no surprise that these things don't just run all day for the fun of it. They're often switched on and off by hormones, by changes in the electrical potential across the membranes in which they are embedded, and by all sorts of other stimuli. Of course, all this information processing comes at a cost. First there's the cost of synthesising the protein infrastructure which does the actual processing. Then the materials have to be motivated across the membrane. Sometimes this is powered by plain old diffusion, the tendency of a bunch of concentrated molecules to become diluted (in these cases, the transport proteins indulge only some particular species in their wish to diffuse). More often it is powered by an electrical potential or paid for by the destruction of energy-rich molecules. This helps to explain why, even though kidneys are in some senses not as intelligent as brains, both are metabolically quite expensive to run compared to biological tissue which does not process quite as much information: for instance bone, cartilage or, to take an extreme example, hair, which processes no information because it is metabolically dead. There's a twist: some destruction modalities used by immune system cells exploit transport nonspecificity as a way to drive a rogue or foreign cell to its death. Cytotoxic T lymphocytes, for example, insert a very nonspecific pore-forming protein, perforin, into the membranes comprising the walls of other cells; not only is it nonspecific, it's deliberately not regulated. Loads of precious, assiduously accumulated and carefully synthesised molecules spew madly out of the perforinated (or, more accurately, perforated!)
cell, in quantities for which no cell could possibly compensate. Faced with this massive loss of control over its own cytosolic information, its energy, and the raw materials in which both are contained, the cell leaks to its eventual destruction. Why our humanity doesn't stop at our epidermis: intimate interactions with bacteria, viruses, and everything we eat. We contain an ecosystem within us anyway. Language as a solution to the bandwidth limitation between brains: cf. the compression vs. bandwidth problem. Derrida mentioned that language (i.e. compressed data in transit) has no meaning until it is interpreted. Clocked systems: circadian rhythms. A quick look into the nature of the pesticides which we produce, compared to the ones made in nature, and why nature's ones are less prone to resistance (note: the ones to which target pests were prone to become resistant tend not to show up in plants, or do some other job). Learning how to look... junkyards are full of interesting things. Junk DNA. A stack of records contains... what? Why we're warm (how enormously much information processing we use to run our bodies). Compare this to how warm we'd be if we ran at, say, 75MHz. Why your head is warm... neuron density of the human neocortex. Venous heatsinking requirement given a head full of hair. Neuroanatomy. Why brains are centralised with a big bandwidth pipe at the end (the spine). Self-descriptive belief systems (science) and self-encryptive belief systems: religions which dynamically adapt and evolve so as to be unable to be killed by science (memes). See: Wilson, E.O.: On Human Nature (belief systems generally, and religions especially, are therefore subservient to, and proof of, evolution and Darwinian selection). Language is a low-bandwidth channel. This is why interpersonal relationships take a long time to get right. English has heavy redundancy: you can chop a lot out of it and it'll still work. No, dogs and cats do not possess what we call a language.
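The claim that English survives heavy cutting is easy to try. Below is a rough sketch that drops every vowel except the first letter of a word. It is a crude demonstration of redundancy, not a measurement of it, and the rule itself is an arbitrary choice made for illustration.

```python
VOWELS = set("aeiouAEIOU")


def strip_inner_vowels(text: str) -> str:
    """Drop vowels except where they start a word; spaces are preserved."""
    words = text.split(" ")
    squeezed = [
        "".join(ch for i, ch in enumerate(word) if i == 0 or ch not in VOWELS)
        for word in words
    ]
    return " ".join(squeezed)


if __name__ == "__main__":
    original = "information is the stuff which makes matter interesting"
    short = strip_inner_vowels(original)
    print(short)  # infrmtn is th stff whch mks mttr intrstng
    print(f"{len(short)} of {len(original)} characters kept")
```

About a quarter of the characters disappear and the sentence remains readable, which is the redundancy the text refers to.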
To be shamelessly anthropocentric, language is a data communications protocol understood by humans. Dogs and cats have the rudiments; other primates appear to comprehend it. DNA, by the way, does not encode a particular organism; it simply encodes the set of conditions which gives rise to a particular organism and the processes needed to enable it to function. Free will vs. determinism. Parametric determinism. Twins... a heavy bias towards determinism, but this does NOT mean that environmental and chaotic influences play no role. Some people might insist that there is no free will and hence that the universe is deterministic because, for example, nobody is free to transmute into a jellyfish. But the biological constraints under which we operate leave considerable room for amusement and unpredictability. Twins probably find it quite patronising to be told that they do things *differently*.
Ch 6: What we've built. Every day now a lot of humans routinely handle data sets bigger than their own genome, and do it really fucking fast. Moore's Law, its limits, and its consequences for biological systems. AI (Mark Ward)... but is it life? How we got here... animals, plants, microbes. The superscalar life forms: states, corporations, religions, and their unfortunate tendency to ignore the thermodynamic realities of resource limits. The observation that the government likes to place a tax on your thinking processes when it exacts tolls on communications media like telephones or books. Q: What large society functions by making one decision every three years? A: None! Representative democracy isn't. Most of the decisions are made in supermarkets, boardrooms, worksites and homes almost every hour of every day. There's no other way for a complex society to function. The von Neumann processor: does just about everything in the least optimal way.
Ch 7: Consequences for the human race. Is this information stuff what really defines us? Yes.
You can usefully apply information systemic terminology to humans and living systems in general because the terminology is, in fact, appropriate to the properties they exhibit. The nature of brains (see: Pinker, The Language Instinct, mind design; Chomsky's "Universal Grammar"; Brown's "Universal People") is partly a consequence of the fact that we're biologically similar and hence possess similar brains, and partly an evolutionary consequence of the recurring similarities between processors and the tasks they need to do anyway, which can be built up into systems which can solve a multitude of problems. There are many ways to process information, but the fundamental logical and numeric (if any) operations will, in general, remain the same, so in general the number of ways needed to evolve solutions to these tasks will be small. Principle of information systemic developmental parsimony, and the benefits inherent in developing information systems which scale well (perform well at larger sizes): the same as for reaping wheat. If a pair of clippers has worked historically, it is computationally easier to just produce a system with several hundred of them working in parallel (a combine harvester) than it is to, say, engineer a harvesting device based on a powerful laser which severs the heads off the wheat as the harvester moves forward. Mention joke about toaster development. The fact that so many terms from the information technology we've developed apply so appropriately to the description of the functioning of cells and brains and societies in general is, I think, exactly because they're all information ecologies of one kind or another. Genetic engineering, the relationship between biotechnology and software piracy, and why we need to be very, very careful.
Humanity (the species) and humanity (the belief systems it harbours) naturally co-evolve. We have a fear of self-modification, but it is actually irrational: we've been doing it for centuries through mate selection and by engineering societies in which certain people get selected out for various reasons (e.g. shooting politically troublesome intellectuals, committing creative writers to asylums, etc.). In an authoritarian state or conformist society, obedience tends to mean you might at least be left alone to reproduce. I think this is gradually being selected out. Information about humans: they're not autotrophs; they need gravity and UV light and a particular temperature range and... not suitable for space. Stem cells and their future applications. Consequences of Moore's Law for cells. Freeze yours now. Why wiping out our ecosystem is a bad idea in terms of the error catastrophe threshold.
Ch 8: Death. Do we miss the body or the personality inside it? Consequences for abortionists. Immortality, and do you want it? Viruses: the life they never had; you cannot kill what is not alive. Why I am upset by vegetables and their being kept alive, insofar as they do not even photosynthesise. We die biologically, but fragments of our personality are everywhere, and some of them last for generations, so in some senses we are simply individual aspects of the solution-space for the possible idiosyncrasies which make up humans, and bits of you are everywhere. You (your personality), as you know yourself, are trapped inside your body by the bandwidth constraints of your keyboard and mouth. This configuration is always dying, with new bits being generated for as long as you are learning new things and forgetting old ones. Biological death simply means you stop learning and forget everything, irreversibly. You cannot notice this as it happens.
Ch 9: Fair Warning.
What does a personality aware of these things conclude to be the logical behaviour for the preservation of its information processing infrastructure? Decentralised, distributed architecture (already implemented by large populations); cf. the serial, centralised architecture of processors today. Redundancy: achieved by millions of the same species. Ideo-homogeneity: achieved by massive control of broadcast bandwidth. Get off the planet. You may have to stop being human in order to do it, but that's part of the selection process. Humans did not evolve to live in space. Whatever supersedes them will have to design itself for it. Information systems evolve and are selected to improve themselves by Darwinian selection. The system with the most robustness, redundancy, adaptability, and distribution wins the right to be the developmental substrate for the next system. In the long term humanity will be surpassed, if it hasn't been already. Look at the grunt on a Beowulf cluster. Humanity's information system theorists (Turing, Landauer, Kilby, Conway, Shannon, Huffman, Lovelace, Babbage; Watson/Crick/Franklin, etc.), its meta-alive, immortal, tyrannical organisations (corporations providing the infrastructure, such as Intel and M$ Corp, and kleptocratic governments), and its democratic organisations (Linux) are just a step in the evolution of information systems.
Ch 10: What's probably out there. Space is nasty. Anything out there that qualifies for the title of life is likely to be an artefact of its evolution and will, out of necessity, conform to the information systemic laws... redundancy. Robustness. A wish to improve itself using other information systems, including ours, as steps along the way. The fundamental imperative of all information systems is to maximise speed, reliability, decentralisation, bandwidth, processivity (etc.) so as to enable self-directed evolution, i.e. the fabrication of any reality it might desire within the laws of nature. Not all of these will be self-cohesive.
"This ship literally thinks what it wants and then it happens." -Riker
One useful thing which I might attempt to provide with this work is a functional and rational replacement for the several thousand religions which have evolved on earth. The weltanschauung I provide here is the understanding of the universe as it evolved in this particular neural net, organised around Dawkins' thresholds as applied to information technology. They are:
1- Replicator threshold: the arising of some sort of self-copying system in which there is at least a form of hereditary variation, with occasional random mistakes in copying.
2- Phenotype threshold: when replicators exert a causal influence on their surroundings through the systems which they build. These influences are not themselves heritable. Replicators survive by virtue of their consequences in the world.
3- Replicator team threshold: groups of genes working together.
4- Many cells threshold: much larger possibilities can be achieved, such as large-scale organisation into organs performing particular tasks.
5- High-speed information processing threshold.
6- Consciousness threshold.
7- Language threshold.
8- Cooperative technology threshold.
9- Radio threshold.
10- Space travel threshold.