Posted by Twain on July 16, 2010

Chaitin, Wolfram, Godel, Taleb, data reducibility, linguistics, Semantic Web, Black Swans, J curve, Theory of U + Twain’s context paradigms to reach W (Web) equilibrium

Maybe I could have entitled it: “Twaining a Conscious Web”

I decided to write this post because of the passionate debates a handful of us have been having about AI, NLP and resolving the reducibility question. On the surface, this may seem strange or overly theoretical, but at its core it goes into how much more powerful and intelligent the machines we build can be and also whether there is such a thing as data consciousness, context and the ability to compute every theory.

If any readers think that 50-100 comments on a thread is substantial (and whenever I’m on a thread, they do seem to attract more comments from others — it’s quite bizarre how interested people become and want to actively contribute which is good), this seriously smart debate is a 530+ strong comments thread!!! So…………..at some point we’ll be able to solve the question of how the Universe was created — LOL.

Essentially, there are two key players whose work is being debated:

* Gregory Chaitin — data is not reducible and there are some things we just can’t know.

* Stephen Wolfram (of Wolfram Alpha) — data is reducible and we can compute the answer to every question.

Computer scientists are currently trying to break data down into forms and associations that can be more readily computed, connected and extracted. It’s well known that in NLP, meaning extraction is still proving to be problematic. Additionally, we have to factor in Nassim Nicholas Taleb’s “Black Swan” type probability anomalies if we’re going to be able to compute everything, apparently.

[Twain's observation: we can't currently compute the answer to every question --- for example, how much and why we love our parents, the existence of a Supreme Being or anything involving subjectivity that's culture affiliated --- and what we should be doing is innovating algorithms to contextualize as many of the data points as possible. Moreover, data is reducible and can be transformed for smarter extraction in ways we haven't explored yet.]

As well as Chaitin and Wolfram, we also have to throw in Godel’s two Incompleteness Theorems, which work their way through mathematical logic like so:

Diagonalization arguments are clever but simple. Particular instances though have profound consequences. We’ll start with Cantor’s uncountability theorem and end with Godel’s incompleteness theorems on truth and provability.
In the following, a sequence is an infinite sequence of 0’s and 1’s. Such a sequence is a function f : N -> {0,1} where N = {0,1,2,3, …}.
Thus 10101010… is the function f with f(0) = 1, f(1) = 0, f(2) = 1, … .
A sequence f is the characteristic function of the set {i : f(i) = 1}.
Thus 101010101… is the characteristic function of the set {0,2,4,6, …}.
If X has characteristic function f(i), its complement has characteristic function 1 - f(i).
Theorem (Cantor). The set of all sequences cannot be listed; that is, it is uncountable.
Proof. Suppose not.
Let f0, f1, f2, … be a list of all sequences.
Let f be the complement of the diagonal sequence fi(i).
Thus f(i) = 1 - fi(i).
For each i, f differs from fi at i.
Thus f is not in {f0, f1, f2, …}.
This contradicts the assumption that the list contained all sequences.
Corollary. There are uncountably many subsets of N. There are uncountably many reals.
Proof. The set of subsets of N is isomorphic to the set of 0-1 sequences via the bijection between subsets and characteristic functions.
There are uncountably many reals since the map which sends a 0-1 sequence   10101010…   to the decimal   .1010101…   is 1-1.
The diagonal fi(i) is constructed from the list fj(i) by substituting i for j. Thus f can be constructed from the given list using just complementation and substitution.
In general, diagonalization shows that a set of objects (sequences, programs, provable theorems, true facts) either can’t be listed, computed or defined in a nice way or else a simple-to-construct diagonal or self-referential object is not one of the set’s objects.
Roughly either the objects can’t be listed or they aren’t closed under the substitution and complementation operations used to construct a diagonal.
Let’s replace “sequences” by “sequences I can comprehend”.   Then either I can’t comprehend the list of all such sequences, or I can’t comprehend the diagonal.   I figure that if I could comprehend the whole list in any way, I should also be able to comprehend the diagonal.   Hence I must accept the first alternative: I can’t comprehend the list of comprehensible sequences.   The same applies to “sequences which God can comprehend”.   Thus omniscience has some limits.
Now replace “sequences” with “computable sequences”.
Definition. A sequence f(i) is computable if there is a program which given input i computes f(i).
Are the computable sequences countable?   Sure, a program is a finite sequence of symbols, say, ASCII symbols.   There are only countably many finite sequences of symbols and so there are only countably many programs and hence only countably many computable sequences.   But on the other hand –
Theorem. The set of computable sequences cannot be listed in a computable way.
Proof. Suppose f0, f1, f2, … is a computable list of all computable sequences. By this we mean that there is a program P which given inputs j and i computes fj(i).
Let f be the complement of the diagonal: f(i) = 1 - fi(i).
As before, f is not in the list f0, f1, f2, … .
But we can compute f as follows:
Read input i.
Apply P to the two inputs i and i.
Output 1 if P outputs 0 and output 0 if not.
Again we have a contradiction.
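The three steps above can be sketched in Python. The function `P` below is only a hypothetical stand-in for the supposed listing program (its formula is an arbitrary assumption chosen for illustration); the point is that `diagonal_complement` is built from `P` using nothing but substitution and complementation, and that it disagrees with row i at position i.

```python
from typing import Callable

# Hypothetical stand-in for the listing program: P(j, i) plays the role of fj(i).
# The formula itself is arbitrary and assumed only for this illustration.
def P(j: int, i: int) -> int:
    return ((i + 1) * (j + 1)) % 2

def diagonal_complement(list_program: Callable[[int, int], int]) -> Callable[[int], int]:
    """Return f with f(i) = 1 - list_program(i, i)."""
    def f(i: int) -> int:
        # Apply the listing program to the two inputs i and i,
        # then output 1 if it outputs 0 and 0 if not.
        return 1 if list_program(i, i) == 0 else 0
    return f

f = diagonal_complement(P)

# Positions (among the first 50) where f differs from the i-th listed sequence at i.
differing_rows = [i for i in range(50) if f(i) != P(i, i)]
```

Every index from 0 to 49 ends up in `differing_rows`, which is the finite shadow of the argument: whatever `P` is, `f` cannot be any row of its list.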
Pick your favorite programming language (if it’s COBOL, take a break and come back after your nap). Each program is a string of symbols.
Definition. A 0-1 sequence program is a string of symbols which
(1) is grammatically correct for the chosen programming language,
(2) has a single input variable i with domain N,
(3) has output statements only of the form “return 0” or “return 1”,
(4) for every input i, produces an output (“halts”) in a finite number of steps.
Any program which computes a sequence of 0’s and 1’s can easily be rewritten so as to satisfy (1)-(4).
Corollary. The set of 0-1 sequence programs cannot be listed in a computable way.
Proof. Suppose P0, P1, P2, … is a computable list of such programs.
Let f0, f1, f2, … be the list of sequences they compute. This list contains all computable sequences and it can be computed as follows:
Read inputs j and i.
Get program Pj from the given list.
Run program Pj on input i.
Output whatever Pj outputs.
This contradicts the theorem above.
We can computably list all strings.
We can also computably check conditions (1), (2), and (3) of the definition above.
Hence it is condition (4) which can’t be checked in a computable way.
Thus –
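Lemma. There is no program which, for each input p, determines whether p is a program which halts on all of its inputs.
What about the simpler problem of checking that a program halts on a particular input?
Theorem (Halting Problem). There is no program which, given a program p and an input i, determines whether p halts on input i.
Proof. Suppose there is such a program R(p,i).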
Let h be the program which on input p computes
R(p,0), R(p,1), R(p,2), …   until it finds an i such that   R(p,i)   is “no”.
On finding such an i, it outputs i and halts.
If there is no such i, it searches forever and doesn’t halt.
Now for any program p, we can decide whether or not p halts on all of its inputs:
p doesn’t halt on all its inputs iff
h does halt on input p iff
R(h,p) is “yes”.
Contradiction: by the lemma above, this is undecidable.
To see why halting problems are hard, consider the program which
on input n,   looks for the first pair of twin primes greater than n.
Thus on input 8,   we get 11,13.
Does this program halt on all inputs?
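The twin-prime program just described can be written out as a minimal Python sketch. The function names are my own, and the trial-division primality test is chosen for brevity rather than speed.

```python
from typing import Tuple

def is_prime(k: int) -> bool:
    """Trial division; slow but obviously correct for small k."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

def first_twin_primes_after(n: int) -> Tuple[int, int]:
    """Search upwards from n + 1 for the first p with p and p + 2 both prime."""
    p = n + 1
    while True:  # nothing in the code bounds how long this loop runs
        if is_prime(p) and is_prime(p + 2):
            return p, p + 2
        p += 1

pair = first_twin_primes_after(8)  # (11, 13), as in the example above
```

The loop stops for every input exactly when there are infinitely many twin primes, so deciding whether this ten-line program halts on all inputs is the same as settling the twin prime conjecture, which is still open.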
The extra-strength version of Cantor’s theorem says that a set cannot count its own subsets.
Proof. Suppose they have the same number of elements.
Let f : X -> P(X) be a bijection between X and P(X).
(1) Let D = {x in X : x is not in f(x)}.
Since D is a subset of X and f is onto,
(2) D = f(d) for some d.
Thus d is in f(d) iff (by 2) d is in D iff (by 1) d is not in f(d).
This is a contradiction.
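For a small finite X, the diagonal set D can be checked by brute force. The sketch below tries every function f from X = {0, 1, 2} to its subsets and looks for one whose image contains D. The helper names are my own, and a finite check like this only illustrates the proof; it does not replace it.

```python
from itertools import combinations, product

def powerset(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def diagonal_set(X, f):
    """D = {x in X : x is not in f(x)}, where f is a dict from X to subsets of X."""
    return frozenset(x for x in X if x not in f[x])

X = [0, 1, 2]
subsets = powerset(X)

# Try every function f : X -> P(X) and record any whose image contains D.
counterexamples = []
for images in product(subsets, repeat=len(X)):
    f = dict(zip(X, images))
    if diagonal_set(X, f) in f.values():
        counterexamples.append(f)
```

With 8 subsets and 3 elements there are 512 candidate functions, and `counterexamples` stays empty: D is always a subset that f misses. On a finite set plain counting already rules out a bijection, but the diagonal argument goes further by naming the missed subset, which is why it carries over to infinite sets.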
The set theoretic analog of listing a sequence of things, is grouping or “comprehending” a collection of things into a set.   Sets are sort of unordered lists.
Quine proposed banning self-referential conditions like “x not in x” by requiring that the variables of the condition be stratifiable into layers with membership “x in y” allowed only when x is in a lower layer than y.
Zermelo proposed restricting the comprehension schema to subsets:
For every condition p(x) on x and every set Y,
there is a subset {x in Y : p(x)}.
Both proposals finesse Russell’s contradiction, but are there other inconsistencies in the closet? Once burned, logicians wanted a proof of consistency. None was found. Then Godel proved such consistency proofs are impossible. Zermelo’s set theory has been universally accepted, but its consistency will always be a matter of faith. Quine’s set theory would be just an historical footnote except for a long-standing open problem: Does the consistency of Zermelo’s axioms imply the consistency of Quine’s?
From sets which are members of themselves we now go to sentences which refer to themselves.
Proof. Suppose it is. Then so is its complement “False”.
Let   s be the sentence “This sentence is false” .
Since the phrase “This sentence” refers to   s,   we have
s iff   “This sentence is false”   iff   “s is false”   iff   not   s.
A contradiction.
Proof. Suppose it is.   Let n be the least number not definable by a sentence of less than 1000 symbols.   Exercise: find the contradiction.
When translated into precise formal logic, these curiosities become Godel’s magnum opus.
To make the transition,  note that the sentence   s which says
“This sentence is false”
is characterized up to logical equivalence as being the solution to the logical equation:
s iff   “s is false”.
Tarski’s Self-Reference Lemma states that in adequate mathematical theories, such equations always have solutions.
A theory is adequate if it is strong enough to encode finite sequences of numbers and define simple sequence operations such as concatenation. In an adequate theory, we can encode the syntax of such things as terms, sentences, programs, and proofs. In particular, for every formula p, there is an object < p > which encodes this formula.

Even very weak number theories are adequate. So is set theory since numbers can be defined in set theory. For concreteness, let’s pick number theory with our favorite axioms:     +, x, 0, 1 have the associative, commutative, distributive, identity and cancellation properties.

  • For any first-order formula p(x),
    if   p(0)   and   p(n) -> p(n+1)   for all n,   then   p(n)   holds for all n.
    Proof. We omit the short but technical 5-line proof.
    Suppose   p(x)   says   “x has at most 1000 symbols”.
    By Tarski’s Self-Reference Lemma, there is a solution   s to:
    s iff   p( < s > ).
    Thus   s says   “This sentence has at most 1000 symbols”.
    Since sentences of number theory can be coded up as numbers (the ASCII coding your computer uses does just fine), the set of true sentences can be identified with the set TRUTH of numbers which encode true sentences.   Is this set definable in number theory?
    Proof. By the definition of TRUTH, for any sentence   s,
    (1)   < s > is in TRUTH   iff   s is true.
    Let   s be the sentence “This sentence is false”.
    This sentence exists by Tarski’s Self-Reference Lemma since it is the solution of
    (2)   s iff   < s > is not in TRUTH.
    Thus
    s iff   < s > is not in TRUTH   iff   s is not true   iff   not s.
    This is a contradiction.   We have used the law of the excluded middle and the consistency of the set of true sentences.

    Since undefinable implies uncomputable, there will never be a program which can decide, for each sentence of number theory, whether the sentence is true or false.
    Let PROVABLE be the set of sentences of number theory which are provable in our favorite axiom system.  Since all our axioms are true, PROVABLE is a subset of TRUTH.   It would be nice if they were the same.   In this case our set of axioms would be complete.   No such luck.
    Definition. A theory is axiomatizable if it has a computably generated set of axioms.
    Any sentence can be an axiom as long as it is true.
    Proof. Given a computably generated set of axioms, let PROVABLE be the set of numbers which encode sentences which are provable from the given axioms.
    Thus for any sentence   s,
    (1)   < s > is in PROVABLE   iff   s is provable.
    Since the set of axioms is computably generable,
    so is the set of proofs which use these axioms and
    so is the set of provable theorems and hence
    so is PROVABLE, the set of encodings of provable theorems.
    Since computable implies definable in adequate theories, PROVABLE is definable.
    Let s be the sentence “This sentence is unprovable”.
    By Tarski, s exists since it is the solution of:
    (2)   s iff   < s > is not in PROVABLE.
    Thus
    (3)   s iff   < s > is not in PROVABLE   iff   s is not provable.
    Now (excluded middle again) s is either true or false.
    If   s is false, then by (3),   s is provable.
    This is impossible since provable sentences are true.
    Thus   s is true.
    Thus by (3),   s is not provable.
    Hence   s is true but unprovable.
    Note 1. An analysis of the proof shows that the axioms don’t have to be true. It suffices that (a) the system is consistent and (b) it can prove the basic facts needed to do arithmetical computations, e.g., prove that 2+2=4. The latter is needed to encode sequences of numbers and ensure that computable sets are definable.
    Note 2. Godel discovered that the sentence “This sentence is unprovable” was provably equivalent to the sentence CON:
    “There is no   < s >   with both   < s >   and   < not s >   in PROVABLE”.
    CON is the formal statement that the system is consistent.
    Since   s was not provable, and since   s and   CON   are equivalent,
    CON is not provable.   Thus –

    After all that (con)textual mathematics, here are the key Chaitin videos which encapsulate his position on “maximum unknowns”.

    Now here’s Stephen Wolfram explaining the computability of everything:

    So then how do social science and psychology tie into what is high-end mathematics? Well……….

    Those of us who’ve studied macroeconomics are aware of the J curve theory from John Maynard Keynes:

    A country’s trade deficit will worsen initially after the depreciation of its currency because higher prices on foreign imports will be greater than the reduced volume of imports.

    The J curve theory has been adapted by management consultants like Gartner into a theory about technology hype cycles:

    Meanwhile, Otto Scharmer in organizational behavior has proposed a different letter from J, U, to explain how we examine ourselves, our perspectives on the world and the way in which we solve problems:

    Since both camps (vectorial scale algorithms approach to data reducibility versus Quantum Mechanics approach) have merits, I wrote:

    This thread is observing a classic W approach to problem-solving: two schools of thought, approaching from either end, drilling down and shifting their vectorial positions as time elapses and finally inflecting upwards (with views on what their prior slopes looked like) until the two schools converge and are on a different plane from where each and both started.

    Wrt whether the Semantic Web can become an inference engine………….not if it continues to deploy the taxonomies and categorizations it does because that still roots us in probabilities, correlations and the other facets of it being fundamentally no different from Google (which itself is a difference engine just as Babbage postulated — albeit instead of absolute real numbers and binaries, it’s about the difference between statistical points).

    Now, if it was a …..true DIFFERENTIATION ENGINE, this would be a real leap forward rather than an imaginary one.

    We cannot infer until we can differentiate and the Semantic Web cannot do this (yet).

    Specifically on data reducibility and context, this is what I offered to the debate:

    The dimensions of context for each of us are personal, experiential, spiritual / emotive and cultural.

    Conversely, the dimensions of computing are impersonal, iterative, rational and culture agnostic.

    Explications which seem clear, obvious and even underpinned by established/irrefutable science and mathematical equations are fine to follow if we’re conversing with another person schooled with the same scientific reference points as us. They’re not so clear, obvious and irrefutable when we’re conversing with a lawyer, a photographer or a linguist because their context points for deduction, perspective and language will be distributively different.

    This is why when we surface a piece of raw data in a search engine list – suppose something as simple as the number 2 – the context of it is going to be interpreted in diverse ways. The literally-minded will perceive it just as the number after 1 with a value of 2. The mathematically-minded will think of it as being a prime as well as in terms of power series, halves and double integrals. The literately-minded will automatically associate it with ‘A Tale of Two Cities’, “To be or not to be”, JRR Tolkien (‘The Two Towers’) and “it takes two to tango”. The artistically-minded will see the image of a swan or one half of a heart because that’s the shape of a 2. The Spanish-speaking computer scientist will think of it as being DOS (Disk Operating System). The classical scholar will reference it to Janus, the god of two faces. The romantic would tie it with coupledom…..etcetcetc………Whilst the Chinese would word-associate it with the homophone for “fish”, “happiness” and “prosperity” all intrinsically bound to each other.

    So that’s an example of raw data carrying implicit context which is not currently being included in or extracted explicitly by algorithms.

    How to resolve this so we can compute this context and not only the raw data?

    I’m working on it, as they say.

    And so……………..I am………………

    Posted by Twain on May 19, 2009

    Wolfram Alpha | Google | True Knowledge: the Twain test — follow up

    Yesterday the Wolfram Alpha team announced the creation of their community:

    http://blog.wolframalpha.com/2009/05/18/announcing-the-wolframalpha-community/

    They wrote:

    “To that end we are officially launching the Wolfram|Alpha Community, which allows you to submit questions, ideas, and favorite inputs.

    We already have a few static forms to contribute things such as facts, figures, and structured data or algorithms, methods, and models. The Community serves to supplement these types of feedback with a more free-form discussion among all Wolfram|Alpha users.”

    Everyone knows by now I’m a firm believer in companies providing open, democratic and multilateral channels for users to provide feedback and to interact with the company. Given this opportunity, I suggested that the Wolfram Alpha team could look into providing answers to my test questions:

    (1.) Who discovered radium?

    (2.) Where is Atlantis?

    (3.) How do we make gold from lead?

    (4.) Can robots dream?

    (5.) What is a sprite? [This is my trick question since ‘Sprite’ is a drinks brand as well as a type of fairy.]

    (6.) When did Homo Erectus become Homo Sapien?

    (7.) Why are we here?

    (8.) How many light bulbs are there in the world?

    (9.) Who is the Vitruvian Man?

    (10.) Where is Schrodinger’s cat?

    COBALT + ERIC SUGGEST I’M CLUELESS

    In response to my suggestion, two users by the name of Cobalt and Eric wrote this:

    “You misunderstand the point of Wolfram|Alpha I think. It is not a search engine like google nor is it a forum of expertise like answers.com.

    It is a tool that allows you to find and analyse data (i.e hard facts) from the web. It can only answer questions that have a definite answer or present data related to a subject. Things like the weight of 1g of gold or the average age in Australia. Questions with no definite answer such as the location of Atlantis and do robots dream will not and should not be answered as that is what google and the like already do.” — Cobalt

    *********************************************************

    “2. not a fact
    3. not a fact
    4. not a fact
    6. not a fact
    7. not a fact
    8. not a fact
    9. not a fact
    10. not a fact

    You don’t understand what Alpha is used for. If it’s not a fact, it can’t calculate it. Where is Atlantis? Mankind doesn’t even know if it existed, how the hell is Alpha supposed to point it out? Can robots dream? It doesn’t create narratives, it gives you numbers in return. If you want to know why we are here, talk to a philosopher. If you want computable data, use Alpha. It couldn’t be clearer.” — Eric

    MY RESPONSE: THERE’S RATIONALE IN TWAIN’S TEST

    This is what I wrote in reply:

    Unfortunately, Cobalt and Eric, you’re the ones who may be misinterpreting my testing approach. Stephen Wolfram, in an interview with Semantic Universe, notes that WA should be compared with the likes of Google and Yahoo! and not with HAL or Cyc, so it was reasonable for me to run WA results against Google’s and True Knowledge’s. Also let me give some context which may help.

    I have a maths degree and have worked in banking, so I understand perfectly well the difference between calculable inputs to derive proofs and business models from information which is unquantifiable or simply has no quantity, such as “How is Michelle Obama related to Barack?”, which are questions needing answers of a qualifiable nature.

    Now, Wolfram Alpha is marketed as a “computational knowledge engine” rather than a fact+figures finder/calculator so it’s supposed to be able to derive KNOWLEDGE not facts+figures alone.

    Let’s tackle what logically each of my questions should have derived:

    (1.) Who discovered radium — WA gave the year but not the who (Marie Curie). Moreover, the expectation would be that the system would generate a visual of the radium atom, some charts of radioactivity, a picture of Marie Curie and some facts+figures on the laboratory conditions of discovery.

    (2.) Where is Atlantis — WA could have generated a series of maps not only of actual locations called ‘Atlantis’ (e.g. in South Africa and the US); it should also have produced geo-thermal images from archaeological expeditions that have tried to establish where Atlantis is.

    (3.) How do we make gold from lead — instead of producing a “WA isn’t sure what to do with your input” the system could at least have produced some Periodic Table definitions of gold and lead, their reactivity with other chemicals and some paragraphs on historical attempts by people to try to make gold from lead.

    (4.) Can robots dream — again instead of producing “WA isn’t sure what to do with your input” the system could have listed all the works of fiction by people who have actually existed (Philip K. Dick / Isaac Asimov / Stanley Kubrick) who are factually connected to this phrase. After all, WA is supposed to apply NLP to derive what we mean by the inputs.

    (5.) What is a sprite — WA produced a table of nutritional breakdown of Sprite the soft drink. In fact, apart from the faerie connection, which is a fictional entity, sprite is also a FACTUAL term used in computer graphics and the WA system failed to pick this up.

    (6.) When did Homo Erectus become Homo Sapien — again WA issued a “WA isn’t sure what to do with your input” message. It’s an established FACT from anthropology and archaeology that in the evolution of Man, Homo Erectus preceded the emergence of the Homo Sapien. WA failed to produce a timeline graph of that evolution to help pinpoint whether that happened 500,000 years ago or 50,000 years ago.

    (7.) Why are we here — again WA issues a “WA isn’t sure what to do with your input”. Fair enough, the system is not sophisticated enough to infer philosophical constructs yet; we are some way from truly consciously aware machines. Nevertheless, the expectation would be that some graphics of Big Bang Theory and the formation of the planets would have been produced.

    (8.) How many light bulbs are there in the world — actually, this is a FACTUAL question. There are definitely numbers available of light bulb production, US expenditure on light bulbs per annum and how many light bulbs are used in each household per annum.

    (9.) Who is the Vitruvian Man — unfortunately, Eric, you may not have seen sketches of Da Vinci’s masterpiece, which actually exist and are FACT-based. Instead of WA stating it “isn’t sure what to do with your input” the system should at least have generated an image of Da Vinci’s work. It could then have made the linkage of how the Vitruvian Man image has been applied in various fields of science — as clues to atomic structure as well as a reference diagram of human anatomy in medical science.

    (10.) Where is Schrodinger’s cat — WA said it “isn’t sure what to do with your input”. This was the most surprising answer of all out of the questions posed. The expectation would be that the engine would at least interpret the question as one related to quantum physics and generate calculations and proofs attributable to Erwin Schrodinger. If it was even smarter it may even have done a compare/contrast with Einstein’s equations and Hawking’s postulations.

    As for whether Schrodinger’s cat is a FACT or not, there are all manner of scientific phenomena that cannot be seen or established by the naked human eye (it’s somewhere else on the electromagnetic spectrum) for which generations of scientists have extrapolated proofs, corollaries and reductive provisos.

    What matters in the question relating to Schrodinger’s cat is the fact that WA did not even produce an answer which said something like, “Schrodinger’s cat was a thought experiment conceived by Erwin Schrodinger in 1935 in response to potential limitations in the Copenhagen approach and as a commentary on the ‘quantum indeterminacy or the observer’s paradox’. Schrodinger’s equation itself is applicable in wave physics, energy calculations of chemical reactions and is derived from the Hamiltonian and Poisson functions to produce:

    (∂²Ψ/∂x²) + (8π²m/h²)(E − V)Ψ = 0

    where Ψ is Schrodinger’s wave function.

    x is the position of the particle.
    m is the mass of the particle.
    h is Planck’s constant.
    E is the total energy in joules.
    V is the potential energy in joules.

    followed by various corollaries and supporting suppositions of the type similar to those printed in this UCLA paper:

    http://www.math.ucla.edu/~tao/preprints/schrodinger.pdf

    Even as the most basic answer, instead of “WA isn’t sure what to do with your input” the simple and FACTUAL answer would have been “In the Schrodinger’s cat hypothesis, the cat is placed inside a steel chamber” followed by some of those equations Erwin Schrodinger is famous for.

    All of my questions are science-based and either already have definitive scientific proofs or are established hypotheses based on scientifically-derived means. This includes “how do we make gold from lead” and the evolution of Homo Erectus into Homo Sapien.

    WA is marketed as a “computational knowledge engine” and on the basis of its NLP which can semantically derive what our questions mean. If I ask “Who discovered radium?” and the answer provided doesn’t even mention Marie Curie then there’s clearly room for improvement.

    As I’ve written elsewhere, WA’s entry into the search/knowledge space is great for us all as information consumers, knowledge connectors and sense discoverers.

    Of course it’s fantastic that a tool like WA is made available — not just for the scientific community — but for anyone who needs to crunch any form of numbers or needs a piece of knowledge to support, quantify, qualify and visually complement their articles (whether that’s on the fluxing orbital paths of the planets or the score lines of the World Series for the last century or projected growth of the shrimp population in the Indian sub-continent).

    Nevertheless, we have to identify and be realistic about its current limitations because only then can we as consumers have genuine “computational knowledge engines” which can connect facts+figures from different disciplines, make sense of the world around us (visible, invisible and maybe as yet undiscoverable) and perhaps find solutions to global common ills.

    ************************************************************************************************************

    For me, Cobalt and Eric’s comments are interesting because both point to how users are perceiving and interpreting what these words mean:

    * fact

    * computational knowledge engine

    * definite answers

    Again, it’s relevant to semantics and the way we classify axioms via ontologies, taxonomies, folksonomies and other linguistic categoronomies (I’m coining this phrase, ha ha).

    To be tic-lol (tongue-in-cheek, laugh-out-loud) I could ask, “Well, since we’re talking ‘facts and definite answers’, how definite will Wolfram Alpha’s calculation of imaginary numbers be and has anyone actually scientifically observed them to establish them as a fact?” The reality is that imaginary numbers are philosophical constructs plucked from mathematical minds just as is most of complex algebra and even the existence or otherwise of Schrodinger’s cat (an example of quantum physicists’ paradigms), Atlantis (anthropologists’ and archaeologists’ axiomatic construct of a ‘lost world’) and whether or not robots can dream!

    Maybe it would have been easier if I’d asked the WA system something straightforward like, “What is the largest prime number in the universe?” or “What is the 123456789th digit of π?” Ha ha. Actually not even the most powerful supercomputer has arrived at the definitive answer to the largest prime number on Earth much less the Universe! One day someone from the planet GYG is going to materialize and say that our Googol is only worth a 10 squared in their numerical scales, so the largest prime number is only 100 digits long or something!

    In any case, the fact stands that the current Wolfram Alpha can’t apply its natural language processing to semantically extract that when I ask “Who discovered radium” it should give me an answer with Marie Curie in it in priority / precedence before providing me with the year of discovery.

    It also can’t calculate how many light bulbs there are in the world. Clearly, some of the light bulbs in WA’s computational clusters haven’t been switched on yet to spotlight this missing information and the inputs necessary to generate an answer which would go along the lines of a simple equation like this:

    L = n(A + B + C +….) + q(h1 + h2 + h3….)

    where L is the total number of light bulbs in the world.

    n is the number of light bulbs produced per year in each country.

    A is country A.

    B is country B, etc.

    q is the quantity of light bulbs in each household (used and unused).

    h is the number of households in each country.
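As a rough sketch, one reading of the equation above is: add up yearly production across countries, then add bulbs-per-household times households for each country. The `Country` record, its field names and every figure below are invented placeholders to exercise the arithmetic, not real statistics.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Country:
    name: str
    bulbs_produced_per_year: int  # the "n" term for this country
    bulbs_per_household: int      # the "q" term for this country
    households: int               # the "h" term for this country

def total_light_bulbs(countries: List[Country]) -> int:
    """One interpretation of L: production plus bulbs already in households."""
    produced = sum(c.bulbs_produced_per_year for c in countries)
    in_households = sum(c.bulbs_per_household * c.households for c in countries)
    return produced + in_households

# Placeholder figures, made up purely for illustration.
sample = [
    Country("A", bulbs_produced_per_year=1_000, bulbs_per_household=10, households=200),
    Country("B", bulbs_produced_per_year=500, bulbs_per_household=8, households=100),
]
estimate = total_light_bulbs(sample)  # 1,500 produced + 2,800 in households
```

The arithmetic is trivial; the hard part for any engine is sourcing and reconciling the inputs, which is exactly the gap being pointed at here.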

    By Twain deduction, Wolfram Alpha is not yet a “computational KNOWLEDGE engine” if it can’t differentiate whether a sprite is a computer graphics term, a type of tiny glowing faerie or a brand of soft drink, and can’t make the neural net connections I’ve noted above. It would probably help the WA team to communicate and market what their service can/cannot do if they called themselves a “computational DATA engine”.

    In any case, people suggesting I don’t understand the nature or the application of Wolfram Alpha is good and healthy for intellectual stimulus and keeping my ego in check. It helps me sanity-check the approach of a Twain test and whether its validity stands up.

    What matters most is that it will be interesting to experience how Wolfram Alpha, Google Squared et al develop and whether one day they’ll be able to answer my 10 questions — both by deriving facts as well as extrapolating the semantic and philosophical nuances of the queries.

    Now THAT’s when we may see truly intelligent and proxy-consciousness agents…………

    Posted by Twain on May 18, 2009

    Wolfram Alpha | Google | True Knowledge: the Twain test

    As promised, I’ve now plugged and played with the new “computational knowledge engine” offering from Stephen Wolfram, the British-born physicist renowned for having been awarded his Ph.D. when he was just 20 and being the inventor of Mathematica, a highly regarded research tool amongst the academic scientific community.

    Wolfram Alpha had a soft launch on Friday 15 May following various media “sneak previews” in late April and it’s officially live today.

    As per my previous provisos (please follow links provided at end of this blog), I’ve reserved assessment on the system until today so that it’s influenced by neither SemWeb hype nor journalistic ignorance, but is based on objective intelligence and a mild dose of wit. Some of the articles written to-date about Wolfram Alpha have been poorly researched re-hashes of whatever PR has been issued by the company rather than informative pieces. This has been unhelpful for determining where and how the various search / browse / info source systems are differentiated.

    This is why I decided to do my own Twain test.

    I’m interested in any AI / natural language processing / Bayesian alternative / neural nets / innovative attempts to connect and make sense of the vast amount of knowledge out there (within the Internet as well as as-yet electronically unarchived sources). I’m also interested in machines which try to discern meanings, wit and nuances from our questions in an equivalent manner to how humans do naturally.

    Yes, I am aware that Stephen Wolfram has provided guidance that the system is not AI. That’s clear from this Semantic Universe article:

    http://www.semanticuniverse.com/blogs-i-was-positively-impressed-wolfram-alpha.html

    Now, since Wolfram Alpha is based on Mathematica I decided not to ask any straightforward numerical questions; most of us are aware by now that it can deal with statistics, indices, trigonometry, Fourier analysis, Boltzmann constants, Boyle’s gas pressures, the various constituents in organic reactions, velocity in space calculations and other scientific and quant-oriented calculations etc. fairly well.

    What would be more interesting is to really test its semantic extraction, linguistic deduction and visual generation capabilities.

    Below are my 10 questions accompanied by screenshots of and commentary on the results. Wolfram Alpha is being directly compared with Google and True Knowledge, which are its nearest competitors in this test. Incidentally, Wolfram has apparently noted that since it’s not AI it’s unfair to compare it with HAL or Cyc; it should be compared with Google or Yahoo instead. I chose True Knowledge because, as was (rightfully) highlighted by some friends elsewhere, this would be an interesting case study.

    TWAIN’S 10 QUESTIONS

    (1.) Who discovered radium?

    (2.) Where is Atlantis?

    (3.) How do we make gold from lead?

    (4.) Can robots dream?

    (5.) What is a sprite? [This is my trick question since ‘Sprite’ is a drinks brand as well as a type of fairy.]

    (6.) When did Homo Erectus become Homo Sapien?

    (7.) Why are we here?

    (8.) How many light bulbs are there in the world?

    (9.) Who is the Vitruvian Man?

    (10.) Where is Schrodinger’s cat?

    These questions may seem off-the-wall, but actually they’re not.

    Wolfram Alpha is being built by scientists, and machines are built in the mould and mind of their creators. So information on who discovered radium, the evolution of Man, the alchemy of metals, scientific expeditions and the application of geothermal imaging / satellite capture to potential sites for Atlantis, the proliferation of light bulbs (popularly credited to Thomas Edison) and the connections to Schrodinger’s cat should be easily surfaced by the system, since it is all familiar territory to scientists. Before we expect Wolfram Alpha to provide us with the missing links between Marilyn Monroe and baseball (answer: Joe DiMaggio) in the pop culture and sports arenas, two sample areas where ‘Der Spiegel’ has already highlighted Wolfram Alpha’s current deficiencies (http://www.spiegel.de/international/zeitgeist/0,1518,624065-10,00.html), it should at least be able to deal better with queries associated with its forte, science. Even if it’s not precise and no visual graphics can be generated, the algorithm should direct us to sources where we can delve further and derive some answers.

    Unfortunately, as can be seen from the screenshots, Wolfram Alpha gives several “Wolfram|Alpha isn’t sure what to do with your input” answers.

    Hmmn………

    By comparison, as an example, Google recognizes the deliberate spelling mistake I made on the search term “VEtruvian Man” and asks me whether I mean “VItruvian Man” and it provides multiple links to suitable sites where I might find the answers. Meanwhile, True Knowledge doesn’t catch the spelling mistake and only offers, “It sounds like the vitruvian man may be a human being, organisation or other legal person that I don’t know about yet,” with a suggestion that I teach the system about him and input my knowledge in via the wiki.
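
    To make the “did you mean” behavior concrete, here is a minimal sketch using fuzzy string matching from Python’s standard library. It is not how Google actually does it (that isn’t public in this detail). The vocabulary list, the function name and the 0.8 cutoff are all my own assumptions for illustration.

```python
import difflib

# Hypothetical vocabulary of known entities; a real engine would draw on a
# far larger index built from its own data.
KNOWN_TERMS = ["vitruvian man", "schrodinger's cat", "radium", "atlantis", "sprite"]


def did_you_mean(query, vocabulary=KNOWN_TERMS, cutoff=0.8):
    """Return the closest known term to the query, or None when the query
    already matches exactly or nothing is similar enough."""
    normalized = query.lower().strip()
    if normalized in vocabulary:
        return None  # exact match, nothing to suggest
    matches = difflib.get_close_matches(normalized, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(did_you_mean("VEtruvian Man"))
```

    Even something this crude catches a one-letter slip, which is why it’s surprising when a knowledge engine lets the misspelling pass without comment.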

    As for the question, “Where is Schrodinger’s cat?” I suppose I could have been mischievous and asked, “Where is Schrodinger’s car?” instead to determine whether any of the three systems understood that it’s still a question about quantum physics and the ‘quantum indeterminacy or the observer’s paradox’. In other words, where we are and how we observe relative to the cat (aka the reception of visual particles into our eyes vis-à-vis the cat — which is an analogy for atoms invisible to the naked human eye, btw) itself affects an outcome, so that the outcome as such does not exist unless the measurement is made. Ergo, there is no single outcome unless and, I’d say, UNTIL it is observed.

    If I put the cat into the car then the computational search engines will get even more confused……….LOL.

    As it is, Google does a fairly decent job of discerning that I mean “Schrodinger’s cat” and even when I use “car” it provides me with a link to a YouTube video entitled ‘Schrodinger’s car and parallel universes’.

    Anyway, I hope you all enjoy the screenshots of the Twain test. Yes, and everyone should be aware that no media outlets have reported any “slow script” messages from Wolfram Alpha in their tests, yet I found one on my second query, “Where is Atlantis?” Oops, WA……

    TEST RESULT SCREENSHOTS + TWAIN CONTEXT

    (1.) Who discovered radium?

    This is a straightforward question and the expectation would be that a diagram of a radium atom would be generated, together with a map of where it was discovered and a picture of Marie Curie. Instead, this is what each system produced:

    (2.) Where is Atlantis?

    This question resulted in a “slow script” message on Wolfram Alpha before it offered a map location of Atlantis as being on the South African peninsula coast. Like True Knowledge, it didn’t make that interpretation leap to identifying Atlantis as a potential mythical construct rather than an actual geographic location.

    Google does make that interpretation leap.

    (3.) How do we make gold from lead?

    Ideally, the generated answer should show the historical (failed) attempts by various people to turn lead into gold — including the tales from the Hermetic schools of thought on this and those during Croesus’ age.

    (4.) Can robots dream?

    Would Isaac Asimov / Philip K. Dick / Stanley Kubrick be impressed by the latest machine offering which produces results like these?

    Wolfram Alpha says it “isn’t sure what to do with your input”. If I’d programmed the algorithm I’d make it respond like so, “Please ask us again in the morning after we’ve had the chance to sleep on it and think about it. Thanks!”

    (5.) What is a sprite?

    This is my trick question since ‘Sprite’ is a drinks brand as well as a type of fairy. Interestingly, Wolfram Alpha generates what appears to be the nutritional content of a can of Sprite but fails to pick up that the query may be about a glowing elfin creature that appears at the bottom of gardens in works of fiction. Meanwhile, Google pulls in some references to the term being relevant in computer graphics as well as the faerie references.

    True Knowledge goes off-base and provides us with a picture of a ferret followed by a definition of it being a drink from Coca-Cola.
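
    What I’d hope for with an ambiguous term can be sketched as follows: surface every known sense unless the context narrows it down. This is a toy illustration under my own assumptions, not any engine’s real method. The sense inventory, domain labels and function name are hypothetical.

```python
import re

# Hypothetical sense inventory, limited to the three meanings discussed here.
SENSES = {
    "sprite": {
        "computer graphics": "a 2D image or animation composited into a larger scene",
        "folklore": "a small fairy-like or elfin creature",
        "soft drink": "a lemon-lime drink brand from Coca-Cola",
    }
}


def interpret(query, context=""):
    """Return {term: {domain: gloss}} for ambiguous terms found in the query.

    If a domain label appears in the context string, only that sense is kept;
    otherwise every known sense is returned rather than silently picking one.
    """
    words = re.findall(r"[a-z]+", query.lower())
    context = context.lower()
    results = {}
    for word in words:
        senses = SENSES.get(word)
        if not senses:
            continue
        narrowed = {d: g for d, g in senses.items() if d in context}
        results[word] = narrowed or dict(senses)
    return results


if __name__ == "__main__":
    for domain, gloss in interpret("What is a sprite?")["sprite"].items():
        print(f"{domain}: {gloss}")
```

    With no context, all three senses come back, which is closer to what Google did than to WA’s single nutritional-label answer.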

    (6.) When did Homo Erectus become Homo Sapien?

    Again, this should have produced a straightforward answer — either in the form of a timeline chart plotting the Evolution of Man which is being pieced together by anthropologists and other scientists or in the form of a textual examination into various Jurassic, Ice, Neanderthal, Paleolithic etc. ages.

    (7.) Why are we here?

    The greatest Existentialist question in our search for knowledge which perplexes philosophers, physicists, Presbyterians, polemicists, party people et al alike.

    (8.) How many light bulbs are there in the world?

    Let’s compare WA’s answer with Google’s. Google’s first link offers some data from Wiki answers on the daily production levels of light bulbs as well as the estimated annual expenditure on them whilst WA says it isn’t sure how to use the query input — which is, effectively, what True Knowledge also says. Interestingly in the TK results, there seems to be some kind of lag and it shows answers to the previous question of ‘Why are we here?’

    Perhaps the lightbulb isn’t on in the TK thought engine — LOL.

    (9.) Who is the Vitruvian Man?

    Here, the search / computational engines should ideally have generated an image of Da Vinci’s famous drawing within the semantic context of the query. None of the systems did. Notably, neither Wolfram Alpha nor True Knowledge auto-corrected the deliberate spelling mistake of Vitruvian Man whilst Google did spot and auto-amend it.

    (10.) Where is Schrodinger’s cat?

    At the very least, Wolfram Alpha should have produced some equations associated with Erwin Schrodinger’s postulations as well as their interlinkages with Einstein’s, Stephen Hawking’s and the research currently being undertaken with the Large Hadron Collider. Plus the research from the Austrian university team that demonstrated quantum teleportation by sending photon states across the River Danube.

    Well, that’s how my mind would work if I was trying to locate Schrodinger’s cat and its connectivity trails…..

    This is what the systems gave us instead:

    CONCLUSION FROM TWAIN TEST

    Google isn’t going to be killed just yet with today’s launch of Wolfram Alpha. Certainly, it’s helpful to see more visual and graphical representation of computed results but, then again,……….Kosmix does that better than Wolfram Alpha, Google and True Knowledge.

    Once Google Squared goes officially live we’ll probably realize and accept that Google is keeping ahead of the curve by crossing Semantic knowhow with more visual knowledge representation techniques.

    Companies should avoid marketing themselves or being labelled by the media / so-called search experts as “Google killers” and paradigm shifters before they’ve actually gone live or been tested by ordinary people like me. It’s critical to manage expectations and also to be more aware of the types of random and unexpected queries which do pop up in people’s minds and for which they’d like computational and philosophical derivations.

    Since the Semantic community is aiming towards artificial agents being able to answer some of the world’s most complex questions, systems should definitely be able to either answer questions like mine or, at least, provide appropriate and meaningful links to where else I can and should seek the answers.

    In any case, innovations like Wolfram Alpha and other (non) Google killers can only result in keeping Google and other tech giants on their toes and result in improved search-browse-computational-discerning-sensemaking tools for us.

    Hurrah! This is gr8 for us as information consumers, knowledge connectors and sense discoverers.

    Yes and any company who’d like me to road-test their systems prior to launch should contact me, :*).

    *************************************

    For completeness, here are the two blogs I wrote last week on the today-launched Wolfram Alpha platform:

    http://www.alwaysthetwain.com/blogs/2009/05/12/wolfram-alpha-cf-true-knowledge-non-google-killers/

    http://www.alwaysthetwain.com/blogs/2009/05/10/wolfram-alpha-objective-anticipation-analysis-please/

    Posted by Twain on May 12, 2009

    Wolfram Alpha cf. True Knowledge + (non) Google killers

    I just read some interesting commentary from some friends which compares what they’ve seen of Wolfram Alpha with True Knowledge so I re-visited the True Knowledge site. Clearly, TK has had a redesign as these two screenshots from 2008 and 2009 will show:

    The soon-to-be-live-computation WA engine is launching with the same color scheme as the old TK site as well as Primal Fusion’s choice: faded orange.

    In our WA compared with TK analysis, we should be aware of and note that systems are built in the mould of their creators and their pre-orientations / pre-dispositions / accumulated pasts. This helps us to contextualize the systems, what each can do and why they’re constructed in the way they have been and are being.

    The background of founders can provide us with clues on likely development and strategy of the platform.

    From what I can make out from WA’s video presentation, Wolfram’s solution takes its lead from Mathematica and other natural sciences databases. In essence, it’s like taking an online calculator that can generate visuals of trigonometric function graphs (like my Casio 5100FX did when I was a teenager) crossed with elements of:

    * Bloomberg + MS Excel + SAS (Statistical Analysis Software) to generate economic charts

    * some biochem modeling software

    That’s why ‘Der Spiegel’ (http://www.spiegel.de/international/zeitgeist/0,1518,624065,00.html) can identify WA’s limitations in information availability/accuracy on politics, popular culture, sports etc. These aren’t the natural or default orientation of Wolfram’s team, who spend more time thinking about Fibonacci and Feynman than about Britney or Barack, whereas Tunstall-Pedoe’s team does the opposite. William Tunstall-Pedoe, the founder of TK, is from a journalistic background so his natural information orientation would be towards what’s published in most papers (politics, popular culture, business, sport, etc.) and that’s the direction he would most likely direct his team efforts towards.

    In the greater schema of the Web, my observation is that WA’s launch reflects the trend of commercializing and hybridizing previously closed niche academic sources like Mathematica for the masses. We also see this when, for example, Google takes software that was in architectural niches (e.g., AutoCAD) and creates free tools like SketchUp and now Google Draw.

    Personally, I’m in favor of this trend. The question will still arise for WA, “How do we make money from our platform?” but they seem to have some cost per embed of a WA-generated search / graph in their business model.

    With TK, another important distinction is that its wiki capabilities allow for collective correction. We read the definitions / links provided and we can apply our naturally accumulated knowledge to orientate and refine the definitions / links provided, according to our semantic (aka linguistic) interpretations.

    With WA, there may be less scope for collective correction. How many people are going to use pen and paper to check that the integrals and statistics generated by WA are accurate?

    In due course, Google will probably release something which is a 3rd way of both: wiki, visual knowledge representation and semantically-linked facts+figures. In fact, it already does in some form with Google Finance:

    and now its recently announced Public Data Search capabilities:

    Moreover, contrary to misconceptions (or rather lack of proper investigation by some quarters of the Semantic and journalistic space) Google is interested in and has been actively building teams with semantic knowhow for several years.

    I wrote a lengthy, objective and well-researched article on this topic last year. Unfortunately, I made the mistake of posting this article on a certain SemWeb platform which I’d entrusted with the safekeeping / stewardship of my and my friends’ content on its public platform. Instead of reciprocally honoring that act of trust, said SemWeb platform’s coding was so awry and poorly architected that they deleted 8 months’ worth of our collaborative content, including that particular post examining Google’s interests in the semantic space.

    Therefore, my hard work on the issue is lost indefinitely, despite the CEO’s promise (never honored) to restore our content.

    This is personally annoying since my article cast a contextual light on whether any of these Semantic offerings springing up are genuinely paradigm-shifters and “Google killers”. They cannot be Google killers if the basic assumptions about Google not being actively involved in utilizing Semantic knowhow and tools are either fundamentally wrong or flawed.

    Alas, I cannot now reproduce that article and the links which I found about Google hiring teams from known Semantic Web-related techco’s like Cyc. However, I can point to some articles from Read/Write/Web in January 2009 and from eweek.com in the same month which point towards Google having and progressively incorporating semantic search features:

    http://www.readwriteweb.com/archives/google_semantic_data.php

    http://googlewatch.eweek.com/content/google_search/yes_google_is_doing_semantic_search.html

    http://googlewatch.eweek.com/content/google_and_semantic_web/google_ceo_hints_at_semantic_contextual_search.html

    http://www.pcworld.com/businesscenter/article/161890/semantic_search_could_secure_googles_future.html

    http://googleblog.blogspot.com/2005/04/just-facts-fast.html

    As I note, I wrote my article last year — May 2008 — several months before R/R/W or eweek did. That SemWeb platform, in their inconsiderate and wholesale deletion of users’ content, is responsible for my article not being available for others to use as a public and democratizing information source that would put into perspective whether any SemWeb offering is a paradigm-shifter or “Google killer”.

    From what I’ve road-tested in the SemWeb space to-date, none of them are.

    Google remains ahead of the curve both technically as well as the way in which they service and market to users. Certainly some of the businesses and their tools could be better integrated but, nonetheless, the key components remain technically more interesting than those offered by wannabe “Google killers” to-date.

    Specifically wrt Wolfram Alpha, I’m sticking with my position as stated previously: I reserve proper assessment of it until I can plug+play it myself, objectively.

    This is because all kinds of people have hyped WA or tried to make me believe that “Google doesn’t do semantic search” (their words) — despite me providing analysis which contradicts their convictions and competitive intelligence insights.

    If I have a positive / negative perspective on Wolfram Alpha it will be based on my own independent and objective analysis (ok also humorous), and informed with previously accumulated, distilled and connected observations of the Semantic space rather than anyone else’s spin / misinformation / ignorance.

    PERSONAL NOTE

    I would never trust that SemWeb platform with my content again. At least on my own blog I know my information isn’t suddenly going to be deleted because of some irrational / small-minded / inconsistent / undemocratic whim of someone else.

    Most importantly, I am not giving any licensing rights to the SemWeb platform over my content (original articles, images, comments, business models etc.) and their associated copyright for the SemWebco’s use or commercial exploitation. Frankly, their actions showed themselves to be unworthy of my trust and underlined how important it is to have ownership of your content and credit assignation for it.

    Their Big Brother policies and breaches of user privacy were also not very appealing as a user-member.

    All-in-all, I’m glad I don’t buy into that CEO (words and actions). He’s the same guy who insisted Google isn’t into semantic search, searching with Google is like “looking for a needle in a haystack” and who hypes up supposed “Google killers”. His radar’s way off.

    Clearly, mine’s more perceptive, calibrated and spot on.

    Posted by Twain on May 10, 2009

    Wolfram Alpha: objective anticipation + analysis, please

    The latest information from Danny Sullivan at http://searchengineland.com/, whose insights I hold in good regard, is that Wolfram Alpha will officially go live on May 18. Please calendar mark this date and go to www.wolframalpha.com on that day to see where and how the engine works / could be done better for your needs. The company is marketing its product as a “computational knowledge engine”.

    Already there’s been a certain amount of commentary around it (including, unfortunately, some hype from certain SemWeb individuals whose credibility is questionable since their own actions have earned them reputations for spin more than substance). However, trying to be objective about the commentary to-date, I’ll reserve my assessment of Wolfram Alpha as a knowledge tool until I actually get to plug+play with it — in the same way I did Powerset, True Knowledge, SearchMe, cuil and other supposed “Google killers” / “Next Big Thingammyjigammys”.

    There’s nothing less smart and more irrational than to be swept along by vacuous spin and sheep/lemming-like behavior; ‘The Emperor’s New Clothes’ is a cautionary tale for us all and, in some instances, we may even lose our own shirts if we invest time, content or money in ventures without objectivity. We invest with good intent but, probably, insufficient anticipation that the company may be less than truthful and straightforward about what their product / service can and cannot do and how they will or will not deliver to users.

    [This is why proper due diligence is SO important --- please also see the global financial crisis and the primary lessons there: due diligence, due diligence, due diligence.]

    Returning to WA, so far this is what I’ve seen of the site itself: a preview request box that’s been poorly coded with CSS.

    Hopefully, this is an itsy-teeny-weeny aberration on the part of WA developers and the knowledge engine itself is in synch, in dynamic stacks and properly aligned (data sets, surfacing of information and relational databases).

    I’ve actually seen some screenshots of WA on ReadWriteWeb and first impressions are that the preview form is a minor coding mistake:

    Plus I’ve watched the video from WA’s presentation at Harvard’s Berkman Center. It’s over an hour long, so please have some coffee / hot chocolate and snacks to hand before you venture into it:

    Again this looks interesting.

    Any platform which provides “show + tell” (aka visual representation) of a search term is helpful to users. Nevertheless, whether it’s as innovative or disruptive as some previewers suggest remains to be seen. Anyone who’s done any form of asset allocation or portfolio modeling like I have will recognize the kind of searchable datasets WA offers from using the likes of Reuters, Bloomberg, Thomson Financial, Amadeus, Multex and most of the mgmt consultancy research providers (Datamonitor, Jupiter Media Metrix, McKinsey, META Group, Nielsen, and more) and governmental agencies and filing services (SEC, FSA, Edgar Online, Office for National Statistics, etc.).

    Most of this information is “locked down” and “silo source” or, at the very minimum, subject to an access/subscription fee typically in the US$ tens of thousands.

    Looking at R/R/W’s screenshots and WA’s video, it would seem WA is attempting to pool and provide this information to the masses on a democratic and free basis in much the same way Wikipedia tries to provide information — albeit, it’s widely recognized Wikipedia faces questions about content accuracy and reputational issues since contributors can be anonymous, unverified and the system sometimes experiences hoax postings.

    Certainly having searchable and showable information on the number of Internet users on a global breakdown basis on a single site and page will be welcomed by academic researchers, management consultants and equity analysts who need data points to extrapolate and project future growth trends and how this will impact on Internet revenue models.

    The questions here would still include:

    (1.) how clean, current and reliable is the data?

    (2.) where is this data derived from? What are its primary sources? What processes have been applied to clean the data — e.g., have Durbin-Watson and White’s tests been run to generate confidence of accuracy and have anomalies been identified / removed?

    (3.) when are data points updated? Continuous moving targets or defined end-of-a-period (e.g. at midnight, end of working week on Friday, end of calendar month, end of accounting year, etc.)?
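
    On question (2.), it helps to be precise about what these tests do. Durbin-Watson checks regression residuals for serial correlation and White’s test checks them for heteroskedasticity. Neither cleans data, but both flag when a model’s error assumptions look shaky. Here is a minimal sketch of the Durbin-Watson statistic in plain Python. The function name and the sample residuals are my own illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: the sum of squared successive differences of
    the residuals divided by the sum of squared residuals.

    The value lies between 0 and 4. Values near 2 suggest no first-order
    serial correlation, values towards 0 suggest positive correlation and
    values towards 4 suggest negative correlation.
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    return numerator / denominator


if __name__ == "__main__":
    # Made-up residuals that flip sign every step (negative serial correlation).
    print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
```

    A provider that published diagnostics like this alongside its datasets would make the trust question far easier to answer.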

    This may seem like over-precision but I’ve sanity-checked a lot of data over the years. As someone with a maths degree (including a 99% score in my Probability + Stats exam), who applied econometrics to model the Tiger Economies and who deals with facts and figures all the time in strategy modeling, I’m all too aware of the potential outliers and impurities that can affect those facts and figures and generate what people sometimes term, “Lies, damned lies and statistics.”

    This is not to say that if a site tells us the population of the world is approaching 7 billion, as the US Census Bureau does, we should reject their extrapolations.

    It’s simply about TRUST in the source, reputation of the source, credibility of the source and the probability of us relying on the source. These are core principles which apply in meatspace as much as Interspace environments, btw.

    As for WA being able to calculate trigonometric integrals, well, Mathematica, which Stephen Wolfram conceived, is a recognized leader in the academic and research worlds for being computationally rigorous as far as numerical and statistical inputs and derived outputs are concerned.

    Whether this will translate into the worlds of semantics, vernacular, popular culture slang, knowledge wit and discerned meaning is an unknown at this time-point.

    I just want to be able to test the platform on 18 May 2009 and experience what happens and what can be learnt from it, technically and visually.

    I won’t be asking the WA system for answers to the following:

    (1.) Who is the President of the US?

    (2.) What is Paris?

    (3.) Where are King Solomon’s mines / Atlantis / Xanadu / sunken gold-bearing galleon?

    (4.) When will the end of the world be nigh?

    (5.) Why do we fall in love with some people and not others?

    (6.) How many air molecules are in a telephone box?

    [This is a question they ask in Oxbridge entry exams, for natural science courses, btw. You’re supposed to know your Brownian motions and apply the gas laws to derive the answers.]
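
    For the curious, the telephone box question comes down to the ideal gas law, N = PV / (k_B T). Here is a rough sketch. The box dimensions (1 m x 1 m x 2 m), sea-level pressure and 20 °C are my own assumptions, and the function name is hypothetical.

```python
BOLTZMANN_CONSTANT = 1.380649e-23  # J/K, exact by SI definition


def molecules_in_volume(volume_m3, pressure_pa=101_325.0, temperature_k=293.15):
    """Number of gas molecules from the ideal gas law, N = P * V / (k_B * T)."""
    return pressure_pa * volume_m3 / (BOLTZMANN_CONSTANT * temperature_k)


if __name__ == "__main__":
    # Assumed box interior of 1 m x 1 m x 2 m; a guess, not a measured kiosk.
    box_volume = 1.0 * 1.0 * 2.0
    print(f"{molecules_in_volume(box_volume):.2e} molecules")
```

    Under those assumptions the answer comes out at roughly 5 x 10^25 molecules. The interviewers care more about the reasoning than the final figure.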

    Instead, I will be asking some questions of a probabilistic, interconnecting and predictive nature to test to what extent WA is sufficiently intelligent to plot holistic decision paths — or, at least, provide related data points with relevant velocity.

    Re. cellular automata, which underpin some of Mathematica, I’ve already commented on them in my ‘The Global Brain’ knol.

    So in conclusion…………let’s please watch out for 18 May 2009 when we get to test WA for ourselves, objectively.

    SOME INTERESTING NEWS LINKS:

    http://news.cnet.com/8301-17939_109-10233763-2.html

    http://www.wired.com/epicenter/2009/05/how-the-wolfram-alpha-search-engine-could-save-google/

    http://www.independent.co.uk/life-style/gadgets-and-tech/news/an-invention-that-could-change-the-internet-for-ever-1678109.html