Back in 2009, while contracted to work on a doomed content-repository project, a flash of insight struck me: The problem with grand visions of the Semantic Web was that they all assumed a top-down structure. One wickedly clever set of rules to wrangle every fact. A global ontology.
It didn't make sense. Global ontologies are like Soviet Central Planning. Rules are meant to be broken. And top-down systems are crashing and burning everywhere you look.
Plus there was another un-fixable problem: Everyone with money to spend on these projects wanted machines to do the yeoman work. Because machines are cheap.
Think about that for just a moment. We're talking about organizing the sum total and nuance of human knowledge, but the entire world assumes somehow that this is a job for machines. That the best way to understand the complex, pattern-based output of human intelligence and language is to assign computers to decode it after the fact.
People think that makes sense because they think computers are magic, not machines. Meanwhile, in the real world, Text Mining Engines aspire to 75 percent accuracy. That's why our content-repository project failed. The client's product specifications couldn't be satisfied via the vendor's pathetic 75 percent accuracy rate.
So one night I asked myself: Could you reach the goal if you flipped the script on every core assumption? Not top-down, but bottom-up? Not machine intelligence, but human intelligence, assisted by machines? Not one "global graph" but many interconnected "directories of meaning" based on capturing machine-readable statements of fact during the production of human-readable articles?
And of course, the answer is yes, you can do all these things, and you can do them profitably, so long as you follow two simple rules: 1. Build tools that make it easy to publish directories of meaning; and 2. Give users the power to make their directories cooperate with other directories.
Once you do that, the need to create perfect top-down rules for knowledge disappears, because you'll have harnessed the power of emergent properties. If you build a good directory, others will want to use it.
What's so hard about that?
But people didn't get it. Most still don't, for lots of reasons -- including our very human inability to hear anything new without forcing it to fit into old assumptions.
They're about to start getting it, though, because now Google gets it.
The search giant has constructed a bottom-up directory of meaning. The company calls the product "the knowledge graph" and the service Semantic Search. Being Google, the company still sees the problem as a data-recovery challenge, but that doesn't matter. Once such directories exist, a semantic economy based on the value of machine-readable definitions is born. Once we begin feeding that market, information becomes a public commodity. And once we give people the power to define the meaning of their own words, and then to share those meanings in a mutually beneficial way, we'll have tapped into the same emergent property that generated Wikipedia.
It's not that complex. It doesn't require any exotic programs. But it does take vision, discipline and the tiny bit of courage required to buck the status quo.
I agree with a lot of what you say, in particular that recognition by Google and the other search engines is significant. But I must take issue with one key point: the Semantic Web has never been top-down, and is certainly not about a global ontology.
The RDF model that's the basis for Semantic Web technologies is based around two simple ideas:
1. you can identify anything with a URI (a person, a product, a concept), not just Web pages
2. a Web link can specify the relationship between two things with URIs, i.e. going a step further than typical links between pages (which you could interpret as meaning "somehow related")
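To make the two ideas above concrete, here is a minimal sketch in plain Python. It is an illustration of the subject/relationship/object shape only, not a real RDF library, and every `example.org` URI and helper name in it is made up for the purpose.

```python
from typing import NamedTuple, Set, List


class Triple(NamedTuple):
    """One statement: a link from subject to obj, typed by predicate."""
    subject: str
    predicate: str
    obj: str


# Hypothetical URIs, invented for this sketch. Idea 1: anything can have a URI.
ALICE = "http://example.org/people/alice"
BOB = "http://example.org/people/bob"
ACME = "http://example.org/companies/acme"

# Idea 2: the relationship itself is named with a URI too.
WORKS_FOR = "http://example.org/vocab/worksFor"
KNOWS = "http://example.org/vocab/knows"

# A graph is just a set of such statements.
graph: Set[Triple] = {
    Triple(ALICE, WORKS_FOR, ACME),
    Triple(BOB, WORKS_FOR, ACME),
    Triple(ALICE, KNOWS, BOB),
}


def objects(g: Set[Triple], subject: str, predicate: str) -> List[str]:
    """Return, sorted, everything `subject` is linked to via `predicate`."""
    return sorted(t.obj for t in g if t.subject == subject and t.predicate == predicate)


def to_ntriples(t: Triple) -> str:
    """Serialize one statement in N-Triples style: <s> <p> <o> ."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


if __name__ == "__main__":
    print(objects(graph, ALICE, WORKS_FOR))
    print(to_ntriples(Triple(ALICE, KNOWS, BOB)))
```

Nothing in the sketch needs a central authority: anyone can mint URIs like these and publish statements that point at someone else's.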
The rest of the technologies are about how to make use of these ideas. It's primarily bottom-up, by design, just like the Web. It's no coincidence that the inventor of the Web, Tim Berners-Lee, has been one of the leading lights around Semantic Web development. It's essentially the same Web.
Anyone can create their own vocabulary/ontology, and there are hundreds, if not thousands, around. OK, it's considered good practice to reuse existing terms, but that's not mandated anywhere.
The data itself can be (and is being) created by anyone. For guidance on how, check http://linkeddata.org/.
You say "we'll have tapped into the same emergent property that generated Wikipedia" - well that specifically has already been tapped into, see http://dbPedia.org
There have been recent developments around expressing data in HTML: microformats, RDFa and HTML5 microdata (which can all be interpreted as RDF), and it's those that Google and co. are mostly keying off. But the basic ideas haven't changed.
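As a rough illustration of what data expressed in HTML looks like to a program, here is a sketch that reads a small HTML5 microdata snippet using only Python's standard library. It handles a single flat item and is nowhere near a full microdata processor. The snippet, the person in it, and the names `FlatMicrodataParser` and `extract_item` are all invented for this example.

```python
from html.parser import HTMLParser
from typing import Dict, Optional, Tuple

# Invented sample markup: one item, with its type and two properties.
SNIPPET = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Alice Example</span>
  <span itemprop="affiliation">Example Corp</span>
</div>
"""


class FlatMicrodataParser(HTMLParser):
    """Collects itemprop text for one non-nested itemscope (an assumption)."""

    def __init__(self) -> None:
        super().__init__()
        self.item_type: Optional[str] = None
        self.properties: Dict[str, str] = {}
        self._pending_prop: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "itemscope" in attr_map:
            # itemtype names the kind of thing being described, as a URI.
            self.item_type = attr_map.get("itemtype")
        if "itemprop" in attr_map:
            # Remember the property name until its text content arrives.
            self._pending_prop = attr_map["itemprop"]

    def handle_data(self, data):
        text = data.strip()
        if self._pending_prop and text:
            self.properties[self._pending_prop] = text
            self._pending_prop = None


def extract_item(html: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (item type URI, {property name: text value}) for the snippet."""
    parser = FlatMicrodataParser()
    parser.feed(html)
    return parser.item_type, parser.properties


if __name__ == "__main__":
    print(extract_item(SNIPPET))
```

Each extracted property can be read as one statement about the item, which is the sense in which these formats can be interpreted as RDF.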
Finally, you're absolutely right to say "It's not that complex. It doesn't require any exotic programs."
Posted by: Danja | Tuesday, April 17, 2012 at 08:15
To go along with Danny, I don't know where you get the impression of this "grand top-down vision". From the beginning, the Semantic Web has rather been developing along the lines of ontological diversity ... maybe too much diversity and lack of coordination, actually ...
And Google has not *built* this "bottom-up directory of meaning", it has just cleverly *harvested* it like it has harvested everything else of value on the Web.
Posted by: Bernard Vatant | Tuesday, April 17, 2012 at 09:04
Yep. Got it. I see the problem with writing rules that allow the "wrangling of every fact" (allowing for the diversity you speak of) as a top-down problem. It's created an infrastructure (XML/RDF/OWL/CURIE, etc.) that anyone can use, but it hasn't created incentives for enough people to use that infrastructure. We built it, but they didn't come.
So I don't want to argue over whether dbpedia is an emergent property of Wikipedia (my point was that cooperation was the emergent property) or a better representation of the structured information within Wikipedia, or whether data (however acquired) has value outside of a structure. I'm interested in what we can do with this motion.
Posted by: Dan | Tuesday, April 17, 2012 at 12:58