Abstract: This 3,000-word essay proposes and describes an economy based on exchanges of intelligently structured data; argues that the path to that economy will not be imposed from above but will emerge like a crystal lattice in a supersaturated solution; explains why a Semantic Content Management System for the publishing industry could be the profitable catalyst for this development; and discusses the outline of such a system.
Clay Shirky blew up the grand vision of The Semantic Web in 2003, which is probably why serious people aren't particularly alert to the possibility of a semantic revolution building in response to current conditions. The general perspective on semantic architecture is still that it must be grandiose and top-down. I believe the route to our semantic future comes from the opposite direction, and I've tried to make that case by imagining and describing a publishing system whose information structures would spread because the system makes those structures profitable.
But the widespread inability to imagine the benefits of such tools points to a more fundamental lack of understanding, and it occurred to me recently that maybe it's time to address it.
It's the Semantic Economy, Stupid.
Sure, everyone knows that we live in an information age, and that networked media is accelerating its expansion. But what goes unsaid about that notion is how horribly inefficient the information age has become. Networked media gives us instant (and too-cheap-to-meter) access to generally relevant answers, the tech industry gives us unprecedented and highly affordable processing power, and everyone with an ISP has as much publishing capability as they need. It sounds like the beginnings of a highly productive, highly profitable era – except it's neither profitable nor productive at the moment (at least not if you define productivity by the dollars generated from work).
The flaw in this equation should be obvious: Practically all of the useful data in this “information” economy is bound up in one of two obsolete formats: unstructured, natural language text documents, or proprietary, unpublished information structures. We call the first format “articles,” and the second one (in the generic sense) the “deep web.” By the deep web, I mean primarily databases that connect to the Internet but whose contents are not typically indexed.
It's as if we've come up with all the technologies needed to create a modern automobile, except it's 1870, and the only fuel source we have for this remarkable new machine is coal.
To carry this energy industry analogy further, the raw material of this information economy is essentially like oil shale: the latent value is obvious, but the cost of extracting these information resources from today's existing deposits (think web archives) is so high given today's technology that no one is going to spend a dime to start the project. The value of all things semantic is seen either as a series of data-recovery issues, to be solved in time by some future technology, or as a W3C pipedream, driven by top-down folly.
Robust data, the fundamental building block of any fully functional information age, remains disconnected from our business models.
Consequently, we are stuck with a contemporary global economy in which very few people are reasonably well compensated for producing anything. In the same sense that our manufacturing sector has collapsed because third-world factories charge next to nothing for labor (often with resulting deficits in quality), so has our media economy collapsed because our only proven model for funding the creation of news and information comes from renting consumers' attention to commercial interests. Since most people are happy to pay attention to low-grade schlock, the business case for producing high-quality, usable information is increasingly weak. Why invest in an expensive product with a lower rate of return when the cheap product makes you more money?
And, since the supply of things to which we can pay attention is expanding as if governed by Moore's Law, so too will the rates we charge for shares of audience attention continue their industry-killing plunge. The media-industrial complex seems to have accepted this fate over the past year, proposing survival plans based primarily on cost-cutting. The deep irony of these visions of the future is that they foresee an expanding information age – without a sustainable revenue stream for the creation of reliable new information. Plus, as depleted as these corporations are now, there's little to be spent on the research and development that could lead the way out of their black pit of despair.
Hence, we hobble through this age of information glut via what is essentially a mid-1990s technology: search. Search is our tool for cutting the Web down to useful slices, and it's woefully obsolete. It patrols a universe of unstructured, natural language text, searching for keywords that bots view as strings of symbols, not units of meaning. It sorts them by proprietary algorithms, generally ranking keywords by their popularity rather than by their relevance. A search engine that scores 65 percent relevance is doing quite well, and there's little that can be done within the current system to improve dramatically on the redundancy, timeliness and other factors that make a search result more or less useful.
In other words, in an age of superhuman information flows, we lack the superhuman tools required for managing those flows. The best we can do, it seems, is to give users free search tools and results that – while better than nothing – still require detailed professional attention and expensive additional investigation. A 65 percent confidence rate might be good enough for a search company that exists to rent your attention by the keyword, but the unimproved results are not commercially useful to the person consuming them, and they're nothing but gibberish to machines.
Ironically, Microsoft's “search overload” marketing campaign for Bing effectively describes this global problem, while promoting yet another tool that fails to solve it – or even address it. We don't need another search engine, much less Bing's under-delivered “decision engine.” We need definitive data that is delivered in a universally usable format with all its rich context intact, ready to plug into machines that produce profitable answers instead of expensive puzzles.
Moreover, it's incredibly difficult to pay people for their contributions in this non-semantic economy. Forget about the absurdity of modern copyright law for a moment and consider nothing more than the concept of intellectual property. Despite the Web's clear hostility to middlemen, it's the legacy publishers and parasitic aggregators who still tend to get paid in today's system, not the originators and collaborators who make our discoveries, produce our insights and provide the raw materials and context for the next breakout success. Those subtle lines of intellectual and artistic begetting are traceable to careful scholars, but we simply don't have the practical tools that would allow us to consider systems for compensating these creators for the ongoing value of their contributions.
So that's the problem. Now comes the imaginative part.
Imagine a global economy in which every piece of information is linked directly to its meaning and origin. In which queries produce answers, not expensive, time-consuming evaluation tasks. Imagine a world in which reliable, intelligent information structures give everyone an equal ability to make profitable decisions, or in many cases, profitable new information products. Imagine companies that get paid for the information they generate or collect based on its value to end users, rather than on the transitory attention it generates as it passes across a screen before disappearing into oblivion.
Now imagine copyright and intellectual property laws that give us practical ways of tracing the value of original contributions and collecting and distributing marginal payments across vast scales.
That's the Semantic Economy. It's based on the notion that superhuman information flows require superhuman tools that scale to the size and speed of the global production of useful data, and it could re-establish in the 21st century the connection between the value of things and the act of producing them. It's based on the idea that every exchange of properly coded information is an exchange of value, and that the institutions that broker and enable such exchanges will capture enormous profits.
The good news – the truly astounding news – is that all of the essential tools and infrastructure required to launch such an economy either exist today or (in the case of laws and regulations) are merely waiting for the emergent properties of such an economy to become apparent before beginning their own accommodating transformation.
To succeed, make money
And here's the stunning thing: Every bit of this economy can be extrapolated from the relatively simple act of creating a Semantic Content Management System for the publishing industry. Because you don't have to create and propagate the perfect W3C standard to launch the semantic revolution. You merely have to show that adding a semantic layer to your existing information gathering operations will boost your profits. With apologies to “Field of Dreams,” the quote should be, “If you make a buck, they will come.”
The question now is: At what point will someone outside of the legacy publishing industry step up and build a tool that gives writers and editors an efficient workflow for the creation of in-line semantic markup?
By using that product, intelligent publishers will be able to add extensible machine-readability to the documents they currently produce. In doing so, they will create unique, curated data-sets that have independent and lasting value, while networking facts at the data level instead of merely the document level. The first system to do so will add a persistent new revenue stream to the publishing business, thereby establishing a new commercial value for information.
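To make "networking facts at the data level" a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the facts, the identifiers (`city:Springfield`, `article-101` and so on) and the helper names are all invented for this example, and nothing here describes an actual product. The point is simply that once two separate articles expose their facts as subject-predicate-object statements, a machine can chain them together without reading either article.

```python
from collections import defaultdict

# Hypothetical facts, as inline markup might expose them from two separate
# articles. Every identifier below is invented for illustration.
FACTS = [
    # (subject, predicate, object, source document)
    ("city:Springfield", "hasMayor", "person:Jane_Doe", "article-101"),
    ("city:Springfield", "population", "120000", "article-101"),
    ("person:Jane_Doe", "memberOf", "org:Port_Authority", "article-245"),
]


def index_by_subject(facts):
    """Group (predicate, object, source) entries under each subject."""
    index = defaultdict(list)
    for subject, predicate, obj, source in facts:
        index[subject].append((predicate, obj, source))
    return dict(index)


def follow(facts, start, *predicates):
    """Walk a chain of predicates from a starting subject.

    Returns the sorted list of objects reached after the last predicate,
    regardless of which document each intermediate fact came from.
    """
    index = index_by_subject(facts)
    current = {start}
    for predicate in predicates:
        reached = set()
        for subject in current:
            for pred, obj, _source in index.get(subject, []):
                if pred == predicate:
                    reached.add(obj)
        current = reached
    return sorted(current)


if __name__ == "__main__":
    # "Which organizations does the mayor of Springfield belong to?"
    # The answer joins a fact from article-101 with one from article-245.
    print(follow(FACTS, "city:Springfield", "hasMayor", "memberOf"))
```

A keyword search could return both articles, but it could not join them. The join is what turns two documents into one small, curated data-set.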
This is not to say that the Semantic Economy will be or should be based solely on the publishing industry. The publishing industry is merely the most likely place for such an economy to put down its first roots. Once the publishing industry is transformed by the bottom-up creation and marketing of profitable semantic products, other industries will realize that they, too, produce information that has value to someone. And once they begin capturing or repackaging their information in profitable, interchangeable formats, the global Semantic Economy will begin to emerge.
The obstacles to such a development should be obvious, particularly to anyone who has spent a significant amount of time in the news business. But once the initial hurdles have been cleared, once the tool and its resulting structures have been implemented and propagated, the Semantic Economy should expand like a crystal lattice in a supersaturated solution.
How might such an economy function? Let's examine its application to Chris Anderson's idea of The Long Tail for one line of answers.
Anderson's original insight suggested that it might be possible to create a new value chain, connecting niche consumers to niche content in affordable ways that would compensate the original producers. Kevin Kelly later expanded and redirected this line of thinking with his “Thousand True Fans” concept. So far neither idea has approached anything resembling real-world implementation.
The unrealized potential of The Long Tail is a semantic issue masquerading as a marketing problem. Sure, lots of people are producing plenty of content that could have significant value – if only they could convey it to the right consumers. The flaw? The cost of making those intelligent, personalized connections remains so high that it removes all profit from the equation. It's the unit-cost of connections that renders the Long Tail irrelevant to the producers of niche content.
In the absence of a system that conveys meaning and context, consumers are stuck with marketing claims, familiar brands and name recognition. We may not find our secret heart's desire in our known mainstream channels, but they deliver higher signal-to-noise ratios than mere random poking about. The answer to the question “Why are certain things popular?” is quite directly “Because they were already popular.” I call this system and its resulting culture “the Celebrity Heuristic.”
While popularity will still be popular, the Semantic Economy will enable the profitability of alternate distribution channels in the same way that digital publishing tools made such channels possible in the first place. Today's Celebrity Heuristic, which keeps all the accessible wealth in the Short Head of the revenue curve, produces the equivalent of a creative banana republic: a small percentage of hyper-wealthy successes, a vast majority of talented but uncompensated creators. But the Semantic Economy, by efficiently engaging the power of the Long Tail and connecting exchanges of value to the producer of the original product, will allow for the rise of a new group our century desperately needs: the Creative Middle Class.
That's one line of development. But what about those other producers of information – for-profit companies, tax-funded government agencies and non-profit institutions of various forms?
Organizations fund and produce most of the world's information, and many of them take great pains to mark up their documents and files according to standardized metadata schema. Yet these efforts seldom produce significant revenues for these organizations.
The Semantic Economy will differ from our current system by providing the first for-profit exchanges of data-sets within interchangeable information structures. As these exchanges and formats emerge from the publishing industry, the jobs of determining value, connecting buyers and sellers and creating effective info-structures will be filled by companies looking to make a buck. And when enough money is being made on both ends of these mediated exchanges (in savings and opportunities by buyers and in direct payments to sellers, not to mention the transactional fees collected by mediators), organizations will see the value in their data and act in their self-interest, as stockholders will no doubt insist.
In other words, amorphous “markets” for information exist, but the infrastructure that would provide a practical “information market” does not. We are like pre-Enlightenment Europe in that regard: 16th-century companies needed capital and aristocrats needed ways to make use of their New World fortunes, but stock markets didn't exist yet. A 16th-century Dutch importer who needed cash to expand his warehouse space provided a “market” for capital, but until the development of a financial system (including stable currencies) that could support private trading of equity, our poor importer was essentially limited to waiting for the establishment of the Amsterdam Stock Exchange in 1602.
An Age of Discovery preceded The Enlightenment, providing the new resources that fueled the 17th and 18th centuries' simultaneous leaps in finance, economics, science, political theory and the arts. This historical situation is not unlike the one in which we find ourselves today, only in a highly compressed timescale.
The next steps
I first came in contact with these ideas in 2005, and I have been writing about them off and on ever since. In 2009 those essays led to a job with a consulting company, which farmed me out to a media-company client that was one of the few news-industry organizations in America actively seeking to mine the profit potential behind these concepts.
I learned invaluable lessons from that experience. First, it became sadly apparent that the obstacles to developing and implementing these technologies within existing media companies were overwhelming. Even an integrated media company with a committed emphasis on technological innovation lacked the cash, resources and staff expertise to sustain such a program. Second, I concluded that the relatively simple act of producing a publishing dashboard with a limited set of integrated semantic-markup tools would give cost-conscious publishers the ability to efficiently re-purpose content across platforms. Finally, I realized that such capabilities would also enable the creation of a revolutionary new class of revenue-streams – with implications that extended far beyond the news-publishing industry.
In my off-hours, I began developing the specifications for such a publishing system. The more I studied and discussed these ideas with like-minded semantic geeks around the world, the more I began to see the practicality and profitability of the system I was describing.
Unfortunately, the project I had joined “already in progress” was too far along to incorporate the concepts (particularly the role of RDF) that I considered to be foundational, and when my contract expired I didn't attempt to renew it.
For months I was too disappointed to reconsider those ideas, but after a few encouraging conversations and some time spent lurking around several discussion threads in the fall of 2010, I began to see that there might yet be another way to implement my system. One simply needs to look outside of the media-industrial complex, where short attention spans, ignorance of semantic principles and hostility toward non-narrative alternatives turn every attempt at describing novel solutions into a brutal uphill slog.
Consequently, I've been working on a new description of this system, which is based on the intuitive creation and management of inline RDF and its requisite directories of semantic namespaces. Such a system would be designed for customized implementation within any segment of the publishing business, and would operate essentially as a middleware dashboard, creating, ingesting, remapping and exchanging XML files across multiple input and output systems. Additionally, I've refined the original target market to such a system's natural first application – and it ain't newspapers.
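For readers unfamiliar with the terms, the following Python sketch shows in generic form what "inline RDF" and a "directory of semantic namespaces" can look like. It is not the system itself, only an illustration under assumptions: the sample sentence, the `ex` prefix, the `InlineFactParser` class and the `expand` helper are all hypothetical, the `about` and `property` attributes are borrowed from RDFa, and the parser handles only flat, un-nested markup.

```python
from html.parser import HTMLParser

# Hypothetical namespace directory: short prefix -> base IRI.
# "dc" points at the Dublin Core terms vocabulary; "ex" is invented.
NAMESPACES = {
    "dc": "http://purl.org/dc/terms/",
    "ex": "http://example.org/vocab/",
}


def expand(name, namespaces=NAMESPACES):
    """Expand a prefixed name such as 'dc:creator' into a full IRI.

    Names with an unknown prefix (or no local part) are returned unchanged.
    """
    prefix, _, local = name.partition(":")
    if prefix in namespaces and local:
        return namespaces[prefix] + local
    return name


class InlineFactParser(HTMLParser):
    """Collect (subject, property IRI, text) triples from flat inline markup."""

    def __init__(self):
        super().__init__()
        self.subject = None    # set by the most recent `about` attribute
        self._property = None  # property of the element currently open
        self._text = []        # text collected inside that element
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        if "property" in attrs:
            self._property = attrs["property"]
            self._text = []

    def handle_data(self, data):
        if self._property is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes no nesting: the first closing tag ends the open property.
        if self._property is not None:
            value = "".join(self._text).strip()
            self.triples.append((self.subject, expand(self._property), value))
            self._property = None


SAMPLE = (
    '<p about="http://example.org/story/1">By '
    '<span property="dc:creator">Jane Doe</span>, filed '
    '<span property="dc:date">2010-11-01</span>.</p>'
)

if __name__ == "__main__":
    parser = InlineFactParser()
    parser.feed(SAMPLE)
    for triple in parser.triples:
        print(triple)
```

The sentence still reads as ordinary prose to a person, while the attributes and the namespace directory let a machine pull out two unambiguous statements about the story.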
Because this system is effectively an undeveloped product, I will not be describing it publicly under this site's Creative Commons license. Anyone wishing to learn more about my work toward a functional spec for a practical SCMS is encouraged to contact me at firstname.lastname@example.org.
Further reading: My essays on media topics, including semantic concepts, are described in this directory.