Since the new-media conventional wisdom machine is having another loose conversation about the “atomic unit” of journalism (thank you, Jeff Jarvis, for kicking this one off), let's use this fleeting moment of attention to advance the subject toward its ultimate destination.
Future journalists are going to be in the information business, not specifically the storytelling business, or the analysis business, or the Tweeting business, or the liveblogging business. What separates the information contained in all these existing journalistic forms from the journalism that will be valuable in the future is this: the future will require us to store the new information we report in ways that computers can use efficiently.
So thank you, Mr. Jarvis, for pointing out that quality reporting need not result in an article. Thank you, Jonathan Glick, for noting that mobile interfaces are changing the way we consume news. Thank you, Amy Gahran, for saying that we need better word-processing and browser tools. These aren't exactly new ideas (Jarvis, Gahran and many others have been making similar points off and on for years now), but the recent cascade of discussion makes this a noteworthy moment.
The flaw in this line of conversation is that it ends at the water's edge, by the banks of a river of change that separates the confused state of modern journalism from a future that may offer astounding rewards.
What Gahran is pitching (a Lego approach to storytelling) is innovative and interesting, but in the end, it's still just storytelling. No matter how artfully assembled and thoughtfully edited, a package of Tweets and posts and photos and videos and stories and analysis is merely an adaptive 21st century extension of our old 20th century theory of journalism. Should we do what she recommends? Absolutely. But the results will not fundamentally change a status quo that is in dire need of a revolution.
For all you commenters out there readying your flamethrowers, here come the necessary disclaimers. I'm a fan of storytelling with 20 years in the news business. I've been blogging for eight years (12 if you count the blog-style news update I created with five or six other journalists and techies as the outer bands of Hurricane Floyd whipped Charleston one night in 1999). I not only “get” the pro-am approach to curating news and information, I've done it. Repeatedly. So has my award-winning wife, who put pro-am teams to work covering everything from elections to opera. So please, please don't tell me that I don't understand the power of story, or blogging, or social media, or this or that new software platform. I get their value, but I also know their limitations. First hand.
Got it? I love bicycles, too, but I wouldn't recommend them as a spacefaring technology. And the task before us is not just some search for a cool new way to get around the neighborhood.
Today's journalists report information and file it as natural language text in all the formats mentioned above. They do this because all the tools we have for journalism are based on workflows that were created to get news out to groups of human beings that advertisers want to convert into consumers. In that business, the attention of the consumer group has value, not the information that attracts the group. The vast majority of our journalistic traditions are based on this model. It forms our media culture. It filters the pool of talent that enters our profession.
This status quo is, as Dave Slusher has pointed out, the media equivalent of kerosene. Kerosene production was a huge industry in the 19th century – so successful, in fact, that it wiped out the whaling business in North America. One of its waste byproducts was a volatile liquid called gasoline. It took about two decades of development in the automotive industry (weeding out steam and electricity as competing power sources) before people in the kerosene business began taking gasoline seriously in the early 20th century.
Today we live in a global economy with an appetite for information that can be used and reused. But so long as we limit our thinking about the information we report to the production of kerosene journalism in varying grades of quality, we will never tap into this new source of abundance and possibility.
The fact is, we do need new word processing tools – specifically, a writing tool that marks up the information communicated in standard kerosene journalism so that computers can process and store it. Not metadata that helps searchers find stories, mind you. Metadata that lets programmers tap into the complete set of discrete bits of knowledge that a news organization has produced over time.
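To make the distinction concrete, here is a minimal sketch in Python. The field names and the tiny fact vocabulary are hypothetical, not a proposal; the point is the difference between metadata about a story and metadata that captures what the story reports.

```python
# A minimal sketch of the distinction. All names and values here are
# hypothetical, invented purely for illustration.

# Metadata that helps searchers find the story:
story_metadata = {
    "headline": "Council approves harbor dredging contract",
    "keywords": ["city council", "harbor", "dredging"],
    "published": "2011-06-06",
}

# Metadata that captures the discrete bits of knowledge the story reports,
# as subject / predicate / object statements a program can query directly:
reported_statements = [
    ("city council", "approved", "harbor dredging contract"),
    ("harbor dredging contract", "has_value_usd", 4200000),
    ("harbor dredging contract", "awarded_to", "Example Dredging Co."),
]

# A program can now answer "What did the council approve?" without
# re-reading the prose.
print([o for s, p, o in reported_statements
       if s == "city council" and p == "approved"])
```

The first block helps a search engine route readers to an article; the second is the kind of store a programmer can build on.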
The technologies and standards required to begin this act of creation are available today as the byproducts of systems that were designed for other purposes. Unlike kerosene journalism, which is economically valuable only during the brief moment when consumers are interested in it, the approach I describe produces value that increases with the size of its resulting data set.
Because ultimately, the atomic unit of journalism is the meaningful, useful, reliable answer. For that unit to have value in a global information economy, we must store the answers we derive in ways that will satisfy questions we haven't even considered yet. For this to work, we must connect the question “what's new?” to the question “what do we know?” so that the first feeds the second and the second informs the first.
Once we do that, then Chuck Peters' wild-eyed dream of a media company that connects to anyone it touches via every aspect of community life won't look so daunting. Steve Buttry's models for community engagement will become increasingly profitable. And Jay Rosen's continuing quest for a new theory of the press will get even more interesting.
Until then, we'll keep running on kerosene.
Favorite quote ever:
Posted by: Danielbachhuber | Monday, June 06, 2011 at 21:44
You make a very good argument, and I agree with it in the sense that "answer" is a good atomic unit (though stories will continue to be popular, but you know that). But I wonder how structured our structured data really has to be before it's useful. Consider how well IBM's Watson does in answering questions from a huge database of unstructured text. I don't think we really yet know enough about knowledge representation to get ambitious about representing the world as metadata.
Posted by: Jonathanstray | Monday, June 06, 2011 at 23:52
Thanks Dan. My favorite quote is "we must connect the question “what's new?” to the question “what do we know?” so that the first feeds the second and the second informs the first."
In this fast-developing, globally networked world, our kerosene approaches are holding us back, and are not much fun. To have more fun, and make more progress, we need to approach our work with a focus on the information network, not any existing product.
The technological issues are within our grasp. The cultural addiction to kerosene is much more difficult.
Wild-eyed?
Chuck
Posted by: Cpetersia | Tuesday, June 07, 2011 at 06:38
Jonathan, I took a different lesson from the Watson stunt. It took IBM years and an estimated $2 billion to build a "computer" that consisted of 90 servers in 10 racks, and that computer was barely better than humans in a speed contest that tolerates low levels of confidence and that, in this case, used specially selected questions so that Watson would even stand a chance.
This is not to say that we won't have true AI someday. But we don't today. And while everyone has been pinning their hopes on Natural Language Text Analysis engines, my experience is that none can produce results with the level of confidence (better than 95 percent) that you'd need to build reliable systems.
The one system I know of that produces information with that kind of confidence is the human news reporting system. Not that we journalists approach 95 percent confidence in "breaking" information, but as we move through the process, our confidence in discrete statements and sourcing is supposed to reach those levels.
So rather than spend billions of dollars and years of development cycles on Watson-style AI, why not invest a few million in relatively simple upgrades to word processing software and web/print publishing CMS? Embed the coding steps in an intuitive, machine-assisted workflow. That's the foundation of the business idea I'm working to develop with Abe Abreu.
And I disagree about what we know about representing the world... so long as we're building that representation pixel by pixel with discrete RDF triples, not block by block with massive, dependent summaries or top-down, heavy ontologies.
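If that sounds abstract, here's roughly what I mean by discrete triples, sketched in Python with the rdflib library and a throwaway example.org vocabulary (placeholders for illustration, not a real ontology):

```python
# A rough sketch of "discrete triples" using the rdflib library.
# The example.org namespace, property names and values are throwaway
# placeholders for illustration only.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/news/")
g = Graph()

contract = EX["harbor-dredging-contract"]

# Each reported fact is one small, independent statement --
# no heavy, top-down ontology is required to record it.
g.add((contract, EX.approvedBy, EX["city-council"]))
g.add((contract, EX.valueUSD, Literal(4200000)))
g.add((contract, EX.firstReported, Literal("2011-06-07")))

# Turtle output keeps the statements readable by humans, too.
print(g.serialize(format="turtle"))
```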
Chuck, I think getting wild-eyed every now and then is an absolute requirement. It's where visions come from.
"The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man." -- George Bernard Shaw, 1903.
Posted by: Dan | Tuesday, June 07, 2011 at 11:48
Really interesting post, Dan. I wonder what your thoughts are on this somewhat related post by VC Albert Wenger: http://continuations.com/post/4158738112/embrace-messy-data-to-reach-internet-scale
The point being that you can only enforce so much structure if you want to achieve real scale.
I tend to agree with you on the whole and think that a professional creator class has the incentive to structure their data properly if it results in economic value for that professional organization. I wrote about that here: http://goldbergfam.info/blog/2011/04/20/an-open-standard-for-commercial-content-syndication/
On a related note: what is the business idea you're working on with Abe Abreu?
Posted by: Davidgoldberg | Tuesday, June 07, 2011 at 16:20
I interpret Watson in a different way: doesn't matter what it cost. Point is they demonstrated and advanced the state of the art in extracting useful knowledge from all available data sources, structured and unstructured. And when you read the technical papers, structured data wasn't at the heart of that success. General ontologies and DBPedia tables worked well for certain closed domains (say, presidents, countries, species, basic constraint relations like country-isnt-a-person, etc.) but weren't the main knowledge store or type inference engine. The main knowledge store was unstructured text from a huge variety of places (Wikipedia, newswire archives, web crawls, etc.) plus an open type inference system based on statistical patterns of word use (the algorithm is called PRISMATIC.)
Yeah, structured data is great. But what do we want to use it for? Answering someone's question, right? At the moment, the best open-domain question answering techniques are in Watson, which improved state-of-the-art accuracy from about 30% to >80% in half a decade. As for the hardware required: every time you do a Google search you use that much computing power. If I were going to base a startup on answering people's questions, I would be stockpiling unstructured data -- and smart humans, as you so rightly point out.
Posted by: Jonathanstray | Wednesday, June 08, 2011 at 00:20
Ah. Perhaps I see now the difference between our premises. My assumption is that question answering by smart humans is going to be so massively amplified by algorithmic question-answering systems -- really they're just very clever search engines -- that this is where we want to focus our investment right now, if we want the fastest possible increase in quality, and the fastest possible decrease in cost, of getting a question answered for a random member of the majority of humanity.
Or, let me put a question to you instead: how do you foresee this massive new quantity of structured data being used during the question answering process in the future?
Posted by: Jonathanstray | Wednesday, June 08, 2011 at 00:31
David, I'd say that the excitement in the Web 2.0 period of the past decade was around distributed knowledge. We were all excited by the power of folksonomic systems, hashtags and ad-hocracies. There was a sense that we had discovered a self-assembling principle for information. This messy looseness was a strength, because it enabled connections that occurred without top-down control.
I think the new experience is of the limits of these systems, particularly from an economic viability standpoint. As powerful and democratic as they are, they're not assembling the large audiences that are required to produce traffic that investors can get excited about. Very few of us are making money off content via these systems.
So I see messy data as abundant and valuable and easily acquired, but the processing costs of using it for anything other than low-value, ad-supported media are probably higher than you'd want them to be. Since most people are focused on trying to drive down overhead so that low-value media can make profits, that's probably a good place to put some attention.
I'm just not particularly attracted to the low-cost model, because I don't like where I suspect that leads. And the things that interest me tend to be information applications that value the information instead of the audience.
On your post about open standards for commercial content, two small thoughts:
First, I think your parting thoughts about social media "finding" content for people are top-level important, because the idea that news finds the consumer is becoming increasingly true, and the channels by which that news finds us are increasingly proprietary.
Second, I'd be willing to use any open standard that works, but the big issue to me is the value of the content, and who it really belongs to.
I think the value of data lies not in each individual point, but in the structure that gives it context and meaning. But can we apply the same standard to unstructured news? Who does "news" belong to, and how long does that "ownership" last? The answers that I get when I ask those questions are so ephemeral that I haven't wanted to invest great energies in exploring them.
So yes, if you've got content that you can value and clear ownership of that content, then you could make progress in these areas. But my concern is that it won't be the best open standard that wins, but the agreement on a standard -- any standard -- between the industry organizations that have the most to gain or lose.
The business that I'm working to develop with Abe is one that would demonstrate and develop a system for embedding and publishing machine-readable meaning in regular old human-readable documents.
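I won't describe the product here, but as a general illustration of the idea -- using the existing RDFa convention and a placeholder example.org vocabulary, not our system -- machine-readable meaning inside a human-readable sentence can look as simple as this:

```python
# Illustration only, not a description of the system Abe and I are building.
# RDFa (an existing W3C convention) lets an ordinary HTML sentence carry
# machine-readable statements; the example.org vocabulary is a placeholder.
article_fragment = """
<p vocab="http://example.org/news/" typeof="ReportedFact" resource="#fact-1">
  The council approved a
  <span property="label">harbor dredging contract</span> worth
  <span property="valueUSD" content="4200000">$4.2 million</span>.
</p>
"""
print(article_fragment)
```

A reader sees a normal sentence; a parser sees a handful of discrete statements about #fact-1.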
Posted by: Dan | Thursday, June 09, 2011 at 11:33
Jonathan: Now THAT could be a very successful business! Smart humans working with algorithmic tools to produce FAQs that can be easily generated and retrieved. Very interesting idea you have there (if I've read it right).
Here's how I think my structured approach might play out:
A news company starts using a workflow that not only tags factual statements with machine-assisted RDF statements but also records those statements in a directory. The directory belongs to the company, but the majority of the directory is publicly available to any user (or robot).
And then magic happens. ;-)
I kid about this because it's the thing I hear a lot: "You can't just black-box what happens next, because that's not a business." Which is absolutely true.
BUT: it's also true that if I owned a directory like this, I'd be doing two things: 1. I'd be writing apps and APIs that could parse this data into things that answer questions that certain groups of people consider valuable; and 2. I'd be expanding, shaping and improving the data I capture so that I can filter it into datasets that I can sell to institutional clients.
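As a sketch of how No. 1 could work -- again with rdflib and a placeholder example.org vocabulary, not a proposed standard -- a robot's question against that public directory might be as simple as:

```python
# Sketch only: answering a question against the public directory with
# rdflib's SPARQL support. The example.org vocabulary is a placeholder.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/news/")

g = Graph()
# In practice this graph would be loaded from the news company's public
# directory; here we add a single statement by hand so the sketch runs.
g.add((EX["harbor-dredging-contract"], EX.approvedBy, EX["city-council"]))

# "What has the city council approved?"
results = g.query("""
    SELECT ?thing WHERE {
        ?thing <http://example.org/news/approvedBy>
               <http://example.org/news/city-council> .
    }
""")
for row in results:
    print(row.thing)
```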
I don't think THAT'S magic. I think that's an economic process of discovering what data has the most value and then adapting your business to make that transaction profitable.
The magic part would be the secondary, emergent aspects of these transactions. If any organization can use a similar workflow tool to mark up its valuable data, wouldn't it make economic sense for those organizations to develop cooperative, open standards that allow for the buying and selling of their data? I think that's where things would get interesting, because the market would bend toward datasets that offered truly authoritative answers. I think you'd find authority through cooperation.
So I don't think you start by imagining every standard. I think you start by creating a tool that makes it practical for groups of humans without special training to mark up text, re-use existing knowledge, and customize their information models without having to write a work order to the I.T. department every time.
Posted by: Dan | Thursday, June 09, 2011 at 12:03