XARK 3.0

  • Xark began as a group blog in June 2005 but continues today as founder Dan Conover's primary blog-home. Posts by longtime Xark authors Janet Edens and John Sloop may also appear alongside Dan's here from time to time, depending on whatever.



Monday, May 11, 2009




I asked this on Twitter as well, but have you given any thought to defining an RDF namespace to create a standard for marking up information within an article? A couple of friends (@kenkeiter and @stevenwalling on Twitter) see a lot of potential and started drafting a spec last Friday. Thanks in advance.

Paul Balcerak

Phenomenal post, especially the point that journalists' value no longer comes from being eloquent historians, but efficient distillers of gobs of information.


Daniel: I replied to your Tweet before I saw this comment. To repeat it, I've never done anything hands-on with RDF, so all I know is the overview. I would very much like to see your thoughts on this and help out if I can be helpful.

Paul: Thanks. I think we should still aspire to be eloquent historians, but we have to accept useful structure if we want to be relevant. It's like learning AP Style: It doesn't make you a great journalist, but it's the price of poker.


Also on the RDF theme: I'd written previously that the primary output of news organizations should be a news flavor of XML, until a friend suggested RDF might be a better option, and I think this was the reason -- the ability to mark up subjects within the natural language "story."

I still tend to think in terms of XML, though, with info floating around the story.

And I deliberately didn't go into the distinction between the data companies would reserve and the data they would publish. That gets into the larger Semweb story, and this post was already goat-choking long.


Here's a critique of this post, but I can't reply to it there, so I'm bringing it back here. Brandon's thought is that I want to "replace journalism with informatics."

"One, invoking evolution as a guiding principle: please stop. Evolution goes down plenty of non-productive paths. Many adaptations are irrelevant or counterproductive. Just because a proposed system builds on the old is no guarantee of success.

"Two, the situation used to illustrate the proposed new journalism form — covering a local home fire — is cherry-picked. Journalism as database-friendly fact gathering could arguably work in this case, but more complex stories — politics, culture, anything involving enterprise — would not likely lend themselves to that sort of codification.

"Three, why do pay-for-content models amount to the creation of “artificial scarcity”? Reportage doesn’t grow on trees.


My takeaway: I don't want to replace journalism with informatics, I want to add informatics to journalism and use it to sustain and improve what we do.

He's right about some things not being as well-adapted to an informatics model... but some level of structure improves everything. Plus, adding informatics-based products doesn't preclude writing about any subject. It just gives you money to do so.

As for evolution, that's Kurzweil.

Jason Preston

Overall: genius.

In a data-driven society, there's definite value in accurate, *structured* data-sets. That's why companies like Comscore have a business.

I think this is one of many forks that journalism is going to take going forward, and probably one of the more lucrative ones. I also think, however, that the informatics model will not be thought of as "journalism," as much as other models that arrive (such as high-end storytelling that charges the readers for access).

The internet will support a number of models that work at various scales. Unlike the former system, where economics limited broadcasting options to certain mediums, the internet allows pretty much anyone to try pretty much anything and scale it to the level that works.


"Three, why do pay-for-content models amount to the creation of “artificial scarcity”? Reportage doesn’t grow on trees."

Once an article is posted to the internet, it is *abundant* - the initial cost is irrelevant, because each additional (perfect) copy is free and easy to create. "Artificial scarcity" is: artificially limiting the number of people who can see it, in the hopes that you can then charge for it.

Alan M.

I'm relatively new to this subject, Dan, so I'm hoping you'll indulge me a little.... If the secret for journalists is to "own your data" because the data has value, then why would anyone give it away to the local news hounds? Sure, you could cobble together a data set on house fires that you could then sell to insurance companies. But informatics on local restaurants, schools, businesses, et al. -- why wouldn't these enterprises want to control their own information too?

Put another way: We're all already involuntarily giving away gobs of information about who we are, what we buy, what we search for & click on & read. Google & Friends have access to that information, but journalists don't. Now you're suggesting journalists scoop up more data to sell -- but this time getting the goods will require individuals & organizations to *voluntarily* divulge those goods. ... I'd talk to a reporter if I felt like he was serving some public good. But if the people formerly known as ink-stained wretches now show up at my door scraping for data to sell to Allstate -- well, that's a game I'm not really interested in playing.

Again: If the information has become so valuable, why would anyone voluntarily give it away?

Thanks in advance for stepping me through this one.


This is an absolutely fundamental question, and it's something Dave Winer and others have been asking for several years now. If news organizations want to charge for content, shouldn't they also pay for information? These are the equity questions I referenced in passing.

I don't have one great answer, but several smaller ones:

First, the value of each bit of information is so marginal as to be difficult to meter. What has value is the structure that people add to large pools of information.

Second, much of the information in my example is public information, generated by public sources. It has to be provided to anyone, which means anyone can collect it and structure it for any purpose. Some of this info is already quasi structured, but much of it isn't.

So one answer to your specific question is that when I come around your door asking questions about the fire and your experience of it, that's really not the data that Allstate cares to buy. That's the semi-structured/natural language information that belongs in a story.

Allstate and State Farm, etc., want to know property values and damages and causes and square footages and responding departments and whether there were smoke detectors, etc., and these companies need the sources for that info to be official sources, not some sobbing fire victim in her pajamas.

Because -- and this is an important clarification -- most of the potential clients of these commercial data products are already collecting much of this information, or paying specialists to do it for them. The business model here is most likely going to be based on a news organization's ability to do that job for LOTS of clients, better and more efficiently.

But that really doesn't address the deeper part of your question: Why should anyone volunteer information to what is a commercial endeavor? This question has always been relevant, but it's even more directly so in this case, because I'm proposing that content has value independent of its advertising value.

So your comment hits the target: There MUST be a public good from this, and it ought to be a better answer than just the promise of "better coverage of your community."

I think I could offer that via all sorts of free information tools, plus higher-value information tools you get via subscription/membership, etc.

As for local restaurants, schools and businesses wanting to own their data too: They can! And if it's cost effective to do so, they should! The issue here is that reporting and editing are expensive, and the value of these products only emerges once you've got a whole bunch of it.

In other words, collecting, structuring, organizing and publishing information is an important 21st century business, and anyone can join in. But most entities will choose to outsource that function because the revenues won't justify the overhead.

News organizations are a decent candidate to pick up this role because they're already paying that overhead cost in order to provide analog reporting. If you add informatics journalism to the workflow, you add a revenue stream with only a marginal increase in costs.

There are other groups who could compete. If I were a local Chamber of Commerce, I would "own my data" and use the proceeds to reduce membership dues. And if I were the local news org, I would probably buy a subscription to that Chamber data with a license to use it in my data products.

See how this works (in my head, at least)? Hope that helps.

Alan M.

That helps a lot, Dan. Thanks. ... It leaves me wondering: Given that the value of a database increases with its size, could news organizations possibly generate enough data by piggybacking on what reporters would collect over time? I doubt it.

Re: aggregating data from the Chamber of Commerce et al. -- that would mean serving up information pre-structured by others, which is a business that begs to be 'disintermediated' by another business not bogged down by the costs of gleaning meaning & telling stories based on that data. I can't imagine news orgs want to be that (bloodied) middle man yet again.

Alan M.

P.S. I should add that I generally like the idea of mining the value of everything a journalist gathers while reporting a story, and structured data could certainly be part of that new business equation. But I keep thinking we're looking in the wrong direction. Journalists keep wrestling with the product we're delivering instead of the people we're delivering it to.

I know the word "community" is hardly a new one in these future of journalism discussions, but imagine you're standing in front of that community -- say, the hundreds of thousands of people who still get the Washington Post. The microphone is in your hands, all eyes & ears are on you. What do you say to help that community cohere?

The great lesson of Obama, I think, is to stop talking solely about the product you're selling, and talk more about the big story you're telling. By contrast, Hillary did what Bill did: she ran a retail campaign -- Social Security for elderly voters; college scholarships for the kids; trade concessions for the unions. Aggregate those niches, she believed, and a base would be built. Obama certainly didn't ignore this retail sell, but he wrapped it inside a narrative. "In the year of America's birth, in the coldest of months, a small band of patriots huddled by dying campfires on the shores of an icy river...." he said at his inauguration. Rendering that big picture was a keystone of his campaign, and the view was spectacular enough to draw lots of us in. "Hey," you think, "the story he's telling -- that's MY story too. I'm not watching this drama, I'm *living* it."

I feel like journalists are stuck playing Hillary's game, zooming in on the hyperlocal but failing to complement that tight focus with an inspiring wide-angle shot.

Quick story: Famous non-fiction writer is sitting on a plane. His seatmate recognizes him, chats him up, and says: "My daughter wants to be a writer. Any advice?" The famous writer says: "She should do three things. First, read a lot and see how writers write. Second, she should travel, get out of her comfort zone, and discover the way other people see the world. Third, and perhaps most important, she should figure out who she is and write from within that; otherwise all she'll be doing is passing along information, and that's something of which the world is in no great need."

Journalists tell stories, and we need a big one.

What's our Story?


Your ideas about narrative and its value are good ones. Please understand I'm not dismissing them, just writing THIS post about a specific thing, which is a possible revenue stream.

There are all sorts of things I would tell that audience if I stepped in front of that microphone. And I'd start by making it clear that the old system, the one that you and I were raised in as professionals, had failed them.

That's the biggest difference between the way I think now and the way I thought back in the day. I used to think that the old system had failed JOURNALISTS.

Alan M.

Thanks, Dan, for the insights on structured data, and for the details on how your thinking has evolved on all this stuff.

BTW: After reading the Xarker Manifesto (quite an impressive document), I'd love to know what you'd say in front of those 890,000 subscribers to the (Sunday) Washington Post. You on stage, microphone in hand, for one speech to rally the troops. Thirty minutes. No holds barred. I'd certainly buy a ticket.

Steve K


While I was still trying to digest this post, I ran headlong into Dave Winer's piece urging Google to wake up to Twitter, because

"the place people turn to for news is shifting. It never was Google, that wasn't something it ever did well. But it is something Twitter does, and at this point it doesn't do it very well. But the path is very clear, the information they need now flows through their servers. They just have to figure out the user interface. They will eventually figure it out. That's the half of the problem that Google already knows how to solve. But Google doesn't have the users. None of its products have the kind of flow that Twitter has, nor the growth that Twitter has. That's what Google has to get busy building. Once Twitter is delivering the news search that Google can't, it will be way too late."

Got me thinking that Google/Twitter would/could immediately satisfy the scale problem, and with some smart data organizing behind Twitter, start assembling the data into a useable (sale-able?) format. Maybe.

Don't know if it'd be journalism. Don't actually know if this makes any sense. But I was really struck by the parallel logics here and over at Winer's place.


Steve, I think that's because there are lots of people working out of the same logic tool kit. Sometimes I think my main purpose in life right now is translating that logic to people who are taking their first steps outside the newsroom world.

Like most visionaries, Dave is probably as right as he is wrong. But the visionary game is more like hitting a pitched ball than free-throw shooting. You're a lousy shooter if your percentage is below .800. You're a great hitter if your career percentage hovers around .300. Dave hits with both percentage and power, so I pay attention to what he's thinking.

Twitter has an opportunity to "own" real-time search, because they literally own the channel in which the communication takes place. Google doesn't. The limitation with Twitter is the Twitterstream -- it's huge, but it's a tiny portion of the whole.

Alan: Gracias. That might be a post in the near future... just gotta get through a bunch of meatspace to-do items first...

Brad King

Dan: I've been pushing this tune for a few years, but this is one of the more erudite posts on the topic. It's not just about data, it's about the structured data. I've got this post saved :) I'm sure I'll be sitting down and making the editors and publishers I work with read it. Slowly. Twice.

Adrian Holovaty

Hey, you should check out this essay I wrote in 2006:


It's very similar to what you've written here, in that it advocates that journalists structure their data.

I'm a journalist and developer who has been doing this sort of work since 2002. I've been blogging/presenting about it for a number of years now, and there's been some uptake at a couple of news organizations but nothing large-scale. I did a presentation at The Guardian last year, which you can read about at http://www.guardian.co.uk/media/pda/2008/jun/06/futureofjournalismadrianh -- and check out the photo there, which coincidentally captures my slide that talks about the granular bits of data within a police story (an example very similar to your fire example).

Also, you might be interested in EveryBlock.com, my latest effort in this area -- structured news at the sub-neighborhood level in selected American cities.

Anyway, it's great to see this philosophy getting some more attention -- really nice essay!



Dude. Of course I know who you are. I'm your geek fanboy. Thanks for the read.

