XARK 3.0

  • Xark began as a group blog in June 2005 but continues today as founder Dan Conover's primary blog-home. Posts by longtime Xark authors Janet Edens and John Sloop may also appear alongside Dan's here from time to time, depending on whatever.






Tuesday, July 18, 2006





I found this article thoughtful and exciting. However, I need a little clarification: how does this technology differ from, say, Amazon's very strange ability to predict, before I know it myself, what I want to buy or what I might like even though I don't know it exists (and they are surprisingly right), based on my purchases cross-referenced against those of similar consumers? Is this a different filtering system? (This is not a criticism; I'm genuinely interested.)



Very similar. But suggesting something like related titles (Amazon) or movie preferences (Netflix) is operating in a very limited environment -- a few thousand options with relatively obvious correlations.

What this suggests is that by tracking your online choices, software can essentially intuit deeper patterns and offer non-obvious suggestions. A hypothetical: If I'm searching for used car deals and commenting regularly on global climate blogs, maybe there's a non-obvious correlation between thriftiness and environmental concerns and some third interest. Like Amazon and Netflix, such a system could give me suggestions and learn from my responses.

Even something as simple as sorting Google News search results based on individual click patterns is potentially playing with far more subtle relationships than Amazon's recommendation system, which is still revolutionary in its own right. Because if it's reading my click paths in response to its ranked results, it's learning things about me I may not even know.
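To make the click-pattern idea concrete, here is a minimal sketch in Python. It is an illustration only, not how Google News or any real service works: the `record_click` and `rerank` helpers, the `topic` field, and the sample headlines are all hypothetical. It simply counts which topics one reader clicks on and moves matching results up the list.

```python
from collections import Counter


def record_click(clicks, result):
    """Remember that the reader clicked a result with this topic (hypothetical helper)."""
    clicks[result["topic"]] += 1


def rerank(results, clicks):
    """Return results ordered by how often the reader clicked each topic.

    sorted() is stable, so results whose topics have equal click counts
    keep the order the search engine originally gave them.
    """
    return sorted(results, key=lambda r: -clicks[r["topic"]])


# Made-up search results, in the engine's original order.
results = [
    {"title": "Playoff recap", "topic": "sports"},
    {"title": "Sea level report", "topic": "climate"},
    {"title": "Used car prices", "topic": "cars"},
]

# Made-up click history for one reader.
clicks = Counter()
for clicked in [results[1], results[1], results[2]]:
    record_click(clicks, clicked)

for r in rerank(results, clicks):
    print(r["title"])
```

A real system would weigh far more signals than a single topic label, but the loop is the same: show ranked results, watch what gets clicked, adjust the ranking.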

Another thing: Such systems can make highly accurate predictions without identifying the values and variables that produce the accuracy. Bob Chapman has a neural network setup that accurately forecasts shrimp harvests when you pour a bunch of data into the front end. Bob didn't write the program to weight certain variables above others: He wrote the program so that the software can "learn" from previous results.
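For readers who want to see what "learning" the weights means, here is a minimal sketch in Python. It is not Bob Chapman's program and it is not a full neural network: it is a single linear unit trained by gradient descent, and the `train` function, the learning rate, and the numbers are all made up for illustration. Nobody tells the code which input matters. It starts with every weight at zero and nudges each one after every wrong prediction.

```python
def train(samples, targets, lr=0.01, epochs=5000):
    """Fit weights and a bias by repeatedly correcting prediction errors."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            prediction = bias + sum(w * xi for w, xi in zip(weights, x))
            error = prediction - y
            # Nudge each weight in proportion to its input's share of the error.
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Made-up data: two input variables per record and one outcome.
# The outcome was generated as 2 * first_input + 1, so the second
# input is irrelevant, but the program is never told that.
samples = [(1, 5), (2, 3), (3, 8), (4, 1), (5, 6)]
targets = [3, 5, 7, 9, 11]

weights, bias = train(samples, targets)
print("learned weights:", weights, "bias:", bias)
```

On this toy data the first weight should settle near 2 and the second near 0, which is the program discovering on its own that only one variable carries the signal. With many inputs and hidden layers the same process still predicts well, but the learned weights stop being easy to read, which is the point above about accuracy without identified variables.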

Anyway, similar principles, more complex applications.


This is absolutely fascinating. I remember my high school biology teacher warning us about the speed at which information would double; back then I couldn't see how it could possibly be a problem. Now, whenever I need a specific fact, I have to wade through piles of sources.

On a side note, I met a very interesting man from NJ the other day. While pumping gas, he told me about plans for a plant that will "help" our local shrimping industry. It was quite possibly the most fascinating conversation I've had with a stranger in years.


This is genomics...revisited. The bioinformatics field that has been essential to genomics - actually, all of the -omics (proteomics, metagenomics, metabonomics) - allows us to make sense of all of the sequence or peptide data we generate. The amount of data is mind-boggling. One microbial metagenomic library could generate the full genome worth of sequence data for 30 microorganisms...that's one sample. In science, we're already fully immersed in the data age - and let me add, it's pretty fun. But you have to let yourself take a breath and come up for air from time to time. There's no limit - we can fish for anything now. One evolving problem: a single person or lab can generate so much data that they can't process it all - making that data accessible to the public speeds up science, but then limits opportunities for the lab that generated the data to begin with. Science needs to change with respect to how we are measured - it won't only be "I got 'x' number of publications last year" but "I got 'x' number of publications last year which generated 'x' number of publications from other laboratories." There's reluctance - it's not so much that scientists don't want to share, but they just don't have confidence that the intrinsic value in the data they obtained - data for data's sake - will be valued.


Yes: Genomics begat bioinformatics, which begat the more generic application known as Discovery Informatics.

Jean McGreggor

Maybe it's because I'm not a tech-geek, or perhaps it speaks to my deep-seated distrust of large corporations and the government, but I find this slightly creepy.
