I've been in catch-up mode since returning to Charleston, and just now noticed this item in Poynter's E-Media Tidbits from Saturday: Google News is offering its Dutch-language users a feature that will adjust news searches to meet the user's individual preferences.
This isn't some "customize this page" button that lets you fiddle with fonts and colors. Instead, Google News' Dutch beta is the first little step toward a media future that is fast approaching all of us.
Here's the revolutionary concept: Rather than ask you for your preferences, this new class of tools figures out your preferences without your conscious input. This is going to be a Future Shock moment for many of us, because our unconscious preferences are likely to be far more accurate in delivering us the stuff we really want.
Poynter contributor Fons Tuinstra concedes that point, but raises the obvious ethical question:
All right -- there might be a fair chance that Google might know what I want better than I know myself, in my chaotic, incoherent way of searching for news. But if that's the case, do I want to know they know me so well?
Allow me to rephrase: Yes, we want to know that they know us so well. The real questions are, What control do we want to exercise over that ability? What protections do we require from unauthorized "knowing?" How can we use these powerful and beneficial 21st century tools without being victimized by them?
Mega-media and Discovery Informatics
One of the reasons that media is in such chaotic disarray in the early 21st century stems directly from our inability to grasp how rapidly the flow of information increased in just the past 12 years. In the early 1990s, researchers struggled to find the data they needed. Today, researchers struggle to filter out the glut of unneeded data. It's as if we moved from the Sahara to the Amazon but forgot to update our wardrobe.
Consequently, debates about Old Media versus New Media are rife with romantic notions about the ability of educated, alert media consumers to separate good information from bad. Don't bet on it. The volume of news-media content and blogosphere commentary already exceeds the capability of unassisted human intelligence, and news is just one slice of the larger information pie. When I first wrote about this subject in February 2005, our global civilization was producing 20 terabytes of information -- the equivalent of all the books in our Library of Congress -- every single day. Chew on that for a minute -- and know that in that minute, more information than you will ever read in your life was just created and communicated.
Putting that information to work for humanity requires tools that scale to the size of the problem. And that's what this new Google project represents: the application of non-human intelligence to a human environment that is evolving at a super-human pace. But just as human tools created this chaotic information glut, so too can human tools bring order to it.
The science behind these new tools is so new it didn't even have a name (Discovery Informatics) until 2003. To understand its basic concepts, set aside your ideas about computers as number-crunching tools that help us answer questions. Discovery Informatics applies non-human intelligence to find better questions.
Scientists and cops were the first to require such pattern-seeking tools, but it was already obvious in early 2005 that it was only a matter of time before the rest of society would need them, too. As I wrote back then:
The ability to find digital needles in data haystacks is a nifty trick for a variety of government interests: intelligence, counterintelligence, law enforcement, etc. (Jim Young, director of the Discovery Informatics program at the College of Charleston) points out that the National Security Agency, which specializes in eavesdropping on international communications, has become one of the best job markets for new statisticians.
And guess who else is snapping up graduates these days? Google.
From today's perspective, the search-engine giant's interest in DI graduates looks obvious. First Google brought us the world. Now Google will attempt to make sense of it for us.
What Google says
Here's Google's Dutch introduction page. I don't speak Dutch, and the robotic translation is predictably clunky and constrained by Internet-adapted language (does "bladwijzers" mean "booklet indicators" or "bookmarks"? Does "labels en opmerkingen" mean "labels and observations," or is this the Dutch way of saying "tags and categorizations?"), but what follows is my translation of the robotic translation, warts and all:
* We provide the search results that are most relevant for you. Google personally arranges search results on the basis of your previous searches. In the beginning you will not see much difference, but the more you use Google, the more your search results will improve.
* Your searches will be managed to reflect the web pages, images, headlines and Froogle-results on which you have clicked in previous searches. You can remove items at each desired moment from your search history.
* You can make online bookmarks for your favorite Internet sites and share tags and observations that you can use everywhere. You can search later in your tags and notes, and access these bookmarks from any computer by logging on.
What it means
This may strike some of us as weird, but we already have machines that learn. Consider the US Postal Service's problem: How do you teach a computer to recognize the handwritten number 9? Answer: Show the computer a few million examples. Postal Service software has been doing this now for years, and the result is illuminating: its handwriting recognition system now exceeds human abilities to decipher messy scrawl.
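The learning-by-example idea is easy to sketch in miniature. The Postal Service's real system is far more sophisticated (neural networks trained on millions of samples); this toy nearest-neighbor version, with invented 3x5 bitmaps, just shows the principle: nobody ever writes down rules for what a "9" looks like.

```python
# Toy illustration of "learning by example": classify a handwritten-style
# digit by comparing it to labeled training bitmaps. No rules describing
# a "9" are ever written down -- the labeled examples carry the knowledge.

def distance(a, b):
    """Count the pixels where two equally-sized bitmaps differ."""
    return sum(pa != pb for pa, pb in zip(a, b))

def classify(bitmap, training_set):
    """Return the label of the closest training example."""
    return min(training_set, key=lambda ex: distance(bitmap, ex[0]))[1]

# Crude 3x5 bitmaps (flattened row by row) of a "1" and a "9".
ONE  = [0,1,0, 0,1,0, 0,1,0, 0,1,0, 0,1,0]
NINE = [1,1,1, 1,0,1, 1,1,1, 0,0,1, 0,0,1]
training = [(ONE, "1"), (NINE, "9")]

# A "9" with one pixel flipped still lands closest to the real "9".
noisy_nine = [1,1,1, 1,0,1, 1,1,1, 0,0,1, 0,1,1]
print(classify(noisy_nine, training))  # -> 9
```

Scale the training set from two bitmaps to a few million scanned envelopes and you have, in spirit, the Postal Service's approach.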
What Google is proposing in Holland sounds an awful lot like something that I discussed with Jim Young -- the Discovery Informatics visionary at our local college -- over coffee late last year. I was doing recon for my current assignment -- helping my executive editor chart the future of online news -- and Jim was there representing his unique program.
I wanted to build high-powered information tools for Charleston.net users, but Jim insisted that I grasp the larger concept: Why not build tools that anticipated a user's needs by searching for deep patterns in an individual's site usage? The technology is available, he said, and the algorithms that could be adapted to such a project are already performing Non-Obvious Relationship Analysis for the government or rearranging grocery displays at your local Harris Teeter.
We brainstormed several products and features, including an "on-off" switch and a detailed history editor for protecting private information -- an absolutely essential control. There are all sorts of benefits to having a computer that anticipates my needs, but without human feedback on what it learns about me, there's a likelihood that it will someday behave like an overly precocious child. I want a computer that finds the best discussions of the latest Paul Graham essay for me, but I most definitely don't want a chipper cyber-agent that blurts out things like "I've found some great sub-continent group-sex porn you'll really like!" while my kids are sitting in the room.
If I've translated the Dutch properly, the Google News personal agent beta sounds an awful lot like the dream prototype that Jim and I sketched out that day at Kudu Coffee, but with an added feature: the ability to store results, preferences, favorites and tags in an online account that would be accessible via your unique log-in. Also, rather than anticipating all kinds of user needs, this product seems to focus on re-sorting the results of news searches to better target an individual's demonstrated interests. We're not witnessing the birth of Skynet here.
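Google hasn't published how the Dutch beta actually re-sorts results, but the basic mechanic it describes -- boosting results that resemble what you've clicked before -- can be sketched in a few lines. Everything here (the topic tags, the scoring, the data) is invented for illustration, not Google's algorithm.

```python
# Hypothetical sketch of click-based re-ranking: results whose topic tags
# overlap with the user's click history float toward the top. A real system
# would work from far subtler signals than hand-assigned tags.

from collections import Counter

def rerank(results, click_history):
    """Re-sort results so topics the user has clicked before rank higher."""
    interest = Counter(tag for item in click_history for tag in item["tags"])
    def score(result):
        return sum(interest[tag] for tag in result["tags"])
    return sorted(results, key=score, reverse=True)

# Invented click history and fresh search results.
history = [
    {"title": "Sea level study",  "tags": ["climate", "science"]},
    {"title": "Carbon tax debate", "tags": ["climate", "politics"]},
]
results = [
    {"title": "Celebrity gossip",  "tags": ["entertainment"]},
    {"title": "New climate model", "tags": ["climate", "science"]},
]
for r in rerank(results, history):
    print(r["title"])  # climate story first, gossip last
```

The interesting part is what the sketch leaves out: a real personalization system learns from how you respond to its re-ranked results, closing the feedback loop the post describes.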
Yet we shouldn't underestimate the historic significance of this moment. In the same way that cars expanded our ability to travel and Lotus 1-2-3 extended our ability to manage business information and e-mail expanded our ability to communicate and blogs expanded our ability to publish, so too will these tools expand our ability to find the things that we want: stories, products, people, relationships, pets -- you name it. Within 10 years, you'll have these cyber agents performing all sorts of tasks on your behalf, and it's a safe bet that many of them will have "Google" in their name.
Will it be a good trade? Well, that depends on us. Can we avoid passionate arguments about romantic media ideas that have been obsolete for five years? Can we address the conundrums of electronic privacy in ways that account for technology? Can we write laws that allow individuals (and groups) to make the most of the information resources at hand while protecting our civil rights in the process?
Who knows? But the discussion is no longer just abstract talk in a quiet coffee house.
AUTHOR'S POSTSCRIPT: Here's my column, Digging for Truth in the Data Age, which was published on Aug. 30, 2004, in The Post and Courier. The column is not available online.
It's time to write a fond epitaph for the Information Age. Like it or not, we've entered the Data Age, the era in which we recognize that a glut of information doesn't make us smart, just like buying a dictionary doesn't make us Shakespeare.
Scientists Fred Holland and Paul Sandifer use the term in their work at the Hollings Marine Lab on James Island. They live in a world saturated by data — more pieces of information than the logical human mind could ever order, arrange or imagine.
Want to understand what is happening to the Lowcountry shrimp harvest? Have at it. Thanks to modern technology, we have easy access to everything from satellite images to historical weather logs to digitized shrimp gene sequences.
That's what the Information Age was supposed to do: give us the scattered puzzle pieces that fit together to form The Big Picture.
But here's a more appropriate analogy: Information Age technologies have proven instead to be wildly efficient at burying us in the pieces from millions of jigsaw puzzles, all mixed up and practically indistinguishable.
This data surplus is most obvious in the world of science. Holland, the director of the Hollings Marine Lab, frames things this way: "So we have all this data. The challenge is, how do we add value to it and make sense of it?"
A dramatic example of this process comes from University of South Carolina physics professor Dave Tedeschi. In 2003, Tedeschi and a group of colleagues announced that they had found evidence confirming the discovery of an exotic new subatomic particle — not in a lab somewhere, but hiding in old data they just happened to have lying around.
It's not like the physicists were slack the first time around: The data from their particle accelerator experiments is measured in terabytes, each a million million bytes of computer information. You need a machine to recognize a pattern against that much background noise — unless you're very, very intuitive.
And at least the scientists are professionally equipped to deal with the challenges of the Data Age. The rest of us are struggling.
Example: One explanation for the increasingly harsh tone this election year is the accelerating fragmentation of political media, a potential blessing but an enormous test of society's ability to process conflicting data. Hate President Bush? Google can provide in seconds any number of Web sites that will provide you with facts to support that feeling. Hate anybody who criticizes Bush? Ditto. Just turn on the radio.
Without functional institutions equipped to integrate the complex data of 21st-century life, citizens typically wind up just picking sides. Raw data becomes a cultural Rorschach test, and what we see is generally what we expected to find in the first place.
So we're not just disagreeing — we're speaking in different languages.
The promise of the Data Age is that the truth really is in there, somewhere. But our age has a curse, too: apophenia, the tendency to see patterns that may or may not exist. As science-fiction visionary William Gibson wrote in his blog earlier this year: "Want to see the Virgin Mary on a tortilla? Look long enough."
The model of the Information Age was the computer network, but the new model looks a lot more like an old analog radio dial, searching for a signal in a vast sea of static. The future belongs to those who prove most adept at finding it.
Or hiding it.
Dan,
I found this article thoughtful and exciting. However, I need a little clarification: how does this technology differ from, say, Amazon's very strange ability to predict, based on my purchases cross-referenced against similar consumers, what I want to buy OR what I might like even though I don't know it exists (and they are surprisingly right)? Is this a different filtering system? (This is not a criticism; I'm genuinely interested.)
jms
Posted by: jmsloop | Tuesday, July 18, 2006 at 23:05
Very similar. But suggesting something like related titles (Amazon) or movie preferences (Netflix) is operating in a very limited environment -- a few thousand options with relatively obvious correlations.
What this suggests is that by tracking your online choices, software can essentially intuit deeper patterns and offer non-obvious suggestions. A hypothetical: If I'm searching for used car deals and commenting regularly on global climate blogs, maybe there's a non-obvious correlation between thriftiness and environmental concerns and some third interest. Like Amazon and Netflix, such a system could give me suggestions and learn from my responses.
Even something as simple as sorting Google News search results based on individual click patterns is potentially playing with far more subtle relationships than Amazon's recommendation system, which is still revolutionary in its own right. Because if it's reading my click paths in response to its ranked results, it's learning things about me I may not even know.
Another thing: Such systems can make highly accurate predictions without identifying the values and variables that produce the accuracy. Bob Chapman has a neural network setup that accurately forecasts shrimp harvests when you pour a bunch of data into the front end. Bob didn't write the program to weight certain variables above others: He wrote the program so that the software can "learn" from previous results.
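I don't know the details of Bob's setup, but the "learn from previous results" idea he uses can be shown in miniature. In this invented one-weight example, the programmer supplies only a learning rule and some past data; the weight itself is never hand-set.

```python
# Toy sketch of a model that learns its own weights from past results.
# The data and the shrimp-forecast framing are invented for illustration;
# a real forecasting network has many weights and many input variables.

def train(samples, epochs=2000, lr=0.01):
    """Fit y ~ w*x by gradient descent; w starts at 0 and is learned."""
    w = 0.0
    for _ in range(epochs):
        for x, y in samples:
            w += lr * (y - w * x) * x  # nudge w to shrink this error
    return w

# Invented pairs of (rainfall index, shrimp harvest index), roughly y = 2x.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
w = train(data)
print(round(w, 1))  # the learned weight ends up close to 2
```

Nobody told the program that rainfall matters twice as much as it does; it discovered that weight from the examples, which is the whole point.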
Anyway, similar principles, more complex applications.
Posted by: Daniel | Wednesday, July 19, 2006 at 01:30
This is absolutely fascinating. I remember my high school biology teacher warning us about the speed at which information would double; back then I couldn't see how it could possibly be a problem. Now, all I have to do is need a specific fact and I have to wade through piles of sources.
On a side note, I met a very interesting man from NJ, the other day. While pumping gas he told me about plans for a plant that will "help" our local shrimping industry. It was quite possibly the most fascinating conversation I've had with a stranger in years.
Posted by: Heather | Wednesday, July 19, 2006 at 10:04
This is genomics...revisited. The bioinformatics field that has been essential to genomics - actually, all of the -omics (proteomics, metagenomics, metabonomics) - allows us to make sense of all of the sequence or peptide data we generate. The amount of data is mind-boggling. One microbial metagenomic library could generate the full genome worth of sequence data for 30 microorganisms...that's one sample. In science, we're already fully immersed in the data age - and let me add, it's pretty fun. But you have to let yourself take a breath and come up for air from time to time. There's no limit - we can fish for anything now. One evolving problem: a single person or lab can generate so much data that they can't process it all - making that data accessible to the public speeds up science, but then limits opportunities for the lab that generated the data to begin with. Science needs to change with respect to how we are measured - it won't only be "I got 'x' number of publications last year" but "I got 'x' number of publications last year which generated 'x' number of publications from other laboratories." There's reluctance - it's not so much that scientists don't want to share, but they just don't have confidence that the intrinsic value in the data they obtained - data for data's sake - will be valued.
Posted by: Pam | Wednesday, July 19, 2006 at 19:38
Yes: Genomics begat bioinformatics, which begat the more generic application as Discovery Informatics.
Posted by: Daniel | Thursday, July 20, 2006 at 09:50
Maybe it's because I'm not a tech-geek or perhaps it speaks to my deep seated distrust of large corporations and the government but I find this slightly creepy.
Posted by: Jean McGreggor | Friday, July 21, 2006 at 17:59