In light of yesterday's news that the Tribune Company (Chicago Tribune, Baltimore Sun, Los Angeles Times) is about to file for bankruptcy, I'd like to take some time to introduce you to your new newspaper editor. He's fairly terse, somewhat cantankerous and unpredictable, but he has "his finger on the pulse", so to speak. Actually, he's not really a "he": your new editor is a Python function and collective sentiment. Here, meet your new "Arbiter"; it's a Python method:
On his own he's not much, but when this crowd of people starts voting on stories, he'll quickly serve up a mixture of catchy and relevant news:
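To make the idea concrete, here is a minimal sketch of how a vote-driven editor of this kind might rank stories. It is an illustration under my own assumptions, not the code of Reddit, Digg, or any other site: the names `arbiter` and `front_page`, the reference date `EPOCH`, and the 12-hour `DECAY_SECONDS` constant are all hypothetical. The sketch scores a story by the logarithm of its net votes plus a bonus for recency, so newer stories can outrank older ones that have more votes.

```python
import math
from datetime import datetime, timedelta, timezone

# Illustrative constants, chosen for this sketch only (assumptions).
EPOCH = datetime(2008, 1, 1, tzinfo=timezone.utc)
DECAY_SECONDS = 12 * 60 * 60  # 12 hours of recency offsets a 10x vote lead


def arbiter(ups, downs, submitted):
    """Score a story from its votes and submission time (illustrative only)."""
    net = ups - downs
    # Log scale: going from 1 to 10 net votes counts as much as 10 to 100.
    magnitude = math.log10(max(abs(net), 1))
    sign = (net > 0) - (net < 0)
    # Newer stories get a steadily larger bonus than older ones.
    age_bonus = (submitted - EPOCH).total_seconds() / DECAY_SECONDS
    return sign * magnitude + age_bonus


def front_page(stories):
    """Return stories ordered best-first by their arbiter score."""
    return sorted(
        stories,
        key=lambda s: arbiter(s["ups"], s["downs"], s["submitted"]),
        reverse=True,
    )


now = datetime(2008, 12, 9, tzinfo=timezone.utc)
stories = [
    {"title": "Old and popular", "ups": 100, "downs": 0, "submitted": now},
    {"title": "Old and modest", "ups": 10, "downs": 0, "submitted": now},
    {"title": "Fresh and modest", "ups": 10, "downs": 0,
     "submitted": now + timedelta(hours=24)},
]
for story in front_page(stories):
    print(story["title"])
```

In this toy version, a story submitted a day later with a tenth of the votes still lands on top, which is exactly the kind of editorial judgment that is baked into a couple of constants.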
At the end of this process, we're going to end up with something like Reddit (shown below). At first, it'll be slow going, and the news you read will be a product of the people that start to use the site. Don't be surprised if we miss a few stories that don't affect the readership. People tend to ignore stories about far-flung disasters or general "downer" stories about famine and war. Expect a number of "outrage" stories mixed with some funny pictures of kittens with subtitles.
Wait... Is This News?
As the economics of news gathering and delivery continue to fail, we'll see more and more people starting to wonder out loud if services like Digg and Reddit are adequate replacements. In my interview with Rich Gordon, he alluded to the idea that "journalism" may extend to cover these aggregated, algorithmic attempts at collective editorial oversight. While I'm convinced that Reddit and Digg are interesting, I hesitate to call them "news" or even "journalism", and I wonder if the idea of collective sentiment is going to start wearing thin as people start to establish ever more isolated "preference clusters".
It isn't news, but it is certainly interesting.
Over the next few days, I'm going to write about some of the trade-offs in this new collective, collaborative approach to information gathering. Is it transparent? Who is leading this market? And, what are the trends and companies that have emerged as the new masters of collaborative filters? The answers to these questions are not obvious, and the "benefits" of collaborative filtering are still up for debate. Is this new collaborative approach to filtering really a replacement for news? Is collective intelligence an appropriate label for this technology? Is collective intelligence making us more or less "intelligent"? Are venues such as Reddit, Digg, and Google neutral actors providing platforms that aggregate votes without injecting bias or bestowing favors on specific participants?
Core to my theory is that what was once known as "The Web" (a collection of interconnected sites) is now being replaced by an ad-hoc collection of aggregate sentiments that defies instantaneous measurement. While a link's placement in the geography of the web still retains a level of importance, we're starting to see collective sentiment sites like Digg play a larger role in what gets noticed by the masses. Al Gore's "Information Superhighway" and the general idea of "The Web" as a replacement for traditional broadcast media imply something static. The maturation of Reddit and Digg suggests a more dynamic web - a Quantum Web, characterized by the superposition of aggregate identity.
(Wait, Quantum Web™ - The Web characterized by rapid, unpredictable shifts in sentiment and the superposition of the collective sentiment.)
News vs. Collective Intelligence: Issue #1 Transparency + Accountability
Issue #1: No accountability or transparency in most collective intelligence applications. You don't know why a collaborative filter such as Reddit or Digg is showing you a particular story. That statement might seem naive given that Reddit has gone and released source for the entire application, but take a look at the source for Reddit and you'll notice some pretty important gaps.
If you are interested in seeing some of the logic behind Reddit, check out recommendation.py and normalized_hot.py. The logic is very straightforward: essentially, every subreddit maintains a list of links. If you read the FAQ, you'll note that they've omitted some of the "anti-cheating/spam" code:
Reddit source files admintools.py and vote.py suggest a certain gap in the open source version of Reddit, and the parameters to the update_score method suggest that karma likely plays a part in the ranking of a story on Reddit. I'm going to go out on a limb and state that the "stuff" that makes Reddit work, the real value of Reddit, is likely in the r2admin package. This anti-cheating/spam stuff is probably more accurately described as "the secret sauce that makes our rankings valuable". And it works: Reddit is interesting. But I wonder what goes into this "anti-cheating" logic... without full transparency, how can we be sure that the real implementation of admintools.py doesn't include some magic switches that allow admin users to promote stories?
Why is this important? Well, if you use Reddit enough, you'll start to notice that karma plays some role in the voting/ranking mechanism. It seems that the more karma you have, the more likely it is that your submissions will get noticed. Whether or not this can be proven isn't the issue, and I'm not proposing that Reddit is being unethical or breaking any rules. What I will point out is that the algorithm that controls a collaborative filter is seldom transparent. Even the most extreme case, Reddit, which has released code to the outside world, still lacks accountability. As "The Web" is increasingly defined by these collaborative filters, we're going to stumble upon instances where the controlling corporation or entity is pressured to tweak the algorithm to favor advertising partners.
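A purely hypothetical sketch shows why the gap matters. Nothing here is taken from Reddit or Digg, and I am not claiming either site does anything like it: the `Story` class, the `karma_weight` parameter, and the `promoted` set are all invented for illustration. The point is only that a published scoring function and a private one can produce different front pages from the same votes, and a reader has no way to tell which one is running.

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    submitter_karma: int
    votes: int = 0


def public_score(story):
    """The ranking a reader might assume: raw votes only."""
    return story.votes


def private_score(story, karma_weight=0.01, promoted=frozenset()):
    """A hypothetical unpublished ranking with two hidden inputs."""
    score = story.votes + karma_weight * story.submitter_karma
    if story.title in promoted:
        score += 1000  # an invented "magic switch" for favored stories
    return score


def rank(stories, score):
    """Return story titles ordered best-first by the given score function."""
    return [s.title for s in sorted(stories, key=score, reverse=True)]


stories = [
    Story("Kitten with subtitle", submitter_karma=10, votes=50),
    Story("High-karma submission", submitter_karma=5000, votes=40),
    Story("Partner press release", submitter_karma=0, votes=1),
]

print(rank(stories, public_score))
print(rank(stories, private_score))
print(rank(stories, lambda s: private_score(
    s, promoted=frozenset({"Partner press release"}))))
```

All three rankings are fed identical vote counts, yet each one puts a different story in first place.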
Without complete transparency, who is to say that this hasn't already happened? If it did, would you care? Would you care if record companies paid for air time? How about game shows? What if that quiz show you've been watching is just a fake? Should we have the same expectation of neutrality for a collaborative filter?
Reddit and Digg are both businesses, and they are both useful and valuable sites which I happen to use every single day. Despite their usefulness, there is little guarantee that the operators of these sites are not rigging the ranking algorithms or finding ways to feature paid partners.
And, what if they were manipulated? Would it matter? Would you stop using them?