The New Newspaper Editor: Your Neighbors and Some Python Code

By Timothy M. O'Brien
December 9, 2008 | Comments: 14

In light of yesterday's news that the Tribune Company (Chicago Tribune, Baltimore Sun, Los Angeles Times) is about to file for bankruptcy, I'd like to take some time to introduce you to your new newspaper editor. He's fairly terse, somewhat cantankerous and unpredictable, but he has "his finger on the pulse", so to speak. Actually, he's not really a "he": your new editor is a Python function and collective sentiment. Here, meet your new "Arbiter"; it's a Python method:

On his own he's not much, but when this crowd of people starts voting on stories, he'll quickly serve up a mixture of catchy and relevant news:

collective-crowd.png
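
As a rough illustration of how a vote-driven arbiter can work, here is a minimal sketch in Python. It is not Reddit's code: the names `Story`, `hot_score` and `rank_stories`, and the 45,000-second decay constant, are all assumptions chosen for the example. The idea is that net votes count on a logarithmic scale while newer submissions earn a steady bonus, so a fresh story with a few votes can outrank an old story with many.

```python
from collections import namedtuple
from math import log10

# Hypothetical story record; the field names are assumptions for this sketch.
Story = namedtuple("Story", ["title", "ups", "downs", "submitted"])

# Assumed constant: 12.5 hours of age offsets a tenfold lead in net votes.
DECAY_SECONDS = 45000


def hot_score(ups, downs, submitted):
    """Combine net votes (log scale) with submission time in seconds (linear)."""
    net = ups - downs
    magnitude = log10(max(abs(net), 1))
    sign = (net > 0) - (net < 0)  # 1, 0 or -1
    return sign * magnitude + submitted / DECAY_SECONDS


def rank_stories(stories):
    """Return the stories ordered from hottest to coldest."""
    return sorted(
        stories,
        key=lambda s: hot_score(s.ups, s.downs, s.submitted),
        reverse=True,
    )
```

With these assumed constants, a story needs ten times the net votes to hold its place against one submitted 12.5 hours later, which is one way a crowd's attention can be made to drift toward whatever is new.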

At the end of this process, we're going to end up with something like Reddit (shown below). At first, it'll be slow going and the news you read will be a product of the people who start to use the site. Don't be surprised if we miss a few stories that don't affect the readership. People tend to ignore stories about far-flung disasters or general "downer" stories about famine and war. Expect a number of "outrage" stories mixed with some funny pictures of kittens with subtitles.

collective-reddit-sample.png

Wait... Is This News?

As the economics of news gathering and delivery continue to fail, we'll see more and more people starting to wonder out loud if services like Digg and Reddit are adequate replacements. In my interview with Rich Gordon, he alluded to the idea that "journalism" may extend to cover these aggregated, algorithmic attempts at collective editorial oversight. While I'm convinced that Reddit and Digg are interesting, I hesitate to call them "news" or even "journalism", and I wonder if the idea of collective sentiment is going to start wearing thin as people start to establish ever more isolated "preference clusters".

It isn't news, but it is certainly interesting.

Collaborative Trade-offs

Over the next few days, I'm going to write about some of the trade-offs in this new collective, collaborative approach to information gathering. Is it transparent? Who is leading this market? And, what are the trends and companies that have emerged as the new masters of collaborative filters? The answers to these questions are not obvious, and the "benefits" of collaborative filtering are still up for debate. Is this new collaborative approach to filtering really a replacement for news? Is collective intelligence an appropriate label for this technology? Is collective intelligence making us more or less "intelligent"? Are venues such as Reddit, Digg, and Google neutral actors providing platforms that aggregate votes without injecting bias or bestowing favors on specific participants?

Core to my theory is that what was once known as "The Web" - a collection of interconnected sites - is now being replaced by an ad-hoc collection of aggregate sentiments that defies instantaneous measurement. While a link's placement in the geography of the web still retains a level of importance, we're starting to see collective sentiment sites like Digg playing a larger role in what gets noticed by the masses. Al Gore's "Information Superhighway" and the general idea of "The Web" as a replacement for traditional broadcast media imply something static. The maturation of Reddit and Digg suggests a more dynamic web - a Quantum Web, characterized by the superposition of aggregate identity.

(Wait, Quantum Web™ - The Web characterized by rapid, unpredictable shifts in sentiment and the superposition of the collective sentiment.)

News vs. Collective Intelligence: Issue #1 Transparency + Accountability

Issue #1: No accountability or transparency in most collective intelligence applications. You don't know why a collaborative filter such as Reddit or Digg is showing you a particular story. That statement might seem naive given that Reddit has gone and released source for the entire application, but take a look at the source for Reddit and you'll notice some pretty important gaps.

If you are interested in seeing some of the logic behind Reddit, check out recommendation.py and normalized_hot.py. The logic is very straightforward: essentially, every subreddit maintains a list of links. If you read the FAQ, you'll note that they've omitted some of the "anti-cheating/spam" code:

collective-reddit-cheating-code.png

Reddit source files admintools.py and vote.py suggest a certain gap in the open source version of Reddit, and the parameters to the update_score method suggest that karma likely plays a part in the ranking of a story on Reddit. I'm going to go out on a limb and state that the "stuff" that makes Reddit work, the real value of Reddit, is likely in the r2admin package. This anti-cheating/spam stuff is probably more accurately described as "the secret sauce that makes our rankings valuable". And it works: Reddit is interesting. But I wonder what goes into this "anti-cheating" logic... without full transparency, how can we be sure that the real implementation of admintools.py doesn't include some magic switches that allow admin users to promote stories?

Why is this important? Well, if you use Reddit enough, you'll start to notice that karma plays some role in the voting/ranking mechanism. It seems that the more karma you have, the more likely it is that your submissions will get noticed. Whether this can be proven isn't the issue, and I'm not proposing that Reddit is being unethical or breaking any rules. What I will point out is that the algorithm, the control of a collaborative filter, is seldom transparent. Even the most extreme case, Reddit, which has released code to the outside world, still lacks accountability. As "The Web" is increasingly defined by these collaborative filters, we're going to stumble upon instances where the controlling corporation or entity is pressured to tweak the algorithm to favor advertising partners.
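
To make the transparency concern concrete, here is a purely hypothetical sketch. Nothing in it is drawn from Reddit's or Digg's code; `public_score`, `private_score`, the 0.1 karma weight and the `partner_boost` parameter are invented for illustration. It shows how an operator could publish one scoring rule while running another, with readers seeing only the final ordering.

```python
from math import log10


def public_score(votes):
    """The scoring rule an operator might publish: votes only."""
    return log10(max(votes, 1))


def private_score(votes, submitter_karma=0, partner_boost=0.0):
    """A hypothetical unpublished variant with two hidden inputs.

    submitter_karma and partner_boost are invented knobs, and the 0.1
    weight is an arbitrary assumption.
    """
    karma_bonus = 0.1 * log10(max(submitter_karma, 1))
    return public_score(votes) + karma_bonus + partner_boost


def front_page(stories, scorer):
    """Order (title, keyword-arguments) pairs by the given scoring function."""
    ordered = sorted(stories, key=lambda item: scorer(**item[1]), reverse=True)
    return [title for title, _ in ordered]


# What readers assume is happening: ranking by votes alone.
stories = [
    ("grassroots story", {"votes": 500}),
    ("partner story", {"votes": 200}),
]
# What could be happening instead: the same votes plus a hidden boost.
hidden_inputs = [
    ("grassroots story", {"votes": 500}),
    ("partner story", {"votes": 200, "partner_boost": 0.5}),
]

print(front_page(stories, public_score))
print(front_page(hidden_inputs, private_score))
```

The two calls produce opposite orderings from identical vote counts. A reader looking at either front page has no way to tell which rule produced it, which is the accountability gap described above.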

Without complete transparency, who is to say that this hasn't already happened? If it did, would you care? Would you care if record companies paid for air time? How about game shows? What if that quiz show you've been watching is just a fake? Should we have the same expectation of neutrality for a collaborative filter?

Reddit and Digg are both businesses, and they are both useful and valuable sites which I happen to use every single day. Despite their usefulness, there is little guarantee that the operators of these sites are not rigging the ranking algorithms or finding ways to feature paid partners.

And, what if they were manipulated? Would it matter? Would you stop using them?

IMAGE CREDIT: Picture of Crowd (walking toward Obama's Grant Park Victory Rally) from VinceHuang's Photostream on Flickr. Original Photo. Photo license Creative Commons 2.0 Attribution.


14 Comments

Thanks for this post. It's forward thinking and perceptive. But I think your analysis is best served with the understanding that sites such as Digg and Reddit are not news sources.

"As the economics of news gathering and delivery continue to fail, we'll see more and more people starting to wonder out loud if services like Digg and Reddit are adequate replacements."

That's like wondering if a newsstand is an adequate replacement for a newspaper. A newsstand carries what people want to read - at least that's the theory of market-driven inventory. Digg and Reddit are digital newsstands with extremely responsive market-driven inventory. Like a newsstand, they don't create content. They require publications that have functioning business models. Newsstands fail without access to adequate content producers.

Citizen journalists can provide fantastic eyewitness content, but they're not branded or often qualified to be definitive sources.

Without definitive sources, the community has little consensus for constructive conversation. And that's a problem in a democracy.

I'm a big fan of both Digg and Reddit, and other aggregators such as techmeme. And please take my opinion with a fat grain of salt, because I work at a newspaper (which BTW is hardly transparent in what it chooses to cover.)

But I think it's important to understand fundamentally that Digg and Reddit are not news sources; they're newsstands.

Nathan - I appreciate your perspective.

Will newspapers exist if we stop buying them and move to newsstands with info on demand (aka Digg, Reddit, Google Reader, Twitscoop, and the list goes on)?

And how will that impact the larger economy and most importantly the journalists?

Will more journalists freelance, and thus make it easier for John Q. Public to be definitive?

Nathan has a point. A huge proportion of sentiment aggregation is of original content prepared by professional journalists.

Unfortunately, these professional journalists don't necessarily work for old school newspapers.

I don't have numbers for the industry as a whole, but the Techmeme leaderboard is a good source of stats for an industry vertical like tech: http://www.techmeme.com/lb

Old-school sources represent only a couple of percent of top stories (NYT and WSJ together (including Kara Swisher's Boomtown) add up to about the size of Techcrunch.) The top sources are mostly new media startups with lightweight business models. The NYT can't survive on a couple of million dollars of ad revenue, but to Techcrunch, and Mashable, and RWW, that's good money.

In the case of YouTube, there's a much more organic, broad-based crowdsourcing going on. But even there, I bet that we see new professional sources eventually coming to dominate.

For a better example of how computer programming can aid in the practice of original journalism (rather than just aggregating news), it's worth looking at Adrian Holovaty's work. See http://www.ojr.org/ojr/stories/060605niles/

@Nathan Halverson - See my post today about the Drudge Addiction. Drudge is very similar to Reddit in that the content isn't necessarily in the stories that it links to. The "content" in Drudge is in the headline, and the headlines tend to affect the way that someone reads a story.

All too often, Drudge would craft a nefarious, scandalous headline and link to original reporting. Even if the story linked to didn't make a clear connection to scandal, the mere presence of a headline like "Corrupt Blago under Fed Investigation" was enough to set the wheels in motion. I'm starting to think that this "headline function" is more important than you would think. Reddit and Digg have essentially "usurped" the power of the headline from the traditional news sites they link to.

I would agree that Reddit and Digg are more like newsstands than news sources. But, like terse Twitter messages that put a slant on a URL they contain, sites like Reddit can influence the news in unpredictable ways. Where a NYTimes or The Press Democrat would have a headline like "Blagojevich Arrested on Conspiracy to Solicit Bribery", Reddit might have five headlines along the lines of "Corrupt ***-hole Gov in IL Caught Swearing and Selling the Senate on the Phone. Vote Up if We should Show No Mercy!".

I'd say they are more like news "criers" than news sites, and they put a particular spin on a story that affects how people read the content linked to. Reddit is like a newsstand run by a crowd of disaffected 24-year-old programmers who are all in love with Ron Paul.

@tim_oreilly, I was involved with a video startup a few years back. They were going to do promotional social media. The more I looked into that space, the more I became convinced that there is more professional advertising content on YouTube than most realize.

The "if items:" test in the function is not needed.

@Terry Jones, tell that to the Reddit team: code.reddit.com. Can always count on an audience of programmers to notice code errors. Thanks for the attention.

Newsstands or water coolers?

lots to think about here. for one, what is a journalist in today's environment? and, just as important, who is a journalist? i agree that transparency is key, but so is credibility. reddit and digg can elevate stories and blog posts, but they can't always order according to credibility or, as flickr says, interesting-ness. anyone can be a blogger, but being a blogger does not automatically make one a journalist or a credible expert. perhaps what surprises me most is that, for an industry of reporters, there are so many who seem surprised by this turn of events. new media isn't even new anymore. the shake-out is just beginning...

"(Wait, Quantum Web™ - The Web characterized by rapid, unpredictable shifts in sentiment and the superposition of the collective sentiment.)"

I suspect that we will never be able to unravel the conundrum represented by the degree to which peoples' "sentiments" and / or collective intelligence have been (deeply) formed by the preceding ten or twenty years of "news", advertising and propaganda.

And I'll add to that the question of how much attention may or will be paid (on a widespread or collective basis) to deep and fundamental analysis of these issues. Of course some will pay attention, and the odd deep thinker will make some impact .. but "the crowd" will go merrily on its way, I suspect, surrounded and imaginationaly-ensnared by images, text, links and sentiment.

My newspaper has gone through 6 rounds of layoffs in two years. Most of my colleagues are crestfallen about the future of regional journalism. I am not. I think we’re nearing a mass resurgence in quality regional journalism – although it almost certainly won’t come from the newspaper biz model.

Regional journalism will find new roots with models such as Politico.com, tools such as everyblock.com, and, importantly, it will find ways to leverage a brand of trust across several revenue streams similar to what O’Reilly Media has accomplished.

Newspapers are bloated organizations with high costs of operations due in large part to delivery. Ward Bushee, editor of the SF Chronicle, said last week it costs $10 to deliver the Sunday paper to someone’s doorstep.

The Internet, micro print-on-demand, handhelds, and other technologies can significantly lower that cost, and also lower the cost of entry for competitors. Eventually the highest cost of entry will be establishing a brand of trust – which is no small task.

Beyond costs, new technologies will enable readers to digest more information in less time. From the consumer’s standpoint, that is the most important advancement. Readers are time sensitive, and they want to consume as much relevant information as possible in a given period of time. As a newspaper writer, I’m forced to write a story as if you knew almost nothing about the topic. I’ve got to put in all the background you probably already know. That means someone who has been following an ongoing story is forced to endure the slow pace of others – kind of like public education. Online, I can use tools that allow people to easily catch up and get the basics without subjecting every reader to that time delay. I also can let people with extreme interest in the story delve deeper, without subjecting everyone to the time-gobbling nuances.

Newspapers have long been compiled on the premise of the long tail. You pay 50 cents for a paper with the understanding that out of the paper’s 100 articles, only a select few hold real value for you. The front page usually holds social value – especially at regional papers – because it facilitates water cooler talk and shared experience. Then somewhere inside are articles aligned with your interests.

All of these value propositions are better served online.

The problem with newspapers is years of stagnation, and lack of innovation. Newspaper publishers have largely promoted individuals based on their ability to maintain the status quo and not upset decades of 20 to 30 percent profit margins. Innovation was not only considered irrelevant, it was eschewed. The lack of responsive management is a primary factor for newspapers falling so hard so fast, IMHO.

But professional regional journalists aren’t going anywhere. They are too necessary. The only thing that changes is who they work for, and the medium of their reporting. And I, for one, am excited at that prospect.

"journalists aren’t going anywhere. They are too necessary. "

oh yeah? so necessary that no one wants to pay for news sites? who's going to pay?

The paradox states that regardless of online audience reach, general interest newspaper websites cannot generate an economically sustainable advertising revenue stream. Acceptance of the paradox implies that in order for news sites to become profitable they must change their content to something more valuable. Absent such a content change, newspapers must subsidize their unsustainable business with outside money. Absent either of these changes, newspapers will cease to exist as business entities.

cheers,
Robert
http://metaprinter.com/?p=1075

If we compare regional newspaper web sites with the likes of Techcrunch, Mashable, or RWW we will see that some regional web sites lack in functionality. Savvy sites are integrating (mashing up) with video comment providers like Seesmic and social network notification like Twitter and Facebook.

I live in Windsor, CA, USA, so I am reflecting on the local situation. I've been talking to North Bay Business Journal for 3 years about adding RSS feeds. I even built a screen-scraping mashup that does it for them ( http://pipes.yahoo.com/pipes/pipe.run?_id=QqMrssb23BGp6uN3Le2fWQ&_render=rss or http://feeds.feedburner.com/NorthBayBusinessJournal ). They told me that the New York Times, their parent company, manages their technology for them. Well, that's why they are slipping. The parent company is not moving fast enough. People are spending time on news that is more convenient to access. I want to be updated via Twitter http://twitter.com/busjrnl .

Crowd aggregated news is not a silver bullet. I remember when Digg users posted and reposted the "secret" DRM Key in the infamous "Digg DRM Revolt" ( http://www.forbes.com/technology/2007/05/02/digital-rights-management-tech-cx_ag_0502digg.html ) . It was a funny thing to observe, but does not fit within the framework of news I wanted.

I think the future of regional newspapers will be around top-notch reporters who create a name for themselves in the region. I delegate the task of being in the know in the context of regional technology to http://twitter.com/paperwords .

Something that regional reporters can do well is put on regional events. To draw a parallel, look at O'Reilly. It's in the book publishing business and taps into its customers to put on huge tech events. Why do people go to tech events? To learn something new, but more importantly to have face-to-face social interaction. To create and maintain strong relationships with people who have similar interests. This is a revenue-generating opportunity.

I understand the newspaper company has to make enough money to pay its top-notch reporters well in order to keep them. To do this, the company should flex its PR muscle and take on the role of an agent for the reporter. Have the reporters write books and speak at conferences. Again, learn from O'Reilly's business model.

Reporters need to build their own portable social capital. I think Kevin Rose ( http://twitter.com/kevinrose ) of Digg is a great example. His social capital weighs in at 80,000+ followers on Twitter. I look at him as the editor/reporter for Digg.

===================================================
Arsen Yeremin arsen3d@gmail.com T: (707)703-1584
http://www.linkedin.com/in/arsenyeremin F: (270)682-3873
===================================================
Quote: A teacher is one who makes himself progressively unnecessary

