<body>

Data centers, APIs and what they mean to journalism

Tuesday, September 08, 2009

For journalists, creating a database means sifting through tons of raw, often unorganized data, presenting it in an indexable way and sometimes finding the stories buried deep in the data. This is part of the long tradition of journalism: synthesizing information before it is presented to the public. The latest trend of posting raw data to the web means the public can examine news and statistics without a filter and find their own stories without having a group of journalists figure it out for them.

The online presentation of raw data has taken many forms. Mainstream news organizations like the New York Times, the Guardian and Advertising Age have created online data centers where large collections of numbers and statistics are available for the public to peruse at their leisure or, better yet, to mash up into their own databases and visualizations.



This is, of course, part of a larger trend on the web of making data available to anyone who wants to view or use it. Data.gov, a project of the US government, houses data on everything from tax information to natural disaster statistics and makes it available in various digital formats, including CSV and XML. The recently announced DataSF, a collection of data published by the city and county of San Francisco, California, has more than 100 datasets available for public use, covering everything from bridge locations and bodies of water to crime statistics and public works projects.
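For readers curious what working with these formats looks like, here is a minimal Python sketch that tallies the same records from a CSV version and an XML version. The sample records, field names and function names are all made up for illustration; they are not taken from Data.gov or DataSF, and a real dataset would have its own layout.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample records, invented for this example.
SAMPLE_CSV = """category,district,count
theft,Mission,12
vandalism,Mission,5
theft,Richmond,7
"""

SAMPLE_XML = """<records>
  <record><category>theft</category><district>Mission</district><count>12</count></record>
  <record><category>vandalism</category><district>Mission</district><count>5</count></record>
  <record><category>theft</category><district>Richmond</district><count>7</count></record>
</records>"""


def totals_from_csv(text):
    """Sum the 'count' column per category from CSV text."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(text)):
        totals[row["category"]] += int(row["count"])
    return dict(totals)


def totals_from_xml(text):
    """Sum the <count> element per category from XML text."""
    totals = Counter()
    for record in ET.fromstring(text).iter("record"):
        totals[record.findtext("category")] += int(record.findtext("count"))
    return dict(totals)


if __name__ == "__main__":
    print(totals_from_csv(SAMPLE_CSV))
    print(totals_from_xml(SAMPLE_XML))
```

Both functions produce the same per-category totals, which is the point: once data is published in a structured format, the same question can be asked of it regardless of which format a reader downloads.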



Posting raw data has its advantages over traditional journalism in that it gets the public involved and uncovers stories that even a team of journalists could not discover on its own. Earlier this year, the Guardian posted more than 450,000 pages of data on UK government officials' expenses and asked the public for help in finding interesting tidbits of information. Based on the public's findings, the staff created a series of stories that detailed outlandish expenses like £2,000 to dredge a moat at a private estate.

The datasets presented by news organizations are often publicly available numbers and statistics that can be found on- or offline. The difference is that the data has been cleaned up and made available in a digital format that takes less time to sift through and understand. Datasets aren't limited to third-party information either: NPR made more than 80,000 of its transcripts available via its recently announced Transcript API. The API allows developers to mash up the transcripts in ways that are yet to be seen.

But is posting raw data journalism? Where is the editing, the reporting, and all the values that are the bedrock of newsrooms everywhere? The core of a journalist's job is to spread the news and to inform the public. While posting raw data may not involve some of the traditional values of journalism, it is still sharing the news and telling the story. Even better, this system for sharing content lets the public decide for themselves what is news without the filter of a news outlet to decide for them. This process encapsulates the core values of online journalism: collaboration, openness and stepping outside of traditional means of delivering the news.

2 Comments



Martin Stabe says:
Great post -- but you give the Guardian too much credit on the MPs expenses story.

The moat story -- and indeed most of the key stories in that scandal -- had been broken by the Telegraph, which had obtained the raw expenses files before their official (and heavily redacted) release that the Guardian project relied on.

The Telegraph's reporting of the leaked dataset was a more traditional, behind-closed-doors operation, involving a large team of journalists (very successfully) trawling a lot of data for stories.

The two papers' very different approaches to dealing with raw data is one of the interesting things that came out of that story.

September 9, 2009 12:21 AM


Adam says:
One of the big trends that is coming from the digital revolution is collaborative journalism. The days of "fortress journalism" where we get the information, sort it and proclaim it to the public from behind our walls are over.

That's why it's great to see people like the Guardian and NYT opening their data files. And if it means more stories come out of data mining, all the better: over the last 20 years more stories have been ripped from press releases because journos don't have the time to mine data themselves.

September 9, 2009 1:57 AM


10,000 Words © Copyright 2007-2009. Subscribe via RSS. Email: info@10000words.net