The Illinois River is a slow-moving, meandering waterway that originates out of Lake Michigan, flows beneath downtown Chicago, then cuts through the rich Illinois topsoil as it wends its way to Peoria (giving the area its distinctive river bluffs), then continues through the middle of the state until it finally meets the Mississippi River at Alton, Illinois, on the Missouri border. Given where it begins and ends, the Illinois sees a lot of river traffic, from barges laden with grain to shipping containers to steam-powered paddle-wheel boats that evoke memories of Mark Twain.
When I went to high school in Peoria, my mother worked for the Army Corps of Engineers, where one of her responsibilities (though far from the only one) was to maintain the "river watch". Three times a day she'd go down to the river's edge, lower what amounted to a large dipstick into the water, and record such information as the river height, the degree of turbidity, and the depth of the silt. This information was then sent to a central dispatcher, who would combine it with similar reports from up and down the river to create a real-time "map" of the state of the river.
This map had a lot of consumers. River barge masters, of course, needed to know which parts of the river were too low to navigate, so they could avoid beaching on a sandbar - or know to stay tied up in dock until the river levels were higher. The Department of Agriculture needed this information to measure topsoil erosion (which, not surprisingly, ended up as silt in the river) so it could warn the farmers most at risk to put up windbreaks or similar erosion-prevention measures. The National Weather Service needed this information during flood times to issue alerts, FEMA needed it to coordinate disaster relief when those floods occurred, and insurance companies would ultimately use it in myriad ways.
Those who work in IT have tended over the last couple of decades to focus on private-sector data - business information, marketing data about the users of social networking sites, stock and bond trading information, and so forth. This information is obviously useful, yet as the shift is made from stand-alone applications to Internet services, one of the increasingly pressing questions IT departments ask is "what kind of data can be monetized?"
The reality is that, for all the data the private sector produces, outside of a fairly limited scope that data is of little use for monetization. There are only so many ways you can slice online demographic data, and there's a certain irony in the fact that, despite concerns about privacy, the actual value of any individual's marketing data is close to zero, simply because so many competing agents are measuring it.
On the other hand, one of the critical roles of any government is to monitor the state of its region of jurisdiction (and the regions it interacts with). This isn't usually glamorous work. The cost of measuring the level of a river is generally higher than any immediate gain to be made by monetizing that data, so it's not something most private companies will rush to do, though in the aggregate that data has immense value to a wide range of consumers.
And therein lies an important point - the network of engineers, researchers, auditors, court recorders, census takers, and so forth forms the nervous system of the body politic. Together they provide a huge amount of information about the state of the system, information that is ultimately useful to everyone, and they do so not out of any immediate short-term profit motive, but to ensure the much longer-term health and welfare of the people who live in the country, state, county, and so on.
Yet for a variety of reasons, a disconcertingly large amount of that information is not accessible via the Internet. The last decade has been a real struggle for the people who work in this network, especially where ideological concerns trumped the need to gather information, leading to severe underfunding of many of these agencies. A big push to privatize government functions, beginning in the 1970s and early 1980s, often resulted in companies making what had been public information proprietary, which frequently limited access to that information to only the most well-heeled.
Paradoxically, this also often proved fairly disastrous for both the companies and the organizations that authorized the privatization (usually to save costs). It gave these companies local monopoly power, which was frequently abused; it resulted in reduced service as the privatizing companies realized just how diffuse and low-profit the information gathering was (especially a concern when you need to show quarterly profits); and in many cases it ended with angry citizens filing lawsuits to recover and resuscitate damaged systems.
Beyond that, this all occurred against a backdrop of rapid technological change. The amount of information that a country like the United States or Canada produces is staggering, but in all too many cases that information ends up in databases that grow increasingly antiquated and sit outside any real network. Limited staffing and low IT budgets restrict the ability to make that data available, even where putative political mandates to do so exist, in great part because the databases were simply not set up to provide that data except in a very limited way.
What Obama's Data.gov initiative will do is both simple in concept and stunning in implication. It is data housekeeping: a set of requirements, established by federal CIO Vivek Kundra, that will make it possible to build a web-services infrastructure exposing at least partial representations of these databases as streams of XML. This isn't just about making the political process more transparent - it is about making the entire information-gathering apparatus of the United States more transparent.
It's my hope (and looking at the initial reports I'm feeling optimistic about this) that most of these interfaces will be RESTful in nature - exposed via HTTP GET, POST, PUT, and DELETE operations - in essence making such information directly available as XML and JSON content, as appropriate. For now, GET operations will likely predominate, providing the ability to see this information in the first place and perhaps to query it, filtering the content down to just what is relevant. Even with GET alone, this will "light up" the government's own noosphere to the Internet, making it possible to create "mashups" that correlate various data feeds into new applications.
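To make the GET side concrete, here is a minimal sketch of consuming such a feed. The endpoint URL, element names, and units are all hypothetical - no such Data.gov schema exists yet - so the example parses a sample response of the kind a river-gauge service might return, then filters it the way a barge master's mashup would:

```python
import xml.etree.ElementTree as ET

# Hypothetical response from a RESTful GET such as
#   GET https://data.example.gov/rivers/illinois/gauges?format=xml
# Both the URL and the schema are illustrative assumptions, not a real API.
SAMPLE_RESPONSE = """\
<gauges river="Illinois">
  <gauge station="Peoria" lat="40.70" lon="-89.58">
    <stage unit="ft">12.4</stage>
    <turbidity unit="NTU">38</turbidity>
  </gauge>
  <gauge station="Havana" lat="40.30" lon="-90.06">
    <stage unit="ft">9.1</stage>
    <turbidity unit="NTU">22</turbidity>
  </gauge>
</gauges>
"""

def low_water_stations(xml_text, min_stage_ft):
    """Return the stations whose river stage is below a navigation threshold."""
    root = ET.fromstring(xml_text)
    return [g.get("station")
            for g in root.findall("gauge")
            if float(g.findtext("stage")) < min_stage_ft]

print(low_water_stations(SAMPLE_RESPONSE, 10.0))  # -> ['Havana']
```

The point is less the twenty lines of Python than the fact that, once the data is exposed as XML over HTTP, a filter like this is all it takes to turn raw gauge readings into an actionable answer.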
For instance, it's not hard to take the Illinois River data, retrieved perhaps as a geo-encoded Atom feed, and transform it into a Google Earth KML file that lets barge navigators "tour" potential trouble points. Department of Agriculture erosion data as XML can be transformed into overlays showing erosion patterns, giving farmers and urban planners a clear visual of where potential problems lie (such as where mudslides or settling earth could cause significant damage). Land-use permits filed with city or county zoning authorities could also be tied in to show where wetland problems may exist or where such actions could increase erosion - not just for the officials evaluating them, but for people in the region who may be affected.
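The Atom-to-KML transformation mentioned above is itself only a few lines. Assuming a hypothetical feed that tags each entry with a GeoRSS-Simple point (latitude then longitude), the sketch below turns each entry into a KML Placemark (KML wants longitude first):

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
GEORSS = "{http://www.georss.org/georss}"

# A hypothetical geo-encoded Atom feed; the entry content is illustrative.
SAMPLE_FEED = """\
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss">
  <entry>
    <title>Sandbar hazard near Havana (stage 9.1 ft)</title>
    <georss:point>40.30 -90.06</georss:point>
  </entry>
</feed>
"""

def atom_to_kml(feed_text):
    """Convert geo-tagged Atom entries into a minimal KML document."""
    root = ET.fromstring(feed_text)
    placemarks = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title")
        # GeoRSS-Simple points are "lat lon"; KML coordinates are "lon,lat,alt".
        lat, lon = entry.findtext(GEORSS + "point").split()
        placemarks.append(
            "<Placemark><name>%s</name>"
            "<Point><coordinates>%s,%s,0</coordinates></Point>"
            "</Placemark>" % (title, lon, lat))
    return ('<?xml version="1.0"?>'
            '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
            + "".join(placemarks) + "</Document></kml>")

print(atom_to_kml(SAMPLE_FEED))
```

Feed the output to Google Earth and each trouble point becomes a clickable pin - which is the whole mashup, end to end.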
Yet for all the value that GET operations have here, the ability to POST to those URLs in order to create new records will be even more important. While it is possible to lay out a network of sensors to perform certain measurements, it's worth noting that the vast majority of the information the government collects is recorded by people. Here in Victoria, BC, there are two very popular programs: a count of flowers by type in the region and a count of birds by type.
The bird count in particular should be mentioned because, beyond making people more aware of the avian diversity in the region, it provides a critical snapshot of the biological system of the area - information that can be used to study wildlife trends and highlight potential problems. Suppose you enabled an army of such volunteers to do this count with GPS-enabled phones connected to a government web service, so that people could record the exact position, species, number, actions, and habitat of each sighting. Made freely available as XML data, such a database could do everything from showing where recovery programs for endangered species are or aren't working to setting up an early warning system for avian flu.
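A volunteer's phone posting one such record is a small thing. The endpoint URL, field names, and JSON shape below are all assumptions about what such a service might accept, not a real government API; the sketch builds a GPS-tagged sighting and shows how it would be POSTed:

```python
import json
import urllib.request

# Hypothetical citizen-science endpoint; this URL and its record schema
# are illustrative assumptions, not a real Data.gov service.
SIGHTINGS_URL = "https://data.example.gov/wildlife/sightings"

def make_sighting(species, count, lat, lon, habitat, behavior):
    """Build one GPS-tagged sighting record as the service might accept it."""
    return {
        "species": species,
        "count": count,
        "location": {"lat": lat, "lon": lon},
        "habitat": habitat,
        "behavior": behavior,
    }

def post_sighting(record):
    """POST the record as JSON; returns the HTTP status code."""
    req = urllib.request.Request(
        SIGHTINGS_URL,
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # live network call, not run here
        return resp.status

record = make_sighting("Anna's hummingbird", 2, 48.43, -123.37,
                       habitat="garden", behavior="feeding")
print(json.dumps(record))
```

Multiply that one POST by thousands of volunteers and you have the real-time wildlife database the paragraph above imagines.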
In many respects, this is the real power of participatory government. An initiative such as Data.gov harnesses the power of people to provide both the data and the means for using it. It gives companies information they can use to build new businesses around solving real problems with private-sector ingenuity, rather than simply acting as gatekeepers who make money by keeping such data scarce and expensive. It enables a nuanced view of the world that's informed by context, making it easier to avoid building up unbalanced situations that can cripple an economy or cause a disaster, and it can help in allocating scarce resources at the planning stage, rather than when a project is already well underway and changes are costly.
For IT professionals (especially those of us in the XML community), this should also be seen as a call to arms. It will not be an easy process to achieve - it requires hard-won expertise and a commitment to both open data and open standards - and at least in the short term it should be seen not as a chance to line pockets but as a once-in-a-lifetime opportunity to fundamentally shape, for the better, the world our children and grandchildren will grow up in.