Documents as miniature websites?

Plus: Legislation needed to protect and promote FOSS

By Rick Jelliffe
July 22, 2009 | Comments: 5

In last year's blog "Is ODF the new .RTF or the .DOC?" I wrote:

perhaps the looming challenge for document standards is not in deciding or developing perfect formats, but in integrating the packaged world of documents with the fragmented world of web resources. Documents that can be websites.

My point there was mainly that editors should be able to open full websites. But I'd like to tease out a couple more points.

I would say that the most likely future for documents and their formats is that each document will start to look/act/be implemented more and more like a tiny, self-contained website.

If you look at the big trends in documents, it seems a plausible direction:

  • XML-in-ZIP packaging chunks the document into smaller resources which are then locatable by URL

  • The old SGML desire for a separation of concerns between data and processing, now the dominant paradigm for documents, encourages this fragmentation

  • The old SGML one-big-fat-tree approach has failed to withstand the need for more layering and fragmentation, and is being replaced (notably in DITA, but I also see RDF fitting in here) with a system of smaller chunks of data linked together by hierarchical/tabular maps: any container that contains a container is liable to be moved to a map chunk of some kind.

  • VBA's security issues are leading to its demise, certainly on the Mac platform, and to its downplaying on Windows by MS. However, specific markup cannot hope to catch up, though it can certainly provide more of what scripts do (see HTML5). So it seems there will still be a need for hygienic scripting, both at the GUI level and at the server level.

  • The possibility I raised two years ago seems to be coming closer to the mainstream: recently I read of a distro of Open Office which presents rendered PDF pages by default, and then switches to ODF if you want to edit the page: the ODF document is both PDF and ODF at the same time.
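To make the first trend concrete, here is a minimal sketch in Python (standard library only) of treating the parts of an XML-in-ZIP package as URL-addressable resources. The part names, the placeholder XML, and the base URL `http://localhost:8080/doc/` are assumptions chosen for illustration; they are not taken from the ODF or OOXML specifications.

```python
import io
import zipfile
from urllib.parse import quote, urljoin


def build_sample_package() -> bytes:
    """Build a tiny ODF-like ZIP package in memory (placeholder contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("content.xml", "<doc><p>Hello</p></doc>")
        package.writestr("styles.xml", "<styles/>")
        package.writestr("META-INF/manifest.xml", "<manifest/>")
    return buffer.getvalue()


def part_urls(package_bytes: bytes, base_url: str) -> dict:
    """Map each part name in the package to a URL under base_url."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        # quote() percent-encodes unsafe characters but leaves "/" alone,
        # so the folder structure inside the ZIP becomes the URL path.
        return {name: urljoin(base_url, quote(name)) for name in package.namelist()}


def read_part(package_bytes: bytes, name: str) -> bytes:
    """Return the raw bytes of one part, looked up by its name in the ZIP."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.read(name)


if __name__ == "__main__":
    pkg = build_sample_package()
    for name, url in part_urls(pkg, "http://localhost:8080/doc/").items():
        print(name, "->", url)
```

The point of the sketch is only that once a document is a package of named parts, giving each part a URL is a trivial mapping; the interesting design questions are which base the URLs hang off and who resolves them.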

There is a flip-side: as documents become more like tiny websites, the desktop application will become more like a browser with an internal web-server. (If you like MVC, maybe you could say that the document is the model, the browser provides the view, and the server and scripts provide the controllers.)

That is the technique that Allette Systems (the Sydney company I work for most of the time) has been using for delivery of documents on CD-ROM: basically auto-running a Jetty server and implementing behind-the-scenes delivery of application functionality (such as indexing) using web technology (such as Lucene). A few years ago at the XML Europe conference in Amsterdam, I got part way through a presentation on the architecture of the Topologi tools, which were internally organized as a client/server system with all functions accessed by URLs: I conked out before the halfway mark, due to my then unknown tumours and other nasties, but not before Oxygen's George Bina commented that they did something similar for their editor. So I don't think it is fanciful.
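As a rough illustration of the "internal web-server" idea, the sketch below serves the parts of an in-memory ZIP package over a loopback HTTP server using only the Python standard library. It is not the Jetty/Lucene setup described above, and the names `make_package`, `make_handler`, `start_server` and `fetch_part`, along with the sample part names, are hypothetical.

```python
import io
import threading
import zipfile
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import unquote, urlsplit


def make_package() -> bytes:
    """Build a small stand-in document package (placeholder contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("content.xml", "<doc><p>Hello</p></doc>")
        package.writestr("styles.xml", "<styles/>")
    return buffer.getvalue()


def make_handler(package_bytes: bytes):
    """Create a request handler class that serves parts of one package."""
    archive = zipfile.ZipFile(io.BytesIO(package_bytes))
    lock = threading.Lock()  # guard the shared archive across request threads

    class PartHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # The URL path, minus its leading slash, is the part name.
            name = unquote(urlsplit(self.path).path).lstrip("/")
            try:
                with lock:
                    body = archive.read(name)
            except KeyError:
                self.send_error(404, "No such part")
                return
            self.send_response(200)
            self.send_header("Content-Type", "application/xml")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet

    return PartHandler


def start_server(package_bytes: bytes) -> ThreadingHTTPServer:
    """Start the server on a free loopback port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), make_handler(package_bytes))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_part(server: ThreadingHTTPServer, name: str):
    """Play the 'browser' side: GET one part, return (status, body)."""
    host, port = server.server_address[:2]
    connection = HTTPConnection(host, port, timeout=5)
    try:
        connection.request("GET", "/" + name)
        response = connection.getresponse()
        return response.status, response.read()
    finally:
        connection.close()


if __name__ == "__main__":
    server = start_server(make_package())
    try:
        print(fetch_part(server, "content.xml"))
    finally:
        server.shutdown()
        server.server_close()
```

In MVC terms, the package plays the model, whatever issues the GET requests plays the view, and the handler is a very thin controller; indexing or search would be further URLs handled the same way.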

I think it should be obvious that, if we accept this as a feasible direction or a promising approach or even a vision that needs to be embraced, then it has a real impact on the features and design that we need from ODF and OOXML. And PDF, XPS, etc.

One of the comments I sent to the ODF Next Gen requirements gathering system could nudge ODF in this direction:

ODF needs to be audited against the requirements of CDRF; abstract features specified in CDRF that have no equivalent in ODF should be added to ODF.

Now when I say browser, I don't necessarily mean an existing HTML browser. I would hope it would be a chance to leapfrog and escape the current crappy typesetting capabilities of web browsers: if you don't know what I mean by that, run the demo of Adobe's Text Layout Framework for a glimpse of where we should be.

Adopting the document-as-website paradigm removes at a finger snap the distinction between the world of documents and the WWW. I think it also clarifies the position of the proprietary HTML wannabes (Flash, Silverlight, etc.) to a large degree, by making them alternative or primary delivery mechanisms (multi-valent documents, anyone?).

Aside: Legislation necessary?

I would go further: I think governments should actively encourage this architecture (or ones like it) by legislating that all mission-critical information formats should be available RAND-z.

Now I already think that all market-dominating interface technologies should be available as QA-ed, RAND-z standards: this is true of file formats (notably including the ISO media formats like the MPEGs), directory structures (such as FAT32 and VFAT), mainframe hardware and software interfaces, protocols and so on. It is just dumb to give monopoly rights such as patents on technologies long after they have paid off any investment: it circumvents the market and, far from rewarding innovators, prevents innovations that build on top of the patented technology.

But I would go further than that: part of the call to have international standards for document formats is that, apart from publishing the information, it also flushes out the IP to a worthwhile degree. For-profit companies can prudently defer IP issues to a certain extent, because they can expect that they can negotiate a license if they start to make money. However, for open source developers, the possibility of lurking IP can be a killer.

So I believe that legislation is necessary to grant automatic RAND-z licenses, in particular to encourage FOSS developers, for the basic range of document and web technologies. That there are companies seeking "defensive patents" (patents the company itself considers junk, but is getting merely so that someone else does not get them and cause trouble or surprises down the line) should be an alarm bell to IP legislators.

Now that the FOSS industry is so large and so economically important, IP legislation needs to afford it as much protection and encouragement as the patent laws did a century ago to encourage industrial R&D. The whole area of documents and media formats for office systems is so fundamental to the modern economy that we cannot afford to discourage developers with lurking patents.


So the holy scrollers are losing to the card sharks?

Sigh... so much damage for so little gain.

Len: I don't think that a move to having separate documents for containers and content necessarily will either cause GUIs to be more card-like, or documents fragments to be written in a more modular/atomistic/cardlike way. After all, SGML's general entity mechanism was very commonly used to make hub entities that included chapters or sections or topics, and SGML systems were rarely card-like. Not many documents are technical manuals or recipe books.

But websites have several characteristics: they use a distinctive family of technology, they have ideas like REST, they use URLs and resources are addressable, they separate processing/presentation from content formally but intertwine them in the data, they are often 4-tier systems (client, renderer, business logic, datastore), the closer to the user the more that standards are involved, resources are self-identifying and loosely coupled, a simple protocol is used (HTTP), and so on.

Both desktop peer-to-peer and the Rich Client Platforms have failed to meet expectations. All the big players are wanting their strategic technologies to find mass love (Flash, Silverlight, Chrome?) But the WWW technologies chug along, hence the current push for WWW-plus technologies that extend the standard APIs or formats a little bit (aka "I'll only put it in a little bit.")

I'm also convinced that "the desktop application will become more like a browser with an internal web-server".

This internal web-server should also be a proxy to manage web exchanges: off-line mode, ... (anything a browser is not allowed to do!)

There shouldn't be as many browser+web-server pairs as there are applications.

Having the same browser for anything is important and I agree that HTML was not designed for rich user interfaces...

I see your point more clearly. No, a package of pointers does not a deck make. OTOH, I'm not sure it ever made a difference considering the evolution of DIVs turned pages back into decks.

I'm not convinced the technology won't drive content to become more modular or card-like. Pans don't make cornbread square. The choice of pan does.

Sharks vs scrollers is really about writing style, not address-laden containers. Twitter and the character limit in Facebook are relegating blogs to second class status. The sharks are winning.

Hi, I was happy to read that some of the main principles and models that I have always been promoting in my Web courses have been articulated and offered by experts.

My main model of progress in Web Science is biological systems. If we look at a tree, or our own body, they consist of the smallest pieces and machines (call them amino acids, proteins, cells, ...) shaped (layered) into organisms, organs, ...

The trend of progress in Web science, and in the whole of computing science (most of which, in my opinion, already is Web science!) is breaking everything down into smaller and smaller pieces, separating data and processing as much as possible, and using the immensely powerful techniques of abstraction for layering and separating details that belong to various scales.

With appreciation for your good vision of the future.

Javad Abdollahi
