Where are the XML Editors?

By Eric Larson
November 25, 2008 | Comments: 20

Recently at my job I've had to spend a good portion of time working with a set of languages we created from scratch. It is interesting to think in terms of lexers and parsers, but I've hit the point where I wish I could just use XML and XSLT. Having a tree is powerful, especially when you have a language dedicated to recursively working within that tree.

The whole reason for writing the parser from scratch was usability. The people who actually use the language have been better served by a more forgiving, simplified text format, so that's what was created. It is more difficult to program for, but our users have had a great deal of success, so the extra work has been worth it.

I was told that XML was considered in the beginning, but without an editor to ease the authoring pain, it wasn't practical for the users. It is interesting that the lack of a good editor was the stumbling block. From what I understand, before XML was so widely adopted, advocates suggested that editors would ease the pain of getting all those angle brackets to disk. I could have my history wrong, but that doesn't change the fact that the XML editor world is less than ideal.

There are some truly great XML editors out there. For example, if my hands weren't crippled with an addiction to Emacs, oXygen would be my tool of choice, hands down. It is a fantastic XML programmer's editor. As good as oXygen is, it is not made for non-technical users. There have been attempts, especially in the land of publishing, that have proven fruitful. But a generic XML editor that works reasonably well for non-technical users seems to be a myth.

The lack of options doesn't really surprise me, because the infinite possibilities for XML seem prohibitive when creating an editor. The essence of an editor would be to maintain some constraints while keeping the markup behind the scenes. This usually ends up involving XML Schema or some other schema-like language that is time-consuming to keep updated. In a nutshell, it is a lot of work that makes you wonder why you didn't just use a JavaScript rich text editor or Word alongside some extra tools.

One avenue that could be productive is an "XML by example" editor. Instead of requiring a complex schema, the editor would use a simple XML file as an example. Programmers are pretty good at writing software that creates XML, so this seems like a reasonable requirement. XSLT or something simpler could be used to create a save or validation hook. This is where you could verify content types such as numbers or dates, as well as links. Any system accepting the user-created XML should already be prepared for malformed data in one way or another, and writing an all-encompassing output file makes for a very good test.
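To make the "XML by example" idea a little more concrete, here is a minimal Python sketch under some loudly stated assumptions: the example file is treated as the complete list of which child elements and attributes each element may use, and the save hook is a hypothetical table of per-element type checks (a `date` element must look like YYYY-MM-DD, a `price` element must be a number). Element order, repetition and namespaces are ignored, so this is far weaker than XML Schema; it only shows how little the author of the format would need to supply.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

def learn_example(example_xml):
    """Record which child tags and attributes each tag uses in the example."""
    root = ET.fromstring(example_xml)
    children = defaultdict(set)
    attributes = defaultdict(set)
    for elem in root.iter():
        attributes[elem.tag].update(elem.attrib)
        for child in elem:
            children[elem.tag].add(child.tag)
    return root.tag, children, attributes

# Hypothetical save/validation hook: type checks keyed by element name.
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
TYPE_CHECKS = {
    "date": lambda s: DATE.fullmatch(s) is not None,
    "price": lambda s: s.replace(".", "", 1).isdigit(),
}

def check(document_xml, model):
    """Return a list of problems; an empty list means the document passes."""
    root_tag, children, attributes = model
    try:
        root = ET.fromstring(document_xml)
    except ET.ParseError as err:
        # Malformed input is reported, not raised, so the editor can show it.
        return [f"not well-formed: {err}"]
    problems = []
    if root.tag != root_tag:
        problems.append(f"root is <{root.tag}>, expected <{root_tag}>")
    for elem in root.iter():
        for child in elem:
            if child.tag not in children.get(elem.tag, set()):
                problems.append(f"<{child.tag}> not allowed in <{elem.tag}>")
        for name in elem.attrib:
            if name not in attributes.get(elem.tag, set()):
                problems.append(f"attribute {name!r} not allowed on <{elem.tag}>")
        type_ok = TYPE_CHECKS.get(elem.tag)
        if type_ok and not type_ok((elem.text or "").strip()):
            problems.append(f"<{elem.tag}> has bad value {elem.text!r}")
    return problems

example = """<book><title>Example</title><date>2008-11-25</date>
<chapter id="c1"><para>Text</para></chapter></book>"""
model = learn_example(example)
print(check(example, model))                                            # []
print(check("<book><title>T</title><date>soon</date></book>", model))  # ["<date> has bad value 'soon'"]
```

Checking the example against itself should report nothing, which is a cheap way to confirm the example really is all-encompassing.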

The larger question is whether or not it is too late for simple end-user XML editors. Interest in XML has definitely been waning recently, with technologies like JSON becoming more popular. Even so, XML's strengths continue to be valuable, especially in traditional content management contexts. It is not surprising that traditional content is where a simple, generic, end-user XML editor would be of the most value.

What do you think? Would a simple generic XML editor for end users be a valuable tool? What would it look like?

Update!

There have been some great comments, so I thought I'd take a minute to make a few clarifications and respond.

First off, what I meant by a "generic editor" is much closer to the description Evan offered below in the comments. I was thinking in terms of authoring content with XML, as that is one area where I strongly believe XML excels. Paul Terrey mentions Arbortext, XMetaL, Serna, etc., which were in fact the impetus for writing this article. George Bina (from oXygen) also points out oXygen XML Author, which seems to fall in the same category as Arbortext or XMetaL.

Laurens van den Oever makes the point that an XML authoring editor should focus on providing valid output. Evan Lenz also stresses the importance of using XML Schema (or a DTD) when creating an editor. To some extent I agree, but the problem is that many groups that would benefit from an XML-based workflow do not have the technical resources (read: programmers) to set up a complex editor. Sure, they can hire a consultant, but that can be difficult and expensive to support.

One idea might be to allow non-technical people to make changes to their schema and add new formats. In this case, the solution is not only an authoring environment but also a schema-building tool. Dimitre Novatchev mentions using Visual Studio to generate an XSD from an XML file, which is a good low-level strategy, but I'd imagine something more like Access would be required to bridge the gap between low-level details and end-user usability. Whatever tools might help, it is clear the problem is challenging.

Lastly, one reason I've been interested in a good XML authoring environment is blogging, more specifically blogging within the context of AtomPub. The ability to add and remove features via Atom extensions seems like the perfect fit for an XML-aware authoring tool. Likewise, taking oXygen XML Author as an example, providing an Atom framework that supports something like DITA or XHTML would allow for pretty powerful CMS integration. This sort of idea is nothing new; it is just finally much closer to reality.



20 Comments

For free and robust, I prefer the Eclipse Web Tools standard editors. They provide most of the basics you need in the base package, and a couple of new incubator projects are relevant too. XSL Tools in the Web Tools incubator provides XSLT editing and debugging support. And the Visual Editor for XML supplies the ability to edit DocBook, DITA, and other formats using CSS stylesheets for a WYSIWYG-type editing environment.

I blogged a while ago comparing Oxygen XML and Web Tools, and the gap between the two is narrowing, the latter being a free alternative that provides the basics somebody needs.

The essence of an editor would be to maintain some constraints while keeping the markup behind the scenes.

Well, Adobe FrameMaker is not exactly the typical XML editor for the XML specialist and programmer. But it is a great xml authoring solution. Actually in structured mode it does what you ask for: Hide the markup, "guide" the author and validate the structure behind the scenes. The "Structure view" panel offers a representation of the tree for easy navigation, drag and drop operations in the tree etc.
Out of the box it comes with a DITA implementation, so you can directly open DITA XML, author in a WYSIWYG environment and even create great PDFs very easily.

Actually, FrameMaker is a much better example of why a generic XML editor for end users is so difficult. Structured FrameMaker requires a schema to be defined in a non-standard (albeit close to DTD) format. Also, the format is rather rigid and becomes unwieldy to manage at an organizational level. The DITA support, on the other hand, is rather compelling because it shows the promise of a powerful editor working with an excellent standard.

My point was not so much that there aren't any editors, but rather that the ones available require too much work. In the editors I have seen, it is non-trivial to set up an editor to support arbitrary XML formats.

As a vi user (ahem), as well as an Oxygen user, I tend to find WYSIWYG-ish type editors for XML difficult to really get. After all, they're providing a false promise. XML isn't about presentation, it's about content.

I think, though, that xfy and XMLMind provide some good functionality along those lines, for those so inclined.

The Visual Studio (2008 or 2005) XML Editor is a great tool. It can generate an XML schema (XSD) from a given XML file. It allows an XSD to be associated with an XML file, in which case it dynamically flags any type-validity errors. It also produces dynamic IntelliSense prompts for valid elements, attributes and text nodes in the current context. The VS2008 XML Editor even displays descriptions of the currently selected prompted element or attribute, if these have an xs:annotation in the schema.

That said, in my everyday XML/XSLT programming tasks I am still using the old XSelerator, which is now available as a free download from SourceForge.

I see one problem with this article -- there is no definition of what a "generic XML editor" might be. So, do we know what we are talking about?

Dimitre Novatchev

I'm not an XML developer, but I have to work with XML message schemas and XML instance files based on standards such as OAGIS. I have used a tool called SpecBuilder from Edifecs, and it works very well from a business analyst's standpoint without requiring you to be an XML developer.

I have only recently started commenting here, and I am not one to surprise other readers with expertise in XML, but as an archaeologist who is interested in the technology and trying to apply it, I thought a comment from my side might help you understand the world of the non-nerds.

The following set of tools was always a dream: (1) a simple, flexible data format to store text and data, if possible in several languages at the same time; (2) an editor that is simple and offers nothing more than needed, that hides the brackets but shows the tree, or separate panels with the selected parts of the text (e.g. selected via XSLT), like metadata, notes, translations etc.; (3) the option to combine and extract data, bibliographies and text via XSLT, preferably in the same editor, and to publish them to several formats with tools in that editor.

I do not think that JSON can replace XML in this respect. Although XML has drawn a lot of criticism recently, it seems the best solution.

So I have checked the various well known XML formats like DocBook and TEI in order to use one of them for my purposes (archaeological text and data in several languages and writing systems), but each format had drawbacks -- not simple enough, not flexible enough, either good for text or for data storage, focussed on other disciplines. So I decided to try something new, and created an extremely simple format, for which I also created a markdown language which consists of 12 rules that fit on an A4 sheet of paper. It works for me already, but needs more polish, more stylesheets etc., before it could go public.

The ideal working environment would be a simple editor that focusses on this format and the needs of researchers in the humanities, but for the time being, while still working on the basics, I use Oxygen. It is much too powerful for someone who still has to learn XML etc., but the developers are really nice. (So, if the editor is complicated, a nice group of developers is the best workaround and makes it a simple editor.) In Oxygen it is possible to create frameworks, where people can write in an "Author mode" that hides the brackets in several ways. It is nice to have, but there is always a situation where one needs to go back to the original text; this is easy in this environment but needs basic XML knowledge and some knowledge of the more sophisticated features of the editor.

As for the XML editor for the rest of us, one should ask whether it is an editor for just one format or for any format. It may be easier to create an editor for one format, but it should be possible to create an interface to assemble CSS or XSL stylesheets for existing formats, to create XQueries in a dialog box and to collect results in separate frames of a window. There is a lot one could do. Since there is nothing really as simple as that, this is a task I am, after all, about to try. I thought of writing such an editor in Python, focussed on the simple format I mentioned above. I am not a programmer and I do not know Python yet, but I will try. If anybody could give me advice, or wants to tell me how silly I am, please let me know!

I hope these ideas throw another light on your interesting essay.

Hi Eric,

oXygen XML Editor is indeed an application for programmers. However, we have also made available a product called oXygen XML Author that is targeted at non-technical users. It offers visual editing for XML based on CSS, keeps the text-based editing and the support for transformations, but removes all the development parts (debuggers, schema editors, etc.).
For a new language you also need a CSS stylesheet, but we plan to remove this requirement by generating a CSS automatically from the associated schema, or from the document structure if no schema is specified.

Regards,
George
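George mentions generating a CSS automatically from the document structure when there is no schema. One naive heuristic for that, which is purely an assumption of this sketch and not oXygen's actual approach, is: any element that appears inside mixed content (text alongside child elements) is rendered inline, and every other element is a block. The sketch ignores namespaces and treats a tag the same way everywhere it occurs.

```python
import xml.etree.ElementTree as ET

def css_from_structure(xml_text):
    """Guess a display rule per tag: inline if it occurs in mixed content, else block."""
    root = ET.fromstring(xml_text)
    inline = set()
    seen = []  # tags in document order, without duplicates
    for elem in root.iter():
        if elem.tag not in seen:
            seen.append(elem.tag)
        # Mixed content: non-whitespace text before or between the children.
        has_text = bool((elem.text or "").strip()) or any(
            (child.tail or "").strip() for child in elem)
        if has_text:
            inline.update(child.tag for child in elem)
    return "\n".join(
        f"{tag} {{ display: {'inline' if tag in inline else 'block'}; }}"
        for tag in seen)

print(css_from_structure(
    "<doc><title>T</title><para>Some <em>bold</em> text</para></doc>"))
```

Here `em` comes out inline because it sits inside the text of `para`, while `doc`, `title` and `para` become blocks; a real tool would still need a way for authors to override such guesses.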

We sell a validating XML editor specifically for non-technical people: Xopus. Xopus is browser-based and supports XML Schema and XSLT. The product has been on the market for 7 years now, but in the last two years we've been getting some real traction: more people are asking the same question.

As you write, the main problem with XML for non-technical people is validation. Without validation, most of the benefits of XML are lost, with validation the tool may be too hard to use. We solved this with prevalidation: Xopus will hide or disable all actions that would make the document invalid. The result is an interface that guides the author through domain specific structures instead of an interface that punishes the author for the mistakes (s)he might make. Thanks to prevalidation non-technical people can create highly structured documents.
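The prevalidation idea can be approximated very simply: before offering an insert action, try it on a copy of the child list and only offer it if the result still conforms. The sketch below assumes a toy content model (each parent lists its allowed children in a fixed order, each with a maximum count) instead of real XML Schema, and it makes no claim about how Xopus works internally; it only illustrates "hide or disable what would break validity".

```python
# Hypothetical toy content model: allowed children in order, with a maximum
# count for each (None means unbounded). A stand-in for a real schema.
CONTENT_MODEL = {
    "article": [("title", 1), ("author", None), ("section", None)],
    "section": [("heading", 1), ("para", None)],
}

def conforms(parent, child_tags):
    """True if child_tags follow the model's order and respect each maximum."""
    model = CONTENT_MODEL.get(parent, [])
    slot, count = 0, 0
    for tag in child_tags:
        # Move forward through the model until we reach this tag's slot.
        while slot < len(model) and model[slot][0] != tag:
            slot, count = slot + 1, 0
        if slot == len(model):
            return False  # tag not allowed here, or out of order
        count += 1
        limit = model[slot][1]
        if limit is not None and count > limit:
            return False
    return True

def allowed_insertions(parent, child_tags, position):
    """Tags an editor could offer at `position` without making the parent invalid."""
    candidates = [tag for tag, _ in CONTENT_MODEL.get(parent, [])]
    return [tag for tag in candidates
            if conforms(parent, child_tags[:position] + [tag] + child_tags[position:])]

print(allowed_insertions("article", ["title", "section"], 1))  # ['author', 'section']
print(allowed_insertions("article", ["title", "section"], 0))  # []
```

An empty list means the insert menu at that cursor position would be disabled entirely, which is exactly the "guide rather than punish" behaviour described above.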

I am surprised no one talks about the two major word-processing-oriented XML editors, i.e. XMetaL and Arbortext.

Granted, you need some work before you can edit with a DTD or a schema (as with xfy, XMLMind or oXygen XML Author), but they are widely used in industry, for very complex DTDs (like S1000D), sometimes by people with little XML expertise.

You also have Serna (that may be getting better, I have to check), the new oXygen XML author (like said above, good work pals!), etc. They all use CSS for appearance (except Arbortext actually), and can be extended with some development, etc.

As a documentary XML expert, what I would be looking for is an XML editor that completely masks the notion of XML, even if it restricts the various subtleties of XML (who uses entities these days?) and needs lots of parametrization. It would broaden the spectrum of XML users and offer much more accessible interfaces.

When the cursor is somewhere in the document, the user should not have to work out what kinds of "cursor jockeying" and structural "glue" are needed to insert a desired element. That turns the editor into an MS-Word-for-gearheads. To rephrase: block-type elements should always be as "insertable" as inline-type elements. That's how MS Word succeeds... even though complex nested structures go fubar pretty quickly.
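One way an editor could do the "cursor jockeying" on the user's behalf is a breadth-first search over the "which element may contain which" relation, returning the shortest chain of wrapper elements needed to place the requested element at the cursor. The `ALLOWED_CHILDREN` table here is hypothetical, and the sketch ignores child order and any mandatory children the wrappers themselves might need.

```python
from collections import deque

# Hypothetical containment table: which children each element may hold.
ALLOWED_CHILDREN = {
    "article": {"title", "section"},
    "section": {"heading", "para", "list", "section"},
    "list": {"item"},
    "item": {"para"},
    "para": {"emphasis"},
}

def glue_path(current, target):
    """Shortest chain of wrapper elements needed to put `target` inside `current`.

    Returns [] if `target` can be inserted directly, or None if no chain exists.
    """
    queue = deque([(current, [])])
    seen = {current}
    while queue:
        tag, wrappers = queue.popleft()
        children = ALLOWED_CHILDREN.get(tag, set())
        if target in children:
            return wrappers
        for child in sorted(children):  # sorted only to make results deterministic
            if child not in seen:
                seen.add(child)
                queue.append((child, wrappers + [child]))
    return None

print(glue_path("article", "item"))  # ['section', 'list']
```

With the cursor directly inside an `article`, asking for an `item` would make the editor create a `section` and a `list` around it automatically, instead of expecting the author to know that structure.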

I think part of the confusion around this issue is the ambiguity of the word "editor". For programmers (who are often involved in the selection if not creation of tools for end users), the word "editor" means something like Visual Studio, Eclipse, or even oXygen. But these are programmer's tools, not tools for "regular people" (excepting certain features like "oXygen XML Author").

I tend to use the term "XML authoring environment" rather than "XML editor" to describe what's needed for non-technical content authors: tools like Arbortext Editor, XMetaL, and Serna, and browser-based tools like Xopus, Ephox's EditLive! for XML, Ektron's eWebEditPro+XML, and Altova's Authentic. Another great one for certain classes of XML is Microsoft's InfoPath. So there's a fair wealth of offerings, each having their own limitations and approaches to configuration. (I'm still looking for the perfect one that's equally adept at--and easy to configure for--both forms-based XML editing and document-oriented XML authoring.)

All of these are "generic" in that they're not tied to a specific DTD or schema. But they're not generic in the sense that they're designed for authoring XML without respect to a schema. Schemas are great for this. Validation is boring compared to the utter friendliness you can squeeze out of a schema for XML authoring purposes. It's the developer's job to configure the authoring tool, and that's just as it should be. I don't see how this is too much work. If you only have one schema, then you can focus on developing the perfect solution (using a tool like any of the ones listed above) for users. If you have lots of schemas, then nothing beats getting familiar with an editor and using it over and over again. Each time you reconfigure it, it will be less work. Finally, if you're using DocBook or DITA or another industry-standard schema, a lot of the work may already be done for you.

Declarative configuration is the gold standard, as far as I'm concerned. But even if you end up having to do a little bit of scripting, it's nothing compared to the work you'd have to do in creating an editor from scratch using a non-standard format to boot. My two cents. :-)

If you are an Emacs lover, then you have the best open source XML editor available: James Clark's nxml-mode is just the best IMHO. (I haven't tried Oxygen, because I don't want to pay for software if I don't have to.)

Why, again, are you using generic XML?

If everyone in the workflow is a human editing XML, then why not just use ASCII text?

Non-technical people should never be editing schemas; that should be done once, when the data format is defined. If a schema is to be more than a local validation step, then it needs to be a standard, i.e. who is going to read in the data that you are producing? They had better have access to the same schema that you do. Back to the original question: if it's another human reading it in, then generic XML is probably the wrong choice.

Also, how do you plan on delivering the XML documents that you produce? Emailing them? No need for generic XML. Hosting them on a web server? Why not use a standard XML spec, e.g. XHTML, RSS, Atom? What other way would you deliver them?


If you think that non-technical users are afraid of brackets, special character annotations, etc., then obviously you've never been to the biggest source of documents created by non-technical users in a specific data schema: Wikipedia.

Non-technical users need an editor for an application of XML: XHTML, DocBook or whatever. These exist, and they don't have to be put into a single program. Other users may well need to edit other kinds of XML documents, including XML schema and XSLT, but these users have moved to a technical role. They've got so much to learn that mastering a technician's editor is the least of their problems.

Eric,

Good article. I'd echo Evan's comments on this - you need to differentiate here between IDEs (such as Oxygen) and specialized editors. I'd also raise another concept here - one way of thinking about XForms is that it is in fact a way to create specialized "editors" that can be used to create complex XML documents without the need to write those documents explicitly.

This doesn't necessarily work that well with "document-like" XML - though if you bind XML nodes that also contain XHTML content you can use a WYSIWYG type local editor (such as FCKEditor) for creating this content and incorporating it into the larger XML document stream. This also gives you the ability to validate content, and the XForms 1.1 spec is sufficiently expanded to solve a number of the really critical problems associated with the older 1.0 spec.

I realize that this isn't the self-contained IDE approach that you were discussing earlier, though conceptually it's not that far from it... I think we've moved to a stage where the cost of building specialized standalone IDEs for various XML document types is simply not worth the effort for most schemas. Thus an XForms-like approach may be more typical of such IDEs in the future.

@ Kurt

It is funny you mention XForms, because that was the other option I considered. My basic use case was more along the lines of a person writing content (as they would in a word processor) in a way that is easily consumable. A great example is using FrameMaker and DITA to create a set of help documents. In this case I could see using something like FCKEditor and XForms, but it seems like that might fall short for a rather large document set. That said, you never know; the constraint could be beneficial.

@ tm

In one sense I totally agree that traditional users are not the best people to handle technical details such as creating a schema. The problem is that too often people such as technical writers are forced into this role. It is very rare for a writing team to have a budget for a full-time developer, which means the writers often have to step up to the plate.

With regard to how the system would actually use the content, this problem is already being handled (for better or worse) by CMS companies. Documentum is a good example, although, this is another space that I believe could be greatly improved. I've also seen writing teams using simple version control such as Subversion and Mercurial along with a basic shared filesystem. Rarely does this end up being an ideal situation, but nonetheless, the pieces are usually in place.

I am curious, have you ever played with the OpenOffice.org implementation of XForms? I have quickly perused it and it looked very powerful. I hope to look at it more in the new year.

I have long wished for an xml editor that could essentially grab an xml file, look it over, and display it (without the angle brackets) in some kind of sensible structure that resembles the xml's structure but prettied-up. Then it would allow you to add/remove additional sibling elements to those that exist, editing their text and attribute values by compiling dropdown lists of existing values in matching sibling elements or allowing you to type new values.

A simple CSS file would go a long way toward making it pretty enough, and would be easy enough for a regular user to create.
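The dropdown part of that wish is fairly mechanical. As a minimal sketch (the key layout and the `"#text"` marker for element text are assumptions of this example, not any existing editor's API), an editor could scan the file once and collect, for each kind of sibling element, the attribute and text values already in use; those lists become the dropdown choices, with free typing still allowed for new values.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def sibling_value_choices(xml_text):
    """Map (parent tag, child tag, attribute name) to the sorted values already used.

    Element text is recorded under the hypothetical attribute name "#text".
    """
    root = ET.fromstring(xml_text)
    choices = defaultdict(set)
    for parent in root.iter():
        for child in parent:
            key = (parent.tag, child.tag)
            for name, value in child.attrib.items():
                choices[key + (name,)].add(value)
            text = (child.text or "").strip()
            if text:
                choices[key + ("#text",)].add(text)
    return {key: sorted(values) for key, values in choices.items()}

doc = """<catalog>
<item status="draft"><category>Pottery</category></item>
<item status="final"><category>Coins</category></item>
<item status="draft"><category>Pottery</category></item>
</catalog>"""
choices = sibling_value_choices(doc)
print(choices[("catalog", "item", "status")])     # ['draft', 'final']
print(choices[("item", "category", "#text")])     # ['Coins', 'Pottery']
```

Adding a fourth `item` would then offer `draft` or `final` for its status and `Coins` or `Pottery` for its category.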

As soon as you want to do more than just edit raw XML, both XML and XForms editing seem to get technical. It's like you have two types of tools: Notepad and MS Access. Most people are looking for something like Excel.

Great article. You've hit a nerve.

It seems that mostly those with mainly technical background meet in this discussion, and I hope that you do not mind me taking the chance once again, trying to explain the point of view of someone who wants to use the technology for a kind of work that is very different in content as well as in its approaches. The comments gave me a good opportunity to look for a better way to explain, I hope:

(1) Why do researchers in the humanities need more than ASCII?
(2) Why should a researcher without programming background get into the details of writing a schema?
(3) What kind of "editor" (developing environment vs. word processor) do we need?
(4) Which of the editors available adapts to these needs?

(1) The easy answer is: ASCII does not represent the writing systems of the many languages we (archaeologists, historians etc.) have to deal with.
The answer to the question of why we need XML is manifold:
- We collect data, quote literature, write text. With XML we can combine data, literature and text. (BibTeX is great, but not sufficient for people who work with several writing systems.)
- We write text for different output, with XML we can produce this output.
- We can mark up according to content -- I do this 100%, no more style commands -- which has many well known merits, but I will add one less well known: The document tree of a text marked up well is almost as informative as a translation, thus it becomes possible to read text in languages a researcher does not understand well.
- We can create lists, vocabularies, dictionaries etc. from several documents automatically.
- The Web expert knows that marked-up text can be searched better than plain text by search engines, but this is also true for manual searches on a personal computer, as well as for everyday work in our fields of study.
- We can comment on text and data, and I wish I had stored my data from the early '90s in XML rather than in tab- and return-delimited files: it is possible to add comments, corrections, etc. to the individual data, and then, for certain analyses, always extract the version needed. That is much easier than the many logs and copied versions of data files that fill the hard disk and make it difficult to find anything.
(additional remark:) Writing in a pure text editor's environment is fun, this is why I also wrote a mark-down version of the format, which can be converted via XSLT 2.0 when opened in an XML-editor.

(2) Creating something useful requires the knowledge of those who create the tool as well as of those who use it. If you build a house, you need an architect, a plumber, etc., but it will never be a good house without the knowledge of the family that wants to live in it.
Same with a data format (schema, ...) that may be fruitfully used in the humanities: The best solution is to build a team with XML specialists and researchers who share a certain knowledge and interests in the other side's problems.
XML is basically very simple; this is what makes it charming for people like me. When a schema is created it should be as simple as possible, so that we can exploit it fruitfully. This is another reason why people who are not experts should be heard throughout the process of creating a schema: to keep it simple.
As long as both sides do not see a reason to co-operate, or as long as there is no chance to build such a team for financial or structural reasons, there is no other way for an interested non-specialist than to try the best and create a schema that works for us and look for helping hands wherever they can be found.

(3) As can be seen from (1), we do not just need a simple word-processing environment that gets the brackets out of sight and adds a word count or so; we also need some of the special features of XML: combining data and text, creating a bibliography, selecting data for analysis, understanding text in unknown languages via markup. I have not even mentioned the various opportunities of working in different languages, writing systems, transcription systems, ideological systems etc.
An "XML editor" for the humanities should offer an interface for data and text creation similar to word processors and databases, and it should extend those interfaces with the possibility to filter parts of the documents and present them separately. But it should also offer features that create new documents out of collections or single source documents.

(4) So far I know only one environment that offers all these options and is affordable: Oxygen. I have not yet encountered a problem that could not be solved, in the worst case with a small workaround. Yesterday I tried the new free Serna Editor, which is amazingly fast, good-looking and focussed, but it lacks some of the features mentioned in (3). So my solution for the time being is to work on a framework for Oxygen that uses a schema (RELAX NG) for the humanities, and in the long term to try to write a user interface that is more focussed on that specific schema and the special needs of our field of study.

I hope I could make my point of view a bit more understandable, and apologize for being so long. I appreciate if both sides become more understanding, in order to get something good out of the opportunities XML offers not only in the world of technology.

I started working with SGML in late 1991, after many years of experience with printing and publishing. I have been working with SGML and XML ever since.

From the beginning, the user community wanted an authoring tool that made the technology invisible. They did not demand WYSIWYG, they just wanted the technology to be transparent. And like most software users, they did not want to have to learn a lot of technical knowledge in order to use structured markup authoring software.

However, it is not in the nature of software developers to want to hide something as cool as SGML or XML. They feel users just need more training in order to appreciate the technology as much as they do.

There is merit to this point of view -- structured markup IS really cool!

However, this attitude is not helpful to software users, who couldn't care less about the technology underlying the software they use. They just want to do their jobs without feeling stupid.

Here's an exchange between two users that may help illustrate the difference in thinking between software users and IT developers. These users -- both college educated, one with a Master's degree -- noticed a new button on the keyboard and reacted as follows:

User #1: What's this?
User #2: I don't know. Let's not try it.

Most IT developers, however, can't wait to try something new and don't realize that users do not share their point of view.

Structured markup authoring software makers responded to the user community by trying to make their products WYSIWYG. Some allow authors to do anything and fix problems during a later validation process. Some show document structure and element names without tags in the text. Some enable users to toggle tags on and off. Some combine some or all of the above.

But none of them has solved the problem because WYSIWYG is not the solution. Which should not be surprising -- WYSIWYG is last-century technology that is based on the typewriter.

The typewriter was a huge technological leap a century ago, similar to the PC in its time. But the day of the typewriter is long gone. Today's users understand the difference between input and output. Even Microsoft Word, the ubiquitous WYSIWYG tool, has a preview that is different from the input view.

So part of the solution to a user-friendly SGML/XML authoring interface is to stop trying to make it WYSIWYG.

The rest of the solution must eliminate the need for technical knowledge. Software users want an interface they can figure out with minimal instruction and a little trial and error. They do NOT want to have to learn about the underlying technology in order to successfully use the software.

It's a challenging problem, but not an insurmountable one.
