Two new drafts out at W3C from the HTML 5 effort: HTML 5: The Markup Language (hat-tip Micah) and HTML 5: A vocabulary and associated APIs for HTML and XHTML (hat-tip Jeni).
The first one is a model of the kind of standards-writing we need: I'd recommend any standards editor look at it as a model of a good solution to the problem they are trying to solve. It uses standard notations or makes simple objective statements that can be trivially implemented. In particular, see how easy it would be to implement its Assertions statements in Schematron: they are singular and objective. (I presume they spring from Henri Sivonen's validation work.)
The second one is much larger, and is where many of the fiddles of historical HTML applications go. So it is not surprising if it is a bit less crystalline than the markup language spec. Its contents are pretty good, though, which excuses a lot in a standard I suppose.
If I were to find fault with it, I think it has the XML Schemas Part 1 problem of laboriously spelling out every step in natural language text: this disguises patterns in the constraints that diagrams or schemas or tables expose, which increases the burden on the reader. (Furthermore, artificial languages can be more readily converted to code automatically.) These are engineering problems, and engineering has evolved a large set of diagramming techniques that should be used. You can link back to plain language descriptions, but it is dangerous to use language where less ambiguous notations are possible.
For example, why on earth don't they specify the parsing using either formal grammars or state diagrams or state tables? It is great that they actually do talk of state, but just using lists provides no certainty about missing transitions, for example.
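To make the point about missing transitions concrete, here is a minimal sketch of what a state table buys you. The states, input classes and transitions below are a hypothetical toy of my own, not the HTML 5 tokenizer: the only point is that once the transitions sit in a table, finding the gaps is a mechanical check rather than a careful reading of lists.

```python
from enum import Enum, auto
from itertools import product


class State(Enum):
    DATA = auto()
    TAG_OPEN = auto()
    TAG_NAME = auto()


class Input(Enum):
    LT = auto()     # the character '<'
    GT = auto()     # the character '>'
    OTHER = auto()  # any other character


# Hypothetical toy transition table (an assumption for illustration only).
TABLE = {
    (State.DATA, Input.LT): State.TAG_OPEN,
    (State.DATA, Input.GT): State.DATA,
    (State.DATA, Input.OTHER): State.DATA,
    (State.TAG_OPEN, Input.GT): State.DATA,
    (State.TAG_OPEN, Input.OTHER): State.TAG_NAME,
    (State.TAG_NAME, Input.GT): State.DATA,
    (State.TAG_NAME, Input.OTHER): State.TAG_NAME,
}


def missing_transitions(table):
    """Return every (state, input) pair that the table does not cover."""
    return [pair for pair in product(State, Input) if pair not in table]


for state, symbol in missing_transitions(TABLE):
    print(f"no transition defined for {state.name} on {symbol.name}")
```

Here the check reports that nothing says what happens on '<' in the TAG_OPEN and TAG_NAME states, which is exactly the kind of omission that is easy to overlook in a prose list.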
At least there is some levity. Here is the text for when in in-body insertion mode, whatever that is:
An end tag whose tag name is "sarcasm"
Take a deep breath, then act as described in the "any other end tag" entry below.
From the standards perspective, I think this may be a good approach for other specifications to follow: for the documents, a rigorous "minimum manual" approach using standard schema languages (or statements which are clearly trivially implementable in such), in particular RELAX NG, Schematron, XSD datatypes and EBNF. Then a separate specification giving semantics for a class of applications. This split is a continual tension in both the ODF and the OOXML standardization efforts, so I am glad to see the HTML 5 editorial approach. From his comments, I think Murata Makoto feels even more strongly about this than I do.
If you look at how difficult it is to draft standard text using required status terms like "shall" or "should", and how using other terms opens the door for abuse and malarkey, I often think that we should just ban natural language from standards. Of course that is too much: I think Schematron's approach, where you back up natural language assertions with executable tests, is much more practical.
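The pairing of assertion and test can be sketched outside Schematron too. The following is a rough Python analogue, not Schematron itself, and the two rules are hypothetical examples of mine rather than assertions taken from the HTML 5 drafts. Each rule carries a context, a plain-language statement, and a test that either holds or does not.

```python
import xml.etree.ElementTree as ET

# Each rule: (context element name, natural-language assertion, executable test).
# Both rules are illustrative assumptions, not text from any specification.
RULES = [
    ("img", "An img element must have a src attribute.",
     lambda el: "src" in el.attrib),
    ("ol", "An ol element must contain only li children.",
     lambda el: all(child.tag == "li" for child in el)),
]


def validate(xml_text):
    """Return the assertion text of every rule that fails, in document order."""
    root = ET.fromstring(xml_text)
    failures = []
    for el in root.iter():
        for context, message, test in RULES:
            if el.tag == context and not test(el):
                failures.append(message)
    return failures


sample = "<body><img/><ol><li>a</li><p>b</p></ol></body>"
for message in validate(sample):
    print("FAILED:", message)
```

The reader sees the sentence, the implementer runs the test, and there is no room to argue about what the sentence meant.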
Quibbles
W3C is also providing a good document: HTML 5 differences from HTML 4
s2.1 I think they still get it wrong by looking at the transport layer (e.g. the MIME header) to find the character encoding. APIs don't feed this, and it only works by accident.
s2.2 I don't see what the need for gratuitously departing from SGML and XML is, allowing <!doctype html> rather than <!DOCTYPE html>. Just make the lowercase version an optionally reportable, recoverable error and life would not be any different.
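What I have in mind for the doctype could look something like this sketch. The function name and the warning text are my own inventions, and the pattern is a deliberate simplification that only handles the short `html` doctype: the lowercase form is accepted, but it is reported.

```python
import re

# Simplified pattern (an assumption): only the short "html" doctype, any case.
DOCTYPE = re.compile(r"<!doctype\s+html\s*>", re.IGNORECASE)


def check_doctype(text):
    """Return (accepted, warnings) for the doctype at the start of text."""
    match = DOCTYPE.match(text.lstrip())
    if match is None:
        return False, ["missing or malformed doctype"]
    if not match.group(0).startswith("<!DOCTYPE"):
        # Recoverable: parsing carries on, but a checker may report it.
        return True, ["recoverable error: doctype keyword should be uppercase"]
    return True, []


print(check_doctype("<!doctype html><html></html>"))
print(check_doctype("<!DOCTYPE html><html></html>"))
```

Browsers would behave exactly as they do now; only tools that choose to report recoverable errors would say anything.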
I see finally ruby text is making it into HTML. Only a decade late.
Other welcome changes include more widgets (e.g. menu, canvas, time) and some simple page-oriented features (header and footer). I was particularly pleased to see that <hr> has been given a semantic (or at least, a rhetorical function), being a paragraph-level thematic break: a step in the right direction.
Jeni Tennison was impressed with the microdata section: it seems to point out something obvious (if you label data, you can use it for stuff) but perhaps it is there for better direction.
All in all, HTML 5 looks really exciting. They have started to simplify the grammar, which is good: I would prefer they went further (e.g. Editor's Concrete Syntax), but it is still friendlier than XML.
By the way, did you know there is an ISO HTML too? It was a profile of HTML 4 designed to allow HTML to be used in certain government situations and to provide a view of the technology more from the SGML angle: it was not an alternative to W3C HTML 4 but a service to users who needed HTML 4. The specification is online here (with corrigenda here) and a user guide here. A Japanese translation is available too.
One of the problems the user guide addressed was the lack of structural elements in HTML. It suggested using DIV1, DIV2, etc. elements at authoring time but stripping them out for delivery as HTML 4. So the extra structural elements in HTML 5 are interesting: <section>, <article> and <figure> will make HTML far more useful as a structured format and for round-tripping of structure information through HTML.
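For the curious, the stripping step is simple enough to sketch. This is my own rough illustration, not code from the user guide, and it assumes the authoring document is well-formed XML in which the DIVn wrappers contain only elements (any text sitting directly inside a wrapper is dropped).

```python
import re
import xml.etree.ElementTree as ET

# Matches the authoring-time wrappers DIV1, DIV2, and so on.
DIV_N = re.compile(r"^DIV\d+$", re.IGNORECASE)


def strip_structural_divs(parent):
    """Replace each DIVn child with its own children, recursively, in place."""
    flattened = []
    for child in list(parent):
        strip_structural_divs(child)          # flatten nested wrappers first
        if DIV_N.match(child.tag):
            flattened.extend(list(child))     # hoist the wrapper's children
        else:
            flattened.append(child)
    parent[:] = flattened
    return parent


source = ("<BODY><DIV1><H1>A</H1><DIV2><H2>B</H2><P>x</P></DIV2></DIV1>"
          "<P>y</P></BODY>")
root = strip_structural_divs(ET.fromstring(source))
print(ET.tostring(root, encoding="unicode"))
```

The structure information is thrown away on delivery, of course, which is exactly why elements that survive into the delivered HTML are the better answer.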