I've been meaning to write an XSLT-based XHTML markup sanitizer for a while now, and tonight I discovered I needed it sooner rather than later. In case you find it useful, here it is:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:lookup="http://xameleon.org/lookup"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:html="http://www.w3.org/1999/xhtml"
    version="1.0" exclude-result-prefixes="html lookup">
  <!-- Whitelist: each entry is an allowed input element; @use is the element name to output. -->
  <lookup:html>
    <html:p use="p"/>
    <html:em use="em"/>
    <html:strong use="strong"/>
    <html:b use="strong"/>
    <html:i use="em"/>
    <html:blockquote use="blockquote"/>
    <html:cite use="cite"/>
  </lookup:html>
  <!-- document('') loads this stylesheet itself, which makes the table above queryable. -->
  <xsl:variable name="safe-elements" select="document('')//lookup:html/*"/>
  <!-- Wrap everything that survives in a single XHTML div. -->
  <xsl:template match="/">
    <div>
      <xsl:apply-templates mode="validate"/>
    </div>
  </xsl:template>
  <!-- Unwrap div: keep processing its children, but don't copy the div itself. -->
  <xsl:template match="html:div" mode="validate">
    <xsl:apply-templates select="*|text()" mode="validate"/>
  </xsl:template>
  <!-- Any other element: look up its local name. No match produces no output,
       so the element and everything inside it are dropped. -->
  <xsl:template match="*" mode="validate">
    <xsl:variable name="local-name" select="local-name()"/>
    <xsl:apply-templates select="$safe-elements[local-name() = $local-name]/@use" mode="safe">
      <xsl:with-param name="node" select="."/>
    </xsl:apply-templates>
  </xsl:template>
  <xsl:template match="text()" mode="validate">
    <!-- You could do some extended text matching here to remove any text seen as undesirable -->
    <xsl:value-of select="."/>
  </xsl:template>
  <!-- Context is the matched @use attribute; $node is the input element being replaced.
       Only its child elements and text are processed, so its attributes are never copied. -->
  <xsl:template match="@*" mode="safe">
    <xsl:param name="node"/>
    <xsl:element name="{.}">
      <xsl:apply-templates select="$node/*|$node/text()" mode="validate"/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
To avoid copy/pasting escaped markup, you can snag the same code from monoport.
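To adapt it to your specific needs, edit the //lookup:html table: each entry names an element that is okay to keep, and its use attribute gives the element name to map it to in the output. For example, html:b becomes html:strong, html:i becomes html:em, and so forth. An element that isn't in the table is dropped together with everything inside it. The one exception is html:div, which is unwrapped so that its children are still processed.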
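To try the table out without installing anything, the XSLT 1.0 processor that ships with the JDK (javax.xml.transform) should be enough. What follows is a minimal sketch, not part of the original post. The class name XhtmlSanitizer and the sample input are made up for illustration. It assumes Java 15+ for the text block, and it writes the stylesheet to a temporary file because document('') needs a URI for the stylesheet itself, which a stylesheet read from an in-memory string doesn't have.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical harness: runs the sanitizer stylesheet with the JDK's
// built-in XSLT 1.0 processor. Needs Java 15+ (text blocks).
class XhtmlSanitizer {
    // The stylesheet from this post, without its XML declaration.
    private static final String STYLESHEET = """
        <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:lookup="http://xameleon.org/lookup"
            xmlns="http://www.w3.org/1999/xhtml"
            xmlns:html="http://www.w3.org/1999/xhtml"
            version="1.0" exclude-result-prefixes="html lookup">
          <lookup:html>
            <html:p use="p"/>
            <html:em use="em"/>
            <html:strong use="strong"/>
            <html:b use="strong"/>
            <html:i use="em"/>
            <html:blockquote use="blockquote"/>
            <html:cite use="cite"/>
          </lookup:html>
          <xsl:variable name="safe-elements" select="document('')//lookup:html/*"/>
          <xsl:template match="/">
            <div><xsl:apply-templates mode="validate"/></div>
          </xsl:template>
          <xsl:template match="html:div" mode="validate">
            <xsl:apply-templates select="*|text()" mode="validate"/>
          </xsl:template>
          <xsl:template match="*" mode="validate">
            <xsl:variable name="local-name" select="local-name()"/>
            <xsl:apply-templates select="$safe-elements[local-name() = $local-name]/@use" mode="safe">
              <xsl:with-param name="node" select="."/>
            </xsl:apply-templates>
          </xsl:template>
          <xsl:template match="text()" mode="validate">
            <xsl:value-of select="."/>
          </xsl:template>
          <xsl:template match="@*" mode="safe">
            <xsl:param name="node"/>
            <xsl:element name="{.}">
              <xsl:apply-templates select="$node/*|$node/text()" mode="validate"/>
            </xsl:element>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static String sanitize(String xhtml) {
        Path xsl = null;
        try {
            // document('') re-reads the stylesheet through its own URI, so it is
            // loaded from a real file rather than from an in-memory string.
            xsl = Files.createTempFile("sanitize", ".xsl");
            Files.writeString(xsl, STYLESHEET);
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(xsl.toFile()));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (IOException | TransformerException e) {
            throw new IllegalStateException("sanitizing failed", e);
        } finally {
            try {
                if (xsl != null) Files.deleteIfExists(xsl);
            } catch (IOException ignored) {
                // best-effort cleanup of the temporary stylesheet
            }
        }
    }

    public static void main(String[] args) {
        String input = "<div xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<p onclick=\"alert(1)\">Hello <b>world</b></p>"
                + "<script>steal()</script><span>gone</span></div>";
        System.out.println(sanitize(input));
    }
}
```

Running it with java XhtmlSanitizer.java should print a single div holding the paragraph, with b renamed to strong. The onclick attribute, the script element and the span should all be gone, text included. The exact serialization, such as where namespace declarations land, can vary between XSLT processors.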
The stylesheet assumes all attributes are /evil/: no attribute from the input is ever copied to the output.
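If you do need a few attributes, one possible approach (a hypothetical extension, not something the original stylesheet does) is to let each lookup entry list the attributes it permits in a space-separated attrs attribute, and to copy only those in the safe-mode template. The sketch below runs the same kind of JDK harness with a trimmed three-entry table. The attrs attribute and the class name are illustrative assumptions. Values are copied verbatim, so a URL-valued attribute such as href would still need its value checked (rejecting javascript: URLs, for instance) before it could be allowed safely.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical variant of the sanitizer: a lookup entry may carry an "attrs"
// attribute listing (space-separated) the input attributes allowed to survive.
class AttributeWhitelistSanitizer {
    private static final String STYLESHEET = """
        <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:lookup="http://xameleon.org/lookup"
            xmlns="http://www.w3.org/1999/xhtml"
            xmlns:html="http://www.w3.org/1999/xhtml"
            version="1.0" exclude-result-prefixes="html lookup">
          <lookup:html>
            <html:p use="p"/>
            <html:b use="strong"/>
            <html:blockquote use="blockquote" attrs="cite"/>
          </lookup:html>
          <xsl:variable name="safe-elements" select="document('')//lookup:html/*"/>
          <xsl:template match="/">
            <div><xsl:apply-templates mode="validate"/></div>
          </xsl:template>
          <xsl:template match="html:div" mode="validate">
            <xsl:apply-templates select="*|text()" mode="validate"/>
          </xsl:template>
          <xsl:template match="*" mode="validate">
            <xsl:variable name="local-name" select="local-name()"/>
            <xsl:apply-templates select="$safe-elements[local-name() = $local-name]/@use" mode="safe">
              <xsl:with-param name="node" select="."/>
            </xsl:apply-templates>
          </xsl:template>
          <xsl:template match="text()" mode="validate">
            <xsl:value-of select="."/>
          </xsl:template>
          <xsl:template match="@*" mode="safe">
            <xsl:param name="node"/>
            <!-- ../@attrs is the entry's allow-list; pad with spaces so contains() matches whole names -->
            <xsl:variable name="allowed" select="concat(' ', ../@attrs, ' ')"/>
            <xsl:element name="{.}">
              <xsl:copy-of select="$node/@*[namespace-uri() = ''][contains($allowed, concat(' ', local-name(), ' '))]"/>
              <xsl:apply-templates select="$node/*|$node/text()" mode="validate"/>
            </xsl:element>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static String sanitize(String xhtml) {
        Path xsl = null;
        try {
            // Real file so that document('') can resolve the stylesheet's own URI.
            xsl = Files.createTempFile("sanitize-attrs", ".xsl");
            Files.writeString(xsl, STYLESHEET);
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(xsl.toFile()));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (IOException | TransformerException e) {
            throw new IllegalStateException("sanitizing failed", e);
        } finally {
            try {
                if (xsl != null) Files.deleteIfExists(xsl);
            } catch (IOException ignored) {
                // best-effort cleanup of the temporary stylesheet
            }
        }
    }

    public static void main(String[] args) {
        String input = "<div xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<blockquote cite=\"http://example.com/\" onmouseover=\"x()\">"
                + "<p title=\"t\">Quoted</p></blockquote></div>";
        System.out.println(sanitize(input));
    }
}
```

With the sample input, cite should survive on the blockquote, while onmouseover (not listed for blockquote) and title (nothing listed for p) are dropped. Padding both strings with spaces keeps contains() from matching partial attribute names.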
Enjoy!
