PyMOTW: Creating XML Documents with ElementTree

By Doug Hellmann
March 21, 2010

In addition to its parsing capabilities, ElementTree also supports creating well-formed XML documents from Element objects constructed in your application. The Element class used when a document is parsed also knows how to generate a serialized form of its contents, which can then be written to a file or other data stream.

Building Element Nodes

There are three helper functions useful for creating a hierarchy of Element nodes. Element() creates a standard node, SubElement() attaches a new node to a parent, and Comment() creates a node that serializes using XML’s comment syntax.

from xml.etree.ElementTree import Element, SubElement, Comment, tostring

top = Element('top')

comment = Comment('Generated for PyMOTW')
top.append(comment)

child = SubElement(top, 'child')
child.text = 'This child contains text.'

child_with_tail = SubElement(top, 'child_with_tail')
child_with_tail.text = 'This child has regular text.'
child_with_tail.tail = 'And "tail" text.'

child_with_entity_ref = SubElement(top, 'child_with_entity_ref')
child_with_entity_ref.text = 'This & that'

print tostring(top)

The output contains only the XML nodes in the tree, not the XML declaration with version and encoding.

$ python ElementTree_create.py
<top><!-- Generated for PyMOTW --><child>This child contains text.</child><child_with_tail>This child has regular text.</child_with_tail>And "tail" text.<child_with_entity_ref>This &amp; that</child_with_entity_ref></top>

Notice that the & character in the text of child_with_entity_ref is converted to the entity reference &amp; automatically.
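The same escaping applies to other reserved characters and to attribute values. The following is a minimal sketch written for Python 3 (the examples in this article use Python 2); the note attribute and the sample strings are made up for illustration. It also parses the result back to show that the original text survives the round trip.

```python
from xml.etree.ElementTree import Element, SubElement, fromstring, tostring

top = Element('top')

# 'note' is a hypothetical attribute used only for this illustration.
child = SubElement(top, 'child', {'note': 'a < b'})
child.text = 'This & that <tag>'

# encoding='unicode' asks Python 3 for a str instead of bytes.
xml_text = tostring(top, encoding='unicode')
print(xml_text)

# Parsing the serialized form restores the original characters.
round_trip = fromstring(xml_text).find('child')
print(round_trip.text)
print(round_trip.get('note'))
```

The application never needs to escape the values itself; doing so would cause the ampersands to be escaped a second time during serialization.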

Pretty-Printing XML

No effort is made by ElementTree to “pretty print” the output produced by tostring(), since adding extra whitespace changes the contents of the document. To make the output easier to follow for human readers, the rest of the examples below will use a tip I found online: re-parse the XML with xml.dom.minidom, then use its toprettyxml() method.

from xml.etree import ElementTree
from xml.dom import minidom

def prettify(elem):
    """Return a pretty-printed XML string for the Element.
    """
    rough_string = ElementTree.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="  ")

The updated example now looks like:

from xml.etree.ElementTree import Element, SubElement, Comment
from ElementTree_pretty import prettify

top = Element('top')

comment = Comment('Generated for PyMOTW')
top.append(comment)

child = SubElement(top, 'child')
child.text = 'This child contains text.'

child_with_tail = SubElement(top, 'child_with_tail')
child_with_tail.text = 'This child has regular text.'
child_with_tail.tail = 'And "tail" text.'

child_with_entity_ref = SubElement(top, 'child_with_entity_ref')
child_with_entity_ref.text = 'This & that'

print prettify(top)

and the output is easier to read:

$ python ElementTree_create_pretty.py
<?xml version="1.0" ?>
<top>
  <!-- Generated for PyMOTW -->
  <child>
    This child contains text.
  </child>
  <child_with_tail>
    This child has regular text.
  </child_with_tail>
  And &quot;tail&quot; text.
  <child_with_entity_ref>
    This &amp; that
  </child_with_entity_ref>
</top>

In addition to the extra whitespace for formatting, the xml.dom.minidom pretty-printer also adds an XML declaration to the output.
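As an alternative to re-parsing with minidom, newer interpreters can indent the tree in place. This is a sketch that assumes Python 3.9 or later, where xml.etree.ElementTree.indent() is available; it is a different technique from the prettify() helper above, not a replacement for it in the examples that follow. Unlike the minidom approach, it adds whitespace between elements only and leaves the text inside child and other untouched.

```python
from xml.etree.ElementTree import Element, SubElement, indent, tostring

top = Element('top')

child = SubElement(top, 'child')
child.text = 'This child contains text.'

other = SubElement(top, 'other')
other.text = 'More text.'

# indent() modifies the tree in place by adjusting the whitespace
# stored in the text and tail of the container and its children.
indent(top, space='  ')

pretty = tostring(top, encoding='unicode')
print(pretty)
```

Because indent() changes the tree itself, apply it to a copy if the unindented structure is still needed afterwards.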

Setting Element Properties

The previous example created nodes with tags and text content, but did not set any attributes of the nodes. Many of the examples from Parsing XML Documents with ElementTree worked with an OPML file listing podcasts and their feeds. The outline nodes in the tree used attributes for the group names and podcast properties. We can use ElementTree to construct a similar XML file from a CSV input file, setting all of the element attributes as the tree is constructed.

import csv
from xml.etree.ElementTree import Element, SubElement, Comment, tostring
import datetime
from ElementTree_pretty import prettify

generated_on = str(datetime.datetime.now())

# Configure one attribute with set()
root = Element('opml')
root.set('version', '1.0')

root.append(Comment('Generated by ElementTree_csv_to_xml.py for PyMOTW'))

head = SubElement(root, 'head')
title = SubElement(head, 'title')
title.text = 'My Podcasts'
dc = SubElement(head, 'dateCreated')
dc.text = generated_on
dm = SubElement(head, 'dateModified')
dm.text = generated_on

body = SubElement(root, 'body')

with open('podcasts.csv', 'rt') as f:
    current_group = None
    reader = csv.reader(f)
    for row in reader:
        group_name, podcast_name, xml_url, html_url = row
        if current_group is None or group_name != current_group.get('text'):
            # Start a new group
            current_group = SubElement(body, 'outline', {'text':group_name})
        # Add this podcast to the group,
        # setting all of its attributes at
        # once.
        podcast = SubElement(current_group, 'outline',
                             {'text':podcast_name,
                              'xmlUrl':xml_url,
                              'htmlUrl':html_url,
                              })

print prettify(root)

The attribute values can be configured one at a time with set() (as with the root node), or all at once by passing a dictionary to the node factory (as with each group and podcast node). Because the group name is stored in the text attribute of the outline node, it is read back with get('text') rather than through the text property, which holds the character data of the element.

$ python ElementTree_csv_to_xml.py
<?xml version="1.0" ?>
<opml version="1.0">
  <!-- Generated by ElementTree_csv_to_xml.py for PyMOTW -->
  <head>
    <title>
      My Podcasts
    </title>
    <dateCreated>
      2010-03-21 11:40:24.042726
    </dateCreated>
    <dateModified>
      2010-03-21 11:40:24.042726
    </dateModified>
  </head>
  <body>
    <outline text="Science and Tech">
      <outline htmlUrl="http://www.publicradio.org/columns/futuretense/" text="APM: Future Tense" xmlUrl="http://www.publicradio.org/columns/futuretense/podcast.xml"/>
      <outline htmlUrl="http://www.uh.edu/engines/engines.htm" text="Engines Of Our Ingenuity Podcast" xmlUrl="http://www.npr.org/rss/podcast.php?id=510030"/>
      <outline htmlUrl="http://www.nyas.org/WhatWeDo/SciencetheCity.aspx" text="Science &amp; the City" xmlUrl="http://www.nyas.org/Podcasts/Atom.axd"/>
    </outline>
    <outline text="Books and Fiction">
      <outline htmlUrl="http://www.podiobooks.com/blog" text="Podiobooker" xmlUrl="http://feeds.feedburner.com/podiobooks"/>
      <outline htmlUrl="http://web.me.com/normsherman/Site/Podcast/Podcast.html" text="The Drabblecast" xmlUrl="http://web.me.com/normsherman/Site/Podcast/rss.xml"/>
      <outline htmlUrl="http://www.tor.com/" text="tor.com / category / tordotstories" xmlUrl="http://www.tor.com/rss/category/TorDotStories"/>
    </outline>
    <outline text="Computers and Programming">
      <outline htmlUrl="http://twit.tv/mbw" text="MacBreak Weekly" xmlUrl="http://leo.am/podcasts/mbw"/>
      <outline htmlUrl="http://twit.tv" text="FLOSS Weekly" xmlUrl="http://leo.am/podcasts/floss"/>
      <outline htmlUrl="http://www.coreint.org/" text="Core Intuition" xmlUrl="http://www.coreint.org/podcast.xml"/>
    </outline>
    <outline text="Python">
      <outline htmlUrl="http://advocacy.python.org/podcasts/" text="PyCon Podcast" xmlUrl="http://advocacy.python.org/podcasts/pycon.rss"/>
      <outline htmlUrl="http://advocacy.python.org/podcasts/" text="A Little Bit of Python" xmlUrl="http://advocacy.python.org/podcasts/littlebit.rss"/>
      <outline htmlUrl="" text="Django Dose Everything Feed" xmlUrl="http://djangodose.com/everything/feed/"/>
    </outline>
    <outline text="Miscelaneous">
      <outline htmlUrl="http://www.castsampler.com/users/dhellmann/" text="dhellmann's CastSampler Feed" xmlUrl="http://www.castsampler.com/cast/feed/rss/dhellmann/"/>
    </outline>
  </body>
</opml>
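The different ways of setting attributes, and of reading one back, can be tried without an input file. The sketch below is written for Python 3 and uses an in-memory CSV; the sample rows and the example.com URLs are invented stand-ins for podcasts.csv. Note that an attribute named text is read with get('text'), while the text property of an Element holds its character data.

```python
import csv
import io
from xml.etree.ElementTree import Element, SubElement

# Hypothetical sample rows standing in for podcasts.csv:
# group name, podcast name, feed URL, home page URL.
sample_csv = io.StringIO(
    'Python,PyCon Podcast,http://example.com/pycon.rss,http://example.com/\n'
    'Python,A Little Bit of Python,http://example.com/lb.rss,http://example.com/\n'
    'Books,Podiobooker,http://example.com/books.rss,http://example.com/blog\n'
)

body = Element('body')
current_group = None

for group_name, podcast_name, xml_url, html_url in csv.reader(sample_csv):
    # Compare against the "text" attribute, not the .text property.
    if current_group is None or current_group.get('text') != group_name:
        # Attributes passed as a dictionary.
        current_group = SubElement(body, 'outline', {'text': group_name})
    # Attributes passed as keyword arguments, then one at a time.
    podcast = SubElement(current_group, 'outline', text=podcast_name)
    podcast.set('xmlUrl', xml_url)
    podcast.set('htmlUrl', html_url)

# Each group is a child of body; len() counts the podcasts inside it.
summary = [(group.get('text'), len(group)) for group in body.findall('outline')]
print(summary)
```

The check uses "is None" rather than the truth value of the element, because an Element with no children is treated as false in a boolean context.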

Serializing XML to a Stream

tostring() actually writes to an in-memory file-like object and then returns a string representing the entire element tree. When working with large amounts of data, it will take less memory and make more efficient use of the I/O libraries to write directly to a file handle using the write() method of ElementTree.

import sys
from xml.etree.ElementTree import Element, SubElement, Comment, ElementTree

top = Element('top')

comment = Comment('Generated for PyMOTW')
top.append(comment)

child = SubElement(top, 'child')
child.text = 'This child contains text.'

child_with_tail = SubElement(top, 'child_with_tail')
child_with_tail.text = 'This child has regular text.'
child_with_tail.tail = 'And "tail" text.'

child_with_entity_ref = SubElement(top, 'child_with_entity_ref')
child_with_entity_ref.text = 'This & that'

ElementTree(top).write(sys.stdout)

The example uses sys.stdout to write to the console, but it could also write to an open file or socket.

$ python ElementTree_write.py
<top><!-- Generated for PyMOTW --><child>This child contains text.</child><child_with_tail>This child has regular text.</child_with_tail>And "tail" text.<child_with_entity_ref>This &amp; that</child_with_entity_ref></top>
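Under Python 3 the stream type matters, because write() produces bytes unless told otherwise. The following sketch assumes Python 3 and uses io.BytesIO in place of an open binary file; it also asks for the XML declaration that is omitted by default. Writing to a text stream such as sys.stdout requires encoding='unicode'.

```python
import io
import sys
from xml.etree.ElementTree import Element, SubElement, ElementTree

top = Element('top')
child = SubElement(top, 'child')
child.text = 'This & that'

tree = ElementTree(top)

# A binary stream (an open file in 'wb' mode works the same way)
# receives encoded bytes, here with an XML declaration in front.
buffer = io.BytesIO()
tree.write(buffer, encoding='utf-8', xml_declaration=True)
data = buffer.getvalue()

# A text stream needs encoding='unicode' so that str is written.
tree.write(sys.stdout, encoding='unicode')
print()
```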

See also

Outline Processor Markup Language, OPML
Dave Winer’s OPML specification and documentation.

PyMOTW Home

The canonical version of this article

