PyMOTW: Parsing XML Documents with ElementTree

By Doug Hellmann
March 14, 2010 | Comments: 1

Parsing XML Documents with ElementTree

Parsed XML documents are represented in memory by ElementTree and Element objects connected into a tree structure based on the way the nodes in the XML document are nested.

Parsing an Entire Document

When you parse an entire document with parse(), an ElementTree instance is returned. The tree knows about all of the data in the input document, and the nodes of the tree can be searched or manipulated in place. While this flexibility can make working with the parsed document a little easier, it typically takes more memory than an event-based parsing approach since the entire document must be loaded at one time.

The memory footprint of small, simple documents such as this list of podcasts represented as an OPML outline is not significant:

<?xml version="1.0" encoding="UTF-8"?>
<opml version="1.0">
<head>
	<title>My Podcasts</title>
	<dateCreated>Sun, 07 Mar 2010 15:53:26 GMT</dateCreated>
	<dateModified>Sun, 07 Mar 2010 15:53:26 GMT</dateModified>
</head>
<body>
  <outline text="Science and Tech">
    <outline text="APM: Future Tense" type="rss" 
             xmlUrl="http://www.publicradio.org/columns/futuretense/podcast.xml" 
             htmlUrl="http://www.publicradio.org/columns/futuretense/" />
	<outline text="Engines Of Our Ingenuity Podcast" type="rss" 
             xmlUrl="http://www.npr.org/rss/podcast.php?id=510030" 
             htmlUrl="http://www.uh.edu/engines/engines.htm" />
	<outline text="Science &#38; the City" type="rss" 
             xmlUrl="http://www.nyas.org/Podcasts/Atom.axd" 
             htmlUrl="http://www.nyas.org/WhatWeDo/SciencetheCity.aspx" />
  </outline>
  <outline text="Books and Fiction">
	<outline text="Podiobooker" type="rss" 
             xmlUrl="http://feeds.feedburner.com/podiobooks" 
             htmlUrl="http://www.podiobooks.com/blog" />
	<outline text="The Drabblecast" type="rss" 
             xmlUrl="http://web.me.com/normsherman/Site/Podcast/rss.xml" 
             htmlUrl="http://web.me.com/normsherman/Site/Podcast/Podcast.html" />
	<outline text="tor.com / category / tordotstories" type="rss" 
             xmlUrl="http://www.tor.com/rss/category/TorDotStories" 
             htmlUrl="http://www.tor.com/" />
  </outline>
  <outline text="Computers and Programming">
	<outline text="MacBreak Weekly" type="rss" 
             xmlUrl="http://leo.am/podcasts/mbw" 
             htmlUrl="http://twit.tv/mbw" />
	<outline text="FLOSS Weekly" type="rss" 
             xmlUrl="http://leo.am/podcasts/floss" 
             htmlUrl="http://twit.tv" />
	<outline text="Core Intuition" type="rss" 
             xmlUrl="http://www.coreint.org/podcast.xml" 
             htmlUrl="http://www.coreint.org/" />
  </outline>
  <outline text="Python">
    <outline text="PyCon Podcast" type="rss" 
             xmlUrl="http://advocacy.python.org/podcasts/pycon.rss" 
             htmlUrl="http://advocacy.python.org/podcasts/" />
	<outline text="A Little Bit of Python" type="rss" 
             xmlUrl="http://advocacy.python.org/podcasts/littlebit.rss" 
             htmlUrl="http://advocacy.python.org/podcasts/" />
	<outline text="Django Dose Everything Feed" type="rss" 
             xmlUrl="http://djangodose.com/everything/feed/" />
  </outline>
  <outline text="Miscelaneous">
	<outline text="dhellmann's CastSampler Feed" type="rss" 
             xmlUrl="http://www.castsampler.com/cast/feed/rss/dhellmann/" 
             htmlUrl="http://www.castsampler.com/users/dhellmann/" />
  </outline>
</body>
</opml>

To parse the file, pass an open file handle to parse(). It will read the data, parse the XML, and return an ElementTree object.

from xml.etree import ElementTree

with open('podcasts.opml', 'rt') as f:
    tree = ElementTree.parse(f)

print tree
$ python ElementTree_parse_opml.py
<xml.etree.ElementTree.ElementTree instance at 0x82f58>

Traversing the Parsed Tree

Now that we have a parsed XML tree, we can iterate over it, visiting all of the children in order and examining their attributes and contents.

from xml.etree import ElementTree

with open('podcasts.opml', 'rt') as f:
    tree = ElementTree.parse(f)

for node in tree.getiterator():
    print node.tag, node.attrib

Here we print the entire tree, one tag at a time.

$ python ElementTree_dump_opml.py
opml {'version': '1.0'}
head {}
title {}
dateCreated {}
dateModified {}
body {}
outline {'text': 'Science and Tech'}
outline {'xmlUrl': 'http://www.publicradio.org/columns/futuretense/podcast.xml', 'text': 'APM: Future Tense', 'type': 'rss', 'htmlUrl': 'http://www.publicradio.org/columns/futuretense/'}
outline {'xmlUrl': 'http://www.npr.org/rss/podcast.php?id=510030', 'text': 'Engines Of Our Ingenuity Podcast', 'type': 'rss', 'htmlUrl': 'http://www.uh.edu/engines/engines.htm'}
outline {'xmlUrl': 'http://www.nyas.org/Podcasts/Atom.axd', 'text': 'Science & the City', 'type': 'rss', 'htmlUrl': 'http://www.nyas.org/WhatWeDo/SciencetheCity.aspx'}
outline {'text': 'Books and Fiction'}
outline {'xmlUrl': 'http://feeds.feedburner.com/podiobooks', 'text': 'Podiobooker', 'type': 'rss', 'htmlUrl': 'http://www.podiobooks.com/blog'}
outline {'xmlUrl': 'http://web.me.com/normsherman/Site/Podcast/rss.xml', 'text': 'The Drabblecast', 'type': 'rss', 'htmlUrl': 'http://web.me.com/normsherman/Site/Podcast/Podcast.html'}
outline {'xmlUrl': 'http://www.tor.com/rss/category/TorDotStories', 'text': 'tor.com / category / tordotstories', 'type': 'rss', 'htmlUrl': 'http://www.tor.com/'}
outline {'text': 'Computers and Programming'}
outline {'xmlUrl': 'http://leo.am/podcasts/mbw', 'text': 'MacBreak Weekly', 'type': 'rss', 'htmlUrl': 'http://twit.tv/mbw'}
outline {'xmlUrl': 'http://leo.am/podcasts/floss', 'text': 'FLOSS Weekly', 'type': 'rss', 'htmlUrl': 'http://twit.tv'}
outline {'xmlUrl': 'http://www.coreint.org/podcast.xml', 'text': 'Core Intuition', 'type': 'rss', 'htmlUrl': 'http://www.coreint.org/'}
outline {'text': 'Python'}
outline {'xmlUrl': 'http://advocacy.python.org/podcasts/pycon.rss', 'text': 'PyCon Podcast', 'type': 'rss', 'htmlUrl': 'http://advocacy.python.org/podcasts/'}
outline {'xmlUrl': 'http://advocacy.python.org/podcasts/littlebit.rss', 'text': 'A Little Bit of Python', 'type': 'rss', 'htmlUrl': 'http://advocacy.python.org/podcasts/'}
outline {'xmlUrl': 'http://djangodose.com/everything/feed/', 'text': 'Django Dose Everything Feed', 'type': 'rss'}
outline {'text': 'Miscelaneous'}
outline {'xmlUrl': 'http://www.castsampler.com/cast/feed/rss/dhellmann/', 'text': "dhellmann's CastSampler Feed", 'type': 'rss', 'htmlUrl': 'http://www.castsampler.com/users/dhellmann/'}

If we wanted to print only the groups of names and feed URLs for the podcasts, leaving out of all of the data in the header section, we could iterate over only just the outline nodes and print the text and xmlUrl attributes.

from xml.etree import ElementTree

with open('podcasts.opml', 'rt') as f:
    tree = ElementTree.parse(f)

for node in tree.getiterator('outline'):
    name = node.attrib.get('text')
    url = node.attrib.get('xmlUrl')
    if name and url:
        print '  %s :: %s' % (name, url)
    else:
        print name

Because we passed 'outline' to tree.getiterator() processing is limited to only nodes with the tag 'outline'.

$ python ElementTree_show_feed_urls.py
Science and Tech
  APM: Future Tense :: http://www.publicradio.org/columns/futuretense/podcast.xml
  Engines Of Our Ingenuity Podcast :: http://www.npr.org/rss/podcast.php?id=510030
  Science & the City :: http://www.nyas.org/Podcasts/Atom.axd
Books and Fiction
  Podiobooker :: http://feeds.feedburner.com/podiobooks
  The Drabblecast :: http://web.me.com/normsherman/Site/Podcast/rss.xml
  tor.com / category / tordotstories :: http://www.tor.com/rss/category/TorDotStories
Computers and Programming
  MacBreak Weekly :: http://leo.am/podcasts/mbw
  FLOSS Weekly :: http://leo.am/podcasts/floss
  Core Intuition :: http://www.coreint.org/podcast.xml
Python
  PyCon Podcast :: http://advocacy.python.org/podcasts/pycon.rss
  A Little Bit of Python :: http://advocacy.python.org/podcasts/littlebit.rss
  Django Dose Everything Feed :: http://djangodose.com/everything/feed/
Miscelaneous
  dhellmann's CastSampler Feed :: http://www.castsampler.com/cast/feed/rss/dhellmann/

Finding Nodes in a Document

Walking the entire tree yourself like this searching for relevant nodes can be error prone. In the example above, we had to look at each outline node to determine if it was a group (nodes with only a “text” attribute) or podcast (with both “text” and “xmlUrl”). If we were writing a podcast downloader and needed to produce a simple list of the podcast feed URLs, without names or groups, we might simplify the logic using findall() to look for nodes with more descriptive search characteristics.

A first pass at converting the above example might construct an XPath argument to look for all outline nodes.

from xml.etree import ElementTree

with open('podcasts.opml', 'rt') as f:
    tree = ElementTree.parse(f)

for node in tree.findall('.//outline'):
    url = node.attrib.get('xmlUrl')
    if url:
        print url

The logic in this version is not substantially different than the version using getiterator(). We still have to check for the presence of the URL, except that we don’t print the group name when the URL is not found.

$ python ElementTree_find_feeds_by_tag.py
http://www.publicradio.org/columns/futuretense/podcast.xml
http://www.npr.org/rss/podcast.php?id=510030
http://www.nyas.org/Podcasts/Atom.axd
http://feeds.feedburner.com/podiobooks
http://web.me.com/normsherman/Site/Podcast/rss.xml
http://www.tor.com/rss/category/TorDotStories
http://leo.am/podcasts/mbw
http://leo.am/podcasts/floss
http://www.coreint.org/podcast.xml
http://advocacy.python.org/podcasts/pycon.rss
http://advocacy.python.org/podcasts/littlebit.rss
http://djangodose.com/everything/feed/
http://www.castsampler.com/cast/feed/rss/dhellmann/

Another version can take advantage of the fact that we know the outline nodes are only nested two levels deep. If we change the search path to .//outline/outline we will process only the second level of outline nodes.

from xml.etree import ElementTree

with open('podcasts.opml', 'rt') as f:
    tree = ElementTree.parse(f)

for node in tree.findall('.//outline/outline'):
    url = node.attrib.get('xmlUrl')
    print url

We expect all of those outline nodes nested 2 levels deep in the input will have the xmlURL attribute refering to the podcast feed, so if we were brave we could skip checking for for the attribute before using it.

$ python ElementTree_find_feeds_by_structure.py
http://www.publicradio.org/columns/futuretense/podcast.xml
http://www.npr.org/rss/podcast.php?id=510030
http://www.nyas.org/Podcasts/Atom.axd
http://feeds.feedburner.com/podiobooks
http://web.me.com/normsherman/Site/Podcast/rss.xml
http://www.tor.com/rss/category/TorDotStories
http://leo.am/podcasts/mbw
http://leo.am/podcasts/floss
http://www.coreint.org/podcast.xml
http://advocacy.python.org/podcasts/pycon.rss
http://advocacy.python.org/podcasts/littlebit.rss
http://djangodose.com/everything/feed/
http://www.castsampler.com/cast/feed/rss/dhellmann/

This version is limited to our existing structure, though, so if the outline nodes are ever rearranged into a deeper tree it will stop working.

Parsed Node Attributes

The items returned by findall() and getiterator() are Element objects, each representing a node in the XML parse tree. Each Element has attributes for accessing data pulled out of the XML. This can be illustrated with a somewhat more contrived example input file, data.xml:

1
2
3
4
5
6
7
<?xml version="1.0" encoding="UTF-8"?>
<top>
  <child>This child contains text.</child>
  <child_with_tail>This child has regular text.</child_with_tail>And "tail" text.
  <with_attributes name="value" foo="bar" />
  <entity_expansion attribute="This &#38; That">That &#38; This</entity_expansion>
</top>

The “attributes” of a node are available in the attrib property, which acts like a dictionary.

from xml.etree import ElementTree

with open('data.xml', 'rt') as f:
    tree = ElementTree.parse(f)

node = tree.find('./with_attributes')
print node.tag
for name, value in sorted(node.attrib.items()):
    print '  %-4s = "%s"' % (name, value)
    

The node on line 5 of the input file has 2 attributes, name and foo.

$ python ElementTree_node_attributes.py
with_attributes
  foo  = "bar"
  name = "value"

The text content of the nodes is available, along with the “tail” text that comes after the end of a close tag.

from xml.etree import ElementTree

with open('data.xml', 'rt') as f:
    tree = ElementTree.parse(f)

for path in [ './child', './child_with_tail' ]:
    node = tree.find(path)
    print node.tag
    print '  child node text:', node.text
    print '  and tail text  :', node.tail

The child node on line 3 contains embedded text, and the node on line 4 has text with a tail (including any whitespace).

$ python ElementTree_node_text.py
child
  child node text: This child contains text.
  and tail text  :

child_with_tail
  child node text: This child has regular text.
  and tail text  : And "tail" text.

Conveniently, XML entity references embedded in the document are converted to the appropriate characters before values are returned.

from xml.etree import ElementTree

with open('data.xml', 'rt') as f:
    tree = ElementTree.parse(f)

node = tree.find('entity_expansion')
print node.tag
print '  in attribute:', node.attrib['attribute']
print '  in text     :', node.text

The conversion saves you from having to worry about an implementation detail of representing certain characters in an XML document.

$ python ElementTree_entity_references.py
entity_expansion
  in attribute: This & That
  in text     : That & This

Watching Events While Parsing

The other API useful for processing XML documents is event-based. The parser generates start events for opening tags and end events for closing tags. Iterating over the event stream lets you extract data from the document while parsing it, which is convenient if you don’t need to manipulate the entire document afterwards and if you want to avoid holding the entire parsed document in memory.

iterparse() returns an iterable that produces tuples containing the name of the event and the node triggering the event. Events can be one of:

start
A new tag has been encountered. The closing angle bracket of the tag was processed, but not the contents.
end
The closing angle bracket of a closing tag has been processed. All of the children were already processed.
start-ns
Start a namespace declaration.
end-ns
End a namespace declaration.
from xml.etree.ElementTree import iterparse

depth = 0
prefix_width = 8
prefix_dots = '.' * prefix_width
line_template = '{prefix:<0.{prefix_len}}{event:<8}{suffix:<{suffix_len}} {node.tag:<12} {node_id}'

for (event, node) in iterparse('podcasts.opml', ['start', 'end', 'start-ns', 'end-ns']):
    if event == 'end':
        depth -= 1

    prefix_len = depth * 2
    
    print line_template.format(prefix=prefix_dots,
                               prefix_len=prefix_len,
                               suffix='',
                               suffix_len=(prefix_width - prefix_len),
                               node=node,
                               node_id=id(node),
                               event=event,
                               )
    
    if event == 'start':
        depth += 1

By default, only end events are generated. To see other events, pass the list of event names you want to receive to iterparse(), as in this example:

$ python ElementTree_show_all_events.py
start            opml         876256
..start          head         876336
....start        title        888920
....end          title        888920
....start        dateCreated  889280
....end          dateCreated  889280
....start        dateModified 889320
....end          dateModified 889320
..end            head         876336
..start          body         889400
....start        outline      889560
......start      outline      889600
......end        outline      889600
......start      outline      889480
......end        outline      889480
......start      outline      889680
......end        outline      889680
....end          outline      889560
....start        outline      889720
......start      outline      889760
......end        outline      889760
......start      outline      889840
......end        outline      889840
......start      outline      889920
......end        outline      889920
....end          outline      889720
....start        outline      889880
......start      outline      890040
......end        outline      890040
......start      outline      890120
......end        outline      890120
......start      outline      890200
......end        outline      890200
....end          outline      889880
....start        outline      890240
......start      outline      890360
......end        outline      890360
......start      outline      890440
......end        outline      890440
......start      outline      890520
......end        outline      890520
....end          outline      890240
....start        outline      890640
......start      outline      890720
......end        outline      890720
....end          outline      890640
..end            body         889400
end              opml         876256

The event-style of processing may be more natural for some operations, such as converting XML input to some other format. For example, suppose we want to convert the list of podcasts we have been working with from an XML file to a data file we can load into a spreadsheet or database application. We don’t need to hold the entire data set in memory at a time, since we’re simply changing the format.

import csv
from xml.etree.ElementTree import iterparse
import sys

writer = csv.writer(sys.stdout, quoting=csv.QUOTE_NONNUMERIC)

group_name = ''

for (event, node) in iterparse('podcasts.opml', events=['start']):
    if node.tag != 'outline':
        # Ignore anything not part of the outline
        continue
    if not node.attrib.get('xmlUrl'):
        # Remember the current group
        group_name = node.attrib['text']
    else:
        # Output a podcast entry
        writer.writerow( (group_name, node.attrib['text'],
                          node.attrib['xmlUrl'],
                          node.attrib.get('htmlUrl', ''),
                          )
                         )

This example program converts our podcast list to a CSV file, ready to be imported into another application.

$ python ElementTree_write_podcast_csv.py
"Science and Tech","APM: Future Tense","http://www.publicradio.org/columns/futuretense/podcast.xml","http://www.publicradio.org/columns/futuretense/"
"Science and Tech","Engines Of Our Ingenuity Podcast","http://www.npr.org/rss/podcast.php?id=510030","http://www.uh.edu/engines/engines.htm"
"Science and Tech","Science & the City","http://www.nyas.org/Podcasts/Atom.axd","http://www.nyas.org/WhatWeDo/SciencetheCity.aspx"
"Books and Fiction","Podiobooker","http://feeds.feedburner.com/podiobooks","http://www.podiobooks.com/blog"
"Books and Fiction","The Drabblecast","http://web.me.com/normsherman/Site/Podcast/rss.xml","http://web.me.com/normsherman/Site/Podcast/Podcast.html"
"Books and Fiction","tor.com / category / tordotstories","http://www.tor.com/rss/category/TorDotStories","http://www.tor.com/"
"Computers and Programming","MacBreak Weekly","http://leo.am/podcasts/mbw","http://twit.tv/mbw"
"Computers and Programming","FLOSS Weekly","http://leo.am/podcasts/floss","http://twit.tv"
"Computers and Programming","Core Intuition","http://www.coreint.org/podcast.xml","http://www.coreint.org/"
"Python","PyCon Podcast","http://advocacy.python.org/podcasts/pycon.rss","http://advocacy.python.org/podcasts/"
"Python","A Little Bit of Python","http://advocacy.python.org/podcasts/littlebit.rss","http://advocacy.python.org/podcasts/"
"Python","Django Dose Everything Feed","http://djangodose.com/everything/feed/",""
"Miscelaneous","dhellmann's CastSampler Feed","http://www.castsampler.com/cast/feed/rss/dhellmann/","http://www.castsampler.com/users/dhellmann/"

Creating Your Own Tree Builder

A potentially more efficient means of handling parse events is to replace the standard tree builder behavior with your own. The ElementTree parser uses an XMLTreeBuilder to process the XML and call methods on a target class to save the results. The usual output is an ElementTree instance created by the default TreeBuilder class. By replacing TreeBuilder with your own class, you can receive the events before the Element nodes are instantiated, saving that portion of the overhead.

The XML-to-CSV app from the previous section can be translated to a tree builder.

import csv
from xml.etree.ElementTree import XMLTreeBuilder
import sys

class PodcastListToCSV(object):

    def __init__(self, outputFile):
        self.writer = csv.writer(outputFile, quoting=csv.QUOTE_NONNUMERIC)
        self.group_name = ''
        return

    def start(self, tag, attrib):
        if tag != 'outline':
            # Ignore anything not part of the outline
            return
        if not attrib.get('xmlUrl'):
            # Remember the current group
            self.group_name = attrib['text']
        else:
            # Output a podcast entry
            self.writer.writerow( (self.group_name, attrib['text'],
                                   attrib['xmlUrl'],
                                   attrib.get('htmlUrl', ''),
                                   )
                                  )

    def end(self, tag):
        # Ignore closing tags
        pass
    def data(self, data):
        # Ignore data inside nodes
        pass
    def close(self):
        # Nothing special to do here
        return


target = PodcastListToCSV(sys.stdout)
parser = XMLTreeBuilder(target=target)
with open('podcasts.opml', 'rt') as f:
    for line in f:
        parser.feed(line)
parser.close()

PodcastListToCSV implements the TreeBuilder protocol. Each time a new XML tag is encountered, start() is called with the tag name and attributes. When a closing tag is seen end() is called with the name. In between, data() is called when a node has content (the tree builder is expected to keep up with the “current” node). When all of the input is processed, close() is called. It can return a value, which will be returned to the user of the XMLTreeBuilder.

$ python ElementTree_podcast_csv_treebuilder.py
"Science and Tech","APM: Future Tense","http://www.publicradio.org/columns/futuretense/podcast.xml","http://www.publicradio.org/columns/futuretense/"
"Science and Tech","Engines Of Our Ingenuity Podcast","http://www.npr.org/rss/podcast.php?id=510030","http://www.uh.edu/engines/engines.htm"
"Science and Tech","Science & the City","http://www.nyas.org/Podcasts/Atom.axd","http://www.nyas.org/WhatWeDo/SciencetheCity.aspx"
"Books and Fiction","Podiobooker","http://feeds.feedburner.com/podiobooks","http://www.podiobooks.com/blog"
"Books and Fiction","The Drabblecast","http://web.me.com/normsherman/Site/Podcast/rss.xml","http://web.me.com/normsherman/Site/Podcast/Podcast.html"
"Books and Fiction","tor.com / category / tordotstories","http://www.tor.com/rss/category/TorDotStories","http://www.tor.com/"
"Computers and Programming","MacBreak Weekly","http://leo.am/podcasts/mbw","http://twit.tv/mbw"
"Computers and Programming","FLOSS Weekly","http://leo.am/podcasts/floss","http://twit.tv"
"Computers and Programming","Core Intuition","http://www.coreint.org/podcast.xml","http://www.coreint.org/"
"Python","PyCon Podcast","http://advocacy.python.org/podcasts/pycon.rss","http://advocacy.python.org/podcasts/"
"Python","A Little Bit of Python","http://advocacy.python.org/podcasts/littlebit.rss","http://advocacy.python.org/podcasts/"
"Python","Django Dose Everything Feed","http://djangodose.com/everything/feed/",""
"Miscelaneous","dhellmann's CastSampler Feed","http://www.castsampler.com/cast/feed/rss/dhellmann/","http://www.castsampler.com/users/dhellmann/"

Parsing Strings

To work with smaller bits of XML text, especially string literals as might be embedded in the source of a program, use xml.etree.ElementTree.XML and pass a single argument, the string containing the XML to be parsed.

from xml.etree.ElementTree import XML

parsed = XML('''
<root>
  <group>
    <child id="a">This is child "a".</child>
    <child id="b">This is child "b".</child>
  </group>
  <group>
    <child id="c">This is child "c".</child>
  </group>
</root>
''')

print 'parsed =', parsed

for elem in parsed.getiterator():
    print elem.tag
    if elem.text is not None and elem.text.strip():
        print '  text: "%s"' % elem.text
    if elem.tail is not None and elem.tail.strip():
        print '  tail: "%s"' % elem.tail
    for name, value in sorted(elem.attrib.items()):
        print '  %-4s = "%s"' % (name, value)
    print

Notice that unlike with parse(), the return value is an Element instance instead of an ElementTree.

$ python ElementTree_XML.py
parsed = <Element root at d4e40>
root

group

child
  text: "This is child "a"."
  id   = "a"

child
  text: "This is child "b"."
  id   = "b"

group

child
  text: "This is child "c"."
  id   = "c"

For structured XML that uses the “id” attribute to identify unique nodes of interest, XMLID() is a convenient way to access the parse results.

from xml.etree.ElementTree import XMLID

tree, id_map = XMLID('''
<root>
  <group>
    <child id="a">This is child "a".</child>
    <child id="b">This is child "b".</child>
  </group>
  <group>
    <child id="c">This is child "c".</child>
  </group>
</root>
''')

for key, value in sorted(id_map.items()):
    print '%s = %s' % (key, value)
    

XMLID() returns the parsed tree as an Element object, along with a dictionary mapping the id attribute strings to the individual nodes in the tree.

$ python ElementTree_XMLID.py
a = <Element child at d3eb8>
b = <Element child at d3d78>
c = <Element child at d9030>

See also

Outline Processor Markup Language, OPML
Dave Winer’s OPML specification and documentation.
XPath Support in ElementTree
Part of Fredrick Lundh’s original documentation for ElementTree.
csv
Read and write comma-separated-value files

PyMOTW Home

The canonical version of this article


You might also be interested in:

1 Comment

thanks for information..

News Topics

Recommended for You

Got a Question?