The `xml` module

The `xml` package is part of the Python standard library. This section focuses on two of its submodules, `xml.dom.minidom` and `xml.etree.ElementTree`.

Working with `minidom`

In the following example we analyse books.xml:

<?xml version="1.0"?>
<catalog>
   <book id="1">
      <title>Python basics</title>
      <language>en</language>
      <author>Veit Schiele</author>
      <license>BSD-3-Clause</license>
      <date>2021-10-28</date>
   </book>
   <book id="2">
      <title>Jupyter Tutorial</title>
      <language>en</language>
      <author>Veit Schiele</author>
      <license>BSD-3-Clause</license>
      <date>2019-06-27</date>
   </book>
   <book id="3">
      <title>Jupyter Tutorial</title>
      <language>de</language>
      <author>Veit Schiele</author>
      <license>BSD-3-Clause</license>
      <date>2020-10-26</date>
   </book>
   <book id="4">
      <title>PyViz Tutorial</title>
      <language>en</language>
      <author>Veit Schiele</author>
      <license>BSD-3-Clause</license>
      <date>2020-04-13</date>
   </book>
</catalog>

To do this, we first import the module `xml.dom.minidom` under the shorter alias `minidom` so that it can be referenced more easily:
```
import xml.dom.minidom as minidom
```

Then we define the function getTitles and collect all book elements with the method getElementsByTagName:

def getTitles(xml):
    """
    Print all titles found in books.xml
    """
    # Parse the file into a DOM document
    doc = minidom.parse(xml)
    # Collect every <book> element, regardless of its depth in the tree
    books = doc.getElementsByTagName("book")

Then we create an empty list called titles, which is filled with the title objects:

    titles = []
    for book in books:
        titleObj = book.getElementsByTagName("title")[0]
        titles.append(titleObj)

Now the text of each title is printed in nested for loops:

    for title in titles:
        nodes = title.childNodes
        for node in nodes:
            if node.nodeType == node.TEXT_NODE:
                print(node.data)

Finally, we check whether `__name__` equals `"__main__"`, which is only the case when the module is run as a script and not when it is imported. In that case we apply getTitles to our books.xml file:
```
if __name__ == "__main__":
    document = "books.xml"
    getTitles(document)
```
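
The steps above can also be condensed into a function that returns its results instead of printing them. The following is only a minimal sketch: the name `get_titles` and the shortened `SAMPLE` document are assumptions made for illustration, and `minidom.parseString` is used in place of `minidom.parse` so that no file is needed.

```python
import xml.dom.minidom as minidom

# Hypothetical, shortened excerpt of books.xml so the sketch runs without a file
SAMPLE = """<?xml version="1.0"?>
<catalog>
   <book id="1">
      <title>Python basics</title>
      <language>en</language>
   </book>
   <book id="2">
      <title>Jupyter Tutorial</title>
      <language>en</language>
   </book>
</catalog>"""


def get_titles(xml_text):
    """Return (id, title) tuples for every book element."""
    doc = minidom.parseString(xml_text)
    result = []
    for book in doc.getElementsByTagName("book"):
        title = book.getElementsByTagName("title")[0]
        # Join all text nodes directly below <title>
        text = "".join(
            node.data
            for node in title.childNodes
            if node.nodeType == node.TEXT_NODE
        )
        # Attributes are read with getAttribute and are always strings
        result.append((book.getAttribute("id"), text))
    return result


print(get_titles(SAMPLE))
```

Returning a list rather than printing makes the function easier to reuse and to test.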

Parsing with ElementTree

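Importing ElementTree:
```
import xml.etree.ElementTree as ET
```
Note

The separate cElementTree module has been deprecated since Python 3.3 and was removed in Python 3.9. `xml.etree.ElementTree` uses the faster C implementation automatically whenever it is available.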

Then we define the function parseXML and read the root element:

def parseXML(xml_file):
    """
    Parse XML with ElementTree
    """
    tree = ET.ElementTree(file=xml_file)
    print(tree.getroot())
    root = tree.getroot()
    print(f"tag={root.tag}, attrib={root.attrib}")

<Element 'catalog' at 0x10b009620>
tag=catalog, attrib={}
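
XML that is already in memory can be parsed in much the same way. The following minimal sketch assumes a shortened, made-up excerpt of books.xml (`SAMPLE`) and shows `ET.fromstring` next to `ET.parse`:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, shortened excerpt of books.xml
SAMPLE = """<catalog>
   <book id="1">
      <title>Python basics</title>
   </book>
</catalog>"""

# fromstring() parses a string and returns the root element directly
root = ET.fromstring(SAMPLE)
print(f"tag={root.tag}, attrib={root.attrib}")

# parse() accepts a file name or a file-like object and returns an ElementTree
tree = ET.parse(io.StringIO(SAMPLE))
print(tree.getroot().tag)
```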

Output the children of the root element and, for each book, the tags of its child elements:

    for child in root:
        print(child.tag, child.attrib)
        if child.tag == "book":
            for step_child in child:
                print(step_child.tag)

book {'id': '1'}
title
language
author
license
date
book {'id': '2'}
...
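
Instead of looping over all children, elements can also be addressed by tag name. A minimal sketch, assuming a shortened sample document and a hypothetical helper `titles_by_language`; `findall`, `findtext` and `get` are regular `Element` methods:

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened excerpt of books.xml
SAMPLE = """<catalog>
   <book id="1">
      <title>Python basics</title>
      <language>en</language>
   </book>
   <book id="3">
      <title>Jupyter Tutorial</title>
      <language>de</language>
   </book>
</catalog>"""


def titles_by_language(root, language):
    """Return (id, title) tuples of all books written in the given language."""
    return [
        (book.get("id"), book.findtext("title"))
        for book in root.findall("book")  # direct children named <book>
        if book.findtext("language") == language
    ]


root = ET.fromstring(SAMPLE)
print(titles_by_language(root, "de"))
```

`findall("book")` only matches direct children of the element it is called on, which is sufficient here because every book sits directly below catalog.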

Output the tag and text of every element with iter, which walks the tree depth-first and starts with the element it is called on:

    print("-" * 20)
    print("Iterating using iter")
    print("-" * 20)
    # A single loop is enough: iter() already visits every element once
    for element in root.iter():
        # Container elements such as catalog and book hold only whitespace
        text = (element.text or "").strip()
        print(f"{element.tag}={text}")

--------------------
Iterating using iter
--------------------
catalog=
book=
title=Python basics
language=en
author=Veit Schiele
license=BSD-3-Clause
date=2021-10-28
book=
title=Jupyter Tutorial
...
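
iter also accepts a tag name, in which case only matching elements are visited. The following sketch uses this to turn each book into a dictionary; the helper `books_as_dicts` and the shortened `SAMPLE` document are assumptions made for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened excerpt of books.xml
SAMPLE = """<catalog>
   <book id="1">
      <title>Python basics</title>
      <date>2021-10-28</date>
   </book>
   <book id="2">
      <title>Jupyter Tutorial</title>
      <date>2019-06-27</date>
   </book>
</catalog>"""


def books_as_dicts(root):
    """Turn every <book> element into a dict of its id and child texts."""
    records = []
    for book in root.iter("book"):  # visit only elements tagged <book>
        record = {"id": book.get("id")}
        for child in book:
            record[child.tag] = child.text
        records.append(record)
    return records


root = ET.fromstring(SAMPLE)
for record in books_as_dicts(root):
    print(record)
```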
