The xml module
The xml package is part of the Python standard library. In the following section we will focus on the two sub-modules xml.dom.minidom and xml.etree.ElementTree.
Working with minidom
In the following example we analyse books.xml:
    <?xml version="1.0"?>
    <catalog>
      <book id="1">
        <title>Python basics</title>
        <language>en</language>
        <author>Veit Schiele</author>
        <license>BSD-3-Clause</license>
        <date>2021-10-28</date>
      </book>
      <book id="2">
        <title>Jupyter Tutorial</title>
        <language>en</language>
        <author>Veit Schiele</author>
        <license>BSD-3-Clause</license>
        <date>2019-06-27</date>
      </book>
      <book id="3">
        <title>Jupyter Tutorial</title>
        <language>de</language>
        <author>Veit Schiele</author>
        <license>BSD-3-Clause</license>
        <date>2020-10-26</date>
      </book>
      <book id="4">
        <title>PyViz Tutorial</title>
        <language>en</language>
        <author>Veit Schiele</author>
        <license>BSD-3-Clause</license>
        <date>2020-04-13</date>
      </book>
    </catalog>
To do this, we first import the xml.dom.minidom module under the short name minidom so that it can be referenced more easily:

    import xml.dom.minidom as minidom
Then we define the function getTitles, parse the file and collect the desired XML elements with the method getElementsByTagName:

    def getTitles(xml):
        """
        Print all titles found in books.xml
        """
        # Parse the file into a DOM document
        doc = minidom.parse(xml)
        # Root element of the document (<catalog>)
        node = doc.documentElement
        # All <book> elements, at any depth
        books = doc.getElementsByTagName("book")
Then we create an empty list called titles, which is filled with the title elements:

        titles = []
        for book in books:
            # Take the first <title> element of each book
            titleObj = book.getElementsByTagName("title")[0]
            titles.append(titleObj)
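Now the text of each title is printed in nested for loops. The outer loop walks over the title elements, the inner loop over their child nodes, and only text nodes are printed:

        for title in titles:
            nodes = title.childNodes
            for node in nodes:
                # Skip anything that is not plain text
                if node.nodeType == node.TEXT_NODE:
                    print(node.data)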
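The check against TEXT_NODE matters because childNodes can hold more than text: an element that contains other elements usually also has whitespace text nodes between them. The following is a minimal sketch, assuming a small inline document instead of books.xml; the SAMPLE string and the helper describe_children are made up for illustration and are not part of the original example.

```python
import xml.dom.minidom as minidom

# Hypothetical inline sample, not the books.xml file from above.
SAMPLE = """<book id="1">
  <title>Python basics</title>
</book>"""


def describe_children(element):
    """Return (kind, value) pairs for the direct child nodes of element."""
    result = []
    for node in element.childNodes:
        if node.nodeType == node.TEXT_NODE:
            result.append(("text", node.data))
        elif node.nodeType == node.ELEMENT_NODE:
            result.append(("element", node.tagName))
    return result


doc = minidom.parseString(SAMPLE)
book = doc.documentElement
title = book.getElementsByTagName("title")[0]

# <book> mixes whitespace text nodes with the <title> element node
print(describe_children(book))
# <title> only contains the text node with the actual title
print(describe_children(title))
```

Filtering on nodeType therefore keeps the output limited to character data even if an element later gains child elements or comments.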
Finally, we check whether the __name__ variable equals "__main__", which is only the case when the module is executed as the main program. If so, we apply our getTitles function to our books.xml file:

    if __name__ == "__main__":
        document = "books.xml"
        getTitles(document)
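Printing inside the function makes the result hard to reuse elsewhere. As a variation, the following sketch returns the titles together with the id attribute of each book. The function name get_title_texts and the shortened inline SAMPLE are assumptions for illustration, and minidom.parseString is used in place of minidom.parse so that the example runs without a file.

```python
import xml.dom.minidom as minidom

# Hypothetical, shortened stand-in for books.xml
SAMPLE = """<?xml version="1.0"?>
<catalog>
  <book id="1"><title>Python basics</title></book>
  <book id="2"><title>Jupyter Tutorial</title></book>
</catalog>"""


def get_title_texts(xml_string):
    """Return a list of (book id, title text) tuples."""
    doc = minidom.parseString(xml_string)
    titles = []
    for book in doc.getElementsByTagName("book"):
        title = book.getElementsByTagName("title")[0]
        # Join all text nodes of the <title> element
        text = "".join(
            node.data
            for node in title.childNodes
            if node.nodeType == node.TEXT_NODE
        )
        titles.append((book.getAttribute("id"), text))
    return titles


print(get_title_texts(SAMPLE))
```

Returning plain tuples keeps the DOM objects out of the calling code, which only needs the strings.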
Parsing with ElementTree
Importing ElementTree:

    import xml.etree.ElementTree as ET

Note
The separate cElementTree module was deprecated in Python 3.3 and removed in Python 3.9. Since Python 3.3, xml.etree.ElementTree uses the fast C implementation automatically whenever it is available.

Then we define the function parseXML and fetch the root element:

    def parseXML(xml_file):
        """
        Parse XML with ElementTree
        """
        # Read and parse the file into an element tree
        tree = ET.ElementTree(file=xml_file)
        print(tree.getroot())
        root = tree.getroot()
        print(f"tag={root.tag}, attrib={root.attrib}")
    <Element 'catalog' at 0x10b009620>
    tag=catalog, attrib={}
Output the child elements of the root element and, for every book, the tags of its own children:

        for child in root:
            print(child.tag, child.attrib)
            if child.tag == "book":
                # Tags of the elements directly below <book>
                for step_child in child:
                    print(step_child.tag)
    book {'id': '1'}
    title
    language
    author
    license
    date
    book {'id': '2'}
    ...
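When only certain children are of interest, ElementTree also offers findall and findtext, which take a tag name or simple path instead of requiring a manual loop over all children. A minimal sketch, assuming a shortened inline catalogue; SAMPLE and the helper titles_by_language are hypothetical names introduced for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for books.xml
SAMPLE = """<catalog>
  <book id="1"><title>Python basics</title><language>en</language></book>
  <book id="3"><title>Jupyter Tutorial</title><language>de</language></book>
</catalog>"""


def titles_by_language(root, language):
    """Return the titles of all books written in the given language."""
    return [
        book.findtext("title")
        # findall("book") only matches direct children of root
        for book in root.findall("book")
        if book.findtext("language") == language
    ]


root = ET.fromstring(SAMPLE)
print(titles_by_language(root, "en"))
print(titles_by_language(root, "de"))
```

ET.fromstring returns the root element directly, so no getroot call is needed here.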
Output the contents of the elements with iter:

        print("-" * 20)
        print("Iterating using iter")
        print("-" * 20)
        # root.iter() yields the root element and all of its descendants
        books = root.iter()
        for book in books:
            # iter() on each element walks that element's whole subtree again
            book_children = book.iter()
            for book_child in book_children:
                print(f"{book_child.tag}={book_child.text}")
    --------------------
    Iterating using iter
    --------------------
    catalog=
    book=
    title=Python basics
    language=en
    author=Veit Schiele
    license=BSD-3-Clause
    date=2021-10-28
    book=
    title=Jupyter Tutorial
    ...
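Because iter() already visits an element and all of its descendants in document order, a single loop is enough to see every element once, and passing a tag name restricts the iteration to matching elements. The following sketch assumes a shortened inline catalogue; SAMPLE and iter_title_texts are illustrative names, not part of the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for books.xml
SAMPLE = """<catalog>
  <book id="1"><title>Python basics</title></book>
  <book id="2"><title>Jupyter Tutorial</title></book>
</catalog>"""


def iter_title_texts(root):
    """Yield the text of every <title> element at or below root."""
    for title in root.iter("title"):
        yield title.text


root = ET.fromstring(SAMPLE)

# One pass over the whole tree: each element appears exactly once
print([element.tag for element in root.iter()])

# Only the <title> elements
for text in iter_title_texts(root):
    print(text)
```

Nesting a second iter() call inside the loop, as in parseXML, revisits the subtrees and prints nested elements several times.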