About Parsing Nodes with Namespaces in XML
If you use defusedxml (or lxml) to parse RSS or other XML documents, you need to be able to read values from namespaced nodes, for example <content:encoded>. You can do that by passing a dictionary that maps prefixes to namespace URIs to the find() or findall() methods, like this:
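from defusedxml.ElementTree import fromstring

# Map the prefixes used in the queries to the namespace URIs declared in the document.
namespaces = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# xml_string holds the raw XML document, for example a fetched RSS feed.
xml_doc = fromstring(xml_string)
for item in xml_doc.findall("channel/item"):
    encoded = item.find("content:encoded", namespaces)
    if encoded is not None:  # find() returns None when the item has no such node
        print(encoded.text)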
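To try the lookup without a real feed, here is a minimal self-contained sketch. The sample feed, the read_items() helper, and the fallback to the standard library parser are assumptions made for illustration only; the fallback just keeps the sketch runnable when defusedxml is not installed and should not be used on untrusted input.

```python
try:
    from defusedxml.ElementTree import fromstring
except ImportError:
    # Fallback for this sketch only; the stdlib parser is not hardened.
    from xml.etree.ElementTree import fromstring

NAMESPACES = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical two-item feed; the second item has no namespaced nodes.
SAMPLE_FEED = """<rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title>First post</title>
      <dc:creator>Jane</dc:creator>
      <content:encoded><![CDATA[<p>Hello</p>]]></content:encoded>
    </item>
    <item>
      <title>No body</title>
    </item>
  </channel>
</rss>"""


def read_items(xml_string):
    """Return one dict per <item>, with None for missing nodes."""
    root = fromstring(xml_string)
    rows = []
    for item in root.findall("channel/item"):
        rows.append({
            "title": item.findtext("title"),
            # findtext() returns the default instead of raising on a missing node.
            "creator": item.findtext("dc:creator", default=None, namespaces=NAMESPACES),
            "content": item.findtext("content:encoded", default=None, namespaces=NAMESPACES),
        })
    return rows


if __name__ == "__main__":
    for row in read_items(SAMPLE_FEED):
        print(row)
```

The prefixes in the dictionary are only local aliases: the match is made on the namespace URI, so the keys do not have to be the same prefixes the document uses. Without a dictionary, the same node can be addressed by writing the URI in braces, as in item.find("{http://purl.org/rss/1.0/modules/content/}encoded").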
XML namespaces are usually declared on the root node of the XML document using attributes with the xmlns prefix, for example:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
    xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
>
    <!-- ... -->
</rss>
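Since the prefixes are declared in the document itself, the dictionary can also be collected while parsing instead of being typed by hand. The sketch below is one possible approach, not something taken from the original note: it uses the standard library's iterparse() with the "start-ns" event, and the collect_namespaces() helper and the sample document are hypothetical. The standard library is used only to keep the example self-contained; for untrusted feeds a hardened parser is the safer choice.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document declaring two namespaces on the root node.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <dc:creator>Jane</dc:creator>
      <content:encoded>Hello</content:encoded>
    </item>
  </channel>
</rss>
"""


def collect_namespaces(xml_bytes):
    """Return a {prefix: uri} dict of the namespaces declared in the document."""
    namespaces = {}
    # With events=("start-ns",), each step yields ("start-ns", (prefix, uri)).
    for _event, (prefix, uri) in ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",)):
        # Keep the first declaration if a prefix is declared more than once.
        namespaces.setdefault(prefix, uri)
    return namespaces


if __name__ == "__main__":
    ns = collect_namespaces(SAMPLE)
    root = ET.fromstring(SAMPLE)
    print(ns)
    print(root.find("channel/item/dc:creator", ns).text)
```

The collected dictionary can be passed to find() and findall() in the same way as a hand-written one. The trade-off is that the queries then depend on whatever prefixes the feed author chose, so a hand-written mapping is more robust when feeds from different sources use different prefixes for the same URI.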