About Parsing Nodes with Namespaces in XML
If you use defusedxml (or lxml) to parse RSS or other XML documents, you will often need to read values from namespaced nodes such as <content:encoded>. To do that, pass a dictionary that maps each prefix to its namespace URI as the second argument of find() or findall(), like this:
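from defusedxml.ElementTree import fromstring

# Map each prefix used in the lookups below to its namespace URI.
namespaces = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# xml_string holds the raw RSS/XML document.
xml_doc = fromstring(xml_string)
for item in xml_doc.findall("channel/item"):
    # The "content:" prefix is resolved through the namespaces dictionary.
    print(item.find("content:encoded", namespaces).text)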
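As a self-contained illustration of the lookup above, here is a minimal sketch that uses the standard library's xml.etree.ElementTree. The assumption is that fromstring() in defusedxml returns ordinary ElementTree elements, so the same calls should apply there. The SAMPLE_RSS feed, the read_items() helper and its field names are made up for this example. It uses findtext() so that an item without the namespaced node yields None instead of raising an AttributeError.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, made up for this sketch.
SAMPLE_RSS = """<rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title>First post</title>
      <dc:creator>Alice</dc:creator>
      <content:encoded>Hello, world</content:encoded>
    </item>
    <item>
      <title>No body</title>
    </item>
  </channel>
</rss>"""

# Prefix -> namespace URI, as in the snippet above.
NAMESPACES = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_items(xml_string):
    """Return one dict per <item>, with None for nodes that are missing."""
    root = ET.fromstring(xml_string)
    items = []
    for item in root.findall("channel/item"):
        items.append({
            # Plain (non-namespaced) child: no dictionary needed.
            "title": item.findtext("title"),
            # Namespaced children: the prefix is resolved via NAMESPACES.
            "author": item.findtext("dc:creator", namespaces=NAMESPACES),
            "body": item.findtext("content:encoded", namespaces=NAMESPACES),
        })
    return items


if __name__ == "__main__":
    for entry in read_items(SAMPLE_RSS):
        print(entry)
```

The prefixes in the dictionary are only local aliases: lookups match on the namespace URI, so the keys do not have to be the same prefixes the feed itself uses.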
XML namespaces are usually declared on the root node of the XML document, using attributes with the xmlns: prefix, for example:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
    xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
>
    <!-- ... -->
</rss>
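If you would rather not copy these declarations by hand, they can be collected from the document itself. The following is a rough sketch using the standard library's iterparse() with the "start-ns" event. The collect_namespaces() helper is a hypothetical name introduced here, and the sketch assumes the document is available as bytes. defusedxml.ElementTree also offers an iterparse() function, which should work the same way.

```python
import io
import xml.etree.ElementTree as ET


def collect_namespaces(xml_bytes):
    """Return a {prefix: uri} dict of the namespace declarations in a document."""
    namespaces = {}
    # With events=("start-ns",), iterparse yields ("start-ns", (prefix, uri))
    # for every xmlns declaration it encounters.
    for _event, (prefix, uri) in ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",)):
        # A default namespace (plain xmlns="...") is reported with prefix "".
        # If a prefix is declared more than once, the last declaration wins.
        namespaces[prefix] = uri
    return namespaces
```

The resulting dictionary has the same shape as the one passed to find() and findall(), so it can be used as the namespaces argument directly.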