About Extracting Information from an HTML File

You can extract information from an HTML file by subclassing html.parser.HTMLParser and overriding its handle_*() methods. For example, this class extracts Open Graph metadata from a web page:

from html.parser import HTMLParser
import requests
from pprint import pprint

class OpenGraphParser(HTMLParser):
    OG_PROPERTIES = ["og:title", "og:type", "og:image", "og:url"]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.og_data = {}

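    def handle_starttag(self, tag, attrs):
        # HTMLParser already passes tag names in lower case
        if tag == "meta":
            # attrs is a list of (name, value) pairs
            attrs_dict = dict(attrs)
            if (
                (prop := attrs_dict.get("property"))
                and (content := attrs_dict.get("content"))
                and prop in self.OG_PROPERTIES
            ):
                # Store "og:title" under the key "title", and so on
                self.og_data[prop.replace("og:", "")] = content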

    def get_data(self):
        return self.og_data

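if __name__ == "__main__":
    # A timeout keeps the script from hanging on an unresponsive server
    response = requests.get(
        "https://www.djangotricks.com/tricks/3J96KxVxbApk/", timeout=10
    )
    response.raise_for_status()  # stop early on 4xx/5xx responses
    og_parser = OpenGraphParser()
    og_parser.feed(response.text)
    og_data = og_parser.get_data()
    pprint(og_data)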

The parser calls these methods repeatedly, once for each occurrence, so you can collect every match or search for a specific tag, text, character reference, or comment:

handle_startendtag(self, tag, attrs) - for each self-closing tag written XHTML-style, e.g. <br />; by default it calls handle_starttag() and then handle_endtag()
handle_starttag(self, tag, attrs) - for each opening tag
handle_endtag(self, tag) - for each closing tag
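handle_charref(self, name) - for each numeric character reference, e.g. &#129321; (🤩); called only if the parser is created with convert_charrefs=False
handle_entityref(self, name) - for each named entity reference, e.g. &euro; (€); called only if the parser is created with convert_charrefs=False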
handle_data(self, data) - for each piece of inner text, including inline scripts and styles
handle_comment(self, data) - for each HTML comment
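
To see which handler fires for which piece of markup, here is a minimal sketch that records every call as a tuple. The class name EventLogger, the helper log_events(), and the sample HTML are made up for illustration. The parser is created with convert_charrefs=False, because with the default (True) character and entity references in text are decoded and passed to handle_data() instead.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Record every handler call as an (event, payload) tuple."""

    def __init__(self):
        # convert_charrefs=False is needed for handle_charref() and
        # handle_entityref() to be called at all.
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_startendtag(self, tag, attrs):
        # Overriding this replaces the default behavior of calling
        # handle_starttag() followed by handle_endtag().
        self.events.append(("startend", tag))

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_charref(self, name):
        self.events.append(("charref", name))

    def handle_entityref(self, name):
        self.events.append(("entityref", name))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_comment(self, data):
        self.events.append(("comment", data))


def log_events(html):
    """Hypothetical helper: parse a string and return the recorded events."""
    parser = EventLogger()
    parser.feed(html)
    parser.close()
    return parser.events


if __name__ == "__main__":
    sample = "<p>Price: 5 &euro; &#129321;<br/><!-- note --></p>"
    for event in log_events(sample):
        print(event)
```

Because handle_startendtag() falls back to handle_starttag() and handle_endtag() unless you override it, the OpenGraphParser above also catches meta tags written as <meta ... />.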
