Meta tags parser

Fast, modern, pure Python meta tags parser and snippet creator with full support for type annotations, py.typed in the base package, and structured output. No jelly dicts, only typed structures!
If you want to see what exactly social media snippets are, look at the example:

Requirements
Install
pip install meta-tags-parser
Usage
TL;DR
- Parse meta tags from source:
from meta_tags_parser import parse_meta_tags_from_source, structs
desired_result: structs.TagsGroup = parse_meta_tags_from_source("""... html source ...""")
- Parse meta tags from url:
from meta_tags_parser import parse_tags_from_url, parse_tags_from_url_async, structs
desired_result: structs.TagsGroup = parse_tags_from_url("https://xfenix.ru")
desired_result: structs.TagsGroup = await parse_tags_from_url_async("https://xfenix.ru")
- Parse social media snippet from source:
from meta_tags_parser import parse_snippets_from_source, structs
snippet_obj: structs.SnippetGroup = parse_snippets_from_source("""... html source ...""")
- Parse social media snippet from url:
from meta_tags_parser import parse_snippets_from_url, parse_snippets_from_url_async, structs
snippet_obj: structs.SnippetGroup = parse_snippets_from_url("https://xfenix.ru")
snippet_obj: structs.SnippetGroup = await parse_snippets_from_url_async("https://xfenix.ru")
Huge note: the *_from_url functions are written only for convenience and are very error-prone, so any reconnections and error handling are completely on your side.
Also, I don't want to add bloated requirements just to achieve robust connections, because many users may simply not expect any of this from the library. But if you really need this, write to me.
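Since reconnections are left to the caller, here is a minimal sketch of a retry wrapper you could put around any of the *_from_url functions. The helper name call_with_retries and its parameters are hypothetical and not part of the library. Which exceptions the *_from_url functions raise depends on the HTTP client underneath, so the default catches Exception broadly; narrow retry_on once you know what you actually see.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

ResultType = TypeVar("ResultType")


def call_with_retries(
    fetcher: Callable[[str], ResultType],
    url: str,
    attempts: int = 3,
    delay_seconds: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> ResultType:
    """Call fetcher(url), retrying on the given exceptions (hypothetical helper)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt_number in range(1, attempts + 1):
        try:
            return fetcher(url)
        except retry_on:
            if attempt_number == attempts:
                raise  # out of attempts: let the caller see the last error
            # Linear backoff: wait a little longer after each failure.
            time.sleep(delay_seconds * attempt_number)
    raise AssertionError("unreachable")
```

With the library installed, usage would look like call_with_retries(parse_tags_from_url, "https://xfenix.ru"). The same wrapper works for parse_snippets_from_url, because it only needs a callable that takes a URL.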
Basic snippets parsing
Let's say you want to extract a Twitter snippet from an HTML page:
from meta_tags_parser import parse_snippets_from_source, structs
my_result: structs.SnippetGroup = parse_snippets_from_source("""
<meta property="og:card" content="summary_large_image">
<meta property="og:url" content="https://github.com/">
<meta property="og:title" content="Hello, my friend">
<meta property="og:description" content="Content here, yehehe">
<meta property="twitter:card" content="summary_large_image">
<meta property="twitter:url" content="https://github.com/">
<meta property="twitter:title" content="Hello, my friend">
<meta property="twitter:description" content="Content here, yehehe">
""")
print(my_result)
"""
SnippetGroup(
    open_graph=SocialMediaSnippet(
        title='Hello, my friend',
        description='Content here, yehehe',
        image='',
        url='https://github.com/'
    ),
    twitter=SocialMediaSnippet(
        title='Hello, my friend',
        description='Content here, yehehe',
        image='',
        url='https://github.com/'
    )
)
"""
my_result.open_graph.title
my_result.twitter.image
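Pages often fill in only one of the two snippet groups, so a small fallback helper can be handy. This is a sketch only: the two dataclasses below are stand-ins that mirror the printed output above so the example runs without the library, and pick_field is a hypothetical helper, not a library function.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SocialMediaSnippet:  # stand-in mirroring the printed output above
    title: str = ""
    description: str = ""
    image: str = ""
    url: str = ""


@dataclass(frozen=True)
class SnippetGroup:  # stand-in mirroring the printed output above
    open_graph: SocialMediaSnippet = field(default_factory=SocialMediaSnippet)
    twitter: SocialMediaSnippet = field(default_factory=SocialMediaSnippet)


def pick_field(snippets: SnippetGroup, field_name: str, default: str = "") -> str:
    """Return the first non-empty value, preferring Open Graph over Twitter."""
    for one_snippet in (snippets.open_graph, snippets.twitter):
        value = getattr(one_snippet, field_name)
        if value:
            return value
    return default


demo_snippets = SnippetGroup(
    open_graph=SocialMediaSnippet(title="Hello, my friend", url="https://github.com/"),
    twitter=SocialMediaSnippet(title="Hello from Twitter", image="https://example.com/card.png"),
)
demo_title = pick_field(demo_snippets, "title")  # taken from open_graph
demo_image = pick_field(demo_snippets, "image")  # open_graph is empty, falls back to twitter
```

Because the helper only reads attributes by name, it should work the same way on the object returned by parse_snippets_from_source, assuming the fields match the output shown above.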
Basic meta tags parsing
The main function is parse_meta_tags_from_source. It can be used like this:
from meta_tags_parser import parse_meta_tags_from_source, structs
my_result: structs.TagsGroup = parse_meta_tags_from_source("""... html source ...""")
print(my_result)
"""
structs.TagsGroup(
    title="...",
    twitter=[
        structs.OneMetaTag(
            name="title", value="Hello",
            ...
        )
    ],
    open_graph=[
        structs.OneMetaTag(
            name="title", value="Hello",
            ...
        )
    ],
    basic=[
        structs.OneMetaTag(
            name="title", value="Hello",
            ...
        )
    ],
    other=[
        structs.OneMetaTag(
            name="article:name", value="Hello",
            ...
        )
    ]
)
"""
As you can see from this example, we are not using any jelly dicts, only structured dataclasses. Let's look at another example:
from meta_tags_parser import parse_meta_tags_from_source, structs
my_result: structs.TagsGroup = parse_meta_tags_from_source("""
<meta property="twitter:card" content="summary_large_image">
<meta property="twitter:url" content="https://github.com/">
<meta property="twitter:title" content="Hello, my friend">
<meta property="twitter:description" content="Content here, yehehe">
""")
print(my_result)
"""
TagsGroup(
    title='',
    basic=[],
    open_graph=[],
    twitter=[
        OneMetaTag(name='card', value='summary_large_image'),
        OneMetaTag(name='url', value='https://github.com/'),
        OneMetaTag(name='title', value='Hello, my friend'),
        OneMetaTag(name='description', value='Content here, yehehe')
    ],
    other=[]
)
"""
for one_tag in my_result.twitter:
    if one_tag.name == "title":
        print(one_tag.value)
"""
Hello, my friend
"""
If you want to improve speed
You can specify what you want to parse:
from meta_tags_parser import parse_meta_tags_from_source, structs
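# Assumes WhatToParse is exposed via structs; adjust if your version differs.
WhatToParse = structs.WhatToParse
result: structs.TagsGroup = parse_meta_tags_from_source("""... source ...""",
    what_to_parse=(WhatToParse.TITLE, WhatToParse.BASIC, WhatToParse.OPEN_GRAPH, WhatToParse.TWITTER, WhatToParse.OTHER)
)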
Reducing this tuple to only the groups you need may increase overall parsing speed.
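Whether a shorter what_to_parse tuple actually helps depends on your pages, so it is worth measuring. Below is a rough timing sketch built on the standard timeit module. The helper average_parse_seconds is hypothetical, and it accepts any callable that takes the HTML source, so it runs without the library.

```python
import timeit
from typing import Callable


def average_parse_seconds(parser: Callable[[str], object], html_source: str, runs: int = 200) -> float:
    """Average wall-clock seconds per parser(html_source) call over `runs` calls."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    total_seconds = timeit.timeit(lambda: parser(html_source), number=runs)
    return total_seconds / runs
```

To compare, pass parse_meta_tags_from_source once as is and once wrapped with functools.partial and a narrower what_to_parse tuple, using the same HTML source for both runs. Treat the numbers as indicative only, since they vary with machine load.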
Important notes
- Any name in a meta tag (the name or property attribute) will be lowercased.
- I decided to strip og: and twitter: from the original attributes and let the dataclass structures carry this information. If the parser meets a meta tag with property og:name, it will be available in the my_result variable as one element of the list my_result.open_graph.
- The title of the page (e.g. <title>Something</title>) will be available as the string my_result.title (of course, you receive Something).
- «Standard» tags like title and description (check the full list in ./meta_tags_parser/structs.py, in the constant BASIC_META_TAGS) will be available as a list in my_result.basic.
- Other tags will be available as a list in the my_result.other attribute. Tag names will be preserved, unlike the og:/twitter: behaviour.
- If you want structured snippets, use the parse_snippets_from_source function.
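The naming rules above can be illustrated with a few lines of code. This is not the library's implementation, only a sketch of the behaviour as described in the notes. ASSUMED_BASIC_TAGS is a hypothetical two-item subset (the real list is BASIC_META_TAGS in structs.py), and classify_tag_name is a made-up name.

```python
from typing import Tuple

# Hypothetical subset; the real list lives in BASIC_META_TAGS in structs.py.
ASSUMED_BASIC_TAGS = frozenset({"title", "description"})
PREFIX_TO_GROUP = {"og:": "open_graph", "twitter:": "twitter"}


def classify_tag_name(raw_name: str) -> Tuple[str, str]:
    """Return (group, stored_name) for a meta tag's name or property attribute."""
    lowered = raw_name.lower()  # names are lowercased first
    for prefix, group in PREFIX_TO_GROUP.items():
        if lowered.startswith(prefix):
            # og: and twitter: are stripped; the group carries that information.
            return group, lowered[len(prefix):]
    if lowered in ASSUMED_BASIC_TAGS:
        return "basic", lowered
    # Anything else keeps its full name, prefix included.
    return "other", lowered
```

For example, classify_tag_name("og:Title") gives ("open_graph", "title"), while classify_tag_name("article:name") gives ("other", "article:name"), matching the difference between stripped and preserved names described in the notes.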
Changelog
See the release page: https://github.com/xfenix/meta-tags-parser/releases/