webpreview
For a given URL, webpreview extracts its title, description, and image URL using
Open Graph, Twitter Card, or Schema meta tags, or, as an alternative, parses it
as a generic webpage.
Installation
pip install webpreview
Usage
Use the generic webpreview method (added in v1.7.0) to parse a page regardless of its type.
This method fetches a page and tries to extract a title, description, and a preview image from it.
It first attempts to parse the values from Open Graph properties, then it falls back to
Twitter Card format, and then to Schema. If none of these methods succeed in extracting all
three properties, then the web page's content is parsed using a generic HTML parser.
>>> from webpreview import webpreview
>>> p = webpreview("https://en.wikipedia.org/wiki/Enrico_Fermi")
>>> p.title
'Enrico Fermi - Wikipedia'
>>> p.description
'Italian-American physicist (1901–1954)'
>>> p.image
'https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Enrico_Fermi_1943-49.jpg/1200px-Enrico_Fermi_1943-49.jpg'
>>> p["url"] == p.url
True
>>> p.is_complete()
True
>>> content = """
<html>
<head>
<title>The Dormouse's story</title>
<meta property="og:description" content="A Mad Tea-Party story" />
</head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
</body>
</html>
"""
>>> webpreview("aa.com", content=content)
WebPreview(url="http://aa.com", title="The Dormouse's story", description="A Mad Tea-Party story")
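The fallback order described above can be illustrated with a small standard-library sketch. This is not the package's implementation: the names MetaCollector, LOOKUPS, and sketch_preview are hypothetical, the field-by-field merging across formats is an assumption, and the generic fallback here only looks at the title tag and the plain description meta tag.

```python
from html.parser import HTMLParser

FIELDS = ("title", "description", "image")

# Hypothetical lookup tables: (attribute name, attribute value) per field,
# listed in the fallback order described in the text.
LOOKUPS = [
    {f: ("property", "og:" + f) for f in FIELDS},    # Open Graph
    {f: ("name", "twitter:" + f) for f in FIELDS},   # Twitter Card
    {                                                # Schema (itemprop)
        "title": ("itemprop", "name"),
        "description": ("itemprop", "description"),
        "image": ("itemprop", "image"),
    },
]


class MetaCollector(HTMLParser):
    """Collects <meta> contents and the text of the first <title> tag."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("content"):
            # Index the content under every identifying attribute it carries.
            for kind in ("property", "name", "itemprop"):
                if attrs.get(kind):
                    self.meta.setdefault((kind, attrs[kind]), attrs["content"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and self.title is None:
            self.title = data.strip()


def sketch_preview(html):
    """Return a dict with title, description, and image (None if not found)."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    result = dict.fromkeys(FIELDS)
    for lookup in LOOKUPS:
        for field, key in lookup.items():
            if result[field] is None:
                result[field] = collector.meta.get(key)
        if all(result.values()):
            return result  # all three properties found, no further fallback
    # Generic fallback: plain <title> and <meta name="description">.
    if result["title"] is None:
        result["title"] = collector.title
    if result["description"] is None:
        result["description"] = collector.meta.get(("name", "description"))
    return result
```

Calling sketch_preview on the Dormouse HTML above would take the description from the Open Graph tag and the title from the generic fallback, leaving the image unset.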
Using the command line
When webpreview is installed via pip, the accompanying command-line tool is
installed alongside it.
$ webpreview https://en.wikipedia.org/wiki/Enrico_Fermi
title: Enrico Fermi - Wikipedia
description: Italian-American physicist (1901–1954)
image: https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Enrico_Fermi_1943-49.jpg/1200px-Enrico_Fermi_1943-49.jpg
$ webpreview https://github.com/ --absolute-url
title: GitHub: Where the world builds software
description: GitHub is where over 83 million developers shape the future of software, together.
image: https://github.githubassets.com/images/modules/site/social-cards/github-social.png
Using compatibility API
Before v1.7.0, the package mainly exposed a different set of API methods.
All of them are still supported and may continue to be used.
from webpreview import web_preview
title, description, image = web_preview("aurl.com")
title, description, image = web_preview("a_slow_url.com", timeout=1000)
headers = {'User-Agent': 'Mozilla/5.0'}
title, description, image = web_preview("a_slow_url.com", headers=headers)
content = """<html><head><title>Dummy HTML</title></head></html>"""
title, description, image = web_preview("aurl.com", content=content)
title, description, image = web_preview("aurl.com", content=content, parser='lxml')
Run with Docker
The Docker image can be built and run similarly to the command-line tool.
The default entry point is the webpreview command-line function.
$ docker build -t webpreview .
$ docker run -it --rm webpreview "https://en.m.wikipedia.org/wiki/Enrico_Fermi"
title: Enrico Fermi - Wikipedia
description: Enrico Fermi (Italian: [enˈriːko ˈfermi]; 29 September 1901 – 28 November 1954) was an Italian (later naturalized American) physicist and the creator of the world's first nuclear reactor, the Chicago Pile-1. He has been called the "architect of the nuclear age"[1] and the "architect of the atomic bomb".
image: https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Enrico_Fermi_1943-49.jpg/1200px-Enrico_Fermi_1943-49.jpg
Note: the built Docker image is around 210 MB.
Testing
# Execute the tests
poetry run pytest webpreview
# OR execute until the first failed test
poetry run pytest webpreview -x
Setting up development environment
# Install the minimum supported version of Python
pyenv install 3.7.13
# Create a virtual environment
# By default, the project already contains a .python-version file that points
# to 3.7.13.
python -m venv .venv
# Install dependencies
# Poetry will automatically install them into the local .venv
poetry install
# If you see an error like this:
#   ERROR: Can not execute `setup.py` since setuptools is not available in the build environment.
# then upgrade setuptools:
.venv/bin/pip install --upgrade setuptools