asgi-sitemaps
Sitemap generation for ASGI applications. Inspired by Django's sitemap framework.
Features
- Build and compose sitemap sections into a single dynamic ASGI endpoint.
- Supports drawing sitemap items from a variety of sources (static lists, (async) ORM queries, etc).
- Compatible with any ASGI framework.
- Fully type annotated.
- 100% test coverage.
Installation
Install with pip:
```console
$ pip install 'asgi-sitemaps==1.*'
```

asgi-sitemaps requires Python 3.7+.
Quickstart
Let's build a static sitemap for a "Hello, world!" application. The sitemap will contain a single URL entry for the home / endpoint.
Here is the project file structure:
```
.
└── server
    ├── __init__.py
    ├── app.py
    └── sitemap.py
```
First, declare a sitemap section by subclassing Sitemap, then wrap it in a SitemapApp:
```python
# server/sitemap.py
import asgi_sitemaps


class Sitemap(asgi_sitemaps.Sitemap):
    def items(self):
        return ["/"]

    def location(self, item: str):
        return item

    def changefreq(self, item: str):
        return "monthly"


sitemap = asgi_sitemaps.SitemapApp(Sitemap(), domain="example.io")
```
Now, register the sitemap endpoint as a route on your ASGI app. For example, if using Starlette:
```python
# server/app.py
from starlette.applications import Starlette
from starlette.responses import PlainTextResponse
from starlette.routing import Route

from .sitemap import sitemap


async def home(request):
    return PlainTextResponse("Hello, world!")


routes = [
    Route("/", home),
    Route("/sitemap.xml", sitemap),
]

app = Starlette(routes=routes)
```
Serve the app using $ uvicorn server.app:app, then request the sitemap:

```console
$ curl http://localhost:8000/sitemap.xml
<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.io/</loc>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
```
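To make the output above less magical, here is a hypothetical sketch of how a single entry could be rendered: the location "/" is appended to the protocol and the domain, changefreq is emitted only when set, and priority falls back to 0.5. The render_url helper is written for this README only; it is not part of the asgi-sitemaps API, and the library's actual rendering code may differ.

```python
from typing import Optional
from xml.sax.saxutils import escape


def render_url(
    protocol: str,
    domain: str,
    location: str,
    changefreq: Optional[str] = None,
    priority: float = 0.5,
) -> str:
    """Hypothetical helper: render one <url> element of a sitemap."""
    # The absolute URL is protocol + domain + path, e.g. http://example.io/
    loc = f"{protocol}://{domain}{location}"
    lines = ["<url>", f"  <loc>{escape(loc)}</loc>"]
    # Optional fields are left out entirely when they are None.
    if changefreq is not None:
        lines.append(f"  <changefreq>{changefreq}</changefreq>")
    lines.append(f"  <priority>{priority}</priority>")
    lines.append("</url>")
    return "\n".join(lines)


print(render_url("http", "example.io", "/", changefreq="monthly"))
```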
Tada!
To learn more:
- See How-To for more advanced usage, including splitting the sitemap into multiple sections and dynamically generating entries from database queries.
- See the Sitemap API reference for all supported sitemap options.
How-To
Sitemap sections
You can combine multiple sitemap classes into a single sitemap endpoint. This is useful to split the sitemap into multiple sections that may have different items() and/or sitemap attributes. Such sections could be static pages, blog posts, recent articles, etc.
To do so, declare multiple sitemap classes, then pass them as a list to SitemapApp:
```python
import asgi_sitemaps


class StaticSitemap(asgi_sitemaps.Sitemap):
    ...


class BlogSitemap(asgi_sitemaps.Sitemap):
    ...


sitemap = asgi_sitemaps.SitemapApp([StaticSitemap(), BlogSitemap()], domain="example.io")
```
Entries from each sitemap will be concatenated when building the final sitemap.xml.
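As a rough illustration of what "concatenated" means here, the sketch below uses two hypothetical stand-in classes (not asgi_sitemaps subclasses) that mirror the items()/location() interface. Each section keeps its own item type and its own location() logic, and the resulting paths are joined in the order the sections were passed.

```python
from typing import Iterable, List


class StaticSection:
    """Hypothetical stand-in for a StaticSitemap section."""

    def items(self) -> List[str]:
        return ["/", "/about"]

    def location(self, item: str) -> str:
        return item


class BlogSection:
    """Hypothetical stand-in for a BlogSitemap section (items are slugs)."""

    def items(self) -> List[str]:
        return ["hello-world", "second-post"]

    def location(self, slug: str) -> str:
        return f"/blog/{slug}"


def all_locations(sections: Iterable) -> List[str]:
    # Map each section's items through that section's own location(),
    # then append the results section by section.
    locations: List[str] = []
    for section in sections:
        locations.extend(section.location(item) for item in section.items())
    return locations


print(all_locations([StaticSection(), BlogSection()]))
```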
Dynamic generation from database queries
Sitemap.items() can be declared async, and it can return an async iterable. This means you can integrate with an async database client or ORM, so that Sitemap.items() fetches the rows used to generate your sitemap.
Here's an example using Databases, assuming you have a Database instance in server/resources.py:
```python
# server/sitemap.py
import asgi_sitemaps

from .resources import database


class Sitemap(asgi_sitemaps.Sitemap):
    async def items(self):
        query = "SELECT permalink, updated_at FROM articles;"
        return await database.fetch_all(query)

    def location(self, row: dict):
        return row["permalink"]
```
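fetch_all() loads every row into memory at once. Since items() may also be an async generator, one alternative for large tables is to page through rows in fixed-size batches and yield them one by one. In this sketch, fetch_batch is a hypothetical stand-in for your database client's query call (an in-memory list plays the role of the articles table), and items is written as a plain function; inside a Sitemap subclass it would be async def items(self).

```python
import asyncio
from typing import AsyncIterator, Dict, List

# Hypothetical in-memory "articles" table standing in for a real database.
ARTICLES: List[Dict[str, str]] = [{"permalink": f"/articles/{n}"} for n in range(1, 8)]


async def fetch_batch(offset: int, limit: int) -> List[Dict[str, str]]:
    """Assumed stand-in for a query like SELECT ... LIMIT :limit OFFSET :offset."""
    await asyncio.sleep(0)  # Simulate an I/O round trip.
    return ARTICLES[offset : offset + limit]


async def items(batch_size: int = 3) -> AsyncIterator[Dict[str, str]]:
    # Yield rows batch by batch instead of materializing the whole table.
    offset = 0
    while True:
        rows = await fetch_batch(offset, batch_size)
        if not rows:
            break
        for row in rows:
            yield row
        offset += len(rows)


async def main() -> List[str]:
    return [row["permalink"] async for row in items()]


print(asyncio.run(main()))
```

OFFSET-based paging is the simplest to show; on very large tables a keyset query (WHERE id > :last_id) is usually cheaper.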
Advanced web framework integration
While asgi-sitemaps is framework-agnostic, you can use the .scope attribute available on Sitemap instances to feed the ASGI scope into your framework-specific APIs for inspecting and manipulating request information.
Here is an example with Starlette where we build a sitemap of static pages. To decouple from the raw URL paths, pages are referred to by route name. We reverse-lookup their URLs by building a Request instance from the ASGI .scope and using .url_for():
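```python
import asgi_sitemaps
from starlette.datastructures import URL
from starlette.requests import Request


class StaticSitemap(asgi_sitemaps.Sitemap):
    def items(self):
        # Pages are referred to by route name rather than by raw URL path.
        return ["home", "about", "blog:home"]

    def location(self, name: str):
        # Build a Request from the current ASGI scope to reverse-lookup the URL.
        request = Request(scope=self.scope)
        url = request.url_for(name)
        # url_for() gives an absolute URL; keep only its path.
        # str() also covers Starlette versions where url_for() returns a URL object.
        return URL(str(url)).path
```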
The corresponding Starlette routing table could look something like this:
```python
from starlette.routing import Route

from . import views
from .sitemap import sitemap

routes = [
    Route("/", views.home, name="home"),
    Route("/about", views.about, name="about"),
    Route("/blog/", views.blog_home, name="blog:home"),
    Route("/sitemap.xml", sitemap),
]
```
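The location() above asks the framework for a full URL and keeps only its path, because location() must return a path rather than an absolute URL. The same two steps can be sketched with the standard library alone. ROUTES and url_for here are hypothetical stand-ins for Starlette's routing table and Request.url_for(), and the base URL reuses the example.io domain from the Quickstart.

```python
from typing import Dict
from urllib.parse import urlsplit

# Hypothetical name -> path table mirroring the routes above.
ROUTES: Dict[str, str] = {
    "home": "/",
    "about": "/about",
    "blog:home": "/blog/",
}


def url_for(name: str, base_url: str = "http://example.io") -> str:
    """Stand-in for a framework's reverse lookup; returns an absolute URL."""
    return base_url + ROUTES[name]


def location(name: str) -> str:
    # Drop scheme and host so that only the absolute path remains.
    return urlsplit(url_for(name)).path


print([location(name) for name in ["home", "about", "blog:home"]])
```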
API Reference
class Sitemap
Represents a source of sitemap entries.
You can specify the type T of sitemap items for extra type safety:

```python
import asgi_sitemaps


class MySitemap(asgi_sitemaps.Sitemap[str]):
    ...
```
async items
Signature: async def () -> Union[Iterable[T], AsyncIterable[T]]
(Required) Return an iterable or an asynchronous iterable of items of the same type. items() may be declared with either def or async def, as in the examples below. Each item will be passed as-is to .location(), .lastmod(), .changefreq(), and .priority().
Examples:
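```python
from typing import AsyncIterator, List

# `database` is assumed to be an async database client, e.g. a Databases `Database`.

# A regular method returning a list.
def items(self) -> List[str]:
    return ["/", "/contact"]

# A coroutine returning a list of rows.
async def items(self) -> List[dict]:
    query = "SELECT permalink, updated_at FROM pages;"
    return await database.fetch_all(query)

# An async generator yielding rows one at a time.
async def items(self) -> AsyncIterator[dict]:
    query = "SELECT permalink, updated_at FROM pages;"
    async for row in database.iterate(query):
        yield row
```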
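Because items() may be a regular method, a coroutine, or an async generator, whatever consumes it has to handle all three shapes. The collect() helper below is a hypothetical sketch of that normalization, written for this README; it is not asgi-sitemaps' internal code.

```python
import asyncio
import inspect
from typing import Any, AsyncIterator, Callable, List


async def collect(items: Callable[[], Any]) -> List[Any]:
    """Hypothetical: gather items from a sync, async, or async-generator items()."""
    result = items()
    # `async def items()` returning a list produces a coroutine: await it first.
    if inspect.isawaitable(result):
        result = await result
    # Async generators (and other async iterables) expose __aiter__.
    if hasattr(result, "__aiter__"):
        return [item async for item in result]
    # Anything else is treated as a regular iterable.
    return list(result)


def sync_items() -> List[str]:
    return ["/", "/contact"]


async def async_items() -> List[str]:
    return ["/", "/contact"]


async def asyncgen_items() -> AsyncIterator[str]:
    for path in ["/", "/contact"]:
        yield path


for source in (sync_items, async_items, asyncgen_items):
    print(asyncio.run(collect(source)))
```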
location
Signature: def (item: T) -> str
(Required) Return the absolute path of a sitemap item.
"Absolute path" means a URL path without a protocol or domain. For example: /blog/my-article. (So https://mydomain.com/blog/my-article is not a valid location, nor is mydomain.com/blog/my-article.)
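If your locations come from user-edited data, a quick check can catch full URLs or bare domains before they reach the sitemap. is_valid_location is a hypothetical helper that encodes the rule above using the three examples from this section; asgi-sitemaps itself may or may not validate locations.

```python
from urllib.parse import urlsplit


def is_valid_location(location: str) -> bool:
    """Hypothetical check: True only for a URL path such as "/blog/my-article"."""
    parts = urlsplit(location)
    # Reject anything carrying a scheme ("https:") or a host ("//mydomain.com").
    if parts.scheme or parts.netloc:
        return False
    # "mydomain.com/blog/..." parses as a relative path, so also require a leading slash.
    return location.startswith("/")


for candidate in ["/blog/my-article", "https://mydomain.com/blog/my-article", "mydomain.com/blog/my-article"]:
    print(candidate, is_valid_location(candidate))
```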
lastmod
Signature: def (item: T) -> Optional[datetime.datetime]
(Optional) Return the date of last modification of a sitemap item as a datetime object, or None (the default) for no lastmod field.
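A common pattern is to read lastmod straight from a timestamp column, such as the updated_at column selected in the database examples. The sketch below is a plain function over a hypothetical row dict rather than a Sitemap method; it returns None when the row has no timestamp, which omits the field. Treating naive timestamps as UTC is an assumption about your data, and how the library serializes the datetime is not covered here.

```python
import datetime
from typing import Any, Dict, Optional


def lastmod(row: Dict[str, Any]) -> Optional[datetime.datetime]:
    """Hypothetical lastmod(): the row's updated_at, or None to omit the field."""
    updated_at = row.get("updated_at")
    if updated_at is None:
        return None
    # Assumption: naive timestamps in this database are stored in UTC.
    if updated_at.tzinfo is None:
        updated_at = updated_at.replace(tzinfo=datetime.timezone.utc)
    return updated_at


row = {"permalink": "/blog/my-article", "updated_at": datetime.datetime(2020, 7, 5, 12, 30)}
print(lastmod(row).isoformat())
print(lastmod({"permalink": "/about"}))
```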
changefreq
Signature: def (item: T) -> Optional[str]
(Optional) Return the change frequency of a sitemap item.
Possible values are:
- None - No changefreq field (the default).
- "always"
- "hourly"
- "daily"
- "weekly"
- "monthly"
- "yearly"
- "never"
priority
Signature: def (item: T) -> float
(Optional) Return the priority of a sitemap item. Must be between 0 and 1. Defaults to 0.5.
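changefreq() and priority() receive the same item, so both can be derived from it. This sketch uses a hypothetical Page record with a URL path and a publication date; the policy (recent pages change daily, priority drops by 0.2 per path segment) is only an illustration, not a recommendation from asgi-sitemaps. It returns only documented changefreq values and keeps priority within 0 to 1. The current date is passed in explicitly to keep the output reproducible; a real method would take just the item and use datetime.date.today().

```python
import datetime
from dataclasses import dataclass

# Documented changefreq values (None, meaning "no field", is also allowed).
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


@dataclass
class Page:
    """Hypothetical sitemap item."""

    path: str
    published: datetime.date


def changefreq(page: Page, today: datetime.date) -> str:
    # Assumption: recently published pages change more often.
    age_days = (today - page.published).days
    if age_days < 7:
        return "daily"
    if age_days < 365:
        return "monthly"
    return "yearly"


def priority(page: Page) -> float:
    # 1.0 for "/", minus 0.2 per path segment, clamped into [0, 1].
    depth = len([segment for segment in page.path.split("/") if segment])
    return round(min(1.0, max(0.0, 1.0 - 0.2 * depth)), 1)


today = datetime.date(2022, 2, 13)
for page in [Page("/", datetime.date(2020, 7, 5)), Page("/blog/my-article", datetime.date(2022, 2, 10))]:
    print(page.path, changefreq(page, today), priority(page))
```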
protocol
Type: str
(Optional) This attribute defines the protocol used to build URLs of the sitemap.
Possible values are:
- "auto" - The protocol with which the sitemap was requested (the default).
- "http"
- "https"
scope
This property returns the ASGI scope of the current HTTP request.
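One way to picture how protocol and scope relate: with "auto", the scheme can be read from the ASGI HTTP scope, whose optional "scheme" key defaults to "http" per the ASGI spec. resolve_protocol is a hypothetical sketch, not the library's code. Behind a TLS-terminating proxy the app may see "http" even when clients use HTTPS, which is one reason to set protocol = "https" on your Sitemap subclass.

```python
from typing import Any, Dict


def resolve_protocol(protocol: str, scope: Dict[str, Any]) -> str:
    """Hypothetical: turn the protocol setting into the scheme used in <loc> URLs."""
    if protocol in ("http", "https"):
        # An explicit setting wins over whatever the request used.
        return protocol
    if protocol == "auto":
        # ASGI HTTP scopes may carry "scheme"; the spec's default is "http".
        return scope.get("scheme", "http")
    raise ValueError(f"unsupported protocol: {protocol!r}")


print(resolve_protocol("auto", {"type": "http", "scheme": "https"}))
print(resolve_protocol("https", {"type": "http", "scheme": "http"}))
```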
class SitemapApp
An ASGI application that responds to HTTP requests with the sitemap.xml contents of the sitemap.
Parameters:
- sitemaps (Required) - A Sitemap object or a list of Sitemap objects, used to generate sitemap entries.
- domain (Required) - The domain to use when generating sitemap URLs.
Examples:
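```python
from asgi_sitemaps import SitemapApp

# Sitemap, StaticSitemap and BlogSitemap are your own Sitemap subclasses.
sitemap = SitemapApp(Sitemap(), domain="mydomain.com")
sitemap = SitemapApp([StaticSitemap(), BlogSitemap()], domain="mydomain.com")
```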
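Since SitemapApp is a plain ASGI application, you can exercise it without running a server, for example in tests. The fetch() helper below is a hypothetical sketch that builds a minimal HTTP GET scope by hand and collects the response messages. It is demonstrated on a trivial stand-in app so that it runs on its own, and it assumes SitemapApp needs nothing beyond a standard HTTP scope; pass your sitemap instance instead of demo_app to try it.

```python
import asyncio
from typing import Any, Dict, List, Tuple


async def fetch(app: Any, path: str = "/sitemap.xml") -> Tuple[int, bytes]:
    """Hypothetical helper: call an ASGI app in-process with an HTTP GET request."""
    scope = {
        "type": "http",
        "asgi": {"version": "3.0"},
        "http_version": "1.1",
        "method": "GET",
        "scheme": "http",
        "path": path,
        "raw_path": path.encode(),
        "query_string": b"",
        "root_path": "",
        "headers": [(b"host", b"localhost")],
        "client": None,
        "server": ("localhost", 80),
    }
    messages: List[Dict[str, Any]] = []

    async def receive() -> Dict[str, Any]:
        # A single, empty request body.
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message: Dict[str, Any]) -> None:
        messages.append(message)

    await app(scope, receive, send)
    status = next(m["status"] for m in messages if m["type"] == "http.response.start")
    body = b"".join(m.get("body", b"") for m in messages if m["type"] == "http.response.body")
    return status, body


async def demo_app(scope, receive, send):
    # Trivial stand-in ASGI app used only to demonstrate fetch().
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"application/xml")]})
    await send({"type": "http.response.body", "body": b"<urlset/>"})


print(asyncio.run(fetch(demo_app)))
```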
License
MIT
Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog.
1.0 - 2022-02-13
Added
- Now marked as Production/Stable software. (Pull #14)
- Add official support for Python 3.9 and Python 3.10. (Pull #13)
0.3.2 - 2020-07-07
Fixed
- Fix support for async items. (Pull #9)
0.3.1 - 2020-07-05
Fixed
- Fix Scope type hint: values are now Any.
0.3.0 - 2020-07-05
This release changes the approach from "scrape the ASGI app to gather URLs" to a programmatic class-based API inspired by Django's sitemap framework.
As such, the command line application does not exist anymore. Users are expected to define Sitemap classes, compose them into a SitemapApp endpoint, and add that to their ASGI app routing table.
See the new README.md documentation for more information.
Changed
- Switch to a class-based dynamic endpoint API. (Pull #4)
0.2.0 - 2020-06-01
Changed
- Project was renamed from sitemaps to asgi-sitemaps - sitemap generation for ASGI apps. (Pull #2)
- Change options of CLI and programmatic API to fit new "ASGI-only" project scope. (Pull #2)
- CLI now reads from stdin (for --check mode) and outputs sitemap to stdout. (Pull #2)
Removed
- Drop support for crawling arbitrary remote servers. (Pull #2)
Fixed
- Don't include non-200 or non-HTML URLs in sitemap. (Pull #2)
0.1.0 - 2020-05-31
Added
- Initial implementation: CLI and programmatic async API.