===========
psycopg2_mq
===========
.. image:: https://img.shields.io/pypi/v/psycopg2_mq.svg
    :target: https://pypi.org/pypi/psycopg2_mq

.. image:: https://github.com/mmerickel/psycopg2_mq/workflows/Build%20and%20test/badge.svg?branch=main
    :target: https://github.com/mmerickel/psycopg2_mq/actions?query=workflow%3A%22Build+and+test%22
    :alt: main CI Status
``psycopg2_mq`` is a message queue implemented on top of
`PostgreSQL <https://www.postgresql.org/>`__,
`SQLAlchemy <https://www.sqlalchemy.org/>`__, and
`psycopg2 <http://initd.org/psycopg/>`__.
Currently the library provides only the low-level constructs that can be used
to build a multithreaded worker system. It is broken into two components:

- ``psycopg2_mq.MQWorker`` - a reusable worker object that manages a
  single-threaded worker that can accept jobs and execute them. An
  application should create one worker per thread. It supports an API for
  thread-safe graceful shutdown.

- ``psycopg2_mq.MQSource`` - a source object providing a client-side API for
  invoking and querying job states.

It is expected that these core components are then wrapped into your own
application in any way that you see fit, without dictating a specific CLI
or framework.
Data Model
==========

Queues
------

Workers run jobs defined in queues. Currently each queue will run jobs
concurrently, while a future version may support serial execution on a
per-queue basis. Each registered queue should contain an
``execute_job(job)`` method.
Jobs
----

The ``execute_job`` method of a queue is passed a ``Job`` object containing
attributes that describe the work to be done, such as ``id``, ``method``,
``args``, and ``cursor``.

As a convenience, there is an ``extend(**kw)`` method which can be used to
add extra attributes to the object. This is useful in individual queues to
define a contract between a queue and its methods.
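As a sketch of this pattern (not taken from the library's documentation), the
queue below promotes a required argument to an attribute once via
``extend``, so helper methods can rely on it. The ``StubJob`` class and
``ReportQueue`` are hypothetical stand-ins so the example is self-contained:

```python
class StubJob:
    """Minimal stand-in for psycopg2_mq's Job object."""

    def __init__(self, args):
        self.args = args

    def extend(self, **kw):
        # Mirror the convenience API: attach extra attributes to the job.
        for key, value in kw.items():
            setattr(self, key, value)


class ReportQueue:
    def execute_job(self, job):
        # Promote required args to attributes once, up front.
        job.extend(report_id=job.args['report_id'])
        return self.render(job)

    def render(self, job):
        # Helper methods can now rely on job.report_id existing.
        return f'rendered report {job.report_id}'


result = ReportQueue().execute_job(StubJob({'report_id': 42}))
```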
Cursors
-------

A ``Job`` can be scheduled with a ``cursor_key``. There can only be one
pending job and one running job for any cursor. New jobs scheduled while
another one is pending will be ignored and the pending job is returned.

A ``job.cursor`` dict is provided to the workers containing the cursor data,
and is saved back to the database when the job is completed. This effectively
gives jobs some persistent, shared state, and serializes all jobs over a
given cursor.
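The state-sharing behavior can be sketched as follows. In a real deployment
the worker loads ``job.cursor`` from the database before calling
``execute_job`` and saves it back on success; the plain dict and ``StubJob``
stand-in below merely simulate that round trip:

```python
class CounterQueue:
    def execute_job(self, job):
        # All jobs scheduled with the same cursor_key see the same dict,
        # and jobs on one cursor run serially, so this increment is safe.
        job.cursor['count'] = job.cursor.get('count', 0) + 1
        return f"run #{job.cursor['count']}"


class StubJob:
    """Stand-in for the real Job object, holding the shared cursor dict."""

    def __init__(self, cursor):
        self.cursor = cursor


cursor_data = {}  # simulates the persisted cursor row
queue = CounterQueue()
first = queue.execute_job(StubJob(cursor_data))   # 'run #1'
second = queue.execute_job(StubJob(cursor_data))  # 'run #2'
```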
Delayed Jobs
------------

A ``Job`` can be delayed to run in the future by providing a ``datetime``
object to the ``when`` argument. This, along with a cursor key, can provide a
nice throttle on how frequently a job runs. For example, delay jobs to run
in 30 seconds with a ``cursor_key``, and any jobs that are scheduled in the
meantime will be dropped. The assumption here is that the arguments are
constant and the data for execution is in the cursor or another table. As a
last resort, a ``conflict_resolver`` callback can be used to modify
properties of the job when the arguments cannot be constant.
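A minimal sketch of the throttle described above, computing a run time 30
seconds out. The scheduling call is shown only in a comment because the exact
keyword names beyond ``when`` and ``cursor_key`` should be checked against
your psycopg2_mq version; the queue and method names are made up:

```python
from datetime import datetime, timedelta, timezone

# Run no sooner than 30 seconds from now; combined with a cursor_key this
# caps the job at one pending run per cursor during that window.
when = datetime.now(timezone.utc) + timedelta(seconds=30)

# Hypothetical scheduling call (names 'reports'/'rebuild' are examples):
#
#     job = mq.call('reports', 'rebuild', {}, when=when,
#                   cursor_key='report-123')
#
# Any identical job scheduled during the next 30 seconds is dropped and the
# existing pending job is returned instead.
```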
Schedules
---------

A ``JobSchedule`` can be defined which supports
`RFC 5545 <https://tools.ietf.org/html/rfc5545>`__ ``RRULE`` schedules.
These are powerful and can support timezones, frequencies based on
calendars, as well as simple recurring rules from an anchor time using
``DTSTART``. Cron jobs can be converted to this syntax for simpler
scenarios.

``psycopg2_mq`` workers will automatically negotiate which worker is
responsible for managing schedules, so clustered workers should operate as
expected.
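For a feel of the syntax, here is an RFC 5545 rule equivalent to the cron
expression ``0 3 * * *`` (every day at 03:00 UTC). ``DTSTART`` anchors the
recurrence and ``RRULE`` describes how it repeats; how the string is attached
to a ``JobSchedule`` row depends on your model setup and is not shown here:

```python
# Daily at 03:00 UTC, anchored at an arbitrary start date.
rrule = (
    'DTSTART;TZID=UTC:20230101T030000\n'
    'RRULE:FREQ=DAILY'
)
```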
Example Worker
==============

.. code-block:: python

    import sys

    from psycopg2_mq import (
        MQWorker,
        make_default_model,
    )
    from sqlalchemy import (
        MetaData,
        create_engine,
    )


    class EchoQueue:
        def execute_job(self, job):
            return f'hello, {job.args["name"]} from method="{job.method}"'


    if __name__ == '__main__':
        engine = create_engine(sys.argv[1])
        metadata = MetaData()
        model = make_default_model(metadata)
        worker = MQWorker(
            engine=engine,
            queues={
                'echo': EchoQueue(),
            },
            model=model,
        )
        worker.run()
Example Source
==============

.. code-block:: python

    import sys

    from psycopg2_mq import MQSource, make_default_model
    from sqlalchemy import MetaData, create_engine
    from sqlalchemy.orm import sessionmaker

    engine = create_engine(sys.argv[1])
    metadata = MetaData()
    model = make_default_model(metadata)
    session_factory = sessionmaker()
    session_factory.configure(bind=engine)
    dbsession = session_factory()
    with dbsession.begin():
        mq = MQSource(
            dbsession=dbsession,
            model=model,
        )
        job = mq.call('echo', 'hello', {'name': 'Andy'})
        print(f'queued job={job.id}')
Changes
=======

0.9 (2023-04-21)
----------------

- Add support for Python 3.10 and 3.11.

- [breaking] Prevent retrying of collapsible jobs. Require them to be
  invoked using ``call`` instead for an opportunity to specify a
  ``conflict_resolver``.

- Fix a bug in the default model schema in which the collapsible database
  index was not marked unique.

- Copy trace info when retrying a job.

- Capture the stringified exception to the job result in the ``message``
  key, alongside the existing ``tb``, ``exc``, and ``args`` keys.

- The worker was not recognizing ``capture_signals=False``, causing problems
  when running the event loop in other threads.

- Blackify the codebase and add some real tests. Yay!
0.8.3 (2022-04-15)
------------------

- [breaking] Remove ``MQWorker.make_job_context``.

0.8.2 (2022-04-15)
------------------

0.8.1 (2022-04-15)
------------------

- Ensure the ``trace`` attribute is populated on the ``JobContext``.

- Add ``MQWorker.make_job_context`` which can be defined to completely
  override the ``JobContext`` factory using the ``Job`` object and open
  database session.
0.8.0 (2022-04-15)
------------------

- [breaking] Add an ``update_job_id`` foreign key to the ``JobCursor`` model
  to make it possible to know which job last updated the value in the
  cursor.

- [breaking] Add a ``trace`` json blob to the ``Job`` model.

- Support a ``trace`` json blob when creating new jobs. This value is
  available on the running job context and can be used when creating
  sub-jobs or when making requests to external systems to pass through
  tracing metadata. See ``MQSource.call``'s new ``trace`` parameter when
  creating jobs. See the ``JobContext.trace`` attribute when handling jobs.

- Add a standard ``FailedJobError`` exception which can be raised by jobs to
  mark a failure with a custom result object. This is different from
  unhandled exceptions that cause the ``MQWorker.result_from_error`` method
  to be invoked.
0.7.0 (2022-03-03)
------------------

- Fix a corner case with lost jobs attached to cursors. In scenarios where
  multiple workers are running, if one loses a database connection then the
  other is designed to notice and mark its jobs lost. However, it's possible
  the job is not actually lost and the worker can recover after resuming its
  connection, marking the job running again. In this situation, we do not
  want another job to begin on the same cursor. To fix this issue, new jobs
  will not be run if another job is marked lost on the same cursor. You will
  be required to recover the job by marking it as not lost (probably failed)
  first to unblock the rest of the jobs on the cursor.
0.6.2 (2022-03-01)
------------------

- Prioritize maintenance work above running new jobs. There was a
  chicken-and-egg issue where a job would be marked running but needed to be
  marked lost. However, marking it lost was lower priority than trying to
  start new jobs. In the case where a lot of jobs were scheduled at the same
  time, the worker always tried to start new jobs and never ran the
  maintenance, so the job never got marked lost, effectively blocking the
  queue.
0.6.1 (2022-01-15)
------------------

- Fix a bug introduced in the 0.6.0 release when scheduling new jobs.

0.6.0 (2022-01-14)
------------------

- [breaking] Add model changes to mark jobs as collapsible.

- [breaking] Add model changes to the cursor index.

- Allow multiple pending jobs to be scheduled on the same cursor if either:

  - The queue or method are different from existing pending jobs on the
    cursor.

  - ``collapse_on_cursor`` is set to ``False`` when scheduling the job.
0.5.7 (2021-03-07)
------------------

- Add a ``schedule_id`` attribute to the job context for use in jobs that
  want to know whether they were executed from a schedule or not.

0.5.6 (2021-02-28)
------------------

- Some ``UnicodeDecodeError`` exceptions raised from jobs could trigger a
  serialization failure (``UntranslatableCharacter``) because the result
  would contain the sequence ``\u0000`` which, while valid in Python, is not
  allowed in PostgreSQL. So when dealing with the raw bytes, we'll decode
  them with the replacement character so they can be properly stored. Not
  ideal, but better than failing to store the error at all.
0.5.5 (2021-01-22)
------------------

- Fixed some old code causing the worker lock to release after a job
  completed.

0.5.4 (2021-01-20)
------------------

- Log at the error level when marking a job as lost.

0.5.3 (2021-01-11)
------------------

- Copy the ``schedule_id`` information to retried jobs.

0.5.2 (2021-01-11)
------------------

- [breaking] Require ``call_schedule`` to accept an id instead of an object.

0.5.1 (2021-01-09)
------------------

- Drop the ``UNIQUE`` constraint on the background job ``lock_id`` column.
0.5 (2021-01-09)
----------------

0.4.5 (2020-12-22)
------------------

- Use column objects in the insert statement to support ORM-level synonyms,
  enabling the schema to have columns with different names.

0.4.4 (2019-11-07)
------------------

- Ensure the advisory locks are released when a job completes.

0.4.3 (2019-10-31)
------------------

- Ensure maintenance (finding lost jobs) always runs at set intervals
  defined by the ``timeout`` parameter.

0.4.2 (2019-10-30)
------------------

- Recover active jobs when the connection is lost by re-locking them and
  ensuring they are marked running.

0.4.1 (2019-10-30)
------------------

- Attempt to reconnect to the database after losing the connection. If the
  reconnect attempt fails then crash.
0.4 (2019-10-28)
----------------

- Add a ``worker`` column to the ``Job`` model to track which worker is
  handling a job.

- Add an optional ``name`` argument to ``MQWorker`` to name the worker -
  the value will be recorded in each job.

- Add a ``threads`` argument (default ``1``) to ``MQWorker`` to support
  handling multiple jobs from the same worker instance instead of making a
  worker per thread.

- Add a ``capture_signals`` argument (default ``True``) to ``MQWorker``
  which will capture ``SIGTERM``, ``SIGINT``, and ``SIGUSR1``. The first two
  will trigger graceful shutdown - they will make the process stop handling
  new jobs while finishing active jobs. The latter will dump a JSON summary
  of the current status of the worker to ``stderr``.
0.3.3 (2019-10-23)
------------------

- Only save a cursor update if the job is completed successfully.

0.3.2 (2019-10-22)
------------------

- Mark lost jobs during timeouts instead of just when a worker starts, in
  order to catch them earlier.

0.3.1 (2019-10-17)
------------------

- When attempting to schedule a job with a cursor and a ``scheduled_time``
  earlier than a pending job on the same cursor, the job will be updated to
  run at the earlier time.

- When attempting to schedule a job with a cursor and a pending job already
  exists on the same cursor, a ``conflict_resolver`` function may be
  supplied to ``MQSource.call`` to update the job properties, merging the
  arguments however the user wishes.
0.3 (2019-10-15)
----------------

- Add a new column ``cursor_snapshot`` to the ``Job`` model which will
  contain the value of the cursor when the job begins.

0.2 (2019-10-09)
----------------

- Add cursor support for jobs. This requires a schema migration to add a
  ``cursor_key`` column, a new ``JobCursor`` model, and some new indices.

0.1.6 (2019-10-07)
------------------

- Support passing custom kwargs to the job in ``psycopg2_mq.MQSource.call``
  to allow custom columns on the job table.
0.1.5 (2019-05-17)
------------------

- Fix a regression when serializing errors with strings or cycles.

0.1.4 (2019-05-09)
------------------

- More safely serialize exception objects when jobs fail.

0.1.3 (2018-09-04)
------------------

- Rename the thread to contain the job id while it's handling a job.

0.1.2 (2018-09-04)
------------------

- Rename ``Job.params`` to ``Job.args``.

0.1.1 (2018-09-04)
------------------

- Make ``psycopg2`` an optional dependency in order to allow apps to depend
  on ``psycopg2-binary`` if they wish.

0.1 (2018-09-04)
----------------