================
python-processor
================
Badges
| |docs| |changelog| |travis| |coveralls| |landscape| |scrutinizer|
| |version| |downloads| |wheel| |supported-versions| |supported-implementations|
.. |docs| image:: https://readthedocs.org/projects/python-processor/badge/?style=flat
    :target: https://readthedocs.org/projects/python-processor
    :alt: Documentation Status

.. |changelog| image:: http://allmychanges.com/p/python/processor/badge/
    :target: http://allmychanges.com/p/python/processor/?utm_source=badge
    :alt: Release Notes

.. |travis| image:: http://img.shields.io/travis/svetlyak40wt/python-processor/master.png?style=flat
    :alt: Travis-CI Build Status
    :target: https://travis-ci.org/svetlyak40wt/python-processor

.. |coveralls| image:: http://img.shields.io/coveralls/svetlyak40wt/python-processor/master.png?style=flat
    :alt: Coverage Status
    :target: https://coveralls.io/r/svetlyak40wt/python-processor

.. |landscape| image:: https://landscape.io/github/svetlyak40wt/python-processor/master/landscape.svg?style=flat
    :target: https://landscape.io/github/svetlyak40wt/python-processor/master
    :alt: Code Quality Status

.. |version| image:: http://img.shields.io/pypi/v/processor.png?style=flat
    :alt: PyPI Package latest release
    :target: https://pypi.python.org/pypi/processor

.. |downloads| image:: http://img.shields.io/pypi/dm/processor.png?style=flat
    :alt: PyPI Package monthly downloads
    :target: https://pypi.python.org/pypi/processor

.. |wheel| image:: https://pypip.in/wheel/processor/badge.png?style=flat
    :alt: PyPI Wheel
    :target: https://pypi.python.org/pypi/processor

.. |supported-versions| image:: https://pypip.in/py_versions/processor/badge.png?style=flat
    :alt: Supported versions
    :target: https://pypi.python.org/pypi/processor

.. |supported-implementations| image:: https://pypip.in/implementation/processor/badge.png?style=flat
    :alt: Supported implementations
    :target: https://pypi.python.org/pypi/processor

.. |scrutinizer| image:: https://img.shields.io/scrutinizer/g/svetlyak40wt/python-processor/master.png?style=flat
    :alt: Scrutinizer Status
    :target: https://scrutinizer-ci.com/g/svetlyak40wt/python-processor/
Simple rules
Python processor is a tool for creating chained pipelines for data processing.
It has very few key concepts:

Data object
    Any python dict with two required fields: ``source`` and ``type``.
Source
    An iterable sequence of ``data objects`` or a function which returns
    ``data objects``. See `full list of sources`_ in the docs.
Output
    A function which accepts a ``data object`` as input and could output
    another (or the same) ``data object`` as a result. See
    `full list of outputs`_ in the docs.
Predicate
    A pipeline consists of sources and outputs, but a ``predicate`` decides
    which ``data object`` should be processed by which ``output``.
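The four concepts can be sketched in plain Python without importing the library. This is only an illustration: ``numbers_source``, ``is_number``, ``double`` and ``run_simple_pipeline`` are hypothetical names, and the toy runner is an assumption about how rules might be applied, not the processor's actual ``run_pipeline``.

```python
def numbers_source():
    """Hypothetical source: a function which returns (yields) data objects."""
    for value in (1, 2, 3):
        # A data object is a plain dict with the required 'source' and 'type' fields.
        yield {'source': 'numbers', 'type': 'number', 'value': value}


def is_number(obj):
    """Predicate: decides whether an output should process this data object."""
    return obj['type'] == 'number'


def double(obj):
    """Output: accepts a data object and returns another one."""
    return dict(obj, value=obj['value'] * 2)


def run_simple_pipeline(source, rules):
    """Toy runner (an assumption, not the library's run_pipeline)."""
    results = []
    for obj in source():
        for predicate, output in rules:
            if predicate(obj):
                results.append(output(obj))
    return results


results = run_simple_pipeline(numbers_source, [(is_number, double)])
print([item['value'] for item in results])  # prints [2, 4, 6]
```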
Quick example
Here is an example of a pipeline which reads an IMAP folder and sends all emails to a Slack chat:

.. code:: python

    run_pipeline(
        sources.imap('imap.gmail.com',
                     'username',
                     'password',
                     'INBOX'),
        [prepare_email_for_slack, outputs.slack(SLACK_URL)])

Here you construct a pipeline which uses ``sources.imap`` to read the IMAP folder
"INBOX" of ``username@gmail.com``. In more complex cases, ``outputs.fanout``
can be used for routing data objects to different processors, and ``sources.mix``
can be used to merge items from two or more sources into one stream.
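The ideas of merging and routing can be illustrated without the library. In this minimal sketch, ``mix_sketch`` and ``fanout_sketch`` are hypothetical stand-ins, and the ``(predicate, processor)`` rule shape is an assumption; the real ``sources.mix`` and ``outputs.fanout`` may have different signatures and behaviour.

```python
from itertools import chain


def mix_sketch(*sources):
    """Merge several iterables of data objects into one stream.

    Hypothetical helper: it simply concatenates the sources, so it only
    suits finite iterables.
    """
    return chain(*sources)


def fanout_sketch(*rules):
    """Build an output which routes a data object to matching processors.

    Each rule is a (predicate, processor) pair; this shape is an assumption.
    """
    def output(obj):
        return [processor(obj) for predicate, processor in rules
                if predicate(obj)]
    return output


emails = [{'source': 'imap', 'type': 'email', 'subject': 'Hi'}]
tweets = [{'source': 'twitter', 'type': 'tweet', 'text': 'Hello'}]

route = fanout_sketch(
    (lambda obj: obj['type'] == 'email', lambda obj: 'email: ' + obj['subject']),
    (lambda obj: obj['type'] == 'tweet', lambda obj: 'tweet: ' + obj['text']),
)

routed = [route(obj) for obj in mix_sketch(emails, tweets)]
print(routed)  # prints [['email: Hi'], ['tweet: Hello']]
```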
Functions ``prepare_email_for_slack`` and ``outputs.slack(SLACK_URL)`` are processors.
The first one is a simple function which accepts a data object returned by the imap
source and transforms it into a data object which can be used by ``outputs.slack``.
We need that because Slack requires a different set of fields. The call to
``outputs.slack(SLACK_URL)`` returns a function which takes an object and sends it
to the specified Slack endpoint.
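A transforming processor of this kind might look like the sketch below. The field names ``subject``, ``body`` and ``text`` and the ``slack-message`` type are assumptions made for illustration; check the sources and outputs documentation for the fields the imap source and the Slack output really use.

```python
def prepare_email_for_slack(email):
    """Turn an email data object into one suitable for a Slack output.

    The 'subject', 'body' and 'text' keys and the 'slack-message' type
    are hypothetical.
    """
    return {
        'source': email['source'],
        'type': 'slack-message',
        'text': '{0}\n{1}'.format(email['subject'], email['body']),
    }


email = {'source': 'imap', 'type': 'email',
         'subject': 'Build failed', 'body': 'See the CI log.'}
message = prepare_email_for_slack(email)
print(message['text'])
```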
This is just an example; for working snippets, continue reading this documentation ;-)
.. Note:: By the way, did you know there is a Lisp dialect which runs on the Python
   virtual machine? Its name is HyLang, and python processor is written in this
   language.
Installation
Create a virtual environment with python3::

    virtualenv --python=python3 env
    source env/bin/activate

Install the required version of hylang (this step is necessary because Hy syntax is not
final yet and is frequently changed by the language maintainers)::

    pip install -U 'git+git://github.com/hylang/hy.git@a3bd90390cb37b46ae33ce3a73ee84a0feacce7d#egg=hy'

If you are on OSX, then install lxml separately::

    STATIC_DEPS=true pip install lxml

If you want to access IMAP over SSL on OSX, then you need to install
``openssl`` via homebrew, and then install ``pyopenssl`` like this::

    brew install openssl
    env LDFLAGS="-L$(brew --prefix openssl)/lib" \
        CFLAGS="-I$(brew --prefix openssl)/include" \
        pip install -U --force-reinstall pyopenssl

Then install the ``processor``::

    pip install processor

Usage
Now create an executable python script where you'll place your pipeline's configuration.
For example, this simple code creates a pipeline which searches for new results on Twitter
and outputs them to the console. Of course, you can output them not only to the console, but also
send them by email, to a Slack chat, or anywhere else if there is an output for it:

.. code:: python

    #!env/bin/python3
    import os
    from processor import run_pipeline, sources, outputs
    from twiggy_goodies.setup import setup_logging

    # A predicate which accepts every message.
    for_any_message = lambda msg: True

    def prepare(tweet):
        # Keep only the fields we want to output.
        return {'text': tweet['text'],
                'from': tweet['user']['screen_name']}

    setup_logging('twitter.log')

    run_pipeline(
        sources=[sources.twitter.search(
            'My Company',
            consumer_key='***', consumer_secret='***',
            access_token='***', access_secret='***',
        )],
        rules=[(for_any_message, [prepare, outputs.debug()])])

Running this code will fetch new results for the search query ``My Company``
and output them on the screen. Of course, you could use any other ``output``
supported by the ``processor``. Browse the online documentation to find out
which sources and outputs are supported and how to configure them.
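As ``for_any_message`` suggests, a predicate is an ordinary function which takes a message and returns a boolean, so more selective rules can be built from small helpers. The following is a minimal sketch: ``contains`` is a hypothetical helper, not part of the processor, and it assumes the data object carries a ``text`` field like the tweets above.

```python
def contains(keyword):
    """Return a predicate matching data objects whose 'text' has the keyword."""
    keyword = keyword.lower()

    def predicate(msg):
        # Objects without a 'text' field never match.
        return keyword in msg.get('text', '').lower()
    return predicate


about_release = contains('release')

tweets = [
    {'source': 'twitter', 'type': 'tweet', 'text': 'New Release is out!'},
    {'source': 'twitter', 'type': 'tweet', 'text': 'Good morning'},
]
matched = [tweet['text'] for tweet in tweets if about_release(tweet)]
print(matched)  # prints ['New Release is out!']
```

Such a predicate could presumably replace ``for_any_message`` in a rule, for example ``(about_release, [prepare, outputs.debug()])``.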
.. _full list of sources: sources.html
.. _full list of outputs: outputs.html
Ideas for Sources and Outputs
* ``web-hook`` endpoint (in progress).
* ``tail`` source which reads a file and outputs lines which appeared in the file
  between invocations, or is able to emulate ``tail -f`` behaviour. Python module
  `tailer <https://pypi.python.org/pypi/tailer/>`_ could be used here.
* ``grep`` output -- a filter to grep some fields using patterns. With ``tail``
  and ``grep`` you could build a pipeline which watches a log and sends errors
  by email or to the chat.
* ``xmpp`` output.
* ``irc`` output.
* ``rss/atom feed reader``.
* ``weather`` source which tracks tomorrow's weather forecast and outputs a message if it has
  changed significantly, for example from "sunny" to "rainy".
* ``github`` some integrations with github API?
* ``jira`` or other task tracker of your choice?
* suggest your ideas!

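To make the ``tail`` idea more concrete, here is one possible sketch of a source which remembers how far it has read between invocations. Everything in it is an assumption: ``tail_source`` is a hypothetical name, the ``state`` dict stands in for whatever storage the processor would really use, and the ``line`` type and ``text`` field are chosen only for illustration.

```python
import os
import tempfile


def tail_source(filename, state):
    """Yield data objects for lines appended since the previous invocation.

    `state` is a plain dict which keeps the last read offset (hypothetical;
    the offset is saved only after all new lines were consumed).
    """
    with open(filename) as f:
        f.seek(state.get('offset', 0))
        while True:
            line = f.readline()
            if not line:
                break
            yield {'source': 'tail', 'type': 'line',
                   'text': line.rstrip('\n')}
        state['offset'] = f.tell()


state = {}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'app.log')
    with open(path, 'w') as log:
        log.write('started\n')
    first = [obj['text'] for obj in tail_source(path, state)]
    with open(path, 'a') as log:
        log.write('ERROR: disk is full\n')
    second = [obj['text'] for obj in tail_source(path, state)]

print(first, second)
```

Only the line appended between the two calls shows up in the second result, which is the "between invocations" behaviour described in the list above.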
Documentation
https://python-processor.readthedocs.org/
Development
To run all the tests, run::

    tox

Authors
Changelog
0.10.0 (2016-01-04)
- IMAP source was fixed to work with the new IMAPClient's API and
  support ``IMAPClient > 1.0.0``.
- Datastorage was fixed to get the ``filename`` from the ``PROCESSOR_DB``
  environment variable in case it was set up using
  ``os.environ['PROCESSOR_DB'] = 'some.db'`` after the imports.

0.9.0 (2015-12-06)
Code was fixed to work with HyLang from the ``a3bd90390cb37b46ae33ce3a73ee84a0feacce7d``
commit. Please use this pinned version of HyLang and `subscribe`_ to future
release notes to know when this requirement changes.

.. _subscribe: https://allmychanges.com/p/python/processor/
0.8.0 (2015-11-16)
- Code was fixed to work with the latest Hy from GitHub.
- Added ``twitter.mentions`` source, to read a stream of mentions from Twitter.
- Fixed the way the number of messages from an IMAP folder is limited. Previously
  the limit was applied even when we already knew the ID of the last seen message,
  but now the limit is ignored in this case and only applied when visiting the
  folder for the first time.

0.7.0 (2015-05-05)
New output – XMPP was added and now processor is able
to notify Jabber users.
0.6.0 (2015-05-01)
The biggest change in this release is a new source – ``github.releases``.
It is able to read all new releases in a given repository and send them into the
processing pipeline. This works for both public and private repositories.
`Read the docs`_ for further details.

.. _Read the docs: https://python-processor.readthedocs.org/en/latest/sources.html#github-releases
Other changes are:
- Storage backend now saves the JSON database nicely pretty-printed so you can read and edit it in your favorite editor. This is Emacs, right?
- Twitter.search source now saves state after the tweet was processed. This way processor shouldn't lose tweets if there was an exception somewhere in the processing pipeline.
- IMAP source was fixed and now is able to fetch emails from really big folders.
0.5.0 (2015-04-15)
Good news, everyone! A new output was added: ``email``.
Now Processor is able to notify you via email about any event.
0.4.0 (2015-04-06)
- Function ``run_pipeline`` was simplified and now accepts only one source and one output.
  To implement more complex pipelines, use the ``sources.mix`` and ``outputs.fanout``
  helpers.

0.3.0 (2015-04-01)
- Added a `web.hook`_ source.
- Now ``source`` could be not only an iterable object, but any function which returns values.

.. _web.hook: https://python-processor.readthedocs.org/en/latest/sources.html#web-hook
0.2.1 (2015-03-30)
Fixed an error in the ``import-or-error`` macro, which prevented using third-party libraries.
0.2.0 (2015-03-30)
Most third-party libraries are optional now. If you want to use
some extension which requires an external library, it will issue
an error and call ``sys.exit(1)`` until you satisfy this
requirement.

This should make life easier for those who do not want
to use the ``rss`` output, which requires ``feedgen``, which requires
``lxml``, which is hard to build because it is a C extension.

0.1.0 (2015-03-18)