==========
tidehunter
==========

HTTP streaming with accurate flow control

Master branch: |Build Status|

**NOTE**: Version 1.x is not backward compatible with 0.x.

Highlights
==========

- Consumption limits: total control over your stream quota, entirely on
  the client side.
- Instant on/off switch and accurate consumption counter. Best used
  with `techies <https://github.com/woozyking/techies>`__.
- Queue interface for scalable stream data consumption. Best used with
  `techies <https://github.com/woozyking/techies>`__.
- Core mechanisms built on the solid
  `requests <https://github.com/kennethreitz/requests>`__ library,
  inheriting all its goodness.
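
To make the first two highlights concrete, here is a minimal, standard-library-only
sketch of a client-side limit combined with an on/off switch and a record counter.
``ToyStateCounter`` and ``consume`` are hypothetical names used for illustration.
This is not tidehunter's ``Hunter`` or ``SimpleStateCounter`` implementation, and
``lines`` stands in for the lines read from an HTTP response (for example, what
``requests``' ``Response.iter_lines()`` yields). Python 3 is assumed.

.. code:: python

    import queue
    import threading


    class ToyStateCounter:
        """Hypothetical in-process on/off switch plus record counter.

        Illustrative only; this is not tidehunter's SimpleStateCounter.
        """

        def __init__(self):
            self._lock = threading.Lock()
            self._started = False
            self._count = 0

        def start(self):
            # Turn the switch on and reset the counter for a new run
            with self._lock:
                self._started = True
                self._count = 0

        def stop(self):
            # Instant off switch
            with self._lock:
                self._started = False

        @property
        def started(self):
            with self._lock:
                return self._started

        @property
        def count(self):
            with self._lock:
                return self._count

        def incr(self):
            # Count one consumed record and return the new total
            with self._lock:
                self._count += 1
                return self._count


    def consume(lines, q, sc, limit=None):
        """Enqueue lines until the stream ends, a positive limit is hit, or sc is stopped."""
        sc.start()
        count = 0
        for line in lines:
            if not sc.started:  # the switch is checked before every record
                break
            q.put(line)
            count = sc.incr()
            if limit is not None and count >= limit:
                sc.stop()  # client-side quota reached, stop consuming
        return count

Calling ``consume`` on a 20-line stream with ``limit=5`` enqueues exactly five
records and leaves ``sc.started`` as ``False``; with ``limit=None`` it simply
consumes until the stream ends or something calls ``sc.stop()``.
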

Installation
============

.. code:: bash

    $ pip install tidehunter
    $ pip install tidehunter --upgrade

Usage
=====

Example 1 (with limit)
----------------------

**NOTE**: when no external ``Queue`` or ``StateCounter`` is supplied,
``Hunter`` uses the Python standard ``Queue`` and the built-in
``SimpleStateCounter``, respectively. These are usually enough for
single-process designs and other simple cases.

.. code:: python

    from tidehunter import Hunter

    # The Hunter!
    h = Hunter(url='https://httpbin.org/stream/20')

    # Start streaming
    h.tide_on(limit=5)

    # Consume the data which should be in the data queue now
    while h.q.qsize():
        print(h.q.get())  # profit x 5

    # You can reuse the same Hunter object with a different limit
    r = h.tide_on(limit=1)  # this time we only want one record

    assert h.q.qsize() == 1  # or else there's a bug, create an issue!

    print(h.q.get())  # more profit

    # r is actually just a requests.Response object
    print(r.headers)
    print(r.status_code)
    # ... read up on the requests library for more information

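
Example 1 drains the queue with a ``qsize()`` loop, which is fine when nothing
else touches ``h.q``. The Python documentation notes that ``qsize()`` is only
approximate when other threads use the same queue, so if you share the default
standard-library queue between threads, a non-blocking drain is a safer pattern.
A minimal sketch, assuming ``h.q`` is a standard ``queue.Queue`` (``drain`` is a
hypothetical helper, not part of tidehunter):

.. code:: python

    import queue


    def drain(q):
        """Return every item currently available in q, without blocking."""
        items = []
        while True:
            try:
                items.append(q.get_nowait())
            except queue.Empty:
                # Nothing left right now; stop instead of blocking on get()
                return items

For example, ``for record in drain(h.q): print(record)`` could stand in for the
``while h.q.qsize()`` loop above.
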
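
Example 2 (without limit)
-------------------------

**NOTE**: this example uses
`techies <https://github.com/woozyking/techies>`__ and therefore
requires a running Redis server (the code below connects to
``localhost:6379``).

Assume you have a process running the following code: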

.. code:: python

    from techies import StateCounter, Queue
    from tidehunter import Hunter

    # The data queue
    q = Queue(key='demo_q', host='localhost', port=6379, db=0)

    # The state machine and record counter (state counter)
    sc = StateCounter(key='demo_sc', host='localhost', port=6379, db=0)

    # The Hunter!
    h = Hunter(url='SOME_ENDLESS_STREAM_LIKE_TWITTER_FIREHOSE', q=q, sc=sc)

    # Start streaming, FOREVA
    h.tide_on()

You can then delegate flow control and data consumption to one or more
other processes, such as:

.. code:: python

    from techies import StateCounter, Queue

    # The key is to have the SAME state counter
    sc = StateCounter(key='demo_sc', host='localhost', port=6379, db=0)

    # And the SAME data queue
    q = Queue(key='demo_q', host='localhost', port=6379, db=0)

    while sc.started:
        data = q.get()  # dequeue and
        # ...do something with data

        # SHT_HITS_THE_FAN is a placeholder for your own stop condition
        if SHT_HITS_THE_FAN:
            sc.stop()  # instant off switch
            # end of this loop, as well as the streaming process from above

    # If needed
    q.clear()
    sc.clear()

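
The same switch-driven pattern can be tried without Redis. A minimal sketch,
assuming threads stand in for the separate processes and a ``threading.Event``
plays the role of the shared ``StateCounter``'s on/off state. ``producer``,
``consumer`` and ``run`` are hypothetical names; the real cross-process behaviour
in Example 2 comes from ``techies``, not from this code.

.. code:: python

    import queue
    import threading


    def producer(q, switch):
        """Stand-in for the streaming process: enqueue until the switch is off."""
        i = 0
        while switch.is_set():
            try:
                q.put(i, timeout=0.1)
            except queue.Full:
                continue  # queue is full; re-check the switch instead of blocking forever
            i += 1


    def consumer(q, switch, stop_after):
        """Stand-in for a consuming process: turns the shared switch off when done."""
        seen = []
        while switch.is_set():
            try:
                seen.append(q.get(timeout=0.1))
            except queue.Empty:
                continue
            if len(seen) >= stop_after:  # hypothetical stop condition
                switch.clear()  # off switch, also seen by the producer
        return seen


    def run(stop_after=50):
        q = queue.Queue(maxsize=10)
        switch = threading.Event()
        switch.set()  # "started"
        worker = threading.Thread(target=producer, args=(q, switch))
        worker.start()
        seen = consumer(q, switch, stop_after)
        worker.join()  # the producer exits once it sees the switch is off
        return seen

The short timeouts make each side re-check the switch regularly instead of
blocking indefinitely on a full or empty queue.
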

Example 3 (OAuth with Twitter Sample Firehose)
----------------------------------------------

**NOTE**: this example requires ``requests_oauthlib``.

.. code:: python

    import os
    import json

    from requests_oauthlib import OAuth1
    from tidehunter import Hunter

    url = 'https://stream.twitter.com/1.1/statuses/sample.json'

    auth = OAuth1(
        os.environ['TWITTER_CONSUMER_KEY'],
        os.environ['TWITTER_CONSUMER_SECRET'],
        os.environ['TWITTER_TOKEN_KEY'],
        os.environ['TWITTER_TOKEN_SECRET']
    )

    # No q is given, so Hunter falls back to the standard Queue (see Example 1)
    h = Hunter(url=url, auth=auth)

    r = h.tide_on(5)  # let's just get 5 for now

    print(r.status_code)
    print('')

    while h.q.qsize():
        print(json.loads(h.q.get()))
        print('')

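
The loop above passes every queued item straight to ``json.loads``. Streaming
endpoints such as Twitter's have historically sent blank keep-alive lines, and
under Python 3 the queued items may be ``bytes``; whether either case reaches
``h.q`` depends on how ``Hunter`` reads the response, which this README does not
specify. A defensive parsing sketch (``parse_records`` is a hypothetical helper,
and silently dropping malformed lines is an assumed policy):

.. code:: python

    import json


    def parse_records(raw_items):
        """Decode queued stream lines into Python objects.

        Skips blank keep-alive lines and anything that is not valid JSON.
        """
        records = []
        for raw in raw_items:
            if isinstance(raw, bytes):
                raw = raw.decode('utf-8')
            raw = raw.strip()
            if not raw:
                continue  # blank keep-alive line
            try:
                records.append(json.loads(raw))
            except ValueError:  # json.JSONDecodeError is a ValueError subclass
                continue  # assumed policy: drop malformed lines
        return records

For example: ``records = parse_records(h.q.get() for _ in range(h.q.qsize()))``.
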
You can find other authentication methods in the `requests
authentication documentation <http://docs.python-requests.org/en/latest/user/authentication/>`__.
In short, all you have to do is pass the desired ``auth`` parameter
to ``Hunter``, just as you would with ``requests``.

Test (Unit Tests)
=================

.. code:: bash

    $ pip install -r requirements.txt
    $ pip install -r test_requirements.txt
    $ nosetests --with-coverage --cover-package=tidehunter

License
=======
The MIT License (MIT). See the full
`LICENSE <https://github.com/woozyking/tidehunter/blob/master/LICENSE>`__.

.. |Build Status| image:: https://travis-ci.org/woozyking/tidehunter.png?branch=master
   :target: https://travis-ci.org/woozyking/tidehunter

Contributors
------------
- `Runzhou Li (Leo) <https://github.com/woozyking>`__

Changelog
---------

1.0.1 (2015-04-17)
~~~~~~~~~~~~~~~~~~

- **Breaking Change**: ``tidehunter.SimpleStateCounter`` updated to
  reflect `techies <https://github.com/woozyking/techies>`__ 0.2.0
  changes on ``StateCounter``.

1.0.0 (2014-01-22)
~~~~~~~~~~~~~~~~~~

- Moved the codebase of ``Queue`` and ``StateCounter`` to
  `techies <https://github.com/woozyking/techies>`__. It's recommended
  to use ``techies`` together with ``tidehunter``, but it's not always
  required, and therefore not a dependency of ``tidehunter``.
- Added ``tidehunter.SimpleStateCounter`` to be used when no other
  state counter is provided. It's a pure in-process implementation and
  therefore cannot be accessed by other processes.
- You can now do ``from tidehunter import Hunter`` instead of
  ``from tidehunter.stream import Hunter``.
- Replaced ``PycURL`` with
  `requests <https://github.com/kennethreitz/requests>`__. Some of the
  benefits:

  - Straight Python 2/3 support
  - Much cleaner implementation
  - Further delegation of `support for various authentication
    methods <http://docs.python-requests.org/en/latest/user/authentication/>`__
    to ``requests`` itself

0.1.9 (2013-12-24)
~~~~~~~~~~~~~~~~~~

- ``PyCurl`` and ``Redis`` Python libraries bumped to the latest
  versions.
- ``Queue`` is now **almost** Python ``Queue`` compatible (in a
  complaint-free fashion), with the exception of ``Queue.full``, which
  always returns ``False``; ``Queue.task_done`` and ``Queue.join`` do
  nothing.
- NEW: Both ``Queue`` and ``StateCounter`` now have a ``clear`` method,
  which performs a Redis ``DEL`` command on the given key and
  reinitializes based on each class's ``initialize`` method.

0.1.8 (2013-10-02)
~~~~~~~~~~~~~~~~~~

- Added alias methods ``put_nowait()`` and ``get_nowait()`` and other
  placeholders to map the Python built-in ``Queue`` interfaces.
- Added ``rstgen`` shell script to convert Markdown to
  reStructuredText. To use, run ``$ source rstgen`` in the root folder.
- Credentials involved in unit tests and the demo now use environment
  variables.

0.1.7 (2013-07-22)
~~~~~~~~~~~~~~~~~~

- Massive update to README.rst.
- Fixed PyPI rendering of the long description.

0.1.5 (2013-07-22)
~~~~~~~~~~~~~~~~~~

- NEW: ``Hunter.tide_on()`` now accepts an optional ``limit`` parameter
  for on-the-fly limit adjustment. The adjustment is not permanent: if
  you reuse the same ``Hunter`` object, the old limit (or the default
  ``None``) remains in effect.
- Fixed a potential issue where ``Hunter`` put in more records than the
  desired limit.
- Added a temporary Basic Auth test case (no stream; need to find a
  better source).

0.1.3 (2013-07-13)
~~~~~~~~~~~~~~~~~~

- Use the great httpbin.org (by Kenneth Reitz) for unit tests now.
- Auth (OAuth or Basic) is no longer required, as long as the target
  stream server supports access without auth.

0.1.2 (2013-07-12)
~~~~~~~~~~~~~~~~~~

- Include CHANGES (changelog) to be shown on PyPI.
- Use the ``with`` statement to open files in setup.py.
- Added the first
  `demo <https://github.com/amoa/tidehunter/tree/master/demo>`__.

0.1.1 (2013-07-12)
~~~~~~~~~~~~~~~~~~

- Clean up setup.py to ensure requirements are installed/updated.

0.1.0 (2013-07-12)
~~~~~~~~~~~~~~~~~~

- Initial release