Legacy MediaCloud Python API Client
This is a Python client for accessing the MediaCloud API v2.
We support Python versions 2.7 and 3.6.
Related work:
This package exists to maintain access to our legacy search interface while we build out tooling and reliability for our new system. API keys for the new search tools will not work with this API.
Usage
First sign up for an API key. Then install the package:
pip install mediacloud-api-legacy
Check CHANGELOG.md for a detailed history of changes.
Examples
Find out how many stories in the top US online news sites mentioned "Zimbabwe" and "president" in the last year:
import mediacloud.api
mc = mediacloud.api.MediaCloud('MY_API_KEY')
res = mc.storyCount('zimbabwe AND president AND tags_id_media:58722749', 'publish_date:[NOW-1YEAR TO NOW]')
print(res['count'])
Get up to 2000 stories from the NYT about a topic in 2018 and dump the output to JSON:
import mediacloud.api, json, datetime
mc = mediacloud.api.MediaCloud('MY_API_KEY')
fetch_size = 500
stories = []
last_processed_stories_id = 0
# page through results, 500 at a time, until we have 2000 stories
while len(stories) < 2000:
    fetched_stories = mc.storyList('trump AND "north korea" AND media_id:1',
                                   solr_filter=mc.dates_as_query_clause(datetime.date(2018, 1, 1), datetime.date(2019, 1, 1)),
                                   last_processed_stories_id=last_processed_stories_id, rows=fetch_size)
    stories.extend(fetched_stories)
    # a short page means there are no more matching stories
    if len(fetched_stories) < fetch_size:
        break
    # the next page starts after the last story we have seen
    last_processed_stories_id = stories[-1]['processed_stories_id']
print(json.dumps(stories))
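The paging loop above can be pulled into a small reusable helper. What follows is only a sketch of that idea, not part of this package: fetch_all and fake_fetch_page are hypothetical names, and the fake fetcher stands in for mc.storyList so the example runs without an API key.

```python
def fetch_all(fetch_page, limit, page_size=500):
    """Collect up to `limit` stories by paging on processed_stories_id.

    `fetch_page(last_id, rows)` is a hypothetical callable that returns
    the next page of stories after `last_id`.
    """
    stories = []
    last_id = 0
    while len(stories) < limit:
        page = fetch_page(last_id, page_size)
        stories.extend(page)
        if len(page) < page_size:
            break  # a short page means there is nothing left to fetch
        last_id = page[-1]['processed_stories_id']
    return stories[:limit]


# Stand-in for mc.storyList: serves 1234 fake stories with increasing ids.
FAKE_STORIES = [{'processed_stories_id': i} for i in range(1, 1235)]


def fake_fetch_page(last_id, rows):
    newer = [s for s in FAKE_STORIES if s['processed_stories_id'] > last_id]
    return newer[:rows]


print(len(fetch_all(fake_fetch_page, limit=2000)))
print(len(fetch_all(fake_fetch_page, limit=700)))
```

With a real client, fetch_page could be a lambda that forwards last_id and rows to mc.storyList as the last_processed_stories_id and rows arguments, as in the loop above.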
Find the most commonly used words in stories from the US top online news sites that mentioned "Zimbabwe" and "president" in 2013:
import mediacloud.api, datetime
mc = mediacloud.api.MediaCloud('MY_API_KEY')
words = mc.wordCount('zimbabwe AND president AND tags_id_media:58722749',
                     mc.dates_as_query_clause(datetime.date(2013, 1, 1), datetime.date(2014, 1, 1)))
print(words[0])
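The date filters in these examples are ordinary Solr range clauses, like the publish_date:[NOW-1YEAR TO NOW] string in the first example. As a rough illustration of what such a clause looks like for fixed dates, here is a hand-rolled sketch. date_range_clause is a hypothetical helper, not the client's dates_as_query_clause, whose exact output may differ; the field name and the half-open range are assumptions.

```python
import datetime


def date_range_clause(start, end, field='publish_date'):
    """Build a Solr range clause covering start (inclusive) to end (exclusive).

    Hypothetical helper for illustration only. Solr uses '[' / ']' for
    inclusive bounds and '{' / '}' for exclusive ones.
    """
    fmt = '%Y-%m-%dT00:00:00Z'  # Solr expects ISO 8601 timestamps in UTC
    # '}}' in the format string emits a single literal '}'
    return '{}:[{} TO {}}}'.format(field, start.strftime(fmt), end.strftime(fmt))


print(date_range_clause(datetime.date(2013, 1, 1), datetime.date(2014, 1, 1)))
# prints publish_date:[2013-01-01T00:00:00Z TO 2014-01-01T00:00:00Z}
```

Making the end bound exclusive keeps a story published at midnight on January 1, 2014 out of the 2013 results.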
To find out all the details about one particular story by id:
import mediacloud.api
mc = mediacloud.api.MediaCloud('MY_API_KEY')
story = mc.story(169440976)
print(story['url'])
To save the first 100 stories from one day to a database:
import mediacloud.api, mediacloud.storage, datetime
mc = mediacloud.api.MediaCloud('MY_API_KEY')
db = mediacloud.storage.MongoStoryDatabase('one_day')
stories = mc.storyList('*', mc.dates_as_query_clause(datetime.date(2014, 1, 1), datetime.date(2014, 1, 2)),
                       last_processed_stories_id=0, rows=100)
# save each story to the database
for s in stories:
    db.addStory(s)
print(db.storyCount())
Take a look at the tests in the mediacloud/test/ module for more detailed examples.
Development
If you are interested in adding code to this module, first clone the GitHub repository.
Testing
You need to create an MC_API_KEY environment variable and set it to your API key (we use python-dotenv).
Then run make test. We run continuous integration (via Travis), so every push runs the whole test suite (we also do this nightly and on PRs).
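For your own scripts, the same MC_API_KEY variable can be read at startup instead of hard-coding the key. This is a minimal sketch, not how the test suite itself loads the key: load_api_key is a hypothetical helper, and it assumes a .env file (if any) sits where python-dotenv's load_dotenv() can find it.

```python
import os


def load_api_key(var_name='MC_API_KEY'):
    """Return the API key from the environment, or raise if it is missing."""
    try:
        # Optional: pick up variables from a local .env file if
        # python-dotenv is installed. Existing variables are not overridden.
        from dotenv import load_dotenv
        load_dotenv()
    except ImportError:
        pass
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError('Set the {} environment variable first'.format(var_name))
    return key
```

The returned value can then be passed to mediacloud.api.MediaCloud in place of the 'MY_API_KEY' placeholder used in the examples.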
Distributing a New Version
If you want to, set up twine's keyring integration to avoid typing your PyPI password over and over.
- Run make test to make sure all the tests pass
- Update the version number in mediacloud/__init__.py
- Make a brief note in CHANGELOG.md about what changed
- Run make build-release to create an install package
- Run make release-test to upload it to PyPI's test platform
- Run make release to upload it to PyPI