= Chimps on the Command Line
Infochimps[http://www.infochimps.com] is an online data marketplace
and repository where anyone can find, share, and sell data.
Infochimps offers two APIs for users to access and modify data: the
Dataset API and the Query API.
Chimps[http://github.com/infochimps/chimps-cli] is a Ruby wrapper for
these APIs that makes interacting with them simple. You can embed
Chimps inside your web application or any other software you write.
But if you find yourself wishing that you could make queries,
create datasets, &c. from your command line, where you already live,
where you already keep your data...then Chimps CLI is for you:
See your datasets
$ chimps list --my
...
Create a new dataset
$ chimps create title="A Brand New Dataset" description="That I created in 2 minutes. But that doesn't mean it's not awesome."
...
Take a look
$ chimps show a-brand-new-dataset
Check out your competition
$ chimps search awesome data
...
Hmmm. Better do some more work.
$ chimps update a-brand-new-dataset tag_list="awesome,new,data"
= First Steps
== Installing Chimps CLI
Assuming you've already set up your Gem sources, just run
gem install chimps-cli
This will also install Chimps[http://github.com/infochimps/chimps] if
it's not already present on your system.
== Configuring Chimps
Chimps CLI is just a command-line wrapper for the Chimps library. If
Chimps is already properly configured with your API credentials then
Chimps CLI will read them just fine without you having to do anything.
If you need to obtain API keys for either the Dataset API or the Query
API then {sign up at Infochimps}[http://www.infochimps.com/signup].
You'll need to put your API keys into one of two files, either
/etc/chimps/chimps.yaml or ~/.chimps. See the
README for Chimps[http://github.com/infochimps/chimps] for more
details on how to set up these configuration files.
= Usage
Try running
$ chimps help
to make sure you can run the +chimps+ command and to see an overview
of what subcommands are available. You can get more detailed help as
well as example usage on COMMAND by running
$ chimps help COMMAND
You can check whether your credentials are valid using the
+test+ command:
$ chimps test
Authenticated as user 'Infochimps' for Infochimps Dataset API at http://www.infochimps.com
Authenticated for Infochimps Query API at http://api.infochimps.com
If you get messages about missing keys and so on, go back and read the
{Chimps installation
instructions}[http://github.com/infochimps/chimps].
If you get messages about not being able to authenticate, double-check
that the API keys in your configuration file (either
~/.chimps or /etc/chimps/chimps.yaml) match the
credentials listed in your profile[http://www.infochimps.com/me].
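Before comparing keys, it can help to confirm which configuration file actually exists on your machine. The following is a minimal sketch, not part of Chimps itself: the helper names +chimps_config_candidates+ and +readable_chimps_configs+ are hypothetical, and the only facts taken from this README are the two file paths. It makes no assumption about which file wins when both are present.

```ruby
# Candidate configuration files named in this README. Which one takes
# precedence when both exist is not stated here, so none is assumed.
# (Helper names are hypothetical; they are not part of Chimps.)
def chimps_config_candidates(home = Dir.home)
  ["/etc/chimps/chimps.yaml", File.join(home, ".chimps")]
end

# Return only the candidates that exist as regular, readable files.
def readable_chimps_configs(candidates = chimps_config_candidates)
  candidates.select { |path| File.file?(path) && File.readable?(path) }
end

if __FILE__ == $PROGRAM_NAME
  found = readable_chimps_configs
  if found.empty?
    puts "No Chimps configuration file found. Checked: #{chimps_config_candidates.join(', ')}"
  else
    found.each { |path| puts "Found configuration file: #{path}" }
  end
end
```

If no file is reported, create one as described in the README for Chimps and run the +test+ command again.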
== Options
Commands to +chimps+ accept arguments as well as options. Options
always begin with two dashes and some options have single-letter flags
as well.
Some options work for every +chimps+ command. --verbose
(-v), for example, is a great way to see what underlying HTTP
request(s) a given command is making.
== Operating on a Dataset, Source, License, &c.
Many requests can operate on a particular resource. The +show+
command, for example, can be used to show a dataset (the default
choice), a license, a source, or a user.
You can see what resources +COMMAND+ can operate on with chimps
help COMMAND. A few examples:
Will attempt to show the Dataset 'an-example'
$ chimps show an-example
Will attempt to show the Source 'an-example'
$ chimps show source an-example
Will search datasets for 'stocks'
$ chimps search stocks
Will list all licenses
$ chimps list licenses
== Providing Data to a Command
Some commands (typically those that result in HTTP +GET+ and +DELETE+
requests) don't require you to pass any data to Infochimps.
Other commands (typically those that result in HTTP +POST+ and +PUT+
requests) do. These commands usually create or modify a dataset or
other resource at Infochimps.
Say you wanted to create a new dataset on Infochimps with the title
"List of hottest Salsas" and with description "All salsas were tried
personally by me."
There are two methods you can use to pass this data to a Chimps CLI
command:
- You can put the data you need to pass into a file on disk. Chimps
understands YAML and JSON file formats and will automatically parse
and serialize them properly when making a request. You could create
the following file
in salsa_dataset.yml
title: "List of hottest Salsas"
description: |-
All salsas were tried personally by me.
and you can create the dataset with
$ chimps create --data=salsa_dataset.yml
- You can pass parameters and values directly on the command line.
You could create the same dataset as above with
$ chimps create title="List of hottest Salsas" description="All salsas were tried personally by me."
This will only work for a flat collection of parameters and values, as
in this example. If you need to pass a nested data structure you
should use a file and the --data option above.
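When the data is nested, one way to avoid hand-writing the file is to generate it from a Ruby Hash. This is a minimal sketch using only Ruby's standard +yaml+ library. The helper +write_dataset_file+ and the nested +notes+ key are hypothetical and used purely for illustration; check http://www.infochimps.com/apis for the parameters a request really accepts.

```ruby
require "yaml"

# Serialize a Hash of dataset attributes to a YAML file that can be
# handed to a command through the --data option.
# (Hypothetical helper; not part of Chimps.)
def write_dataset_file(path, attributes)
  File.write(path, YAML.dump(attributes))
  path
end

attributes = {
  "title"       => "List of hottest Salsas",
  "description" => "All salsas were tried personally by me.",
  # Hypothetical nested parameter, shown only to illustrate structure
  # that key=value arguments on the command line cannot express.
  "notes"       => { "tasted_by" => "me", "heat_scale" => [1, 2, 3] }
}

if __FILE__ == $PROGRAM_NAME
  file = write_dataset_file("salsa_dataset.yml", attributes)
  puts "Now run: chimps create --data=#{file}"
end
```

The same approach works for JSON by swapping +YAML.dump+ for +JSON.generate+ from the standard +json+ library.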
Another example, which makes a query to the Query API and returns
demographics on an IP address
$ chimps query web/an/ip_census ip=67.78.118.7
= Basic HTTP Verbs
Infochimps' Dataset API is RESTful so it respects the semantics of
{HTTP
verbs}[http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#Request_methods].
You can use this "lower-level" interface to make simple +GET+, +POST+,
+PUT+, and +DELETE+ requests.
Here's how to return information on a Yahoo! Stocks dataset
$ chimps get /datasets/yahoo-stock-search
The default response will be in JSON but you can change the response
format by explicitly passing a different one of +xml+, +json+, or
+yaml+. This works for (almost) all Dataset API requests.
$ chimps get /datasets/yahoo-stock-search --response_format=yaml
Try running chimps help for the +get+ command (chimps
help get) as well as for the +post+, +put+, and +delete+
commands.
== Signed vs. Unsigned Requests
Some requests, like the +GET+ request above, don't need to be signed
in any way: using +chimps+ to make a simple unsigned +GET+ request
is no different from just doing it with +curl+.
All +POST+, +PUT+, and +DELETE+ requests, however, need to be signed,
and Chimps makes that easy. Here's how you might create
a dataset
$ chimps post --sign /datasets title="My dataset" description="Some text..."
If you leave out the --sign option then the request will fail
with a 401 Authentication error.
The above request is really just the same as
$ chimps create title="My dataset" description="Some text..."
which is a little simpler because +create+ understands what you're
trying to do and internally constructs the appropriate +POST+ request.
You can find a list of all available requests, the correct HTTP verb
to use, whether the request needs to be signed, and what parameters it
accepts at http://www.infochimps.com/apis.
To round out this section, here's an example of a +PUT+ request and a
+DELETE+ request (both of which must be signed):
Update your existing dataset
$ chimps put --sign /datasets/my-dataset title="A New Title"
Let's delete this dataset now because we're fickle little monkeys...
$ chimps delete --sign /datasets/my-dataset
Most things you might want to do with this "low-level" HTTP verb
interface can be done with specialized +chimps+ commands. Read on.
= Core REST Actions
Since the Infochimps Dataset API is
RESTful[http://en.wikipedia.org/wiki/Representational_State_Transfer],
it implements list, show, create, update, and destroy actions for all
resources. Each of these actions has a corresponding Chimps command.
Here's how to +list+ datasets:
$ chimps list
The +list+ command is one of a few (+search+ being another) that
accepts the --my (-m) option. This will restrict
the output to only datasets (or whatever resource you're listing) that
are owned by you.
$ chimps list --my
$ chimps list --my licenses
Here's how to +show+ a dataset:
$ chimps show my-dataset
This returns YAML by default but you can specify a different response
format by passing the --response_format option
$ chimps show my-dataset --response_format=json
You've already seen +create+ in action a few times so here's +update+
instead
$ chimps update my-dataset title="A new title"
And of course +destroy+
$ chimps destroy my-dataset
If you're curious about the underlying HTTP requests being sent, try
running these commands with the --verbose (-v) flag.
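If you script these commands from Ruby, values containing spaces (such as a title) need careful quoting. The sketch below is one possible approach using only the standard +shellwords+ library. The helper +chimps_argv+ is hypothetical; it only assembles the same arguments shown in the examples above and does not call any Chimps library code.

```ruby
require "shellwords"

# Build the argument list for a chimps subcommand that takes flat
# key=value parameters. (Hypothetical helper; not part of Chimps.)
def chimps_argv(command, *args, **params)
  ["chimps", command, *args, *params.map { |key, value| "#{key}=#{value}" }]
end

argv = chimps_argv("update", "my-dataset", title: "A new title")

# Passing the elements as separate arguments to Kernel#system bypasses
# the shell, so no manual quoting is needed:
#   system(*argv)

# Shellwords.join produces an escaped one-line form, handy for logging
# or for pasting into a terminal.
puts Shellwords.join(argv)
```

Keeping the arguments as an array until the last moment means a value is never split on whitespace by accident.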
= Special Requests
Chimps CLI has a few special commands which aren't HTTP verbs or core
REST actions.
== Search
Here's how to search Infochimps for datasets about music:
$ chimps search music
Here's the same search restricted to only datasets you own and
pretty-printed:
$ chimps search --my music --pretty
== Download
If a dataset on Infochimps has a downloadable package then the
+download+ command can be used to download the data:
$ chimps download daily-1970-2010-open-close-hi-low-and-volume-nyse-exchange
The dataset must be free, you must own it, or you must have purchased
it (through the website) before you can +download+ it with Chimps.
You may want to include the --verbose (-v) flag so
that you can see the progress of the download, especially if it is a
large file.
== Upload
Infochimps does not presently allow you to upload data by using an
API. Please create a dataset first (you can do this with Chimps) and
then go to that dataset's page in a browser and upload any data you
wish.
This feature will be coming very, very soon!
== Help/Test
chimps help and chimps help COMMAND should carry you
a good ways with the examples and usage they output.
chimps test should confirm that your API keys are properly
configured.
= Contributing
Chimps CLI is an open source project created by the Infochimps team to
encourage adoption of the Infochimps APIs. The official repository is
hosted on GitHub at
http://github.com/infochimps/chimps-cli
Feel free to clone it and send pull requests.