Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More
Socket
Sign inDemoInstall
Socket

csvpack

Package Overview
Dependencies
Maintainers
1
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

csvpack

  • 0.2.0
  • Rubygems
  • Socket score

Version published
Maintainers
1
Created
Source

csvpack

work with tabular data packages using comma-separated values (CSV) datafiles in text with datapackage.json; download, read into and query comma-separated values (CSV) datafiles with your SQL database (e.g. SQLite, PostgreSQL, ...) of choice and much more

  • home :: github.com/csv11/csvpack
  • bugs :: github.com/csv11/csvpack/issues
  • gem :: rubygems.org/gems/csvpack
  • rdoc :: rubydoc.info/gems/csvpack
  • forum :: ruby-talk@ruby-lang.org

Usage

What's a tabular data package?

Tabular Data Package is a simple structure for publishing and sharing tabular data with the following key features:

  • Data is stored in CSV (comma separated values) files
  • Metadata about the dataset both general (e.g. title, author) and the specific data files (e.g. schema) is stored in a single JSON file named datapackage.json which follows the Data Package format

(Source: Tabular Data Packages, Frictionless Data Initiative • Data Hub.io • Open Knowledge Foundation • Data Protocols.org)

Here's a minimal example of a tabular data package holding two files, that is, data.csv and datapackage.json:

data.csv:

Brewery,City,Name,Abv
Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
Augustiner Bräu München,München,Edelstoff,5.6%
Bayerische Staatsbrauerei Weihenstephan,Freising,Hefe Weissbier,5.4%
Brauerei Spezial,Bamberg,Rauchbier Märzen,5.1%
Hacker-Pschorr Bräu,München,Münchner Dunkel,5.0%
Staatliches Hofbräuhaus München,München,Hofbräu Oktoberfestbier,6.3%
...

datapackage.json:

{
  "name": "beer",
  "resources": [
    {
      "path": "data.csv",
      "schema": {
        "fields": [{ "name": "Brewery",   "type": "string" },
                   { "name": "City",      "type": "string" },
                   { "name": "Name",      "type": "string" },
                   { "name": "Abv",       "type": "number" }]
      }
    }
  ]
}

Where to find data packages?

For some real world examples see the Data Packages Listing (Sources) at the Data Hub.io • Frictionless Data Initiative website for a start. Tabular data packages include:

NameComments
country-codesComprehensive country codes: ISO 3166, ITU, ISO 4217 currency codes and many more
language-codesISO Language Codes (639-1 and 693-2)
currency-codesISO 4217 Currency Codes
gdbCountry, Regional and World GDP (Gross Domestic Product)
s-and-p-500-companiesS&P 500 Companies with Financial Information
un-locodeUN-LOCODE Codelist
gold-pricesGold Prices (Monthly in USD)
bond-yields-uk-10y10 Year UK Government Bond Yields (Long-Term Interest Rate)

and many more

Code, Code, Code - Script Your Data Workflow with Ruby

require 'csvpack'

CsvPack.import(
  's-and-p-500-companies',
  'gdb'
)

Using CsvPack.import will:

  1. download all data packages to the ./pack folder

  2. (auto-)add all tables to an in-memory SQLite database using SQL create_table commands via ActiveRecord migrations e.g.

create_table :constituents_financials do |t|
  t.string :symbol          # Symbol         (string)
  t.string :name            # Name           (string)
  t.string :sector          # Sector         (string)
  t.float  :price           # Price          (number)
  t.float  :dividend_yield  # Dividend Yield (number)
  t.float  :price_earnings  # Price/Earnings (number)
  t.float  :earnings_share  # Earnings/Share (number)
  t.float  :book_value      # Book Value     (number)
  t.float  :_52_week_low    # 52 week low    (number)
  t.float  :_52_week_high   # 52 week high   (number)
  t.float  :market_cap      # Market Cap     (number)
  t.float  :ebitda          # EBITDA         (number)
  t.float  :price_sales     # Price/Sales    (number)
  t.float  :price_book      # Price/Book     (number)
  t.string :sec_filings     # SEC Filings    (string)
end
  1. (auto-)import all datasets using SQL inserts e.g.
INSERT INTO constituents_financials
  (symbol,
   name,
   sector,
   price,
   dividend_yield,
   price_earnings,
   earnings_share,
   book_value,
   _52_week_low,
   _52_week_high,
   market_cap,
   ebitda,
   price_sales,
   price_book,
   sec_filings)
VALUES
  ('MMM',
   '3M Company',
   'Industrials',
   162.27,
   2.11,
   22.28,
   7.284,
   25.238,
   123.61,
   162.92,
   104.0,
   8.467,
   3.28,
   6.43,
   'http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=MMM')
  1. (auto-)add ActiveRecord models for all tables.

So what? Now you can use all the "magic" of ActiveRecord to query the datasets. Example:

pp Constituent.count

# SELECT COUNT(*) FROM "constituents"
# => 496


pp Constituent.first

# SELECT  "constituents".* FROM "constituents" ORDER BY "constituents"."id" ASC LIMIT 1
# => #<Constituent:0x9f8cb78
#         id:     1,
#         symbol: "MMM",
#         name:   "3M Company",
#         sector: "Industrials">


pp Constituent.find_by!( symbol: 'MMM' )

# SELECT  "constituents".*
#         FROM "constituents"
#         WHERE "constituents"."symbol" = "MMM"
#         LIMIT 1
# => #<Constituent:0x9f8cb78
#         id:     1,
#         symbol: "MMM",
#         name:   "3M Company",
#         sector: "Industrials">


pp Constituent.find_by!( name: '3M Company' )

# SELECT  "constituents".*
#          FROM "constituents"
#          WHERE "constituents"."name" = "3M Company"
#          LIMIT 1
# => #<Constituent:0x9f8cb78
#         id:     1,
#         symbol: "MMM",
#         name:   "3M Company",
#         sector: "Industrials">


pp Constituent.where( sector: 'Industrials' ).count

# SELECT COUNT(*) FROM "constituents"
#         WHERE "constituents"."sector" = "Industrials"
# => 63


pp Constituent.where( sector: 'Industrials' ).all

# SELECT "constituents".*
#         FROM "constituents"
#         WHERE "constituents"."sector" = "Industrials"
# => [#<Constituent:0x9f8cb78
#          id:     1,
#          symbol: "MMM",
#          name:   "3M Company",
#          sector: "Industrials">,
#      #<Constituent:0xa2a4180
#          id:     8,
#          symbol: "ADT",
#          name:   "ADT Corp (The)",
#          sector: "Industrials">,...]

and so on

Frequently Asked Questions (F.A.Qs) and Answers

Q: How to dowload a data package ("by hand")?

Use the CsvPack::Downloader class to download a data package to your disk (by default data packages get stored in ./pack).

dl = CsvPack::Downloader.new
dl.fetch( 'language-codes' )
dl.fetch( 's-and-p-500-companies' )
dl.fetch( 'un-locode')

Will result in:

-- pack
   |-- language-codes
   |   |-- data
   |   |   |-- ietf-language-tags.csv
   |   |   |-- language-codes-3b2.csv
   |   |   |-- language-codes-full.csv
   |   |   `-- language-codes.csv
   |   `-- datapackage.json
   |-- s-and-p-500-companies
   |   |-- data
   |   |   `-- constituents.csv
   |   `-- datapackage.json
   `-- un-locode
       |-- data
       |   |-- code-list.csv
       |   |-- country-codes.csv
       |   |-- function-classifiers.csv
       |   |-- status-indicators.csv
       |   `-- subdivision-codes.csv
       `-- datapackage.json
Q: How to add and import a data package ("by hand")?

Use the CsvPack::Pack class to read-in a data package and add and import into an SQL database.

pack = CsvPack::Pack.new( './pack/un-locode/datapackage.json' )
pack.tables.each do |table|
  table.up!      # (auto-) add table  using SQL create_table via ActiveRecord migration
  table.import!  # import all records using SQL inserts
end
Q: How to connect to a different SQL database?

You can connect to any database supported by ActiveRecord. If you do NOT establish a connection in your script - the standard (default fallback) is using an in-memory SQLite3 database.

SQLite

For example, to create an SQLite3 database on disk - lets say mine.db - use in your script (before the CsvPack.import statement):

ActiveRecord::Base.establish_connection( adapter:  'sqlite3',
                                         database: './mine.db' )
PostgreSQL

For example, to connect to a PostgreSQL database use in your script (before the CsvPack.import statement):

require 'pg'       ##  pull-in PostgreSQL (pg) machinery

ActiveRecord::Base.establish_connection( adapter:  'postgresql'
                                         username: 'ruby',
                                         password: 'topsecret',
                                         database: 'database' )

Install

Just install the gem:

$ gem install csvpack

Alternatives

See the "Tools and Plugins for working with Data Packages" page at the Frictionless Data Initiative.

License

The csvpack scripts are dedicated to the public domain. Use it as you please with no restrictions whatsoever.

Questions? Comments?

Send them along to the ruby-talk mailing list. Thanks!

FAQs

Package last updated on 08 Aug 2018

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc