csvpack lets you work with tabular data packages (comma-separated values (CSV) datafiles described by a `datapackage.json`): download, read into and query CSV datafiles with your SQL database of choice (e.g. SQLite, PostgreSQL, ...) and much more.
Tabular Data Package is a simple structure for publishing and sharing tabular data with the following key features:
- Data is stored in CSV (comma separated values) files
- Metadata about the dataset, both general (e.g. title, author) and specific to the data files (e.g. schema), is stored in a single JSON file named `datapackage.json` which follows the Data Package format
Here's a minimal example of a tabular data package holding two files, that is, `data.csv` and `datapackage.json`:

`data.csv`:
Brewery,City,Name,Abv
Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
Augustiner Bräu München,München,Edelstoff,5.6%
Bayerische Staatsbrauerei Weihenstephan,Freising,Hefe Weissbier,5.4%
Brauerei Spezial,Bamberg,Rauchbier Märzen,5.1%
Hacker-Pschorr Bräu,München,Münchner Dunkel,5.0%
Staatliches Hofbräuhaus München,München,Hofbräu Oktoberfestbier,6.3%
...
`datapackage.json`:
{
  "name": "beer",
  "resources": [
    {
      "path": "data.csv",
      "schema": {
        "fields": [{ "name": "Brewery", "type": "string" },
                   { "name": "City",    "type": "string" },
                   { "name": "Name",    "type": "string" },
                   { "name": "Abv",     "type": "number" }]
      }
    }
  ]
}
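Since the descriptor is plain JSON, you can inspect it with nothing but the Ruby standard library; a minimal sketch (file names match the example above):

require 'json'

pack = JSON.parse( File.read( 'datapackage.json' ) )

puts pack['name']                    #=> beer
pack['resources'].each do |resource|
  puts resource['path']              #=> data.csv
  resource['schema']['fields'].each do |field|
    # prints e.g. "Brewery (string)", "City (string)", ..., "Abv (number)"
    puts "  #{field['name']} (#{field['type']})"
  end
end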
For some real-world examples see the Data Packages Listing (Sources) at the DataHub.io / Frictionless Data Initiative website for a start. Tabular data packages include:
Name | Comments |
---|---|
country-codes | Comprehensive country codes: ISO 3166, ITU, ISO 4217 currency codes and many more |
language-codes | ISO Language Codes (639-1 and 639-2) |
currency-codes | ISO 4217 Currency Codes |
gdp | Country, Regional and World GDP (Gross Domestic Product) |
s-and-p-500-companies | S&P 500 Companies with Financial Information |
un-locode | UN-LOCODE Codelist |
gold-prices | Gold Prices (Monthly in USD) |
bond-yields-uk-10y | 10 Year UK Government Bond Yields (Long-Term Interest Rate) |
and many more
require 'csvpack'

CsvPack.import(
  's-and-p-500-companies',
  'gdp'
)
Using `CsvPack.import` will:

- download all data packages to the `./pack` folder
- (auto-)add all tables to an in-memory SQLite database using SQL `create_table` commands via ActiveRecord migrations, e.g.
create_table :constituents_financials do |t|
  t.string :symbol          # Symbol (string)
  t.string :name            # Name (string)
  t.string :sector          # Sector (string)
  t.float  :price           # Price (number)
  t.float  :dividend_yield  # Dividend Yield (number)
  t.float  :price_earnings  # Price/Earnings (number)
  t.float  :earnings_share  # Earnings/Share (number)
  t.float  :book_value      # Book Value (number)
  t.float  :_52_week_low    # 52 week low (number)
  t.float  :_52_week_high   # 52 week high (number)
  t.float  :market_cap      # Market Cap (number)
  t.float  :ebitda          # EBITDA (number)
  t.float  :price_sales     # Price/Sales (number)
  t.float  :price_book      # Price/Book (number)
  t.string :sec_filings     # SEC Filings (string)
end
- (auto-)import all records using SQL inserts, e.g.

INSERT INTO constituents_financials
(symbol,
name,
sector,
price,
dividend_yield,
price_earnings,
earnings_share,
book_value,
_52_week_low,
_52_week_high,
market_cap,
ebitda,
price_sales,
price_book,
sec_filings)
VALUES
('MMM',
'3M Company',
'Industrials',
162.27,
2.11,
22.28,
7.284,
25.238,
123.61,
162.92,
104.0,
8.467,
3.28,
6.43,
'http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=MMM')
So what? Now you can use all the "magic" of ActiveRecord to query the datasets. Example:
pp Constituent.count
# SELECT COUNT(*) FROM "constituents"
# => 496
pp Constituent.first
# SELECT "constituents".* FROM "constituents" ORDER BY "constituents"."id" ASC LIMIT 1
# => #<Constituent:0x9f8cb78
# id: 1,
# symbol: "MMM",
# name: "3M Company",
# sector: "Industrials">
pp Constituent.find_by!( symbol: 'MMM' )
# SELECT "constituents".*
# FROM "constituents"
# WHERE "constituents"."symbol" = "MMM"
# LIMIT 1
# => #<Constituent:0x9f8cb78
# id: 1,
# symbol: "MMM",
# name: "3M Company",
# sector: "Industrials">
pp Constituent.find_by!( name: '3M Company' )
# SELECT "constituents".*
# FROM "constituents"
# WHERE "constituents"."name" = "3M Company"
# LIMIT 1
# => #<Constituent:0x9f8cb78
# id: 1,
# symbol: "MMM",
# name: "3M Company",
# sector: "Industrials">
pp Constituent.where( sector: 'Industrials' ).count
# SELECT COUNT(*) FROM "constituents"
# WHERE "constituents"."sector" = "Industrials"
# => 63
pp Constituent.where( sector: 'Industrials' ).all
# SELECT "constituents".*
# FROM "constituents"
# WHERE "constituents"."sector" = "Industrials"
# => [#<Constituent:0x9f8cb78
# id: 1,
# symbol: "MMM",
# name: "3M Company",
# sector: "Industrials">,
# #<Constituent:0xa2a4180
# id: 8,
# symbol: "ADT",
# name: "ADT Corp (The)",
# sector: "Industrials">,...]
and so on
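Since the imported tables are plain ActiveRecord models, the usual query-builder methods (`order`, `pluck`, `group`, ...) work as well; a small sketch, using only the `Constituent` columns shown above:

# constituents per sector (a Hash mapping sector names to counts)
pp Constituent.group( :sector ).count
# => { "Industrials" => 63, ... }

# just the ticker symbols of all "Industrials" constituents, alphabetically
pp Constituent.where( sector: 'Industrials' ).order( :symbol ).pluck( :symbol )
# => ["ADT", ..., "MMM", ...]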
Use the `CsvPack::Downloader` class to download a data package to your disk (by default data packages get stored in `./pack`).
dl = CsvPack::Downloader.new
dl.fetch( 'language-codes' )
dl.fetch( 's-and-p-500-companies' )
dl.fetch( 'un-locode' )
Will result in:
-- pack
|-- language-codes
| |-- data
| | |-- ietf-language-tags.csv
| | |-- language-codes-3b2.csv
| | |-- language-codes-full.csv
| | `-- language-codes.csv
| `-- datapackage.json
|-- s-and-p-500-companies
| |-- data
| | `-- constituents.csv
| `-- datapackage.json
`-- un-locode
|-- data
| |-- code-list.csv
| |-- country-codes.csv
| |-- function-classifiers.csv
| |-- status-indicators.csv
| `-- subdivision-codes.csv
`-- datapackage.json
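The downloaded datafiles are ordinary CSV files, so you can also read them directly with Ruby's standard `csv` library if you just want the raw rows; a sketch using the S&P 500 file from the tree above (the column names are assumed to match the schema comments shown earlier, e.g. Symbol, Name, Sector - adjust if your copy differs):

require 'csv'

# path matches the directory tree shown above
CSV.foreach( './pack/s-and-p-500-companies/data/constituents.csv',
             headers: true ) do |row|
  puts "#{row['Symbol']} - #{row['Name']} (#{row['Sector']})"
end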
Use the `CsvPack::Pack` class to read in a data package and add and import its tables into an SQL database.
pack = CsvPack::Pack.new( './pack/un-locode/datapackage.json' )

pack.tables.each do |table|
  table.up!      # (auto-)add table using SQL create_table via ActiveRecord migration
  table.import!  # import all records using SQL inserts
end
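The same two calls work for any downloaded package. A sketch that imports the S&P 500 package and double-checks the row count with raw SQL through the ActiveRecord connection (the table name `constituents` matches the query examples above):

pack = CsvPack::Pack.new( './pack/s-and-p-500-companies/datapackage.json' )

pack.tables.each do |table|
  table.up!
  table.import!
end

# sanity check via raw SQL on the ActiveRecord connection
puts ActiveRecord::Base.connection.select_value( 'SELECT COUNT(*) FROM constituents' )
#=> 496 (see the ActiveRecord query examples above)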
You can connect to any database supported by ActiveRecord. If you do NOT establish a connection in your script, the default fallback is an in-memory SQLite3 database.
For example, to create an SQLite3 database on disk, let's say `mine.db`, use in your script (before the `CsvPack.import` statement):
ActiveRecord::Base.establish_connection( adapter:  'sqlite3',
                                         database: './mine.db' )
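Putting it together, a complete import script against an on-disk SQLite3 database might look like this (a sketch; the dataset and file names are just the examples used above):

require 'csvpack'

# establish the connection BEFORE calling CsvPack.import,
# otherwise the default in-memory SQLite3 database gets used
ActiveRecord::Base.establish_connection( adapter:  'sqlite3',
                                         database: './mine.db' )

CsvPack.import( 's-and-p-500-companies' )

pp Constituent.count    # query as usual afterwards, see above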
For example, to connect to a PostgreSQL database, use in your script (before the `CsvPack.import` statement):
require 'pg'   ## pull in PostgreSQL (pg) machinery

ActiveRecord::Base.establish_connection( adapter:  'postgresql',
                                         username: 'ruby',
                                         password: 'topsecret',
                                         database: 'database' )
Just install the gem:
$ gem install csvpack
See the "Tools and Plugins for working with Data Packages" page at the Frictionless Data Initiative.
The `csvpack` scripts are dedicated to the public domain. Use them as you please with no restrictions whatsoever.
Questions? Comments? Send them along to the ruby-talk mailing list. Thanks!