RStore
A library for easy batch storage of csv data into a database
Uses the CSV standard library for parsing, Nokogiri for URL handling, and Sequel ORM for database management.
Special Features
- Batch processing of csv files
- Fetches data from different sources: files, directories, URLs
- Customizable using additional options (also see section Available Options)
- Validation of field values. At the moment validation of the following types is supported:
String
, Integer
, Float
, Date
, DateTime
, Time
, and Boolean
- Descriptive error messages pointing helping you to find any invalid data quickly.
- Only define your database and table classes once, then just
require
them when needed. - Safe and transparent data storage:
- Using database transactions: Either the data from all all files is stored or none (also see section Database Requirements)
- To avoid double entry of data, the
run
method can only be run once on a single instance of RStore::CSV
.
Database Requirements
- Expects the database table to have an addition column storing an auto-incrementing primary key.
- Requires the database to support transactions:
Most other database platforms support transactions natively.
In MySQL, you'll need to be running
InnoDB
or BDB
table types rather than the more common MyISAM
.
If you are using MySQL and the table has not been created yet, RStore::CSV will take care of using the
correct table type upon creation.
Installation
$ gem install rstore
Note:
As RStore
depends on Nokogiri for fetching data from URLs, you need to install Nokogiri first to use this feature.
However, on some operating systems there can be problems due to missing libraries,
so you might want to take a look at the following installation instructions:
Debian
Users of Debian Linux (e.g. Ubuntu) need to run:
$ sudo apt-get install libxslt1-dev libxml2-dev
$ gem install nokogiri
Mac OS X
The following instruction should work, but I haven't tested them personally
$ sudo port install libxml2 libxslt
$ gem install nokogiri
Source: Installing Nokogiri
If you have any difficulties installing Nokogiri, please let me know, so that I can help you.
Public API Documentation
The documentation is hosted on RubyDoc.info: RStore Public API documentation.
Sample Usage
Sample csv file
"product","quantity","price","created_at","min_demand","max_demand","on_stock"
"toy1","1","1.12","2011-2-4","1:30","1:30am","true"
"toy2","2","2.22","2012/2/4","2:30","2:30pm","false
"toy3","3","3.33","2013/2/4","3:30","3:30 a.m.","True
"toy4","4",,,"4:30","4:30 p.m.","False"
"toy4","5","5.55","2015-2-4","5:30","5:30AM","1"
"toy5","6","6.66","2016/2/4","6:30","6:30 P.M.","0"
"toy6","7","7.77",,,,"false"
- Load gem
require 'rstore/csv'
- Store database information in a subclass of
RStore::BaseDB
Naming convention: name => NameDB
class CompanyDB < RStore::BaseDB
info(:adapter => 'mysql',
:host => 'localhost',
:user => 'root',
:password => 'xxx')
end
- Store table information in a subclass of
RStore::BaseTable
Naming convention: name => NameTable
class ProductsTable < RStore::BaseTable
create do
primary_key :id, :allow_null => false
String :product
Integer :quantity
Float :price
Date :created_at
DateTime :min_demand
Time :max_demand
Boolean :on_stock, :allow_null => false, :default => false
end
end
Note:
You can put the database and table class definitions in separate files
and require
them when needed.
- Enter csv data into the database
The
from
method accepts a path to a file or directory as well as an URL.
The to
metthod accepts a string of the form db_name.table_name
RStore::CSV.new do
from '../easter/children', :recursive => true
from '../christmas/children/toys.csv'
from 'www.example.com/sweets.csv', :selector => 'pre div.line'
to 'company.products'
run
end
Additional Features
You can change and reset the default options (see section Available Options below for details)
RStore::CSV.change_default_options(:recursive => true, :has_headers => false)
RStore::CSV.new do
from 'dir1'
from 'dir2'
from 'dir3'
to 'company.products'
run
end
RStore::CSV.reset_default_options
There is also a convenience method enabling you to use
all of Sequels query methods.
RStore::CSV.query('company.products') do |table|
table.all
table.all[3]
table.filter(:id => 2).update(:on_stock => true)
table.filter(:id => 3).delete
end
*)
Output of db[table.name].all[3]
Access all of Sequels functionality by using the convenience methods
BaseDB.connect
, BaseTable.name
, and BaseTable.table_info
:
DB = CompanyDB.connect
name = ProductTable.name
layout = ProductsTable.table_info
DB.create_table(name, &layout)
DB.alter_table name do
drop_column :created_at
add_column :entry_date, :date
end
DB.drop_table(name)
Available Options
The method from
accepts two kinds of options, file options and parse options:
File Options
File options are used for fetching csv data from a source. The following options are recognized:
- :has_headers, default:
true
- When set to false, the first line of a file is processed as data, otherwise it is discarded.
- :recursive, default:
false
- When set to true and a directory is given, recursively search for files. Non-csv files are skipped.
- :digit_seps, default
[',', '.']
- The thousands separator and decimal mark used for numbers in the data source.
Different countries use different thousands separators and decimal marks,
and setting this options ensures that parsing of these numbers succeeds.
Note that all numbers will still be stored in the format that Ruby recognizes, that is with a point (.) as the decimal mark.
- :selector, default:
""
- Mandatory css selector when fetching data from an URL. For more details please see the section Further Reading below
Parse Options
Parse options are arguments to CSV::parse
. The following options are recognized:
- :col_sep, default:
","
- The String placed between each field.
- :row_sep, default:
:auto
- The String appended to the end of each row.
- :quote_char, default:
'"'
- The character used to quote fields.
- :field_size_limit, default:
nil
- The maximum size CSV will read ahead looking for the closing quote for a field.
- :skip_blanks, default:
false
- When set to a true value, CSV will skip over any rows with no content.
For more information on the parse options, please see section Further Reading below.
Further Reading
Feedback
Any suggestions or criticism are highly welcome! Whatever your feedback, it will help me make this gem better!