Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More
Socket
Sign inDemoInstall
Socket

csv_fast_importer

Package Overview
Dependencies
Maintainers
1
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

csv_fast_importer

  • 1.2.0
  • Rubygems
  • Socket score

Version published
Maintainers
1
Created
Source

Gem Version Tests status Codacy Badge

CSV Fast Importer

A gem to import CSV files' content into a PostgreSQL or MySQL database. It is respectively based on PostgreSQL COPY and MySQL LOAD DATA INFILE which are designed to be as fast as possible.

Why?

CSV importation is a common task which can be done by more than 6 different gems, but none of them is able to import 1 million of lines in a few seconds (see benchmark below), hence the creation of this gem.

Here is an indicative benchmark to compare available solutions. It represents the duration (ms) to import a 10 000 lines csv file into a local PostgreSQL instance on a laptop running OSX (lower is better):

Benchmark

Like all benchmarks, some tuning can produce different results, yet this chart gives a big picture. See benchmark details.

Requirements

  • Rails (ActiveRecord in fact)
  • PostgreSQL or MySQL

Limitations

  • Usual ActiveRecord process (validations, callbacks, computed fields like created_at...) is bypassed. This is the price for performance
  • Custom enclosing field (ex: ") is not supported yet
  • Custom line separator (ex: \r\n for windows file) is not supported yet
  • MySQL: encoding is not supported yet
  • MySQL: transaction is not supported yet
  • MySQL: row_index is not supported yet

Note about custom line separator: it might work by opening the file with the universal_newline argument (e.g. file = File.new(path, universal_newline: true)). Unfortunately, we weren't able to reproduce and test it so we don't support it "officialy". You can find more information in this ticket (in French).

Installation

Add the dependency to your Gemfile:

gem 'csv_fast_importer'

Run bundle install.

You can install the gem by yourself too:

$ gem install csv_fast_importer

For MySQL :warning: enable local_infile for both client and server. In Rails application, juste add local_infile: true to your database config file databse.yml to configure the database client. See Security Issues with LOAD DATA LOCAL for more details.

Usage

Actually, CSV Fast Importer needs active_record to work. Setup your database configuration as in a usual Rails project. Then, use the CsvFastImporter class:

require 'csv_fast_importer'

file = File.new '/path/to/knights.csv'
imported_lines_count = CsvFastImporter.import(file)

puts imported_lines_count

Under the hood, CSV Fast Importer deletes data from the knights table and imports those from knights.csv by mapping columns' names to table's fields. Note: mapping is case insensitive so database fields' names must be lowercase. For instance, a FIRSTNAME CSV column will be mapped to the firstname field.

Options

Option key Purpose Default value
encodingFile encoding. PostgreSQL only (see FAQ for more details)'UTF-8'
col_sepColumn separator in file';'
destinationDestination tablegiven base filename (without extension)
mappingColumn mapping{}
row_index_columnColumn name where inserting file row index (not used when nil). PostgreSQL onlynil
transactionExecute DELETE and INSERT in same transaction. PostgreSQL only:enabled
deletionRow deletion method (:delete for SQL DELETE, :truncate for SQL TRUNCATE or :none for no deletion before import):delete

If your CSV file is not encoded with same table than your database, you can specify encoding at the file opening (see FAQ for more details):

file = File.new '/path/to/knights.csv', encoding: 'ISO-8859-1'

You can specify a different separator column with the col_sep option (; by default):

CsvFastImporter.import file, col_sep: '|'

By default, CSV Fast Importer computes the database table's name by taking the basename of the imported file. For instance, considering the imported file /path/to/knights.csv, the table's name will be knights. To bypass this default behaviour, specify the destination option:

file = File.new '/path/to/clients.csv'
CsvFastImporter.import file, destination: 'knights'

Finally, you can precise a custom mapping between CSV file's columns and database fields with the mapping option.

Considering the following knights.csv file:

NAME;KNIGHT_EMAIL
Perceval;perceval@logre.cel
Lancelot;lancelot@logre.cel

To map the KNIGHT_EMAIL column to the email database field:

CsvFastImporter.import file, mapping: { knight_email: :email }

Need help?

See FAQ.

How to contribute?

You can fork and submit new pull request (with tests and explanations). First of all, you need to initialize your environment :

$ brew install postgresql # in macOS
$ apt-get install libpq-dev # in Linux
$ bundle install

Then, start your PostgreSQL database (ex: Postgres.app for the Mac) and setup database environment:

$ bundle exec rake test:db:create

This will connect to localhost PostgreSQL database without user (see config/database.postgres.yml) and create a new database dedicated to tests.

Warning: database instance have to allow database creation with UTF-8 encoding.

Finally, you can run all tests with RSpec like this:

$ bundle exec rspec

By default, PostgreSQL is used. You can set another database with environment variables like this for MySQL:

$ DB_TYPE=mysql DB_ROOT_PASSWORD=password DB_USERNAME=username bundle exec rake test:db:create
$ DB_TYPE=mysql DB_USERNAME=username bundle exec rspec

This will connect to mysql with root user (with password as password) and create database for user username. Use DB_TYPE=mysql DB_USERNAME= (with empty username) for anonymous account.

Warning: Mysql tests require your local database permits LOCAL works. Check your Mysql instance with following command: SHOW GLOBAL VARIABLES LIKE 'local_infile' (should be ON).

Versioning

master is the development branch and releases are published as tags.

We follow the Semantic Versioning 2.0.0 for our gem releases.

In few words:

Given a version number MAJOR.MINOR.PATCH, increment the:

  1. MAJOR version when you make incompatible API changes,
  2. MINOR version when you add functionality in a backwards-compatible manner, and
  3. PATCH version when you make backwards-compatible bug fixes.

Backlog (unordered)

  • [ ] Support any column and table case
  • Support custom enclosing field (ex: ")
  • Support custom line serparator (ex: \r\n for windows file)
  • Support custom type convertion
  • MySQL: support encoding parameter. See https://dev.mysql.com/doc/refman/5.5/en/charset-charsets.html
  • MySQL: support transaction parameter
  • MySQL: support row_index_column parameter
  • MySQL: run multiple SQL queries in single statement
  • Refactor tests (with should-> must / should -> expect / subject...)
  • Reduce technical debt on db connection (test & benchmark)
  • SQLite support

How to release new version?

Setup rubygems.org account:

curl -u {your_gem_account_name} https://rubygems.org/api/v1/api_key.yaml > ~/.gem/credentials
chmod 0600 ~/.gem/credentials

Make sure you are in master branch and run:

bundle exec rake "release:make[major|minor|patch|x.y.z]"

Example: bundle exec rake "release:make[minor]"

Then, follow instructions.

FAQs

Package last updated on 25 Sep 2023

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc