# BulkInsert

A little ActiveRecord extension for helping to insert lots of rows in a
single insert statement.
## Installation

Add it to your Gemfile:

```ruby
gem 'bulk_insert'
```
## Usage

BulkInsert adds a new class method to your ActiveRecord models:

```ruby
class Book < ActiveRecord::Base
end

book_attrs = ... # some array of hashes, for instance
Book.bulk_insert do |worker|
  book_attrs.each do |attrs|
    worker.add(attrs)
  end
end
```
All of those `#add` calls will be accumulated into a single SQL insert
statement, vastly improving the performance of multiple sequential
inserts (think data imports and the like).
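To make the payoff concrete, here is a minimal, self-contained sketch of how accumulated rows collapse into one multi-row INSERT statement. The `multi_row_insert_sql` helper and its naive quoting are purely illustrative, not the gem's actual implementation:

```ruby
# Illustrative only: collapse many rows into a single multi-row INSERT.
# Real code (and bulk_insert itself) must use the adapter's proper quoting.
def multi_row_insert_sql(table, columns, rows)
  values = rows.map do |row|
    "(" + row.map { |v| v.is_a?(String) ? "'#{v}'" : v.to_s }.join(", ") + ")"
  end
  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{values.join(', ')}"
end

multi_row_insert_sql("books", [:title, :author],
  [["Eye of the World", "Robert Jordan"],
   ["Lord of Light", "Roger Zelazny"]])
# => "INSERT INTO books (title, author) VALUES ('Eye of the World', 'Robert Jordan'), ('Lord of Light', 'Roger Zelazny')"
```

One round trip to the database instead of one per row is where the speedup comes from.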
If you don't like using a block API, you can also simply pass an array
of rows to be inserted:

```ruby
book_attrs = ...
Book.bulk_insert values: book_attrs
```
By default, the columns to be inserted will be all columns in the table,
minus the `id` column, but if you want, you can explicitly enumerate
the columns:

```ruby
Book.bulk_insert(:title, :author) do |worker|
  # specify a row as an array of values...
  worker.add ["Eye of the World", "Robert Jordan"]

  # ...or as a hash
  worker.add title: "Lord of Light", author: "Roger Zelazny"
end
```
It will automatically set `created_at`/`updated_at` columns to the current
date as well:

```ruby
Book.bulk_insert(:title, :author, :created_at, :updated_at) do |worker|
  # specify the timestamps explicitly...
  worker.add ["The Chosen", "Chaim Potok", Time.now, Time.now]

  # ...or let BulkInsert set them for you
  worker.add ["Hello Ruby", "Linda Liukas"]
end
```
Similarly, if a value is omitted, BulkInsert will use whatever default
value is defined for that column in the database:

```ruby
Book.bulk_insert(:title, :author, :medium) do |worker|
  worker.add title: "Ender's Game", author: "Orson Scott Card"
end

Book.first.medium # => whatever default the schema defines for :medium
```
By default, the batch is saved when the block finishes, but you can
explicitly save inside the block whenever you want, by calling `#save!`
on the worker:

```ruby
Book.bulk_insert do |worker|
  worker.add(...)
  worker.add(...)
  worker.save!
  worker.add(...)
end
```
That will save the batch as it has been defined to that point, and then
empty the batch so that you can add more rows to it if you want. Note
that all records saved together will have the same `created_at`/`updated_at`
timestamp (unless one was explicitly set).
## Batch Set Size

By default, the size of the insert is limited to 500 rows at a time; this
is called the *set size*. If you add a row that causes the set to exceed
the set size, the insert statement is automatically built and executed,
and the batch is reset.
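The flush-on-overflow behavior can be sketched with a toy worker. This is illustrative only; `ToyWorker` is not part of the gem, and the exact flush condition is simplified:

```ruby
# Toy model (not the gem's real class) of flush-on-overflow batching:
# when the pending set reaches set_size, it is executed and the batch resets.
class ToyWorker
  attr_reader :flushes
  attr_accessor :set_size

  def initialize(set_size: 500)
    @set_size = set_size
    @pending = []
    @flushes = []   # each entry stands in for one executed INSERT statement
  end

  def add(row)
    @pending << row
    save! if @pending.size >= @set_size
  end

  def save!
    return if @pending.empty?
    @flushes << @pending   # the real gem builds and runs the INSERT here
    @pending = []
  end
end

worker = ToyWorker.new(set_size: 2)
5.times { |i| worker.add([i]) }
worker.save!
worker.flushes.map(&:size) # => [2, 2, 1]
```

Five rows with a set size of 2 produce three INSERT statements: two full batches and one final partial batch.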
If you want a larger (or smaller) set size, you can specify it in
two ways:

```ruby
# specify the set_size when initializing the bulk insert...
Book.bulk_insert(set_size: 100) do |worker|
  # ...
end

# ...or set it on the worker directly
Book.bulk_insert do |worker|
  worker.set_size = 100
  # ...
end
```
## Insert Ignore

By default, when one insert in the batch fails, the whole batch fails. The
`ignore` option skips the inserts that would have failed (because of
duplicate keys, or a `NULL` in a column with a `NOT NULL` constraint) and
inserts the rest of the batch. This is not the default, because no errors
are raised for the skipped inserts.

```ruby
destination_columns = [:title, :author]

# Ignore any inserts in the batch that would fail
Book.bulk_insert(*destination_columns, ignore: true) do |worker|
  worker.add(...)
  worker.add(...)
end
```
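Each database has its own "skip conflicting rows" syntax behind an option like this. The templates below are a simplified illustration (the clause names are real SQL, but the exact statements the gem builds may differ):

```ruby
# Simplified illustration only -- the exact SQL bulk_insert generates
# may differ from these templates.
IGNORE_SYNTAX = {
  mysql:      "INSERT IGNORE INTO %{table} ...",
  postgresql: "INSERT INTO %{table} ... ON CONFLICT DO NOTHING",
  sqlite:     "INSERT OR IGNORE INTO %{table} ...",
}

IGNORE_SYNTAX[:mysql] % { table: "books" }
# => "INSERT IGNORE INTO books ..."
```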
## Update Duplicates (MySQL, PostgreSQL)

If you don't want to ignore duplicate rows but instead want to update them,
you can use the `update_duplicates` option. Set this option to `true`
(MySQL) or to a list of column names with a unique constraint (PostgreSQL),
and when a duplicate row is found, the row will be updated with your new
values. The default value for this option is `false`.

```ruby
destination_columns = [:title, :author]

# MySQL
Book.bulk_insert(*destination_columns, update_duplicates: true) do |worker|
  worker.add(...)
  worker.add(...)
end

# PostgreSQL: match duplicates on the unique :title column
Book.bulk_insert(*destination_columns, update_duplicates: %w[title]) do |worker|
  worker.add(...)
end
```
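The two flavors map to different SQL clauses: MySQL's `ON DUPLICATE KEY UPDATE` and PostgreSQL's `ON CONFLICT ... DO UPDATE`. A sketch of the general shape of those clauses (the `duplicate_clause` helper is illustrative, not the gem's internals):

```ruby
# Illustrative only: the general shape of each adapter's update clause.
def duplicate_clause(adapter, update_duplicates, columns)
  case adapter
  when :mysql       # update_duplicates: true
    "ON DUPLICATE KEY UPDATE " +
      columns.map { |c| "#{c}=VALUES(#{c})" }.join(", ")
  when :postgresql  # update_duplicates: %w[...] -- the conflict target
    "ON CONFLICT (#{update_duplicates.join(', ')}) DO UPDATE SET " +
      columns.map { |c| "#{c}=EXCLUDED.#{c}" }.join(", ")
  end
end

duplicate_clause(:postgresql, %w[title], %w[title author])
# => "ON CONFLICT (title) DO UPDATE SET title=EXCLUDED.title, author=EXCLUDED.author"
```

This is why PostgreSQL needs the list of unique columns: `ON CONFLICT` requires an explicit conflict target, while MySQL infers it from the table's unique keys.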
## Return Primary Keys (PostgreSQL, PostGIS)

If you want the worker to store the primary keys of inserted records, you can
use the `return_primary_keys` option. The worker will store a `result_sets`
array of `ActiveRecord::Result` objects. Each `ActiveRecord::Result` object
will contain the primary keys of a batch of inserted records.

```ruby
worker = Book.bulk_insert(*destination_columns, return_primary_keys: true) do |worker|
  worker.add(...)
  worker.add(...)
end

worker.result_sets
```
## License

BulkInsert is released under the MIT license (see MIT-LICENSE) by
Jamis Buck (jamis@jamisbuck.org).