#+TITLE: DataFix - Data Maintenance Tasks Manager
#+AUTHOR: Adolfo Villafiorita
Manage your data maintenance tasks like migrations.
#+begin_quote
I am not sure how it slipped through, but I realized there is
another gem, [[https://rubygems.org/gems/data_migrate/versions][data_migrate]], which does the same thing as =data_fix=.
... well, I should actually say it the other way around, since
=data_migrate= has been around for longer and has been downloaded
extensively.
We keep using and maintaining =data_fix=, but if you are starting
from scratch, [[https://rubygems.org/gems/data_migrate/versions][data_migrate]] is probably a more complete and safer
choice.
#+end_quote
This Rails gem provides a set of tasks to manage DB data maintenance
tasks as if they were migrations.
Data maintenance tasks include anything that does not fit in a schema
migration: for instance, adding new records in production, fixing
errors in existing records, or migrating data to a new schema.
Before we wrote =data_fix= we would create a rake task or a script
with the migration code, test the script in development and finally
run it in production. The process was highly manual, with no
information about which migrations were run and when. Although these
scripts are usually one-offs and lose their value once run, we were
not quite ok with the approach.
=data_fix= helps by enforcing standards and keeping track of the data
fixes that have been run.
Similar to Rails schema migrations, =data_fix= provides:
- Tasks for generating time-stamped scripts, where you will put the
  code you need to run on your data
- A table in your DBs to keep track of which scripts have already
  been run
- Tasks to manage the execution of the scripts and the table of the
  tasks run
- Automatic backup of your data before running the scripts
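The bookkeeping behind time-stamped scripts and a table of scripts already run can be illustrated with a small plain-Ruby sketch. It is not the gem's actual implementation: the helper names =timestamped_name= and =pending_scripts=, the second script name, and the in-memory set standing in for the DB table are all assumptions made for illustration.

```ruby
require "set"

# Hypothetical helper: build a file name such as
# 20210730135129_prefer_british_over_american.rb
def timestamped_name(name, now = Time.now.utc)
  "#{now.strftime('%Y%m%d%H%M%S')}_#{name}.rb"
end

# Hypothetical helper: return the scripts whose version (the leading
# timestamp) is not recorded as run yet, oldest first.
def pending_scripts(file_names, versions_run)
  file_names
    .sort
    .reject { |file| versions_run.include?(file[/\A\d+/]) }
end

files = [
  timestamped_name("prefer_british_over_american", Time.utc(2021, 7, 30, 13, 51, 29)),
  timestamped_name("add_default_colors", Time.utc(2021, 8, 2, 9, 0, 0))
]

# Stand-in for the tracking table: versions of the scripts already run.
versions_run = Set["20210730135129"]

puts pending_scripts(files, versions_run)
```

Because the timestamp prefix has a fixed width, sorting the file names as strings also sorts them chronologically.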
Unlike schema migrations, =data_fix= does not provide a mechanism for
data rollbacks. Writing reversible maintenance scripts on data is not
only complex but also, in many cases, pointless: why should you write
the code to fix a typo in a record and also the code to reintroduce
the typo you are fixing? It is also overkill, since rolling back to a
previous version using a backup is much simpler.
We now use it at [[https://shair.tech][Shair.Tech]] to perform data cleaning and data updates
of our Rails apps.
Add this line to your application's Gemfile:
#+begin_example ruby
gem 'rails_data_fix'
#+end_example
And then execute:
#+begin_example
$ bundle install
#+end_example
Or install it yourself as:
#+begin_example
$ gem install rails_data_fix
#+end_example
Then, for each environment and DB in which you want to use
=data_fix=, run the following command:
#+begin_example
rails data_fix:init
#+end_example
This means that you need to run
=RAILS_ENV=production rails data_fix:init= in your production
environment, if you want to use it there.
First create a file and write the script:
#+begin_example sh
rails data_fix:create[data_fix_name]
[... write the script in the generated file ...]
#+end_example
Then, for each environment in which you need to run the script:
#+begin_example sh
rails data_fix:run
#+end_example
=data_fix= scripts are not atomic: if you interrupt a script, the DB
will be left in a state that depends on the code you wrote and on when
you interrupted the script. In a typical scenario only part of your
records will have been updated/fixed.
It is a good idea to test your scripts in development before running
them on the actual data.
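Since an interrupted script leaves some records fixed and others not, one way to make a re-run safe is to write the fix so that it selects only the records that still need fixing. The plain-Ruby sketch below illustrates the idea with an in-memory array standing in for a table. The =fix_spelling= helper, its =limit= parameter, and the record layout are hypothetical, and the code uses neither =data_fix= nor ActiveRecord.

```ruby
# In-memory stand-in for a table with a "color" column (illustration only).
records = [
  { id: 1, color: "gray" },
  { id: 2, color: "grey" },
  { id: 3, color: "gray" }
]

# Hypothetical fix: touches only the rows that still hold the old spelling
# and returns how many rows it changed. `limit` simulates an interruption
# by stopping after the given number of rows.
def fix_spelling(records, limit: nil)
  todo = records.select { |record| record[:color] == "gray" }
  todo = todo.first(limit) if limit
  todo.each { |record| record[:color] = "grey" }
  todo.size
end

first_run  = fix_spelling(records, limit: 1) # "interrupted" after one record
second_run = fix_spelling(records)           # the re-run completes the job
third_run  = fix_spelling(records)           # nothing is left to do

puts [first_run, second_run, third_run].inspect
```

A fix written this way can be run again after an interruption without restoring a backup first, because records that were already fixed no longer match the selection.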
You can run the same script multiple times on the DB either by
restoring the state from a backup or by using the =rollback= task,
which declares one or more migrations as not run.
For instance:
#+begin_example
rails data_fix:run
[... ERROR! ... ]
[... FIX SCRIPT ...]
rails data_fix:rollback
rails data_fix:run
[... REPEAT ...]
#+end_example
A typical usage scenario is the following.
- You realize you have been inconsistent in storing color names in
  the =color= field of a table of your DB: some of your records use
  the word =gray= while others use the British spelling =grey=.
- You use =data_fix:create= to generate a file in =db/migrate-data=
  (the file will contain your data migration/maintenance script):
#+begin_example sh
rails data_fix:create[prefer_british_over_american]
#+end_example
The script generates a file whose name is along the lines of:
=db/migrate-data/20210730135129_prefer_british_over_american.rb=
- You now write the code to fix your data in the file just created. For
  instance, something along the lines of:
#+begin_example sh
cat > db/migrate-data/20210730135129_prefer_british_over_american.rb
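# Select only the records that still use the American spelling
Color.where(color: "gray").find_each do |record|
  record.color = "grey"
  record.save! # raise on failure instead of silently skipping the record
end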
^D
#+end_example
- You can test your script in development by running:
#+begin_example sh
rails data_fix:run
#+end_example
- If you are unhappy with the result, you can declare the data fix as
  not run, fix your script, and run it again:
#+begin_example sh
rails data_fix:rollback
#+end_example
#+begin_example sh
cat > db/migrate-data/20210730135129_prefer_british_over_american.rb
puts "I prefer a brute-force approach"
Color.find_each do |record|
  record.color = "grey"
  record.save! # raise on failure instead of silently skipping the record
end
^D
#+end_example
#+begin_example sh
rails data_fix:run
#+end_example
#+begin_quote
Despite the name of the task, =data_fix:rollback= does not roll
back data: for that you need to reload from a DB backup. The
=data_fix:rollback= task updates the table in the DB, declaring
that the latest =data_fix= has not yet been run.
#+end_quote
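The difference between rolling back the bookkeeping and rolling back the data can be made concrete with a small plain-Ruby sketch. It models the tracking table as an array of version strings and reflects only the general idea described in the note above, not the gem's code. The name =mark_latest_as_not_run= and the sample versions are hypothetical.

```ruby
# Stand-ins for the application data and the tracking table (illustration only).
data         = { 1 => "grey", 2 => "grey" } # already modified by the fix
versions_run = ["20210701120000", "20210730135129"]

# Hypothetical helper: forget the most recent version, so that the
# corresponding script is considered pending again. The data is not touched.
def mark_latest_as_not_run(versions_run)
  versions_run.sort[0...-1]
end

versions_run = mark_latest_as_not_run(versions_run)

puts versions_run.inspect
puts data.inspect
```

After the call only the bookkeeping has changed: the data still contains whatever the script wrote, which is why restoring the previous values requires a backup.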
You repeat the steps above for any other data fix you need. When you
are ready, you can run all the migrations at once in production, with
the following command:
#+begin_example
RAILS_ENV=production rails data_fix:run
#+end_example
=data_fix= keeps track of the scripts it has already run, ensuring
that no script is run twice.
To install this gem onto your local machine, run
=bundle exec rake install=. To release a new version, update the
version number in =version.rb=, and then run
=bundle exec rake release=, which will create a git tag for the
version, push git commits and the created tag, and push the =.gem=
file to rubygems.org.
Bug reports and pull requests are welcome on GitHub at
https://github.com/shair.tech/data_fix.
The gem is available as open source under the terms of the
[[https://opensource.org/licenses/MIT][MIT License]].