Dipa
This gem provides an API for parallel processing, like the parallel gem, but
distributed and scalable across multiple machines. It does so with minimal configuration and minimal dependencies on
specific technologies, building on the Rails ecosystem.
Dipa provides a Rails engine which depends on ActiveJob and
ActiveStorage.
You can use whatever backend you like for any of these components and configure them for your specific use case.
The purpose of this gem is to distribute load-heavy, long-running processing of large datasets over multiple
processes or machines using ActiveJob.
Installation
Before you install Dipa, make sure ActiveJob and
ActiveStorage are installed and configured properly.
Add this line to your application's Gemfile:
gem 'dipa'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install dipa
Install the Dipa migrations and migrate the database:
bundle exec rake dipa:install:migrations
bundle exec rake db:migrate
Configuration
Dipa can be configured in the application config. These configuration options set the default for this installation.
config.dipa.agent_queue = :default_queue_for_dipa_agent_jobs
config.dipa.coordinator_queue = :default_queue_for_coordinator_queue_jobs
config.dipa.agent_timeout = 900
config.dipa.agent_processing_timeout = 600
config.dipa.coordinator_timeout = 0
config.dipa.coordinator_processing_timeout = 18000
config.dipa.agent_queue defaults to config.active_job.default_queue_name.
config.dipa.coordinator_queue defaults to config.active_job.default_queue_name.
config.dipa.agent_timeout defaults to 0 (no timeout).
config.dipa.agent_processing_timeout defaults to 0 (no timeout).
config.dipa.coordinator_timeout defaults to 0 (no timeout).
config.dipa.coordinator_processing_timeout defaults to 0 (no timeout).
Usage
Minimum example:
Dipa.map(1..100).with('Integer', :sqrt)
More realistic examples:
Dipa.map(large_dataset, options: options).with('ProcessorClassName', :processor_class_method)
Dipa.each(large_dataset, options: options).with('ProcessorClassName', :processor_class_method)
Dipa.map returns an Array of the processed items. The result is in the same order as the input (large_dataset).
Dipa.each returns large_dataset.to_a.
large_dataset must be an Enumerable.
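The return values described above can be pictured with a purely local stand-in. The sketch below does not use Dipa at all: local_map and local_each are hypothetical helpers that only mimic the documented return contract (ordered results for map, the input as an Array for each) inside a single process, without any distribution.

```ruby
# Hypothetical local stand-ins that mimic the documented return values of
# Dipa.map and Dipa.each. They run in-process and do not involve Dipa.
def local_map(dataset, class_name, method_name)
  processor = Object.const_get(class_name)
  # Results come back in the same order as the input.
  dataset.map { |item| processor.public_send(method_name, item) }
end

def local_each(dataset, class_name, method_name)
  processor = Object.const_get(class_name)
  dataset.each { |item| processor.public_send(method_name, item) }
  # The input is returned as an Array; the processing results are discarded.
  dataset.to_a
end

MAP_RESULT = local_map(1..10, 'Integer', :sqrt)
EACH_RESULT = local_each(1..10, 'Integer', :sqrt)
```

Here MAP_RESULT holds the integer square roots in input order, while EACH_RESULT is simply the input range converted to an Array.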
options is a hash. The following keys are allowed:
agent_queue: [Symbol] Defaults to config.dipa.agent_queue.
coordinator_queue: [Symbol] Defaults to config.dipa.coordinator_queue.
agent_timeout: [Integer] Defaults to config.dipa.agent_timeout.
agent_processing_timeout: [Integer] Defaults to config.dipa.agent_processing_timeout.
coordinator_timeout: [Integer] Defaults to config.dipa.coordinator_timeout.
coordinator_processing_timeout: [Integer] Defaults to config.dipa.coordinator_processing_timeout.
keep_data: [true|false] Defaults to false. Useful for debugging. After processing, all Dipa::* records and the
associated ActiveStorage data are removed. If you don't want that to happen, set this to true.
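To make the options concrete, the following sketch builds an options hash for a single call. The queue names and timeout values are made up for illustration, and ALLOWED_OPTION_KEYS with its key check is a hypothetical local safeguard, not something Dipa is known to provide. The Dipa call itself is shown as a comment because it needs a configured Rails application.

```ruby
# Keys listed in this README as allowed in the options hash.
ALLOWED_OPTION_KEYS = %i[
  agent_queue coordinator_queue
  agent_timeout agent_processing_timeout
  coordinator_timeout coordinator_processing_timeout
  keep_data
].freeze

# Illustrative values only; the queue names and numbers are assumptions.
DIPA_OPTIONS = {
  agent_queue: :heavy_lifting,
  coordinator_queue: :coordination,
  agent_processing_timeout: 300,
  keep_data: true # keep Dipa::* records and stored data for debugging
}.freeze

# Hypothetical local sanity check to catch misspelled keys before the call.
unknown_keys = DIPA_OPTIONS.keys - ALLOWED_OPTION_KEYS
raise ArgumentError, "unknown options: #{unknown_keys}" unless unknown_keys.empty?

# In a Rails app with Dipa installed, keys that are left out fall back to
# the config.dipa.* defaults:
#   Dipa.map(large_dataset, options: DIPA_OPTIONS)
#       .with('ProcessorClassName', :processor_class_method)
```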
ProcessorClassName must be a Class or a String. It defines the class that provides the processor method.
:processor_class_method must be a Symbol or a String. It defines the method used to process each single
element of large_dataset. It MUST be a class method and MUST accept exactly one element as its argument.
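As an illustration of these requirements, here is a minimal sketch of a processor class. WordCounter and count are hypothetical names, not part of Dipa; the only points taken from the text above are that the processor is a class method and that it takes exactly one element. The Dipa call is shown as a comment because it needs a configured Rails application.

```ruby
# Hypothetical processor class; the name and logic are illustrative only.
class WordCounter
  # Class method (def self.count) that accepts exactly one dataset element.
  def self.count(text)
    text.split.size
  end
end

documents = ['one two three', 'four five', '']

# Local check that the method handles one element at a time.
local_result = documents.map { |doc| WordCounter.count(doc) }

# Inside a Rails app with Dipa installed, the same processor would be
# referenced by class (or class name) and method name:
#   Dipa.map(documents).with('WordCounter', :count)
```

Trying the method locally on a few elements first is a cheap way to confirm it meets the single-argument requirement before jobs are enqueued.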
TODO
TODO.md
Development
With nix
- Have nix installed. See https://nixos.org/download.html for detailed instructions for your OS.
  The shell sets up the environment for working with this repository and installs all required tools for this project.
  Changes to flake.nix will trigger a rebuild of your devenv environment as soon as you hit the shell
  (return key or (re-)entering the shell). Only the parts that need rebuilding are rebuilt. You can also force a
  rebuild by executing direnv reload.
  Starting the shell for the first time might take some minutes.
- Run bundle install.
- Start services in another terminal window with devenv up (as of 15.08.2023 it's mysql). The first
  run will also set up the database.
- Run bundle exec rake db:migrate.
Without nix
After checking out the repo, run bin/setup to install dependencies. Then, run bundle exec rspec to run the tests.
You can also run bin/console for an interactive prompt that will allow you to experiment.
Contributing
Bug reports and pull requests are welcome on Codeberg at https://codeberg.org/empunkt/dipa. This project is intended
to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the
code of conduct.
License
The gem is available as open source under the terms of the MIT License.
Code of Conduct
Everyone interacting in the Dipa project's codebases, issue trackers, chat rooms and mailing lists is expected to follow
the code of conduct.