cdmbl

  • 0.18.0
  • Rubygems

CDMBL: CONTENTdm on Blacklight

Use Blacklight as a front end for your CONTENTdm instance.

At the moment, CDMBL consists only of a micro ETL system dedicated to extracting metadata records from a CONTENTdm instance (using the CONTENTdm API gem), transforming them into Solr documents, and loading them into Solr.

Installation

Add this line to your application's Gemfile:

gem 'cdmbl'

And then execute:

$ bundle

Or install it yourself as:

$ gem install cdmbl

Add the CDMBL rake task to your project Rakefile:

require 'cdmbl/rake_task'

GeoNames (optional)

In order to make use of the GeoNames service, you must purchase a GeoNames Premium Webservices Account. If you do not have a geonam field in your CONTENTdm schema, you may ignore this instruction. Add your credentials to your shell environment once you have secured a GeoNames user:

# e.g. within your .bash_profile or .zprofile file
export GEONAMES_USER="yourusernamehere"
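
To confirm that the variable is visible to Ruby processes started from your shell, a check along these lines may help. This is only a sketch: the helper name geonames_user is hypothetical and is not part of CDMBL.

```ruby
# Hypothetical helper (not part of CDMBL): returns the GeoNames user name
# from the environment, or nil when the variable is unset or blank.
def geonames_user(env = ENV)
  value = env['GEONAMES_USER'].to_s.strip
  value.empty? ? nil : value
end

puts(geonames_user ? 'GEONAMES_USER is set' : 'GEONAMES_USER is not set')
```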

Usage

Run the ingester

rake "cdmbl:batch[solr_url,oai_endpoint,cdm_endpoint,set_spec,batch_size,max_compounds]"

Argument | Definition
solr_url | The full URL to your Solr core instance (same as your blacklight.yml solr url)
oai_endpoint | A URL to your OAI instance (e.g. https://server16022.contentdm.oclc.org/oai/oai.php)
cdm_endpoint | A URL to your CONTENTdm API endpoint (e.g. https://server16022.contentdm.oclc.org/dmwebservices/index.php)
set_spec | Selectively harvest from a single collection with setSpec
batch_size | The number of records to transform at a time. Note: it is within the record transformation process that the CONTENTdm API is requested. This API can be sluggish, so we conservatively transform batches of ten records at a time to prevent timeouts.
max_compounds | CONTENTdm records with many compounds can take a long time to load from the CONTENTdm API, as multiple requests must happen in order to get the metadata for each child record of a parent compound object. For this reason, records with ten or more compound children are, by default, processed in batches of one. This setting allows you to override this behavior.
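
The interaction between batch_size and max_compounds can be pictured with a small sketch. It is not CDMBL's internal code: the plan_batches helper and the record shape (a hash with :id and :compound_count) are assumptions made for illustration, and the real ingester may order or group records differently.

```ruby
# Hypothetical illustration of the batching rule described above.
# Records with max_compounds or more compound children get a batch of their
# own; all other records are grouped batch_size at a time.
def plan_batches(records, batch_size: 10, max_compounds: 10)
  heavy, light = records.partition { |r| r[:compound_count] >= max_compounds }
  light.each_slice(batch_size).to_a + heavy.map { |r| [r] }
end

records = [
  { id: 'a', compound_count: 0 },
  { id: 'b', compound_count: 25 },
  { id: 'c', compound_count: 3 }
]

# With batch_size 2, 'a' and 'c' share a batch and 'b' is processed alone.
plan_batches(records, batch_size: 2).each do |batch|
  puts batch.map { |r| r[:id] }.join(', ')
end
```

Raising max_compounds in this sketch moves more records into the shared batches, which is the trade-off the setting exposes: fewer batches, but more CONTENTdm API requests per batch.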

For example:

rake "cdmbl:ingest[http://solr:8983/solr/foo-bar-core, https://server16022.contentdm.oclc.org/oai/oai.php, https://server16022.contentdm.oclc.org/dmwebservices/index.php, 2015-01-01]"

Custom Rake Tasks

You might also create your own rake task to run your modified field transformers:

require 'cdmbl'

namespace :cdmbl do
  desc "ingest batches of records"
  ##
  # e.g. rake "cdmbl:batch[batch_size,set_spec,max_compounds]"
  task :batch, [:batch_size, :set_spec, :max_compounds] => :environment do |t, args|
    # NOTE: solr_config is not defined in this snippet; your application
    # must supply it before the task runs.
    config =
      {
        oai_endpoint: 'http://cdm16022.contentdm.oclc.org/oai/oai.php',
        cdm_endpoint: 'https://server16022.contentdm.oclc.org/dmwebservices/index.php',
        # An empty quoted argument ('""') means "no set_spec".
        set_spec: (args[:set_spec] != '""') ? args[:set_spec] : nil,
        max_compounds: (args[:max_compounds]) ? args[:max_compounds] : 2,
        batch_size: (args[:batch_size]) ? args[:batch_size] : 30,
        solr_config: solr_config
      }
    CDMBL::ETLWorker.perform_async(config)
  end
end

Your Own Custom Solr Field Mappings (see above code snippet)

The default CONTENTdm to Solr field transformation rules may be overridden by calling the CDMBL::ETLWorker (a Sidekiq worker) directly. These rules may be found in the default_mappings method of the CDMBL::Transformer class.

The transformer expects mappings in the following format:

def your_custom_field_mappings
  [
    {dest_path: 'title_tei', origin_path: 'title', formatters: [StripFormatter]},
  ]
end

Argument | Definition
dest_path | The 'destination path' is the name of the field you will be sending to Solr for this field mapping.
origin_path | Where to get the field data from the original record for this mapping.
formatters | Formatters perform tasks such as stripping white space or splitting CONTENTdm multi-valued fields (delimited by semicolons) into JSON arrays.

Note: The first formatter receives the value found at the declared origin_path. Each formatter declared after the initial formatter will receive a value produced by the preceding formatter.
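
The chaining described in the note above can be sketched as follows. Everything here is a stand-in for illustration: DemoStripFormatter, DemoSplitFormatter, apply_mapping, and the field names 'subject' and 'subject_ssim' are hypothetical, and CDMBL's own transformer may resolve origin_path differently.

```ruby
# Stand-in formatters for illustration only; CDMBL ships its own.
class DemoStripFormatter
  def self.format(value)
    value.respond_to?(:strip) ? value.strip : value
  end
end

class DemoSplitFormatter
  def self.format(value)
    value.respond_to?(:split) ? value.split(';') : value
  end
end

# Hypothetical helper: applies one mapping to a flat record hash.
def apply_mapping(record, mapping)
  initial = record[mapping[:origin_path]]
  # Each formatter receives the value produced by the one before it.
  value = mapping[:formatters].reduce(initial) { |memo, formatter| formatter.format(memo) }
  { mapping[:dest_path] => value }
end

mapping = {
  dest_path: 'subject_ssim',
  origin_path: 'subject',
  formatters: [DemoStripFormatter, DemoSplitFormatter]
}
record = { 'subject' => '  Maps;Rivers  ' }

# The value is stripped first, then split on semicolons.
p apply_mapping(record, mapping)
```

Because each formatter only sees the previous output, the order in the formatters array matters: splitting before stripping would hand an array to the strip step, which would then pass it through unchanged.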

Formatters are very simple stateless classes that take a value, do something to it, and respond with a modified version of this value via a class method called format. Examples of other formatters may be found in the Formatters file. For example:

  class SplitFormatter
    def self.format(value)
      (value.respond_to?(:split)) ? value.split(';') : value
    end
  end
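
To show how the formatter above behaves on different inputs, here is a short usage sketch. SplitFormatter is repeated so the snippet runs on its own; SplitAndStripFormatter is a hypothetical variant, not part of CDMBL, included to show how a custom formatter following the same pattern might look.

```ruby
# Copy of the SplitFormatter shown above, repeated so this snippet runs alone.
class SplitFormatter
  def self.format(value)
    value.respond_to?(:split) ? value.split(';') : value
  end
end

# Hypothetical variant (not part of CDMBL) that also trims each piece.
class SplitAndStripFormatter
  def self.format(value)
    value.respond_to?(:split) ? value.split(';').map(&:strip) : value
  end
end

p SplitFormatter.format('Maps; Rivers')         # pieces keep surrounding spaces
p SplitFormatter.format(nil)                    # non-strings pass through untouched
p SplitAndStripFormatter.format('Maps; Rivers') # pieces are trimmed
```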

You might also want to simply override some of the default mappings or add your own:

mappings = CDMBL::Transformer.default_mappings.merge(your_custom_field_mappings)
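
One caveat: if default_mappings returns an array of hashes in the format shown earlier (an assumption; check the version you have installed), Ruby's Array has no merge method, so the two lists would need to be combined another way. A minimal sketch of one option, keyed on dest_path, follows. The merge_mappings helper and the sample data are hypothetical and not part of CDMBL.

```ruby
# Hypothetical helper, not part of CDMBL: combines two arrays of mapping
# hashes, letting a custom mapping replace a default with the same dest_path.
def merge_mappings(defaults, custom)
  overridden = custom.map { |m| m[:dest_path] }
  defaults.reject { |m| overridden.include?(m[:dest_path]) } + custom
end

# Stand-in data; in an application the defaults would presumably come from
# CDMBL::Transformer.default_mappings.
defaults = [
  { dest_path: 'id', origin_path: 'id', formatters: [] },
  { dest_path: 'title_tei', origin_path: 'title', formatters: [] }
]
custom = [
  { dest_path: 'title_tei', origin_path: 'alternative_title', formatters: [] }
]

merge_mappings(defaults, custom).each do |m|
  puts "#{m[:dest_path]} <- #{m[:origin_path]}"
end
```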

A Custom Post-indexing Callback

If you would like to perform some action (e.g. send an email) following the completion of the CDMBL indexing process, you may declare your own callback hook (anything with "Callback" in the class name declared within the CDMBL module space will be used). To do so in Rails, create a Rails initializer file config/initializers/cdmbl.rb:

module CDMBL
  class Callback
    def self.call!
      Rails.logger.info("My Custom CDMBL Callback")
    end
  end
end
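
Outside Rails, a callback of the same shape could write to Ruby's standard-library Logger instead of Rails.logger. This is a sketch under that assumption; the LOGGER constant and the log message are hypothetical choices, not something CDMBL requires.

```ruby
require 'logger'

module CDMBL
  # Same shape as the callback above, using Ruby's standard Logger
  # instead of Rails.logger. LOGGER is a hypothetical constant name.
  class Callback
    LOGGER = Logger.new($stdout)

    def self.call!
      LOGGER.info('CDMBL indexing finished')
    end
  end
end

# Invoke the hook by hand to check that it logs as expected.
CDMBL::Callback.call!
```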

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/UMNLibraries/cdmbl. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

MIT

TODO

  • Make StripFormatter the default formatter so it doesn't need to be declared for every field
  • Re-brand project: CONTENTdm Indexer. CDMBL doesn't necessarily require Blacklight. Moreover, it only handles indexing.

Package last updated on 21 Mar 2021
