
Security News
ESLint Adds Support for Parallel Linting, Closing 10-Year-Old Feature Request
ESLint now supports parallel linting with a new --concurrency flag, delivering major speed gains and closing a 10-year-old feature request.
mongoid-collection-snapshot
Advanced tools
Easy maintenance of collections of processed data in MongoDB with Mongoid 3, 4, 5 and 6.
This is a forked, renamed, maintained and supported version of mongoid_collection_snapshot.
Suppose that you have a Mongoid model called Artwork
, stored in a MongoDB collection called artworks
and the underlying documents look something like:
{ name: 'Flowers', artist_id: ..., price: 3000000 }
From time to time, your system runs a map/reduce job to compute the total price of all artist's works, resulting in a collection called artist_artwork_price
that contains documents that look like:
{ _id: ..., artist_id: ..., sum: 1500000 }
If your system wants to maintain and use this price data, it has to do so at the level of raw MongoDB operations, since map/reduce result documents don't map well to models in Mongoid. Furthermore, even though map/reduce jobs can take some time to run, you probably want the entire artist_artwork_price
collection populated atomically from the point of view of your system, since otherwise you don't ever know the state of the data in the collection - you could access it in the middle of a map/reduce and get partial, incorrect results.
A mongoid-collection-snapshot solves this problem by providing an atomic view of collections of data like map/reduce results that live outside of Mongoid.
In the example below, we set up our artist price sum collection by including Mongoid::CollectionSnapshot
and implementing a build
method.
class ArtistArtworkPrice
include Mongoid::CollectionSnapshot
def build
map = <<-EOS
function() {
emit({ artist_id: this['artist_id']}, { count: 1, sum: this['price'] })
}
EOS
reduce = <<-EOS
function(key, values) {
var sum = 0;
var count = 0;
values.forEach(function(value) {
sum += value['sum'];
count += value['count'];
});
return({ count: count, sum: sum });
}
EOS
Artwork.map_reduce(map, reduce).out(inline: 1).each do |doc|
collection_snapshot.insert_one(
artist_id: doc['_id']['artist_id'],
count: doc['value']['count'],
sum: doc['value']['sum']
)
end
end
end
Now, if you want to schedule a recomputation, just call ArtistArtworkPrice.create
.
The latest snapshot is always available as ArtistArtworkPrice.latest
.
andy_warhol = Artist.where(name: 'Andy Warhol').first
andy_warhol_price = ArtistArtworkPrice.latest.collection_snapshot.where(artist_id: andy_warhol.id).first
average_price = andy_warhol_price['sum'] / andy_warhol_price['count']
By default, mongoid-collection-snapshot maintains the most recent two snapshots computed any given time. Set max_collection_snapshot_instances
to change this.
ArtistArtworkPrice.max_collection_snapshot_instances = 3
You can do better than the average price example above and define first-class models for your collection snapshot data, then access them as any other Mongoid collection via collection snapshot's .documents
method.
class AverageArtistPrice
document do
belongs_to :artist, inverse_of: nil
field :sum, type: Integer
field :count, type: Integer
end
def average_price(artist_name)
artist = Artist.where(name: artist_name).first
doc = documents.where(artist: artist).first
doc.sum / doc.count
end
end
The following example iterates through all latest artist price averages.
AverageArtistPrice.latest.documents.each do |doc|
puts "#{doc.artist.name}: #{doc.sum / doc.count}"
end
This code can be found in the example folder.
You can maintain multiple collections atomically within the same snapshot by passing unique collection identifiers to collection_snaphot
when you call it in your build or query methods:
class ArtistArtworkPrice
include Mongoid::CollectionSnapshot
def build
# define map/reduce for average and max aggregations
Mongoid.default_session.command('mapreduce' => 'artworks', map: map_avg, reduce: reduce_avg, out: collection_snapshot('average'))
Mongoid.default_session.command('mapreduce' => 'artworks', map: map_max, reduce: reduce_max, out: collection_snapshot('max'))
end
def average_price(artist)
doc = collection_snapshot('average').find('_id.artist' => artist).first
doc['value']['sum'] / doc['value']['count']
end
def max_price(artist)
doc = collection_snapshot('max').find('_id.artist' => artist).first
doc['value']['max']
end
end
Specify the name of the collection to define first class Mongoid models.
class ArtistArtworkPrice
document('average') do
field :value, type: Hash
end
document('max') do
field :value, type: Hash
end
end
Access these by name.
ArtistArtworkPrice.latest.documents('average')
ArtistArtworkPrice.latest.documents('max')
If fields across multiple collection snapshots are identical, a single default document
is sufficient.
class ArtistArtworkPrice
document do
field :value, type: Hash
end
end
Your class can specify a custom database for storage of collection snapshots by overriding the snapshot_session
instance method. In this example, we memoize the connection at the class level to avoid creating many separate connection instances.
class ArtistArtworkPrice
include Mongoid::CollectionSnapshot
def build
# ...
end
def snapshot_session
@snapshot_session ||= Mongo::Client.new('mongodb://localhost:27017/artists_and_artworks')
end
end
Another common way of configuring this is through mongoid.yml.
development:
sessions:
default:
database: dev_data
imports:
database: dev_imports
def snapshot_session
Mongoid.session('imports')
end
Copyright (c) 2011-2017 Art.sy Inc. and Contributors
MIT License, see LICENSE.txt for details.
FAQs
Unknown package
We found that mongoid-collection-snapshot demonstrated a not healthy version release cadence and project activity because the last version was released a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
ESLint now supports parallel linting with a new --concurrency flag, delivering major speed gains and closing a 10-year-old feature request.
Research
/Security News
A malicious Go module posing as an SSH brute forcer exfiltrates stolen credentials to a Telegram bot controlled by a Russian-speaking threat actor.
Security News
Rspack launches Rslint, a fast TypeScript-first linter built on typescript-go, joining in on the trend of toolchains creating their own linters.