
annembed-ruby
High-performance dimensionality reduction for Ruby, powered by the annembed Rust crate.
Add this line to your application's Gemfile:
gem 'annembed-ruby'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install annembed-ruby
require 'annembed'
# Generate some sample data (a 1000 x 50 2D array of floats)
data = Array.new(1000) { Array.new(50) { rand } }
# Perform UMAP embedding
embedding = AnnEmbed.umap(data, n_components: 2, n_neighbors: 15)
# Or use the full API
embedder = AnnEmbed::Embedder.new(
  method: :umap,
  n_components: 2,
  n_neighbors: 15,
  min_dist: 0.1
)
embedding = embedder.fit_transform(data)
# Save the model for later use
embedder.save("model.ann")
# Load the model and transform new data (same number of columns as the training data)
embedder = AnnEmbed::Embedder.load("model.ann")
new_data = Array.new(10) { Array.new(50) { rand } }
new_embedding = embedder.transform(new_data)
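The examples above pass a plain Ruby 2D array. If your rows come from files or a database, a quick shape check before calling AnnEmbed can surface ragged or non-numeric rows early. The sketch below uses a hypothetical helper, `validate_matrix!`, which is not part of annembed-ruby; whether the gem performs similar checks itself is not covered here.

```ruby
# Hypothetical helper (not part of annembed-ruby): checks that `data` is a
# non-empty, rectangular 2D array of finite numbers and returns it as Floats.
def validate_matrix!(data)
  raise ArgumentError, "data must be a non-empty Array" unless data.is_a?(Array) && !data.empty?

  width = nil
  data.each_with_index.map do |row, i|
    raise ArgumentError, "row #{i} is not an Array" unless row.is_a?(Array)

    width ||= row.length
    if row.length != width
      raise ArgumentError, "row #{i} has #{row.length} columns, expected #{width}"
    end

    row.map do |value|
      unless value.is_a?(Numeric)
        raise ArgumentError, "row #{i} contains non-numeric value #{value.inspect}"
      end

      f = Float(value)
      # NaN or Infinity would corrupt any distance computation downstream.
      raise ArgumentError, "row #{i} contains non-finite value #{f}" unless f.finite?

      f
    end
  end
end

data = validate_matrix!(Array.new(1000) { Array.new(50) { rand } })
# data can now be passed to AnnEmbed.umap(data, ...)
```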
A common use case is reducing large language model embeddings (e.g., from OpenAI or Cohere) to fewer dimensions so they can be stored more compactly in a database while preserving as much of their neighborhood structure as possible.
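For a rough sense of the savings: stored as 32-bit floats, a 1024-dimensional vector takes 1024 × 4 = 4,096 bytes, while a 64-dimensional one takes 64 × 4 = 256 bytes, a 16× reduction per row. A minimal sketch, assuming your database stores vectors in a binary column; `pack_embedding` and `unpack_embedding` are hypothetical helper names, and the float32 round trip loses some precision compared with Ruby's 64-bit Float.

```ruby
# Hypothetical helpers: serialize an embedding as little-endian float32 bytes
# ("e*" in Array#pack) for a binary/BLOB column, and read it back.
def pack_embedding(vector)
  vector.pack("e*")
end

def unpack_embedding(blob)
  blob.unpack("e*") # values come back rounded to single precision
end

original = Array.new(1024) { rand }
reduced  = Array.new(64) { rand }

puts pack_embedding(original).bytesize # prints 4096
puts pack_embedding(reduced).bytesize  # prints 256
```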
require 'annembed'
require 'fileutils'
# Load your training embeddings from the database (fetch_embeddings_from_database
# stands in for your own data-access code). They should be representative of your
# data distribution.
training_embeddings = fetch_embeddings_from_database(limit: 10000) # Array of 1024-dim vectors
# Create and configure the UMAP model
embedder = AnnEmbed::Embedder.new(
  method: :umap,
  n_components: 64,    # Reduce to 64 dimensions
  n_neighbors: 30,     # Higher values preserve more global structure
  min_dist: 0.05,      # Lower values give tighter clusters
  metric: :euclidean,  # or :cosine; on unit-length vectors both rank neighbors the same
  random_seed: 42      # For reproducibility
)
# Train the model (this may take a few minutes for large datasets)
puts "Training UMAP model on #{training_embeddings.length} embeddings..."
reduced_embeddings = embedder.fit_transform(training_embeddings)
# Save the trained model to disk, creating the models/ directory if needed
MODEL_PATH = "models/umap_1024_to_64.ann"
FileUtils.mkdir_p(File.dirname(MODEL_PATH))
embedder.save(MODEL_PATH)
puts "Model saved to #{MODEL_PATH}"
# Optionally, update your database with the reduced training embeddings.
# training_ids must hold the database IDs in the same order as training_embeddings.
training_ids.zip(reduced_embeddings).each do |id, embedding|
  update_database_embedding(id, embedding)
end
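On the metric choice: for unit-length vectors, squared Euclidean distance equals 2 - 2 * cos(a, b), so `:euclidean` and `:cosine` rank neighbors identically. If you want to rely on that equivalence, one option is to L2-normalize vectors before both training and transforming. A minimal plain-Ruby sketch; the helper names are illustrative, and whether annembed-ruby normalizes internally is not stated here.

```ruby
# L2-normalize each vector to unit length. For unit vectors,
# ||a - b||^2 = 2 - 2 * cos(a, b), so Euclidean and cosine distance
# order neighbors the same way.
def l2_normalize(vectors)
  vectors.map do |v|
    norm = Math.sqrt(v.sum { |x| x * x })
    raise ArgumentError, "cannot normalize a zero vector" if norm.zero?

    v.map { |x| x / norm }
  end
end

def squared_euclidean(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

a, b = l2_normalize([[3.0, 4.0], [1.0, 0.0]]) # a is about [0.6, 0.8], b is [1.0, 0.0]
squared_euclidean(a, b)          # about 0.8
2 - 2 * cosine_similarity(a, b)  # about 0.8
```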
require 'annembed'
# Load the pre-trained model once (e.g., at application startup)
EMBEDDER = AnnEmbed::Embedder.load("models/umap_1024_to_64.ann")
# Reduce a single embedding
def reduce_embedding(high_dim_embedding)
  # Input: 1024-dimensional array
  # Output: 64-dimensional array
  EMBEDDER.transform([high_dim_embedding]).first
end
# Example: process a new document (generate_embedding and save_to_database
# stand in for your own embedding-API and persistence code)
document = "Your text content here..."
high_dim_embedding = generate_embedding(document) # Returns 1024-dim vector
low_dim_embedding = reduce_embedding(high_dim_embedding)
# Store in database
save_to_database(document_id: 123,
                 embedding: low_dim_embedding,
                 original_embedding: high_dim_embedding) # Optionally keep original
# For better performance when processing multiple embeddings
def reduce_embeddings_batch(high_dim_embeddings)
  # Input: Array of 1024-dimensional arrays
  # Output: Array of 64-dimensional arrays
  EMBEDDER.transform(high_dim_embeddings)
end
# Example: process a batch of new documents
documents = fetch_new_documents(limit: 100)
high_dim_embeddings = documents.map { |doc| generate_embedding(doc.text) }
# Reduce all at once (typically faster than one transform call per embedding)
low_dim_embeddings = reduce_embeddings_batch(high_dim_embeddings)
# Save each reduced embedding
documents.zip(low_dim_embeddings).each do |doc, embedding|
  save_to_database(document_id: doc.id, embedding: embedding)
end
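For very large backfills, passing everything to a single transform call means the whole input has to be in memory for that call. One option is to process fixed-size slices; to also bound memory for the outputs, you could save each slice's results before moving on. The sketch below works with any object that responds to `transform`, such as the EMBEDDER above. `reduce_in_slices` and the default batch size of 1000 are illustrative assumptions, and `TruncatingEmbedder` is a stand-in so the example runs without the gem.

```ruby
# Reduce embeddings in fixed-size slices so each transform call sees at most
# batch_size rows. `embedder` can be any object with #transform(rows) that
# returns one reduced row per input row (for example, EMBEDDER above).
def reduce_in_slices(embedder, vectors, batch_size: 1000)
  raise ArgumentError, "batch_size must be positive" unless batch_size.positive?

  vectors.each_slice(batch_size).flat_map do |slice|
    reduced = embedder.transform(slice)
    unless reduced.length == slice.length
      raise "transform returned #{reduced.length} rows for #{slice.length} inputs"
    end

    reduced
  end
end

# Stand-in for illustration only: "reduces" each row to its first two values.
class TruncatingEmbedder
  def transform(rows)
    rows.map { |row| row.first(2) }
  end
end

vectors = Array.new(2500) { |i| [i, i + 1, i + 2] }
reduced = reduce_in_slices(TruncatingEmbedder.new, vectors, batch_size: 1000)
puts reduced.length # prints 2500
p reduced.last      # prints [2499, 2500]
```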
As your data distribution changes over time, you may want to retrain the model:
require 'date'
require 'fileutils'
# Schedule this monthly/quarterly
def update_dimension_reduction_model
  # Get recent embeddings that represent the current data distribution
  # (3.months.ago requires ActiveSupport, e.g. in a Rails app)
  recent_embeddings = fetch_embeddings_from_database(
    where: "created_at > ?",
    date: 3.months.ago,
    limit: 20000
  )
  # Train new model
  new_embedder = AnnEmbed::Embedder.new(
    method: :umap,
    n_components: 64,
    n_neighbors: 30,
    min_dist: 0.05
  )
  new_embedder.fit_transform(recent_embeddings)
  # Save with a date stamp (Date#to_s gives YYYY-MM-DD)
  model_path = "models/umap_1024_to_64_#{Date.today}.ann"
  new_embedder.save(model_path)
  # Test new model before deploying
  test_embeddings = fetch_test_embeddings
  new_results = new_embedder.transform(test_embeddings)
  if validate_results(new_results)
    # Point the "current" symlink at the new model. A relative link target is
    # resolved from the link's own directory, so use the basename; ln_sf also
    # replaces an existing link, where File.symlink would raise Errno::EEXIST.
    FileUtils.ln_sf(File.basename(model_path), "models/umap_1024_to_64_current.ann")
  end
end
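`validate_results` is left for you to define. One possible check, sketched here as an assumption rather than a recommendation from annembed-ruby, is neighborhood preservation: for each held-out vector, the fraction of its k nearest neighbors in the original space that remain among its k nearest neighbors after reduction. This version needs both the original test embeddings and the reduced results, so the call above would become `validate_results(test_embeddings, new_results)`; the 0.5 threshold is an arbitrary placeholder.

```ruby
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Indices of the k nearest neighbors of every point (brute force, O(n^2 * d)).
def knn_indices(vectors, k)
  n = vectors.length
  raise ArgumentError, "need more than k points" unless n > k

  vectors.each_with_index.map do |v, i|
    (0...n).reject { |j| j == i }.min_by(k) { |j| squared_distance(v, vectors[j]) }
  end
end

# Average fraction of each point's k nearest neighbors in the original space
# that are also among its k nearest neighbors in the reduced space.
# 1.0 means every neighborhood is preserved; random output scores roughly k / n.
def neighborhood_preservation(high_dim, low_dim, k: 10)
  unless high_dim.length == low_dim.length
    raise ArgumentError, "high_dim and low_dim must have the same number of rows"
  end

  high_nn = knn_indices(high_dim, k)
  low_nn = knn_indices(low_dim, k)
  overlaps = high_nn.zip(low_nn).map { |h, l| (h & l).length.fdiv(k) }
  overlaps.sum / overlaps.length
end

# Hypothetical validation rule: accept the new model if, on held-out data,
# on average at least `threshold` of each point's k neighbors survive.
def validate_results(test_embeddings, new_results, k: 10, threshold: 0.5)
  neighborhood_preservation(test_embeddings, new_results, k: k) >= threshold
end

high = [[0.0], [1.0], [10.0], [11.0]]
low  = [[0.0], [10.0], [1.0], [11.0]] # neighbors scrambled
neighborhood_preservation(high, high, k: 1) # 1.0
neighborhood_preservation(high, low, k: 1)  # 0.0
```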
The .ann model file contains:
The model file enables you to:
# UMAP
AnnEmbed.umap(data,
  n_components: 2,
  n_neighbors: 15,
  min_dist: 0.1,
  spread: 1.0
)
# t-SNE
AnnEmbed.tsne(data,
  n_components: 2,
  perplexity: 30.0,
  learning_rate: 200.0
)
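In t-SNE, perplexity sets the effective number of neighbors each point attends to: the perplexity of point i's conditional distribution P_i is 2^H(P_i), where H is the Shannon entropy in bits, so `perplexity: 30.0` corresponds to roughly 30 effective neighbors. The sketch below shows the standard calibration step from the original t-SNE formulation, a per-point binary search over the Gaussian precision beta = 1 / (2 * sigma^2). It is illustrative plain Ruby with made-up helper names; whether annembed's t-SNE calibrates in exactly this way is not documented here.

```ruby
# Perplexity of a discrete distribution: 2 ** H(p), with H the entropy in bits.
def perplexity(probabilities)
  entropy = -probabilities.sum { |p| p.positive? ? p * Math.log2(p) : 0.0 }
  2**entropy
end

# Conditional distribution p_(j|i) proportional to exp(-beta * d_ij^2)
# over point i's neighbors, given their squared distances d_ij^2.
def gaussian_conditional(squared_distances, beta)
  shift = squared_distances.min # subtracting a constant leaves p unchanged
  weights = squared_distances.map { |d| Math.exp(-beta * (d - shift)) }
  total = weights.sum
  weights.map { |w| w / total }
end

# Binary search for beta so the conditional distribution reaches the target
# perplexity. Larger beta gives a sharper distribution and a lower perplexity.
def calibrate_beta(squared_distances, target, tol: 1e-5, max_iter: 100)
  lo = 0.0
  hi = Float::INFINITY
  beta = 1.0
  max_iter.times do
    current = perplexity(gaussian_conditional(squared_distances, beta))
    break if (current - target).abs < tol

    if current > target
      lo = beta # too flat: increase beta
      beta = hi.infinite? ? beta * 2 : (beta + hi) / 2
    else
      hi = beta # too peaked: decrease beta
      beta = (beta + lo) / 2
    end
  end
  beta
end

# Example: 50 neighbors at squared distances 1.0, 2.0, ..., 50.0
squared = (1..50).map(&:to_f)
beta = calibrate_beta(squared, 30.0)
perplexity(gaussian_conditional(squared, beta)) # approximately 30.0
```

With beta = 0 every neighbor gets equal weight and the perplexity equals the neighbor count (50 here), so a target above the number of neighbors cannot be reached.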
# LargeVis
embedder = AnnEmbed::Embedder.new(method: :largevis)
embedding = embedder.fit_transform(data)
# Diffusion maps
embedder = AnnEmbed::Embedder.new(method: :diffusion)
embedding = embedder.fit_transform(data)
# Intrinsic dimension estimation
dimension = AnnEmbed.estimate_dimension(data, k: 10)
puts "Estimated intrinsic dimension: #{dimension}"
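The estimator behind `AnnEmbed.estimate_dimension` is not described here. To give a sense of what a k-nearest-neighbor-based estimate measures, here is a sketch of one well-known method, the Levina-Bickel maximum-likelihood estimator, with per-point estimates combined by averaging their inverses. It is a swapped-in illustration, not necessarily what the gem computes, and it uses brute-force neighbor search, so it only suits small samples.

```ruby
# Levina-Bickel maximum-likelihood estimate of intrinsic dimension.
# For a point x with sorted neighbor distances T_1 <= ... <= T_k:
#   inv_m(x) = (1 / (k - 1)) * sum_{j=1}^{k-1} log(T_k / T_j)   (estimates 1/d)
# The per-point values are averaged and the average is inverted.
def mle_intrinsic_dimension(points, k: 10)
  raise ArgumentError, "k must be at least 2" if k < 2
  raise ArgumentError, "need more than k points" unless points.length > k

  n = points.length
  inverse_estimates = points.each_with_index.map do |x, i|
    dists = (0...n).reject { |j| j == i }
                   .map { |j| Math.sqrt(x.zip(points[j]).sum { |a, b| (a - b)**2 }) }
                   .min(k) # the k smallest distances, ascending
    raise ArgumentError, "duplicate points are not supported" if dists.first.zero?

    t_k = dists.last
    dists.first(k - 1).sum { |t| Math.log(t_k / t) } / (k - 1)
  end
  n / inverse_estimates.sum
end

rng = Random.new(42)
line   = Array.new(500) { t = rng.rand; [t, 2 * t, -t] } # 1-D curve in 3-D
square = Array.new(500) { [rng.rand, rng.rand, 0.0] }    # 2-D sheet in 3-D
d_line   = mle_intrinsic_dimension(line)   # close to 1
d_square = mle_intrinsic_dimension(square) # close to 2
```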
# SVD with k: 50 components; matrix is a 2D array of numbers
u, s, v = AnnEmbed.svd(matrix, k: 50)
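`AnnEmbed.svd` returns `u`, `s`, and `v`, presumably for the top k = 50 components; the exact shapes it returns are not documented here. To illustrate what those components are, the sketch below finds the top k singular values and vectors with power iteration on A^T A plus deflation. It is a teaching example with made-up helper names, not the algorithm annembed uses, and it returns u and v as lists of vectors.

```ruby
# Illustrative truncated SVD: top-k singular triplets via power iteration on
# A^T A, removing each found component before searching for the next
# (deflation). Slow; assumes the top singular values are well separated.
def mat_vec(a, x)
  a.map { |row| row.zip(x).sum { |r, xi| r * xi } }
end

def mat_t_vec(a, y)
  (0...a.first.length).map { |j| (0...a.length).sum { |i| a[i][j] * y[i] } }
end

def vec_norm(v)
  Math.sqrt(v.sum { |x| x * x })
end

def top_k_svd(matrix, k, iterations: 200, seed: 42)
  rng = Random.new(seed)
  a = matrix.map { |row| row.map(&:to_f) } # copy; deflation modifies it
  us, sigmas, vs = [], [], []
  k.times do
    # Random start vector, so it is unlikely to be orthogonal to the target.
    v = Array.new(a.first.length) { rng.rand - 0.5 }
    nv = vec_norm(v)
    v = v.map { |x| x / nv }
    iterations.times do
      w = mat_t_vec(a, mat_vec(a, v)) # w = A^T A v
      nw = vec_norm(w)
      break if nw.zero?

      v = w.map { |x| x / nw }
    end
    av = mat_vec(a, v)
    sigma = vec_norm(av)
    break if sigma.zero? # what remains of the matrix is zero

    u = av.map { |x| x / sigma }
    # Deflation: A <- A - sigma * u * v^T
    a = a.each_with_index.map do |row, i|
      row.each_with_index.map { |x, j| x - sigma * u[i] * v[j] }
    end
    us << u
    sigmas << sigma
    vs << v
  end
  [us, sigmas, vs]
end

u, s, v = top_k_svd([[3.0, 0.0], [0.0, 1.0]], 2)
# s is approximately [3.0, 1.0]
```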
Tuning options: n_threads; ef_construction and max_nb_connection (parameters of the HNSW nearest-neighbor graph).
See the examples/ directory for more detailed examples:
- mnist_embedding.rb: Embedding MNIST digits
- text_embedding.rb: Embedding text data
- visualization.rb: Plotting embeddings
After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests.
To install this gem onto your local machine, run bundle exec rake install.
Bug reports and pull requests are welcome on GitHub at https://github.com/yourusername/annembed-ruby.
The gem is available as open source under the terms of the MIT License.
This gem wraps the excellent annembed Rust crate by Jean-Pierre Both.