vembed

Package providing methods to create Vector Embeddings from Strings, calculate similarities between lists of Strings, and Generate Visualizations such as Heatmaps from simple Lists.

0.242

PyPI

Maintainers: 1

vembed

Library for Generating Vector Embeddings, performing Similarity searches, and creating Visualizations from Data.


pip3 install vembed

Strings to Embeddings

Convert a String to a Vector Embedding.

from vembed import string_to_embedding

input_string = "This is a test sentence."

embedding = string_to_embedding(input_string)

#  [0.337, 0.143, 0.714 ...]

Use Batching to Convert Lists of Strings to their Vector Float Representations.

from vembed import lists_to_embeddings

embeddings = lists_to_embeddings(["Convert to a List[Float]", "Another String", "More Strings!"])

# print(embeddings) [[0.123, 0.456, ...], [0.789, 0.012, ...]]

Serialization

Functions for Embedding Serialization for Network Transfer.

Protobuf Serialization for usage with gRPC Services
JSON Serialization for usage with REST APIs

from vembed import lists_to_embeddings, embeddings_to_proto_format, embeddings_to_json_format

embeddings = lists_to_embeddings(["CSV,Row,1,with,some,data" , "CSV,Row,2,with,other,cols"])

# Convert to a Protobuf Serializable Format to send over a gRPC Service
proto_embedding = embeddings_to_proto_format(embeddings)

# Convert to a JSON String for usage with REST APIs
json_embedding = embeddings_to_json_format(embeddings)
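For intuition about what a JSON payload of embeddings can look like on the wire, here is a minimal sketch using only Python's standard json module. It does not show how embeddings_to_json_format is implemented; the "embeddings" key and both helper names are hypothetical and chosen only for illustration.

```python
import json
from typing import List


def embeddings_to_json(embeddings: List[List[float]]) -> str:
    """Serialize a batch of embedding vectors to a JSON string (hypothetical layout)."""
    return json.dumps({"embeddings": embeddings})


def embeddings_from_json(payload: str) -> List[List[float]]:
    """Parse the JSON string back into a list of float vectors."""
    return [[float(x) for x in vec] for vec in json.loads(payload)["embeddings"]]


# Toy vectors stand in for real embeddings
batch = [[0.123, 0.456], [0.789, 0.012]]
payload = embeddings_to_json(batch)      # what a REST client would send
restored = embeddings_from_json(payload)  # what the receiver would decode
```

A round trip like this is a quick way to check that the receiver gets the same floats that were sent.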

Similarity

Semantic Similarity Between Entities

Extract Insights such as Patterns or Relevancy from your Data.

Calculating Similarity for Entities.

from vembed import calculate_similarities, plot_similarities

customer_feedback = ["Loved the recent update","The app is user-friendly",
                    "Facing issues after the update","The new interface is great"]

themes            = ["positive feedback","negative feedback","app interface","app functionality"]

cos_df, dot_df = calculate_similarities(customer_feedback, themes, print_results=True)

# Prints and Returns Results

# Results

Cosine Similarities:

Query: 'Loved the recent update'
  Data: 'positive feedback' => Similarity Score: 0.45
  Data: 'app functionality' => Similarity Score: 0.22
  Data: 'app interface'     => Similarity Score: 0.19
  Data: 'negative feedback' => Similarity Score: 0.11

Query: 'Facing issues after the update'
  Data: 'negative feedback' => Similarity Score: 0.31
  Data: 'positive feedback' => Similarity Score: 0.27
  Data: 'app interface'     => Similarity Score: 0.24
  Data: 'app functionality' => Similarity Score: 0.21

Dot Product Similarities:

Query: 'Loved the recent update'
  Data: 'positive feedback' => Similarity Score: 4.51
  Data: 'app functionality' => Similarity Score: 2.06
  Data: 'negative feedback' => Similarity Score: 1.91
  Data: 'app interface'     => Similarity Score: 1.80

Query: 'Facing issues after the update'
  Data: 'negative feedback' => Similarity Score: 2.92
  Data: 'positive feedback' => Similarity Score: 2.51
  Data: 'app interface'     => Similarity Score: 2.07
  Data: 'app functionality' => Similarity Score: 1.82

Generating Clean, Beautiful Visualizations from Data.

from vembed import plot_similarities

# .... cos_df, dot_df = calculate_similarities(queries, data)

# Create Heatmap for Visualizing Relationships

plot_similarities(cos_df, dot_df, save_path="heatmaps/customer_feedback_similarity.png")

# View the Heatmap at heatmaps/customer_feedback_similarity.png

Cosine and Dot Product Vector Similarity Measures

@Coefficient Legend

Negative [ - ] - Opposing Direction, Dissimilar
Zero     [ 0 ] - Orthogonal, No Commonality
Positive [ + ] - Similar Direction, Higher Values mean Stronger Similarity

Cosine Similarity

Ranges between -1 and 1
Recommended when Direction (context, topical similarity) is important and Magnitude (e.g. word frequency or text length) is not.

Use Case for Cosine Similarity
- In the following example, Direction (thematic orientation: climate change, agriculture) is what is relevant.
- Cosine Similarity is useful here because we want to find documents discussing similar topics (Direction), irrespective of their length or the frequency of specific words (Magnitude).

@Usage

from vembed import calculate_similarities

queries = ["Climate change effects on agriculture"]

data =    [
           "Effects of climate change on wheat production",
           "Agriculture in developing countries",
           "Climate change and its impact on global food security",
           "Advances in agricultural technology"
          ]

# Calculate cosine similarities
cos_df, _ = calculate_similarities(queries, data, sorted=True, print_results=True)
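To make the Direction vs Magnitude distinction concrete, the following is a minimal pure-Python sketch of cosine similarity. It is not vembed's implementation; the helper names and the toy 3-D vectors are made up for illustration and stand in for real embeddings.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the vector lengths (norms)."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot(a, b) / (norm_a * norm_b)


doc = [1.0, 2.0, 0.0]
longer_doc = [3.0, 6.0, 0.0]   # same direction as doc, 3x the magnitude
unrelated = [0.0, 0.0, 5.0]    # orthogonal to doc
opposite = [-1.0, -2.0, 0.0]   # opposite direction to doc

same = cosine_similarity(doc, longer_doc)  # close to 1
orth = cosine_similarity(doc, unrelated)   # 0
opp = cosine_similarity(doc, opposite)     # close to -1
```

Scaling a vector leaves its cosine score unchanged (up to floating-point rounding), which is why document length does not affect the ranking. The three results also match the Positive, Zero, and Negative rows of the Coefficient Legend.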

Dot Product Similarity

Unbounded: can be any Real Number
Recommended when both the Magnitude (e.g. frequency) and the Direction (relevancy) of the vectors are important, and the vectors are on a similar scale.

Use Case for Dot Product
- Direction (Types of Articles) and Magnitude (Frequency of Reading Habits) are both relevant.

@Usage

from vembed import calculate_similarities

user_reading_profile = [
                        "Read many articles on machine learning", 
                        "Occasionally reads about space exploration" 
                       ]

article_options      = [
                        "Latest trends in machine learning",
                        "Beginner's guide to space travel",
                        "In-depth analysis of neural networks",
                        "Recent discoveries in astronomy"
                       ]

# Calculate dot product similarities

_, dot_df = calculate_similarities(user_reading_profile, article_options, sorted=True, print_results=True)
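Unlike cosine similarity, the dot product grows with the length of the vectors. The sketch below illustrates this with toy 2-D vectors in plain Python. The helper name dot_product and the numbers are hypothetical and are not taken from vembed.

```python
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; sensitive to both direction and magnitude."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


profile = [2.0, 1.0]   # toy stand-in for a reading-profile embedding
article = [1.0, 0.5]   # points the same way as the profile
heavy = [4.0, 2.0]     # same direction as article, 4x the magnitude

score_article = dot_product(profile, article)
score_heavy = dot_product(profile, heavy)
```

Both candidates point in the same direction, so their cosine scores would be identical, yet the dot product scores the larger vector four times higher. This is the property to rely on when Magnitude carries meaning.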

Calculating Similarity

@Usage

from vembed import calculate_similarities

queries =  [
            "What is the capital of France?", 
            "How is the weather today?"
           ]

data    =  [
             "Paris is the capital of France.",
             "The weather is sunny.",
             "Berlin is the capital of Germany.",
             "It is raining in Berlin."
           ]

# Calculate similarities and Print Results

cos_df, dot_df = calculate_similarities(queries, data, sorted=True, print_results=True)


# Cosine Similarities:
# ...

# Dot Product Similarities:
# ...
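The printed results pair every query with every data item and list the pairs by descending score. A minimal sketch of that idea in plain Python follows. It assumes nothing about the internals of calculate_similarities; the function names, the dictionary layout, and the toy 2-D vectors (standing in for real embeddings) are all hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den


def similarity_table(
    queries: Dict[str, List[float]], data: Dict[str, List[float]]
) -> Dict[str, List[Tuple[str, float]]]:
    """For each query label, return (data label, score) pairs, best match first."""
    table = {}
    for q_label, q_vec in queries.items():
        scores = [(d_label, cosine(q_vec, d_vec)) for d_label, d_vec in data.items()]
        table[q_label] = sorted(scores, key=lambda pair: pair[1], reverse=True)
    return table


# Toy vectors: axis 0 loosely means "capital cities", axis 1 "weather"
queries = {"capital?": [1.0, 0.0], "weather?": [0.0, 1.0]}
data = {"Paris is the capital": [0.9, 0.1], "The weather is sunny": [0.2, 0.8]}

table = similarity_table(queries, data)
for query, ranked in table.items():
    print(f"Query: {query!r}")
    for label, score in ranked:
        print(f"  Data: {label!r} => Similarity Score: {score:.2f}")
```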

Visualizations

Generate Visualizations from Embeddings such as HeatMap Distributions

Create a Visualization to display the Entity Similarities using a Heatmap.

@Usage

from vembed import calculate_similarities, plot_similarities

customer_feedback = [
                      "Loved the recent update",
                      "The app is user-friendly",
                      "Facing issues after the update",
                      "The new interface is great"
                    ]

themes            = [
                      "positive feedback",
                      "negative feedback",
                      "app interface",
                      "app functionality"
                    ]

# Heatmap of Both Cosine and Dot Product
cos_df, dot_df = calculate_similarities(customer_feedback, themes, sorted=True)
plot_similarities(cos_df, dot_df, save_path="customer_feedback_similarity.png")


# Heatmap of Only Cosine Similarity
cos_df, _ = calculate_similarities(customer_feedback, themes, sorted=True)
plot_similarities(cos_df, None, save_path="customer_feedback_similarity.png")


# Heatmap of Only Dot Product Similarity
_, dot_df = calculate_similarities(customer_feedback, themes, sorted=True)
plot_similarities(None, dot_df, save_path="customer_feedback_similarity.png")


# View customer_feedback_similarity.png to see the Heatmap

Test Suite

Latest Test Run

gRPC Tests
JSON Tests
Numpy Tests
Serialization Tests
Embedding Generation Tests
Batched Embedding Generation Tests
Custom Model Tests
Caching Tests
Transformer Metadata Tests
Module Resolution Tests

Build and Run Locally from Source

git clone git@github.com:kuro337/vembed.git
cd vembed

# Create Isolated Virtual Env
python3 -m venv venv
source venv/bin/activate

# Install Deps
pip install -e .

# Run Tests
chmod +x RUN_TESTS.sh
./RUN_TESTS.sh

# Create Dist 
pip3 install build && python3 -m build

# Use Built Dist in any project
pip3 install ./vembed/dist/vembed-0.24-py3-none-any.whl

Dependencies

sentence_transformers
torch
transformers
pandas
matplotlib
seaborn

Note: vembed can use the GPU through Nvidia CUDA and Torch if a supported Nvidia Graphics Card is available.

Run CUDA Tests from the vembed Test Suite to check Nvidia System GPU availability

# Nvidia CUDA and PyTorch Test
python3 tests/test_cuda.py

# Run all Tests
python3 -m unittest discover -s tests -v
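For a quick check outside the test suite, a sketch like the following reports whether PyTorch can see a CUDA device. It is independent of tests/test_cuda.py, whose contents are not shown here; the function name pick_device is hypothetical, and torch.cuda.is_available() is the standard PyTorch call for this check.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch is installed and sees a CUDA GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```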

Checking Virtual or System Environment Deps and Cache Size


# Check Disk Allocation for Packages 
du -h venv | sort -hr | head -n 10

2.8G    venv/lib/python3.11/site-packages/nvidia
1.4G    venv/lib/python3.11/site-packages/torch
1.3G    venv/lib/python3.11/site-packages/torch/lib
1.2G    venv/lib/python3.11/site-packages/nvidia/cudnn/lib
1.2G    venv/lib/python3.11/site-packages/nvidia/cudnn
596M    venv/lib/python3.11/site-packages/nvidia/cublas

# Checking System Cache

# Show pip cache location
pip cache dir # /home/user/.cache/pip

# Getting Top Folders from Cache by Size
du -h /home/user/.cache/pip | sort -hr | head -n 10

# List Cached Files
pip cache list

# Remove Cached Files
pip cache purge

# Installing Packages without Cache
pip install --no-cache-dir <package_name>
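Where du is not available (for example on Windows), a rough Python equivalent can be sketched with the standard library. The function names below are hypothetical. Note two differences from the du pipeline above: it sums file sizes rather than allocated disk blocks, and it lists only the immediate subfolders of the given path.

```python
import os
from typing import List, Tuple


def dir_size(path: str) -> int:
    """Total size in bytes of all regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def largest_subdirs(path: str, top: int = 10) -> List[Tuple[str, int]]:
    """Immediate subfolders of path with their sizes, largest first."""
    sizes = [
        (entry.path, dir_size(entry.path))
        for entry in os.scandir(path)
        if entry.is_dir(follow_symlinks=False)
    ]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)[:top]


# Example: inspect a local virtual environment if one exists
if os.path.isdir("venv"):
    for folder, size in largest_subdirs("venv"):
        print(f"{size / 1_000_000:8.1f} MB  {folder}")
```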

Author: kuro337
