vembed
Library for Generating Vector Embeddings, performing Similarity searches, and creating Visualizations from Data.
pip3 install vembed
Strings to Embeddings
- Convert a String to a Vector Embedding.
from vembed import string_to_embedding
input_string = "This is a test sentence."
embedding = string_to_embedding(input_string)
- Use Batching to Convert Lists of Strings to their Vector Float Representations.
from vembed import lists_to_embeddings
embeddings = lists_to_embeddings(["Convert to a List[Float]", "Another String", "More Strings!"])
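Batching generally means splitting the input list into fixed-size chunks so that each chunk is encoded in one call. The sketch below is a generic illustration of that idea only. `chunked` and `batch_size` are hypothetical names, and nothing here describes how `lists_to_embeddings` works internally.

```python
from typing import Iterator, List


def chunked(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


strings = ["Convert to a List[Float]", "Another String", "More Strings!"]

# Each inner list would be handed to the encoder as one batch.
batches = list(chunked(strings, batch_size=2))
print(batches)
```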
Serialization
Functions to Serialize Embeddings for Network Transfer.
from vembed import lists_to_embeddings, embeddings_to_proto_format, embeddings_to_json_format
embeddings = lists_to_embeddings(["CSV,Row,1,with,some,data" , "CSV,Row,2,with,other,cols"])
proto_embedding = embeddings_to_proto_format(embeddings)
json_embedding = embeddings_to_json_format(embeddings)
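To show what a JSON round trip of embeddings can look like, here is a minimal sketch using only the standard library. The `{"embeddings": ...}` layout and the helper names `to_json` and `from_json` are assumptions for illustration; they are not the format produced by `embeddings_to_json_format`.

```python
import json
from typing import List


def to_json(embeddings: List[List[float]]) -> str:
    """Serialize a list of float vectors to a JSON string (assumed layout)."""
    return json.dumps({"embeddings": embeddings})


def from_json(payload: str) -> List[List[float]]:
    """Recover the list of float vectors from the JSON string."""
    return json.loads(payload)["embeddings"]


# Stand-in values, not real model output.
vectors = [[0.25, -0.5, 1.0], [0.0, 0.125, -2.0]]

payload = to_json(vectors)    # text that can be sent over the network
restored = from_json(payload)  # what the receiving side reconstructs
print(restored == vectors)
```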
Similarity
Semantic Similarity Between Entities
Extract Insights such as Patterns or Relevancy from your Data.
- Calculating Similarity for Entities.
from vembed import calculate_similarities, plot_similarities
customer_feedback = ["Loved the recent update","The app is user-friendly",
"Facing issues after the update","The new interface is great"]
themes = ["positive feedback","negative feedback","app interface","app functionality"]
cos_df, dot_df = calculate_similarities(customer_feedback, themes, print_results=True)
Cosine Similarities:
Query: 'Loved the recent update'
Data: 'positive feedback' => Similarity Score: 0.45
Data: 'app functionality' => Similarity Score: 0.22
Data: 'app interface' => Similarity Score: 0.19
Data: 'negative feedback' => Similarity Score: 0.11
Query: 'Facing issues after the update'
Data: 'negative feedback' => Similarity Score: 0.31
Data: 'positive feedback' => Similarity Score: 0.27
Data: 'app interface' => Similarity Score: 0.24
Data: 'app functionality' => Similarity Score: 0.21
Dot Product Similarities:
Query: 'Loved the recent update'
Data: 'positive feedback' => Similarity Score: 4.51
Data: 'app functionality' => Similarity Score: 2.06
Data: 'negative feedback' => Similarity Score: 1.91
Data: 'app interface' => Similarity Score: 1.80
Query: 'Facing issues after the update'
Data: 'negative feedback' => Similarity Score: 2.92
Data: 'positive feedback' => Similarity Score: 2.51
Data: 'app interface' => Similarity Score: 2.07
Data: 'app functionality' => Similarity Score: 1.82
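The two measures do not always rank the themes in the same order. The sketch below copies the scores printed above for the query 'Loved the recent update' into plain dictionaries and sorts them; the dictionaries and the `rank` helper are illustrative and are not part of vembed.

```python
from typing import Dict, List


def rank(scores: Dict[str, float]) -> List[str]:
    """Return the labels ordered from highest to lowest score."""
    return [label for label, _ in sorted(scores.items(), key=lambda p: p[1], reverse=True)]


# Scores for 'Loved the recent update', taken from the printed results.
cosine_scores = {
    "positive feedback": 0.45,
    "negative feedback": 0.11,
    "app interface": 0.19,
    "app functionality": 0.22,
}
dot_scores = {
    "positive feedback": 4.51,
    "negative feedback": 1.91,
    "app interface": 1.80,
    "app functionality": 2.06,
}

cosine_ranking = rank(cosine_scores)
dot_ranking = rank(dot_scores)

print("Cosine:", cosine_ranking)
print("Dot:   ", dot_ranking)
print("Same order:", cosine_ranking == dot_ranking)
```

Both measures agree on the top match here, but they swap 'negative feedback' and 'app interface' at the bottom of the list.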
- Generating Clean, Beautiful Visualizations from Data.
from vembed import plot_similarities
plot_similarities(cos_df, dot_df, save_path="heatmaps/customer_feedback_similarity.png")
Cosine and Dot Product Vector Similarity Measures
@Coefficient Legend
Negative [ - ] - Opposing Direction, Low Similarity
Zero [ 0 ] - Orthogonal, No Commonality
Positive [ + ] - Shared Direction, Strong Similarity
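The legend can be checked on small hand-made vectors. This is a minimal sketch in plain Python with made-up 2D vectors; the helpers `dot` and `cosine` are written here for illustration and are not vembed functions.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product: sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


reference = [1.0, 0.0]
same_direction = [2.0, 0.0]   # points the same way as the reference
orthogonal = [0.0, 3.0]       # shares nothing with the reference
opposite = [-1.0, 0.0]        # points the opposite way

print(cosine(reference, same_direction))  # 1.0  -> positive
print(cosine(reference, orthogonal))      # 0.0  -> zero
print(cosine(reference, opposite))        # -1.0 -> negative
```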
Cosine Similarity
@Usage
queries = ["Climate change effects on agriculture"]
data = [
"Effects of climate change on wheat production",
"Agriculture in developing countries",
"Climate change and its impact on global food security",
"Advances in agricultural technology"
]
cos_df, _ = calculate_similarities(queries, data, sorted=True, print_results=True)
Dot Product Similarity
- Ranges over all Real Numbers.
- Use when both the Magnitude and the Direction of the vectors are important and the vectors are on a similar scale.
- Use when the Frequency (Magnitude) and the Direction (Relevancy) are both important.
- Use Case for Dot Product
  - Direction (Types of Articles) and Magnitude (Frequency of Reading Habits) are both relevant.
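The difference between the two measures shows up when two vectors point the same way but differ in length. In this sketch the two axes stand for hypothetical topic weights (machine learning, space) and the numbers are invented for illustration; real embeddings have many more dimensions.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product: sensitive to both direction and magnitude."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: sensitive to direction only."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Same topic mix (direction), different reading frequency (magnitude).
heavy_reader = [8.0, 2.0]
light_reader = [4.0, 1.0]
ml_article = [1.0, 0.0]

# The dot product doubles along with the reading frequency.
print(dot(heavy_reader, ml_article), dot(light_reader, ml_article))

# Cosine similarity is the same for both readers, up to rounding.
print(cosine(heavy_reader, ml_article), cosine(light_reader, ml_article))
```

The dot product separates the frequent reader from the occasional one, while cosine similarity treats them as identical because their interests are in the same proportion.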
@Usage
user_reading_profile = [
"Read many articles on machine learning",
"Occasionally reads about space exploration"
]
article_options = [
"Latest trends in machine learning",
"Beginner's guide to space travel",
"In-depth analysis of neural networks",
"Recent discoveries in astronomy"
]
_, dot_df = calculate_similarities(user_reading_profile, article_options, sorted=True, print_results=True)
@Usage
queries = [
"What is the capital of France?",
"How is the weather today?"
]
data = [
"Paris is the capital of France.",
"The weather is sunny.",
"Berlin is the capital of Germany.",
"It is raining in Berlin."
]
cos_df, dot_df = calculate_similarities(queries, data, sorted=True, print_results=True)
Visualizations
Generate Visualizations from Embeddings such as HeatMap Distributions
- Create a Visualization to display the Entity Similarities using a Heatmap.
@Usage
customer_feedback = [
"Loved the recent update",
"The app is user-friendly",
"Facing issues after the update",
"The new interface is great"
]
themes = [
"positive feedback",
"negative feedback",
"app interface",
"app functionality"
]
# Plot both Cosine and Dot Product Similarities
cos_df, dot_df = calculate_similarities(customer_feedback, themes, sorted=True)
plot_similarities(cos_df, dot_df, save_path="customer_feedback_similarity.png")
# Plot Cosine Similarities only
cos_df, _ = calculate_similarities(customer_feedback, themes, sorted=True)
plot_similarities(cos_df, None, save_path="customer_feedback_similarity.png")
# Plot Dot Product Similarities only
_, dot_df = calculate_similarities(customer_feedback, themes, sorted=True)
plot_similarities(None, dot_df, save_path="customer_feedback_similarity.png")
Test Suite
Latest Test Run
- gRPC Tests
- JSON Tests
- Numpy Tests
- Serialization Tests
- Embedding Generation Tests
- Batched Embedding Generation Tests
- Custom Model Tests
- Caching Tests
- Transformer Metadata Tests
- Module Resolution Tests
Build and Run Locally from Source
git clone git@github.com:kuro337/vembed.git
cd vembed
python3 -m venv venv
source venv/bin/activate
pip install -e .
chmod +x RUN_TESTS.sh
./RUN_TESTS.sh
pip3 install build && python3 -m build
pip3 install ./dist/vembed-0.24-py3-none-any.whl
Dependencies
sentence_transformers
torch
transformers
pandas
matplotlib
seaborn
Note: vembed can use the GPU through Nvidia CUDA and Torch if a supported Nvidia Graphics Card is available.
python3 tests/test_cuda.py
python3 -m unittest discover -s tests -v
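For a quick check of GPU availability outside the test suite, Torch exposes `torch.cuda.is_available()`. The sketch below wraps it in a hypothetical `pick_device` helper; it is not taken from vembed, and how vembed itself selects a device is not shown here.

```python
def pick_device() -> str:
    """Return "cuda" when Torch reports a usable Nvidia GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the sketch also runs without Torch installed
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Embeddings would run on: {device}")
```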
Checking Virtual or System Environment Deps and Cache Size
du -h venv | sort -hr | head -n 10
2.8G venv/lib/python3.11/site-packages/nvidia
1.4G venv/lib/python3.11/site-packages/torch
1.3G venv/lib/python3.11/site-packages/torch/lib
1.2G venv/lib/python3.11/site-packages/nvidia/cudnn/lib
1.2G venv/lib/python3.11/site-packages/nvidia/cudnn
596M venv/lib/python3.11/site-packages/nvidia/cublas
pip cache dir
du -h /home/user/.cache/pip | sort -hr | head -n 10
pip cache purge
pip cache list
pip install --no-cache-dir <package_name>
Author: kuro337