
singlestore-vectorstore
A high-performance vector database library for storing and querying vector embeddings in SingleStore DB. Designed to efficiently manage and search through high-dimensional vector data for AI/ML applications, semantic search, and recommendation systems.
Install the package using pip:
pip install singlestore-vectorstore
SingleStore VectorStore is a Python library that provides:
from vectorstore import VectorDB, Metric, Vector
# Initialize the VectorDB
db = VectorDB(
host="localhost",
user="root",
password="password",
database="embeddings_db"
)
# Create an index
db.create_index(
name="my_embeddings",
dimension=1536, # e.g., for OpenAI embeddings
metric=Metric.COSINE,
)
# Get a reference to the index
index = db.Index("my_embeddings")
# Add vectors to the index
vectors = [
Vector(id="doc1", vector=[0.1, 0.2, 0.3, ...], metadata={"source": "article"}),
Vector(id="doc2", vector=[0.2, 0.3, 0.4, ...], metadata={"source": "webpage"})
]
index.upsert(vectors)
# Find similar vectors
results = index.query(
vector=[0.15, 0.25, 0.35, ...],
top_k=5,
include_metadata=True
)
# Print results
for match in results:
    print(f"ID: {match['id']}, Score: {match['score']}, Metadata: {match['metadata']}")
There are several ways to connect to SingleStore DB:
Connection details can be passed as separate parameters:
from vectorstore import VectorDB
db = VectorDB(
host="localhost",
port=3306,
user="root",
password="password",
database="vectors"
)
Or as a connection URL:
from vectorstore import VectorDB
db = VectorDB(
host="root:password@localhost:3306/vectors"
)
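When credentials contain characters such as "@", ":" or "/", a URL-style connection string becomes ambiguous unless those characters are percent-encoded. The helper below is a minimal sketch of building the `user:password@host:port/database` shape shown above; `build_connection_url` is a hypothetical name, not part of the library, and whether the underlying client decodes percent-encoded credentials is an assumption worth verifying against its documentation.

```python
from urllib.parse import quote


def build_connection_url(user, password, host, port=3306, database=""):
    """Build a `user:password@host:port/database` string (hypothetical helper)."""
    # quote(..., safe="") also encodes "/", "@" and ":" so they cannot be
    # mistaken for URL delimiters.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    url = f"{credentials}@{host}:{port}"
    if database:
        url += f"/{database}"
    return url


print(build_connection_url("root", "p@ss:w/rd", "localhost", 3306, "vectors"))
```

The resulting string could then be passed as the `host` argument in the same way as the literal URL above.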
Or as environment variables:
import os
from vectorstore import VectorDB
os.environ['SINGLESTOREDB_URL'] = 'me:p455w0rd@s2-host.com/my_db'
db = VectorDB()
VectorDB supports every connection method supported by the original singlestoredb Python client.
from singlestoredb import connect
from vectorstore import VectorDB
# Create a connection
connection = connect(
host="localhost",
user="root",
password="password",
database="vectors"
)
# Use the existing connection
db = VectorDB(connection=connection)
from sqlalchemy.pool import QueuePool
from singlestoredb import connect
from vectorstore import VectorDB
# Create a connection pool
def create_connection():
    return connect(
        host="localhost",
        user="root",
        password="password",
        database="vectors"
    )
connection_pool = QueuePool(
creator=create_connection,
pool_size=10,
max_overflow=20,
timeout=30
)
# Use the connection pool
db = VectorDB(connection_pool=connection_pool)
from vectorstore import VectorDB, Metric, DeletionProtection
db = VectorDB(host="localhost", user="root", password="password", database="vectors")
# Create a simple index
basic_index = db.create_index(
name="basic_index",
dimension=1536,
)
# Create a more customized index
custom_index = db.create_index(
name="custom_index",
dimension=768,
metric=Metric.EUCLIDEAN,
deletion_protection=DeletionProtection.ENABLED,
tags={"model": "sentence-transformers", "version": "v1.0"},
use_vector_index=True,
vector_index_options={
"index_type": "IVF_PQFS",
"nlist": 1024,
"nprobe": 20
}
)
When creating an index with use_vector_index=True, you can configure various index types and parameters to optimize for your specific use case. SingleStore supports several vector index types, each with different performance characteristics:
vector_index_options={
"index_type": "IVF_FLAT", # Specify the index type
"nlist": 1024, # Number of clusters/centroids
"nprobe": 20, # Number of clusters to search during query time
# Additional parameters specific to each index type...
}
FLAT
IVF_FLAT (Inverted File with Flat Quantizer)
  nlist: Number of centroids/clusters (default: 100; higher values improve accuracy but slow down indexing)
  nprobe: Number of clusters to search at query time (default: 1; higher values improve accuracy but slow down search)
IVF_SQ (Inverted File with Scalar Quantization)
  nlist, nprobe: Same as IVF_FLAT
  qtype: Quantizer type, either "QT8" (8-bit) or "QT4" (4-bit)
IVF_PQ (Inverted File with Product Quantization)
  nlist, nprobe: Same as IVF_FLAT
  m: Number of subvectors (default: dimension / 2)
  nbits: Bits per subvector (default: 8)
IVF_PQFS (Inverted File with PQ Fast Scan)
  nlist, nprobe: Same as IVF_FLAT
  m: Number of subvectors (must be a multiple of 4)
  nbits: Bits per subvector (must be 8)
HNSW (Hierarchical Navigable Small World)
  M: Number of edges per node (default: 12)
  efConstruction: Size of the dynamic list during construction (default: 40)
  ef: Size of the dynamic list during search (default: 10)
  random_seed: Random seed for reproducibility (default: current time)

Higher nlist: Improves search speed but requires more memory and longer index build time
Higher nprobe: Improves accuracy but slows down searches
Lower m values: Faster search but lower accuracy
Higher m values: Better accuracy but slower search
Higher M values: Better accuracy but larger index size and longer build time
Higher ef values: Better accuracy but slower search

For complete details on vector indexing options, see the SingleStore Vector Indexing documentation.
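The constraints listed above (allowed qtype values for IVF_SQ, and the m and nbits restrictions for IVF_PQFS) can be checked before an index is created. The following is a minimal sketch of such a check; `validate_vector_index_options` is a hypothetical helper, not a function provided by the library, and it covers only the rules stated in this section.

```python
IVF_TYPES = {"IVF_FLAT", "IVF_SQ", "IVF_PQ", "IVF_PQFS"}
KNOWN_TYPES = IVF_TYPES | {"FLAT", "HNSW"}


def validate_vector_index_options(options):
    """Return a list of problems found in a vector_index_options dict."""
    index_type = options.get("index_type")
    if index_type not in KNOWN_TYPES:
        return [f"unknown index_type: {index_type!r}"]

    problems = []
    if index_type in IVF_TYPES:
        # nlist and nprobe are counts of clusters, so they must be positive.
        for name in ("nlist", "nprobe"):
            if name in options and options[name] < 1:
                problems.append(f"{name} must be at least 1")
    if index_type == "IVF_SQ" and "qtype" in options:
        if options["qtype"] not in {"QT8", "QT4"}:
            problems.append('qtype must be "QT8" or "QT4"')
    if index_type == "IVF_PQFS":
        if "m" in options and options["m"] % 4 != 0:
            problems.append("m must be a multiple of 4 for IVF_PQFS")
        if "nbits" in options and options["nbits"] != 8:
            problems.append("nbits must be 8 for IVF_PQFS")
    return problems


print(validate_vector_index_options({"index_type": "IVF_PQFS", "m": 6, "nbits": 4}))
```

Running a check like this on the client side gives a readable error before any statement reaches the database.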
# Get all indexes
indexes = db.list_indexes()
# Print index details
for idx in indexes:
    print(f"Index: {idx.name}, Dimension: {idx.dimension}, Metric: {idx.metric}")
# Get detailed information about an index
index_info = db.describe_index("my_index")
print(f"Name: {index_info.name}")
print(f"Dimension: {index_info.dimension}")
print(f"Metric: {index_info.metric}")
print(f"Deletion Protection: {index_info.deletion_protection}")
print(f"Tags: {index_info.tags}")
print(f"Uses Vector Index: {index_info.use_vector_index}")
print(f"Vector Index Options: {index_info.vector_index_options}")
# Update index settings
db.configure_index(
name="my_index",
deletion_protection=DeletionProtection.ENABLED,
tags={"updated": "true", "version": "v2.0"},
use_vector_index=True,
vector_index_options={
"index_type": "IVF_FLAT",
"nlist": 2048
}
)
if db.has_index("my_index"):
    print("Index exists")
else:
    print("Index doesn't exist")
# Delete an index
db.delete_index("my_index")
# This will fail if deletion protection is enabled
try:
    db.delete_index("protected_index")
except ValueError as e:
    print(f"Could not delete: {e}")
from vectorstore import Vector
# Method 1: Using Vector class
vectors = [
Vector(id="vec1", vector=[0.1, 0.2, 0.3], metadata={"category": "A"}),
Vector(id="vec2", vector=[0.4, 0.5, 0.6], metadata={"category": "B"})
]
# Method 2: Using tuples (id, values)
vectors_tuples = [
("vec3", [0.7, 0.8, 0.9]),
("vec4", [0.10, 0.11, 0.12])
]
# Method 3: Using tuples with metadata (id, values, metadata)
vectors_with_meta = [
("vec5", [0.13, 0.14, 0.15], {"category": "C"}),
("vec6", [0.16, 0.17, 0.18], {"category": "D"})
]
# Method 4: Using dictionaries
vectors_dict = [
{"id": "vec7", "values": [0.19, 0.20, 0.21], "metadata": {"category": "E"}},
{"id": "vec8", "values": [0.22, 0.23, 0.24], "metadata": {"category": "F"}}
]
# Get index reference
index = db.Index("my_index")
# Insert vectors
count = index.upsert(vectors)
print(f"Inserted {count} vectors")
# Insert with namespace
index.upsert(vectors_tuples, namespace="group1")
index.upsert(vectors_with_meta, namespace="group2")
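Because tuples and dictionaries are both accepted, it can help to check a batch before sending it. The sketch below normalizes tuple- and dict-shaped records to `(id, values, metadata)` and verifies that each vector has the index's dimension. Both function names are hypothetical, this is not how the library processes input internally, and `Vector` objects are left out to keep the example self-contained.

```python
def normalize_record(record):
    """Return (id, values, metadata) for a tuple- or dict-shaped record."""
    if isinstance(record, dict):
        return record["id"], list(record["values"]), dict(record.get("metadata") or {})
    if isinstance(record, tuple) and len(record) in (2, 3):
        vec_id, values = record[0], record[1]
        # Two-element tuples carry no metadata.
        metadata = record[2] if len(record) == 3 else {}
        return vec_id, list(values), dict(metadata or {})
    raise TypeError(f"unsupported record shape: {record!r}")


def validate_batch(records, dimension):
    """Normalize every record and reject vectors of the wrong length."""
    normalized = []
    for record in records:
        vec_id, values, metadata = normalize_record(record)
        if len(values) != dimension:
            raise ValueError(f"{vec_id}: expected {dimension} values, got {len(values)}")
        normalized.append((vec_id, values, metadata))
    return normalized


batch = [
    ("vec3", [0.7, 0.8, 0.9]),
    ("vec5", [0.13, 0.14, 0.15], {"category": "C"}),
    {"id": "vec7", "values": [0.19, 0.20, 0.21], "metadata": {"category": "E"}},
]
print(validate_batch(batch, dimension=3))
```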
import pandas as pd
# Create a DataFrame with vector data
df = pd.DataFrame([
{"id": "vec1", "values": [0.1, 0.2, 0.3], "metadata": {"category": "A"}},
{"id": "vec2", "values": [0.4, 0.5, 0.6], "metadata": {"category": "B"}}
])
# Upsert from DataFrame
count = index.upsert_from_dataframe(df, namespace="pandas_import")
print(f"Imported {count} vectors from DataFrame")
# Update vector values
index.update(
id="vec1",
values=[0.25, 0.35, 0.45]
)
# Update metadata only
index.update(
id="vec2",
set_metadata={"category": "updated", "version": 2}
)
# Update both values and metadata with namespace
index.update(
id="vec3",
values=[0.55, 0.65, 0.75],
set_metadata={"processed": True},
namespace="group1"
)
# Get vectors by ID
vectors = index.fetch(
ids=["vec1", "vec2", "vec3"]
)
# Get vectors by ID with namespace
vectors_in_namespace = index.fetch(
ids=["vec3", "vec4"],
namespace="group1"
)
# Access vector data
for vec_id, vec_obj in vectors.items():
    print(f"ID: {vec_id}")
    print(f"Vector: {vec_obj.vector[:5]}...")  # Print first 5 elements
    print(f"Metadata: {vec_obj.metadata}")
# Delete vectors by ID
index.delete(ids=["vec1", "vec2"])
# Delete vectors by ID in a namespace
index.delete(ids=["vec3", "vec4"], namespace="group1")
# Delete all vectors in a namespace
index.delete(delete_all=True, namespace="group2")
# Delete vectors matching a filter
index.delete(
filter={"category": "A"},
namespace="pandas_import"
)
# List all vector IDs
ids = index.list()
# List vectors with a prefix
ids_with_prefix = index.list(prefix="doc_")
# List vectors in a namespace
ids_in_namespace = index.list(namespace="group1")
# Get statistics about the index
stats = index.describe_index_stats()
print(f"Dimension: {stats['dimension']}")
print(f"Total Vector Count: {stats['total_vector_count']}")
# Namespace statistics
for ns_name, ns_stats in stats['namespaces'].items():
    print(f"Namespace: {ns_name}, Vectors: {ns_stats['vector_count']}")
# Get filtered statistics
filtered_stats = index.describe_index_stats(
filter={"category": "A"}
)
# Query by vector values
results = index.query(
vector=[0.1, 0.2, 0.3, ...],
top_k=5
)
# Print results
for match in results:
    print(f"ID: {match['id']}, Score: {match['score']}")
# Query with metadata and vector values in response
results = index.query(
vector=[0.1, 0.2, 0.3, ...],
top_k=10,
include_metadata=True,
include_values=True
)
# Query by existing vector ID
results = index.query(
id="vec1", # Use this vector's values for the query
top_k=5
)
# Query within a namespace
results = index.query(
vector=[0.1, 0.2, 0.3, ...],
namespace="group1",
top_k=5
)
# Query across multiple namespaces
results = index.query(
vector=[0.1, 0.2, 0.3, ...],
namespaces=["group1", "group2"],
top_k=5
)
# Simple equality filter
results = index.query(
vector=[0.1, 0.2, 0.3, ...],
top_k=10,
filter={"category": "A"}
)
# Comparison operators
results = index.query(
vector=[0.1, 0.2, 0.3, ...],
top_k=10,
filter={"year": {"$gt": 2020}}
)
# Multiple conditions with AND
results = index.query(
vector=[0.1, 0.2, 0.3, ...],
top_k=10,
filter={
"$and": [
{"category": "article"},
{"year": {"$gte": 2020}}
]
}
)
# Multiple conditions with OR
results = index.query(
vector=[0.1, 0.2, 0.3, ...],
top_k=10,
filter={
"$or": [
{"category": "article"},
{"category": "blog"}
]
}
)
# Check if field exists
results = index.query(
vector=[0.1, 0.2, 0.3, ...],
top_k=10,
filter={"author": {"$exists": True}}
)
# Collection operators
results = index.query(
vector=[0.1, 0.2, 0.3, ...],
top_k=10,
filter={"category": {"$in": ["article", "blog", "news"]}}
)
Vector indexes significantly accelerate similarity searches, especially with large datasets, but there's always a tradeoff between search speed and accuracy. Higher accuracy settings typically result in slower searches, while faster searches may return slightly less optimal results.
# Disable vector index for this query
results = index.query(
vector=[0.1, 0.2, 0.3, ...],
top_k=10,
disable_vector_index_use=True # Force brute-force search for maximum accuracy
)
# Customize search options based on index type
results = index.query(
vector=[0.1, 0.2, 0.3, ...],
top_k=10,
search_options={
# Parameters vary by index type
"nprobe": 50, # For IVF-based indexes
"ef": 100 # For HNSW indexes
}
)
Each vector index type supports different search-time parameters that control the speed vs. accuracy tradeoff:
ALL TYPES
search_options={
    "k": 50  # Number of rows output by the vector index scan; k must be >= top_k
}
FLAT
IVF_FLAT, IVF_SQ, IVF_PQ, IVF_PQFS
search_options={
"nprobe": 20 # Number of clusters to search (higher = more accurate, but slower)
# Default is 1, common range: 5-100 depending on dataset size
}
HNSW
search_options={
"ef": 40 # Size of dynamic candidate list (higher = more accurate, but slower)
# Default is 10, common range: 20-200 depending on dataset size
}
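Since nprobe applies only to IVF-based indexes, ef only to HNSW, and k must be at least top_k, the options dictionary can be assembled in one place. The function below is a minimal sketch of that idea; `build_search_options` is a hypothetical helper, not part of the library.

```python
IVF_TYPES = {"IVF_FLAT", "IVF_SQ", "IVF_PQ", "IVF_PQFS"}


def build_search_options(index_type, top_k, nprobe=None, ef=None, k=None):
    """Assemble a search_options dict that matches the index type."""
    options = {}
    if k is not None:
        # k is the number of rows produced by the index scan, so it
        # cannot be smaller than the number of results requested.
        if k < top_k:
            raise ValueError("k must be >= top_k")
        options["k"] = k
    if index_type in IVF_TYPES and nprobe is not None:
        options["nprobe"] = nprobe
    elif index_type == "HNSW" and ef is not None:
        options["ef"] = ef
    return options


print(build_search_options("IVF_FLAT", top_k=10, nprobe=20, k=50))
print(build_search_options("HNSW", top_k=10, nprobe=20, ef=40))
```

The returned dictionary would be passed as the `search_options` argument of `index.query`.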
The loop below measures how query time changes as the main search parameter (nprobe or ef) is raised:
import time
# query_vector is assumed to be a query embedding defined elsewhere
# Measure search time vs. accuracy tradeoff
for nprobe in [1, 10, 50, 100]:
    start = time.time()
    results = index.query(
        vector=query_vector,
        top_k=10,
        search_options={"nprobe": nprobe}
    )
    end = time.time()
    print(f"nprobe={nprobe}, time={end-start:.4f}s")
    # Compare results with ground truth if available
For more details on vector index parameters, refer to the SingleStore Vector Indexing documentation.
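One common way to quantify the accuracy side of this tradeoff is recall: the fraction of the true nearest neighbors that the approximate search returned. The sketch below assumes you already have two lists of IDs, for example one from an indexed query and one from a brute-force query run with `disable_vector_index_use=True`; `recall_at_k` is a hypothetical helper name.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact (ground-truth) IDs that the approximate search found."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    found = set(approx_ids) & set(exact_ids)
    return len(found) / len(set(exact_ids))


# IDs could be collected from query results with: [match["id"] for match in results]
approx = ["doc1", "doc4", "doc9"]
exact = ["doc1", "doc2", "doc4"]
print(f"recall@3 = {recall_at_k(approx, exact):.2f}")
```

Tracking recall next to the timings makes it easier to pick the smallest nprobe or ef value that meets an accuracy target.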
# Create indexes with different metrics
cosine_index = db.create_index(
name="cosine_index",
dimension=1536,
metric=Metric.COSINE # Normalized dot product, best for comparing directions
)
dotproduct_index = db.create_index(
name="dotproduct_index",
dimension=1536,
metric=Metric.DOTPRODUCT # Raw dot product, good for comparing direction and magnitude
)
euclidean_index = db.create_index(
name="euclidean_index",
dimension=1536,
metric=Metric.EUCLIDEAN # Euclidean distance, good for spatial data
)
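To see how the three metrics differ, it helps to compute them by hand on a pair of vectors that point in the same direction but have different lengths. This is a plain-Python illustration of the standard definitions, independent of the library.

```python
import math


def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the length

print(cosine_similarity(a, b))   # 1.0: direction is identical
print(dot_product(a, b))         # 50.0: grows with magnitude
print(euclidean_distance(a, b))  # 5.0: the points are still 5 units apart
```

Cosine similarity ignores the difference in length, while the dot product and the Euclidean distance both reflect it.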
from vectorstore import (
FilterTypedDict, # Base filter type
AndFilter, # $and logical operator
OrFilter, # $or logical operator
SimpleFilter, # Direct field matching
ExactMatchFilter, # Exact field value matching
EqFilter, # $eq comparison
NeFilter, # $ne comparison
GtFilter, # $gt comparison
GteFilter, # $gte comparison
LtFilter, # $lt comparison
LteFilter, # $lte comparison
InFilter, # $in collection operator
NinFilter # $nin collection operator
)
# Complex filter example
complex_filter: FilterTypedDict = {
"$and": [
{
"$or": [
{"category": "article"},
{"category": "blog"}
]
},
{"year": {"$gte": 2020}},
{"author": {"$exists": True}}
]
}
results = index.query(
vector=[0.1, 0.2, 0.3, ...],
top_k=10,
filter=complex_filter
)
VectorDB: Main entry point for creating and managing vector indexes
IndexInterface: Interface for interacting with a specific index
Vector: Class representing a vector with ID, values, and metadata
IndexModel: Configuration for an index
Metric: Similarity metrics (COSINE, DOTPRODUCT, EUCLIDEAN)
DeletionProtection: Protection against accidental deletion (ENABLED, DISABLED)
Connection Management:
Vector Indexing:
Namespaces:
Batch Operations:
Metrics Selection:
Deletion Protection:
VectorStore supports powerful metadata filtering capabilities that let you narrow down vector searches based on their associated metadata.
Simple Equality Filter
# Find vectors where category is exactly "article"
filter = {"category": "article"}
Comparison Operators
# Equal to
filter = {"year": {"$eq": 2023}}
# Not equal to
filter = {"year": {"$ne": 2023}}
# Greater than
filter = {"year": {"$gt": 2020}}
# Greater than or equal to
filter = {"year": {"$gte": 2020}}
# Less than
filter = {"year": {"$lt": 2023}}
# Less than or equal to
filter = {"year": {"$lte": 2023}}
Collection Operators
# Value is in a specified array
filter = {"category": {"$in": ["article", "blog", "news"]}}
# Value is not in a specified array
filter = {"category": {"$nin": ["video", "podcast"]}}
Existence Checks
# Field exists
filter = {"author": {"$exists": True}}
# Field does not exist
filter = {"author": {"$exists": False}}
Logical Operators
# AND - all conditions must match
filter = {
"$and": [
{"category": "article"},
{"year": {"$gte": 2020}}
]
}
# OR - at least one condition must match
filter = {
"$or": [
{"category": "article"},
{"category": "blog"}
]
}
Combined Complex Filters
# Articles or blogs from 2020 or later that have an author field
filter = {
"$and": [
{
"$or": [
{"category": "article"},
{"category": "blog"}
]
},
{"year": {"$gte": 2020}},
{"author": {"$exists": True}}
]
}
Metadata filters are translated into SQL expressions that filter results based on the JSON metadata stored with each vector. Filtering happens at the SQL level, before distance calculation, which improves query efficiency.
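To make the operator semantics concrete, the sketch below evaluates a filter against a single metadata dictionary in plain Python. It is not the library's implementation, which translates filters to SQL; `matches` is a hypothetical helper, and treating a missing field as failing every operator except `$exists` is an assumption of this sketch.

```python
import operator

_COMPARATORS = {
    "$eq": operator.eq, "$ne": operator.ne,
    "$gt": operator.gt, "$gte": operator.ge,
    "$lt": operator.lt, "$lte": operator.le,
}


def _field_matches(metadata, field, condition):
    present = field in metadata
    for op, expected in condition.items():
        if op == "$exists":
            if present != expected:
                return False
        elif not present:
            # Assumption: a missing field fails every other operator.
            return False
        elif op == "$in":
            if metadata[field] not in expected:
                return False
        elif op == "$nin":
            if metadata[field] in expected:
                return False
        elif op in _COMPARATORS:
            if not _COMPARATORS[op](metadata[field], expected):
                return False
        else:
            raise ValueError(f"unsupported operator: {op}")
    return True


def matches(metadata, flt):
    """Return True if the metadata dict satisfies the filter."""
    for key, condition in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        elif isinstance(condition, dict):
            if not _field_matches(metadata, key, condition):
                return False
        elif metadata.get(key) != condition:
            return False
    return True


complex_filter = {
    "$and": [
        {"$or": [{"category": "article"}, {"category": "blog"}]},
        {"year": {"$gte": 2020}},
        {"author": {"$exists": True}},
    ]
}
print(matches({"category": "blog", "year": 2021, "author": "Ann"}, complex_filter))
```

An evaluator like this can be handy in unit tests, where a filter's intent can be checked against sample metadata without a database connection.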
Filters can be used in multiple operations:
In queries:
results = index.query(
vector=[0.1, 0.2, 0.3, ...],
top_k=10,
filter={"$and": [{"category": "article"}, {"year": {"$gte": 2020}}]}
)
For deletion operations:
# Remove outdated vectors
index.delete(
filter={"status": "outdated"}
)
For statistical analysis:
# Get statistics for a specific category
stats = index.describe_index_stats(
filter={"category": "article"}
)
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Future development plans include: