pgvector support for Python
Supports Django, SQLAlchemy, SQLModel, Psycopg 3, Psycopg 2, asyncpg, and Peewee
Run:
pip install pgvector
And follow the instructions for your database library:
Or check out some examples:
Django
Create a migration to enable the extension
from django.db import migrations
from pgvector.django import VectorExtension

class Migration(migrations.Migration):
    operations = [
        VectorExtension()
    ]
Add a vector field to your model
from django.db import models
from pgvector.django import VectorField

class Item(models.Model):
    embedding = VectorField(dimensions=3)
Also supports HalfVectorField, BitField, and SparseVectorField
Insert a vector
item = Item(embedding=[1, 2, 3])
item.save()
Get the nearest neighbors to a vector
from pgvector.django import L2Distance
Item.objects.order_by(L2Distance('embedding', [3, 1, 2]))[:5]
Also supports MaxInnerProduct, CosineDistance, L1Distance, HammingDistance, and JaccardDistance
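To make the vector distance functions concrete, here is a minimal pure-Python sketch of what each one computes for a single pair of vectors. The function names are hypothetical and chosen for illustration; they are not part of pgvector. As far as the pgvector documentation describes it, the inner product operator returns the negative inner product so that ascending order puts the largest inner product first, and the sketch follows that convention.

```python
import math

def l2_distance(a, b):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def l1_distance(a, b):
    # Taxicab distance: sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def negative_inner_product(a, b):
    # Negated dot product, so that smaller means "more similar".
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity. Undefined for zero vectors; this sketch
    # raises ZeroDivisionError in that case.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

stored = [1, 2, 3]
query = [3, 1, 2]
print(l2_distance(stored, query))
print(l1_distance(stored, query))             # 4
print(negative_inner_product(stored, query))  # -11
print(cosine_distance(stored, query))
```

Hamming and Jaccard distance apply to bit values rather than float vectors, so they are left out of this sketch.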
Get the distance
Item.objects.annotate(distance=L2Distance('embedding', [3, 1, 2]))
Get items within a certain distance
Item.objects.alias(distance=L2Distance('embedding', [3, 1, 2])).filter(distance__lt=5)
Average vectors
from django.db.models import Avg
Item.objects.aggregate(Avg('embedding'))
Also supports Sum
Add an approximate index
from pgvector.django import HnswIndex, IvfflatIndex

class Item(models.Model):
    class Meta:
        indexes = [
            HnswIndex(
                name='my_index',
                fields=['embedding'],
                m=16,
                ef_construction=64,
                opclasses=['vector_l2_ops']
            ),
            # or
            IvfflatIndex(
                name='my_index',
                fields=['embedding'],
                lists=100,
                opclasses=['vector_l2_ops']
            )
        ]
Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance
SQLAlchemy
Enable the extension
session.execute(text('CREATE EXTENSION IF NOT EXISTS vector'))
Add a vector column
from pgvector.sqlalchemy import Vector

class Item(Base):
    embedding = mapped_column(Vector(3))
Also supports HALFVEC, BIT, and SPARSEVEC
Insert a vector
item = Item(embedding=[1, 2, 3])
session.add(item)
session.commit()
Get the nearest neighbors to a vector
session.scalars(select(Item).order_by(Item.embedding.l2_distance([3, 1, 2])).limit(5))
Also supports max_inner_product, cosine_distance, l1_distance, hamming_distance, and jaccard_distance
Get the distance
session.scalars(select(Item.embedding.l2_distance([3, 1, 2])))
Get items within a certain distance
session.scalars(select(Item).filter(Item.embedding.l2_distance([3, 1, 2]) < 5))
Average vectors
from pgvector.sqlalchemy import avg
session.scalars(select(avg(Item.embedding))).first()
Also supports sum
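The avg and sum aggregates work element by element across the stored vectors. A minimal pure-Python sketch of that arithmetic follows; the helper names `vector_sum` and `vector_avg` are hypothetical and only illustrate the result you would expect back, assuming all vectors share the same number of dimensions.

```python
def vector_sum(vectors):
    # Element-wise sum; returns None for no rows, mirroring SQL NULL.
    if not vectors:
        return None
    return [sum(column) for column in zip(*vectors)]

def vector_avg(vectors):
    # Element-wise mean; returns None for no rows, mirroring SQL NULL.
    if not vectors:
        return None
    count = len(vectors)
    return [total / count for total in vector_sum(vectors)]

embeddings = [[1, 2, 3], [3, 1, 2]]
print(vector_sum(embeddings))  # [4, 3, 5]
print(vector_avg(embeddings))  # [2.0, 1.5, 2.5]
```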
Add an approximate index
from sqlalchemy import Index

index = Index(
    'my_index',
    Item.embedding,
    postgresql_using='hnsw',
    postgresql_with={'m': 16, 'ef_construction': 64},
    postgresql_ops={'embedding': 'vector_l2_ops'}
)
# or
index = Index(
    'my_index',
    Item.embedding,
    postgresql_using='ivfflat',
    postgresql_with={'lists': 100},
    postgresql_ops={'embedding': 'vector_l2_ops'}
)

index.create(engine)
Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance
SQLModel
Enable the extension
session.exec(text('CREATE EXTENSION IF NOT EXISTS vector'))
Add a vector column
from typing import Any
from pgvector.sqlalchemy import Vector
from sqlalchemy import Column
from sqlmodel import Field, SQLModel

class Item(SQLModel, table=True):
    embedding: Any = Field(sa_column=Column(Vector(3)))
Also supports HALFVEC, BIT, and SPARSEVEC
Insert a vector
item = Item(embedding=[1, 2, 3])
session.add(item)
session.commit()
Get the nearest neighbors to a vector
session.exec(select(Item).order_by(Item.embedding.l2_distance([3, 1, 2])).limit(5))
Also supports max_inner_product, cosine_distance, l1_distance, hamming_distance, and jaccard_distance
Get the distance
session.exec(select(Item.embedding.l2_distance([3, 1, 2])))
Get items within a certain distance
session.exec(select(Item).filter(Item.embedding.l2_distance([3, 1, 2]) < 5))
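The distance filter keeps only the rows whose embedding lies strictly closer to the query than the threshold. As a rough in-memory equivalent, assuming L2 distance and a hypothetical list of dictionaries standing in for table rows:

```python
import math

def l2_distance(a, b):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def within_distance(items, query, max_distance):
    # Keep items strictly closer than max_distance, like `< 5` in the query.
    return [
        item for item in items
        if l2_distance(item["embedding"], query) < max_distance
    ]

# Hypothetical rows used only for illustration.
items = [
    {"id": 1, "embedding": [1, 2, 3]},
    {"id": 2, "embedding": [10, 10, 10]},
]
matches = within_distance(items, [3, 1, 2], 5)
print([item["id"] for item in matches])  # [1]
```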
Average vectors
from pgvector.sqlalchemy import avg
session.exec(select(avg(Item.embedding))).first()
Also supports sum
Add an approximate index
from sqlalchemy import Index

index = Index(
    'my_index',
    Item.embedding,
    postgresql_using='hnsw',
    postgresql_with={'m': 16, 'ef_construction': 64},
    postgresql_ops={'embedding': 'vector_l2_ops'}
)
# or
index = Index(
    'my_index',
    Item.embedding,
    postgresql_using='ivfflat',
    postgresql_with={'lists': 100},
    postgresql_ops={'embedding': 'vector_l2_ops'}
)

index.create(engine)
Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance
Psycopg 3
Enable the extension
conn.execute('CREATE EXTENSION IF NOT EXISTS vector')
Register the vector type with your connection
from pgvector.psycopg import register_vector
register_vector(conn)
For async connections, use
from pgvector.psycopg import register_vector_async
await register_vector_async(conn)
Create a table
conn.execute('CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))')
Insert a vector
import numpy as np

embedding = np.array([1, 2, 3])
conn.execute('INSERT INTO items (embedding) VALUES (%s)', (embedding,))
Get the nearest neighbors to a vector
conn.execute('SELECT * FROM items ORDER BY embedding <-> %s LIMIT 5', (embedding,)).fetchall()
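The query orders rows by L2 distance to the query vector (the `<->` operator) and keeps the five closest. Without an index this is an exact scan, which can be pictured as the brute-force search below. This is only a sketch of the semantics: `nearest_neighbors` and the sample rows are hypothetical, and rows are modelled as `(id, embedding)` tuples.

```python
import heapq
import math

def l2_distance(a, b):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(rows, query, k=5):
    # Return the k rows with the smallest L2 distance, closest first.
    return heapq.nsmallest(k, rows, key=lambda row: l2_distance(row[1], query))

# Hypothetical table contents used only for illustration.
rows = [
    (1, [1, 2, 3]),
    (2, [3, 1, 2]),
    (3, [10, 10, 10]),
    (4, [3, 1, 3]),
]
closest = nearest_neighbors(rows, [3, 1, 2], k=2)
print([row_id for row_id, _ in closest])  # [2, 4]
```

An approximate index such as HNSW or IVFFlat avoids comparing against every row, at the cost of possibly missing some true neighbors.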
Add an approximate index
conn.execute('CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)')
# or
conn.execute('CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)')
Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance
Psycopg 2
Enable the extension
cur = conn.cursor()
cur.execute('CREATE EXTENSION IF NOT EXISTS vector')
Register the vector type with your connection or cursor
from pgvector.psycopg2 import register_vector
register_vector(conn)
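Registering the type lets the driver convert between Python values and Postgres vector values for you. For context, pgvector's text form of a vector is a bracketed, comma-separated list such as `[1,2,3]`. The sketch below shows that format with two hypothetical helpers, `to_vector_literal` and `from_vector_literal`; they are not part of pgvector and are only meant to illustrate what the conversion involves.

```python
def to_vector_literal(values):
    # Serialize a sequence of numbers into the '[x,y,z]' text form.
    return "[" + ",".join(str(float(v)) for v in values) + "]"

def from_vector_literal(text):
    # Parse the '[x,y,z]' text form back into a list of floats.
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("expected a bracketed vector literal")
    inner = text[1:-1].strip()
    return [float(part) for part in inner.split(",")] if inner else []

print(to_vector_literal([1, 2, 3]))    # [1.0,2.0,3.0]
print(from_vector_literal("[1,2,3]"))  # [1.0, 2.0, 3.0]
```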
Create a table
cur.execute('CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))')
Insert a vector
import numpy as np

embedding = np.array([1, 2, 3])
cur.execute('INSERT INTO items (embedding) VALUES (%s)', (embedding,))
Get the nearest neighbors to a vector
cur.execute('SELECT * FROM items ORDER BY embedding <-> %s LIMIT 5', (embedding,))
cur.fetchall()
Add an approximate index
cur.execute('CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)')
# or
cur.execute('CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)')
Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance
asyncpg
Enable the extension
await conn.execute('CREATE EXTENSION IF NOT EXISTS vector')
Register the vector type with your connection
from pgvector.asyncpg import register_vector
await register_vector(conn)
or your pool
async def init(conn):
    await register_vector(conn)

pool = await asyncpg.create_pool(..., init=init)
Create a table
await conn.execute('CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))')
Insert a vector
import numpy as np

embedding = np.array([1, 2, 3])
await conn.execute('INSERT INTO items (embedding) VALUES ($1)', embedding)
Get the nearest neighbors to a vector
await conn.fetch('SELECT * FROM items ORDER BY embedding <-> $1 LIMIT 5', embedding)
Add an approximate index
await conn.execute('CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)')
# or
await conn.execute('CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)')
Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance
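The IVFFlat examples use `lists = 100`, which suits a table of roughly 100,000 rows. As a starting point, the pgvector documentation suggests rows / 1000 for up to 1M rows and sqrt(rows) beyond that. A minimal sketch of that rule of thumb, with a hypothetical helper name, might look like:

```python
import math

def suggested_ivfflat_lists(row_count):
    # Rule-of-thumb starting point, not a tuned value.
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return int(math.sqrt(row_count))

print(suggested_ivfflat_lists(100_000))    # 100
print(suggested_ivfflat_lists(4_000_000))  # 2000
```

Treat the result as an initial guess and measure recall and query speed on your own data before settling on a value.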
Peewee
Add a vector column
from pgvector.peewee import VectorField
class Item(BaseModel):
    embedding = VectorField(dimensions=3)
Also supports HalfVectorField, FixedBitField, and SparseVectorField
Insert a vector
item = Item.create(embedding=[1, 2, 3])
Get the nearest neighbors to a vector
Item.select().order_by(Item.embedding.l2_distance([3, 1, 2])).limit(5)
Also supports max_inner_product, cosine_distance, l1_distance, hamming_distance, and jaccard_distance
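Hamming and Jaccard distance apply to bit values rather than float vectors. The sketch below shows what the two measures compute, using strings of `0` and `1` as a stand-in for bit fields. The function names are hypothetical, and returning 1.0 when neither value has a set bit is an assumption made for this sketch.

```python
def hamming_distance(a, b):
    # Number of positions at which the two bit strings differ.
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(a, b))

def jaccard_distance(a, b):
    # 1 - |intersection| / |union| of the set bits.
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    both = sum(x == "1" and y == "1" for x, y in zip(a, b))
    either = sum(x == "1" or y == "1" for x, y in zip(a, b))
    if either == 0:
        return 1.0  # assumption: no set bits on either side
    return 1 - both / either

print(hamming_distance("101", "110"))  # 2
print(jaccard_distance("101", "110"))
```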
Get the distance
Item.select(Item.embedding.l2_distance([3, 1, 2]).alias('distance'))
Get items within a certain distance
Item.select().where(Item.embedding.l2_distance([3, 1, 2]) < 5)
Average vectors
from peewee import fn
Item.select(fn.avg(Item.embedding).coerce(True)).scalar()
Also supports sum
Add an approximate index
Item.add_index('embedding vector_l2_ops', using='hnsw')
Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance
View the changelog
Everyone is encouraged to help improve this project. Here are a few ways you can help:
To get started with development:
git clone https://github.com/pgvector/pgvector-python.git
cd pgvector-python
pip install -r requirements.txt
createdb pgvector_python_test
pytest
To run an example:
cd examples/loading
pip install -r requirements.txt
createdb pgvector_example
python3 example.py
FAQs
pgvector support for Python
We found that pgvector demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.