github.com/smhanov/syzgydb
SyzgyDB is a high-performance, embeddable vector database designed for applications requiring efficient handling of large datasets. Written in Go, it leverages disk-based storage to minimize memory usage, making it ideal for systems with limited resources. SyzgyDB supports a range of distance metrics, including Euclidean and Cosine, and offers multiple quantization levels to optimize storage and search performance.
With built-in integration for the Ollama server, SyzgyDB can automatically generate vector embeddings from text and images, simplifying the process of adding and querying data. This makes it well-suited for use cases such as image and video retrieval, recommendation systems, natural language processing, anomaly detection, and bioinformatics. With its RESTful API, SyzgyDB provides easy integration and management of collections and records, enabling developers to perform fast and flexible vector similarity searches.
docker run -p 8080:8080 -v /path/to/your/data:/data smhanov/syzgydb
This command will:
- Pull the smhanov/syzgydb image from Docker Hub.
- Map the /data directory inside the container to /path/to/your/data on your host system, ensuring that your data is persisted outside the container.

The configuration settings can be specified on the command line, using an environment variable, or in a file /etc/syzgydb.conf.
Configuration Setting | Description | Default Value |
---|---|---|
DATA_FOLDER | Specifies where the persistent files are kept. | ./data (command line) or /data (Docker) |
OLLAMA_SERVER | The optional Ollama server used to create embeddings. | localhost:11434 |
TEXT_MODEL | The name of the text embedding model to use with Ollama. | all-minilm (384 dimensions) |
IMAGE_MODEL | The name of the image embedding model to use with Ollama. | minicpm-v |
SyzgyDB provides a RESTful API for managing collections and records. Below are the available endpoints and example curl requests.
A collection is a database. You can create collections, delete them, and retrieve information about them.
Endpoint: POST /api/v1/collections
Description: Creates a new collection with specified parameters.
Request Body (JSON):
{
"name": "collection_name",
"vector_size": 128,
"quantization": 64,
"distance_function": "cosine"
}
Example curl:
curl -X POST http://localhost:8080/api/v1/collections -H "Content-Type: application/json" -d '{"name":"collection_name","vector_size":128,"quantization":64,"distance_function":"cosine"}'
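The same request can be built from Go with the standard library. This is a minimal sketch, assuming the server is reachable at the base URL used in the curl example; `CreateCollectionRequest` and `newCreateCollectionRequest` are names made up for illustration, and the response format is not modeled because it is not described here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// CreateCollectionRequest mirrors the JSON request body shown above.
type CreateCollectionRequest struct {
	Name             string `json:"name"`
	VectorSize       int    `json:"vector_size"`
	Quantization     int    `json:"quantization"`
	DistanceFunction string `json:"distance_function"`
}

// newCreateCollectionRequest builds (but does not send) the POST request.
// baseURL is an assumption, for example "http://localhost:8080".
func newCreateCollectionRequest(baseURL string, c CreateCollectionRequest) (*http.Request, error) {
	body, err := json.Marshal(c)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/v1/collections", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newCreateCollectionRequest("http://localhost:8080", CreateCollectionRequest{
		Name:             "collection_name",
		VectorSize:       128,
		Quantization:     64,
		DistanceFunction: "cosine",
	})
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	// To send it against a running server: resp, err := http.DefaultClient.Do(req)
}
```

Building the request separately from sending it keeps the sketch runnable without a server and makes the URL and headers easy to inspect.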
Endpoint: DELETE /api/v1/collections/{collection_name}
Description: Deletes the specified collection.
Example curl:
curl -X DELETE http://localhost:8080/api/v1/collections/collection_name
Endpoint: GET /api/v1/collections/{collection_name}
Description: Retrieves information about a collection.
Example curl:
curl -X GET http://localhost:8080/api/v1/collections/collection_name
Endpoint: POST /api/v1/collections/{collection_name}/records
Description: Inserts multiple records into a collection, overwriting any record whose ID already exists. You can provide either a vector or a text field for each record. If a text field is provided, the server automatically generates the vector embedding using the Ollama server. If an image field is provided, it should be in base64 format.
Request Body (JSON):
[
{
"id": 1234567890,
"text": "example text", // Optional: Provide text to generate vector
"vector": [0.1, 0.2, ..., 0.5], // Optional: Directly provide a vector
"metadata": {
"key1": "value1",
"key2": "value2"
}
},
{
"id": 1234567891,
"text": "another example text",
"metadata": {
"key1": "value3"
}
}
]
Example curl:
curl -X POST http://localhost:8080/api/v1/collections/collection_name/records -H "Content-Type: application/json" -d '[{"id":1234567890,"vector":[0.1,0.2,0.3,0.4,0.5],"metadata":{"key1":"value1","key2":"value2"}},{"id":1234567891,"text":"example text","metadata":{"key1":"value1","key2":"value2"}}]'
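For Go callers, the request body can be produced with `encoding/json`. The sketch below is illustrative only: the `Record` type and `encodeRecords` helper are hypothetical, and the field names are taken from the request body above. The `omitempty` tags drop whichever of `text` or `vector` a record does not set.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Record mirrors one element of the request body above (hypothetical type).
type Record struct {
	ID       uint64         `json:"id"`
	Text     string         `json:"text,omitempty"`
	Vector   []float64      `json:"vector,omitempty"`
	Metadata map[string]any `json:"metadata,omitempty"`
}

// encodeRecords serializes records into the JSON array the endpoint expects.
func encodeRecords(records []Record) (string, error) {
	b, err := json.Marshal(records)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := encodeRecords([]Record{
		// A record with a precomputed vector.
		{ID: 1234567890, Vector: []float64{0.1, 0.2, 0.3, 0.4, 0.5}, Metadata: map[string]any{"key1": "value1"}},
		// A record whose vector the server would derive from the text.
		{ID: 1234567891, Text: "example text", Metadata: map[string]any{"key1": "value3"}},
	})
	if err != nil {
		fmt.Println("encode:", err)
		return
	}
	fmt.Println(body)
}
```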
Endpoint: PUT /api/v1/collections/{collection_name}/records/{id}/metadata
Description: Updates metadata for a record.
Request Body (JSON):
{
"metadata": {
"key1": "new_value1",
"key3": "value3"
}
}
Example curl:
curl -X PUT http://localhost:8080/api/v1/collections/collection_name/records/1234567890/metadata -H "Content-Type: application/json" -d '{"metadata":{"key1":"new_value1","key3":"value3"}}'
Endpoint: DELETE /api/v1/collections/{collection_name}/records/{id}
Description: Deletes a record.
Example curl:
curl -X DELETE http://localhost:8080/api/v1/collections/collection_name/records/1234567890
Endpoint: GET /api/v1/collections/{collection_name}/ids
Description: Retrieves a JSON array of all document IDs in the specified collection.
Example curl:
curl -X GET http://localhost:8080/api/v1/collections/collection_name/ids
Endpoint: POST /api/v1/collections/{collection_name}/search
Description: Searches for records based on the provided criteria. If no search parameters are provided, it lists all records in the collection, allowing pagination with limit and offset.
Request Body (JSON):
{
"vector": [0.1, 0.2, 0.3, ..., 0.5], // Optional: Provide a vector for similarity search
"text": "example text", // Optional: Provide text to generate vector for search
"k": 5, // Optional: Number of nearest neighbors to return
"radius": 0, // Optional: Radius for range search
"limit": 0, // Optional: Maximum number of records to return
"offset": 0, // Optional: Number of records to skip for pagination
"precision": "", // Optional: Set to "exact" for exhaustive search
"filter": "age >= 18 AND status == 'active'" // Optional: Query filter expression
}
Parameters Explanation:
- vector: A numerical array representing the query vector. Used for similarity searches. If provided, the search will be based on this vector.
- text: A string input that will be converted into a vector using the Ollama server. This is an alternative to providing a vector directly.
- k: Specifies the number of nearest neighbors to return. Used when performing a k-nearest neighbor search.
- radius: Defines the radius for a range search. All records within this distance from the query vector will be returned.
- limit: Limits the number of records returned in the response. Useful for paginating results.
- offset: Skips the specified number of records before starting to return results. Used in conjunction with limit for pagination.
- precision: Specifies the search precision. Defaults to "medium". Set to "exact" to perform an exhaustive search of all points.
- filter: A string containing a query filter expression. This allows for additional filtering of results based on metadata fields. See the Query Filter Language section for more details.

Example curl:
curl -X POST http://localhost:8080/api/v1/collections/collection_name/search -H "Content-Type: application/json" -d '{"vector":[0.1,0.2,0.3,0.4,0.5],"k":5,"limit":10,"offset":0,"filter":"age >= 18 AND status == \"active\""}'
Usage Scenarios:
- Call the endpoint with no search parameters to list all records, using limit and offset to paginate.
- Provide the text parameter to perform a search based on the text's vector representation.
- Use the vector parameter for direct vector similarity searches.
- Specify a radius to perform a range query, returning all records within the specified distance.
- Use the k parameter to find the top k nearest records to the query vector.
- Use the filter parameter to apply additional constraints based on metadata fields.

You don't need Docker or the REST API to use SyzgyDB. You can build it directly into your Go project. Here's how.
import "github.com/smhanov/syzgydb"
To create a new collection, define the collection options and initialize the collection:
options := syzgydb.CollectionOptions{
Name: "example.dat",
DistanceMethod: syzgydb.Euclidean, // or Cosine
DimensionCount: 128, // Number of dimensions for each vector
Quantization: 64, // Quantization level (4, 8, 16, 32, 64)
}
collection := syzgydb.NewCollection(options)
Add documents to the collection by specifying an ID, vector, and optional metadata:
vector := []float64{0.1, 0.2, 0.3, ..., 0.128} // Example vector
metadata := []byte("example metadata")
collection.AddDocument(1, vector, metadata)
Perform a search to find similar vectors using either nearest neighbor or radius-based search:
searchVector := []float64{0.1, 0.2, 0.3, ..., 0.128} // Example search vector
// Nearest neighbor search
args := syzgydb.SearchArgs{
Vector: searchVector,
K: 5, // Return top 5 results
}
results := collection.Search(args)
// Radius-based search
args = syzgydb.SearchArgs{
Vector: searchVector,
Radius: 0.5, // Search within a radius of 0.5
}
results = collection.Search(args)
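To make the difference between the two search modes concrete, the following brute-force sketch shows what a nearest neighbor query and a radius query compute over a small in-memory set. It does not use SyzgyDB and says nothing about how the library searches internally; `doc`, `hit`, `nearestK`, and `withinRadius` are hypothetical names, and Euclidean distance is assumed.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type doc struct {
	ID     uint64
	Vector []float64
}

type hit struct {
	ID       uint64
	Distance float64
}

func euclidean(a, b []float64) float64 {
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// scan measures the distance from query to every document and returns the
// results sorted from nearest to farthest.
func scan(docs []doc, query []float64) []hit {
	hits := make([]hit, 0, len(docs))
	for _, d := range docs {
		hits = append(hits, hit{ID: d.ID, Distance: euclidean(d.Vector, query)})
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].Distance < hits[j].Distance })
	return hits
}

// nearestK keeps the k closest documents, however far away they are.
func nearestK(docs []doc, query []float64, k int) []hit {
	hits := scan(docs, query)
	if k < 0 {
		k = 0
	}
	if k < len(hits) {
		hits = hits[:k]
	}
	return hits
}

// withinRadius keeps every document at most radius away, however many there are.
func withinRadius(docs []doc, query []float64, radius float64) []hit {
	var out []hit
	for _, h := range scan(docs, query) {
		if h.Distance <= radius {
			out = append(out, h)
		}
	}
	return out
}

func main() {
	docs := []doc{
		{ID: 1, Vector: []float64{0, 0}},
		{ID: 2, Vector: []float64{1, 0}},
		{ID: 3, Vector: []float64{5, 5}},
	}
	query := []float64{0, 0}
	fmt.Println("k=2:       ", nearestK(docs, query, 2))
	fmt.Println("radius=0.5:", withinRadius(docs, query, 0.5))
}
```

A nearest neighbor query fixes the result count and lets the distance vary; a radius query fixes the distance and lets the count vary.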
You can apply a filter function during the search to include only documents that meet certain criteria. There are two ways to create a filter function. The first is to write a custom function:
filterFn := func(id uint64, metadata []byte) bool {
return id%2 == 0 // Include only documents with even IDs
}
args := syzgydb.SearchArgs{
Vector: searchVector,
K: 5, // Return top 5 results
Filter: filterFn,
}
results := collection.Search(args)
The second is to use the BuildFilter method with a query string:
queryString := `age >= 18 AND status == "active"`
filterFn, err := syzgydb.BuildFilter(queryString)
if err != nil {
log.Fatalf("Error building filter: %v", err)
}
args := syzgydb.SearchArgs{
Vector: searchVector,
K: 5, // Return top 5 results
Filter: filterFn,
}
results := collection.Search(args)
The BuildFilter method allows you to create a filter function from a query string using the Query Filter Language described in this document. This provides a flexible way to filter search results based on metadata fields without writing custom Go code for each filter.
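For comparison, the same condition can be written as a custom filter function by hand. This is a sketch under one assumption: that the metadata bytes hold a JSON object with age and status fields. `adultAndActive` is a hypothetical name; its signature matches the filter functions shown earlier.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// adultAndActive has the filter shape func(id uint64, metadata []byte) bool
// and hand-codes the condition: age >= 18 AND status == "active".
// It assumes the metadata is JSON, which is an assumption of this sketch.
func adultAndActive(id uint64, metadata []byte) bool {
	var m struct {
		Age    float64 `json:"age"`
		Status string  `json:"status"`
	}
	if err := json.Unmarshal(metadata, &m); err != nil {
		return false // treat undecodable metadata as a non-match
	}
	return m.Age >= 18 && m.Status == "active"
}

func main() {
	samples := [][]byte{
		[]byte(`{"age": 21, "status": "active"}`),
		[]byte(`{"age": 17, "status": "active"}`),
		[]byte(`example metadata`),
	}
	for i, metadata := range samples {
		fmt.Printf("document %d matches: %v\n", i+1, adultAndActive(uint64(i+1), metadata))
	}
}
```

A function like this would be passed as the Filter field of the search arguments, in the same way as the filter functions above.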
Update the metadata of an existing document or remove a document from the collection:
// Update document metadata
err := collection.UpdateDocument(1, []byte("updated metadata"))
// Remove a document
err = collection.RemoveDocument(1)
To dump the collection for inspection or backup, use the DumpIndex function:
syzgydb.DumpIndex("example.dat")
A Python client for SyzgyDB is available, making it easy to integrate SyzgyDB with your Python projects.
You can install the Python client using pip:
pip install syzgy
The Python client package is available on PyPI at https://pypi.org/project/syzgy/0.1.0/
For usage instructions and more details, please refer to the Python client documentation.
SyzgyDB supports a query filter language that allows you to filter search results based on metadata fields. This language can be used in the filter parameter of the search API.
- Field Comparison: field_name operator value
  Example: age >= 18
- Logical Operations: Combine conditions using AND, OR, NOT
  Example: (age >= 18 AND status == "active") OR role == "admin"
- Parentheses: Use to group conditions and control evaluation order
  Example: (status == "active" AND age >= 18) OR role == "admin"

Supported operators:
- Comparison: ==, !=, >, <, >=, <=
- String: CONTAINS, STARTS_WITH, ENDS_WITH, MATCHES (regex)
- Existence: EXISTS, DOES NOT EXIST
- Array: IN, NOT IN
- field.length: Returns the length of a string or array

Basic Comparison:
age >= 18 AND status == "active"
String Operations:
name STARTS_WITH "John" AND email ENDS_WITH "@example.com"
Array Operations:
status IN ["important", "urgent"]
Nested Fields:
user.profile.verified == true AND user.friends.length > 5
Existence Checks:
phone_number EXISTS AND emergency_contact DOES NOT EXIST
Combining Existence with Other Conditions:
(status == "active" OR status == "pending") AND profile_picture EXISTS
Complex Query:
(status == "active" AND age >= 18) OR (role == "admin" AND NOT (department == "IT")) AND last_login EXISTS
Contributions are welcome! Please feel free to submit a pull request or open an issue to discuss improvements or report bugs.
This project is licensed under the MIT License. See the LICENSE file for details.