A local, lightweight, AI-native database for RAG, including embedding vectors and text search for LLM generation
Easy to Use - No tedious database schema definition. No need to worry about vector indexing details.
Real-time Search - A lock-free real-time index keeps new data fresh with millisecond-level latency. No waiting, no manual operation.
Stability - AwaDB builds on over 4 years of experience at JD.com running production workloads at scale with a system called Vearch, combined with best-of-breed ideas and practices from the community.
First install awadb:
pip3 install awadb
Then use as below:
import awadb
# 1. Initialize awadb client!
awadb_client = awadb.Client()
# 2. Create table
awadb_client.Create("test_llm1")
# 3. Add sentences; each sentence is embedded with SentenceTransformer by default
# You can also embed the sentences yourself with OpenAI or other LLMs
awadb_client.Add([{'embedding_text':'The man is happy'}, {'source' : 'pic1'}])
awadb_client.Add([{'embedding_text':'The man is very happy'}, {'source' : 'pic2'}])
awadb_client.Add([{'embedding_text':'The cat is happy'}, {'source' : 'pic3'}])
awadb_client.Add([{'embedding_text':'The man is eating'}, {'source':'pic4'}])
# 4. Search for the top 3 sentences that best match the specified query
query = "The man is happy"
results = awadb_client.Search(query, 3)
# Output the results
print(results)
Here the text is embedded by SentenceTransformer, which is supported by Hugging Face.
For more detailed usage of the local Python library, see here
If you are on Windows or want to run awadb as a service, you can download and deploy the awadb Docker image. For installation instructions for the awadb Docker image, see here
First, install gRPC and the awadb service Python client:
pip3 install grpcio
pip3 install awadb-client
A simple example:
# Import the package and module
from awadb_client import Awa
# Initialize awadb client
client = Awa()
# Add dict with vector to table 'example1'
client.add("example1", {'name':'david', 'feature':[1.3, 2.5, 1.9]})
client.add("example1", {'name':'jim', 'feature':[1.1, 1.4, 2.3]})
# Search
results = client.search("example1", [1.0, 2.0, 3.0])
# Output results
print(results)
# '_id' is the primary key of each document.
# It can be specified explicitly when adding documents.
# Here no '_id' field is specified, so it is generated by the awadb server.
db_name: "default"
table_name: "example1"
results {
  total: 2
  msg: "Success"
  result_items {
    score: 0.860000074
    fields {
      name: "_id"
      value: "64ddb69d-6038-4311-9118-605686d758d9"
    }
    fields {
      name: "name"
      value: "jim"
    }
  }
  result_items {
    score: 1.55
    fields {
      name: "_id"
      value: "f9f3035b-faaf-48d4-a947-801416c005b3"
    }
    fields {
      name: "name"
      value: "david"
    }
  }
}
result_code: SUCCESS
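The two scores in the sample output are consistent with the squared Euclidean (L2) distance between the query vector and each stored 'feature' vector, which would mean a lower score is a closer match. This is an inference from the numbers shown, not something the README states, and the helper name `squared_l2` is hypothetical. A minimal check in plain Python:

```python
# Hypothetical helper (not part of awadb): squared Euclidean distance
# between two equal-length vectors.
def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Query and document vectors copied from the service example.
query = [1.0, 2.0, 3.0]
docs = {
    "david": [1.3, 2.5, 1.9],
    "jim": [1.1, 1.4, 2.3],
}

# Rank documents by ascending distance (closest first).
ranked = sorted(docs, key=lambda name: squared_l2(query, docs[name]))
for name in ranked:
    print(name, round(squared_l2(query, docs[name]), 2))
```

Under this assumption, 'jim' ranks ahead of 'david', matching the order of the result items.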
More on the Python SDK for the service is here
More detailed quick start examples can be found here
# add documents to table 'test' of db 'default', no need to create table first
curl -H "Content-Type: application/json" -X POST -d '{"db":"default", "table":"test", "docs":[{"_id":1, "name":"lj", "age":23, "f":[1,0]},{"_id":2, "name":"david", "age":32, "f":[1,2]}]}' http://localhost:8080/add
# search documents by the vector field 'f' of the value '[1, 1]'
curl -H "Content-Type: application/json" -X POST -d '{"db":"default", "table":"test", "vector_query":{"f":[1, 1]}}' http://localhost:8080/search
More details on the RESTful API are here
Any unstructured data (image, text, audio, video) can be converted by AI (LLMs or other deep neural networks) into vectors, a form that computers can process.
For example, the sentence "The man is happy" can be converted to a 384-dimension vector (a list of numbers [0.23, 1.98, ....]) by a SentenceTransformer language model. This process is called embedding.
More detailed information about embeddings can be read from OpenAI
AwaDB uses Sentence Transformers to embed sentences by default, but you can also use OpenAI or other LLMs to do the embedding according to your needs.
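Once sentences are embedded, search reduces to comparing vectors: sentences with similar meaning should map to nearby vectors. The sketch below illustrates this with invented 3-dimension vectors standing in for real 384-dimension embeddings. The values are made up for illustration and do not come from any model, and cosine similarity is one common comparison, not necessarily the one awadb uses. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Invented toy vectors, NOT real model output.
toy_embeddings = {
    "The man is happy": [0.9, 0.8, 0.1],
    "The man is very happy": [0.95, 0.85, 0.1],
    "The cat is happy": [0.2, 0.8, 0.7],
    "The man is eating": [0.9, 0.1, 0.6],
}

def top_k(query_vec, embeddings, k):
    """Brute-force search: score every sentence, return the k most similar."""
    scored = [(cosine_similarity(query_vec, vec), text)
              for text, vec in embeddings.items()]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]

print(top_k(toy_embeddings["The man is happy"], toy_embeddings, 3))
```

With these toy values, the two "man is happy" sentences rank first. A real vector database replaces the brute-force loop with an index so that search stays fast as the table grows.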
For examples of combining LLaMa or quantized Alpaca with llama.cpp to build a local knowledge base, see here
For examples of combining ChatGLM to build a local knowledge base, see here
Join the AwaDB community to share any problem, suggestion, or discussion with us:
FAQs
We found that awadb demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.