Research
Security News
Malicious npm Packages Inject SSH Backdoors via Typosquatted Libraries
Socket’s threat research team has detected six malicious npm packages typosquatting popular libraries to insert SSH backdoors.
github.com/zjulearning/efanna_graph
EFANNA is a flexible and efficient library for approximate nearest neighbor search (ANN search) on large scale data. It implements the algorithms of our paper EFANNA : Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph.
This project (efanna_graph) contains only the approximate nearest neighbor graph construction part in our EFANNA paper.
The reasons are as follows:
Case 1: If the dataset is very large (more than 4M points), you may want to use faiss to build a graph for initilization. Then you need to downlad the faiss package and compile with it.
cd efanna_graph/extern_libraries/
git clone https://github.com/facebookresearch/faiss.git
cd faiss/
Please see the documentation of faiss for how to compile the faiss.
Case 2: If you don't need the faiss for initilization, you can compile efanna_graph without the faiss support.
cd efanna_graph/
rm include/efanna2e/index_pq.h src/index_pq.cpp
comment the last two lines in tests/CMakeLists.txt
Then go to the root directory of efanna_graph and make.
cd efanna_graph/
cmake .
make
kNN graph building with nndescent:
cd tests/
./test_nndescent data_file save_graph K L iter S R
Meaning of the parameters:
**data_file** is the path of the origin data.
**save_graph** is the path of the kNN graph to be saved.
**K** is the 'K' of kNN graph.
**L** is the parameter controlling the graph quality, larger is more accurate but slower, no smaller than K.
**iter** is the parameter controlling the iteration times, iter usually < 30.
**S** is the parameter contollling the graph quality, larger is more accurate but slower.
**R** is the parameter controlling the graph quality, larger is more accurate but slower.
kNN graph initilization with KD-tree and refine by nndescent (efanna):
cd tests/
./test_kdtree_graph data_file nTrees mLevel K saving_graph
./test_nndescent_refine data_file init_graph save_graph K L iter S R
Meaning of the parameters:
**nTrees** is the number of trees used to build the graph (larger is more accurate but slower)
**mLevel** conquer-to-depeth (smaller is more accurate but slower)
**K** is the 'K' of kNN graph.
**saveing_graph** is the path of the kNN graph to be saved.
**init_graph** is the graph built by test_kdtree_graph, same with **saveing\_graph** here.
kNN graph initilization with faiss and refine by nndescent:
cd tests/
./test_faiss_graph data_file IndexKey SearchKey K saving_graph
./test_nndescent_refine data_file init_graph save_graph K L iter S R
Meaning of the parameters(from left to right):
**IndexKey** is the parameter in faiss to build the index.
**SearchKey** is the parameter in faiss to search with the index.
Please see the documentation of faiss for how to set these two parameters.
Suppose the database has N points, and numbered from 0 to N-1. You want to build an approximate kNN graph. The graph can be regarded as a N * k Matrix. The saved kNN graph binary file saves the matrix by row. The first 4 bytes of each row saves the int value of k, next 4 bytes saves the value of M and next 4 bytes saves the float value of the norm of the point. Then it follows k*4 bytes, saving the indices of the k nearest neighbors of respective point. The N rows are saved continuously without seperating characters.
Because there is no unified format for input data, users may need to write input function to read your own data. You may imitate the input function in our sample code (sample/efanna_efanna_index_buildgraph.cc) to load the data into our matrix. To use SIMD instruction optimization, you should pay attention to the data alignment problem of SSE / AVX instruction.
./test_nndescent gist_base.fvecs gist.100NN.graph 100 120 10 15 100
With the above command, one can build a 100NN graph on GIST1M with around 96% accuracy in 960 seconds on a i7-4790K cpu with 8 threads.
./test_nndescent sift_base.fvecs sift.50NN.graph 50 70 8 10 100
With the above command, one can build a 50NN graph on SIFT1M with around 90% accuracy in 72 seconds on a i7-4790K cpu with 8 threads.
./test_nndescent sift_base.fvecs sift.50NN.graph 50 70 10 10 50
With the above command, one can build a 50NN graph on SIFT1M with around 97% accuracy in 106 seconds on a i7-4790K cpu with 8 threads.
The implemnetation of nndescent is taken from Kgraph. They proposed the nndescent algorithm. Many thanks to them for inspiration.
FAQs
Unknown package
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Research
Security News
Socket’s threat research team has detected six malicious npm packages typosquatting popular libraries to insert SSH backdoors.
Security News
MITRE's 2024 CWE Top 25 highlights critical software vulnerabilities like XSS, SQL Injection, and CSRF, reflecting shifts due to a refined ranking methodology.
Security News
In this segment of the Risky Business podcast, Feross Aboukhadijeh and Patrick Gray discuss the challenges of tracking malware discovered in open source softare.