Highlights • Overview • Install • Getting Started • Hub • Documentation • Tutorial • Contributing • Release Notes • Blog
GNES [jee-nes] is Generic Neural Elastic Search, a cloud-native semantic search system based on deep neural networks.
GNES enables large-scale indexing and semantic search for text-to-text, image-to-image, video-to-video and any-to-any content forms.
💭 To learn more about the key tenets of GNES, read this blog post
☁️Cloud-Native & Elastic | 🐣Easy-to-Use | 🔬State-of-the-Art |
---|---|---|
GNES is all-in-microservice! Encoder, indexer, preprocessor and router all run in their own containers. They communicate via versioned APIs and collaborate under the orchestration of Docker Swarm/Kubernetes etc. Scaling, load balancing and automated recovery come off-the-shelf in GNES. | How long would it take to deploy a change that involves just switching a layer in VGG? In GNES, this is just a one-line change in a YAML file. We abstract the encoding and indexing logic into a YAML config, so that you can change or stack encoders and indexers without even touching the codebase. | Taking advantage of the fast-evolving AI/ML/NLP/CV communities, we learn from best-of-breed deep learning models and plug them into GNES, making sure you always enjoy state-of-the-art performance. |
🌌Generic & Universal | 📦Model as Plugin | 💯Best Practice |
Searching for texts, images or even short videos? Using Python/C/Java/Go/HTTP as the client? It doesn't matter which content form you have or which language you use; GNES can handle them all. | When built-in models do not meet your requirements, simply build your own with GNES Hub. Pack your model as a Docker container and use it as a plugin. | We love to learn best practices from the community, helping GNES reach the next level of availability, resiliency, performance, and durability. If you have any ideas or suggestions, feel free to contribute. |
GNES Hub ships AI/ML models as Docker containers and uses Docker containers as plugins. It offers a clean and sustainable way to port external algorithms (with their dependencies) into the GNES framework. GNES Hub is hosted on Docker Hub.
There are two ways to get GNES, either as a Docker image or as a PyPI package. For cloud users, we highly recommend using GNES via Docker.
docker run gnes/gnes:latest-alpine
This command downloads the latest GNES image (based on Alpine Linux) and runs it in a container. When the container runs, it prints an informational message and exits.
Besides the `alpine` image, which is optimized for space, we also provide Buster (Debian 10.0), Ubuntu 18.04 and Ubuntu 16.04-based images. The table below summarizes all available GNES tags. One can fill in `{ver}` with `latest`, `stable` or `v0..xx`. `latest` refers to the latest master of this repository, which may not be stable. We recommend using an official release by changing `latest` to a version number, say `v0.0.24`, or simply using `stable` for the last release, e.g. `gnes:stable-ubuntu`.
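The `{ver}-{base}` tag pattern described above can be sketched in a few lines of Python. This is only an illustration: the helper name `gnes_image` and the version regex are assumptions, not part of GNES, and only the pattern and the example values (`latest-alpine`, `stable-ubuntu`, `v0.0.24`) come from the text.

```python
import re

# Hypothetical helper, not part of GNES: composes a Docker image reference
# from the `{ver}-{base}` tag pattern described in this section.
# The accepted version forms are an assumption based on the examples above.
_VERSION = re.compile(r"^(latest|stable|v\d+\.\d+\.\d+)$")


def gnes_image(ver: str, base: str = "alpine", repo: str = "gnes/gnes") -> str:
    """Return an image reference such as 'gnes/gnes:stable-ubuntu'."""
    if not _VERSION.match(ver):
        raise ValueError(f"unexpected version tag: {ver!r}")
    return f"{repo}:{ver}-{base}"


print(gnes_image("latest"))            # gnes/gnes:latest-alpine
print(gnes_image("stable", "ubuntu"))  # gnes/gnes:stable-ubuntu
```

Pinning `ver` to a release number instead of `latest` keeps a deployment reproducible, which is the recommendation given above.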
⚠️ Since 2019/10/21, we have stopped hosting the public mirror on Tencent Cloud. The old Docker images still exist, but no new images will be published on Tencent Cloud.
We also provide a public mirror on GitHub Packages. Select the mirror that serves you well.
docker login --username=xxx docker.pkg.github.com/gnes-ai/gnes # login to github package so that we can pull from it
docker run docker.pkg.github.com/gnes-ai/gnes/gnes:latest-alpine
The table below shows the status of the build pipeline.
Registry | Build status |
---|---|
Docker Hub `gnes/gnes:[tag]` | |
GitHub Packages `docker.pkg.github.com/gnes-ai/gnes/gnes:[tag]` | |
pip
You can also install GNES as a Python3 package via:
pip install gnes
Note that this will only install a "barebone" version of GNES, consisting of the minimal dependencies for running GNES. No third-party pretrained models or deep learning/NLP/CV packages will be installed. We make this the default installation behavior because a model of interest to NLP engineers may be of no interest to CV engineers. In GNES, models serve as Docker plugins.
🚸 TensorFlow, PyTorch and torchvision are not part of the GNES installation. Depending on your model, you may have to install them in advance.
Though not recommended, you can install GNES with full dependencies via:
pip install gnes[all]
pip install gnes[bert] | bert-serving-server>=1.8.6, bert-serving-client>=1.8.6 |
pip install gnes[flair] | flair>=0.4.1 |
pip install gnes[annoy] | annoy==1.15.2 |
pip install gnes[chinese] | jieba |
pip install gnes[vision] | opencv-python>=4.0.0, imagehash>=4.0 |
pip install gnes[leveldb] | plyvel>=1.0.5 |
pip install gnes[test] | pylint, memory_profiler>=0.55.0, psutil>=5.6.1, gputil>=1.4.0 |
pip install gnes[transformers] | pytorch-transformers |
pip install gnes[onnx] | onnxruntime |
pip install gnes[audio] | librosa>=0.7.0 |
pip install gnes[scipy] | scipy |
pip install gnes[nlp] | bert-serving-server>=1.8.6, pytorch-transformers, flair>=0.4.1, bert-serving-client>=1.8.6 |
pip install gnes[cn_nlp] | pytorch-transformers, bert-serving-client>=1.8.6, bert-serving-server>=1.8.6, jieba, flair>=0.4.1 |
pip install gnes[all] | pylint, psutil>=5.6.1, pytorch-transformers, annoy==1.15.2, bert-serving-client>=1.8.6, gputil>=1.4.0, bert-serving-server>=1.8.6, imagehash>=4.0, onnxruntime, memory_profiler>=0.55.0, jieba, flair>=0.4.1, librosa>=0.7.0, scipy, plyvel>=1.0.5, opencv-python>=4.0.0 |
A good way to cherry-pick dependencies is to follow the example in GNES Hub and build your own GNES image.
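Since the barebone install leaves out the heavy packages, it can help to check which optional dependencies are already importable before picking an extra. The sketch below uses only the standard library. The mapping from pip package name to import name is an assumption (the two can differ), and the helper `missing_packages` is hypothetical, not a GNES function.

```python
from importlib.util import find_spec

# Assumed mapping from pip package name to importable module name.
# The two can differ, e.g. opencv-python is imported as cv2.
OPTIONAL_MODULES = {
    "tensorflow": "tensorflow",
    "torch": "torch",
    "torchvision": "torchvision",
    "opencv-python": "cv2",
    "jieba": "jieba",
    "annoy": "annoy",
}


def missing_packages(modules=OPTIONAL_MODULES):
    """Return the pip package names whose import module cannot be found."""
    return sorted(pkg for pkg, mod in modules.items() if find_spec(mod) is None)


print("not installed:", missing_packages())
```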
Either way, if you see the following message after running `$ gnes` or `$ docker run gnes/gnes`, then you are ready to go!
Before we start, let me first introduce two important concepts in GNES: microservice and workflow.
For machine learning engineers and data scientists who are not familiar with the concepts of cloud-native and microservice, one can picture a microservice as an app on your smartphone. Each app runs independently, and an app may cooperate with other apps to accomplish a task. In GNES, we have four fundamental apps, a.k.a. microservices: preprocessor, encoder, indexer and router.
In GNES, we have implemented dozens of preprocessors, encoders and indexers to process different content forms, such as image, text and video. It is also easy to plug in your own implementation, as we shall see in an example below.
Now that we have a bunch of apps, what do we expect them to do? A typical search system has two fundamental tasks: index and query. Index stores the documents; query searches them. In a neural search system, one may face another task: train, where one fine-tunes an encoder/preprocessor according to the data distribution in order to achieve better search relevance.
These three tasks correspond to three different workflows in GNES.
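To make the index and query workflows concrete, here is a toy sketch in plain Python. None of it is GNES code: the functions and the `ToyIndexer` class are hypothetical stand-ins for a preprocessor, an encoder and an indexer, the "encoder" is a simple bag-of-words vector rather than a neural network, and the router and the train workflow are left out.

```python
import math
from collections import Counter

# Toy stand-ins for preprocessor, encoder and indexer; not GNES code.


def preprocess(doc: str) -> list:
    """Preprocessor: split a raw document into lowercase tokens."""
    return doc.lower().split()


def encode(tokens: list, vocab: list) -> list:
    """Encoder: map tokens to an L2-normalized bag-of-words vector."""
    counts = Counter(tokens)
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class ToyIndexer:
    """Indexer: store (vector, document) pairs and search by brute force."""

    def __init__(self):
        self.items = []

    def add(self, vec, doc):
        self.items.append((vec, doc))

    def query(self, vec, top_k=2):
        # Cosine similarity reduces to a dot product on normalized vectors.
        scored = [(sum(a * b for a, b in zip(vec, v)), d) for v, d in self.items]
        return sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]


docs = ["red rose flower", "yellow sunflower field", "red tulip garden"]
vocab = sorted({w for d in docs for w in preprocess(d)})

# Index workflow: preprocess -> encode -> index
indexer = ToyIndexer()
for d in docs:
    indexer.add(encode(preprocess(d), vocab), d)

# Query workflow: preprocess -> encode -> search the index
results = indexer.query(encode(preprocess("red flower"), vocab))
for score, doc in results:
    print(f"{score:.2f}  {doc}")
```

Both workflows share the preprocess and encode steps and differ only in the last stage, which is why the same components can be reused across workflows.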
📣 Since `v0.0.46`, GNES Flow has become the main interface of GNES. GNES Flow provides a pythonic and intuitive way to implement a workflow, enabling users to run or debug GNES on a local machine. By default, GNES Flow orchestrates all microservices using a multi-thread or multi-process backend. It can also be exported to a Docker Swarm/Kubernetes YAML config, allowing one to deliver GNES to the cloud.
🔰 The complete example and the corresponding Jupyter Notebook can be found here.
In this example, we will use the new `gnes.flow` API (`gnes >= 0.0.46` is required) to build a toy image search system for indexing and retrieving flowers based on their similarities.
Let's first define the indexing workflow by:
from gnes.flow import Flow

flow = (Flow(check_version=False)
        .add_preprocessor(name='prep', yaml_path='yaml/prep.yml')
        .add_encoder(yaml_path='yaml/incep.yml')
        .add_indexer(name='vec_idx', yaml_path='yaml/vec.yml')
        .add_indexer(name='doc_idx', yaml_path='yaml/doc.yml', recv_from='prep')
        .add_router(name='sync', yaml_path='BaseReduceRouter', num_part=2, recv_from=['vec_idx', 'doc_idx']))
Here, we use the pretrained inceptionV4 model as the encoder and the built-in indexers for storing vectors and documents. The flow should be quite self-explanatory. If not, you can always convert it to an SVG image and see its visualization:
flow.build(backend=None).to_url()
To index our flower data, we need an iterator that generates `bytes` strings and feeds those `bytes` strings into the defined flow.
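import random
import tarfile


def read_flowers(sample_rate=1.0):
    # Yield the raw bytes of each .jpg in the archive, keeping each image
    # with probability `sample_rate`.
    with tarfile.open('17flowers.tgz') as fp:
        for m in fp.getmembers():
            if m.name.endswith('.jpg') and random.random() <= sample_rate:
                yield fp.extractfile(m).read()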
We can now do indexing via the multi-process backend:
with flow(backend='process') as fl:
    fl.index(bytes_gen=read_flowers(), batch_size=64)
It will take a few minutes, depending on your machine.
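The `batch_size=64` argument in the indexing call suggests that the byte strings are sent in groups rather than one at a time. How GNES batches internally is not shown here, so the following is only a generic sketch of batching a generator; the `batched` helper is hypothetical and the fake payloads stand in for the JPEG bytes from `read_flowers()`.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield lists of up to `batch_size` items from any iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Fake byte strings standing in for the JPEG payloads from read_flowers().
fake_images = (bytes([i % 256]) * 4 for i in range(150))
sizes = [len(b) for b in batched(fake_images, 64)]
print(sizes)  # [64, 64, 22]
```

Because the input is consumed lazily, only one batch needs to be held in memory at a time, which matters when the items are full-size images.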
We simply sample 20 flower images as queries and search for their top-10 similar images:
num_q = 20
topk = 10
sample_rate = 0.05

# do the query
results = []
with flow.build(backend='process') as fl:
    for q, r in fl.query(bytes_gen=read_flowers(sample_rate)):
        q_img = q.search.query.raw_bytes
        r_imgs = [k.doc.raw_bytes for k in r.search.topk_results]
        r_scores = [k.score.value for k in r.search.topk_results]
        results.append((q_img, r_imgs, r_scores))
        # stop once num_q queries have been collected
        if len(results) >= num_q:
            break
Here is the result, where queries are on the first row.
To increase the number of parallel components in the flow, simply add `replicas` to each service:
flow = (Flow(check_version=False, ctrl_with_ipc=True)
        .add_preprocessor(name='prep', yaml_path='yaml/prep.yml', replicas=5)
        .add_encoder(yaml_path='yaml/incep.yml', replicas=6)
        .add_indexer(name='vec_idx', yaml_path='yaml/vec.yml')
        .add_indexer(name='doc_idx', yaml_path='yaml/doc.yml', recv_from='prep')
        .add_router(name='sync', yaml_path='BaseReduceRouter', num_part=2, recv_from=['vec_idx', 'doc_idx']))

flow.build(backend=None).to_url()
One can convert a `Flow` object to a Docker Swarm/Kubernetes YAML compose file very easily via:
flow.build(backend=None).to_swarm_yaml()
version: '3.4'
services:
  Frontend0:
    image: gnes/gnes:latest-alpine
    command: frontend --port_in 56086 --port_out 52674 --port_ctrl 49225 --check_version
      False --ctrl_with_ipc True
  prep:
    image: gnes/gnes:latest-alpine
    command: preprocess --port_in 52674 --port_out 65461 --host_in Frontend0 --socket_in
      PULL_CONNECT --socket_out PUB_BIND --port_ctrl 49281 --check_version False --ctrl_with_ipc
      True --yaml_path yaml/prep.yml
  Encoder0:
    image: gnes/gnes:latest-alpine
    command: encode --port_in 65461 --port_out 50488 --host_in prep --socket_in SUB_CONNECT
      --port_ctrl 62298 --check_version False --ctrl_with_ipc True --yaml_path yaml/incep.yml
  vec_idx:
    image: gnes/gnes:latest-alpine
    command: index --port_in 50488 --port_out 57791 --host_in Encoder0 --host_out
      sync --socket_in PULL_CONNECT --socket_out PUSH_CONNECT --port_ctrl 58367 --check_version
      False --ctrl_with_ipc True --yaml_path yaml/vec.yml
  doc_idx:
    image: gnes/gnes:latest-alpine
    command: index --port_in 65461 --port_out 57791 --host_in prep --host_out sync
      --socket_in SUB_CONNECT --socket_out PUSH_CONNECT --port_ctrl 50333 --check_version
      False --ctrl_with_ipc True --yaml_path yaml/doc.yml
  sync:
    image: gnes/gnes:latest-alpine
    command: route --port_in 57791 --port_out 56086 --host_out Frontend0 --socket_out
      PUSH_CONNECT --port_ctrl 51285 --check_version False --ctrl_with_ipc True --yaml_path
      BaseReduceRouter --num_part 2
To deploy it, simply copy the generated YAML config to a file, say `my-gnes.yml`, and then run
docker stack deploy --compose-file my-gnes.yml gnes-531
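The copy-and-deploy step can also be scripted. The sketch below saves a compose YAML string to a file and builds the matching `docker stack deploy` command without running it. The helpers `write_compose` and `deploy_command` are hypothetical, and it is an assumption that `to_swarm_yaml()` returns the YAML as a string; a placeholder string is used so the sketch runs without GNES or Docker installed.

```python
import shlex
import tempfile
from pathlib import Path


def write_compose(yaml_text, path="my-gnes.yml"):
    """Save the generated compose YAML to disk and return its path."""
    target = Path(path)
    target.write_text(yaml_text, encoding="utf-8")
    return target


def deploy_command(compose_file, stack_name):
    """Build (but do not run) the `docker stack deploy` command."""
    return ["docker", "stack", "deploy",
            "--compose-file", str(compose_file), stack_name]


# In practice yaml_text would come from flow.build(backend=None).to_swarm_yaml()
# (assumed to return a string); a placeholder keeps this sketch self-contained.
yaml_text = "version: '3.4'\nservices: {}\n"

with tempfile.TemporaryDirectory() as tmp:
    compose = write_compose(yaml_text, Path(tmp) / "my-gnes.yml")
    saved_text = compose.read_text(encoding="utf-8")
    cmd = deploy_command(compose, "gnes-531")
    print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would execute the deployment on a machine where Docker Swarm is initialized.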
In this example, we will build a semantic poem search engine using GNES. Unlike the previous flower search example, here we run each service as an isolated Docker container and then orchestrate them via Docker Swarm. This represents a common scenario in cloud settings. You will learn how to use powerful and customized GNES images from GNES Hub.
🔰 Please check out this repository for details and follow the instructions to reproduce.
Let's make a short recap of what we have learned.
The official documentation of GNES is hosted on docs.gnes.ai. It is automatically built, updated and archived on every new release.
🚧 Tutorial is still under construction. Stay tuned! Meanwhile, we sincerely welcome you to contribute your own learning experience / case study with GNES!
bert-as-service
We have set up this repository to track the network latency over different GNES versions. As part of the CI/CD pipeline, this repo is automatically updated when the GNES master is updated or a new GNES version is released.
❤️ The beginning is always the hardest. But fear not, even if you find a typo, a missing docstring or unit test, you can simply correct them by making a commit to GNES. Here are the steps:
fix-gnes-typo-1
fix(readme): improve the readability and move sections
fix(readme): improve the readability and move sections
Well done! Once a PR gets merged, here is what happens next:
- Images tagged `-latest` will be automatically updated within an hour. You may check the build status here.
- Images tagged `-stable` will be updated accordingly.

More details can be found in the contributor guidelines.
If you use GNES in an academic paper, you are more than welcome to make a citation. Here are the two ways of citing GNES:
\footnote{https://github.com/gnes-ai/gnes}
@misc{2019GNES,
title={GNES: Generic Neural Elastic Search},
author={Xiao, Han and Yan, Jianfeng and Wang, Feng and Fu, Jie and Liu, Kai},
howpublished={\url{https://github.com/gnes-ai}},
year={2019}
}
If you have downloaded a copy of the GNES binary or source code, please note that the GNES binary and source code are both licensed under the GNU Lesser General Public License version 3.