
x2vec, Towhee is all you need!
Towhee makes it easy to build neural data processing pipelines for AI applications. We provide hundreds of models, algorithms, and transformations that can be used as standard pipeline building blocks. You can use Towhee's Pythonic API to build a prototype of your pipeline and automatically optimize it for production-ready environments.
:art: Various Modalities: Towhee supports data processing on a variety of modalities, including images, videos, text, audio, molecular structures, etc.
:mortar_board: SOTA Models: Towhee provides SOTA models across 5 fields (CV, NLP, Multimodal, Audio, Medical), 15 tasks, and 140+ model architectures. These include BERT, CLIP, ViT, SwinTransformer, MAE, and data2vec, all pretrained and ready to use.
:package: Data Processing: Towhee also provides traditional methods alongside neural network models to help you build practical data processing pipelines. We have a rich pool of operators available, such as video decoding, audio slicing, frame sampling, feature vector dimension reduction, ensembling, and database operations.
:snake: Pythonic API: Towhee includes a Pythonic method-chaining API for describing custom data processing pipelines. We also support schemas, which makes processing unstructured data as easy as handling tabular data.
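To make the method-chaining and schema idea concrete, here is a minimal sketch in plain Python. It is not Towhee's implementation: the `ToyPipe` class and its `run` method are hypothetical names invented for this illustration. It only shows how each step can read one named column and write another, so that intermediate results stay addressable by name, much like columns in a table.

```python
from typing import Any, Callable, Dict, List, Tuple


class ToyPipe:
    """Hypothetical stand-in for a method-chaining pipeline builder (not Towhee's API)."""

    def __init__(self, input_column: str) -> None:
        self.input_column = input_column
        # Each step is (source column, target column, function).
        self.steps: List[Tuple[str, str, Callable[[Any], Any]]] = []

    def map(self, source: str, target: str, fn: Callable[[Any], Any]) -> "ToyPipe":
        self.steps.append((source, target, fn))
        return self  # returning self is what makes chaining possible

    def run(self, value: Any) -> Dict[str, Any]:
        # The "schema" is simply the set of named columns in this row.
        row: Dict[str, Any] = {self.input_column: value}
        for source, target, fn in self.steps:
            row[target] = fn(row[source])
        return row


toy = (
    ToyPipe('text')
    .map('text', 'tokens', str.split)
    .map('tokens', 'count', len)
)
result = toy.run('towhee builds pipelines')
print(result)
```

Because every step names its input and output columns, a later step can refer to any earlier result without the steps having to be nested inside one another.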
v1.0.0rc1 May. 4, 2023
v0.9.0 Dec. 2, 2022
v0.8.1 Sep. 30, 2022
v0.8.0 Aug. 16, 2022
v0.7.3 Jul. 27, 2022
v0.7.1 Jul. 1, 2022
v0.7.0 Jun. 24, 2022
v0.6.1 May. 13, 2022
Towhee requires Python 3.6+. You can install Towhee via pip:
pip install towhee towhee.models
If you run into any pip-related install problems, try upgrading pip with pip install -U pip.
Let's try your first Towhee pipeline. Below is an example of how to create a CLIP-based cross-modal retrieval pipeline.
The example requires Towhee 1.0.0, which can be installed with pip install towhee==1.0.0. See the latest usage documentation for details.
from glob import glob
from towhee import ops, pipe, DataCollection

# create image embeddings and build index
p = (
    pipe.input('file_name')
    .map('file_name', 'img', ops.image_decode.cv2())
    .map('img', 'vec', ops.image_text_embedding.clip(model_name='clip_vit_base_patch32', modality='image'))
    .map('vec', 'vec', ops.towhee.np_normalize())
    .map(('vec', 'file_name'), (), ops.ann_insert.faiss_index('./faiss', 512))
    .output()
)

for f_name in ['https://raw.githubusercontent.com/towhee-io/towhee/main/assets/dog1.png',
               'https://raw.githubusercontent.com/towhee-io/towhee/main/assets/dog2.png',
               'https://raw.githubusercontent.com/towhee-io/towhee/main/assets/dog3.png']:
    p(f_name)

# Delete the pipeline object to make sure the faiss data is written to disk.
del p

# search image by text
decode = ops.image_decode.cv2('rgb')
p = (
    pipe.input('text')
    .map('text', 'vec', ops.image_text_embedding.clip(model_name='clip_vit_base_patch32', modality='text'))
    .map('vec', 'vec', ops.towhee.np_normalize())
    # faiss op result format: [[id, score, [file_name]], ...]
    .map('vec', 'row', ops.ann_search.faiss_index('./faiss', 3))
    .map('row', 'images', lambda x: [decode(item[2][0]) for item in x])
    .output('text', 'images')
)

DataCollection(p('a cat')).show()
Learn more examples from the Towhee Examples.
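Both pipelines above normalize the embedding before it reaches the index. The source does not say why, or which distance metric the faiss operator uses. A common reason is that for unit-length vectors the inner product equals cosine similarity. The sketch below checks that property in plain Python with made-up three-dimensional vectors. The helper names `l2_normalize`, `dot`, and `cosine` are hypothetical and are not part of Towhee.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Made-up stand-ins for an image embedding and a text embedding.
image_vec = [3.0, 4.0, 0.0]
text_vec = [6.0, 8.0, 1.0]

score_normalized = dot(l2_normalize(image_vec), l2_normalize(text_vec))
score_cosine = cosine(image_vec, text_vec)
print(math.isclose(score_normalized, score_cosine))
```

For unit vectors, ranking by smallest L2 distance also gives the same order as ranking by largest cosine similarity, so normalizing first makes the results comparable under either metric.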
Towhee is composed of four main building blocks: Operators, Pipelines, DataCollection API, and Engine.
Operators: An operator is a single building block of a neural data processing pipeline. Different implementations of operators are categorized by tasks, with each task having a standard interface. An operator can be a deep learning model, a data processing method, or a Python function.
Pipelines: A pipeline is composed of several operators interconnected in the form of a DAG (directed acyclic graph). This DAG can direct complex functionalities, such as embedding feature extraction, data tagging, and cross modal data analysis.
DataCollection API: A Pythonic, method-chaining style API for building custom pipelines. A pipeline defined with the DataCollection API can be run locally on a laptop for fast prototyping and then converted to a Docker image, with end-to-end optimizations, for production-ready environments.
Engine: The engine sits at Towhee's core. Given a pipeline, the engine will drive dataflow among individual operators, schedule tasks, and monitor compute resource usage (CPU/GPU/etc). We provide a basic engine within Towhee to run pipelines on a single-instance machine and a Triton-based engine for docker containers.
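The relationship between operators, a DAG-shaped pipeline, and an engine can be sketched with the standard library alone. This is a toy illustration, not Towhee's engine: `run_dag`, `toy_graph`, and the node names are hypothetical, and the sketch runs everything sequentially in one process with no scheduling or resource monitoring. It assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Tuple

# node name -> (operator function, names of the upstream nodes it consumes)
Graph = Dict[str, Tuple[Callable[..., Any], Tuple[str, ...]]]


def run_dag(graph: Graph, inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Run each operator once all of its upstream results are available."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in graph.items()})
    results: Dict[str, Any] = dict(inputs)
    for name in sorter.static_order():
        if name in results:
            # Pipeline inputs are supplied by the caller, not computed.
            continue
        fn, deps = graph[name]
        results[name] = fn(*(results[dep] for dep in deps))
    return results


# Four toy "operators" wired into a small DAG: two branches that join again.
toy_graph: Graph = {
    'tokens': (str.split, ('text',)),
    'length': (len, ('tokens',)),
    'upper': (str.upper, ('text',)),
    'summary': (lambda upper, length: f'{upper} ({length} tokens)', ('upper', 'length')),
}

out = run_dag(toy_graph, {'text': 'hello towhee'})
print(out['summary'])
```

A topological order guarantees that every operator sees its inputs before it runs. A real engine could use the same dependency information to run independent branches, such as `upper` and `tokens` here, in parallel.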
Writing code is not the only way to contribute! Submitting issues, answering questions, and improving documentation are just some of the many ways you can help our growing community. Check out our contributing page for more information.
Special thanks go to these folks for contributing to Towhee, whether on GitHub, our Towhee Hub, or elsewhere:
Looking for a database to store and index your embedding vectors? Check out Milvus.
FAQs
Towhee is a framework that helps you encode your unstructured data into embeddings.
We found that towhee demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.