
taxonomy-synthesis
An AI-driven framework for synthesizing adaptive taxonomies, enabling automated data categorization and classification within dynamic hierarchical structures.
TLDR: copy this README and throw it into ChatGPT. It will figure things out for you. (will create a "GPT" soon)
Join our Discord Community for questions, discussions, and collaboration!
Check out our YouTube demo video to see Taxonomy Synthesis in action!
Imagine you have a big box of different animals, but you're not sure how to group them. You know there are "Mammals" and "Reptiles," but you don't know the smaller groups they belong to, like which mammals are more similar or which reptiles go together. This tool uses smart AI helpers to figure out those smaller groups for you, like finding out there are "Rodents" and "Primates" among the mammals, and "Lizards" and "Snakes" among the reptiles. It then helps you sort all the animals into the right groups automatically, keeping everything neatly organized!
In this quickstart, we'll walk you through using taxonomy-synthesis to create a simplified phylogenetic tree for a list of animals. We'll demonstrate how to initialize the package, set up an OpenAI client, manually create a taxonomy tree, generate subcategories automatically, and classify items using AI.
First, ensure you have the package installed. You can install taxonomy-synthesis directly using pip:
pip install taxonomy-synthesis
Before proceeding, make sure you have an OpenAI API key.
# Set up the OpenAI client
from openai import OpenAI
client = OpenAI(api_key='sk-...')
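A key pasted into source code is easy to leak through version control. As a minimal sketch, the key can instead be read from the environment, assuming it is stored in a variable named OPENAI_API_KEY. The helper name load_api_key is hypothetical and not part of taxonomy-synthesis.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Usage (requires the variable to be set in your shell):
# client = OpenAI(api_key=load_api_key())
```

Failing early with a clear message tends to be easier to debug than an authentication error raised later by the first API call.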
We'll start with a list of 10 animal species, each represented with an arbitrary schema containing fields like name, fun_fact, lifespan_years, and emoji. The only required field is id, which should be unique for each item.
# Prepare a list of items (animals) with various attributes
items = [
    {"id": "🦘", "name": "Kangaroo", "fun_fact": "Can hop at high speeds", "lifespan_years": 23, "emoji": "🦘"},
    {"id": "🐨", "name": "Koala", "fun_fact": "Sleeps up to 22 hours a day", "lifespan_years": 18, "emoji": "🐨"},
    {"id": "🐘", "name": "Elephant", "fun_fact": "Largest land animal", "lifespan_years": 60, "emoji": "🐘"},
    {"id": "🐕", "name": "Dog", "fun_fact": "Best friend of humans", "lifespan_years": 15, "emoji": "🐕"},
    {"id": "🐄", "name": "Cow", "fun_fact": "Gives milk", "lifespan_years": 20, "emoji": "🐄"},
    {"id": "🐁", "name": "Mouse", "fun_fact": "Can squeeze through tiny gaps", "lifespan_years": 2, "emoji": "🐁"},
    {"id": "🐊", "name": "Crocodile", "fun_fact": "Lives in water and land", "lifespan_years": 70, "emoji": "🐊"},
    {"id": "🐍", "name": "Snake", "fun_fact": "No legs", "lifespan_years": 9, "emoji": "🐍"},
    {"id": "🐢", "name": "Turtle", "fun_fact": "Can live over 100 years", "lifespan_years": 100, "emoji": "🐢"},
    {"id": "🦎", "name": "Gecko", "fun_fact": "Can climb walls", "lifespan_years": 5, "emoji": "🦎"}
]
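Since id is the only required field and must be unique, it can help to check the list before sending anything to the API. The following is a rough sketch that works on plain dictionaries. The helper find_id_problems is hypothetical and not part of the package.

```python
from collections import Counter


def find_id_problems(items):
    """Return (indexes of items missing an id, ids that occur more than once)."""
    missing = [i for i, item in enumerate(items) if not item.get("id")]
    counts = Counter(item["id"] for item in items if item.get("id"))
    duplicates = sorted(id_ for id_, n in counts.items() if n > 1)
    return missing, duplicates


sample = [
    {"id": "a1", "name": "Kangaroo"},
    {"id": "a2", "name": "Koala"},
    {"id": "a1", "name": "Elephant"},  # duplicate id
    {"name": "Dog"},                   # no id at all
]
print(find_id_problems(sample))  # ([3], ['a1'])
```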
Create the root node for our taxonomy tree and initialize two subclasses: Mammals and Reptiles.
from taxonomy_synthesis.models import Category, Item
from taxonomy_synthesis.tree.tree_node import TreeNode
# Create root node and two primary subclasses
root_category = Category(name="Animals", description="All animals")
mammal_category = Category(name="Mammals", description="Mammal species")
reptile_category = Category(name="Reptiles", description="Reptile species")
root_node = TreeNode(value=root_category)
mammal_node = TreeNode(value=mammal_category)
reptile_node = TreeNode(value=reptile_category)
# Add subclasses to the root node
root_node.add_child(mammal_node)
root_node.add_child(reptile_node)
Classify all items under the root node into Mammals or Reptiles using the AI classifier.
from taxonomy_synthesis.tree.node_operator import NodeOperator
from taxonomy_synthesis.classifiers.gpt_classifier import GPTClassifier
# Initialize the GPT classifier and node operator
classifier = GPTClassifier(client=client)
generator = None # We'll use manual generation for this part
operator = NodeOperator(classifier=classifier, generator=generator)
# Convert dictionary items to Item objects and classify
item_objects = [Item(**item) for item in items]
classified_items = operator.classify_items(root_node, item_objects)
print("After initial classification:")
print(root_node.print_tree())
Output:
After initial classification:
Animals: []
  Mammals: [🦘, 🐨, 🐘, 🐕, 🐄, 🐁]
  Reptiles: [🐊, 🐍, 🐢, 🦎]
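Conceptually, the classification step moves items into whichever child category fits best and leaves the parent empty. The sketch below mimics that with plain dictionaries and a hand-written lookup standing in for the GPT call. It is not how NodeOperator or GPTClassifier are implemented, and the names rule_based_label and classify_into_children are hypothetical.

```python
def rule_based_label(item, categories):
    """Stand-in for the LLM: pick a category from a hand-written lookup."""
    reptiles = {"Crocodile", "Snake", "Turtle", "Gecko"}
    label = "Reptiles" if item["name"] in reptiles else "Mammals"
    if label not in categories:
        raise ValueError(f"{label!r} is not one of {categories}")
    return label


def classify_into_children(node, items, label_fn):
    """Place each item in the child chosen by `label_fn`; drop it from `node`."""
    by_name = {child["name"]: child for child in node["children"]}
    names = list(by_name)
    moved_ids = set()
    for item in items:
        by_name[label_fn(item, names)]["items"].append(item)
        moved_ids.add(item["id"])
    # The parent keeps only items that were not pushed down to a child.
    node["items"] = [it for it in node["items"] if it["id"] not in moved_ids]
    return node


root = {
    "name": "Animals",
    "items": [],
    "children": [
        {"name": "Mammals", "items": [], "children": []},
        {"name": "Reptiles", "items": [], "children": []},
    ],
}
animals = [{"id": 1, "name": "Koala"}, {"id": 2, "name": "Snake"}]
classify_into_children(root, animals, rule_based_label)
```

After the call, Koala sits under Mammals and Snake under Reptiles. Swapping rule_based_label for a function that queries a model would give the same flow with AI-chosen labels.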
Use AI to automatically generate subcategories under Mammals based on the provided data.
from taxonomy_synthesis.generator.taxonomy_generator import TaxonomyGenerator
# Initialize the Taxonomy Generator
generator = TaxonomyGenerator(
    client=client,
    max_categories=2,
    generation_method="Create categories in accordance with the phylogenetic tree."
)
operator.generator = generator
# Generate subcategories under Mammals
new_categories = operator.generate_subcategories(mammal_node)
print("Generated subcategories under 'Mammals':")
print(mammal_node.print_tree())
Output:
Generated subcategories under 'Mammals':
Mammals: [🦘, 🐨, 🐘, 🐕, 🐄, 🐁]
  marsupials: []
  placentals: []
Now classify the items specifically under the Mammals node into their newly generated subcategories.
# Reclassify items under Mammals based on the new subcategories
classified_items = operator.classify_items(mammal_node, mammal_node.get_all_items())
print("After reclassification under 'Mammals':")
print(mammal_node.print_tree())
Output:
After reclassification under 'Mammals':
Mammals: []
  marsupials: [🦘, 🐨]
  placentals: [🐘, 🐕, 🐄, 🐁]
Finally, print the entire tree to see the categorized structure.
# Print the final tree structure
print("Final taxonomy tree structure:")
print(root_node.print_tree())
Output:
Final taxonomy tree structure:
Animals: []
  Mammals: []
    marsupials: [🦘, 🐨]
    placentals: [🐘, 🐕, 🐄, 🐁]
  Reptiles: [🐊, 🐍, 🐢, 🦎]
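To show how a nested structure turns into a listing like the one above, here is a small sketch that renders a tree of plain dictionaries, assuming two spaces of indentation per level. The function render_tree and the dictionary layout are hypothetical and separate from the package's own print_tree method.

```python
def render_tree(node, depth=0):
    """Return one line per node: indented name plus the ids of its items."""
    ids = ", ".join(str(item_id) for item_id in node["items"])
    lines = [f"{'  ' * depth}{node['name']}: [{ids}]"]
    for child in node.get("children", []):
        lines.extend(render_tree(child, depth + 1))
    return lines


tree = {
    "name": "Animals", "items": [],
    "children": [
        {"name": "Mammals", "items": [], "children": [
            {"name": "marsupials", "items": ["kangaroo", "koala"]},
            {"name": "placentals", "items": ["elephant", "dog", "cow", "mouse"]},
        ]},
        {"name": "Reptiles", "items": ["crocodile", "snake", "turtle", "gecko"]},
    ],
}
print("\n".join(render_tree(tree)))
```

The depth-first walk visits each parent before its children, which is why subcategories appear directly beneath the category they belong to.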
For a visual representation of the system architecture and its components, refer to the following diagram:
Contributions are welcome! To get started, follow these steps to set up your development environment:
Clone the Repository:
git clone https://github.com/CakeCrusher/TaxonomySynthesis.git
cd TaxonomySynthesis
Install Poetry (if not already installed):
curl -sSL https://install.python-poetry.org | python3 -
Install Dependencies:
Use Poetry to install all the dependencies in a virtual environment:
poetry install
Activate the Virtual Environment:
To activate the virtual environment created by Poetry:
poetry shell
Run Pre-Commit Hooks:
To maintain code quality, please run pre-commit hooks before submitting any pull requests:
poetry run pre-commit install
poetry run pre-commit run --all-files
We encourage you to open issues for any bugs you encounter or features you'd like to see added. Pull requests are also highly appreciated! Let's work together to improve and expand this project.