Oracle-n Model
Oracle-n is an AI model based on the BERT architecture, designed for text and sentiment analysis. Building on the foundational strengths of BERT, Oracle-n is tuned to optimize performance for specific needs. This repository includes the model code, tokenizer, and training scripts.
Features
- Customized BERT Configuration: Tailored configurations to fit text and sentiment analysis tasks.
- Oracle-n Tokenizer: Custom tokenizer designed for preprocessing text data efficiently.
- Sentiment Analysis: Trained on the IMDb dataset to perform sentiment analysis tasks.
Directory Structure
- aclImdb/: Directory containing the IMDb dataset files in Parquet format.
- dataset.py: Script for handling dataset loading and preprocessing.
- logs/: Directory for TensorBoard logs during training.
- oracle-n-model/: Directory containing the saved model.
- oracle-n-tokenizer/: Directory containing the saved tokenizer.
- scripts/: Directory containing additional scripts for training and evaluation.
- .gitignore: Git ignore file to exclude unnecessary files from the repository.
- requirements.txt: File listing the dependencies required for the project.
Setup and Installation
- Clone the repository:
  git clone https://github.com/hilarl/oracle-n.git
  cd oracle-n
- Install dependencies:
  pip install -r requirements.txt
- Download the dataset:
  Ensure you have the IMDb dataset files in the aclImdb/ directory. If needed, you can download them from the IMDb dataset page and convert them to Parquet format (a conversion sketch follows below).
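If you are starting from the raw IMDb download (aclImdb/train and aclImdb/test, each with pos/ and neg/ subdirectories), the following is a minimal conversion sketch. The script name, the train.parquet/test.parquet file names, and the pandas/pyarrow dependency are assumptions for illustration, not part of this repository.

# convert_imdb_to_parquet.py -- hypothetical helper, not included in the repository.
# Walks the raw aclImdb/ splits and writes one Parquet file per split.
import os
import pandas as pd  # assumes pandas and pyarrow are installed

def split_to_dataframe(split_dir):
    rows = []
    for label_name, label in (("pos", 1), ("neg", 0)):
        label_dir = os.path.join(split_dir, label_name)
        for fname in os.listdir(label_dir):
            with open(os.path.join(label_dir, fname), encoding="utf-8") as f:
                rows.append({"text": f.read(), "label": label})
    return pd.DataFrame(rows)

if __name__ == "__main__":
    for split in ("train", "test"):
        df = split_to_dataframe(os.path.join("aclImdb", split))
        df.to_parquet(os.path.join("aclImdb", f"{split}.parquet"), index=False)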
Usage
Training the Model
- Prepare the Dataset:
  Ensure the dataset files are in the aclImdb/ directory in Parquet format.
- Run the Training Script (a rough outline of such a script is sketched after these steps):
  python scripts/train_model.py
- Monitor Training with TensorBoard:
  tensorboard --logdir logs
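For orientation only, here is a rough sketch of the kind of training loop a script like scripts/train_model.py might implement, using the Hugging Face Transformers and Datasets libraries. The Parquet file names, hyperparameters, and output paths below are assumptions; the actual script in scripts/ remains the source of truth.

# Hypothetical outline of a BERT sentiment-analysis training run; file names
# (train.parquet/test.parquet) and hyperparameters are assumptions.
from datasets import load_dataset
from transformers import (BertForSequenceClassification, BertTokenizerFast,
                          Trainer, TrainingArguments)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Load the Parquet splits and tokenize the review text.
dataset = load_dataset("parquet", data_files={"train": "aclImdb/train.parquet",
                                              "test": "aclImdb/test.parquet"})
dataset = dataset.map(lambda batch: tokenizer(batch["text"], truncation=True,
                                              padding="max_length", max_length=256),
                      batched=True)

# logging_dir="logs" is what TensorBoard reads in the monitoring step above.
args = TrainingArguments(output_dir="oracle-n-model", logging_dir="logs",
                         num_train_epochs=1, per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"], eval_dataset=dataset["test"])
trainer.train()
trainer.save_model("oracle-n-model")
tokenizer.save_pretrained("oracle-n-tokenizer")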
Evaluating the Model
- Run the Evaluation Script:
  python scripts/evaluate_model.py
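After training, a quick way to sanity-check the saved artifacts is to load them directly with the Transformers library. This is a minimal inference sketch assuming the model and tokenizer were saved to oracle-n-model/ and oracle-n-tokenizer/ as described above, and that label 1 corresponds to positive sentiment.

# Minimal inference sketch using the saved model and tokenizer directories.
import torch
from transformers import BertForSequenceClassification, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("oracle-n-tokenizer")
model = BertForSequenceClassification.from_pretrained("oracle-n-model")
model.eval()

inputs = tokenizer("A surprisingly sharp and funny film.", return_tensors="pt",
                   truncation=True, padding=True)
with torch.no_grad():
    logits = model(**inputs).logits
# Assumes label 1 = positive, label 0 = negative.
print("positive" if logits.argmax(dim=-1).item() == 1 else "negative")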
Customizing the Model
- Modify the Configuration:
  Edit the dataset.py script to change model configuration parameters such as hidden size, number of layers, and attention heads (see the sketch below).
- Add Your Own Tokenizer:
  Customize the tokenizer by editing the contents of the oracle-n-tokenizer/ directory.
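As an illustration of the configuration step above, a customized BertConfig might look like the following. The values shown are placeholders for illustration, not the settings actually used in dataset.py.

# Illustrative only: a smaller custom BERT configuration. The values shown
# are placeholders, not the repository's defaults.
from transformers import BertConfig, BertForSequenceClassification

config = BertConfig(
    hidden_size=512,          # BERT-base default is 768
    num_hidden_layers=8,      # BERT-base default is 12
    num_attention_heads=8,    # must divide hidden_size evenly
    intermediate_size=2048,   # commonly 4 * hidden_size
    num_labels=2,             # positive / negative sentiment
)
model = BertForSequenceClassification(config)

A tokenizer trained or modified elsewhere can likewise be written into the expected location with tokenizer.save_pretrained("oracle-n-tokenizer").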
Contribution
Contributions are welcome! Please fork the repository and submit a pull request.
License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
Acknowledgments
- BERT: This model is based on the BERT architecture developed by Google.
- Hugging Face: Leveraging the Hugging Face Transformers library for model development.
Contact
For any questions or suggestions, please open an issue on GitHub or contact us at hilal@tenzro.com.