MLModelAnalysis is a reusable Python class that streamlines training, evaluation, and prediction for a range of regression models. You can switch between models by changing a single parameter while data preprocessing, evaluation, and prediction stay consistent, which makes it easy to compare models on the same task.
Supported Models
- Linear Regression (linear_regression)
- Decision Tree Regressor (decision_tree)
- Random Forest Regressor (random_forest)
- Support Vector Machine (svm)
- Gradient Boosting Regressor (gradient_boosting)
- K-Nearest Neighbors (knn)
- AdaBoost Regressor (ada_boost)
- Neural Network (MLP Regressor) (mlp)
- XGBoost Regressor (xgboost)
Installation
To use MLModelAnalysis, install the following dependencies:
pip install scikit-learn pandas numpy plotly xgboost
Usage
1. Initializing the Model
Initialize the MLModelAnalysis class by specifying the model_type parameter, which selects the machine learning model you want to use.
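from ml_model_analysis import MLModelAnalysis
# Pick one model type; each call creates a new analyzer for that model
analysis = MLModelAnalysis(model_type='linear_regression')  # Linear Regression
analysis = MLModelAnalysis(model_type='random_forest')      # Random Forest Regressor
analysis = MLModelAnalysis(model_type='xgboost')            # XGBoost Regressor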
2. Training and Evaluating the Model
The train_and_evaluate method handles data preprocessing, model training, and metric evaluation. Optionally, it can save the trained model, scaler, and encoders for later use.
Parameters
- csv_file: Path to the CSV file containing the dataset.
- x_elements: List of feature column names.
- y_element: Name of the target column.
- model_save_path (optional): Path to save the trained model, scaler, and encoders.
Example
csv_file = 'data.csv'
x_elements = ['feature1', 'feature2']
y_element = 'target'
analysis = MLModelAnalysis(model_type='random_forest')
analysis.train_and_evaluate(csv_file=csv_file, x_elements=x_elements, y_element=y_element, model_save_path='random_forest_model.pkl')
After training, train_and_evaluate reports R-squared and Mean Squared Error (MSE) for both the training and test sets. If model_save_path is specified, the model is saved together with its scaler and encoders for future predictions.
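R-squared and MSE follow their standard definitions. The dependency-free sketch below shows how the two reported numbers are computed from true and predicted targets; it is an illustration, not MLModelAnalysis's own code (scikit-learn's `r2_score` and `mean_squared_error` give the same values for non-constant targets), and the sample values are made up.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared residuals."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot: the share of target variance explained by the predictions."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined for a constant target")
    return 1 - ss_res / ss_tot


# Hypothetical test-set targets and model predictions
y_test = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.0, 7.5, 9.0]
print(f"MSE: {mean_squared_error(y_test, y_hat):.4f}")
print(f"R^2: {r_squared(y_test, y_hat):.4f}")
```

A training-set R-squared far above the test-set value is the usual sign of overfitting, which is why both sets are reported.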
3. Loading the Model and Making Predictions
The load_model_and_predict method loads a saved model and makes predictions on new input data.
Parameters
- model_path: Path to the saved model file.
- input_data: Dictionary mapping feature names to values for prediction.
Example
input_data = {
'feature1': 5.1,
'feature2': 2.3
}
prediction = analysis.load_model_and_predict(model_path='random_forest_model.pkl', input_data=input_data)
print(f'Prediction: {prediction}')
4. Visualization
For linear_regression or svm models trained on a single feature, train_and_evaluate automatically generates a Plotly plot of actual vs. predicted values for quick visualization.
Example Use Cases
- Regression Analysis with Random Forest
analysis = MLModelAnalysis(model_type='random_forest')
analysis.train_and_evaluate(csv_file='data.csv', x_elements=['feature1', 'feature2'], y_element='target', model_save_path='random_forest_model.pkl')
- Quick Prediction with a Pre-trained Model
prediction = analysis.load_model_and_predict(model_path='random_forest_model.pkl', input_data={'feature1': 5.1, 'feature2': 2.3})
print(f'Prediction: {prediction}')
- Effortless Model Switching
analysis = MLModelAnalysis(model_type='xgboost')
Additional Notes
- Plotting: Visualizations are supported for linear models and SVM with single-feature datasets.
- Model Saving: The model_save_path parameter in train_and_evaluate stores the model, scaler, and encoders, allowing consistent predictions when the model is reloaded later.
- Dependencies: Ensure the required libraries are installed (scikit-learn, pandas, numpy, plotly, and xgboost).
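Saving the scaler and feature order alongside the model is what makes a later prediction from a plain dictionary reproduce the training-time preprocessing. The sketch below illustrates the idea with hypothetical helpers (`fit_scaler`, `save_bundle`, `load_and_predict`) and a toy linear model stored as plain numbers in JSON; MLModelAnalysis itself saves fitted objects (the examples use a `.pkl` file), so treat this as a conceptual stand-in only.

```python
import json
import tempfile
from pathlib import Path


def fit_scaler(rows):
    """Per-feature mean and population std (constant columns get std 1.0 to avoid division by zero)."""
    cols = list(zip(*rows))
    mean = [sum(c) / len(c) for c in cols]
    std = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0 for c, m in zip(cols, mean)]
    return {"mean": mean, "std": std}


def save_bundle(path, weights, bias, scaler, feature_order):
    # Persist the model parameters together with the preprocessing state and column order.
    bundle = {"weights": weights, "bias": bias, "scaler": scaler, "features": feature_order}
    Path(path).write_text(json.dumps(bundle))


def load_and_predict(path, input_data):
    bundle = json.loads(Path(path).read_text())
    # Reorder the dict to the training column order; a missing feature raises KeyError.
    row = [input_data[name] for name in bundle["features"]]
    sc = bundle["scaler"]
    scaled = [(v - m) / s for v, m, s in zip(row, sc["mean"], sc["std"])]
    return bundle["bias"] + sum(w * v for w, v in zip(bundle["weights"], scaled))


# Hypothetical training rows for ['feature1', 'feature2']
train_rows = [[1.0, 10.0], [3.0, 30.0]]
scaler = fit_scaler(train_rows)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "bundle.json"
    save_bundle(path, weights=[2.0, 0.5], bias=1.0, scaler=scaler,
                feature_order=["feature1", "feature2"])
    # Key order in input_data does not matter; the saved feature list fixes it.
    prediction = load_and_predict(path, {"feature2": 30.0, "feature1": 3.0})

print(f"Prediction: {prediction}")
```

If the scaler were refit on the prediction input instead of reloaded, the same raw values would map to different scaled features and the prediction would silently change.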
License
This project is licensed under the MIT License.
PyTorch Image Classification Library
A comprehensive and flexible image classification library built on PyTorch, designed to simplify the process of training, evaluating, and deploying image classification models.
Features
- Support for multiple neural network architectures:
- Simple CNN
- VGG16
- ResNet50
- MobileNetV2
- InceptionV3
- DenseNet121
- EfficientNet-B0
- Automatic model selection based on dataset size
- Training from directory structure or CSV file
- Fine-tuning options for pre-trained models
- Comprehensive evaluation metrics and visualizations
- Easy-to-use prediction interface
- Model saving and loading capabilities
Installation
Requirements
- Python 3.6+
- PyTorch 1.7+
- torchvision
- pandas
- numpy
- Pillow
- scikit-learn
- matplotlib
Install dependencies
pip install torch torchvision pandas numpy pillow scikit-learn matplotlib
Usage
Training a model
From directory structure
from image_classifier import ImageClassifier
classifier = ImageClassifier()
classifier.train(
train_dir='path/to/train_data',
test_dir='path/to/test_data',
epochs=10,
model_name='resnet50',
finetune=True,
batch_size=32,
learning_rate=0.001,
save_path='my_model.pth'
)
From CSV file
from image_classifier import ImageClassifier
classifier = ImageClassifier()
classifier.train(
csv_file='images.csv',
img_column='image_path',
label_column='label',
epochs=10,
model_name='mobilenet',
finetune=False,
batch_size=32,
learning_rate=0.001,
save_path='my_model.pth'
)
Loading a model and making predictions
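from image_classifier import ImageClassifier
from PIL import Image
# Load a saved model; model_name should name the architecture it was trained with
classifier = ImageClassifier()
classifier.load('my_model.pth', model_name='resnet50')
# Predict from an image path; returns the top_k (class_name, probability) pairs
predictions = classifier.predict('path/to/image.jpg', top_k=3)
for class_name, probability in predictions:
    print(f"{class_name}: {probability:.4f}")
# Alternatively, pass an already opened PIL image
image = Image.open('path/to/image.jpg')
predictions = classifier.predict(image=image, top_k=3)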
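The loop above iterates over (class_name, probability) pairs. A common way to produce such pairs from a classifier's raw output scores is a softmax followed by sorting; the dependency-free sketch below assumes that approach (in PyTorch it would typically be `torch.softmax` plus `torch.topk`) and is not taken from ImageClassifier's source. The function name, logits, and class names are hypothetical.

```python
import math


def top_k_predictions(logits, class_names, k=3):
    """Convert raw scores to probabilities with softmax and return the k best (name, prob) pairs."""
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


logits = [2.0, 0.5, 1.0, -1.0]
names = ["cat", "dog", "bird", "fish"]
for class_name, probability in top_k_predictions(logits, names, k=3):
    print(f"{class_name}: {probability:.4f}")
```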
Command Line Interface
The library also provides a command-line interface:
python -m image_classifier --mode train --train_dir path/to/train_data --test_dir path/to/test_data --model_name resnet50 --epochs 10 --batch_size 32 --model_path my_model.pth
python -m image_classifier --mode train --csv_file images.csv --img_column image_path --label_column label --model_name mobilenet --epochs 10 --model_path my_model.pth
python -m image_classifier --mode predict --model_path my_model.pth --image_path path/to/image.jpg --top_k 3
Class Structure
ImageDataset
The ImageDataset class extends PyTorch's Dataset class to handle loading images from:
- Directory structure with class folders
- CSV files with image paths and class labels
ImageClassifier
The main class that provides the following functionality:
- Model selection and initialization
- Data loading and preprocessing
- Training and evaluation
- Visualization of training metrics
- Model persistence
- Prediction interface
Model Selection
The library can automatically select an appropriate model architecture based on your dataset size:
- Small datasets (<1,000 images): Simple CNN
- Medium datasets (1,000-5,000 images): VGG16
- Large datasets (>5,000 images): ResNet50
You can also explicitly specify any supported model architecture.
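The size-based rule above amounts to a simple threshold function. The sketch below encodes it; the identifiers `simple_cnn` and `vgg16` are hypothetical (only `resnet50` appears in the examples), and treating exactly 1,000 and 5,000 images as "medium" is an assumption about the boundaries.

```python
def select_architecture(num_images):
    """Map dataset size to an architecture name using the thresholds listed above."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    if num_images < 1_000:
        return "simple_cnn"   # small datasets
    if num_images <= 5_000:
        return "vgg16"        # medium datasets (boundaries assumed inclusive)
    return "resnet50"         # large datasets


for n in (500, 1_000, 5_000, 20_000):
    print(n, select_architecture(n))
```

The reasoning behind such a rule is that deeper pre-trained networks need more data to fine-tune without overfitting, while a small CNN trains quickly on a few hundred images.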
Visualizations
During training, the library generates:
- Training and validation accuracy plots
- Training and validation loss plots
- Confusion matrix visualization
These plots are automatically saved as PNG files.
Example Directory Structure
When using directory-based training, organize your data as follows:
train_data/
├── class1/
│ ├── image1.jpg
│ ├── image2.jpg
│ └── ...
├── class2/
│ ├── image1.jpg
│ ├── image2.jpg
│ └── ...
└── ...
Example CSV Format
When using CSV-based training, your CSV file should contain at least two columns:
image_path,label
/path/to/image1.jpg,class1
/path/to/image2.jpg,class2
...
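Both layouts reduce to the same thing: a list of (image path, class index) pairs plus a list of class names. The standard-library sketch below shows one way to build that list from either source; the function names are hypothetical, sorting class names alphabetically is an assumption (it mirrors torchvision's `ImageFolder` convention), and the default column names follow the CSV example above.

```python
import csv
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def samples_from_directory(root):
    """Return (path, class_index) pairs and class names from root/<class>/<image> folders."""
    root = Path(root)
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = [
        (str(img), class_to_idx[cls])
        for cls in classes
        for img in sorted((root / cls).iterdir())
        if img.suffix.lower() in IMAGE_SUFFIXES
    ]
    return samples, classes


def samples_from_csv(csv_path, img_column="image_path", label_column="label"):
    """Same output, read from a CSV with an image-path column and a label column."""
    with open(csv_path, newline="") as fh:
        rows = list(csv.DictReader(fh))
    classes = sorted({row[label_column] for row in rows})
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = [(row[img_column], class_to_idx[row[label_column]]) for row in rows]
    return samples, classes


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    # Build a tiny train_data/ tree matching the directory example
    for cls in ("class1", "class2"):
        (tmp / "train_data" / cls).mkdir(parents=True)
        (tmp / "train_data" / cls / "image1.jpg").touch()
    dir_samples, dir_classes = samples_from_directory(tmp / "train_data")

    # Write the CSV example to disk and read it back
    csv_file = tmp / "images.csv"
    csv_file.write_text("image_path,label\n/path/to/image1.jpg,class1\n/path/to/image2.jpg,class2\n")
    csv_samples, csv_classes = samples_from_csv(csv_file)

print(dir_classes, [idx for _, idx in dir_samples])
print(csv_samples)
```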
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
#### CTScanProcessor ####
CTScanProcessor is a Python class for processing CT scan images and evaluating their quality. It is aimed at medical imaging, data science, and deep learning workflows and provides noise reduction, contrast enhancement, detail preservation, and quality evaluation.
Features
- Sharpening: Enhances image details by applying a sharpening filter.
- Median Denoising: Reduces noise while preserving edges using a median filter.
- Contrast Enhancement: Enhances contrast using CLAHE (Contrast Limited Adaptive Histogram Equalization).
- Quality Metrics: Calculates image quality metrics such as MSE, PSNR, SNR, and Detail Preservation Ratio to evaluate the effectiveness of processing.
- Image Comparison: Creates side-by-side comparisons of original and processed images.
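To illustrate why a median filter removes isolated noise while keeping edges sharp, here is a pure-Python 3x3 version operating on a nested list of pixel values. OpenCV's `cv2.medianBlur` performs the same operation on NumPy arrays, but whether CTScanProcessor calls it (and with which `kernel_size`) is an assumption; the edge-replicating border handling here is a simplification chosen for the sketch.

```python
def median_filter_3x3(image):
    """Replace each pixel with the median of its 3x3 neighbourhood (borders replicated)."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(
                image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            )
            out[y][x] = window[4]  # middle of the 9 sorted values
    return out


noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],  # an isolated "salt" pixel
    [10, 10, 10, 10],
]
step = [[0, 0, 100, 100]] * 3  # a sharp vertical edge

print(median_filter_3x3(noisy))  # the 255 outlier is replaced by 10
print(median_filter_3x3(step))   # the edge survives unchanged
```

A mean (box) filter would instead spread the 255 outlier into its neighbours and blur the step, which is why the median is preferred for edge-preserving denoising.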
Installation
This class requires OpenCV, NumPy, and SciPy. To install them, use:
pip install opencv-python-headless numpy scipy
Usage
- Initialize the Processor
from ct_scan_processor import CTScanProcessor
processor = CTScanProcessor(kernel_size=5, clip_limit=2.0, tile_grid_size=(8, 8))
- Process a CT Scan
Use the process_ct_scan method to process a CT scan image and get quality metrics.
denoised, metrics = processor.process_ct_scan("path_to_ct_scan.jpg", "output_folder", compare=True)
- Quality Metrics
After processing, the class returns metrics such as Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Signal-to-Noise Ratio (SNR), and Detail Preservation Ratio.
- Compare Images
If compare=True, a side-by-side comparison image is saved in the specified comparison folder.
Example
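from ct_scan_processor import CTScanProcessor
if __name__ == "__main__":
    # Default parameters; compare=True also saves a side-by-side comparison image
    processor = CTScanProcessor()
    denoised, metrics = processor.process_ct_scan("path_to_ct_scan.jpg", "output_folder", compare=True)
    print(metrics)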
Quality Metrics
The following metrics are calculated to evaluate the quality of the denoised image:
- MSE: Mean Squared Error between the original and processed images.
- PSNR: Peak Signal-to-Noise Ratio to measure image quality.
- SNR: Signal-to-Noise Ratio to measure signal strength relative to noise.
- Detail Preservation: Percentage of preserved details after processing.
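MSE and PSNR have standard definitions; the sketch below computes them, plus one common SNR convention (power of the original over power of the difference, in dB), on flattened lists of 8-bit pixel values. The exact SNR formula CTScanProcessor uses is an assumption, and Detail Preservation is not reproduced because its definition is not given here.

```python
import math


def mse(original, processed):
    """Mean of squared pixel differences."""
    return sum((a - b) ** 2 for a, b in zip(original, processed)) / len(original)


def psnr(original, processed, max_value=255.0):
    """PSNR in dB: 10 * log10(MAX^2 / MSE); infinite when the images are identical."""
    err = mse(original, processed)
    return math.inf if err == 0 else 10 * math.log10(max_value ** 2 / err)


def snr(original, processed):
    """SNR in dB: power of the original signal over power of the difference (assumed convention)."""
    noise = sum((a - b) ** 2 for a, b in zip(original, processed))
    signal = sum(a ** 2 for a in original)
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


# Hypothetical flattened pixel values before and after processing
original = [100, 120, 140, 160]
processed = [101, 119, 141, 159]
print(f"MSE:  {mse(original, processed):.2f}")
print(f"PSNR: {psnr(original, processed):.2f} dB")
print(f"SNR:  {snr(original, processed):.2f} dB")
```

Note that all three measure similarity to the original, so aggressive denoising that removes real noise can still lower PSNR; the metrics are best read alongside the visual comparison.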
Methods
- sharpen(image): Sharpens the input image.
- median_denoise(image): Denoises the input image using a median filter.
- enhance_contrast(image): Enhances contrast using CLAHE.
- enhanced_denoise(image_path): Processes a CT scan image with denoising, contrast enhancement, and sharpening.
- evaluate_quality(original, denoised): Computes MSE, PSNR, SNR, and Detail Preservation.
- compare_images(original, processed, output_path): Saves a side-by-side comparison of the original and processed images.
- process_ct_scan(input_path, output_folder, comparison_folder="comparison", compare=False): Runs the full CT scan processing pipeline and saves the results.
License
This project is licensed under the MIT License.
Contributions
Contributions are welcome! Feel free to submit pull requests or open issues.
AudioRecognition
The AudioRecognition class in gurulearn provides tools for audio data augmentation, feature extraction, model training, and prediction, making it suitable for tasks like audio classification and speech recognition.
Key Features
- Data Augmentation: Supports time-stretching, pitch-shifting, and noise addition for audio data augmentation.
- Feature Extraction: Extracts MFCCs, chroma, and spectral contrast features from audio signals.
- Model Training: Trains a deep learning model for audio classification using a Conv1D and BiLSTM-based architecture.
- Prediction: Predicts the class of a given audio file based on a trained model.
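Of the three augmentations, noise addition is the simplest to show without extra libraries. The sketch below adds white Gaussian noise at a chosen signal-to-noise ratio; the function name and SNR parameterization are assumptions, not AudioRecognition's API. Time-stretching and pitch-shifting are usually delegated to a library such as librosa (`librosa.effects.time_stretch`, `librosa.effects.pitch_shift`), though which tools gurulearn uses internally is not stated here.

```python
import math
import random


def add_noise(signal, snr_db, seed=None):
    """Add white Gaussian noise so that signal power / noise power equals snr_db (in dB)."""
    rng = random.Random(seed)
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]


# One second of a 440 Hz tone sampled at 8 kHz (hypothetical test signal)
sr = 8000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
noisy = add_noise(tone, snr_db=20, seed=0)

measured = 10 * math.log10(
    sum(t * t for t in tone) / sum((n - t) ** 2 for n, t in zip(noisy, tone))
)
print(f"Measured SNR: {measured:.2f} dB")
```

Scaling noise relative to signal power keeps quiet and loud recordings equally perturbed, which a fixed noise amplitude would not.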
Usage
To use the AudioRecognition class, follow these steps:
1. Importing and Initializing
from gurulearn import AudioRecognition
audio_recognition = AudioRecognition()
2. Loading Data with Augmentation
The load_data_with_augmentation method loads audio data from a specified directory and performs augmentation to improve model generalization.
data_dir = "path/to/audio/data"
X, y = audio_recognition.load_data_with_augmentation(data_dir)
This method returns feature vectors (X) and labels (y) for training.
3. Training the Model
The audiotrain method trains an audio classification model. It also generates a confusion matrix and a training history plot, which are saved in the specified model directory.
audio_recognition.audiotrain(
data_path="path/to/audio/data",
epochs=50,
batch_size=32,
test_size=0.2,
learning_rate=0.001,
model_dir='model_folder'
)
Parameters:
- data_path: Directory path where audio data is stored (organized by class label).
- epochs: Number of training epochs (default: 50).
- batch_size: Training batch size (default: 32).
- test_size: Proportion of data to use for testing (default: 0.2).
- learning_rate: Initial learning rate for model training (default: 0.001).
- model_dir: Directory where the model and label mappings will be saved.
4. Predicting the Class of an Audio File
After training, you can predict the class of a new audio file using the predict or predict_class methods.
input_wav = "path/to/audio/file.wav"
predicted_label = audio_recognition.predict(input_wav)
print(f"Predicted Label: {predicted_label}")
The predict method returns the predicted label (text), while predict_class returns the numeric class index.
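The two methods differ only in the last step: predict_class stops at the index of the highest-scoring class, and predict looks that index up in the label mapping saved as label_mapping.json. The sketch below assumes the mapping is a JSON object keyed by class index (JSON object keys are always strings); the helper name, label names, and probabilities are hypothetical.

```python
import json


def index_to_label(probabilities, label_mapping):
    """Pick the highest-probability class index and look up its label in the mapping."""
    class_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    # JSON object keys are strings, so convert the index before the lookup.
    return class_index, label_mapping[str(class_index)]


# Hypothetical contents of label_mapping.json after training on three classes
mapping_json = '{"0": "dog_bark", "1": "siren", "2": "speech"}'
label_mapping = json.loads(mapping_json)

probs = [0.10, 0.75, 0.15]  # stand-in for one row of model output
idx, label = index_to_label(probs, label_mapping)
print(idx, label)  # 1 siren
```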
Example Workflow
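from gurulearn import AudioRecognition
audio_recognition = AudioRecognition()
# Optional: load augmented features (X) and labels (y) directly, e.g. for inspection
X, y = audio_recognition.load_data_with_augmentation('data/audio_files')
# audiotrain reads the data from data_path itself; model_dir is omitted, so its default is used
audio_recognition.audiotrain(
    data_path='data/audio_files',
    epochs=30,
    batch_size=32,
    learning_rate=0.001
)
predicted_label = audio_recognition.predict('data/test_audio.wav')
print("Predicted Label:", predicted_label)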
Files Created
- Confusion Matrix (confusion_matrix.png): saved in the current directory after training.
- Training History (training_history.png): contains plots of model accuracy and loss.
- Model (audio_recognition_model.h5): saved in the specified model directory.
- Label Mapping (label_mapping.json): contains mappings of class indices to labels.
Introducing FlowBot
FlowBot is a flexible framework for creating dynamic, guided interactions (chatbots, booking systems, surveys) that adapt to user input and filter datasets in real time. Perfect for travel booking, customer support, or personalized recommendations!
Installation
pip install gurulearn
Quick Start
Build a Travel Booking Bot in 5 Steps:
import pandas as pd
from gurulearn import FlowBot
hotels = pd.DataFrame({
'destination': ['Paris', 'Tokyo', 'New York'],
'price_range': ['$$$', '$$', '$'],
'hotel_name': ['Luxury Palace', 'Mountain View', 'Downtown Inn']
})
bot = FlowBot(hotels)
bot.add_personal_info("email", "Please enter your email:")
bot.add("destination", "Where would you like to go?", required=True)
bot.add("price_range", "Choose your budget:", required=False)
bot.finish("hotel_name", "price_range")
response = bot.process("user123", "")
print(response['message'])
print(response['suggestions'])
response = bot.process("user123", "Paris")
print(response['message'])
print(response['suggestions'])
Key Features
1. Dynamic Suggestions
Auto-filter valid options based on prior choices:
bot.add("activity", "Choose an activity:", required=True)
2. Personalized Data Collection
bot.add_personal_info("phone", "Your phone number:", required=True)
3. Session Management
Resume progress or reset conversations:
bot.reset_session("user123")
4. Save Results
User data and chat history auto-saved to JSON:
user_data/user123.json
{
"personal_info": {"email": "user@example.com"},
"chat_history": [...]
}
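Dynamic suggestions boil down to filtering the dataset by every answer given so far and offering the distinct values of the next field. The sketch below shows that idea on a list of dicts; it is not FlowBot's implementation, the function name is hypothetical, case-insensitive matching is an assumption, and the fourth hotel row is invented to make the filtering visible.

```python
def suggestions(rows, answers, next_field):
    """Return the distinct values of next_field among rows matching all answers so far."""
    matching = [
        row for row in rows
        if all(str(row[field]).lower() == str(value).lower() for field, value in answers.items())
    ]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(row[next_field] for row in matching))


hotels = [
    {"destination": "Paris", "price_range": "$$$", "hotel_name": "Luxury Palace"},
    {"destination": "Tokyo", "price_range": "$$", "hotel_name": "Mountain View"},
    {"destination": "New York", "price_range": "$", "hotel_name": "Downtown Inn"},
    {"destination": "Paris", "price_range": "$", "hotel_name": "Left Bank Hostel"},  # hypothetical row
]

print(suggestions(hotels, {}, "destination"))                         # first question
print(suggestions(hotels, {"destination": "paris"}, "price_range"))   # after answering "Paris"
```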
Detailed Usage
Initialize FlowBot
bot = FlowBot(
data=df,
)
Add Workflow Steps
bot.add(
field="room_type",
prompt="Select room type:",
required=True
)
Get Final Results
results = response['results']
🔧 Dependencies
📜 License
MIT License
Get Help
Found a bug? Open an issue.
Happy Building!
Tag your projects with #gurulearn to share them with the community!
GuruLearn
QAAgent: Create intelligent QA systems with RAG
GuruLearn's QAAgent provides a simple yet powerful way to create domain-specific question-answering agents using Retrieval Augmented Generation (RAG).
Installation
pip install gurulearn
Dependencies
GuruLearn requires the following dependencies:
- langchain-ollama
- langchain-core
- FAISS
- pandas
- langchain-community
Install them using:
Quick Start
from gurulearn import QAAgent
import pandas as pd
df = pd.read_csv("customer_support_tickets.csv")
support_agent = QAAgent(
data=df,
page_content_fields=["Title", "Description"],
metadata_fields=["Category", "Priority"],
system_prompt="You are a helpful customer support agent."
)
answer = support_agent.query("How do I reset my password?")
print(answer)
support_agent.interactive_mode()
Features
- Simple Setup: Create powerful RAG-based QA systems with minimal code
- Flexible Data Support: Works with pandas DataFrames or lists of dictionaries
- Custom Prompting: Define system prompts and instructions to shape agent responses
- Vector Database Integration: Automatically creates and manages embeddings for efficient retrieval
- Interactive Mode: Built-in console interface for quick experimentation
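The retrieval half of RAG embeds the question, finds the k most similar stored documents, and passes them to the LLM as context. QAAgent does this with an Ollama embedding model (default `mxbai-embed-large`) and a vector store; the sketch below swaps in toy word-count vectors and cosine similarity purely to show the ranking step. All names and documents are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts. A real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=5):
    """Return the k documents most similar to the question."""
    q = embed(question)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


docs = [
    "Password reset: use the Forgot password link on the login page.",
    "Billing: invoices are emailed on the first of each month.",
    "Shipping: orders ship within two business days.",
]
context = retrieve("How do I reset my password?", docs, k=1)
print(context)
```

Real embeddings capture meaning rather than shared words, so "I can't log in" would also match the password document; the k parameter plays the same role as QAAgent's `k`.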
API Reference
QAAgent
QAAgent(
data,
page_content_fields,
metadata_fields=None,
llm_model="llama3.2",
k=5,
embedding_model="mxbai-embed-large",
db_location="./langchain_db",
collection_name="documents",
prompt_template=None,
system_prompt="You are an expert in answering questions about the provided information."
)
Methods
- query(question): Query the agent with a question and get a response
- interactive_mode(): Start an interactive console session for querying the agent
Examples
Restaurant Review QA System
from gurulearn import QAAgent
import pandas as pd
df = pd.read_csv("restaurant_reviews.csv")
restaurant_agent = QAAgent(
data=df,
page_content_fields=["Title", "Review"],
metadata_fields=["Rating", "Date"],
llm_model="llama3.2",
k=5,
db_location="./restaurant_db",
collection_name="restaurant_reviews",
system_prompt="You are an expert in answering questions about a pizza restaurant."
)
result = restaurant_agent.query("What do customers say about the pepperoni pizza?")
print(result)
HR Policy Assistant
from gurulearn import QAAgent
hr_documents = [
{"Policy": "Parental Leave", "Description": "Employees are entitled to 12 weeks of paid parental leave...", "Department": "HR", "LastUpdated": "2023-09-01"},
{"Policy": "Remote Work", "Description": "Employees may work remotely up to 3 days per week...", "Department": "HR", "LastUpdated": "2023-10-15"},
]
hr_agent = QAAgent(
data=hr_documents,
page_content_fields=["Policy", "Description"],
metadata_fields=["Department", "LastUpdated"],
db_location="./_db",
collection_name="hr_policies",
system_prompt="You are an HR assistant providing information about company policies."
)
hr_agent.interactive_mode()
Advanced Usage
Custom Prompt Templates
You can define custom prompt templates to control how the agent processes and responds to queries:
custom_template = """
You are a technical support specialist for computer hardware.
CONTEXT:
{reviews}
USER QUESTION:
{question}
Please provide a concise answer focusing only on the information found in the context.
If the information isn't in the context, admit you don't know.
"""
support_agent = QAAgent(
data=support_df,
page_content_fields=["Issue", "Resolution"],
prompt_template=custom_template
)
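The template's `{reviews}` and `{question}` placeholders suggest ordinary Python-style substitution: the retrieved context fills `{reviews}` and the user's question fills `{question}`. The sketch below shows that with `str.format`, building each document's text from the `page_content_fields`; the `field: value` joining format and the helper name are assumptions, not QAAgent's documented behavior, and the records are invented.

```python
def build_page_content(record, page_content_fields):
    """Join the selected fields into the text that gets embedded and retrieved."""
    return "\n".join(f"{field}: {record[field]}" for field in page_content_fields)


TEMPLATE = (
    "You are a technical support specialist for computer hardware.\n"
    "CONTEXT:\n{reviews}\n"
    "USER QUESTION:\n{question}\n"
)

# Hypothetical retrieved records with the fields used in the example above
records = [
    {"Issue": "Fan is loud", "Resolution": "Clean dust from the heatsink."},
    {"Issue": "No display", "Resolution": "Reseat the GPU."},
]
context = "\n\n".join(build_page_content(r, ["Issue", "Resolution"]) for r in records)
prompt = TEMPLATE.format(reviews=context, question="Why is my fan loud?")
print(prompt)
```

With this style of substitution, any literal braces in a custom template must be doubled (`{{` and `}}`), otherwise formatting fails.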
Using with Different Models
QAAgent works with any Ollama model:
medical_agent = QAAgent(
data=medical_data,
page_content_fields=["text"],
llm_model="nous-hermes2:Q5_K_M",
embedding_model="nomic-embed-text"
)
License
MIT