gurulearn

GuruLearn is a comprehensive Python library that brings machine learning, computer vision, audio processing, and conversational AI together in one package. Through six specialized modules (MLModelAnalysis, ImageClassifier, CTScanProcessor, AudioRecognition, FlowBot, and QAAgent), it lets developers build AI solutions with minimal setup, shortening the path from prototype to production across multiple domains.

MLModelAnalysis

MLModelAnalysis is a versatile, reusable Python class that streamlines training, evaluation, and prediction for a range of machine learning regression models. It lets users switch seamlessly between models, apply consistent data preprocessing, evaluate results, and make predictions, making it adaptable to many regression tasks.

Supported Models

  • Linear Regression (linear_regression)
  • Decision Tree Regressor (decision_tree)
  • Random Forest Regressor (random_forest)
  • Support Vector Machine (svm)
  • Gradient Boosting Regressor (gradient_boosting)
  • K-Nearest Neighbors (knn)
  • AdaBoost Regressor (ada_boost)
  • Neural Network (MLP Regressor) (mlp)
  • XGBoost Regressor (xgboost)

Installation

To use MLModelAnalysis, install the following dependencies:

pip install scikit-learn pandas numpy plotly xgboost

Usage

1. Initializing the Model

Initialize the MLModelAnalysis class by specifying the model_type parameter, which sets the machine learning model you wish to use.

from ml_model_analysis import MLModelAnalysis

# Initialize with Linear Regression
analysis = MLModelAnalysis(model_type='linear_regression')

# Initialize with Random Forest
analysis = MLModelAnalysis(model_type='random_forest')

# Initialize with XGBoost
analysis = MLModelAnalysis(model_type='xgboost')

2. Training and Evaluating the Model

The train_and_evaluate method handles data preprocessing, model training, and metric evaluation. Optionally, it can save the trained model, scaler, and encoders for later use.

Parameters

  • csv_file: Path to the CSV file containing the dataset.
  • x_elements: List of feature columns.
  • y_element: Name of the target column.
  • model_save_path (Optional): Path to save the trained model, scaler, and encoders.

Example

# Set the parameters
csv_file = 'data.csv'                     # Path to the data file
x_elements = ['feature1', 'feature2']      # Feature columns
y_element = 'target'                       # Target column

# Initialize the model
analysis = MLModelAnalysis(model_type='random_forest')

# Train and evaluate the model
analysis.train_and_evaluate(csv_file=csv_file, x_elements=x_elements, y_element=y_element, model_save_path='random_forest_model.pkl')

After running this code, the model displays R-squared and Mean Squared Error (MSE) metrics for both the training and test sets. If model_save_path is specified, the model will be saved for future predictions.

3. Loading the Model and Making Predictions

The load_model_and_predict method allows you to load a saved model and make predictions on new input data.

Parameters

  • model_path: Path to the saved model file.
  • input_data: Dictionary containing feature names and values for prediction.

Example

# Define input data for prediction
input_data = {
    'feature1': 5.1,
    'feature2': 2.3
}

# Load the model and make a prediction
prediction = analysis.load_model_and_predict(model_path='random_forest_model.pkl', input_data=input_data)
print(f'Prediction: {prediction}')

4. Visualization

For linear_regression or svm models with only one feature, the train_and_evaluate method will automatically generate a Plotly plot of actual vs. predicted values for quick visualization.
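
For example, a single-feature linear model will produce the plot automatically (a minimal sketch, reusing the data file from the examples above):

# One feature plus model_type='linear_regression' triggers the
# actual-vs-predicted Plotly figure during train_and_evaluate
analysis = MLModelAnalysis(model_type='linear_regression')
analysis.train_and_evaluate(csv_file='data.csv', x_elements=['feature1'], y_element='target')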

Example Use Cases

  • Regression Analysis with Random Forest

    analysis = MLModelAnalysis(model_type='random_forest')
    analysis.train_and_evaluate(csv_file='data.csv', x_elements=['feature1', 'feature2'], y_element='target', model_save_path='random_forest_model.pkl')
    
  • Quick Prediction with a Pre-trained Model

    prediction = analysis.load_model_and_predict(model_path='random_forest_model.pkl', input_data={'feature1': 5.1, 'feature2': 2.3})
    print(f'Prediction: {prediction}')
    
  • Effortless Model Switching

    # Specify a new model type to use a different algorithm
    analysis = MLModelAnalysis(model_type='xgboost')
    

Additional Notes

  • Plotting: Visualizations are supported for linear models and SVM with single-feature datasets.
  • Model Saving: The model_save_path parameter in train_and_evaluate stores the model, scaler, and encoders, allowing consistent predictions when reloading the model later.
  • Dependencies: Ensure required libraries are installed (scikit-learn, pandas, numpy, plotly, and xgboost).

License

This project is licensed under the MIT License.

PyTorch Image Classification Library

A comprehensive and flexible image classification library built on PyTorch, designed to simplify the process of training, evaluating, and deploying image classification models.

Features

  • Support for multiple neural network architectures:
    • Simple CNN
    • VGG16
    • ResNet50
    • MobileNetV2
    • InceptionV3
    • DenseNet121
    • EfficientNet-B0
  • Automatic model selection based on dataset size
  • Training from directory structure or CSV file
  • Fine-tuning options for pre-trained models
  • Comprehensive evaluation metrics and visualizations
  • Easy-to-use prediction interface
  • Model saving and loading capabilities

Installation

Requirements

  • Python 3.6+
  • PyTorch 1.7+
  • torchvision
  • pandas
  • numpy
  • Pillow
  • scikit-learn
  • matplotlib

Install dependencies

pip install torch torchvision pandas numpy pillow scikit-learn matplotlib

Usage

Training a model

From directory structure

from image_classifier import ImageClassifier

# Initialize classifier
classifier = ImageClassifier()

# Train from directory
classifier.train(
    train_dir='path/to/train_data',  # Directory with class subfolders
    test_dir='path/to/test_data',    # Optional test directory
    epochs=10,
    model_name='resnet50',          # Model architecture to use
    finetune=True,                  # Whether to finetune all layers
    batch_size=32,
    learning_rate=0.001,
    save_path='my_model.pth'
)

From CSV file

from image_classifier import ImageClassifier

# Initialize classifier
classifier = ImageClassifier()

# Train from CSV file
classifier.train(
    csv_file='images.csv',          # CSV with image paths and labels
    img_column='image_path',        # Column with image paths
    label_column='label',           # Column with class labels
    epochs=10,
    model_name='mobilenet',
    finetune=False,
    batch_size=32,
    learning_rate=0.001,
    save_path='my_model.pth'
)

Loading a model and making predictions

from image_classifier import ImageClassifier
from PIL import Image

# Initialize classifier
classifier = ImageClassifier()

# Load model
classifier.load('my_model.pth', model_name='resnet50')

# Predict from image path
predictions = classifier.predict('path/to/image.jpg', top_k=3)
for class_name, probability in predictions:
    print(f"{class_name}: {probability:.4f}")

# Or predict from PIL Image
image = Image.open('path/to/image.jpg')
predictions = classifier.predict(image=image, top_k=3)

Command Line Interface

The library also provides a command-line interface:

# Train a model
python -m image_classifier --mode train --train_dir path/to/train_data --test_dir path/to/test_data --model_name resnet50 --epochs 10 --batch_size 32 --model_path my_model.pth

# Train from CSV
python -m image_classifier --mode train --csv_file images.csv --img_column image_path --label_column label --model_name mobilenet --epochs 10 --model_path my_model.pth

# Make predictions
python -m image_classifier --mode predict --model_path my_model.pth --image_path path/to/image.jpg --top_k 3

Class Structure

ImageDataset

The ImageDataset class extends PyTorch's Dataset class to handle loading images from:

  • Directory structure with class folders
  • CSV files with image paths and class labels

ImageClassifier

The main class that provides the following functionality:

  • Model selection and initialization
  • Data loading and preprocessing
  • Training and evaluation
  • Visualization of training metrics
  • Model persistence
  • Prediction interface

Model Selection

The library can automatically select an appropriate model architecture based on your dataset size:

  • Small datasets (<1,000 images): Simple CNN
  • Medium datasets (1,000-5,000 images): VGG16
  • Large datasets (>5,000 images): ResNet50

You can also explicitly specify any supported model architecture.
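
A minimal sketch of relying on automatic selection, assuming train() falls back to the size-based choice when model_name is omitted:

from image_classifier import ImageClassifier

classifier = ImageClassifier()
# With no model_name, the architecture is picked from the dataset size
# (Simple CNN / VGG16 / ResNet50 per the thresholds above)
classifier.train(
    train_dir='path/to/train_data',
    epochs=10,
    save_path='auto_model.pth'
)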

Visualizations

During training, the library generates:

  • Training and validation accuracy plots
  • Training and validation loss plots
  • Confusion matrix visualization

These plots are automatically saved as PNG files.

Example Directory Structure

When using directory-based training, organize your data as follows:

train_data/
├── class1/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
├── class2/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
└── ...

Example CSV Format

When using CSV-based training, your CSV file should contain at least two columns:

image_path,label
/path/to/image1.jpg,class1
/path/to/image2.jpg,class2
...
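
You can generate such a file with pandas (a small sketch; the paths and labels are placeholders to replace with your own):

import pandas as pd

# Placeholder rows; point image_path at real files on disk
pd.DataFrame({
    'image_path': ['/path/to/image1.jpg', '/path/to/image2.jpg'],
    'label': ['class1', 'class2'],
}).to_csv('images.csv', index=False)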

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

CTScanProcessor

CTScanProcessor is a Python class designed for advanced processing and quality evaluation of CT scan images. This tool is highly beneficial for applications in medical imaging, data science, and deep learning, providing noise reduction, contrast enhancement, detail preservation, and quality evaluation.

Features

  • Sharpening: Enhances image details by applying a sharpening filter.
  • Median Denoising: Reduces noise while preserving edges using a median filter.
  • Contrast Enhancement: Enhances contrast using CLAHE (Contrast Limited Adaptive Histogram Equalization).
  • Quality Metrics: Calculates image quality metrics such as MSE, PSNR, SNR, and Detail Preservation Ratio to evaluate the effectiveness of processing.
  • Image Comparison: Creates side-by-side comparisons of original and processed images.

Installation

This class requires the following libraries:

  • OpenCV
  • NumPy
  • SciPy

To install the required dependencies, use:

pip install opencv-python-headless numpy scipy

Usage

  • Initialize the Processor

    from ct_scan_processor import CTScanProcessor
    processor = CTScanProcessor(kernel_size=5, clip_limit=2.0, tile_grid_size=(8, 8))
    
  • Process a CT Scan: Use the process_ct_scan method to process a CT scan image and get quality metrics.

    denoised, metrics = processor.process_ct_scan("path_to_ct_scan.jpg", "output_folder", compare=True)
    
  • Quality Metrics: After processing, the class returns metrics such as Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Signal-to-Noise Ratio (SNR), and Detail Preservation Ratio.

  • Compare Images: If compare=True, a side-by-side comparison image is saved in the specified comparison folder.

Example

if __name__ == "__main__":
    processor = CTScanProcessor()
    denoised, metrics = processor.process_ct_scan("path_to_ct_scan.jpg", "output_folder", compare=True)

Quality Metrics

The following metrics are calculated to evaluate the quality of the denoised image:

  • MSE: Mean Squared Error between the original and processed images.
  • PSNR: Peak Signal-to-Noise Ratio to measure image quality.
  • SNR: Signal-to-Noise Ratio to measure signal strength relative to noise.
  • Detail Preservation: Percentage of preserved details after processing.

Methods

  • sharpen(image): Sharpens the input image.
  • median_denoise(image): Denoises the input image using a median filter.
  • enhance_contrast(image): Enhances contrast using CLAHE.
  • enhanced_denoise(image_path): Processes a CT scan image with denoising, contrast enhancement, and sharpening.
  • evaluate_quality(original, denoised): Computes MSE, PSNR, SNR, and Detail Preservation.
  • compare_images(original, processed, output_path): Saves a side-by-side comparison of the original and processed images.
  • process_ct_scan(input_path, output_folder, comparison_folder="comparison", compare=False): Runs the full CT scan processing pipeline and saves the results.
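
The pipeline can also be composed step by step. A sketch, assuming each method accepts and returns a NumPy image array, as the OpenCV-based implementation suggests:

import cv2
from ct_scan_processor import CTScanProcessor

processor = CTScanProcessor()
# Load the scan as grayscale; the median filter and CLAHE operate on single-channel images
image = cv2.imread("path_to_ct_scan.jpg", cv2.IMREAD_GRAYSCALE)

# Same order enhanced_denoise uses: denoise, enhance contrast, sharpen
denoised = processor.median_denoise(image)
enhanced = processor.enhance_contrast(denoised)
sharpened = processor.sharpen(enhanced)

print(processor.evaluate_quality(image, sharpened))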

License

This project is licensed under the MIT License.

Contributions

Contributions are welcome! Feel free to submit pull requests or open issues.

AudioRecognition

The AudioRecognition class in gurulearn provides tools for audio data augmentation, feature extraction, model training, and prediction, making it suitable for tasks like audio classification and speech recognition.

Key Features

  • Data Augmentation: Supports time-stretching, pitch-shifting, and noise addition for audio data augmentation.
  • Feature Extraction: Extracts MFCC and spectral contrast features from audio signals.
  • Model Training: Trains a deep learning model for audio classification using a Conv1D and BiLSTM-based architecture.
  • Prediction: Predicts the class of a given audio file based on a trained model.

Usage

To use the AudioRecognition class, follow these steps:

1. Importing and Initializing

from gurulearn import AudioRecognition

# Initialize the audio recognition class
audio_recognition = AudioRecognition()

2. Loading Data with Augmentation

The load_data_with_augmentation method loads audio data from a specified directory and performs augmentation to improve model generalization.

data_dir = "path/to/audio/data"
X, y = audio_recognition.load_data_with_augmentation(data_dir)

This method returns feature vectors (X) and labels (y) for training.

3. Training the Model

The audiotrain method trains an audio classification model. This method also generates a confusion matrix and training history plot, which are saved in the specified model directory.

audio_recognition.audiotrain(
    data_path="path/to/audio/data",
    epochs=50,
    batch_size=32,
    test_size=0.2,
    learning_rate=0.001,
    model_dir='model_folder'
)

Parameters:

  • data_path: Directory path where audio data is stored (organized by class label).
  • epochs: Number of training epochs (default: 50).
  • batch_size: Training batch size (default: 32).
  • test_size: Proportion of data to use for testing (default: 0.2).
  • learning_rate: Initial learning rate for model training (default: 0.001).
  • model_dir: Directory where the model and label mappings will be saved.

4. Predicting the Class of an Audio File

After training, you can predict the class of a new audio file using the predict or predict_class methods.

# Path to the input audio file
input_wav = "path/to/audio/file.wav"

# Predict the label of the audio file
predicted_label = audio_recognition.predict(input_wav)
print(f"Predicted Label: {predicted_label}")

The predict method returns the predicted label (text), while predict_class returns the numeric class index.
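
For example (a sketch; the returned values shown are illustrative):

label = audio_recognition.predict("data/test_audio.wav")        # text label, e.g. "dog_bark"
index = audio_recognition.predict_class("data/test_audio.wav")  # numeric index, e.g. 3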

Example Workflow

# Initialize the audio recognition instance
audio_recognition = AudioRecognition()

# Load data and perform augmentation
X, y = audio_recognition.load_data_with_augmentation('data/audio_files')

# Train the model on the audio dataset
audio_recognition.audiotrain(
    data_path='data/audio_files',
    epochs=30,
    batch_size=32,
    learning_rate=0.001
)

# Predict the class of a new audio sample
predicted_label = audio_recognition.predict('data/test_audio.wav')
print("Predicted Label:", predicted_label)

Files Created

  • Confusion Matrix: confusion_matrix.png - Saved in the current directory after training.
  • Training History: training_history.png - Contains plots for model accuracy and loss.
  • Model: audio_recognition_model.h5 - Saved in the specified model directory.
  • Label Mapping: label_mapping.json - Contains mappings of class indices to labels.
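
A sketch of mapping a predict_class result back to its label via the saved file, assuming label_mapping.json stores class indices as string keys:

import json

with open('model_folder/label_mapping.json') as f:
    label_mapping = json.load(f)  # e.g. {"0": "dog_bark", "1": "siren"}

index = audio_recognition.predict_class('data/test_audio.wav')
print(label_mapping[str(index)])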

Introducing FlowBot

FlowBot is a flexible framework for creating dynamic, guided interactions (chatbots, booking systems, surveys) that adapt to user input and filter datasets in real time. Perfect for travel booking, customer support, or personalized recommendations!

Installation

pip install gurulearn

Quick Start

Build a Travel Booking Bot in 5 Steps:

import pandas as pd
from gurulearn import FlowBot

# Sample dataset
hotels = pd.DataFrame({
    'destination': ['Paris', 'Tokyo', 'New York'],
    'price_range': ['$$$', '$$', '$'],
    'hotel_name': ['Luxury Palace', 'Mountain View', 'Downtown Inn']
})

# Initialize FlowBot
bot = FlowBot(hotels)

# Collect user email first
bot.add_personal_info("email", "Please enter your email:")

# Define workflow
bot.add("destination", "Where would you like to go?", required=True)
bot.add("price_range", "Choose your budget:", required=False)
bot.finish("hotel_name", "price_range")  # Final output columns

# Simulate user interaction
response = bot.process("user123", "")  # Start flow!
print(response['message'])  # "Where would you like to go?"
print(response['suggestions'])  # ["Paris", "Tokyo", "New York"]

# User selects 'Paris'
response = bot.process("user123", "Paris")
print(response['message'])  # "Choose your budget:"
print(response['suggestions'])  # ["$$$"] (only budgets available in Paris)
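
Answering the final prompt completes the flow and returns the filtered rows (a sketch; see Get Final Results below):

# User selects '$$$', the last step in this workflow
response = bot.process("user123", "$$$")
print(response['results'])  # e.g. [{'hotel_name': 'Luxury Palace', 'price_range': '$$$'}]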

Key Features

1. Dynamic Suggestions

Auto-filter valid options based on prior choices:

bot.add("activity", "Choose an activity:", required=True)
# Suggests only activities available in the selected destination

2. Personalized Data Collection

bot.add_personal_info("phone", "Your phone number:", required=True)

3. Session Management

Resume progress or reset conversations:

bot.reset_session("user123")  # Restart workflow

4. Save Results

User data and chat history auto-saved to JSON:

user_data/user123.json
{
  "personal_info": {"email": "user@example.com"},
  "chat_history": [...]
}
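
The session file can be read back like any JSON (a quick sketch using the path from the example above):

import json

with open('user_data/user123.json') as f:
    session = json.load(f)
print(session['personal_info']['email'])  # "user@example.com"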

Detailed Usage

Initialize FlowBot

bot = FlowBot(
    data=df,  # Your pandas DataFrame
)

Add Workflow Steps

bot.add(
    field="room_type",  # DataFrame column to filter
    prompt="Select room type:",  # User prompt
    required=True  # Force valid input
)

Get Final Results

results = response['results']  # Filtered DataFrame rows as dicts
# Example: [{'hotel_name': 'Luxury Palace', 'price_range': '$$$'}]

🔧 Dependencies

  • Python 3.7+
  • pandas

📜 License

MIT License

Get Help

Found a bug? Open an issue.

Happy Building!
Tag your projects with #gurulearn to share them with the community!


QAAgent: Create intelligent QA systems with RAG

GuruLearn's QAAgent provides a simple yet powerful way to create domain-specific question-answering agents using Retrieval Augmented Generation (RAG).

Installation

pip install gurulearn

Dependencies

GuruLearn requires the following dependencies:

  • langchain-ollama
  • langchain-core
  • FAISS
  • pandas
  • langchain-community

Install them using:

pip install langchain-ollama langchain-core faiss-cpu pandas langchain-community

Quick Start

from gurulearn import QAAgent
import pandas as pd

# Load your data
df = pd.read_csv("customer_support_tickets.csv")

# Create an agent
support_agent = QAAgent(
    data=df,
    page_content_fields=["Title", "Description"],
    metadata_fields=["Category", "Priority"],
    system_prompt="You are a helpful customer support agent."
)

# Query the agent
answer = support_agent.query("How do I reset my password?")
print(answer)

# Or use interactive mode
support_agent.interactive_mode()

Features

  • Simple Setup: Create powerful RAG-based QA systems with minimal code
  • Flexible Data Support: Works with pandas DataFrames or lists of dictionaries
  • Custom Prompting: Define system prompts and instructions to shape agent responses
  • Vector Database Integration: Automatically creates and manages embeddings for efficient retrieval
  • Interactive Mode: Built-in console interface for quick experimentation

API Reference

QAAgent

QAAgent(
    data,                    # DataFrame or list of dictionaries containing the source data
    page_content_fields,     # Field(s) to use as document content
    metadata_fields=None,    # Fields to include as metadata
    llm_model="llama3.2",    # Ollama model to use for generation
    k=5,                     # Number of documents to retrieve
    embedding_model="mxbai-embed-large",  # Ollama model for embeddings
    db_location="./langchain_db",  # Directory to store vector database
    collection_name="documents",          # Name of the collection in the vector store
    prompt_template=None,    # Custom prompt template (if None, a default will be used)
    system_prompt="You are an expert in answering questions about the provided information."
)

Methods

  • query(question): Query the agent with a question and get a response
  • interactive_mode(): Start an interactive console session for querying the agent

Examples

Restaurant Review QA System

from gurulearn import QAAgent
import pandas as pd

# Load restaurant review data
df = pd.read_csv("restaurant_reviews.csv")

# Create a restaurant review agent
restaurant_agent = QAAgent(
    data=df,
    page_content_fields=["Title", "Review"],
    metadata_fields=["Rating", "Date"],
    llm_model="llama3.2",
    k=5,
    db_location="./restaurant_db",
    collection_name="restaurant_reviews",
    system_prompt="You are an expert in answering questions about a pizza restaurant."
)

# Ask questions about the restaurant
result = restaurant_agent.query("What do customers say about the pepperoni pizza?")
print(result)

HR Policy Assistant

from gurulearn import QAAgent

# Create an HR policy assistant
hr_documents = [
    {"Policy": "Parental Leave", "Description": "Employees are entitled to 12 weeks of paid parental leave...", "Department": "HR", "LastUpdated": "2023-09-01"},
    {"Policy": "Remote Work", "Description": "Employees may work remotely up to 3 days per week...", "Department": "HR", "LastUpdated": "2023-10-15"},
    # More policy documents...
]

hr_agent = QAAgent(
    data=hr_documents,
    page_content_fields=["Policy", "Description"],
    metadata_fields=["Department", "LastUpdated"],
    db_location="./_db",
    collection_name="hr_policies",
    system_prompt="You are an HR assistant providing information about company policies."
)

# Query the HR assistant
hr_agent.interactive_mode()

Advanced Usage

Custom Prompt Templates

You can define custom prompt templates to control how the agent processes and responds to queries:

custom_template = """
You are a technical support specialist for computer hardware.

CONTEXT:
{reviews}

USER QUESTION:
{question}

Please provide a concise answer focusing only on the information found in the context.
If the information isn't in the context, admit you don't know.
"""

support_agent = QAAgent(
    data=support_df,
    page_content_fields=["Issue", "Resolution"],
    prompt_template=custom_template
)

Using with Different Models

QAAgent works with any Ollama model:

# Using with different Ollama models
medical_agent = QAAgent(
    data=medical_data,
    page_content_fields="text",
    llm_model="nous-hermes2:Q5_K_M", 
    embedding_model="nomic-embed-text"
)

License

MIT
