
GuruLearn is a comprehensive Python library that seamlessly integrates machine learning, computer vision, audio processing, and conversational AI capabilities in one package. Through six specialized modules (MLModelAnalysis, image classification, CTScanProcessor, AudioRecognition, FlowBot, and QAAgent), it empowers developers to build sophisticated AI solutions with minimal setup, accelerating the journey from prototype to production for data-driven applications across multiple domains.
MLModelAnalysis is a versatile and reusable Python class designed to streamline training, evaluation, and prediction processes for various machine learning regression models. This tool allows users to switch seamlessly between models, perform consistent data preprocessing, evaluate models, and make predictions, making it highly adaptable for different machine learning tasks.
Supported model types:
- linear_regression
- decision_tree
- random_forest
- svm
- gradient_boosting
- knn
- ada_boost
- mlp
- xgboost
To use MLModelAnalysis, install the following dependencies:
pip install scikit-learn pandas numpy plotly xgboost
Initialize the MLModelAnalysis class by specifying the model_type parameter, which sets the machine learning model you wish to use.
from ml_model_analysis import MLModelAnalysis
# Initialize with Linear Regression
analysis = MLModelAnalysis(model_type='linear_regression')
# Initialize with Random Forest
analysis = MLModelAnalysis(model_type='random_forest')
# Initialize with XGBoost
analysis = MLModelAnalysis(model_type='xgboost')
The train_and_evaluate method handles data preprocessing, model training, and metric evaluation. Optionally, it can save the trained model, scaler, and encoders for later use.
Parameters:
- csv_file: Path to the CSV file containing the dataset.
- x_elements: List of feature columns.
- y_element: Name of the target column.
- model_save_path (optional): Path to save the trained model, scaler, and encoders.
# Set the parameters
csv_file = 'data.csv' # Path to the data file
x_elements = ['feature1', 'feature2'] # Feature columns
y_element = 'target' # Target column
# Initialize the model
analysis = MLModelAnalysis(model_type='random_forest')
# Train and evaluate the model
analysis.train_and_evaluate(csv_file=csv_file, x_elements=x_elements, y_element=y_element, model_save_path='random_forest_model.pkl')
After running this code, the method displays R-squared and Mean Squared Error (MSE) metrics for both the training and test sets. If model_save_path is specified, the model is saved for future predictions.
The load_model_and_predict method allows you to load a saved model and make predictions on new input data.
Parameters:
- model_path: Path to the saved model file.
- input_data: Dictionary containing feature names and values for prediction.
# Define input data for prediction
input_data = {
'feature1': 5.1,
'feature2': 2.3
}
# Load the model and make a prediction
prediction = analysis.load_model_and_predict(model_path='random_forest_model.pkl', input_data=input_data)
print(f'Prediction: {prediction}')
For linear_regression or svm models with only one feature, the train_and_evaluate method automatically generates a Plotly plot of actual vs. predicted values for quick visualization.
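For example, a minimal single-feature run that would trigger this plot (assuming data.csv contains a feature1 column):
from ml_model_analysis import MLModelAnalysis
# Single-feature run: per the docs above, this also produces the actual-vs-predicted plot
analysis = MLModelAnalysis(model_type='linear_regression')
analysis.train_and_evaluate(csv_file='data.csv', x_elements=['feature1'], y_element='target')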
Regression Analysis with Random Forest
analysis = MLModelAnalysis(model_type='random_forest')
analysis.train_and_evaluate(csv_file='data.csv', x_elements=['feature1', 'feature2'], y_element='target', model_save_path='random_forest_model.pkl')
Quick Prediction with a Pre-trained Model
prediction = analysis.load_model_and_predict(model_path='random_forest_model.pkl', input_data={'feature1': 5.1, 'feature2': 2.3})
print(f'Prediction: {prediction}')
Effortless Model Switching
# Specify a new model type to use a different algorithm
analysis = MLModelAnalysis(model_type='xgboost')
The model_save_path parameter in train_and_evaluate stores the model, scaler, and encoders, allowing consistent predictions when reloading the model later.
Dependencies: scikit-learn, pandas, numpy, plotly, and xgboost.
This project is licensed under the MIT License.
A comprehensive and flexible image classification library built on PyTorch, designed to simplify the process of training, evaluating, and deploying image classification models.
pip install torch torchvision pandas numpy pillow scikit-learn matplotlib
from image_classifier import ImageClassifier
# Initialize classifier
classifier = ImageClassifier()
# Train from directory
classifier.train(
    train_dir='path/to/train_data',  # Directory with class subfolders
    test_dir='path/to/test_data',    # Optional test directory
    epochs=10,
    model_name='resnet50',           # Model architecture to use
    finetune=True,                   # Whether to finetune all layers
    batch_size=32,
    learning_rate=0.001,
    save_path='my_model.pth'
)
from image_classifier import ImageClassifier
# Initialize classifier
classifier = ImageClassifier()
# Train from CSV file
classifier.train(
    csv_file='images.csv',      # CSV with image paths and labels
    img_column='image_path',    # Column with image paths
    label_column='label',       # Column with class labels
    epochs=10,
    model_name='mobilenet',
    finetune=False,
    batch_size=32,
    learning_rate=0.001,
    save_path='my_model.pth'
)
from image_classifier import ImageClassifier
from PIL import Image
# Initialize classifier
classifier = ImageClassifier()
# Load model
classifier.load('my_model.pth', model_name='resnet50')
# Predict from image path
predictions = classifier.predict('path/to/image.jpg', top_k=3)
for class_name, probability in predictions:
    print(f"{class_name}: {probability:.4f}")
# Or predict from PIL Image
image = Image.open('path/to/image.jpg')
predictions = classifier.predict(image=image, top_k=3)
The library also provides a command-line interface:
# Train a model
python -m image_classifier --mode train --train_dir path/to/train_data --test_dir path/to/test_data --model_name resnet50 --epochs 10 --batch_size 32 --model_path my_model.pth
# Train from CSV
python -m image_classifier --mode train --csv_file images.csv --img_column image_path --label_column label --model_name mobilenet --epochs 10 --model_path my_model.pth
# Make predictions
python -m image_classifier --mode predict --model_path my_model.pth --image_path path/to/image.jpg --top_k 3
The ImageDataset class extends PyTorch's Dataset class to handle loading images from class-organized directories or from CSV files of image paths and labels.
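For illustration only (a sketch of the pattern, not GuruLearn's internal code), a minimal CSV-backed PyTorch Dataset might look like this:
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset

# Hypothetical CSV-backed image dataset, shown only to illustrate the pattern
class CsvImageDataset(Dataset):
    def __init__(self, csv_file, img_column, label_column, transform=None):
        self.df = pd.read_csv(csv_file)
        self.img_column = img_column
        self.label_column = label_column
        self.transform = transform
        # Map string labels to integer class indices
        self.classes = sorted(self.df[label_column].unique())
        self.class_to_idx = {c: i for i, c in enumerate(self.classes)}

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        image = Image.open(row[self.img_column]).convert('RGB')
        if self.transform:
            image = self.transform(image)
        return image, self.class_to_idx[row[self.label_column]]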
ImageClassifier is the main class, providing training (from directories or CSV files), evaluation, saving and loading of models, and prediction.
The library can automatically select an appropriate model architecture based on your dataset size. You can also explicitly specify any supported model architecture.
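One plausible shape for such a heuristic (a sketch with invented thresholds and architecture names, not the library's actual rule):
# Hypothetical size-based architecture selection, for illustration only
def pick_architecture(num_images):
    if num_images < 1000:
        return 'mobilenet'  # small dataset: lightweight model reduces overfitting
    elif num_images < 10000:
        return 'resnet18'   # medium dataset: a standard mid-size backbone
    return 'resnet50'       # large dataset: higher-capacity model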
During training, the library generates training-progress plots, which are automatically saved as PNG files.
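If you want to produce similar plots yourself, a minimal matplotlib sketch (matplotlib is already in the dependency list; the history keys here are assumed, not GuruLearn's API):
import matplotlib.pyplot as plt

# Illustrative: save loss/accuracy curves as a PNG from an assumed history dict
def save_history_plot(history, path='training_history.png'):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(history['train_loss'], label='train')
    ax1.plot(history['val_loss'], label='val')
    ax1.set_title('Loss')
    ax1.legend()
    ax2.plot(history['train_acc'], label='train')
    ax2.plot(history['val_acc'], label='val')
    ax2.set_title('Accuracy')
    ax2.legend()
    fig.savefig(path)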
When using directory-based training, organize your data as follows:
train_data/
├── class1/
│ ├── image1.jpg
│ ├── image2.jpg
│ └── ...
├── class2/
│ ├── image1.jpg
│ ├── image2.jpg
│ └── ...
└── ...
When using CSV-based training, your CSV file should contain at least two columns:
image_path,label
/path/to/image1.jpg,class1
/path/to/image2.jpg,class2
...
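If your images already live in class-named folders, you could generate such a CSV with pandas (an illustrative helper, not part of the library):
import pandas as pd
from pathlib import Path

# Build a training CSV from class-named subfolders
rows = [
    {'image_path': str(p), 'label': p.parent.name}
    for p in Path('train_data').glob('*/*.jpg')
]
pd.DataFrame(rows).to_csv('images.csv', index=False)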
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
CTScanProcessor is a Python class designed for advanced processing and quality evaluation of CT scan images. This tool is highly beneficial for applications in medical imaging, data science, and deep learning, providing noise reduction, contrast enhancement, detail preservation, and quality evaluation.
This class requires OpenCV, NumPy, and SciPy. To install the required dependencies, use:
pip install opencv-python-headless numpy scipy
Initialize the Processor
from ct_scan_processor import CTScanProcessor
processor = CTScanProcessor(kernel_size=5, clip_limit=2.0, tile_grid_size=(8, 8))
Process a CT Scan
Use the process_ct_scan method to process a CT scan image and get quality metrics.
denoised, metrics = processor.process_ct_scan("path_to_ct_scan.jpg", "output_folder", compare=True)
Quality Metrics
After processing, the class returns metrics such as Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Signal-to-Noise Ratio (SNR), and Detail Preservation Ratio.
Compare Images
If compare=True, a side-by-side comparison image is saved in the specified comparison folder.
if __name__ == "__main__":
    processor = CTScanProcessor()
    denoised, metrics = processor.process_ct_scan("path_to_ct_scan.jpg", "output_folder", compare=True)
The following metrics are calculated to evaluate the quality of the denoised image: MSE, PSNR, SNR, and Detail Preservation Ratio.
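The standard formulas behind MSE, PSNR, and SNR can be sketched with NumPy (illustrative, not the class's internal code; assumes 8-bit images):
import numpy as np

def quality_metrics(original, denoised):
    # Mean Squared Error between the two images
    diff = original.astype(np.float64) - denoised.astype(np.float64)
    mse = np.mean(diff ** 2)
    # Peak Signal-to-Noise Ratio, with an 8-bit peak value of 255
    psnr = 10 * np.log10(255.0 ** 2 / mse) if mse > 0 else float('inf')
    # Signal-to-Noise Ratio, treating the residual as noise power
    snr = 10 * np.log10(np.mean(original.astype(np.float64) ** 2) / mse) if mse > 0 else float('inf')
    return {'MSE': mse, 'PSNR': psnr, 'SNR': snr}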
Methods:
- sharpen(image): Sharpens the input image.
- median_denoise(image): Denoises the input image using a median filter.
- enhance_contrast(image): Enhances contrast using CLAHE.
- enhanced_denoise(image_path): Processes a CT scan image with denoising, contrast enhancement, and sharpening.
- evaluate_quality(original, denoised): Computes MSE, PSNR, SNR, and Detail Preservation.
- compare_images(original, processed, output_path): Saves a side-by-side comparison of the original and processed images.
- process_ct_scan(input_path, output_folder, comparison_folder="comparison", compare=False): Runs the full CT scan processing pipeline and saves the results.
This project is licensed under the MIT License.
Contributions are welcome! Feel free to submit pull requests or open issues.
The AudioRecognition class in gurulearn provides tools for audio data augmentation, feature extraction, model training, and prediction, making it suitable for tasks like audio classification and speech recognition.
To use the AudioRecognition class, follow these steps:
from gurulearn import AudioRecognition
# Initialize the audio recognition class
audio_recognition = AudioRecognition()
The load_data_with_augmentation method loads audio data from a specified directory and performs augmentation to improve model generalization.
data_dir = "path/to/audio/data"
X, y = audio_recognition.load_data_with_augmentation(data_dir)
This method returns feature vectors (X) and labels (y) for training.
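As a rough sketch of what such a pipeline typically does (illustrative only; this assumes librosa for loading and MFCC features, and is not GuruLearn's implementation):
import numpy as np
import librosa

# Illustrative augmentation + feature extraction for one file
def augment_and_extract(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=None)          # load audio at its native rate
    noisy = y + 0.005 * np.random.randn(len(y))  # simple white-noise augmentation
    features = []
    for signal in (y, noisy):
        mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
        features.append(mfcc.mean(axis=1))       # mean-pool MFCCs over time
    return features  # one feature vector per (original, augmented) variant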
The audiotrain method trains an audio classification model. This method also generates a confusion matrix and training history plot, which are saved in the specified model directory.
audio_recognition.audiotrain(
    data_path="path/to/audio/data",
    epochs=50,
    batch_size=32,
    test_size=0.2,
    learning_rate=0.001,
    model_dir='model_folder'
)
Parameters: data_path (directory containing the audio dataset), epochs, batch_size, test_size (fraction of data held out for testing), learning_rate, and model_dir (directory where the trained model and plots are saved).
After training, you can predict the class of a new audio file using the predict or predict_class methods.
# Path to the input audio file
input_wav = "path/to/audio/file.wav"
# Predict the label of the audio file
predicted_label = audio_recognition.predict(input_wav)
print(f"Predicted Label: {predicted_label}")
The predict method returns the predicted label (text), while predict_class returns the numeric class index.
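A minimal side-by-side of the two (the example outputs in the comments are hypothetical):
predicted_label = audio_recognition.predict('data/test_audio.wav')        # e.g. 'dog_bark'
predicted_index = audio_recognition.predict_class('data/test_audio.wav')  # e.g. 3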
# Initialize the audio recognition instance
audio_recognition = AudioRecognition()
# Load data and perform augmentation
X, y = audio_recognition.load_data_with_augmentation('data/audio_files')
# Train the model on the audio dataset
audio_recognition.audiotrain(
    data_path='data/audio_files',
    epochs=30,
    batch_size=32,
    learning_rate=0.001
)
# Predict the class of a new audio sample
predicted_label = audio_recognition.predict('data/test_audio.wav')
print("Predicted Label:", predicted_label)
Output files:
- confusion_matrix.png - Saved in the current directory after training.
- training_history.png - Contains plots for model accuracy and loss.
- audio_recognition_model.h5 - Saved in the specified model directory.
- label_mapping.json - Contains mappings of class indices to labels.
FlowBot is a flexible framework for creating dynamic, guided interactions (chatbots, booking systems, surveys) that adapt to user input and filter datasets in real time. Perfect for travel booking, customer support, or personalized recommendations!
pip install gurulearn
Build a Travel Booking Bot in 5 Steps:
import pandas as pd
from gurulearn import FlowBot
# Sample dataset
hotels = pd.DataFrame({
    'destination': ['Paris', 'Tokyo', 'New York'],
    'price_range': ['$$$', '$$', '$'],
    'hotel_name': ['Luxury Palace', 'Mountain View', 'Downtown Inn']
})
# Initialize FlowBot
bot = FlowBot(hotels)
# Collect user email first
bot.add_personal_info("email", "Please enter your email:")
# Define workflow
bot.add("destination", "Where would you like to go?", required=True)
bot.add("price_range", "Choose your budget:", required=False)
bot.finish("hotel_name", "price_range") # Final output columns
# Simulate user interaction
response = bot.process("user123", "") # Start flow!
print(response['message']) # "Where would you like to go?"
print(response['suggestions']) # ["Paris", "Tokyo", "New York"]
# User selects 'Paris'
response = bot.process("user123", "Paris")
print(response['message']) # "Choose your budget:"
print(response['suggestions']) # ["$$$", "$$"]
Auto-filter valid options based on prior choices:
bot.add("activity", "Choose an activity:", required=True)
# Suggests only activities available in the selected destination
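Conceptually, the suggestion filtering is a pandas subset over the choices made so far; a sketch of the idea (not FlowBot's internal code; it assumes a DataFrame df that has both columns, which the sample hotels data above does not):
# Illustrative only: 'activity' options narrowed by an earlier 'destination' choice
available = df.loc[df['destination'] == 'Paris', 'activity'].dropna().unique().tolist()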
# Collect additional personal info at any point
bot.add_personal_info("phone", "Your phone number:", required=True)
Resume progress or reset conversations:
bot.reset_session("user123") # Restart workflow
User data and chat history auto-saved to JSON:
user_data/user123.json
{
  "personal_info": {"email": "user@example.com"},
  "chat_history": [...]
}
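Reading a saved session back is plain JSON (illustrative):
import json

# Load a previously saved session file (path shown above)
with open('user_data/user123.json') as f:
    session = json.load(f)
print(session['personal_info'].get('email'))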
bot = FlowBot(
    data=df,  # Your pandas DataFrame
)
bot.add(
    field="room_type",           # DataFrame column to filter
    prompt="Select room type:",  # User prompt
    required=True                # Force valid input
)
results = response['results'] # Filtered DataFrame rows as dicts
# Example: [{'hotel_name': 'Luxury Palace', 'price_range': '$$$'}]
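Once the flow completes, the rows can be consumed directly (a short usage sketch based on the columns passed to bot.finish above):
# Each result row is a dict keyed by the finish() output columns
for hotel in response['results']:
    print(hotel['hotel_name'], hotel['price_range'])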
Dependencies: pandas.
Found a bug? Open an issue.
Happy Building!
Tag your projects with #gurulearn to share them with the community!
GuruLearn's QAAgent provides a simple yet powerful way to create domain-specific question-answering agents using Retrieval Augmented Generation (RAG).
pip install gurulearn
QAAgent also requires a local Ollama installation, since both generation and embeddings run on Ollama models.
from gurulearn import QAAgent
import pandas as pd
# Load your data
df = pd.read_csv("customer_support_tickets.csv")
# Create an agent
support_agent = QAAgent(
    data=df,
    page_content_fields=["Title", "Description"],
    metadata_fields=["Category", "Priority"],
    system_prompt="You are a helpful customer support agent."
)
# Query the agent
answer = support_agent.query("How do I reset my password?")
print(answer)
# Or use interactive mode
support_agent.interactive_mode()
QAAgent(
    data,                                 # DataFrame or list of dictionaries containing the source data
    page_content_fields,                  # Field(s) to use as document content
    metadata_fields=None,                 # Fields to include as metadata
    llm_model="llama3.2",                 # Ollama model to use for generation
    k=5,                                  # Number of documents to retrieve
    embedding_model="mxbai-embed-large",  # Ollama model for embeddings
    db_location="./langchain_db",         # Directory to store vector database
    collection_name="documents",          # Name of the collection in the vector store
    prompt_template=None,                 # Custom prompt template (if None, a default will be used)
    system_prompt="You are an expert in answering questions about the provided information."
)
from gurulearn import QAAgent
import pandas as pd
# Load restaurant review data
df = pd.read_csv("restaurant_reviews.csv")
# Create a restaurant review agent
restaurant_agent = QAAgent(
    data=df,
    page_content_fields=["Title", "Review"],
    metadata_fields=["Rating", "Date"],
    llm_model="llama3.2",
    k=5,
    db_location="./restaurant_db",
    collection_name="restaurant_reviews",
    system_prompt="You are an expert in answering questions about a pizza restaurant."
)
# Ask questions about the restaurant
result = restaurant_agent.query("What do customers say about the pepperoni pizza?")
print(result)
from gurulearn import QAAgent
# Create an HR policy assistant
hr_documents = [
    {"Policy": "Parental Leave", "Description": "Employees are entitled to 12 weeks of paid parental leave...", "Department": "HR", "LastUpdated": "2023-09-01"},
    {"Policy": "Remote Work", "Description": "Employees may work remotely up to 3 days per week...", "Department": "HR", "LastUpdated": "2023-10-15"},
    # More policy documents...
]
hr_agent = QAAgent(
    data=hr_documents,
    page_content_fields=["Policy", "Description"],
    metadata_fields=["Department", "LastUpdated"],
    db_location="./_db",
    collection_name="hr_policies",
    system_prompt="You are an HR assistant providing information about company policies."
)
# Query the HR assistant
hr_agent.interactive_mode()
You can define custom prompt templates to control how the agent processes and responds to queries:
custom_template = """
You are a technical support specialist for computer hardware.
CONTEXT:
{reviews}
USER QUESTION:
{question}
Please provide a concise answer focusing only on the information found in the context.
If the information isn't in the context, admit you don't know.
"""
support_agent = QAAgent(
    data=support_df,
    page_content_fields=["Issue", "Resolution"],
    prompt_template=custom_template
)
QAAgent works with any Ollama model:
# Using with different Ollama models
medical_agent = QAAgent(
    data=medical_data,
    page_content_fields="text",
    llm_model="nous-hermes2:Q5_K_M",
    embedding_model="nomic-embed-text"
)
MIT