
RePlay is an advanced framework designed to facilitate the development and evaluation of recommendation systems. It provides a robust set of tools covering the entire lifecycle of a recommendation system pipeline:
🚀 Features:
- Data Preprocessing and Splitting: Streamlines the data preparation process for recommendation systems, ensuring optimal data structure and format for efficient processing.
- Wide Range of Recommendation Models: Enables building recommendation models, from state-of-the-art approaches to commonly used baselines, and evaluating their performance and quality.
- Hyperparameter Optimization: Offers tools for fine-tuning model parameters to achieve the best possible performance, reducing the complexity of the optimization process.
- Comprehensive Evaluation Metrics: Incorporates a wide range of evaluation metrics to assess the accuracy and effectiveness of recommendation models.
- Model Ensemble and Hybridization: Supports combining predictions from multiple models and creating two-level (ensemble) models to enhance the quality of recommendations.
- Seamless Mode Transition: Facilitates easy transition from offline experimentation to online production environments, ensuring scalability and flexibility.
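To illustrate the ensembling idea above (a conceptual sketch only, not RePlay's two-level API — the helper and score dictionaries here are hypothetical), combining two models often reduces to a weighted blend of their per-item scores:

```python
def blend_scores(scores_a, scores_b, weight_a=0.5):
    """Weighted blend of two models' item scores.

    Hypothetical helper for illustration; items missing from one
    model's output contribute a score of 0.0 from that model.
    """
    items = set(scores_a) | set(scores_b)
    return {
        item: weight_a * scores_a.get(item, 0.0)
        + (1.0 - weight_a) * scores_b.get(item, 0.0)
        for item in items
    }

# Toy per-item relevance scores from two hypothetical base models.
knn_scores = {"item_1": 0.9, "item_2": 0.4}
als_scores = {"item_2": 0.8, "item_3": 0.5}

blended = blend_scores(knn_scores, als_scores, weight_a=0.5)
top = sorted(blended, key=blended.get, reverse=True)
```

A two-level model generalizes this: instead of a fixed weight, a second-stage model learns how to combine first-stage predictions.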
💻 Hardware and Environment Compatibility:
- Diverse Hardware Support: Compatible with various hardware configurations including CPU, GPU, Multi-GPU.
- Cluster Computing Integration: Integrates with PySpark for distributed computing, enabling scalability for large-scale recommendation systems.
📈 Quickstart
```shell
pip install replay-rec[all]
```
A PySpark-based model with fast Polars-based data preprocessing:
```python
from polars import from_pandas
from rs_datasets import MovieLens

from replay.data import Dataset, FeatureHint, FeatureInfo, FeatureSchema, FeatureType
from replay.data.dataset_utils import DatasetLabelEncoder
from replay.metrics import HitRate, NDCG, Experiment
from replay.models import ItemKNN
from replay.utils.spark_utils import convert2spark
from replay.utils.session_handler import State
from replay.splitters import RatioSplitter

spark = State().session

ml_1m = MovieLens("1m")
K = 10

# data preprocessing
interactions = from_pandas(ml_1m.ratings)

# data splitting
splitter = RatioSplitter(
    test_size=0.3,
    divide_column="user_id",
    query_column="user_id",
    item_column="item_id",
    timestamp_column="timestamp",
    drop_cold_items=True,
    drop_cold_users=True,
)
train, test = splitter.split(interactions)

# dataset creation
feature_schema = FeatureSchema(
    [
        FeatureInfo(
            column="user_id",
            feature_type=FeatureType.CATEGORICAL,
            feature_hint=FeatureHint.QUERY_ID,
        ),
        FeatureInfo(
            column="item_id",
            feature_type=FeatureType.CATEGORICAL,
            feature_hint=FeatureHint.ITEM_ID,
        ),
        FeatureInfo(
            column="rating",
            feature_type=FeatureType.NUMERICAL,
            feature_hint=FeatureHint.RATING,
        ),
        FeatureInfo(
            column="timestamp",
            feature_type=FeatureType.NUMERICAL,
            feature_hint=FeatureHint.TIMESTAMP,
        ),
    ]
)
train_dataset = Dataset(feature_schema=feature_schema, interactions=train)
test_dataset = Dataset(feature_schema=feature_schema, interactions=test)

# data encoding
encoder = DatasetLabelEncoder()
train_dataset = encoder.fit_transform(train_dataset)
test_dataset = encoder.transform(test_dataset)

# convert datasets to Spark
train_dataset.to_spark()
test_dataset.to_spark()

# model training
model = ItemKNN()
model.fit(train_dataset)

# model inference
encoded_recs = model.predict(
    dataset=train_dataset,
    k=K,
    queries=test_dataset.query_ids,
    filter_seen_items=True,
)
recs = encoder.query_and_item_id_encoder.inverse_transform(encoded_recs)

# model evaluation
metrics = Experiment(
    [NDCG(K), HitRate(K)],
    test,
    query_column="user_id",
    item_column="item_id",
    rating_column="rating",
)
metrics.add_result("ItemKNN", recs)
print(metrics.results)
```
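For intuition about the metrics used above: HitRate@K counts a query as a hit if at least one of its held-out test items appears in that query's top-K recommendations, then averages over queries. A minimal pure-Python sketch on toy data (independent of RePlay's implementation):

```python
def hit_rate_at_k(recommended, ground_truth, k):
    """Fraction of queries with at least one relevant item in the top-k.

    recommended: dict mapping query -> ranked list of item ids
    ground_truth: dict mapping query -> set of held-out item ids
    """
    hits = sum(
        1
        for query, relevant in ground_truth.items()
        if set(recommended.get(query, [])[:k]) & relevant
    )
    return hits / len(ground_truth)

# toy data: u1's held-out item appears in the top-2, u2's does not
recs = {"u1": [3, 1, 2], "u2": [5, 6, 7]}
truth = {"u1": {1}, "u2": {9}}
print(hit_rate_at_k(recs, truth, k=2))  # -> 0.5
```

NDCG@K additionally rewards placing relevant items higher in the list rather than merely anywhere in the top-K.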
🔧 Installation
Installation via the pip package manager is recommended by default:
```shell
pip install replay-rec
```
This installs the core package without the PySpark and PyTorch dependencies. The experimental submodule is also not installed.
To install the experimental submodule, specify a version with the rc0 suffix. For example:
```shell
pip install replay-rec==XX.YY.ZZrc0
```
In addition to the core package, several extras are also provided, including:
- [spark]: installs PySpark functionality
- [torch]: installs PyTorch and Lightning functionality
Example:
```shell
pip install replay-rec[spark]
pip install replay-rec[spark]==XX.YY.ZZrc0
```
Additionally, replay-rec[torch] may be installed with the CPU-only version of torch by providing its respective index URL during installation:
```shell
pip install replay-rec[torch] --extra-index-url https://download.pytorch.org/whl/cpu
```
To build RePlay from source, please follow the build instructions.
Optional features
RePlay includes a set of optional features which require users to install optional dependencies manually. These features include:
- Hyperparameter search via Optuna:
  ```shell
  pip install optuna
  ```
- Model compilation via OpenVINO:
  ```shell
  pip install openvino onnx
  ```
- Vector database and hierarchical search support:
  ```shell
  pip install hnswlib fixed-install-nmslib
  ```
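Hyperparameter search automates loops of the following shape. The sketch below uses stdlib random sampling and a toy objective (both purely illustrative, not RePlay or Optuna API) so it runs without dependencies; with Optuna installed, the same objective would be wrapped in optuna.create_study / study.optimize, with Optuna's samplers replacing the naive loop:

```python
import random

def objective(neighbours: int) -> float:
    """Toy stand-in for a validation metric (e.g. NDCG@K) as a
    function of a model hyperparameter -- purely illustrative."""
    return -(neighbours - 50) ** 2  # peaks at neighbours == 50

random.seed(0)
best_value, best_params = float("-inf"), None
for _ in range(100):  # a real study would use smarter sampling
    candidate = random.randint(1, 200)
    value = objective(candidate)
    if value > best_value:
        best_value, best_params = value, {"neighbours": candidate}

print(best_params)
```

The value of a dedicated library over this loop is in the samplers (e.g. TPE), pruning of bad trials, and persistence of study results.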
📑 Resources
Usage examples
Videos and papers
- Video guides
- Research papers:
  - RePlay: a Recommendation Framework for Experimentation and Production Use. Alexey Vasilev, Anna Volodkevich, Denis Kulandin, Tatiana Bysheva, Anton Klenitskiy. In The 18th ACM Conference on Recommender Systems (RecSys '24).
  - Turning Dross Into Gold Loss: is BERT4Rec really better than SASRec? Anton Klenitskiy, Alexey Vasilev. In The 17th ACM Conference on Recommender Systems (RecSys '23).
  - The Long Tail of Context: Does it Exist and Matter? Konstantin Bauman, Alexey Vasilev, Alexander Tuzhilin. In Workshop on Context-Aware Recommender Systems (CARS) (RecSys '22).
  - Multiobjective Evaluation of Reinforcement Learning Based Recommender Systems. Alexey Grishanov, Anastasia Ianina, Konstantin Vorontsov. In The 16th ACM Conference on Recommender Systems (RecSys '22).
  - Quality Metrics in Recommender Systems: Do We Calculate Metrics Consistently? Yan-Martin Tamm, Rinchin Damdinov, Alexey Vasilev. In The 15th ACM Conference on Recommender Systems (RecSys '21).
💡 Contributing to RePlay
We welcome community contributions. For details please check our contributing guidelines.