Security News
tea.xyz Spam Plagues npm and RubyGems Package Registries
Tea.xyz, a crypto project aimed at rewarding open source contributions, is once again facing backlash due to an influx of spam packages flooding public package registries.
Readme
RePlay is an advanced framework designed to facilitate the development and evaluation of recommendation systems. It provides a robust set of tools covering the entire lifecycle of a recommendation system pipeline:
Installation via pip
package manager is recommended by default:
pip install replay-rec
In this case it will be installed the core
package without PySpark
and PyTorch
dependencies.
Also experimental
submodule will not be installed.
To install experimental
submodule please specify the version with rc0
suffix.
For example:
pip install replay-rec==XX.YY.ZZrc0
In addition to the core package, several extras are also provided, including:
[spark]
: Install PySpark functionality[torch]
: Install PyTorch and Lightning functionality[all]
: [spark]
[torch]
Example:
# Install core package with PySpark dependency
pip install replay-rec[spark]
# Install package with experimental submodule and PySpark dependency
pip install replay-rec[spark]==XX.YY.ZZrc0
To build RePlay from sources please use the instruction.
If you encounter an error during RePlay installation, check the troubleshooting guide.
from rs_datasets import MovieLens
from replay.data import Dataset, FeatureHint, FeatureInfo, FeatureSchema, FeatureType
from replay.data.dataset_utils import DatasetLabelEncoder
from replay.metrics import HitRate, NDCG, Experiment
from replay.models import ItemKNN
from replay.utils.spark_utils import convert2spark
from replay.utils.session_handler import State
from replay.splitters import RatioSplitter
spark = State().session
ml_1m = MovieLens("1m")
K=10
# data preprocessing
interactions = convert2spark(ml_1m.ratings)
# data splitting
splitter = RatioSplitter(
test_size=0.3,
divide_column="user_id",
query_column="user_id",
item_column="item_id",
timestamp_column="timestamp",
drop_cold_items=True,
drop_cold_users=True,
)
train, test = splitter.split(interactions)
# dataset creating
feature_schema = FeatureSchema(
[
FeatureInfo(
column="user_id",
feature_type=FeatureType.CATEGORICAL,
feature_hint=FeatureHint.QUERY_ID,
),
FeatureInfo(
column="item_id",
feature_type=FeatureType.CATEGORICAL,
feature_hint=FeatureHint.ITEM_ID,
),
FeatureInfo(
column="rating",
feature_type=FeatureType.NUMERICAL,
feature_hint=FeatureHint.RATING,
),
FeatureInfo(
column="timestamp",
feature_type=FeatureType.NUMERICAL,
feature_hint=FeatureHint.TIMESTAMP,
),
]
)
train_dataset = Dataset(
feature_schema=feature_schema,
interactions=train,
)
test_dataset = Dataset(
feature_schema=feature_schema,
interactions=test,
)
# data encoding
encoder = DatasetLabelEncoder()
train_dataset = encoder.fit_transform(train_dataset)
test_dataset = encoder.transform(test_dataset)
# model training
model = ItemKNN()
model.fit(train_dataset)
# model inference
encoded_recs = model.predict(
dataset=train_dataset,
k=K,
queries=test_dataset.query_ids,
filter_seen_items=True,
)
recs = encoder.query_and_item_id_encoder.inverse_transform(encoded_recs)
# model evaluation
metrics = Experiment(
[NDCG(K), HitRate(K)],
test,
query_column="user_id",
item_column="item_id",
rating_column="rating",
)
metrics.add_result("ItemKNN", recs)
print(metrics.results)
Video guides:
Research papers:
We welcome community contributions. For details please check our contributing guidelines.
FAQs
RecSys Library
We found that replay-rec demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
Tea.xyz, a crypto project aimed at rewarding open source contributions, is once again facing backlash due to an influx of spam packages flooding public package registries.
Security News
As cyber threats become more autonomous, AI-powered defenses are crucial for businesses to stay ahead of attackers who can exploit software vulnerabilities at scale.
Security News
UnitedHealth Group disclosed that the ransomware attack on Change Healthcare compromised protected health information for millions in the U.S., with estimated costs to the company expected to reach $1 billion.