libeibei95/RME (forked from thanhdtran/RME)

RME

This repo contains the source code for our paper "Regularizing Matrix Factorization with User and Item Embeddings for Recommendation", published at CIKM 2018. The implementation is multi-threaded, so it runs quickly even on large datasets.

DATA FORMAT

  • First line: the header "userId,movieId"
  • Second line to last line: [userId],[movieId]
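A minimal sketch of reading this format with the standard library; the sample pairs below are made up for illustration, not taken from ml10m:

```python
import csv
import io

# A tiny made-up sample in the expected "userId,movieId" format.
sample = "userId,movieId\n0,12\n0,58\n1,7\n"

# Parse it into (userId, movieId) integer pairs.
pairs = [(int(r["userId"]), int(r["movieId"]))
         for r in csv.DictReader(io.StringIO(sample))]
print(pairs)  # [(0, 12), (0, 58), (1, 7)]
```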

Example data for running our source code: ml10m.

We preprocessed it and split it into train, validation (vad/dev), and test sets. Their paths are:

  • data/ml10m/train.csv

  • data/ml10m/test.csv

  • data/ml10m/validation.csv

The file of users and their disliked items uses the same format:

  • First line: the header "userId,movieId"
  • Second line to last line: [userId],[movieId]

If users' disliked items are already available, do 2 steps:

  • save them to data/ml10m/train_neg.csv
  • build the disliked item-item co-occurrence matrix by running (assuming the dataset is ml10m): python produce_negative_cooccurrence.py --dataset ml10m

RUNNING:

Step 1.1: produce the user-user and item-item co-occurrence matrices:

python produce_positive_cooccurrence.py --dataset ml10m
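The internals of produce_positive_cooccurrence.py aren't reproduced here, but the core idea of an item-item co-occurrence count (two items co-occur whenever the same user consumed both) can be sketched as follows, on made-up interactions; the repo's script also builds the user-user analogue:

```python
from collections import defaultdict
from itertools import combinations

# Made-up mapping of user -> consumed items; stands in for train.csv.
user_items = {0: [12, 58, 7], 1: [12, 7], 2: [58]}

# Symmetric co-occurrence counts: items i and j co-occur once
# for every user who consumed both.
cooc = defaultdict(int)
for items in user_items.values():
    for i, j in combinations(sorted(set(items)), 2):
        cooc[(i, j)] += 1
        cooc[(j, i)] += 1

print(cooc[(7, 12)])  # 2: users 0 and 1 both consumed items 7 and 12
```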

Step 1.2: produce the negative item-item co-occurrence matrix (only if disliked items are available; if not, that's OK, disliked items will be inferred in Step 2):

python produce_negative_cooccurrence.py --dataset ml10m

Step 2.1: run RME with available disliked items (in case you already ran Step 1.2):

python rme_rec.py --dataset ml10m --model rme --neg_item_inference 0 --n_factors 40 --reg 1.0 --reg_embed 1.0

Step 2.2: run RME with our user-oriented EM-like algorithm to infer disliked items for users (in case disliked items are not available and you could not run Step 1.2):

python rme_rec.py --dataset ml10m --model rme --neg_item_inference 1 --n_factors 40 --reg 1.0 --reg_embed 1.0

where:

  • model: the model to run. There are 3 choices: rme (our model), wmf, cofactor.
  • reg: the regularization hyper-parameter for the user and item latent factors (alpha and beta).
  • reg_embed: the regularization hyper-parameter for the user and item context latent factors (gamma, theta, delta).
  • n_factors: the number of latent factors (i.e., the embedding size). Default: 40.
  • neg_item_inference: whether to run our user-oriented EM-like algorithm for sampling disliked items for users. If user-disliked_items are already available, set this to 0.
  • neg_sample_ratio: the negative sample ratio per user. If a user consumed 10 items and neg_sample_ratio = 0.2, randomly sample 2 negative items for that user. Default: 0.2.
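The neg_sample_ratio rule in the last bullet can be sketched as below; sample_negatives is a hypothetical helper for illustration, not a function from the repo:

```python
import random

random.seed(0)

def sample_negatives(consumed, n_items, neg_sample_ratio=0.2):
    """Sample unconsumed items as negatives; the count is proportional
    to how many items the user consumed (hypothetical helper)."""
    n_neg = int(len(consumed) * neg_sample_ratio)
    consumed_set = set(consumed)
    candidates = [i for i in range(n_items) if i not in consumed_set]
    return random.sample(candidates, n_neg)

consumed = list(range(10))                    # a user who consumed 10 items
negs = sample_negatives(consumed, n_items=100)
print(len(negs))  # 2 negatives, per the 0.2 ratio
```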

Other hyper-parameters:

  • s: the shift constant, a hyper-parameter that controls the density of the SPPMI matrix. Default: s = 1.
  • data_path: path to the data. Default: data.
  • saved_model_path: path where the optimal model (selected on the validation/development dataset) is saved. Default: MODELS.
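The shift s comes from the SPPMI (shifted positive pointwise mutual information) construction used in this line of work. A pure-Python sketch on made-up co-occurrence counts shows why a larger s yields a sparser matrix; this illustrates the standard formula and may differ in details from the repo's code:

```python
import math

# Made-up symmetric item-item co-occurrence counts (toy data).
counts = {(0, 1): 4, (1, 0): 4, (0, 2): 1, (2, 0): 1, (1, 2): 2, (2, 1): 2}

def sppmi(counts, s=1.0):
    """Shifted positive PMI per entry:
    max(log(#(i,j) * D / (#(i) * #(j))) - log(s), 0),
    where D is the total count and #(i) the marginal count of item i.
    A larger shift s subtracts more, zeroing out more entries."""
    total = sum(counts.values())
    marg = {}
    for (i, _), c in counts.items():
        marg[i] = marg.get(i, 0) + c
    return {(i, j): max(math.log(c * total / (marg[i] * marg[j])) - math.log(s), 0.0)
            for (i, j), c in counts.items()}

m1 = sppmi(counts, s=1.0)
m5 = sppmi(counts, s=5.0)
# Increasing s can only shrink entries or zero them out.
print(all(m5[k] <= m1[k] for k in counts))  # True
```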

You may get some results like:

top-5 results: recall@5 = 0.1559, ndcg@5 = 0.1613, map@5 = 0.1076
top-10 results: recall@10 = 0.1513, ndcg@10 = 0.1547, map@10 = 0.0851
top-20 results: recall@20 = 0.1477, ndcg@20 = 0.1473, map@20 = 0.0669
top-50 results: recall@50 = 0.1819, ndcg@50 = 0.1553, map@50 = 0.0562
top-100 results: recall@100 = 0.2533, ndcg@100 = 0.1825, map@100 = 0.0579
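For reference, recall@k and ndcg@k with binary relevance can be computed as below. This is a common textbook formulation on made-up data and may differ in normalization details from the repo's evaluation code:

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of the user's held-out items recovered in the top-k.
    (Some implementations instead normalize by min(k, |relevant|).)"""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the top-k over the ideal DCG."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    idcg = sum(1.0 / math.log2(pos + 2)
               for pos in range(min(k, len(relevant))))
    return dcg / idcg

ranked = [5, 3, 9, 1, 7]   # made-up ranked recommendations for one user
relevant = {3, 7, 8}       # made-up held-out test items
print(round(recall_at_k(ranked, relevant, 5), 4))  # 0.6667
print(round(ndcg_at_k(ranked, relevant, 5), 4))    # 0.4776
```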

Running some baselines (Cofactor, WMF):

  • Running Cofactor:

python rme_rec.py --dataset ml10m --model cofactor --n_factors 40 --reg 1.0 --reg_embed 1.0

You may get the results like:

top-5 results: recall@5 = 0.1522, ndcg@5 = 0.1537, map@5 = 0.1000
top-10 results: recall@10 = 0.1383, ndcg@10 = 0.1425, map@10 = 0.0756
top-20 results: recall@20 = 0.1438, ndcg@20 = 0.1391, map@20 = 0.0606
top-50 results: recall@50 = 0.1762, ndcg@50 = 0.1484, map@50 = 0.0518
top-100 results: recall@100 = 0.2545, ndcg@100 = 0.1783, map@100 = 0.0540
  • Running WMF:

python rme_rec.py --dataset ml10m --model wmf --n_factors 40 --reg 1.0 --reg_embed 1.0

You may get the results like:

top-5 results: recall@5 = 0.1258, ndcg@5 = 0.1283, map@5 = 0.0810
top-10 results: recall@10 = 0.1209, ndcg@10 = 0.1231, map@10 = 0.0624
top-20 results: recall@20 = 0.1290, ndcg@20 = 0.1230, map@20 = 0.0507
top-50 results: recall@50 = 0.1641, ndcg@50 = 0.1349, map@50 = 0.0442
top-100 results: recall@100 = 0.2375, ndcg@100 = 0.1640, map@100 = 0.0470

CITATION:

If you use our method (RME) in your paper, please cite:

@inproceedings{tran2018regularizing,
 title={Regularizing Matrix Factorization with User and Item Embeddings for Recommendation},
 author={Tran, Thanh and Lee, Kyumin and Liao, Yiming and Lee, Dongwon},
 booktitle={Proceedings of the 27th ACM International Conference on Information and Knowledge Management},
 pages={687--696},
 year={2018},
 organization={ACM}
}
