libeibei95/RME (forked from thanhdtran/RME)

RME

This repo contains the source code for our paper "Regularizing Matrix Factorization with User and Item Embeddings for Recommendation", published at CIKM 2018. The implementation is multi-threaded, so it runs quickly even on large datasets.

DATA FORMAT

  • First line: the header "userId,movieId"
  • Second line to last line: [userId],[movieId]
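A minimal sketch of reading this format with the standard library; the sample pairs below are made up for illustration, not taken from ml10m:

```python
import csv
import io

# A tiny made-up sample in the expected "userId,movieId" format.
sample = "userId,movieId\n0,12\n0,58\n1,7\n"

# Parse it into (userId, movieId) integer pairs.
pairs = [(int(r["userId"]), int(r["movieId"]))
         for r in csv.DictReader(io.StringIO(sample))]
print(pairs)  # [(0, 12), (0, 58), (1, 7)]
```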

Example data for running our source code: ml10m.

We preprocessed it and split it into train, validation (vad/dev), and test sets. Their paths are:

  • data/ml10m/train.csv

  • data/ml10m/test.csv

  • data/ml10m/validation.csv

The file of users and their disliked items uses the same format:

  • First line: the header "userId,movieId"
  • Second line to last line: [userId],[movieId]

If users' disliked items are already available, do 2 steps:

  • save them to data/ml10m/train_neg.csv
  • build the disliked item-item co-occurrence matrix by running (assuming the dataset is ml10m): python produce_negative_cooccurrence.py --dataset ml10m

RUNNING:

Step 1.1: produce the user-user and item-item co-occurrence matrices:

python produce_positive_cooccurrence.py --dataset ml10m
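The internals of produce_positive_cooccurrence.py aren't reproduced here, but the core idea of an item-item co-occurrence count (two items co-occur whenever the same user consumed both) can be sketched as follows, on made-up interactions; the repo's script also builds the user-user analogue:

```python
from collections import defaultdict
from itertools import combinations

# Made-up mapping of user -> consumed items; stands in for train.csv.
user_items = {0: [12, 58, 7], 1: [12, 7], 2: [58]}

# Symmetric co-occurrence counts: items i and j co-occur once
# for every user who consumed both.
cooc = defaultdict(int)
for items in user_items.values():
    for i, j in combinations(sorted(set(items)), 2):
        cooc[(i, j)] += 1
        cooc[(j, i)] += 1

print(cooc[(7, 12)])  # 2: users 0 and 1 both consumed items 7 and 12
```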

Step 1.2: produce the negative item-item co-occurrence matrix (only if disliked items are available; if not, that's OK, disliked items will be inferred in Step 2):

python produce_negative_cooccurrence.py --dataset ml10m

Step 2.1: run RME with available disliked items (in case you already ran Step 1.2):

python rme_rec.py --dataset ml10m --model rme --neg_item_inference 0 --n_factors 40 --reg 1.0 --reg_embed 1.0

Step 2.2: run RME with our user-oriented EM-like algorithm to infer disliked items for users (in case disliked items are not available and you could not run Step 1.2):

python rme_rec.py --dataset ml10m --model rme --neg_item_inference 1 --n_factors 40 --reg 1.0 --reg_embed 1.0

where:

  • model: the model to run. There are 3 choices: rme (our model), wmf, cofactor.
  • reg: the regularization hyper-parameter for the user and item latent factors (alpha and beta).
  • reg_embed: the regularization hyper-parameter for the user and item context latent factors (gamma, theta, delta).
  • n_factors: the number of latent factors (i.e., the embedding size). Default: 40.
  • neg_item_inference: whether to run our user-oriented EM-like algorithm for sampling disliked items for users. If user-disliked_items are already available, set this to 0.
  • neg_sample_ratio: the negative sample ratio per user. If a user consumed 10 items and neg_sample_ratio = 0.2, randomly sample 2 negative items for that user. Default: 0.2.
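The neg_sample_ratio rule in the last bullet can be sketched as below; sample_negatives is a hypothetical helper for illustration, not a function from the repo:

```python
import random

random.seed(0)

def sample_negatives(consumed, n_items, neg_sample_ratio=0.2):
    """Sample unconsumed items as negatives; the count is proportional
    to how many items the user consumed (hypothetical helper)."""
    n_neg = int(len(consumed) * neg_sample_ratio)
    consumed_set = set(consumed)
    candidates = [i for i in range(n_items) if i not in consumed_set]
    return random.sample(candidates, n_neg)

consumed = list(range(10))                    # a user who consumed 10 items
negs = sample_negatives(consumed, n_items=100)
print(len(negs))  # 2 negatives, per the 0.2 ratio
```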

Other hyper-parameters:

  • s: the shift constant, a hyper-parameter that controls the density of the SPPMI matrix. Default: s = 1.
  • data_path: path to the data. Default: data.
  • saved_model_path: path where the optimal model (selected on the validation/development dataset) is saved. Default: MODELS.
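The shift s comes from the SPPMI (shifted positive pointwise mutual information) construction used in this line of work. A pure-Python sketch on made-up co-occurrence counts shows why a larger s yields a sparser matrix; this illustrates the standard formula and may differ in details from the repo's code:

```python
import math

# Made-up symmetric item-item co-occurrence counts (toy data).
counts = {(0, 1): 4, (1, 0): 4, (0, 2): 1, (2, 0): 1, (1, 2): 2, (2, 1): 2}

def sppmi(counts, s=1.0):
    """Shifted positive PMI per entry:
    max(log(#(i,j) * D / (#(i) * #(j))) - log(s), 0),
    where D is the total count and #(i) the marginal count of item i.
    A larger shift s subtracts more, zeroing out more entries."""
    total = sum(counts.values())
    marg = {}
    for (i, _), c in counts.items():
        marg[i] = marg.get(i, 0) + c
    return {(i, j): max(math.log(c * total / (marg[i] * marg[j])) - math.log(s), 0.0)
            for (i, j), c in counts.items()}

m1 = sppmi(counts, s=1.0)
m5 = sppmi(counts, s=5.0)
# Increasing s can only shrink entries or zero them out.
print(all(m5[k] <= m1[k] for k in counts))  # True
```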

You may get some results like:

top-5 results: recall@5 = 0.1559, ndcg@5 = 0.1613, map@5 = 0.1076
top-10 results: recall@10 = 0.1513, ndcg@10 = 0.1547, map@10 = 0.0851
top-20 results: recall@20 = 0.1477, ndcg@20 = 0.1473, map@20 = 0.0669
top-50 results: recall@50 = 0.1819, ndcg@50 = 0.1553, map@50 = 0.0562
top-100 results: recall@100 = 0.2533, ndcg@100 = 0.1825, map@100 = 0.0579
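For reference, recall@k and ndcg@k with binary relevance can be computed as below. This is a common textbook formulation on made-up data and may differ in normalization details from the repo's evaluation code:

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of the user's held-out items recovered in the top-k.
    (Some implementations instead normalize by min(k, |relevant|).)"""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the top-k over the ideal DCG."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    idcg = sum(1.0 / math.log2(pos + 2)
               for pos in range(min(k, len(relevant))))
    return dcg / idcg

ranked = [5, 3, 9, 1, 7]   # made-up ranked recommendations for one user
relevant = {3, 7, 8}       # made-up held-out test items
print(round(recall_at_k(ranked, relevant, 5), 4))  # 0.6667
print(round(ndcg_at_k(ranked, relevant, 5), 4))    # 0.4776
```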

Running some baselines (Cofactor, WMF):

  • Running Cofactor:

python rme_rec.py --dataset ml10m --model cofactor --n_factors 40 --reg 1.0 --reg_embed 1.0

You may get the results like:

top-5 results: recall@5 = 0.1522, ndcg@5 = 0.1537, map@5 = 0.1000
top-10 results: recall@10 = 0.1383, ndcg@10 = 0.1425, map@10 = 0.0756
top-20 results: recall@20 = 0.1438, ndcg@20 = 0.1391, map@20 = 0.0606
top-50 results: recall@50 = 0.1762, ndcg@50 = 0.1484, map@50 = 0.0518
top-100 results: recall@100 = 0.2545, ndcg@100 = 0.1783, map@100 = 0.0540
  • Running WMF:

python rme_rec.py --dataset ml10m --model wmf --n_factors 40 --reg 1.0 --reg_embed 1.0

You may get the results like:

top-5 results: recall@5 = 0.1258, ndcg@5 = 0.1283, map@5 = 0.0810
top-10 results: recall@10 = 0.1209, ndcg@10 = 0.1231, map@10 = 0.0624
top-20 results: recall@20 = 0.1290, ndcg@20 = 0.1230, map@20 = 0.0507
top-50 results: recall@50 = 0.1641, ndcg@50 = 0.1349, map@50 = 0.0442
top-100 results: recall@100 = 0.2375, ndcg@100 = 0.1640, map@100 = 0.0470

CITATION:

If you use our method (RME) in your paper, please cite:

@inproceedings{tran2018regularizing,
 title={Regularizing Matrix Factorization with User and Item Embeddings for Recommendation},
 author={Tran, Thanh and Lee, Kyumin and Liao, Yiming and Lee, Dongwon},
 booktitle={Proceedings of the 27th ACM International Conference on Information and Knowledge Management},
 pages={687--696},
 year={2018},
 organization={ACM}
}
