First-time Map/Reduce user here, using MongoDB. I have a lot of page visit data that I'd like to make sense of with Map/Reduce. Below is basically what I want to do, but as a total beginner at Map/Reduce, I think it's beyond my current knowledge!

  1. Go through all the pages with visits in the last 30 days, and where external = true.
  2. Then, for each page, find all visits.
  3. Group all visits by referral location.
  4. For each referral location, calculate how many visitors then went on to visit a page that has a certain "type" and a certain word in its "tags".

The database and collection are organised as

$mongo->dbname->visits

A sample document is:

{"url": "www.example.com", "type": "a", "refer": {"external": true, "domain": "twitter.com", "url": "http://www.twitter.com/page"}, "page": "1235", "user": "1232", "time": 1234567890}

And then I want to find documents of type B with a certain tag.

{"url": "www.example.com", "type": "b", "page": "745", "user": "1232", "time": 1234567890, "tags": ["a", "b", "c"]}

I'm using the normal Mongo PHP extension if that has an impact.

halfdan
asked Jun 9, 2010 at 2:57
  • What database structure do you have? How are your collections and documents organized? Commented Jun 9, 2010 at 4:46
  • Added to above post. That help? Commented Jun 9, 2010 at 12:01
  • OK, your sample document does not include a "referral", an "external", or a "tags" field. What you're suggesting is indeed complicated, so you'll probably need to show us more than one document, with all of the details. Commented Jun 10, 2010 at 5:43
  • I've been working on something that is exactly the same as this (visit tracking using mongo), post a few more details and I can perhaps help. Commented Jun 10, 2010 at 10:56
  • Updated. Does this provide any more info for you guys? Thanks. Commented Jun 10, 2010 at 13:29

2 Answers

OK, I've come up with something that I think may do what you want. Note that it may not work exactly as written, since I'm not 100% sure of your schema: your examples show refer on type a documents but not on type b, and I'm not sure whether that's an omission, given that you want to group by referrer. Anyway, here's what I've come up with:

The map function:

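function() {
    // Emit the same shape that reduce returns: counts keyed by type and by tag.
    var obj = {
        "types": {},
        "tags": {}
    };
    obj.types[this.type] = 1;
    if (this.tags) {
        for (var tag in this.tags) {
            obj.tags[this.tags[tag]] = 1;
        }
    }
    // Group by the referring URL.
    emit(this.refer.url, obj);
}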

The Reduce function:

function(key, values) {
    var obj = {
        "types": {},
        "tags": {}
    };
    for (var i = 0; i < values.length; i++) {
        for (var type in values[i].types) {
            // Parentheses matter: !type in obj.types would test (!type) in obj.types.
            if (!(type in obj.types)) {
                obj.types[type] = 0;
            }
            obj.types[type] += values[i].types[type];
        }
        for (var tag in values[i].tags) {
            if (!(tag in obj.tags)) {
                obj.tags[tag] = 0;
            }
            obj.tags[tag] += values[i].tags[tag];
        }
    }
    return obj;
}
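
A note on the membership test in a reduce function of this shape: in JavaScript, `!` binds more tightly than `in`, so the `in` expression has to be wrapped in parentheses before it is negated. Without them the counter is never initialised and the sum becomes NaN. A small standalone sketch (the variable names here are made up for illustration):

```javascript
const counts = {};
const type = "a";

// Without parentheses this parses as (!type) in counts, i.e. "false" in counts.
const unparenthesized = !type in counts;
// With parentheses it asks the intended question: is the key missing?
const parenthesized = !(type in counts);

console.log(unparenthesized); // false
console.log(parenthesized);   // true

// Consequence for a running total: the initialisation is skipped,
// so undefined + 1 yields NaN.
const broken = {};
if (!type in broken) broken[type] = 0;
broken[type] += 1;

const fixed = {};
if (!(type in fixed)) fixed[type] = 0;
fixed[type] += 1;

console.log(broken[type], fixed[type]); // NaN 1
```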

So basically, this is how it works. The map function uses refer.url as its key (my guess, based on your description), so the output has one entry per referring URL, with _id equal to refer.url. Each value is an object holding two sub-objects, types and tags. Map emits that object and reduce returns one in the same format, which matters because MongoDB may feed the output of one reduce call back into another. Other than that, I think it should be relatively self-explanatory (if you don't understand something, I can try to explain more).
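
To see the shape of the output without a running MongoDB, the grouping can be imitated in plain JavaScript. This is only a sketch under assumptions: simulateMapReduce, mapVisit, and reduceVisits are hypothetical names, the sample visits are made up, and emit is passed in as a parameter rather than being the global that MongoDB provides.

```javascript
// Made-up visits shaped like the documents in the question.
const visits = [
  { type: "a", refer: { external: true, url: "http://www.twitter.com/page" } },
  { type: "b", refer: { external: true, url: "http://www.twitter.com/page" }, tags: ["a", "b"] },
  { type: "b", refer: { external: true, url: "http://www.twitter.com/page" }, tags: ["b"] },
  { type: "b", refer: { external: true, url: "http://example.org/" }, tags: ["c"] },
];

// Same idea as the map function: one { types, tags } object per visit,
// keyed by the referring URL.
function mapVisit(doc, emit) {
  const obj = { types: {}, tags: {} };
  obj.types[doc.type] = 1;
  for (const tag of doc.tags || []) obj.tags[tag] = 1;
  emit(doc.refer.url, obj);
}

// Same idea as the reduce function: sum the counters field by field.
function reduceVisits(key, values) {
  const obj = { types: {}, tags: {} };
  for (const value of values) {
    for (const field of ["types", "tags"]) {
      for (const name in value[field]) {
        obj[field][name] = (obj[field][name] || 0) + value[field][name];
      }
    }
  }
  return obj;
}

// Hypothetical in-memory stand-in for the mapReduce command.
function simulateMapReduce(docs, map, reduce) {
  const grouped = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!grouped.has(key)) grouped.set(key, []);
      grouped.get(key).push(value);
    });
  }
  // One row per key, in the { _id, value } shape of the output collection.
  return [...grouped].map(([key, values]) => ({
    _id: key,
    // MongoDB does not call reduce for a key with a single emitted value.
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

const results = simulateMapReduce(visits, mapVisit, reduceVisits);
console.log(JSON.stringify(results, null, 2));
```

With this sample data there are two rows: the Twitter URL with types a: 1, b: 2 and tags a: 1, b: 2, and the example.org URL with types b: 1 and tags c: 1.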

So let's implement this in PHP (assuming, for brevity, that $map and $reduce are strings containing the functions above):

$mapFunc = new MongoCode($map);
$reduceFunc = new MongoCode($reduce);
$query = array(
    // Visits from the last 30 days (time is a Unix timestamp in seconds).
    'time' => array('$gte' => time() - (60*60*24*30)),
    'refer.external' => true
);
$collection = 'visits';
$command = array(
    'mapreduce' => $collection,
    'map' => $mapFunc,
    'reduce' => $reduceFunc,
    'query' => $query,
    // Note: MongoDB 1.8 and later also require an 'out' option here.
);
$statsInfo = $db->command($command);
// The command result names the collection that holds the output.
$statsCollection = $db->selectCollection($statsInfo['result']);
$stats = $statsCollection->find();
foreach ($stats as $stat) {
    echo $stat['_id'] .' Visited ';
    foreach ($stat['value']['types'] as $type => $times) {
        echo "Type $type $times Times, ";
    }
    foreach ($stat['value']['tags'] as $tag => $times) {
        echo "Tag $tag $times Times, ";
    }
    echo "\n";
}
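
As a quick check of the time window used in the query: the sample documents store time as a Unix timestamp in seconds, so a 30-day window is 60*60*24*30 = 2,592,000 seconds (an extra factor of 60 would stretch it to 1,800 days). A sketch in JavaScript, where cutoffFor is a hypothetical helper name:

```javascript
const SECONDS_PER_DAY = 60 * 60 * 24; // 86400

// Cutoff timestamp (in seconds) for "the last `days` days".
// nowSeconds is a parameter so the arithmetic is easy to check by hand.
function cutoffFor(nowSeconds, days) {
  return nowSeconds - days * SECONDS_PER_DAY;
}

const windowSeconds = 30 * SECONDS_PER_DAY;
console.log(windowSeconds); // 2592000

// Using the sample timestamp from the question as "now":
console.log(cutoffFor(1234567890, 30)); // 1231975890

// Equivalent of PHP's time(): whole seconds since the epoch.
const nowSeconds = Math.floor(Date.now() / 1000);
console.log(cutoffFor(nowSeconds, 30));
```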

Note, I haven't tested this. This is just what I've come up with based on my understanding of your schema, and from my understanding of Mongo and its Map-Reduce implementation...
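
Once the output rows exist, picking out the figures for one type and one tag is a plain lookup. A minimal sketch, assuming rows in the { _id, value: { types, tags } } shape produced above; countsFor is a hypothetical helper and the counts are made up. Note that the type and tag totals are counted independently, so this does not give the number of visits that have both the type and the tag.

```javascript
// Made-up output rows, one per referring URL.
const rows = [
  { _id: "http://www.twitter.com/page", value: { types: { a: 3, b: 2 }, tags: { a: 2, c: 1 } } },
  { _id: "http://example.org/", value: { types: { a: 1 }, tags: {} } },
];

// For each referrer, look up the count for one type and one tag,
// treating a missing key as zero.
function countsFor(rows, type, tag) {
  return rows.map((row) => ({
    referrer: row._id,
    typeVisits: row.value.types[type] || 0,
    tagVisits: row.value.tags[tag] || 0,
  }));
}

const summary = countsFor(rows, "b", "c");
console.log(summary);
```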

answered Jun 16, 2010 at 13:01
1 Comment

$statsCollection = $db->selectCollection($sales['result']); $sales?
0
answered Mar 15, 2011 at 17:32
