I'm new to the map reduce concept, and even though I'm making some slow progress, I've run into some issues that I need help with.

I have a simple collection consisting of an id, a city and a destination, something like this:

{ "_id" : "5230e7e00000000000000000", "city" : "Boston", "to" : "Chicago" },
{ "_id" : "523fe7e00000000000000000", "city" : "New York", "to" : "Miami" },
{ "_id" : "5240e1e00000000000000000", "city" : "Boston", "to" : "Miami" },
{ "_id" : "536fe4e00000000000000000", "city" : "Washington D.C.", "to" : "Boston" },
{ "_id" : "53ffe7e00000000000000000", "city" : "New York", "to" : "Boston" },
{ "_id" : "5740e1e00000000000000000", "city" : "Boston", "to" : "Miami" },
...

(Please do note that this data is just made up for example purposes)

I'd like to group the destinations by city, including a count for each destination:

{ "city" : "Boston", values : [{"Chicago",1}, {"Miami",2}] }
{ "city" : "New York", values : [{"Miami",1}, {"Boston",1}] }
{ "city" : "Washington D.C.", values : [{"Boston", 1}] }

For this, I've started playing with this map function:

 function() {
     emit(this.city, this.to);
 }

which performs the expected grouping. My reduce function is this:

 function(key, values) {
     var reduced = {"to":[]};
     for (var i in values) {
         var item = values[i];
         reduced.to.push(item);
     }
     return reduced;
 }

which gives output close to what I expected:

{ "_id" : ObjectId("522f8a9181f01e671a853adb"), "value" : { "to" : [ "Boston", "Miami" ] } }
{ "_id" : ObjectId("522f933a81f01e671a853ade"), "value" : { "to" : [ "Chicago", "Miami", "Miami" ] } }
{ "_id" : ObjectId("5231f0ed81f01e671a853ae0"), "value" : "Boston" }

As you can see, I haven't counted the repeated cities yet, but for some reason the last result in the output doesn't look right. I'd expected it to be

{ "_id" : ObjectId("5231f0ed81f01e671a853ae0"), "value" : { "to" : ["Boston"] } }

Does this have anything to do with the fact that there is a single item? Is there any way to get the output I expected?

Thank you.

asked Sep 18, 2013 at 17:48
  • Is there a reason that you've picked map reduce over the aggregation framework to do this (there are valid reasons, but the AF is generally a better choice)? Commented Sep 18, 2013 at 20:17
  • No other reason than being a complete newbie on this. I read that both map reduce and the AF would be superior to "Single Purpose Aggregation Operations", so I started with the one most "popular" online. Commented Sep 18, 2013 at 23:48

1 Answer

I see you are asking about a PHP issue, but you are using JavaScript to ask, so I’m assuming a JavaScript answer will help you move things along. As such, here is the JavaScript needed in the shell to run your aggregation. I strongly suggest getting your aggregation working in the shell (or some other JavaScript editor) first and then translating it into the language of your choice. It is a lot easier to see what is going on, and you get there faster, using this method. You can then run:

use admin
db.runCommand( { setParameter: 1, logLevel: 2 } )

to check the BSON output of your selected language against what the shell produces. This will appear in the terminal if mongo is in the foreground; otherwise you’ll have to look in the logs.

Summing the routes in the aggregation framework [AF] with Mongo is fairly straightforward. The AF is faster and easier to use than map reduce [MR]. In this case, though, they both have a similar issue: simply pushing to an array won’t yield a count in and of itself (in MR you need either more logic in your reduce function or a finalize function).
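On the MR side, it helps to know that MongoDB does not call the reduce function for a key that has only one emitted value, so any wrapping done only inside reduce is skipped for a city with a single destination. A minimal sketch of the "more logic plus finalize" route follows. The `simulateMapReduce` helper and the module-level `emit` are hypothetical stand-ins written in plain JavaScript so the example runs without a server; they only mimic the behaviour described here and are not MongoDB's implementation.

```javascript
// Sample documents copied from the question (the _id field is omitted).
const docs = [
  { city: "Boston", to: "Chicago" },
  { city: "New York", to: "Miami" },
  { city: "Boston", to: "Miami" },
  { city: "Washington D.C.", to: "Boston" },
  { city: "New York", to: "Boston" },
  { city: "Boston", to: "Miami" }
];

// Emit a one-entry count object, so the emitted value already has the
// same shape as whatever reduce returns.
function map() {
  const counts = {};
  counts[this.to] = 1;
  emit(this.city, counts);
}

// Merge count objects. Because input and output share one shape, this is
// also safe if it is called again on partially reduced values.
function reduce(key, values) {
  const merged = {};
  values.forEach(function (counts) {
    Object.keys(counts).forEach(function (dest) {
      merged[dest] = (merged[dest] || 0) + counts[dest];
    });
  });
  return merged;
}

// Runs for every key, including keys that skipped reduce.
function finalize(key, counts) {
  return Object.keys(counts).map(function (dest) {
    return { to: dest, count: counts[dest] };
  });
}

// --- Hypothetical stand-in for the server (assumption, not MongoDB code) ---
const groups = new Map();
function emit(key, value) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}

function simulateMapReduce(input, mapFn, reduceFn, finalizeFn) {
  groups.clear();
  input.forEach(function (doc) { mapFn.call(doc); });
  const out = [];
  groups.forEach(function (values, key) {
    // A key with a single emitted value never reaches reduce.
    const reduced = values.length > 1 ? reduceFn(key, values) : values[0];
    out.push({ _id: key, value: finalizeFn(key, reduced) });
  });
  return out;
}

const results = simulateMapReduce(docs, map, reduce, finalize);
console.log(JSON.stringify(results, null, 2));
```

In the shell, the same three functions could be passed as `db.agg1.mapReduce(map, reduce, { out: { inline: 1 }, finalize: finalize })`, using the collection name from the pipeline in this answer.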

With the AF using the example data provided this pipeline is useful:

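db.agg1.aggregate([
    // Stage 1: count each (city, destination) pair.
    { $group: {
        _id: { city: "$city", to: "$to" },
        count: { $sum: 1 }
    }},
    // Stage 2: collect the counted destinations per city.
    { $group: {
        _id: "$_id.city",
        to: { $push: { to: "$_id.to", count: "$count" } }
    }}
]);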

The aggregation framework can only operate on known fields, but it allows many pipeline stages, so a problem needs to be broken down with that in mind. Above, the first stage calculates the counts needed, using three fixed fields: the source, the destination, and the count. The second stage has two fixed fields, one of which is an array that is only pushed to (all the data for the final form is already there).
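As a rough illustration of what each stage hands to the next, here is a plain-JavaScript walk-through of the same two groupings over the question's sample data. `stageOne` and `stageTwo` are hypothetical names chosen for this sketch; this is not how the server evaluates `$group`, and the real pipeline does not guarantee the order of its output documents.

```javascript
// Sample documents copied from the question (the _id field is omitted).
const docs = [
  { city: "Boston", to: "Chicago" },
  { city: "New York", to: "Miami" },
  { city: "Boston", to: "Miami" },
  { city: "Washington D.C.", to: "Boston" },
  { city: "New York", to: "Boston" },
  { city: "Boston", to: "Miami" }
];

// Mirrors the first $group: one document per (city, to) pair with a count.
function stageOne(input) {
  const byRoute = new Map();
  input.forEach(function (doc) {
    const key = JSON.stringify([doc.city, doc.to]);
    if (!byRoute.has(key)) {
      byRoute.set(key, { _id: { city: doc.city, to: doc.to }, count: 0 });
    }
    byRoute.get(key).count += 1;
  });
  return Array.from(byRoute.values());
}

// Mirrors the second $group: one document per city, pushing each route.
function stageTwo(routes) {
  const byCity = new Map();
  routes.forEach(function (route) {
    const city = route._id.city;
    if (!byCity.has(city)) byCity.set(city, { _id: city, to: [] });
    byCity.get(city).to.push({ to: route._id.to, count: route.count });
  });
  return Array.from(byCity.values());
}

const routes = stageOne(docs);
const grouped = stageTwo(routes);
console.log(JSON.stringify(grouped, null, 2));
```

The intermediate `routes` array has five entries, one per distinct route, which is the point at which all the counting is already done.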

For MR you can do this:

var map = function() {
    var key = {source:this.city, dest:this.to};
    emit(key, 1);
};
var reduce = function(key, values) {
    return Array.sum(values);
};

A separate function will have to prettify the result, however.
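One possible shape for that separate function is sketched below. `prettify` is a hypothetical helper name, and `mrResults` is a hand-written sample of the `{ _id: { source, dest }, value }` documents that the map and reduce functions above would be expected to produce for the example data; it is not captured output.

```javascript
// Hand-written sample of the map-reduce output (assumption, see above).
const mrResults = [
  { _id: { source: "Boston", dest: "Chicago" }, value: 1 },
  { _id: { source: "Boston", dest: "Miami" }, value: 2 },
  { _id: { source: "New York", dest: "Boston" }, value: 1 },
  { _id: { source: "New York", dest: "Miami" }, value: 1 },
  { _id: { source: "Washington D.C.", dest: "Boston" }, value: 1 }
];

// Regroup the flat per-route counts into one entry per source city.
function prettify(rows) {
  const byCity = new Map();
  rows.forEach(function (row) {
    const city = row._id.source;
    if (!byCity.has(city)) byCity.set(city, { city: city, values: [] });
    byCity.get(city).values.push({ to: row._id.dest, count: row.value });
  });
  return Array.from(byCity.values());
}

const pretty = prettify(mrResults);
console.log(JSON.stringify(pretty, null, 2));
```

This runs on the client after the map-reduce results have been fetched, so it works the same way whether the rows come from the shell or from a driver.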

If you have any additional questions please don’t hesitate to ask.

Best, Charlie

answered Sep 18, 2013 at 22:17

1 Comment

Thanks so much for this. It answers my question as thoroughly as I could think of. I think I can continue from here, but will let you know if I run into some other issue. Regarding my PHP comment, I was actually wondering if the PHP driver would allow any way to preprocess the input collection using PHP code, as the examples I've seen make use of a helper class "MongoCode" to inline the JS code. But that should probably go in a separate question (after I do some research myself).
