I'm mining 500 million users and their "followers" from a social network through its API. Extracting the data is not a problem, since my scripts already handle it. However, holding 500 million users and their followers in in-memory lists is very costly.

My script creates two lists: one of users whose followers I have already fetched, and one of users still to be processed. (For each user, I fetch their followers, add them to the queue, write the result to a file, and move on to the next user.) That means two very long lists that I cannot hold in memory, so I thought of using a database.
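The two lists described above could be kept on disk rather than in memory. The following is only a minimal sketch, assuming SQLite is acceptable for the crawl state; the table layout and the names `open_frontier`, `enqueue`, `next_pending`, `crawl`, `fetch_followers`, and `on_edges` are all hypothetical, and `fetch_followers` stands in for the real API call.

```python
import sqlite3

def open_frontier(path=":memory:"):
    """Open (or create) the crawl-state table. Pass a file path to keep it on disk."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        " user_id INTEGER PRIMARY KEY,"
        " done INTEGER NOT NULL DEFAULT 0)"
    )
    # Index on the flag so that finding the next pending user avoids a full scan.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_users_done ON users (done)")
    return conn

def enqueue(conn, user_ids):
    # INSERT OR IGNORE skips IDs that are already known, so nobody is queued twice
    # and users already marked done keep their flag.
    conn.executemany(
        "INSERT OR IGNORE INTO users (user_id) VALUES (?)",
        ((uid,) for uid in user_ids),
    )

def next_pending(conn):
    """Return one user ID that has not been processed yet, or None if none is left."""
    row = conn.execute("SELECT user_id FROM users WHERE done = 0 LIMIT 1").fetchone()
    return row[0] if row else None

def crawl(conn, seed, fetch_followers, on_edges):
    """Process users until the queue is empty.

    fetch_followers(user_id) -> iterable of follower IDs (hypothetical API wrapper).
    on_edges(user_id, followers) is called once per user, e.g. to write to a file.
    """
    enqueue(conn, [seed])
    while (uid := next_pending(conn)) is not None:
        followers = list(fetch_followers(uid))
        on_edges(uid, followers)
        enqueue(conn, followers)
        conn.execute("UPDATE users SET done = 1 WHERE user_id = ?", (uid,))
        conn.commit()  # commit per user so an interrupted crawl can resume
```

Here a single table with a `done` flag plays the role of both lists: rows with `done = 0` are the queue, and rows with `done = 1` are the users already fetched.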

So, finally, my question: is it better to use a relational database or a NoSQL graph database such as Neo4j? The only information I'm collecting now is each user's ID and the IDs of their followers, which I want to analyse later (for graph theory research). I thought of a database partly because I might add more information later as well.

Thank you.

asked Aug 28, 2013 at 2:50
  • Yes, something like Neo4j makes a lot of sense for this kind of data, especially if you want to analyse graph structures later on. Commented Jun 2, 2015 at 5:20

1 Answer

On the surface, this sounds like a graph database problem. If you're going to be walking the edges between users, Neo4j or something similar may be the one for you.

You might be able to do more generic processing with a document database, where every user is a document whose _id is the user_id and which holds an array of follower _ids.
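To make that document shape concrete, here is a rough sketch using plain Python dicts rather than a database driver. The field name `followers` and the helper `make_user_doc` are assumptions made for illustration, not part of any library.

```python
import json

def make_user_doc(user_id, follower_ids):
    """Build one document per user: _id is the user ID, followers is a list of IDs."""
    # Deduplicate and sort so the same input always yields the same document.
    return {"_id": user_id, "followers": sorted(set(follower_ids))}

# Tiny made-up sample: user 1 is followed by 2 and 3, user 2 by 3, user 3 by nobody.
docs = [
    make_user_doc(1, [3, 2, 2]),
    make_user_doc(2, [3]),
    make_user_doc(3, []),
]

# One JSON document per line (newline-delimited JSON), ready to write to a file.
lines = [json.dumps(doc) for doc in docs]

# An example of "generic" processing: follower count per user.
follower_counts = {doc["_id"]: len(doc["followers"]) for doc in docs}
```

One caveat if this shape goes into MongoDB: a single document is capped at 16 MB, so accounts with a very large number of followers may need their follower array split across several documents.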

Perhaps you could output to MongoDB, then use Neo4j to build the graph(s) for specialised work and MongoDB for more general work. MapReduce and the aggregation framework in MongoDB are both pretty good (speaking from experience), although MapReduce is currently much more powerful than the aggregation framework.

Since the schema is likely to morph, and you do not know what the additional data will be, you might prefer a document or graph database over an RDBMS. If you prefer to work in a relational manner at a later point, you can generate CSV extracts to load into your RDBMS of choice once you have defined a schema.
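As a sketch of what such a CSV extract might look like, the snippet below flattens user documents into one row per edge. It assumes documents of the form `{"_id": ..., "followers": [...]}`; the function name `write_edge_csv` and the column names `user_id` and `follower_id` are hypothetical.

```python
import csv
import io

def write_edge_csv(docs, out):
    """Write one (user_id, follower_id) row per edge and return the number of edges."""
    writer = csv.writer(out)
    writer.writerow(["user_id", "follower_id"])  # header row
    count = 0
    for doc in docs:
        for follower_id in doc["followers"]:
            writer.writerow([doc["_id"], follower_id])
            count += 1
    return count

# Made-up sample documents; in practice these would be streamed from the database.
sample_docs = [
    {"_id": 1, "followers": [2, 3]},
    {"_id": 2, "followers": [3]},
    {"_id": 3, "followers": []},
]

buffer = io.StringIO()  # stands in for a real file opened with newline=""
edge_count = write_edge_csv(sample_docs, buffer)
```

A two-column edge table like this maps directly onto a relational table with one row per follower relationship, and further columns can be added once the schema is settled.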

answered Sep 4, 2013 at 9:21
