1

Okay so this isn't a specific question; I just need some guidelines as to how to go about doing a final project for my class in python, version 2.7. What the program basically has to do is import information from 2 text files, one with just a list of books and their author seperated by a comma and lines:
ex:
Douglas Adams,The Hitchhiker's Guide To The Galaxy
Richard Adams,Watership Down
Mitch Albom,The Five People You Meet in Heaven

and then a file of users and their name on one line and with their rating of the 55 total books on the next(text box isn't big enough to put all ratings on one line): ex:
Ben
5 0 0 0 0 0 0 1 0 1 -3 5 0 0 0 5 5 0 0 0 0 5 0 0 0 0 0 0 0 0 1 3 0 1 0 -5 0 0 5 5 0 5 5 5 0 5 5 0 0 0 5 5 5 5 -5
Moose
5 5 0 0 0 0 3 0 0 1 0 5 3 0 5 0 3 3 5 0 0 0 0 0 5 0 0 0 0 0 3 5 0 0 0 0 0 5 -3 0 0 0 5 0 0 0 0 0 0 5 5 0 3 0 0

Anyways, the actual project has you use an algorithm provided by the instructor to compare the users interests in books. You input a user and it searches all of the other 87 users by multiplying their rating of the book and adding them all together. For example, for the first book Ben has a rating of 5 and so does Moose, so the score of similarity for Moose is 25, you do this for every book multiplying the inputted users ratings with all of the other users and whomever has the closest similarity score, you output them and refer 5 books to the user. Basically books that the closest similarity score user read and rated high but the inputted user hasn't read. Okay now after that huge wall of text which I'm sorry about. I understand what I have to do but I just cannot figure out a simple way to do this task, I'm not looking for someone to do the project for me or anything like that its just that this project is worth a large portion of our grade and I don't even know where to start. If anybody could even lead me in the right direction as to what type of data structures would be easiest to accomplish this task I would be extremely grateful, again sorry for such a long post but I'm desperate.

Wooble
90.6k12 gold badges111 silver badges132 bronze badges
asked Apr 12, 2012 at 3:28
6
  • What have you tried? What do or don't you know how to do? Have you tried starting by breaking the problem down into logical steps and listing them? What have you accomplished in Python before? Commented Apr 12, 2012 at 3:36
  • 1
    Okay well to start I broke the two files into lists. I made a list of all the books; a list of all the users, and a list of all the ratings. but from there I basically don't have any idea how to move forward. Commented Apr 12, 2012 at 3:42
  • Make a dictionary with the key being a user and the value being the corresponding ratings. Commented Apr 12, 2012 at 4:23
  • How could I go about doing this, if I have a list of all the users, and a list of all the ratings? I was never very good with dictionaries. Commented Apr 12, 2012 at 5:02
  • Well, you read names and rating-lists alternately, right? So, read a name; read a rating-list; add the information to the dictionary, with the key being the name and the value being the rating-list. What's the problem? Commented Apr 12, 2012 at 5:21

1 Answer 1

1

The algorithm you speak of sounds an awful lot like the Vector Space model (also this page). Conceive of each user's score as a 55-dimensonal vector (forming a line in 55-dimensional space), and you are comparing the similarities of the user's lines by computing how close their angles are to one another.

Anyway, your application has two basic parts:

  1. document parsing to build a data structure
  2. implementing your algorithm using the created structures

Notice what these have in common is that you need to decide on a data structure, so the data structure you use is central to your application.

The simplest thing that could possibly work is two lists. One list is the book data: books = [('author', 'book'), ...]. The other is the score data: scores = [('user', [1,2,3,4,...]), ...]. And then you make sure that scores[n][1][m] is the score corresponding to books[m]. Then you make sure the functions that implement the algorithm accepts these structures.

I would make sure you can get this method working first. Then you can look at higher levels of abstraction.

You can bundle your data with their operations using classes and objects to provide a higher level of abstraction. For example, you can store your book records as named tuples and have another object that holds a set of book records and has methods for book lookup (e.g. findByIndex, findByAuthor, etc), and a similar one for scores. You can supply a scoreset with a bookset so the score can look up book records from a score index. You can create a scoring class that accepts a scoreset and performs operations on it, like returning a list of highest-scoring book records for a given user in the scoreset. And so on.

answered Apr 12, 2012 at 3:52
Sign up to request clarification or add additional context in comments.

6 Comments

I have made the list of tuples for the books correctly but having an issue making the list of tuples of the scores. I do make the tuple, and the user correctly goes with that users scores, but the scores come as a string, how can I fix this?
parse the string. Most likely [int(n) for n in '1,2,3'.split(',')]
Okay, got it to work. So now both of the lists are good. Where can I go from here?
Is this the stack overflow equivalent of stone soup?
I guess what I need to know to continue is, what is an easy way to multiply all the inputted users ratings with the ratings of all the other users and how would I go about doing so?
|

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.