Okay so this isn't a specific question; I just need some guidelines as to how to go about doing a final project for my class in python, version 2.7. What the program basically has to do is import information from 2 text files, one with just a list of books and their author seperated by a comma and lines:
ex:
Douglas Adams,The Hitchhiker's Guide To The Galaxy
Richard Adams,Watership Down
Mitch Albom,The Five People You Meet in Heaven
and then a file of users and their name on one line and with their rating of the 55 total books on the next(text box isn't big enough to put all ratings on one line):
ex:
Ben
5 0 0 0 0 0 0 1 0 1 -3 5 0 0 0 5 5 0 0 0 0 5 0 0 0 0 0 0 0 0 1 3 0 1 0 -5 0 0 5 5 0 5 5 5 0 5 5 0 0 0 5 5 5 5 -5
Moose
5 5 0 0 0 0 3 0 0 1 0 5 3 0 5 0 3 3 5 0 0 0 0 0 5 0 0 0 0 0 3 5 0 0 0 0 0 5 -3 0 0 0 5 0 0 0 0 0 0 5 5 0 3 0 0
Anyways, the actual project has you use an algorithm provided by the instructor to compare the users interests in books. You input a user and it searches all of the other 87 users by multiplying their rating of the book and adding them all together. For example, for the first book Ben has a rating of 5 and so does Moose, so the score of similarity for Moose is 25, you do this for every book multiplying the inputted users ratings with all of the other users and whomever has the closest similarity score, you output them and refer 5 books to the user. Basically books that the closest similarity score user read and rated high but the inputted user hasn't read. Okay now after that huge wall of text which I'm sorry about. I understand what I have to do but I just cannot figure out a simple way to do this task, I'm not looking for someone to do the project for me or anything like that its just that this project is worth a large portion of our grade and I don't even know where to start. If anybody could even lead me in the right direction as to what type of data structures would be easiest to accomplish this task I would be extremely grateful, again sorry for such a long post but I'm desperate.
-
What have you tried? What do or don't you know how to do? Have you tried starting by breaking the problem down into logical steps and listing them? What have you accomplished in Python before?Karl Knechtel– Karl Knechtel2012年04月12日 03:36:30 +00:00Commented Apr 12, 2012 at 3:36
-
1Okay well to start I broke the two files into lists. I made a list of all the books; a list of all the users, and a list of all the ratings. but from there I basically don't have any idea how to move forward.Mike– Mike2012年04月12日 03:42:01 +00:00Commented Apr 12, 2012 at 3:42
-
Make a dictionary with the key being a user and the value being the corresponding ratings.Karl Knechtel– Karl Knechtel2012年04月12日 04:23:03 +00:00Commented Apr 12, 2012 at 4:23
-
How could I go about doing this, if I have a list of all the users, and a list of all the ratings? I was never very good with dictionaries.Mike– Mike2012年04月12日 05:02:43 +00:00Commented Apr 12, 2012 at 5:02
-
Well, you read names and rating-lists alternately, right? So, read a name; read a rating-list; add the information to the dictionary, with the key being the name and the value being the rating-list. What's the problem?Karl Knechtel– Karl Knechtel2012年04月12日 05:21:08 +00:00Commented Apr 12, 2012 at 5:21
1 Answer 1
The algorithm you speak of sounds an awful lot like the Vector Space model (also this page). Conceive of each user's score as a 55-dimensonal vector (forming a line in 55-dimensional space), and you are comparing the similarities of the user's lines by computing how close their angles are to one another.
Anyway, your application has two basic parts:
- document parsing to build a data structure
- implementing your algorithm using the created structures
Notice what these have in common is that you need to decide on a data structure, so the data structure you use is central to your application.
The simplest thing that could possibly work is two lists. One list is the book data: books = [('author', 'book'), ...]. The other is the score data: scores = [('user', [1,2,3,4,...]), ...]. And then you make sure that scores[n][1][m] is the score corresponding to books[m]. Then you make sure the functions that implement the algorithm accepts these structures.
I would make sure you can get this method working first. Then you can look at higher levels of abstraction.
You can bundle your data with their operations using classes and objects to provide a higher level of abstraction. For example, you can store your book records as named tuples and have another object that holds a set of book records and has methods for book lookup (e.g. findByIndex, findByAuthor, etc), and a similar one for scores. You can supply a scoreset with a bookset so the score can look up book records from a score index. You can create a scoring class that accepts a scoreset and performs operations on it, like returning a list of highest-scoring book records for a given user in the scoreset. And so on.
6 Comments
[int(n) for n in '1,2,3'.split(',')]Explore related questions
See similar questions with these tags.