- "Six Degrees of Kevin Bacon or 'Bacon's Law' is a parlour game based on the 'six degrees of separation' concept, which posits that any two people on Earth are six or fewer acquaintance links apart. Movie buffs challenge each other to find the shortest path between an arbitrary actor and prolific actor Kevin Bacon. It rests on the assumption that anyone involved in the Hollywood film industry can be linked through their film roles to Bacon within six steps." [source: wikipedia]
- In this assignment you will put this assertion to the test by building a large graph of movies and stars and computing the degrees of separation between Kevin Bacon and other performers
- To start, build a
Vertex class corresponding to each vertex of our graph. The constructor for this class should take a single string (performer) and initialize the three instance variables: (1) the vertex name, name, that will be initialized to the preformer's name that is passed to the constructor; (2) an empty list, adjacent, that will eventually contain other vertices with which the named performer has co-starred; and (3) a second emtpy list, edge, that will eventaully contain the movie in which the performer has co-starred with the corresponding perfomer in the adjacent list. It is important that you use exactly these instance variable names: name, adjacent, edge as you will eventually use some code that I provide that assumes this naming convention.
- For example, because Brad Pitt starred in Fight Club, Full Frontal, and Meet Joe Black (among others) with Ed Norton, Julia Roberts, and Anthony Hopkins (among others), the partial node for Brad Pitt will be:
- name = "Brad Pitt"
- adjacent = ["Ed Norton", "Julia Roberts", "Anthony Hopkins"]
- edge = ["Fight Club", "Full Frontal", "Meet Joe Black"]
- Your vertex class should also define a
__str__ function that returns the name of the vertex being printed followed by "|", followed by the name of the first movie in the edge list, followed by "/" followed by the corresponing co-star in adjacent. followed by "|". Repeat for each element in edge and adjacent. This function will be useful in verifying that you have built your graph properly.
- For example, in the above example, printing the vertex "Brad Pitt" produces the string "Brad Pitt|Fight Club/Ed Norton|Full Frontal/Julia Roberts|Meet Joe Black/Anthony Hopkins".
- Using your vertex class, build a graph from the data in movies.txt. Each line of this file starts with a movie name followed by a "/"-delimited list of those appearing in the movie (notice that the names are listed as
last, first).
- For example, a partial listing of the movie Fight Club is:
Fight Club (1999)/Termo, Leonard/Arquette, Richmond/Rosario, Marcio (I)/Pontino, J.T./Peddy, ...
- Due to a file encoding issue, open the file movies.txt with the following command:
myfile = open(filename,'r',encoding="ISO-8859-1")
- Parse this file to create a graph of all of the movies and performers. We will represent the graph using a dictionary indexed on the performer's name (a string). As you parse the file of
movies.txt, each time you extract a performer's name, determine if it is already a vertex in the graph. If it is not in the graph, create a new vertex for this performer, and populate the adjacenct and edge with all of the other performers in the movie (except, of course, for the performer that is represented by this vertex). If the performer is in the graph, update the adjacenct and edge lists with all of the other performers in the movie.
- Once you have built your graph, use the breadth-first search to determine the short path between two performers as follows:
start = "Hanks, Tom"
goal = "Bacon, Kevin"
if( (start not in graph) or (goal not in graph) ):
print( "no path found")
else:
path = bfs(graph[start],graph[goal])
path = path[::-1] # reverse (start --> goal)
print(path)
- For testing and debugging purposes, I suggest that you create a simple test text file with a handfull of nodes and edges. This text file must follow the same format as
movies.txt.