Algorithm
You enumerate `items` within a loop that also enumerates `items`. This makes your algorithm quadratic in the length of each sentence, which scales poorly. I think you can reduce this to a single pass over `items` by creating a `dict` that stores unique words as keys and lists of word indices as values. Then you can look up the indices of the words from your word lists and perform the distance calculation. Since `dict` lookups take constant time on average, this reduces the complexity to linear in the length of each sentence. Do note that the algorithm is still quadratic in the length of your word lists, so there may be some improvement to be had if your word lists are long.
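As a rough sketch of the index idea, the snippet below builds the word-to-positions `dict` in one pass and then looks up pairs from two word lists. The function names `build_index` and `pair_distances` are hypothetical, and using the absolute difference of positions as the distance is an assumption, since the original distance calculation isn't shown here.

```python
from collections import defaultdict


def build_index(items):
    """Map each unique word to the list of positions where it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(items):
        index[word].append(position)
    return index


def pair_distances(items, first_words, second_words):
    """Return (first, second, distance) for every co-occurring pair.

    Distance is assumed to be the absolute difference of word positions.
    """
    index = build_index(items)  # single pass over the sentence
    results = []
    for first in first_words:
        for second in second_words:
            # .get avoids inserting empty lists into the defaultdict
            for i in index.get(first, []):
                for j in index.get(second, []):
                    if i != j:
                        results.append((first, second, abs(i - j)))
    return results


sentence = "we use python because we like python".split()
print(pair_distances(sentence, ["use"], ["python"]))
```

Note that the lookup uses exact word matches, so `'use'` does not match `'because'` here.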
Correctness and Edge Cases
It's hard to tell exactly what this code is supposed to do, so I will be making a few assumptions. The handling of edge cases will vary depending on the requirements.
You likely have at least one bug in your code, which is reflected in your example output: you check `if x in item`, which evaluates to `True` for the string `'use'` in the word `'because'`. If this is not the desired behavior, you may want a stricter check such as equality, `x == item`, or something based on Levenshtein distance for a less strict evaluation.
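To make the difference concrete, here is a small sketch comparing the substring check, the equality check, and an edit-distance check. The `levenshtein` helper is a plain textbook implementation written for illustration, not something taken from your code or from a particular library.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


word, item = "use", "because"

substring_match = word in item   # True: 'use' occurs inside 'because'
exact_match = word == item       # False: the words differ
close_match = levenshtein(word, "used") <= 1  # True: one edit apart
```

Which of these is right depends on whether you want inflected forms such as `'used'` to count as matches.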
Another possible bug is that you never include the first word of a sentence in your results. Because `0` is falsy in Python, your check `if first_int:` will be `False` for every word whose index is `0`.
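A minimal illustration of the index-zero problem, assuming `first_int` holds either a word index or `None` when nothing was found (that sentinel is my assumption, and the helper names are hypothetical):

```python
def found_truthy(first_int):
    """Mirrors `if first_int:`; treats index 0 as 'not found'."""
    return bool(first_int)


def found_explicit(first_int):
    """Compares against None, so index 0 counts as a valid position."""
    return first_int is not None


print(found_truthy(0), found_explicit(0))
```

If your code uses a different sentinel for "not found", compare against that value instead of `None`.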
Code Style
Holy indentation, Batman! Deeply nested code is hard to read and understand, and usually indicates you can organize your code better. Usually the inner loops can be brought into their own function. You can sometimes reduce nesting by consolidating conditional statements. For example, an `if` statement immediately followed by another `if` with no `else` can be combined into a single condition:
if first_int:
    if second_int != first_int:
can be written on one line as
if first_int and second_int != first_int:
Short variable names like `w`, `t`, and `x` aren't very descriptive and make it hard for others to understand the code. Try to pick more descriptive names.
Make sure you don't include unnecessary logic. For example, your check `if dist in relations` will always be `False`, since you only insert tuples and `dist` is a float. It can be removed, saving you a line of code and a level of indentation.
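The membership point can be checked quickly. The tuple layout below is made up for illustration; only the fact that the container holds tuples while `dist` is a float comes from your code.

```python
# Hypothetical contents: (first word, second word, distance)
relations = [("use", "python", 1.0), ("like", "python", 1.0)]
dist = 1.0

float_found = dist in relations                     # False: a float never equals a tuple
tuple_found = ("use", "python", 1.0) in relations   # True: whole tuples are compared
```

If the intent was to avoid duplicate entries, the check would have to compare the whole tuple rather than the distance alone.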