Commonmark migration

edited Jun 10, 2020 at 13:24

Algorithm

Your code enumerates items inside a loop that also enumerates items, so the algorithm is quadratic in the length of each sentence. You can likely reduce this to a single pass over the items by building a dict that maps each unique word to a list of the indices where it occurs. You can then look up the indices of the words from your word lists and perform the distance calculation on those. Since dict lookups take constant time on average, this reduces the complexity to linear in the length of each sentence. Note that the algorithm is still quadratic in the length of your word lists, so there may be further room for improvement if those lists are long.
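As a rough sketch of the dict-based approach, assuming a sentence is already split into a list of words: the names `build_word_index` and `min_distance` are hypothetical, and the absolute index gap used here is only a placeholder for whatever distance your code actually computes.

```python
from collections import defaultdict


def build_word_index(words):
    """Map each unique word to the list of positions where it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(words):  # single pass over the sentence
        index[word].append(position)
    return index


def min_distance(index, first_word, second_word):
    """Smallest index gap between the two words, or None if either is absent.

    The gap is a stand-in metric (an assumption), not the original formula.
    """
    first_positions = index.get(first_word, [])
    second_positions = index.get(second_word, [])
    gaps = [
        abs(a - b)
        for a in first_positions
        for b in second_positions
        if a != b
    ]
    return min(gaps) if gaps else None


sentence = "the cat sat on the mat because the cat was tired".split()
index = build_word_index(sentence)
print(min_distance(index, "cat", "mat"))
```

The index is built once per sentence, so each word pair only touches the positions where those two words actually occur instead of rescanning the whole sentence.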

Correctness and Edge Cases

It's hard to tell exactly what this code is supposed to do, so I will be making a few assumptions. The handling of edge cases will vary depending on the requirements.

You likely have at least one bug in your code, which is reflected in your example output: you check `if x in item`, which evaluates to `True` for the string `'use'` in the word `'because'`. If this is not the desired behavior, you may want a stricter check such as equality, `x == item`, or something based on Levenshtein distance for a less strict match.
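To make the difference concrete, here is a small illustration. The sentence and word list are made up for the example; only `x` and `item` follow the names in your code.

```python
word_list = ["use"]
sentence = "I left because it was no use".split()

# Substring test: 'use' also matches inside 'because'.
substring_hits = [item for item in sentence for x in word_list if x in item]

# Equality test: only the standalone word matches.
exact_hits = [item for item in sentence for x in word_list if x == item]

print(substring_hits)  # ['because', 'use']
print(exact_hits)      # ['use']
```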

Another possible bug is that you never include the first word of a sentence in your results. Your check `if first_int:` is `False` for every word whose index is 0, because 0 is falsy in Python.
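One way around this, assuming the index is meant to be "not found" rather than "zero" when the word is missing, is to use `None` as the sentinel and test for it explicitly. The helper `find_position` is hypothetical and only stands in for however your code obtains `first_int`.

```python
def find_position(words, target):
    """Return the index of the first occurrence of target, or None if absent."""
    for position, word in enumerate(words):
        if word == target:
            return position
    return None


words = "use the index carefully".split()
first_int = find_position(words, "use")  # the first word, so the index is 0

truthy_check = bool(first_int)          # False: index 0 is treated as missing
explicit_check = first_int is not None  # True: index 0 is accepted

print(truthy_check, explicit_check)
```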

Code Style

Holy indentation, Batman! Deeply nested code is hard to read and understand, and usually indicates you can organize your code better. Usually the inner loops can be brought into their own function. You can sometimes reduce nesting by consolidating conditional statements. For example, an if statement immediately followed by another if with no else can be brought onto one line:

    if first_int:
        if second_int != first_int:

can be written on one line as

    if first_int and second_int != first_int:

Short variable names like `w`, `t`, and `x` aren't very descriptive, and make it hard for others to understand the code. Try to pick more descriptive names.

Make sure you don't include unnecessary logic. For example, your check `if dist in relations` will always be `False`, since you only insert tuples and `dist` is a float. It can be removed, saving you a line of code and a level of indentation.
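A quick demonstration of why that membership test never succeeds. The shape of the tuples in `relations` is an assumption for illustration, since the exact layout depends on your code.

```python
# Assumed layout: (first_word, second_word, distance)
relations = [("cat", "mat", 3.0), ("the", "cat", 1.0)]
dist = 3.0

# A float is compared against whole tuples, so this can never match.
float_found = dist in relations

# Membership only succeeds when the complete tuple is compared.
tuple_found = ("cat", "mat", 3.0) in relations

# If a distance lookup is really wanted, compare against the tuple field.
distance_found = any(entry[-1] == dist for entry in relations)

print(float_found, tuple_found, distance_found)
```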

answered Apr 21, 2015 at 5:46

Aurelius
