Longest absolute path in file system

Question 1

Problem

Suppose we abstract our file system by a string in the following manners:

The string dir\n\tsubdir1\n\tsubdir2\n\t\tfile.ext represents:
dir
 subdir1
 subdir2
 file.ext
a directory dir contains an empty sub-directories subdir1 and a sub-directory subdir2 containing a file file.ext.

The string dir\n\tsubdir1\n\t\tfile1.ext\n\t\tsubsubdir1\n\tsubdir2\n\t\tsubsubdir2\n\t\t\tfile2.ext represents:
dir
 subdir1
 file1.ext
 subsubdir1
 subdir2
 subsubdir2
 file2.ext
a directory dir contains two sub-directories subdir1 and subdir2. subdir1 contains a file file1.extand an empty second-level sub-directory subsubdir1. subdir2 contains a second-level sub-directory subsubdir2 containing a file file2.ext.

We are interested in finding the longest (number of characters) absolute path to a file within our file system. That is, for example, in the second example above, the longest absolute path is dir/subdr2/subsubdir2/file.ext, and its length is 30.

Given a string representing the file system in the above format, return the length of the longest absolute path to file in the abstracted file system. Simply put, that is the length of a valid path that ends with a file, and a file contains . and an extension.

Time complexity required: O(n) where n is the size of the input string.

Notice that a/aa/aaa/file1.txt is not the path you want, if there is aaaaaaaaaaaaaaaaaaaaa/sth.png.

Any advice on code bugs, performance improvements in terms of algorithm time complexity, or code style advice is highly appreciated.

def find_longest_path(source):
 items = source.split('\n')
 prefix = []
 prefix_length = []
 max_length_so_far = -1
 max_path_so_far = ''
 for path in items:
 i = 0
 while path[i] == '\t':
 i += 1
 if len(prefix) > i:
 prefix = prefix[:i]
 prefix_length = prefix_length[:i]
 prefix.append(path[i:])
 prefix_length.append(len(path[i:]))
 if sum(prefix_length) > max_length_so_far:
 max_length_so_far = sum(prefix_length)
 max_path_so_far = '/'.join(prefix)
 return max_path_so_far
if __name__ == "__main__":
 path_name = "dir\n\tsubdir1\n\t\tfile1.ext\n\t\tsubsubdir1\n\tsubdir2\n\t\tsubsubdir2\n\t\t\tfile2.ext"
 print find_longest_path(path_name)
 path_name = "dir\n\tsubdir1\n\tsubdir2\n\t\tfile.ext"
 print find_longest_path(path_name)

Question 2

The problem statement indicates that path separators should be taken into account when computing the length. You might add a test case to test for this case either by crafting a particular input or by returning the longest path along with longest count that you can verify.

Some of the code could be made more concise.

filename = path.split('\t')[-1] # last of list, assumes no tabs in filenames
depth = len(path) - len(filename) # yields number of tabs

Rather than repeatedly computing the sum of the paths, you could create a cumulative_sum.

cumulative_len = cumulative_len[:depth]
if depth > 0:
 # last element contains sum of entire path to depth
 # +1 to account for the preceding path separator
 cumulative_len.append(cumulative_len[-1] + len(filename) + 1)
else:
 # the problem statement does not include an initial path separator
 cumulative_len.append(len(filename))

Rather than joining the path for intermediate maximums that will be discarded, just store the list of path elements and join on final delivery.

You might consider asserting on the desired results so you get clearer feedback of failure.

def check(input, expected):
 longest_path = find_longest_path(input)
 assert input == expected, "Expected {} but got {}".format(expected, longest_path)
 print(longest_path)
if __name__ == "__main__":
 check("dir\n\tsubdir1\n\t\tfile1.ext\n\t\tsubsubdir1\n\tsubdir2\n\t\tsubsubdir2\n\t\t\tfile2.ext", 
 "dir/subdir2/subsubdir2/file2.ext")
 check("dir\n\tsubdir1\n\tsubdir2\n\t\tfile.ext",
 "dir/subdir2/file.ext")

Question 3

Hi user650881, thanks for the reply and vote up. A bit confused by this line --filename = path.split('\t')[-1] , this line only returns one string, do you assume in the representation of path, there is only one filename? If I mis-understand your points, please feel free to correct me.

Question 4

BTW, if you could show your code, it is more clear. :)

Question 5

@LinMa my understanding of the problem statement was that there was one leaf filename or directory name per line. In any event it takes the last token after the split. If needed it could subsequently be split again on "/" or whatever as long as the rest of the algorithm was adjusted accordingly.

user650881 user650881 4842 silver badges5 bronze badges · Answer 1 · 2017-01-22 15:30:07Z

The problem statement indicates that path separators should be taken into account when computing the length. You might add a test case to test for this case either by crafting a particular input or by returning the longest path along with longest count that you can verify.

Some of the code could be made more concise.

filename = path.split('\t')[-1] # last of list, assumes no tabs in filenames
depth = len(path) - len(filename) # yields number of tabs

Rather than repeatedly computing the sum of the paths, you could create a cumulative_sum.

cumulative_len = cumulative_len[:depth]
if depth > 0:
 # last element contains sum of entire path to depth
 # +1 to account for the preceding path separator
 cumulative_len.append(cumulative_len[-1] + len(filename) + 1)
else:
 # the problem statement does not include an initial path separator
 cumulative_len.append(len(filename))

Rather than joining the path for intermediate maximums that will be discarded, just store the list of path elements and join on final delivery.

You might consider asserting on the desired results so you get clearer feedback of failure.

def check(input, expected):
 longest_path = find_longest_path(input)
 assert input == expected, "Expected {} but got {}".format(expected, longest_path)
 print(longest_path)
if __name__ == "__main__":
 check("dir\n\tsubdir1\n\t\tfile1.ext\n\t\tsubsubdir1\n\tsubdir2\n\t\tsubsubdir2\n\t\t\tfile2.ext", 
 "dir/subdir2/subsubdir2/file2.ext")
 check("dir\n\tsubdir1\n\tsubdir2\n\t\tfile.ext",
 "dir/subdir2/file.ext")

Hi user650881, thanks for the reply and vote up. A bit confused by this line --filename = path.split('\t')[-1] , this line only returns one string, do you assume in the representation of path, there is only one filename? If I mis-understand your points, please feel free to correct me.
@LinMa my understanding of the problem statement was that there was one leaf filename or directory name per line. In any event it takes the last token after the split. If needed it could subsequently be split again on "/" or whatever as long as the rest of the algorithm was adjusted accordingly.

Stack Exchange Network

Longest absolute path in file system

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Longest absolute path in file system

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions