Problem
Suppose we abstract our file system by a string in the following manners:
The string
dir\n\tsubdir1\n\tsubdir2\n\t\tfile.ext
represents:dir subdir1 subdir2 file.ext
a directory
dir
contains an empty sub-directoriessubdir1
and a sub-directorysubdir2
containing a filefile.ext
.The string
dir\n\tsubdir1\n\t\tfile1.ext\n\t\tsubsubdir1\n\tsubdir2\n\t\tsubsubdir2\n\t\t\tfile2.ext
represents:dir subdir1 file1.ext subsubdir1 subdir2 subsubdir2 file2.ext
a directory
dir
contains two sub-directoriessubdir1
andsubdir2
.subdir1
contains a filefile1.extand
an empty second-level sub-directorysubsubdir1
.subdir2
contains a second-level sub-directorysubsubdir2
containing a filefile2.ext
.We are interested in finding the longest (number of characters) absolute path to a file within our file system. That is, for example, in the second example above, the longest absolute path is
dir/subdr2/subsubdir2/file.ext
, and its length is 30.Given a string representing the file system in the above format, return the length of the longest absolute path to file in the abstracted file system. Simply put, that is the length of a valid path that ends with a file, and a file contains . and an extension.
Time complexity required: O(n) where n is the size of the input string.
Notice that
a/aa/aaa/file1.txt
is not the path you want, if there isaaaaaaaaaaaaaaaaaaaaa/sth.png
.
Any advice on code bugs, performance improvements in terms of algorithm time complexity, or code style advice is highly appreciated.
def find_longest_path(source):
items = source.split('\n')
prefix = []
prefix_length = []
max_length_so_far = -1
max_path_so_far = ''
for path in items:
i = 0
while path[i] == '\t':
i += 1
if len(prefix) > i:
prefix = prefix[:i]
prefix_length = prefix_length[:i]
prefix.append(path[i:])
prefix_length.append(len(path[i:]))
if sum(prefix_length) > max_length_so_far:
max_length_so_far = sum(prefix_length)
max_path_so_far = '/'.join(prefix)
return max_path_so_far
if __name__ == "__main__":
path_name = "dir\n\tsubdir1\n\t\tfile1.ext\n\t\tsubsubdir1\n\tsubdir2\n\t\tsubsubdir2\n\t\t\tfile2.ext"
print find_longest_path(path_name)
path_name = "dir\n\tsubdir1\n\tsubdir2\n\t\tfile.ext"
print find_longest_path(path_name)
1 Answer 1
The problem statement indicates that path separators should be taken into account when computing the length. You might add a test case to test for this case either by crafting a particular input or by returning the longest path along with longest count that you can verify.
Some of the code could be made more concise.
filename = path.split('\t')[-1] # last of list, assumes no tabs in filenames
depth = len(path) - len(filename) # yields number of tabs
Rather than repeatedly computing the sum of the paths, you could create a cumulative_sum.
cumulative_len = cumulative_len[:depth]
if depth > 0:
# last element contains sum of entire path to depth
# +1 to account for the preceding path separator
cumulative_len.append(cumulative_len[-1] + len(filename) + 1)
else:
# the problem statement does not include an initial path separator
cumulative_len.append(len(filename))
Rather than joining the path for intermediate maximums that will be discarded, just store the list of path elements and join on final delivery.
You might consider asserting on the desired results so you get clearer feedback of failure.
def check(input, expected):
longest_path = find_longest_path(input)
assert input == expected, "Expected {} but got {}".format(expected, longest_path)
print(longest_path)
if __name__ == "__main__":
check("dir\n\tsubdir1\n\t\tfile1.ext\n\t\tsubsubdir1\n\tsubdir2\n\t\tsubsubdir2\n\t\t\tfile2.ext",
"dir/subdir2/subsubdir2/file2.ext")
check("dir\n\tsubdir1\n\tsubdir2\n\t\tfile.ext",
"dir/subdir2/file.ext")
-
\$\begingroup\$ Hi user650881, thanks for the reply and vote up. A bit confused by this line --
filename = path.split('\t')[-1]
, this line only returns one string, do you assume in the representation ofpath
, there is only one filename? If I mis-understand your points, please feel free to correct me. \$\endgroup\$Lin Ma– Lin Ma2017年02月02日 05:37:08 +00:00Commented Feb 2, 2017 at 5:37 -
\$\begingroup\$ BTW, if you could show your code, it is more clear. :) \$\endgroup\$Lin Ma– Lin Ma2017年02月02日 05:49:16 +00:00Commented Feb 2, 2017 at 5:49
-
\$\begingroup\$ @LinMa my understanding of the problem statement was that there was one leaf filename or directory name per line. In any event it takes the last token after the split. If needed it could subsequently be split again on "/" or whatever as long as the rest of the algorithm was adjusted accordingly. \$\endgroup\$user650881– user6508812017年02月02日 05:50:35 +00:00Commented Feb 2, 2017 at 5:50