What I'm trying to achieve

Given a path, I need to extract the part of the path that precedes a specifically named sub-directory (if it exists) - we'll call this stopper to easily identify it in this question.

Note that the path may begin or end with the stopper.

Some sample pairs for input/output:

path = 'some/path/to/my/file.ext'
# ends with stopper
stopper = 'my'
result = 'some/path/to'
# begins with stopper
stopper = 'some'
result = ''
# stopper in middle
stopper = 'to'
result = 'some/path'
# special case - should stop at first stopper location
path = 'path/to/to/my/file.ext'
stopper = 'to'
result = 'path'

What I have so far

I've devised two methods of obtaining an answer:

Regex

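import re

# p = path; s = stopper
def regex_method(p, s):
    # Consume characters up to the first component that starts with s,
    # then back up to the '/' that precedes it.
    regex = r"(?:(?!(?:^|(?<=/))" + s + r").)+(?=/)"
    m = re.match(regex, p)
    if m:
        return m.group()
    return ''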

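This works for the samples above, but it is prone to failure depending on the stopper value passed in: the stopper is inserted into the pattern unescaped, so any regex metacharacters in it change the pattern. That makes it a poor fit for production use.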
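One way to reduce that fragility, sketched here on the assumption that the stopper should match a literal, whole directory name, is to pass it through re.escape and require a '/' or the end of the string right after it. The name regex_method_escaped is hypothetical and not part of the original code.

```python
import re

def regex_method_escaped(p, s):
    # Hypothetical variant of regex_method. re.escape makes the stopper
    # literal, and (?=/|$) stops 'to' from matching the start of 'tools'.
    stop = r"(?:^|(?<=/))" + re.escape(s) + r"(?=/|$)"
    regex = r"(?:(?!" + stop + r").)+(?=/)"
    m = re.match(regex, p)
    return m.group() if m else ''
```

With this sketch, regex_method_escaped('some/tools/to/x.ext', 'to') returns 'some/tools', and a stopper such as 'a.b' only matches a directory literally named a.b.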

OS

import os

# p = path; s = stopper
def os_method(p, s):
    parts = os.path.dirname(p).split('/')
    # parts.index(s) raises ValueError if s is not one of the directories
    return '/'.join(parts[:parts.index(s)])

This works and seems more concise than the regex counterpart, but it seems odd to me that I need to split the string, then the list based on a value's index, then join it together. I feel like this could be simplified or improved.
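Since the stopper may not exist in the path, the split/index/join approach needs a guard around list.index. The following is only a sketch: os_method_safe is a hypothetical name, and returning the whole directory part when the stopper is missing is an assumption, because the question does not say what should happen in that case.

```python
import os

def os_method_safe(p, s):
    # Hypothetical variant of os_method: same split/index/join steps, but a
    # missing stopper falls back to the whole directory part of the path.
    parts = os.path.dirname(p).split('/')
    try:
        return '/'.join(parts[:parts.index(s)])
    except ValueError:  # list.index raises ValueError when s is absent
        return '/'.join(parts)
```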


My questions

  1. Is there a more idiomatic way of splitting a path on a specific directory name?
  2. Is there a simple way of achieving this using list comprehensions?
eyllanesc
asked Sep 13, 2019 at 17:51

3 Answers

Another, seemingly more efficient and much simpler, method is to use itertools.takewhile, which (from the docs) makes an iterator that returns elements from the iterable as long as the predicate is true:

from itertools import takewhile

def it_method(p, s):
    # Yield path components until one equals the stopper, then rejoin them
    return '/'.join(takewhile(lambda d: d != s, p.split('/')))

Test:

print(it_method('some/path/to/my/file.ext', 'my'))
print(it_method('some/path/to/my/file.ext', 'to'))
print(it_method('some/path/to/my/file.ext', 'some'))
print(it_method('some/path/to/to/my/file.ext', 'to'))

Output:

some/path/to
some/path

some/path

The third line is empty because that path begins with the stopper.

So in this case takewhile keeps yielding path components until the stopper is encountered.

The predicate could also be shortened to s.__ne__ instead of using a lambda function:

def it_method(p, s):
    return '/'.join(takewhile(s.__ne__, p.split('/')))
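
One behaviour worth knowing about: if the stopper never appears, takewhile consumes every component, so it_method returns the full path including the file name. A minimal sketch that restricts the search to the directory part is shown below; it_method_dirs is a hypothetical name, and using posixpath.dirname assumes '/'-separated paths as in the question.

```python
import posixpath
from itertools import takewhile

def it_method_dirs(p, s):
    # Hypothetical variant of it_method: only directory components are
    # scanned, so the file name is never returned when s is missing.
    dirs = posixpath.dirname(p).split('/')
    return '/'.join(takewhile(lambda d: d != s, dirs))
```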
answered Sep 13, 2019 at 17:59

I would suggest using pathlib:

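from pathlib import Path

def split_path(path, stopper):
    parts = path.parts
    # Index of the first component equal to the stopper;
    # next() raises StopIteration if there is none
    idx = next(idx for idx, part in enumerate(parts) if part == stopper)
    result = Path(*parts[:idx])
    return result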

Using your example:

path = Path('some/path/to/my/file.ext')

stopper = 'my'
split_path(path, stopper)

Output: PosixPath('some/path/to')

stopper = 'some'
split_path(path, stopper)

Output: PosixPath('.')

stopper = 'to'
split_path(path, stopper)

Output: PosixPath('some/path')
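
If the stopper might be absent, next() accepts a default as its second argument. The sketch below is an assumption-laden variant rather than the answer's own code: split_path_safe is a hypothetical name, it accepts a plain string, it uses PurePosixPath so the result does not depend on the platform, and it falls back to the whole directory part when the stopper is not found.

```python
from pathlib import PurePosixPath

def split_path_safe(path, stopper):
    # Hypothetical variant of split_path. Only the directory components
    # are scanned (.parent drops the file name).
    parts = PurePosixPath(path).parent.parts
    # Default to len(parts), i.e. keep everything, when stopper is absent.
    idx = next((i for i, part in enumerate(parts) if part == stopper), len(parts))
    return PurePosixPath(*parts[:idx])
```

As with the version above, a path that begins with the stopper yields PurePosixPath('.') rather than an empty string.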

answered Sep 13, 2019 at 18:20

You can use the pathlib module and next() on a generator, like so:

from pathlib import Path

# p = path; s = stopper
def get_path(p, s):
    # Walk up the parents and return the first one whose string form
    # does not contain the stopper next to a '/'
    return next(
        (parent for parent in Path(p).parents
         if not any(x in str(parent) for x in (f'/{s}/', f'{s}/', f'/{s}'))
         and str(parent) != s),
        '')

path = 'some/path/to/my/file.ext'
# stopper in middle
stopper = 'to'
print(get_path(path, stopper))
# some/path
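
Because get_path tests substrings of str(parent), a stopper that is only part of a directory name (for example 'pa' against 'path') is also treated as a match. A sketch that compares whole components through .parts instead is shown below; get_path_parts is a hypothetical name, and PurePosixPath is assumed so that the behaviour is the same on every platform.

```python
from pathlib import PurePosixPath

def get_path_parts(p, s):
    # Hypothetical variant of get_path: walk up the parents and return the
    # first one that has no component exactly equal to the stopper.
    return next(
        (parent for parent in PurePosixPath(p).parents if s not in parent.parts),
        PurePosixPath('.'),
    )
```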
answered Sep 13, 2019 at 18:11

1 Comment

@ctwheels, can you try now? Note that with stopper 'some', this returns a dot.
