Find a value between two of the same characters in a string using Python

Question 1

I am writing a Python script that imports a given file.

On import, I pull a value out of the filename to determine which function to run.

Eventually I will extend this to process every file in a folder rather than passing it one specific file.

The format of the file names is always the following:

blah_Value_blah.extension

Is there a better, more efficient way to pull Value out of the example above than the code below?

Here is my code:

from os.path import splitext, basename
under = '_'
base = basename(splitext(filename_goes_here)[0])
value = base[base.find(under)+len(under):base.rfind(under)]

I am aware I could merge the last two lines into one, but the result would be very unsightly.

Examples of the filenames are:

//path/to/file/GAME_team_2017.csv
//path/to/file/GAME_player_2017.csv
//path/to/file/GAME_rules_2017.csv

The expected output for the files above would be:

'team'
'player'
'rules'

Question 2

Rather than using str.find, you could express your intent more clearly with a regex. However, it's not that much of an improvement.

For example, applying the regex _(.+)_ to the basename of the file is all you need. Because .+ is greedy, it captures everything between the first and the last underscore, just as find and rfind do. If you think a file extension is going to contain an _, then you still need the splitext.

This gives:

from os.path import splitext, basename
from re import search
base = basename(splitext(filename_goes_here)[0])
value = search('_(.+)_', base)
if value is not None:
 value = value.group(1)

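If you're using Python 3.6 or later, as noted in the comments by 200_success, match objects can be indexed, so you could change the last line to the following. Index 1 is the captured group; index 0 would be the whole match, underscores included.

value = value[1]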
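As a rough sketch, the regex approach could be wrapped in a small function and checked against the sample paths from the question. The name extract_value is only an illustration, not something from the original code.

```python
import re
from os.path import basename, splitext


def extract_value(path):
    """Return the text between the first and last underscore of the
    file's base name, or None if there are fewer than two underscores."""
    base = basename(splitext(path)[0])
    match = re.search(r'_(.+)_', base)
    return match.group(1) if match else None


paths = [
    '//path/to/file/GAME_team_2017.csv',
    '//path/to/file/GAME_player_2017.csv',
    '//path/to/file/GAME_rules_2017.csv',
]
print([extract_value(p) for p in paths])  # ['team', 'player', 'rules']
```

Returning None for a name that does not match lets the caller decide whether to skip the file or raise an error.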

Question 3

Since you said the format of the file names is always blah_Value_blah.extension, I would simply split the name at _ and take the item at index 1. For example, 'GAME_player_2017.csv'.split('_')[1] returns 'player'.

If you have a list like this:

filenames = ['GAME_team_2017.csv',
 'GAME_player_2017.csv',
 'GAME_rules_2017.csv']

I would split each string and take the item at index 1 with a list comprehension:

values = [name.split('_')[1] for name in filenames]
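One caveat worth checking: splitting and the find/rfind slice only agree when the value itself contains no underscore. The sketch below uses a made-up filename (not one from the question) to show the difference.

```python
name = 'GAME_home_team_2017.csv'  # hypothetical filename with an underscore inside the value

# Splitting keeps only the piece between the first and second underscore.
by_split = name.split('_')[1]

# The find/rfind slice keeps everything between the first and last underscore.
stem = name.rsplit('.', 1)[0]
by_find = stem[stem.find('_') + 1:stem.rfind('_')]

print(by_split)  # home
print(by_find)   # home_team
```

If the names really always follow blah_Value_blah.extension with no underscore inside Value, both approaches return the same result.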

To make the code reusable, I would turn it into a function that uses listdir() from the os module:

from os import listdir
def get_values(path_to_folder):
 filenames = listdir(path_to_folder)
 values = [name.split('_')[1] for name in filenames]
 return values

Now you could call the function with a path as an argument and, based on the returned values, determine what function to run.

For example:

values = get_values(path_to_folder)
for value in values:
 # determine what function to run
 pass
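One possible way to fill in the "determine what function to run" step is a dictionary that maps each value to a function. This is only a sketch: the handler names and their return values are placeholders I made up, not part of the question.

```python
# Hypothetical handlers: the names and return values are placeholders.
def process_team(filename):
    return ('team', filename)


def process_player(filename):
    return ('player', filename)


def process_rules(filename):
    return ('rules', filename)


# Map each value found in a filename to the function that handles it.
HANDLERS = {
    'team': process_team,
    'player': process_player,
    'rules': process_rules,
}


def dispatch(filename):
    """Run the handler that matches the value embedded in the filename."""
    value = filename.split('_')[1]
    try:
        handler = HANDLERS[value]
    except KeyError:
        raise ValueError('no handler registered for %r' % value)
    return handler(filename)


for name in ['GAME_team_2017.csv', 'GAME_player_2017.csv', 'GAME_rules_2017.csv']:
    print(dispatch(name))
```

A dictionary keeps the mapping in one place, so supporting a new file type means adding one entry rather than another if/elif branch. Note that a name without any underscore would still raise an IndexError from the split.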

Question 4

You can adapt the following; just adjust the regex to your needs:

import os
import re
def key(filename):
 # extract category from filename
 pattern = r'[\s_]\d{4}.*' # space or underscore, a 4-digit date, and the rest
 return re.sub(pattern, '', os.path.basename(filename))
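Note that, as written, this strips the date and extension but keeps the prefix. The sketch below shows what it returns for one of the sample paths, together with one possible adjustment that captures only the value; value_only is a hypothetical name used for illustration.

```python
import os
import re


def key(filename):
    # Remove the space/underscore, the 4-digit date, and everything after it.
    pattern = r'[\s_]\d{4}.*'
    return re.sub(pattern, '', os.path.basename(filename))


def value_only(filename):
    # Hypothetical variant: capture what lies between the first underscore
    # and the separator that precedes the 4-digit date.
    match = re.search(r'_(.+)[\s_]\d{4}', os.path.basename(filename))
    return match.group(1) if match else None


print(key('//path/to/file/GAME_team_2017.csv'))         # GAME_team
print(value_only('//path/to/file/GAME_team_2017.csv'))  # team
```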

Accepted answer (the regex-based answer): Peilonrayz, 2017-11-14 17:32:44Z
