I am attempting to make a Python script that imports a given file.
On import, I pull a value from the filename to determine what function to run.
I will eventually extend this to pull all files from a folder rather than explicitly passing it a specific file.
The format of the file names is always the following:
blah_Value_blah.extension
I am wondering if there is a better and more efficient way to pull Value from the above example than what I have given below?
Here is my code:
from os.path import splitext, basename
under = '_'
base = basename(splitext(filename_goes_here)[0])
value = base[base.find(under)+len(under):base.rfind(under)]
I am aware I can merge my two lines of code above into one line but it would be very unsightly.
Examples of the filenames are:
//path/to/file/GAME_team_2017.csv
//path/to/file/GAME_player_2017.csv
//path/to/file/GAME_rules_2017.csv
The sample output of the above files would be:
'team'
'player'
'rules'
3 Answers
Rather than using str.find, you could express what you want more clearly with a regex. However, it's not that much of an improvement.
For example, applying the regex _(.+)_ to the basename of the file is all you need. If you think a file extension is going to contain an _, then you may need the splitext as well.
This gives:
from os.path import splitext, basename
from re import search
base = basename(splitext(filename_goes_here)[0])
value = search(r'_(.+)_', base)
if value is not None:
    value = value.group(1)
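If you're using Python 3.6 or later, as noted in the comments by 200_success, you could change the last line to index the match object directly (index 1 is the first captured group; index 0 would be the whole match, underscores included):
    value = value[1]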
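As a rough sketch, the regex approach can be wrapped in a small helper that returns None when the name does not match. The function name extract_value is made up for illustration; the pattern is the same greedy one as above, so it captures everything between the first and last underscore:

```python
from os.path import basename, splitext
from re import search


def extract_value(filename):
    """Return the text between the first and last underscore, or None."""
    # Drop the directory and the extension before matching.
    base = basename(splitext(filename)[0])
    match = search(r'_(.+)_', base)
    # group(1) is the captured text; group(0) would include the underscores.
    return match.group(1) if match is not None else None


print(extract_value('//path/to/file/GAME_team_2017.csv'))  # team
print(extract_value('no-underscores.csv'))  # None
```

Returning None for non-matching names keeps the caller in charge of deciding whether such a file is an error or should simply be skipped.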
Since you said the format of the file names is always blah_Value_blah.extension, I would simply split the name at _ and access the value at index 1. For example: 'GAME_player_2017.csv'.split('_')[1]
If you have a list like this:
filenames = ['GAME_team_2017.csv',
             'GAME_player_2017.csv',
             'GAME_rules_2017.csv']
I would split each string and get the item at index 1 with a list comprehension:
values = [name.split('_')[1] for name in filenames]
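One caveat with indexing the result of split: a name without an underscore raises IndexError. A minimal sketch of a guarded version follows; the helper name value_from_name and the README.txt entry are hypothetical, added only to show the fallback:

```python
def value_from_name(name):
    """Return the second underscore-separated field, or None if absent."""
    parts = name.split('_')
    # 'blah_Value_blah.extension' splits into at least three parts.
    return parts[1] if len(parts) >= 3 else None


filenames = ['GAME_team_2017.csv', 'GAME_player_2017.csv', 'README.txt']
values = [value_from_name(name) for name in filenames]
print(values)  # ['team', 'player', None]
```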
To make the code reusable, I would turn it into a function using listdir() from the os module:
from os import listdir

def get_values(path_to_folder):
    filenames = listdir(path_to_folder)
    values = [name.split('_')[1] for name in filenames]
    return values
Now you could call the function with a path as an argument and based on the returned values you could determine what function to run.
For example:
values = get_values(path_to_folder)
for value in values:
    # determine what function to run
    pass
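One common way to pick the function is a dictionary that maps each value to a callable. The sketch below is only an illustration: the handler names (handle_team and so on) and the dispatch helper are invented, and it works on an in-memory list of names rather than a real folder:

```python
def handle_team(name):
    return 'team handler got ' + name


def handle_player(name):
    return 'player handler got ' + name


def handle_rules(name):
    return 'rules handler got ' + name


# Map the value found in the filename to the function that processes it.
HANDLERS = {
    'team': handle_team,
    'player': handle_player,
    'rules': handle_rules,
}


def dispatch(name):
    """Run the handler selected by the middle part of the filename."""
    value = name.split('_')[1]
    handler = HANDLERS.get(value)
    if handler is None:
        raise ValueError('no handler for ' + repr(value))
    return handler(name)


for name in ['GAME_team_2017.csv', 'GAME_rules_2017.csv']:
    print(dispatch(name))
```

A dictionary keeps the mapping in one place, so supporting a new kind of file means adding one entry instead of another if/elif branch.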
You can adapt this; just adjust the regex to suit your file names:
import os
import re

def key(filename):
    # extract category from filename
    # raw string, so \s and \d reach the regex engine unchanged
    pattern = r'(\s|_)\d{4}.*'  # space/underscore & 4 digit date & the rest
    return re.sub(pattern, '', os.path.basename(filename))
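For reference, here is what this function returns for one of the question's file names, with the pattern written as a raw string. It strips the year and the extension but keeps the GAME_ prefix, so some further adjustment is still needed to get only the middle value; the extra split shown is one possible way, not part of the original answer:

```python
import os
import re


def key(filename):
    # Remove a space/underscore, a 4-digit year, and everything after it.
    pattern = r'(\s|_)\d{4}.*'
    return re.sub(pattern, '', os.path.basename(filename))


print(key('//path/to/file/GAME_team_2017.csv'))  # GAME_team
# Splitting off the prefix is one way to get only the middle value.
print(key('//path/to/file/GAME_team_2017.csv').split('_', 1)[1])  # team
```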