
I am attempting to write a Python script that imports a given file.

When the file is imported, I pull a value out of its filename to determine which function to run.

I will eventually extend this to process all files in a folder rather than explicitly passing it one specific file.

The format of the file names is always the following:

blah_Value_blah.extension 

Is there a better, more efficient way to extract Value from the example above than the code I have below?

Here is my code:

from os.path import splitext, basename
under = '_'
base = basename(splitext(filename_goes_here)[0])
value = base[base.find(under)+len(under):base.rfind(under)]

I am aware I could merge the two lines of code above into one, but it would be very unsightly.

Examples of the filenames are:

//path/to/file/GAME_team_2017.csv
//path/to/file/GAME_player_2017.csv
//path/to/file/GAME_rules_2017.csv

The sample output of the above files would be:

'team'
'player'
'rules'
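For reference, here is a minimal sketch that wraps the question's two lines in a function (the name value_from_filename is hypothetical) and runs it on the example paths. find locates the first underscore and rfind the last one, so the slice returns everything between them.

```python
from os.path import basename, splitext

def value_from_filename(path):
    """Return the text between the first and last '_' of the file's base name."""
    under = '_'
    # Drop the directory and the extension:
    # '//path/to/file/GAME_team_2017.csv' -> 'GAME_team_2017'
    base = basename(splitext(path)[0])
    # Slice from just after the first underscore up to (not including) the last one
    return base[base.find(under) + len(under):base.rfind(under)]

for path in ['//path/to/file/GAME_team_2017.csv',
             '//path/to/file/GAME_player_2017.csv',
             '//path/to/file/GAME_rules_2017.csv']:
    print(value_from_filename(path))  # prints team, player, rules on separate lines
```

One thing to be aware of: if a name contains no underscore, find and rfind both return -1, so the slice silently returns the wrong text instead of failing (value_from_filename('README.txt') gives 'READM').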
Peilonrayz
asked Nov 14, 2017 at 17:08

3 Answers


Rather than using str.find, you can express your intent more clearly with a regex. However, it's not much of an improvement.

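For example, the regex _(.+)_ applied to the file's base name is all you need. Because .+ is greedy, it captures everything between the first and the last underscore, just like your find/rfind slice. You only need splitext if you think a file extension could contain an _.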

This gives:

from os.path import splitext, basename
from re import search
base = basename(splitext(filename_goes_here)[0])
value = search(r'_(.+)_', base)
if value is not None:
    value = value.group(1)

If you're using Python 3.6 or later, as noted in the comments by 200_success, you could change the last line to:

value = value[1]  # same as value.group(1); value[0] would be the whole match, e.g. '_team_'
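Putting the pieces of this answer together, a minimal sketch as a reusable function (the names value_via_regex and VALUE_RE are hypothetical). It returns None when the name does not contain two underscores, instead of a wrong substring.

```python
import re
from os.path import basename, splitext

# Greedy: captures everything between the first and the last underscore,
# which is the same span the question's find/rfind slice returns.
VALUE_RE = re.compile(r'_(.+)_')

def value_via_regex(path):
    """Return the middle part of 'blah_Value_blah.ext', or None if it doesn't match."""
    base = basename(splitext(path)[0])
    match = VALUE_RE.search(base)
    # match[1] (Python 3.6+) is the first capture group, like match.group(1);
    # match[0] would be the whole match, underscores included.
    return match[1] if match is not None else None
```

For example, value_via_regex('//path/to/file/GAME_team_2017.csv') returns 'team', and value_via_regex('README.txt') returns None. Because of the greedy .+, a name like GAME_team_a_2017.csv yields 'team_a'.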
answered Nov 14, 2017 at 17:32

Since you said the file names always have the format blah_Value_blah.extension, I would simply split the name at _ and take the item at index 1. For example, 'GAME_player_2017.csv'.split('_')[1] returns 'player'.
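One caveat, assuming the input is a full path like the ones in the question: split the base name rather than the whole path, otherwise an underscore in a directory name shifts the index. For instance, '//data/my_files/GAME_rules_2017.csv'.split('_')[1] is 'files/GAME'. A minimal sketch (the function name value_by_split is hypothetical):

```python
import os

def value_by_split(path):
    """Return the second '_'-separated field of the file name (not of the full path)."""
    name = os.path.basename(path)  # strip the directories first
    return name.split('_')[1]
```

With this, value_by_split('//data/my_files/GAME_rules_2017.csv') returns 'rules'.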

If you have a list like this:

filenames = ['GAME_team_2017.csv',
             'GAME_player_2017.csv',
             'GAME_rules_2017.csv']

I would split each string and take the item at index 1 with a list comprehension:

values = [name.split('_')[1] for name in filenames]

To make the code reusable, I would turn it into a function using listdir() from the os module:

from os import listdir

def get_values(path_to_folder):
    filenames = listdir(path_to_folder)
    values = [name.split('_')[1] for name in filenames]
    return values
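A sketch of a slightly more defensive variant, assuming that sub-directories and files that don't match blah_Value_blah.extension should be skipped rather than crash the loop: listdir() also returns directory names, and a name without an underscore makes split('_')[1] raise IndexError. It swaps in os.scandir (Python 3.6+ for the with form) so each entry can be checked with is_file(). As with listdir(), the order of the returned values is arbitrary.

```python
import os

def get_values(path_to_folder):
    """Collect the Value part of every 'blah_Value_blah.ext' file in a folder.

    Sub-directories and names with fewer than two underscores are skipped.
    """
    values = []
    with os.scandir(path_to_folder) as entries:
        for entry in entries:
            if not entry.is_file():
                continue  # listdir() would also have returned directory names
            parts = entry.name.split('_')
            if len(parts) < 3:
                continue  # not in the expected format; avoids an IndexError
            values.append(parts[1])
    return values
```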

Now you can call the function with a folder path as its argument and use the returned values to determine which function to run.

For example:

values = get_values(path_to_folder)
for value in values:
    # determine what function to run based on value
    pass
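One common way to "determine what function to run" is a dictionary that maps each value to a handler function. A minimal sketch; the handler functions, HANDLERS and process are hypothetical placeholders for whatever the real processing does:

```python
import os

# Hypothetical handlers, one per Value that can appear in a file name
def handle_team(path):
    return 'team file: ' + path

def handle_player(path):
    return 'player file: ' + path

def handle_rules(path):
    return 'rules file: ' + path

HANDLERS = {
    'team': handle_team,
    'player': handle_player,
    'rules': handle_rules,
}

def process(path):
    """Pick a handler from the middle part of the file name and call it."""
    value = os.path.basename(path).split('_')[1]
    handler = HANDLERS.get(value)
    if handler is None:
        raise ValueError('no handler for {!r} (from {})'.format(value, path))
    return handler(path)
```

Adding support for a new file type then only means writing one function and adding one dictionary entry, instead of growing an if/elif chain.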
answered Nov 15, 2017 at 14:10

You can adapt this; just adjust the regex to your needs:

import os
import re
def key(filename):
    # Extract the category by deleting everything from the year onwards,
    # e.g. '//path/to/file/GAME_team_2017.csv' -> 'GAME_team'
    pattern = r'(\s|_)\d{4}.*'  # space/underscore, 4-digit year, then the rest
    return re.sub(pattern, '', os.path.basename(filename))
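As written, key('//path/to/file/GAME_team_2017.csv') removes only the year and extension, leaving 'GAME_team'. One possible adjustment, assuming the prefix before the first underscore never contains an underscore itself, also strips that prefix so only the middle value remains (key_middle is a hypothetical name):

```python
import os
import re

# Two alternatives; re.sub deletes every place either one matches:
#   ^[^_]*_        the leading prefix and its underscore, e.g. 'GAME_'
#   [\s_]\d{4}.*$  a space/underscore, the 4-digit year and the rest, e.g. '_2017.csv'
PATTERN = re.compile(r'^[^_]*_|[\s_]\d{4}.*$')

def key_middle(filename):
    return PATTERN.sub('', os.path.basename(filename))
```

With this, key_middle('//path/to/file/GAME_team_2017.csv') returns 'team'.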
answered Nov 14, 2017 at 17:34