\$\begingroup\$

I have written Python code to rearrange some files in a directory, create new directories, and delete old ones so that the dataset ends up in the structure I want. Previously, I did this with repetitive code because it got the job done, but I want to clean it up and see no reason to repeat code.

test_names_path = 'food-101/meta/test.txt'
train_names_path = 'food-101/meta/train.txt'
train = 'food-101/images/train'
test = 'food-101/images/test'
def assemble_dataset(data_path, folder):
    for line in data_path:
        name_of_folder = line.split('/')[0]
        name_of_file = line.split
        Path('food-101/images/' + name_of_folder + '/' + name_of_file + '.jpg').rename(folder + name_of_folder + '_' + name_of_file + '.jpg')
    if data_path == 'food-101/meta/train.txt':
        with open('food-101/meta/train.txt') as train_file:
            for element in train_file:
                name_of_folder = element.split('/')[0]
                if os.path.exists('food-101/images/' + name_of_folder):
                    shutil.rmtree('food-101/images/' + name_of_folder)
# with open('food-101/meta/test.txt') as test_file:
#     for line in test_file:
#         name_of_folder = line.split('/')[0]
#         name_of_file = line.split('/')[1].rstrip()
#         Path('food-101/images/' + name_of_folder + '/' + name_of_file + '.jpg').rename('food-101/images/test/' + name_of_folder + '_' + name_of_file + '.jpg')
# # Moves all training images to the Food-101/images directory and renames them
# with open('food-101/meta/train.txt') as train_file:
#     for line in train_file:
#         name_of_folder = line.split('/')[0]
#         name_of_file = line.split('/')[1].rstrip()
#         Path('food-101/images/' + name_of_folder + '/' + name_of_file + '.jpg').rename('food-101/images/train/' + name_of_folder + '_' + name_of_file + '.jpg')
# Removes empty directories inside Food-101/images
# with open('food-101/meta/train.txt') as train_file:
#     for folder in train_file:
#         name_of_folder = folder.split('/')[0]
#         if os.path.exists('food-101/images/' + name_of_folder):
#             shutil.rmtree('food-101/images/' + name_of_folder)
assemble_dataset(train_names_path, train)
assemble_dataset(test_names_path, test)

The commented out code is the old code and is what I'm trying to shrink. In def assemble_dataset(), the first 2 blocks of code correspond to the first 2 with open() chunks. The following if data_path... statement corresponds to the last with open() chunk.

Below is the original code:

git_repo_tags = ['AB', 'C', 'DEF', 'G', 'HILMNO', 'PR', 'STW', 'X']
# Cloning the github repositories
for repo in git_repo_tags:
    git.Git('.').clone('git://github.com/utility-repos/' + repo)
    # Removing the .git folder from each repo
    shutil.rmtree(repo + '/.git')
# Creating the Food-101/images directory and subdirectory if it doesn't already exist
if not os.path.exists('Food-101/images/train') and not os.path.exists('Food-101/images/test'):
    os.makedirs('Food-101/images/train')
    os.makedirs('Food-101/images/test')
    # Going through the repo X and moving everything a branch up
    for i in os.listdir('X'):
        shutil.move(os.path.join('X', i), 'Food-101')
    # Going through the other repos and moving everything to Food-101/images
    for directory in git_repo_tags:
        for subdirectory in os.listdir(directory):
            shutil.move(os.path.join(directory, subdirectory), 'Food-101/images')
with open('Food-101/meta/test.txt') as test_file:
    for line in test_file:
        name_of_folder = line.split('/')[0]
        name_of_file = line.split('/')[1].rstrip()
        Path('Food-101/images/' + name_of_folder + '/' + name_of_file + '.jpg').rename('Food-101/images/test/' + name_of_folder + '_' + name_of_file + '.jpg')
# Moves all training images to the Food-101/images directory and renames them
with open('Food-101/meta/train.txt') as train_file:
    for line in train_file:
        name_of_folder = line.split('/')[0]
        name_of_file = line.split('/')[1].rstrip()
        Path('Food-101/images/' + name_of_folder + '/' + name_of_file + '.jpg').rename('Food-101/images/train/' + name_of_folder + '_' + name_of_file + '.jpg')
# Removes empty directories inside Food-101/images
with open('Food-101/meta/train.txt') as train_file:
    for folder in train_file:
        name_of_folder = folder.split('/')[0]
        if os.path.exists('Food-101/images/' + name_of_folder):
            shutil.rmtree('Food-101/images/' + name_of_folder)
# Removes empty directories
for dirs in git_repo_tags:
    shutil.rmtree(dirs)
Reinderien
asked May 24, 2020 at 2:19
\$\endgroup\$
Comments:

  • Please post the code that you want reviewed unmodified. We're not going to unpick which lines are comments and which lines are code before reviewing it. (Commented May 24, 2020 at 2:28)
  • @l0b0 Done... :) (Commented May 24, 2020 at 3:14)
  • You would do well simply to read through the pathlib documentation. Also, for line in data_path will iterate over the characters of the string, since you have not open()ed the file. (Commented May 24, 2020 at 7:21)

1 Answer

\$\begingroup\$

As @David says, extensive replacement of your os and shutil calls with pathlib will get you 90% of the way to a better solution. The one exception is shutil.rmtree, which does not have a pathlib equivalent.

I'll go through most of the instances.

Immutable constants

git_repo_tags = ['AB', 'C', 'DEF', 'G', 'HILMNO', 'PR', 'STW', 'X']

should be

GIT_REPO_TAGS = ('AB', 'C', 'DEF', 'G', 'HILMNO', 'PR', 'STW', 'X')

since it's global and you don't intend on changing it.

Exists

if not os.path.exists('Food-101/images/train') and not os.path.exists('Food-101/images/test'):
    os.makedirs('Food-101/images/train')
    os.makedirs('Food-101/images/test')
    ...

can be

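images = Path('Food-101/images')
train = images / 'train'
test = images / 'test'
if not (train.exists() or test.exists()):
    # parents=True mirrors os.makedirs, which also creates missing parent directories
    train.mkdir(parents=True)
    test.mkdir(parents=True)
    ...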
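
If the goal is only to make sure both directories exist, Path.mkdir also accepts parents=True and exist_ok=True, which makes the existence check unnecessary for the directory creation itself. Below is a minimal sketch under that assumption; the helper name ensure_split_dirs and the temporary root are hypothetical and only there so the example can run without touching your real data. Note that in your code the if also guards the moves that follow, so this only covers the mkdir part.

```python
import tempfile
from pathlib import Path


def ensure_split_dirs(root):
    """Create <root>/Food-101/images/{train,test} if missing and return both paths.

    `ensure_split_dirs` is a hypothetical helper used for illustration only.
    """
    images = root / 'Food-101' / 'images'
    train = images / 'train'
    test = images / 'test'
    for directory in (train, test):
        # parents=True creates missing ancestors; exist_ok=True makes reruns harmless
        directory.mkdir(parents=True, exist_ok=True)
    return train, test


# Demo in a throwaway directory (an assumption, so nothing real is modified)
with tempfile.TemporaryDirectory() as tmp:
    train, test = ensure_split_dirs(Path(tmp))
    ensure_split_dirs(Path(tmp))  # calling it a second time does not raise
    created = (train.is_dir(), test.is_dir())
```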

Move

for i in os.listdir('X'):
    shutil.move(os.path.join('X', i), 'Food-101')

can be

food = Path('Food-101')
repo = Path('X')
for i in repo.iterdir():
    i.rename(food / i.name)
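
To see the iterdir/rename pattern in isolation, here is a small sketch that runs against a temporary directory. The helper name move_children and the sample file are assumptions made for the demo. One caveat worth keeping in mind: Path.rename is a plain rename, so unlike shutil.move it will not fall back to copying when source and destination are on different filesystems.

```python
import tempfile
from pathlib import Path


def move_children(source, destination):
    """Move every entry directly inside `source` into `destination`.

    `move_children` is a hypothetical helper used for illustration only.
    """
    # Take a snapshot of the listing so entries are not renamed mid-iteration
    for entry in list(source.iterdir()):
        entry.rename(destination / entry.name)


# Demo in a throwaway directory (an assumption, so nothing real is modified)
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / 'X'
    food = Path(tmp) / 'Food-101'
    (repo / 'meta').mkdir(parents=True)
    (repo / 'meta' / 'train.txt').write_text('apple_pie/1005649\n')
    food.mkdir()

    move_children(repo, food)
    moved = sorted(p.name for p in food.iterdir())
    leftover = list(repo.iterdir())
```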

Path appends

Path('Food-101/images/' + name_of_folder + '/' + name_of_file + '.jpg')

should be

(Path('Food-101/images') / name_of_folder / name_of_file).with_suffix('.jpg')
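
Putting these pieces together with the point from the comments (the function has to open() the list file rather than iterate over the path string), one possible shape for assemble_dataset is sketched below. This is only a sketch under assumptions: the three-argument signature, the returned set of class folders, and the sample apple_pie data are all made up for the demo, and it runs in a temporary directory rather than on your dataset.

```python
import shutil
import tempfile
from pathlib import Path


def assemble_dataset(names_path, images, split):
    """Move each image listed in `names_path` to images/<split>/<class>_<name>.jpg.

    Returns the set of class folder names seen, so the caller can delete them
    once every split has been processed. The signature is an assumption.
    """
    target = images / split
    target.mkdir(parents=True, exist_ok=True)
    class_dirs = set()
    with names_path.open() as names_file:  # open the file; do not iterate the path string
        for line in names_file:
            line = line.strip()
            if not line:
                continue
            folder, name = line.split('/')
            (images / folder / f'{name}.jpg').rename(target / f'{folder}_{name}.jpg')
            class_dirs.add(folder)
    return class_dirs


# Demo in a throwaway directory with made-up sample data
with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp) / 'food-101' / 'images'
    meta = Path(tmp) / 'food-101' / 'meta'
    (images / 'apple_pie').mkdir(parents=True)
    meta.mkdir()
    for name in ('1', '2'):
        (images / 'apple_pie' / f'{name}.jpg').touch()
    (meta / 'train.txt').write_text('apple_pie/1\n')
    (meta / 'test.txt').write_text('apple_pie/2\n')

    class_dirs = set()
    for split in ('train', 'test'):
        class_dirs |= assemble_dataset(meta / f'{split}.txt', images, split)
    # Remove the class folders only after both splits have been moved out
    for folder in class_dirs:
        shutil.rmtree(images / folder)  # shutil.rmtree accepts Path objects

    result = sorted(p.relative_to(images).as_posix() for p in images.rglob('*.jpg'))
    class_dir_gone = not (images / 'apple_pie').exists()
```

The sketch builds the file name with an f-string instead of with_suffix because with_suffix replaces an existing suffix, which would matter if a name in the list ever contained a dot. Deleting the class folders after both calls, rather than inside the train branch, also avoids removing test images before they have been moved.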
answered May 24, 2020 at 14:42
\$\endgroup\$