I wrote code to remove the background from 8000 images, but it takes approximately 8 hours to produce the result.
How can I improve its running time? I will have to work on a larger dataset in the future.
from rembg import remove
import cv2
import glob

for img in glob.glob('../images/*.jpg'):
  a = img.split('../images/')
  a1 = a[1].split('.jpg')
  try:
     cv_img = cv2.imread(img)
     output = remove(cv_img)
  except:
     continue
  cv2.imwrite('../output image/' + str(a1[0]) + '.png', output)
1 Answer
Performance
This is a simple loop, and I would expect that the majority of time is spent in rembg.remove() - but you should profile to demonstrate that.
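One lightweight way to check that guess is to time each stage of the loop separately on a small sample of files. The following is only a sketch: time_stages and its (name, function) pairs are hypothetical helpers, not part of rembg or OpenCV. Running the whole script under python -m cProfile -s cumtime is an alternative that needs no code changes.

```python
import time
from collections import defaultdict


def time_stages(items, stages):
    """Feed each item through the stages in order; return total seconds per stage.

    `stages` is a list of (name, function) pairs, where each function
    receives the previous stage's result (hypothetical helper for illustration).
    """
    totals = defaultdict(float)
    for item in items:
        value = item
        for name, func in stages:
            start = time.perf_counter()
            value = func(value)
            totals[name] += time.perf_counter() - start
    return dict(totals)
```

For the code in the question, the stages could be [("read", cv2.imread), ("remove", rembg.remove)] applied to a few dozen filenames. If "remove" dominates the totals, the guess above holds.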
If my guess is correct, and if that method is single-threaded, the simplest approach is to divide the work across more cores, to process images in parallel.
General code review
PEP-8 recommends that indentation should be 4 spaces per level, rather than variously 2 and 3.
Some of the names could be better - img is actually the input filename; it's not an image until we read it. a and a1 are utterly meaningless.
Instead of using string.split() to compose the output filename, we can use os.path or pathlib.
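As a small illustration of the pathlib approach, the output name can be derived from the input path without any string splitting. The function name output_path_for is made up for this sketch.

```python
from pathlib import Path


def output_path_for(in_path, out_dir):
    """Return out_dir/<input name with a .png suffix> (hypothetical helper)."""
    # with_suffix() replaces only the final extension; .name drops the directory.
    return Path(out_dir) / Path(in_path).with_suffix('.png').name
```

Unlike split('.jpg'), this only touches the final extension, so a file called a.jpg.jpg becomes a.jpg.png rather than losing part of its name.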
I think that except: continue isn't very useful error handling. You probably want to have some messages on the error stream indicating which files weren't converted, and possibly also write a log file.
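A minimal sketch of that kind of reporting, using the standard logging module, might look like this. The names configure_logging and convert_each, the logger name, and the idea of passing the per-file conversion in as a callable are all assumptions made for illustration.

```python
import logging
import sys

log = logging.getLogger("remove_bg")  # hypothetical logger name


def configure_logging(log_file=None):
    """Send messages to stderr, and also to log_file when one is given."""
    handlers = [logging.StreamHandler(sys.stderr)]
    if log_file is not None:
        handlers.append(logging.FileHandler(log_file))
    logging.basicConfig(level=logging.INFO, handlers=handlers,
                        format="%(asctime)s %(levelname)s %(message)s")


def convert_each(paths, convert):
    """Call convert(path) for every path; log failures and return them."""
    failed = []
    for path in paths:
        try:
            convert(path)
        except Exception as e:
            # Name the file so it can be inspected or retried later.
            log.error("%s: %s", path, e)
            failed.append(path)
    return failed
```

Returning the list of failed paths makes it easy to retry just those files, or to exit with a non-zero status when any conversion failed.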
I would probably move the cv2.imwrite() within the try block too - if that fails, we want to know about it.
We can get a cleaner implementation, and use this as the basis for parallelising:
import cv2
import rembg
import sys
from pathlib import Path

in_dir = Path('../images')
out_dir = Path('../output image')

for path in in_dir.glob('*.jpg'):
    try:
        image = cv2.imread(str(path))
        # imread() reports failure by returning None rather than raising
        if image is None or not image.data:
            raise cv2.error("read failed")
        output = rembg.remove(image)
        out_path = out_dir / path.with_suffix('.png').name
        # imwrite() returns False on failure; str() avoids relying on Path support
        if not cv2.imwrite(str(out_path), output):
            raise cv2.error("write failed")
    except Exception as e:
        print(f"{path}: {e}", file=sys.stderr)
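The loop above could then be spread across cores with concurrent.futures. This is a sketch only: it assumes rembg.remove() is CPU-bound and not already using every core (worth confirming by profiling), and the names png_path, remove_background and convert_all are invented for the example. The imports of cv2 and rembg sit inside the worker so that each process loads them itself.

```python
import sys
from concurrent.futures import ProcessPoolExecutor
from itertools import repeat
from pathlib import Path


def png_path(in_path, out_dir):
    """Output location for one input file (hypothetical helper)."""
    return Path(out_dir) / Path(in_path).with_suffix('.png').name


def remove_background(in_path, out_dir):
    """Convert one image; return (in_path, error message or None)."""
    import cv2    # imported per worker process
    import rembg
    try:
        image = cv2.imread(str(in_path))
        if image is None:
            raise ValueError("read failed")
        if not cv2.imwrite(str(png_path(in_path, out_dir)), rembg.remove(image)):
            raise ValueError("write failed")
        return in_path, None
    except Exception as e:
        return in_path, str(e)


def convert_all(paths, out_dir, max_workers=None):
    """Process paths in parallel; report and return the ones that failed."""
    paths = list(paths)
    if not paths:
        return []
    failed = []
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(remove_background, paths, repeat(out_dir), chunksize=4)
        for path, error in results:
            if error is not None:
                print(f"{path}: {error}", file=sys.stderr)
                failed.append(path)
    return failed


# Usage sketch; keep the call under the main guard so that worker
# processes can import this module without re-running it:
# if __name__ == '__main__':
#     convert_all(Path('../images').glob('*.jpg'), Path('../output image'))
```

max_workers=None lets the executor pick a default based on the CPU count. If memory is tight, a smaller number may be needed, since every worker holds its own copy of whatever rembg loads.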
Comment from Seb (Sep 14, 2022 at 9:05): For super easy parallelisation I'd recommend using something like p_map or p_umap from the p_tqdm package, which comes with a progress bar and ETA.