
I want to check for null values in Excel files that are spread across many subfolders. I have written a Python script that checks the Excel files in each folder, and it works, but I would like to simplify the code.

import os
import numpy as np
import pandas as pd
import logging

path = r"C:\Users\Documents\python\Emp"
files = os.listdir(path)
for filename in files:
    pathchild = path + '\\' + filename
    file = os.listdir(pathchild)
    files_xls = [f for f in file if f[-4:] == 'xlsx']
    child_folders = [f for f in file if f[-4:] != 'xlsx']
    for f1 in files_xls:
        filepath = pathchild + '\\' + f1
        df = pd.read_excel(filepath, engine='openpyxl')
        count_nan = df.isnull().sum().sum()
        logging.basicConfig(filename='nanTest.log', level=logging.INFO, format='%(message)s')
        if count_nan == 0:
            none = ' No None values in the excel file'
        else:
            none = ' None values found in the excel file'
        output = 'File name: ' + f1, 'File path: ' + filepath, none
        logging.info(output)
    for f2 in child_folders:
        patha = pathchild + '\\' + f2
        file1 = os.listdir(patha)
        files_xls1 = [f for f in file1 if f[-4:] == 'xlsx']
        for f3 in files_xls1:
            filechildpath = patha + '\\' + f3
            df = pd.read_excel(filechildpath, engine='openpyxl')
            count_nan = df.isnull().sum().sum()
            if count_nan == 0:
                none = ' No None values in the excel file'
            else:
                none = ' None values found in the excel file'
            output = 'File name: ' + f3, 'File path: ' + filechildpath, none
            logging.info(output)
 
asked Sep 27, 2021 at 15:07

1 Answer


Use print() or write() for normal program output. The logging module is generally for recording information about how the program is running, e.g., for debugging, performance monitoring, etc.
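For example (a minimal sketch, not taken from the question; the file name and message are made up): diagnostics about the run go through logging, while the result the user asked for goes to stdout or a results file:

import logging

logging.basicConfig(level=logging.INFO)

# Diagnostic message about how the run is going -- belongs in the log.
logging.info("Scanning folders for .xlsx files")

# The actual result -- belongs on stdout (or in a results file).
print("report.xlsx: 3 null values")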

I'd recommend using pathlib in the standard library. Path.rglob() would be useful, it recursively searches directories for files with names that match a pattern, such as "*.xlsx". Using it lets you eliminate the nested loops and a lot of the file/path handling code.
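For instance, a minimal sketch of Path.rglob() on its own (using the folder from the question):

from pathlib import Path

start_path = Path(r"C:\Users\Documents\python\Emp")

# rglob() walks every subfolder, at any depth, and yields matching paths.
for file_path in start_path.rglob("*.xlsx"):
    print(file_path)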

pd.read_excel() accepts a pathlib.Path for the first argument.
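So a Path from rglob() can be passed straight through; for example (the file name here is hypothetical):

from pathlib import Path
import pandas as pd

# Hypothetical file, just to show that a Path works as the first argument.
df = pd.read_excel(Path(r"C:\Users\Documents\python\Emp\report.xlsx"), engine="openpyxl")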

The code below collects the null count and path for each file so the results can be sorted. That makes it easier to see which files have nulls, because they end up grouped together.

Use f-strings instead of string concatenation.
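For example, the concatenated message from the question (which actually builds a tuple because of the commas) becomes a single readable string:

# Original: output = 'File name: ' + f1, 'File path: ' + filepath, none
output = f"File name: {f1}, File path: {filepath},{none}"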

I don't have excel files on this machine, so this code isn't tested:

import pandas as pd
import pathlib

null_counts = []

start_path = pathlib.Path(r"C:\Users\Documents\python\Emp")

for file_path in start_path.rglob('*.xlsx'):
    df = pd.read_excel(file_path, engine='openpyxl')
    count_nan = df.isnull().sum().sum()
    null_counts.append((count_nan, file_path))

null_counts.sort(reverse=True)

with open('nanTest.log', 'wt') as output:
    for count, filepath in null_counts:
        print(f"{count:5} {filepath}", file=output)
answered Sep 27, 2021 at 20:29
