I want to check for null values in Excel files spread across many subfolders. I wrote a Python script that checks for null values in the Excel files in each and every folder, and it works fine, but I would like to simplify the code.
import os
import numpy as np
import pandas as pd
import logging
path =r"C:\Users\Documents\python\Emp"
files = os.listdir(path)
for filename in files:
    pathchild = path + '\\' + filename
    file = os.listdir(pathchild)
    files_xls = [f for f in file if f[-4:] == 'xlsx']
    child_folders = [f for f in file if f[-4:] != 'xlsx']
    for f1 in files_xls:
        filepath = pathchild + '\\' + f1
        df = pd.read_excel(filepath, engine='openpyxl')
        count_nan = df.isnull().sum().sum()
        logging.basicConfig(filename='nanTest.log', level=logging.INFO, format='%(message)s')
        if count_nan == 0:
            none = ' No None values in the Excel file'
        else:
            none = ' None values found in the Excel file'
        output = 'File name: ' + f1, 'File path: ' + filepath, none
        logging.info(output)
    for f2 in child_folders:
        patha = pathchild + '\\' + f2
        file1 = os.listdir(patha)
        files_xls1 = [f for f in file1 if f[-4:] == 'xlsx']
        for f3 in files_xls1:
            filechildpath = patha + '\\' + f3
            df = pd.read_excel(filechildpath, engine='openpyxl')
            count_nan = df.isnull().sum().sum()
            if count_nan == 0:
                none = ' No None values in the Excel file'
            else:
                none = ' None values found in the Excel file'
            output = 'File name: ' + f3, 'File path: ' + filechildpath, none
            logging.info(output)
1 Answer
Use print() or write() for normal program output. The logging module is generally for recording information about how the program is running, e.g., for debugging, performance monitoring, etc.
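For example (a rough sketch; the file names here are placeholders):

import logging

# logging: a diagnostic record of how the program is running
logging.basicConfig(filename='debug.log', level=logging.DEBUG)
logging.debug('opened workbook, starting null scan')

# print()/write(): the program's actual output
print('emp.xlsx: 3 null values')
with open('results.txt', 'wt') as output:
    output.write('emp.xlsx: 3 null values\n')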
I'd recommend using pathlib from the standard library. Path.rglob() would be useful: it recursively searches directories for files whose names match a pattern, such as "*.xlsx". Using it lets you eliminate the nested loops and a lot of the file/path handling code. pd.read_excel() accepts a pathlib.Path for the first argument.
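As a quick illustration (assuming the same starting folder as the question):

import pathlib

start_path = pathlib.Path(r"C:\Users\Documents\python\Emp")

# rglob() walks every subfolder, however deeply nested, so no
# os.listdir() loops or manual '\\' joining are needed.
for file_path in start_path.rglob('*.xlsx'):
    print(file_path)  # a pathlib.Path, usable directly by pd.read_excel()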
The code below collects the count and path for each file, so the information can be sorted. This makes it easier to see which files have nulls, because they are grouped together.
Use f-strings instead of string concatenation.
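Note that the commas in the question's output = 'File name: ' + f1, ... actually build a tuple, so logging.info() records the parenthesized tuple repr rather than one string. An f-string produces the intended single string:

f1 = 'emp.xlsx'
filepath = r'C:\Users\Documents\python\Emp\emp.xlsx'

# Commas make this a tuple of strings, not one string:
output = 'File name: ' + f1, 'File path: ' + filepath

# f-string: a single readable string
output = f'File name: {f1} File path: {filepath}'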
I don't have Excel files on this machine, so this code isn't tested:
import pandas as pd
import pathlib

null_counts = []
start_path = pathlib.Path(r"C:\Users\Documents\python\Emp")

# Recursively find every .xlsx file under start_path.
for file_path in start_path.rglob('*.xlsx'):
    df = pd.read_excel(file_path, engine='openpyxl')
    count_nan = df.isnull().sum().sum()
    null_counts.append((count_nan, file_path))

# Largest null counts first.
null_counts.sort(reverse=True)

with open('nanTest.log', 'wt') as output:
    for count, filepath in null_counts:
        print(f"{count:5} {filepath}", file=output)
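Because the list holds (count, path) tuples, sort(reverse=True) puts the files with the most nulls at the top of the log; tuples compare element by element, and pathlib.Path supports ordering, so files with equal counts end up grouped by path.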