I have a big CSV file that I want to edit. Editing here means deleting the columns that contain only one distinct value. Here is what I have written so far (I'm new to Python, so I'm stuck and not sure this is the right approach):

import csv
import collections
import numpy as np
number_of_rows = 2432
interseting_cols = []
col_values = collections.defaultdict(list)
col_values_named = collections.defaultdict(list)
new_row = collections.defaultdict(list)
inputFile = open('input.csv', 'r', newline='')
outputFile = open('output.csv', 'w')
reader = csv.reader(inputFile)
writer = csv.writer(outputFile)
# skip field names
next(reader)
for row in reader:
    for col, value in enumerate(row):
        col_values[col].append(value)
    # each column is now saved in col_values (without the headers)
for i in range(len(col_values)):
    if len(set(col_values[i][:(number_of_rows-1)])) != 1:
        interseting_cols.append(i)  # save the index of each column with more than one value
inputFile.seek(0)
# reading the file again, now with headers
for row in reader:
    for col, value in enumerate(row):
        col_values_named[col].append(value)  # saving the columns, now with header
# generating a new CSV file, only with the interesting columns:
for i in range(number_of_rows):
    print("i value ", i)
    for j in range(len(interseting_cols)):  # I'm not sure about this PART !!!!
        new_row.append(col_values_named[interseting_cols[j]])
    writer.writerow(new_row)

Any idea how to write the last loop? Or is there a better way to solve this?

UPDATE: say the file looks like this:

  | A | B    | C    | D   | F    | G | H    | I | J   | K    |
---------------------------------------------------------------
1 | 1 | NULL | 444  | 201 | 0.01 | A | NULL | 4 | 9.5 | NULL |
2 | 2 | NULL | NULL | 201 | 0    | A | NULL | 4 | 9.5 | NULL |
3 | 4 | NULL | 444  | 201 | 0    | A | NULL | 4 | 9.5 | NULL |
4 | 1 | NULL | 444  | 201 | 0    | A | NULL | 4 | 9.5 | NULL |

In this case the result should include only three columns: A, C and F.

asked Mar 22, 2017 at 12:48
  • Could you edit the question to include a small sample from your CSV file, and also how you want it to appear afterwards? Commented Mar 22, 2017 at 12:54
  • As a tip, there is a library called pandas, which is extremely useful for reading, manipulating and writing data. Commented Mar 22, 2017 at 13:03

2 Answers

With the pandas library, its built-in functions do most of this work for you. Here is a small implementation of the requirement you posted above. If you are a beginner and need a clearer explanation, ask in a comment and I'll add more detail. In any case, it's worth starting to play around with pandas.

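import pandas as pd

df = pd.read_csv('input.csv')
# Iterate over a snapshot of the column labels, since columns are dropped inside the loop.
for column in list(df.columns):
    # unique() counts NaN as a value, so an all-NaN column is dropped too.
    if len(df[column].unique()) == 1:
        df.drop(columns=column, inplace=True)
df.to_csv('output.csv', index=False)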
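
The same idea can also be written without an explicit loop. The following is only a sketch, not a replacement for the code above: the helper name `drop_constant_columns` and the inline sample data (copied from the table in the question) are assumptions made for illustration. It relies on `DataFrame.nunique(dropna=False)`, which counts NaN as a value, so it should agree with the `unique()` check.

```python
import io

import pandas as pd


def drop_constant_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without columns that hold a single distinct value."""
    # dropna=False counts NaN as a value, matching len(df[col].unique()).
    distinct_counts = df.nunique(dropna=False)
    return df.loc[:, distinct_counts > 1].copy()


# Illustrative sample mirroring the table in the question.
# read_csv parses the literal text "NULL" as NaN by default.
sample = io.StringIO(
    "A,B,C,D,F,G,H,I,J,K\n"
    "1,NULL,444,201,0.01,A,NULL,4,9.5,NULL\n"
    "2,NULL,NULL,201,0,A,NULL,4,9.5,NULL\n"
    "4,NULL,444,201,0,A,NULL,4,9.5,NULL\n"
    "1,NULL,444,201,0,A,NULL,4,9.5,NULL\n"
)
result = drop_constant_columns(pd.read_csv(sample))
print(list(result.columns))
```

Note that the default `nunique()` ignores NaN, so without `dropna=False` a column such as C (444 and NULL) would be treated as constant and dropped.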
answered Mar 22, 2017 at 13:04

6 Comments

Thanks for your answer, but how can I ignore the headers in the first row?
While writing to CSV? Just do df.to_csv('output.csv', index=None, header=False)
No, while reading, because if the check includes the headers, it will never delete any column.
Sorry for the late reply! While reading, just do this: df = pd.read_csv('input.csv', header=None)
Thanks for replying. I've solved it, but the hard way, without pandas ;-)
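
On the header question raised in these comments, a quick check may help. This is a small sketch with made-up two-column data, not taken from the original file: by default `pd.read_csv` uses the first line as column labels, so the header text is not part of the values being compared, whereas `header=None` keeps the first line as an ordinary data row.

```python
import io

import pandas as pd

# Made-up data: column B holds the same value in every row.
csv_text = "A,B\n1,x\n2,x\n"

# Default: the first line becomes the column labels, not a data row.
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None: the first line is kept as data; columns are numbered 0, 1, ...
without_header = pd.read_csv(io.StringIO(csv_text), header=None)

# Number of distinct values seen in the second column in each case.
distinct_default = with_header["B"].nunique(dropna=False)
distinct_no_header = without_header[1].nunique(dropna=False)
print(distinct_default, distinct_no_header)
```

With the default settings the constant column is detected as constant. With `header=None` the label "B" is counted as an extra value, which is the situation the comment above was worried about.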

Unless the spreadsheet is truly enormous, just read the whole thing in and then find what you want!

Untested code:

# reader and writer are the csv.reader / csv.writer objects from the question
headers = next(reader)  # Python 3; reader.next() only exists in Python 2
sheet = []
for row in reader:
    sheet.append(row)
# assuming all rows are the same length ...
for colno, header in enumerate(headers):
    col = [row[colno] for row in sheet]
    distinct = set(col)
    if len(distinct) > 1:
        # col contains at least two distinct values, so
        # do something with it and its header and/or column number
        writer.writerow([header, colno] + col)
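
For completeness, here is one way the read-everything approach could be turned into a full script that writes rows rather than columns. It is a sketch under a few assumptions: the function name `drop_constant_columns` is made up, the file is non-empty and fits in memory, and every row has as many fields as the header. The temporary files and sample data (taken from the table in the question) are only there to make the example runnable.

```python
import csv
import os
import tempfile


def drop_constant_columns(in_path, out_path):
    """Copy in_path to out_path, keeping only columns with 2+ distinct values."""
    with open(in_path, newline="") as src:
        reader = csv.reader(src)
        headers = next(reader)  # first row holds the field names
        rows = list(reader)     # assumes the file fits in memory

    # Indices of columns whose data (header excluded) is not constant.
    keep = [i for i in range(len(headers))
            if len({row[i] for row in rows}) > 1]

    with open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow([headers[i] for i in keep])
        for row in rows:
            writer.writerow([row[i] for i in keep])
    return [headers[i] for i in keep]


# Illustrative run on the sample from the question.
sample = (
    "A,B,C,D,F,G,H,I,J,K\n"
    "1,NULL,444,201,0.01,A,NULL,4,9.5,NULL\n"
    "2,NULL,NULL,201,0,A,NULL,4,9.5,NULL\n"
    "4,NULL,444,201,0,A,NULL,4,9.5,NULL\n"
    "1,NULL,444,201,0,A,NULL,4,9.5,NULL\n"
)
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "input.csv")
    out_path = os.path.join(tmp, "output.csv")
    with open(in_path, "w", newline="") as f:
        f.write(sample)
    kept = drop_constant_columns(in_path, out_path)
    with open(out_path, newline="") as f:
        output_rows = list(csv.reader(f))
print(kept)
```

Unlike pandas, the csv module treats "NULL" as an ordinary string, so column C still counts as having two distinct values here.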
answered Mar 22, 2017 at 13:47
