I have a CSV file with 14 columns, and I'm trying to convert the data to string type. I tried this:
import pandas as pd
df = pd.read_csv("2008_data_test.csv", sep=",")
output = pd.DataFrame(columns=df.columns)
for c in df.columns:
    if df[c].dtype == object:
        print "convert ", df[c].name, " to string"
        df[c] = df[c].astype(str)
output.to_csv("2008_data_test.csv_tostring2.csv", index=False)
This writes only the headers to the output file, and I can't figure out what I missed.
Any ideas? Also, is it possible to convert only specific columns?
1 Answer
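You're modifying one dataframe (df) but writing another (output), which is empty apart from its column labels. That's why you only get the headers. Write df instead, and use select_dtypes to pick the object columns: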
c = df.select_dtypes(include=[object]).columns
df[c] = df[c].astype(str)
df.to_csv("2008_data_test.csv_tostring2.csv", index=False)
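To see why only the headers came out, here is a minimal sketch on a made-up two-column frame (the data and the names header_only and full are assumptions for illustration, not the asker's file). It writes to a string rather than to disk, and forces dtype=object so it behaves the same whichever default string dtype your pandas version uses:

```python
import pandas as pd

# Hypothetical stand-in for the CSV contents.
df = pd.DataFrame({
    "id": [1, 2],
    "mixed": pd.Series([10, "x"], dtype=object),
})

# As in the question: a new frame that copies only the column labels.
output = pd.DataFrame(columns=df.columns)
header_only = output.to_csv(index=False)  # no path given, so a string is returned

# As in the answer: convert the object columns on df and write df itself.
c = df.select_dtypes(include=[object]).columns
df[c] = df[c].astype(str)
full = df.to_csv(index=False)

print(repr(header_only))
print(repr(full))
```

header_only contains just the header row because output never receives any rows, while full contains the converted data.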
As MaxU suggested, it might be simpler to filter by dtype like this:
c = df.columns[df.dtypes.eq('object')]
select_dtypes builds a DataFrame subset before its columns are accessed, so indexing df.columns directly should be cheaper.
If you want to convert specific columns only, you can remove columns as needed from c before the conversion using c = c.difference(['Col1', 'Col2', ...]).
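Here is a rough sketch of converting only some of the object columns, using a small made-up frame (the column names and values are assumptions for illustration). Note that Index.difference returns the remaining labels in sorted order by default:

```python
import pandas as pd

# Hypothetical data; dtype=object is forced so the columns are picked up
# by the dtype filter regardless of pandas' default string dtype.
df = pd.DataFrame({
    "Col1": pd.Series([1, "a"], dtype=object),
    "Col2": pd.Series([2.5, "b"], dtype=object),
    "Col3": pd.Series([3, "c"], dtype=object),
    "n": [1, 2],
})

# All object-dtype columns...
c = df.columns[df.dtypes.eq("object")]
# ...minus the ones that should be left untouched.
c = c.difference(["Col1"])

df[c] = df[c].astype(str)

print(list(c))
print(df["Col2"].tolist())
```

Col1 keeps its original mixed values, while Col2 and Col3 now hold strings only.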
3 Comments
Use df.columns[df.dtypes.eq('object')] instead of df.select_dtypes(include=[object]).columns, as the latter (df.select_dtypes(include=[object])) first generates a DataFrame and then returns its columns...
output = pd.DataFrame(columns=df.columns) is defined before the for loop. That's what you write to CSV; the for loop does basically nothing. Did you mean df.to_csv("2008_data_test.csv_tostring2.csv", index=False) instead?