I need to transpose the following file output1.csv, which is is a result from a quantum chemistry calculation into a single colum efficiently:
Frequencies -- 18.8210 44.7624 46.9673
Frequencies -- 66.6706 102.0432 112.4930
Frequencies -- 124.4601 138.4393 180.1404
Frequencies -- 230.0306 240.4389 258.2459
Frequencies -- 282.7781 340.8302 357.7789
Frequencies -- 378.9043 384.1284 401.4285
Frequencies -- 418.0523 444.2264 447.6885
Frequencies -- 473.2391 501.0937 518.9083
Frequencies -- 559.5925 609.9256 623.7729
Frequencies -- 657.4144 672.5480 728.2009
Frequencies -- 740.5035 750.3238 757.2199
Frequencies -- 774.6343 806.7750 815.9990
Frequencies -- 839.3050 858.0716 876.1641
Frequencies -- 888.6654 942.2963 965.7888
Frequencies -- 987.3819 994.7388 1020.8724
Frequencies -- 1025.0426 1045.5129 1059.0966
Frequencies -- 1076.5127 1143.1178 1155.4200
Frequencies -- 1208.6790 1219.7513 1244.7080
Frequencies -- 1265.6108 1287.8830 1300.0463
Frequencies -- 1325.0427 1339.0678 1353.0061
Frequencies -- 1369.0614 1408.5258 1433.0543
Frequencies -- 1452.4148 1454.6319 1500.4304
Frequencies -- 1511.2305 1517.2562 1552.9189
Frequencies -- 1560.5313 1636.2290 1640.1732
Frequencies -- 1664.8747 1681.5566 1703.2026
Frequencies -- 1770.2627 3058.4143 3122.3743
Frequencies -- 3147.1828 3192.5897 3199.1398
Frequencies -- 3211.0676 3222.0033 3236.3394
Frequencies -- 3262.2119 3556.7997 3862.4791
To achieve that I wrote this code:
import os
import csv
import re
import sys
import pandas as pd
inputfile = open('output1.csv', 'r')
reader = csv.reader(inputfile)
outputfile = open('output1_f.csv', 'a')
writer = csv.writer(outputfile)
with open('output1_f.csv', 'w') as file:
file.write('Frequencies,Frequencies,Frequencies\n')
for row in reader:
row = [re.sub(' +', ',', item) for item in row]
row = [re.sub(',Frequencies,--,', '', item) for item in row]
# row = map(str.strip, row)
writer.writerow(row)
inputfile.close()
outputfile.close()
I added the code as comment in order to delete the first and last character " from every line in the output1_f.csv file. However it does not work out.
row = map(str.strip, row)
I found the line.replace solution which creates a second output1_2f.csv file.
inputfile = open('output1_f.csv', 'r')
outputfile = open('output1_2f.csv', 'w')
for line in inputfile:
line = line.replace('"', '')
outputfile.write(line)
inputfile.close()
outputfile.close()
The following transpose part works only if the character " is removed and this is the reason that I need to delete the " character efficiently code instead of line.replace.
ifile = open('output1_2f.csv', "rb")
reader = csv.reader(ifile)
with open('output1_transp.csv', 'w') as out:
rownum = 0
for row in reader:
# Save header row.
if rownum == 0:
header = row
else:
colnum = 0
for col in row:
out.write( '%s\n' % (col))
colnum += 1
rownum += 1
ifile.close()
I would be grateful if you could propose me any idea to shorten the code and make it more efficient and easier to use. Thank you developers for your time!!!
-
2\$\begingroup\$ It's not clear what the result is intended to be. Is it a file with a single column of frequencies? \$\endgroup\$Edward– Edward2016年07月06日 12:59:15 +00:00Commented Jul 6, 2016 at 12:59
2 Answers 2
I believe there is a much simpler way of doing this.
If I correctly understand the problem, the idea is to produce a single column of frequencies.
with open('input.txt', 'r') as infile, open('out.csv', 'w') as outfile:
print >> outfile, "Frequency"
for line in infile:
for freq in line.split()[2:]:
print >> outfile, freq
This iterates over each line in the input file. It uses split
to eliminate whitespace and to split each line into fields. It then discards the first two, which are assumed to be "Frequencies" and "--" and writes the remaining fields to the output file.
-
\$\begingroup\$ Downvoter: could you explain? How could this answer be improved? \$\endgroup\$Edward– Edward2016年07月07日 01:08:44 +00:00Commented Jul 7, 2016 at 1:08
-
\$\begingroup\$ @Edward I'm not the downvoter, but
print >> outfile, "Frequency"
is only erroring for me. And may explain the downvote. \$\endgroup\$2016年07月07日 08:40:21 +00:00Commented Jul 7, 2016 at 8:40 -
\$\begingroup\$ @JoeWallis Could it be version difference? The question itself isn't tagged with a specific version. \$\endgroup\$2016年07月07日 11:10:06 +00:00Commented Jul 7, 2016 at 11:10
-
\$\begingroup\$ @Mast Yes,
print >>stream, ...
is explicitly Python 2, in Python 3 you useprint(..., file=stream)
instead. \$\endgroup\$301_Moved_Permanently– 301_Moved_Permanently2016年07月07日 11:12:45 +00:00Commented Jul 7, 2016 at 11:12 -
\$\begingroup\$ @MathiasEttinger @Mast I copied the
open
without the'w'
flag. Silly me ): \$\endgroup\$2016年07月07日 11:24:09 +00:00Commented Jul 7, 2016 at 11:24
If I am understanding what you want, and you are already importing pandas, this can be done very easily:
import csv
import pandas as pd
df = pd.read_csv('output1.csv', header=None, delim_whitespace=True)
df = df.iloc[:, 2:].stack().reset_index(drop=True).to_frame()
df.insert(0, -1, '--')
df.insert(0, -2, 'Frequencies')
df.to_csv('output1_f.csv', sep=' ', mode='a', index=False, quoting=csv.QUOTE_NONE, header=None)
And the result:
Frequencies -- 18.821
Frequencies -- 44.7624
Frequencies -- 46.9673
Frequencies -- 66.6706
Frequencies -- 102.0432
Frequencies -- 112.493
Frequencies -- 124.4601
Frequencies -- 138.4393
Frequencies -- 180.1404
Frequencies -- 230.0306
Frequencies -- 240.4389
Frequencies -- 258.2459
Frequencies -- 282.7781
Frequencies -- 340.8302
Frequencies -- 357.7789
Frequencies -- 378.9043
Frequencies -- 384.1284
Frequencies -- 401.4285
Frequencies -- 418.0523
Frequencies -- 444.2264
Frequencies -- 447.6885
Frequencies -- 473.2391
Frequencies -- 501.0937
Frequencies -- 518.9083
Frequencies -- 559.5925
Frequencies -- 609.9256
Frequencies -- 623.7729
Frequencies -- 657.4144
Frequencies -- 672.548
Frequencies -- 728.2009
Frequencies -- 740.5035
Frequencies -- 750.3238
Frequencies -- 757.2199
Frequencies -- 774.6343
Frequencies -- 806.775
Frequencies -- 815.999
Frequencies -- 839.305
Frequencies -- 858.0716
Frequencies -- 876.1641
Frequencies -- 888.6654
Frequencies -- 942.2963
Frequencies -- 965.7888
Frequencies -- 987.3819
Frequencies -- 994.7388
Frequencies -- 1020.8724
Frequencies -- 1025.0426
Frequencies -- 1045.5129
Frequencies -- 1059.0966
Frequencies -- 1076.5127
Frequencies -- 1143.1178
Frequencies -- 1155.42
Frequencies -- 1208.679
Frequencies -- 1219.7513
Frequencies -- 1244.708
Frequencies -- 1265.6108
Frequencies -- 1287.883
Frequencies -- 1300.0463
Frequencies -- 1325.0427
Frequencies -- 1339.0678
Frequencies -- 1353.0061
Frequencies -- 1369.0614
Frequencies -- 1408.5258
Frequencies -- 1433.0543
Frequencies -- 1452.4148
Frequencies -- 1454.6319
Frequencies -- 1500.4304
Frequencies -- 1511.2305
Frequencies -- 1517.2562
Frequencies -- 1552.9189
Frequencies -- 1560.5313
Frequencies -- 1636.229
Frequencies -- 1640.1732
Frequencies -- 1664.8747
Frequencies -- 1681.5566
Frequencies -- 1703.2026
Frequencies -- 1770.2627
Frequencies -- 3058.4143
Frequencies -- 3122.3743
Frequencies -- 3147.1828
Frequencies -- 3192.5897
Frequencies -- 3199.1398
Frequencies -- 3211.0676
Frequencies -- 3222.0033
Frequencies -- 3236.3394
Frequencies -- 3262.2119
Frequencies -- 3556.7997
Frequencies -- 3862.4791
Explore related questions
See similar questions with these tags.