Optimize a simple and quick python script for transposing a .csv file

Question 1

I need to transpose the following file output1.csv, which is is a result from a quantum chemistry calculation into a single colum efficiently:

 Frequencies -- 18.8210 44.7624 46.9673
 Frequencies -- 66.6706 102.0432 112.4930
 Frequencies -- 124.4601 138.4393 180.1404
 Frequencies -- 230.0306 240.4389 258.2459
 Frequencies -- 282.7781 340.8302 357.7789
 Frequencies -- 378.9043 384.1284 401.4285
 Frequencies -- 418.0523 444.2264 447.6885
 Frequencies -- 473.2391 501.0937 518.9083
 Frequencies -- 559.5925 609.9256 623.7729
 Frequencies -- 657.4144 672.5480 728.2009
 Frequencies -- 740.5035 750.3238 757.2199
 Frequencies -- 774.6343 806.7750 815.9990
 Frequencies -- 839.3050 858.0716 876.1641
 Frequencies -- 888.6654 942.2963 965.7888
 Frequencies -- 987.3819 994.7388 1020.8724
 Frequencies -- 1025.0426 1045.5129 1059.0966
 Frequencies -- 1076.5127 1143.1178 1155.4200
 Frequencies -- 1208.6790 1219.7513 1244.7080
 Frequencies -- 1265.6108 1287.8830 1300.0463
 Frequencies -- 1325.0427 1339.0678 1353.0061
 Frequencies -- 1369.0614 1408.5258 1433.0543
 Frequencies -- 1452.4148 1454.6319 1500.4304
 Frequencies -- 1511.2305 1517.2562 1552.9189
 Frequencies -- 1560.5313 1636.2290 1640.1732
 Frequencies -- 1664.8747 1681.5566 1703.2026
 Frequencies -- 1770.2627 3058.4143 3122.3743
 Frequencies -- 3147.1828 3192.5897 3199.1398
 Frequencies -- 3211.0676 3222.0033 3236.3394
 Frequencies -- 3262.2119 3556.7997 3862.4791

To achieve that I wrote this code:

import os
import csv
import re
import sys
import pandas as pd
inputfile = open('output1.csv', 'r')
reader = csv.reader(inputfile)
outputfile = open('output1_f.csv', 'a')
writer = csv.writer(outputfile)
with open('output1_f.csv', 'w') as file:
 file.write('Frequencies,Frequencies,Frequencies\n')
for row in reader: 
 row = [re.sub(' +', ',', item) for item in row]
 row = [re.sub(',Frequencies,--,', '', item) for item in row] 
# row = map(str.strip, row)
 writer.writerow(row)
inputfile.close()
outputfile.close()

I added the code as comment in order to delete the first and last character " from every line in the output1_f.csv file. However it does not work out.

 row = map(str.strip, row)

I found the line.replace solution which creates a second output1_2f.csv file.

inputfile = open('output1_f.csv', 'r')
outputfile = open('output1_2f.csv', 'w')
for line in inputfile:
 line = line.replace('"', '')
 outputfile.write(line)
inputfile.close()
outputfile.close()

The following transpose part works only if the character " is removed and this is the reason that I need to delete the " character efficiently code instead of line.replace.

ifile = open('output1_2f.csv', "rb")
reader = csv.reader(ifile)
with open('output1_transp.csv', 'w') as out:
 rownum = 0
 for row in reader:
 # Save header row.
 if rownum == 0:
 header = row
 else:
 colnum = 0
 for col in row:
 out.write( '%s\n' % (col))
 colnum += 1
 rownum += 1
ifile.close()

I would be grateful if you could propose me any idea to shorten the code and make it more efficient and easier to use. Thank you developers for your time!!!

Question 2

It's not clear what the result is intended to be. Is it a file with a single column of frequencies?

Question 3

I believe there is a much simpler way of doing this.

If I correctly understand the problem, the idea is to produce a single column of frequencies.

with open('input.txt', 'r') as infile, open('out.csv', 'w') as outfile:
 print >> outfile, "Frequency"
 for line in infile:
 for freq in line.split()[2:]:
 print >> outfile, freq

This iterates over each line in the input file. It uses split to eliminate whitespace and to split each line into fields. It then discards the first two, which are assumed to be "Frequencies" and "--" and writes the remaining fields to the output file.

Question 4

Downvoter: could you explain? How could this answer be improved?

Question 5

@Edward I'm not the downvoter, but print >> outfile, "Frequency" is only erroring for me. And may explain the downvote.

Question 6

@JoeWallis Could it be version difference? The question itself isn't tagged with a specific version.

Question 7

@Mast Yes, print >>stream, ... is explicitly Python 2, in Python 3 you use print(..., file=stream) instead.

Question 8

@MathiasEttinger @Mast I copied the open without the 'w' flag. Silly me ):

Question 9

If I am understanding what you want, and you are already importing pandas, this can be done very easily:

import csv
import pandas as pd
df = pd.read_csv('output1.csv', header=None, delim_whitespace=True)
df = df.iloc[:, 2:].stack().reset_index(drop=True).to_frame()
df.insert(0, -1, '--')
df.insert(0, -2, 'Frequencies')
df.to_csv('output1_f.csv', sep=' ', mode='a', index=False, quoting=csv.QUOTE_NONE, header=None)

And the result:

Frequencies -- 18.821
Frequencies -- 44.7624
Frequencies -- 46.9673
Frequencies -- 66.6706
Frequencies -- 102.0432
Frequencies -- 112.493
Frequencies -- 124.4601
Frequencies -- 138.4393
Frequencies -- 180.1404
Frequencies -- 230.0306
Frequencies -- 240.4389
Frequencies -- 258.2459
Frequencies -- 282.7781
Frequencies -- 340.8302
Frequencies -- 357.7789
Frequencies -- 378.9043
Frequencies -- 384.1284
Frequencies -- 401.4285
Frequencies -- 418.0523
Frequencies -- 444.2264
Frequencies -- 447.6885
Frequencies -- 473.2391
Frequencies -- 501.0937
Frequencies -- 518.9083
Frequencies -- 559.5925
Frequencies -- 609.9256
Frequencies -- 623.7729
Frequencies -- 657.4144
Frequencies -- 672.548
Frequencies -- 728.2009
Frequencies -- 740.5035
Frequencies -- 750.3238
Frequencies -- 757.2199
Frequencies -- 774.6343
Frequencies -- 806.775
Frequencies -- 815.999
Frequencies -- 839.305
Frequencies -- 858.0716
Frequencies -- 876.1641
Frequencies -- 888.6654
Frequencies -- 942.2963
Frequencies -- 965.7888
Frequencies -- 987.3819
Frequencies -- 994.7388
Frequencies -- 1020.8724
Frequencies -- 1025.0426
Frequencies -- 1045.5129
Frequencies -- 1059.0966
Frequencies -- 1076.5127
Frequencies -- 1143.1178
Frequencies -- 1155.42
Frequencies -- 1208.679
Frequencies -- 1219.7513
Frequencies -- 1244.708
Frequencies -- 1265.6108
Frequencies -- 1287.883
Frequencies -- 1300.0463
Frequencies -- 1325.0427
Frequencies -- 1339.0678
Frequencies -- 1353.0061
Frequencies -- 1369.0614
Frequencies -- 1408.5258
Frequencies -- 1433.0543
Frequencies -- 1452.4148
Frequencies -- 1454.6319
Frequencies -- 1500.4304
Frequencies -- 1511.2305
Frequencies -- 1517.2562
Frequencies -- 1552.9189
Frequencies -- 1560.5313
Frequencies -- 1636.229
Frequencies -- 1640.1732
Frequencies -- 1664.8747
Frequencies -- 1681.5566
Frequencies -- 1703.2026
Frequencies -- 1770.2627
Frequencies -- 3058.4143
Frequencies -- 3122.3743
Frequencies -- 3147.1828
Frequencies -- 3192.5897
Frequencies -- 3199.1398
Frequencies -- 3211.0676
Frequencies -- 3222.0033
Frequencies -- 3236.3394
Frequencies -- 3262.2119
Frequencies -- 3556.7997
Frequencies -- 3862.4791

Edward EdwardEdward 67.2k4 gold badges120 silver badges284 bronze badges · Accepted Answer · 2016-07-06 13:28:27Z

2

\$\begingroup\$

I believe there is a much simpler way of doing this.

If I correctly understand the problem, the idea is to produce a single column of frequencies.

with open('input.txt', 'r') as infile, open('out.csv', 'w') as outfile:
 print >> outfile, "Frequency"
 for line in infile:
 for freq in line.split()[2:]:
 print >> outfile, freq

This iterates over each line in the input file. It uses split to eliminate whitespace and to split each line into fields. It then discards the first two, which are assumed to be "Frequencies" and "--" and writes the remaining fields to the output file.

Share

answered Jul 6, 2016 at 13:28

Edward's user avatar

Edward EdwardEdward

67.2k4 gold badges120 silver badges284 bronze badges

\$\endgroup\$

5

\$\begingroup\$ Downvoter: could you explain? How could this answer be improved? \$\endgroup\$

Edward
– Edward

2016年07月07日 01:08:44 +00:00
Commented Jul 7, 2016 at 1:08
\$\begingroup\$ @Edward I'm not the downvoter, but print >> outfile, "Frequency" is only erroring for me. And may explain the downvote. \$\endgroup\$

Peilonrayz
– Peilonrayz ♦

2016年07月07日 08:40:21 +00:00
Commented Jul 7, 2016 at 8:40
\$\begingroup\$ @JoeWallis Could it be version difference? The question itself isn't tagged with a specific version. \$\endgroup\$

Mast
– Mast ♦

2016年07月07日 11:10:06 +00:00
Commented Jul 7, 2016 at 11:10
\$\begingroup\$ @Mast Yes, print >>stream, ... is explicitly Python 2, in Python 3 you use print(..., file=stream) instead. \$\endgroup\$

301_Moved_Permanently
– 301_Moved_Permanently

2016年07月07日 11:12:45 +00:00
Commented Jul 7, 2016 at 11:12
\$\begingroup\$ @MathiasEttinger @Mast I copied the open without the 'w' flag. Silly me ): \$\endgroup\$

Peilonrayz
– Peilonrayz ♦

2016年07月07日 11:24:09 +00:00
Commented Jul 7, 2016 at 11:24

Add a comment |

Stack Exchange Network

Optimize a simple and quick python script for transposing a .csv file

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Optimize a simple and quick python script for transposing a .csv file

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions