9
\$\begingroup\$

Lets say you have several .txt files in a folder and want to merge all text files into a single Excel document:

A.txt

A1
A2
A3
A4
A5
A6
A7
A8
A9
A10
A11
A12
A13
A14
A15
A16
A17
A18
A19
A20
A21
A22

B.txt

B1
B2
B3
B4
B5
B6
B7
B8
B9
B10
B11
B12
B13
B14

C.txt

C1
C2
C3
C4
C5
C6
C7
C8
C9
C10
C11
C12
C13
C14
C15
C16
C17
C18
C19
C20
C21
C22
C23
C24
C25
C26
C27
C28
C29
C30
C31
C32
C33

D.txt

//empty file

E.txt

E1
E2
E3
E4
E5
E6
E7
E8
E9
E10
E11
E12
E13
E14
E15
E16
E17
E18
E19
E20
E21
E22
E23
E24
E25
E26
E27
E28
E29
E30
E31
E32
E33
E34
E35
E36
E37
E38

The Output:

enter image description here

spreadsheet_to_text.py

"""
Reads all .txt files in path of the script into a single
spreadsheet. In the first line of the spreadsheet the filename were
the data is from is displayed. Then the data follows
"""
import os
from typing import List
import openpyxl
from openpyxl.utils import get_column_letter
def text_into_spreadsheet():
 """main logic for read .txt into spreadsheet"""
 workbook = openpyxl.Workbook()
 sheet = workbook.active
 column: int = 1
 filenames: List[str] = os.listdir()
 for filename in filenames:
 if filename.endswith(".txt"):
 with open(filename) as textfile:
 lines: List[int] = textfile.readlines()
 sheet[get_column_letter(column) + '1'] = filename
 row: int = 2
 for line in lines:
 sheet[get_column_letter(column) + str(row)] = line
 row += 1
 column += 1
 workbook.save('result.xlsx')
if __name__ == "__main__":
 text_into_spreadsheet()

What do you think about the code? How can it be improved?

edit: you can find a programm doing it in reversed in Spreadsheets to text files

asked Dec 18, 2018 at 20:10
\$\endgroup\$
1
  • 2
    \$\begingroup\$ Does it have to be an Excel spreadsheet, or can it simply be importable by Excel? If the latter, you really should output to something like CSV, which will be faster and simpler. \$\endgroup\$ Commented Dec 18, 2018 at 23:18

2 Answers 2

11
\$\begingroup\$

There are a few things we could improve:

  • you could use the "lazy" glob.iglob() to filter out *.txt files instead of doing the os.listdir(), keeping the whole list if memory and having an extra check inside the loop:

    for filename in glob.iglob("*.txt"):
    
  • instead of using textfile.readlines() and read all the lines in a file into memory, iterate over the file object directly in a "lazy" manner:

    for line in textfile:
    
  • instead of manually keeping track of column value, you could use enumerate():

    for column, filename in enumerate(glob.iglob("*.txt"), start=1):
    

    Same idea could be applied for rows.

  • I think you don't have to use get_column_letter() and instead operate the numbers which you have:

    sheet.cell(row=row, column=column).value = line
    
  • not to say anything against openpyxl, but I personally find xlsxwriter module's API more enjoyable and more feature rich


Complete improved version:

import glob
import openpyxl
def text_into_spreadsheet():
 """main logic for read .txt into spreadsheet"""
 workbook = openpyxl.Workbook()
 sheet = workbook.active
 for column, filename in enumerate(glob.iglob("*.txt"), start=1):
 with open(filename) as textfile:
 sheet.cell(row=1, column=column).value = filename
 for row, line in enumerate(textfile, start=2):
 sheet.cell(row=row, column=column).value = line
 workbook.save('result.xlsx')
if __name__ == "__main__":
 text_into_spreadsheet()
answered Dec 18, 2018 at 22:44
\$\endgroup\$
1
  • \$\begingroup\$ I think this solution is very very nice. It is alot better to read compared to what i did. \$\endgroup\$ Commented Dec 19, 2018 at 18:17
4
\$\begingroup\$

If you're on a Linux/Mac machine, it's as simple as this shell command:

paste ?.txt

The ? wildcard will match all your files, A.txt to E.txt in order. The paste command will paste them in parallel columns, separated by TABs.

You can then open your spreadsheet app and import the text file, and add the header.

Per a question formerly in comments: Can you auto-generate the header as well? Sure:

for f in ?.txt; do echo -en "$f\t"; done; echo; paste ?.txt

Also, I'm assuming a single letter before .txt, as in the original example. If you want all files ending in .txt, then it's *.txt instead of ?.txt.

Sᴀᴍ Onᴇᴌᴀ
29.5k16 gold badges45 silver badges201 bronze badges
answered Dec 19, 2018 at 16:13
\$\endgroup\$
0

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.