4

I have a text file of the following form:

('1', '2')
('3', '4')
 .
 .
 .

and I'm trying to get it to look like this:

1 2
3 4
etc...

I've been trying to do this using the re module in python, by chaining together re.sub commands like so:

for line in file:
 s = re.sub(r"\(", "", line)
 s1 = re.sub(r",", "", s)
 s2 = re.sub(r"'", "", s1)
 s3 = re.sub(r"\)", "", s2)
 output.write(s3)
output.close()

It seems to work great until I get near the end of my output file; then it becomes inconsistent and stops working. I am thinking it is because of the sheer SIZE of the file I am working with; 300MB or approximately 12 million lines.

Can anyone help me confirm that I'm simply running out of memory? Or if it is something else? Suitable alternatives, or ways around this?

asked Sep 22, 2015 at 15:43
3
  • 1
    It looks like your file is full of representations of two-tuples of strings representing integers - why?! You could ast.literal_eval each line and use csv to write it back out. Commented Sep 22, 2015 at 15:45
  • 1
    It's processing the file line by line, so I don't see how the size of the file should be causing a problem. Are you sure there isn't something else in your code creating an issue? Commented Sep 22, 2015 at 15:46
  • You can use a single regex: output.write(re.sub(r"\(\s*'(\d+)',\s*'(\d+)'\s*\)", r"\1 \2", line)). But as I say, that's not your problem. You might need to show more of your code to get an answer to that particular issue. Commented Sep 22, 2015 at 16:08
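
A minimal sketch of the single-regex idea from the comment above, for anyone who wants to try it. The function names and the `in_path`/`out_path` parameters are made up for illustration, and it assumes every line has exactly the `('1', '2')` shape shown in the question. The file is read one line at a time, so memory use should not grow with file size:

```python
import re

# Matches one line such as "('1', '2')" and captures the two numbers.
PAIR = re.compile(r"\(\s*'(\d+)',\s*'(\d+)'\s*\)")

def convert_line(line):
    """Rewrite "('1', '2')" as "1 2", leaving any other text untouched."""
    return PAIR.sub(r"\1 \2", line)

def convert_file(in_path, out_path):
    """Stream in_path to out_path line by line (hypothetical helper)."""
    # The with statement closes both files even if an error occurs,
    # so buffered output is flushed to disk.
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(convert_line(line))
```

Compiling the pattern once keeps the loop body to a single call per line, as opposed to four chained re.sub calls.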

4 Answers

2

You could simplify your code by using a simpler regex that finds all numbers in your input:

import re
with open(file_name) as input, open(output_name, 'w') as output:
    for line in input:
        # collect every run of digits on the line and join them with spaces
        output.write(' '.join(re.findall(r'\d+', line)))
        output.write('\n')
answered Sep 22, 2015 at 15:51

1

Why not load them as Python tuples with ast.literal_eval? Also, instead of opening and closing the files manually, you can use a with statement, which closes the files at the end of the block:

import ast
with open(file_name) as input, open(output_name, 'w') as output:
    for line in input:
        # parse the tuple, then write its items space-separated as the question asks
        output.write(' '.join(ast.literal_eval(line.strip())) + '\n')
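
One of the comments on the question suggests pairing ast.literal_eval with the csv module for the output. A rough sketch of that combination, assuming Python 3 and one tuple literal per line; `tuples_to_rows`, `convert`, `in_path` and `out_path` are illustrative names, not anything from the question:

```python
import ast
import csv

def tuples_to_rows(lines):
    """Yield one tuple per non-blank line containing a Python tuple literal."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines instead of failing on them
            yield ast.literal_eval(line)

def convert(in_path, out_path):
    """Write each parsed tuple as one space-separated row (hypothetical helper)."""
    with open(in_path) as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter=" ", lineterminator="\n")
        # writerows consumes the generator lazily, one line at a time
        writer.writerows(tuples_to_rows(src))
```

Because tuples_to_rows is a generator, the input is still processed one line at a time.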
answered Sep 22, 2015 at 15:47

1

I would use a namedtuple for better performance. It also makes the code more readable.

# Python 3
from collections import namedtuple
from ast import literal_eval
#...
Row = namedtuple('Row', 'x y')
with open(in_file, 'r') as f, open(out_file, 'w') as output:
    for line in f:  # iterate lazily rather than loading every line with readlines()
        row = Row._make(literal_eval(line))
        output.write("{0.x} {0.y}\n".format(row))
answered Sep 22, 2015 at 16:05

2 Comments

I got this error(my first line is 35 characters long): r = Row._make(line) File "<string>", line 21, in _make TypeError: Expected 2 arguments, got 35
@EliRiekeberg, okay, updated to fix that. The answer now uses ast.literal_eval, as mentioned by @Kasramvd, to convert each line from a string to a tuple before passing it to the namedtuple, and it also consolidates the output.write() calls.
0

This is one way to do it without the re module:

in_file = open(r'd:\temp\02\input.txt', 'r')
out_file = open(r'd:\temp\02\output.txt', 'w')
for line in in_file:
 out_file.write(line.replace("'", '').replace('(', '').replace(', ', ' ').replace(')', ''))
out_file.close()
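
As a variation on the chained replace calls (a different technique, not what the answer above uses), str.translate can drop all the unwanted characters in one pass. This is only a sketch, assuming Python 3 and input where the comma is always followed by a single space, as in the question; `DROP` and `strip_tuple_syntax` are hypothetical names:

```python
# Translation table that deletes quotes, parentheses and commas.
# The space after each comma is kept, so it becomes the separator.
DROP = str.maketrans("", "", "'(),")

def strip_tuple_syntax(line):
    """Turn "('1', '2')" into "1 2" by deleting the tuple punctuation."""
    return line.translate(DROP)
```

Inside the loop this would be used as out_file.write(strip_tuple_syntax(line)).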
answered Sep 22, 2015 at 18:47
