Given a text file (text.txt) containing a mix of text and integer numbers, what's the shortest way of getting the sum of all the numbers in it? I have:
import re
f = open('text.txt', 'r')
result = 0
for line in f.readlines():
    for value in re.findall('[0-9]+', line):
        result += int(value)
f.close()
print(result)
This works, but I would like to understand what the possibilities are for making it shorter.
Comment by James Buck (Apr 16, 2016): You could make it shorter, but you're going to sacrifice readability along the way.
5 Answers
Here's a one-line version, the shortest I could find:
import re
print(sum([int(i) for i in re.findall('[0-9]+', open('text.txt').read())]))
Comment (Sep 17, 2021): Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
Comment by Alex Waygood (Sep 17, 2021): Hi, welcome to Code Review! This isn't Code Golf, so the goal on this site is to write good code, not necessarily short code. Also, answers are expected to make "at least one insightful observation about the code in the question". codereview.stackexchange.com/help/how-to-answer
You can use map and sum. It's good practice to use a with statement when working with a file.
import re
with open('text.txt', 'r') as f:
    result = sum(map(int, re.findall(r'[0-9]+', f.read())))
print(result)
Comment by Martin Thoma (Aug 6, 2020): I doubt the regex is necessary. What is filtered out?
Using read instead of readlines reads the whole file into a single string. Opening the file in a with block (as a context manager) avoids calling close() yourself. A generator expression replaces the remaining loop, and sum does what you'd expect:
import re
with open('text.txt', 'r') as f:
    result = sum(int(value) for value in re.findall('[0-9]+', f.read()))
print(result)
Comment by James Buck (Apr 16, 2016): I think you missed a sum call.
Comment by schwobaseggl (Apr 16, 2016): Thanks for spotting that. Corrected it ;)
Using \d instead of [0-9] saves a few characters.
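One caveat, assuming Python 3 and str (not bytes) patterns: \d is not exactly equivalent to [0-9]. It matches any Unicode decimal digit, so a non-ASCII digit such as the Arabic-Indic three (U+0663) is also picked up, and int() happily converts it. Passing the re.ASCII flag restores the [0-9] behaviour. A small check, using a hypothetical sample string:

```python
import re

# Hypothetical sample text: ASCII digits plus an Arabic-Indic three (U+0663).
text = "a12 b\u0663 c7"

# In Python 3, \d in a str pattern matches any Unicode decimal digit.
unicode_matches = re.findall(r"\d+", text)
# [0-9], or \d with the re.ASCII flag, matches only the ASCII digits 0-9.
ascii_matches = re.findall(r"[0-9]+", text)
ascii_flag_matches = re.findall(r"\d+", text, re.ASCII)

# int() accepts any Unicode decimal digit, so the totals differ.
print(sum(map(int, unicode_matches)))     # 22 (12 + 3 + 7)
print(sum(map(int, ascii_matches)))       # 19 (12 + 7)
print(sum(map(int, ascii_flag_matches)))  # 19 (12 + 7)
```

For a plain ASCII text file the two spellings behave identically, so the shorter \d is fine there.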
I'd say that shorter shouldn't be the goal in itself, unless you're golfing (in which case, you want to be on Code Golf and Coding Challenges). A better criterion is to have the simplest code that achieves the desired aim.
A major contribution to simplicity is to replace the explicit close of f with a with block:
with open('text.txt', 'r') as f:
    ...  # do the addition
# f is closed before we get here
Another simplification is to use sum/map as suggested in styvane's answer; however, if the file is very long, it may be more memory-efficient to read it a line at a time rather than slurping the entire contents into memory.
We could also make the code more efficient by compiling the regular expression just once (although the re module caches recently used patterns, so the saving is usually small).
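A minimal sketch combining those two suggestions, assuming a module-level constant (the names DIGITS and sum_numbers_by_line are hypothetical): the pattern is compiled once, when the module is imported, and the input is consumed a line at a time so only one line needs to be held in memory. io.StringIO stands in for a real file here.

```python
import io
import re

# Hypothetical module-level constant: compiled once, at import time,
# and reused by every call below.
DIGITS = re.compile(r"\d+")


def sum_numbers_by_line(lines):
    """Sum every run of digits, reading one line at a time."""
    total = 0
    for line in lines:  # a file object yields its lines lazily
        total += sum(map(int, DIGITS.findall(line)))
    return total


# Stand-in for open('text.txt'): any iterable of strings works.
sample = io.StringIO("apples 3, pears 12\nno numbers here\n40 more\n")
print(sum_numbers_by_line(sample))  # 55
```

With a real file, the same call would sit inside a with open('text.txt', 'r') as f: block and be passed f directly.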
Following that, we might want to make the code more reusable and testable. I would separate out a function that does the summing but doesn't care where the stream comes from (file, pipe, socket, string), and write tests for it. Then simply call it with the input file stream.
That reusable function, including unit tests (as doctests), looks like this:
import re

def sum_integers(stream):
    """Return the total of all the digit strings found in the input stream.

    >>> from io import StringIO
    >>> sum_integers(StringIO(''))
    0
    >>> sum_integers(StringIO('1'))
    1
    >>> sum_integers(StringIO('-1'))
    1
    >>> sum_integers(StringIO('a1z'))
    1
    >>> sum_integers(StringIO('1-2'))
    3
    >>> sum_integers(StringIO('1-2\\n3.4'))
    10
    """
    digit_pattern = re.compile(r'\d+')
    return sum(sum(map(int, digit_pattern.findall(line))) for line in stream)

if __name__ == "__main__":
    import doctest
    doctest.testmod()
And we can use it very simply:
def print_sum_of_integers_from_file(filename):
    with open(filename, 'r') as f:
        print(sum_integers(f))
That's not shorter, but it is better in the ways I've described (more efficient, flexible and maintainable).
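Because sum_integers only iterates over lines, it also pairs naturally with the standard library's fileinput module, which yields lines from each named file in turn, or from standard input when no files are given. The wrapper below is a hypothetical command-line entry point (main is not part of the answer), and sum_integers is repeated so the block stands on its own:

```python
import fileinput
import re
import sys


def sum_integers(stream):
    """Return the total of all the digit strings found in the input stream."""
    digit_pattern = re.compile(r'\d+')
    return sum(sum(map(int, digit_pattern.findall(line))) for line in stream)


def main(argv=None):
    """Hypothetical entry point: sum the numbers in the named files, or stdin."""
    files = sys.argv[1:] if argv is None else argv
    # FileInput reads each file in turn (stdin if the list is empty);
    # the with block closes the files when we're done.
    with fileinput.FileInput(files=files) as lines:
        return sum_integers(lines)
```

Calling print(main()) from an if __name__ == "__main__": guard would then support both python sum.py text.txt and cat text.txt | python sum.py (the script name is illustrative).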