I recently tried my hand at writing Go code for a project I'm working on. To get a better grasp of the language, I decided to write a simple yet somewhat practical program that parses numeral characters from strings based on the ASCII values of their runes and appends them to a slice.
My primary concerns with this code are:
- It may not be a good solution.
- It may not be very performant.
- It may not be very idiomatic or reflective of best practices.
- It may not be very concise, clean, or readable.
Note: The code below contains the documentation comments. Since this is my first time using Stack Exchange, I'm not certain whether this is reasonable, and it makes the read a little long.
/*
parseNum package
This is a package which contains a function for converting characters
(CharToNum), in the form of runes, into numerals. It also contains a
function for parsing numerals from strings (ParseNum).
*/
package main
import (
"errors"
"fmt"
)
/*
CharToNum function
This is a function which receives a rune and converts it to a numeral
based on its ASCII code by matching it against the numeral ASCII codes
in a for loop. It subtracts 48 from the code to get the actual numeral,
as the ascii codes for numerals are 48-57 and simple subtraction gives
you the actual numeral.
*/
func CharToNum(r rune) (int, error) {
for i := 48; i <= 57; i++ {
if int(r) == i {
return (int(r) - 48), nil
}
}
return -1, errors.New("type: rune was not int")
}
/*
ParseNum function
This is a function which serves the purpose of identifying numerals inside
strings and returning them in a slice. It loops over the string, passing each
character to the charToNum function and identifying whether it should append the
output to the array by testing whether or not the error evaluates to nil.
*/
func ParseNum(str string) []int {
var nums []int
for _, val := range str {
num, err := CharToNum(val)
if err != nil {
continue
}
nums = append(nums, num)
}
return nums
}
func main() {
fmt.Println(ParseNum("000123456789000"))
}
2 Answers
In your question, you wrote:
func CharToNum(r rune) (int, error) {
for i := 48; i <= 57; i++ {
if int(r) == i {
return (int(r) - 48), nil
}
}
return -1, errors.New("type: rune was not int")
}
In his answer, Janos wrote:
func CharToNum(r rune) (int, error) {
intval := int(r) - '0'
if 0 <= intval && intval <= 9 {
return intval, nil
}
return -1, errors.New("type: rune was not int")
}
Using idiomatic Go, I wrote,
var ErrRuneNotInt = errors.New("type: rune was not int")
func CharToNum(r rune) (int, error) {
if '0' <= r && r <= '9' {
return int(r) - '0', nil
}
return 0, ErrRuneNotInt
}
It's simple and direct and it's fast. Benchmarking the numbers zero through nine:
BenchmarkCharToNumTschfld 5000000 323 ns/op
BenchmarkCharToNumPeter 50000000 37.7 ns/op
Benchmarking an invalid (non-numeric) character, the space (' '):
BenchmarkCharToNumErrTschfld 10000000 242 ns/op 16 B/op 1 allocs/op
BenchmarkCharToNumErrPeter 2000000000 1.26 ns/op 0 B/op 0 allocs/op
The error variable ErrRuneNotInt can be used to check for a specific error, for example, if err == ErrRuneNotInt {}.
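As a minimal sketch of that check: the describe helper below is hypothetical and exists only for illustration. It uses errors.Is (available since Go 1.13), which also matches wrapped errors; a plain err == ErrRuneNotInt comparison works the same way for this unwrapped sentinel.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrRuneNotInt is the sentinel error from the answer above.
var ErrRuneNotInt = errors.New("type: rune was not int")

// CharToNum converts an ASCII digit rune to its integer value.
func CharToNum(r rune) (int, error) {
	if '0' <= r && r <= '9' {
		return int(r) - '0', nil
	}
	return 0, ErrRuneNotInt
}

// describe is a hypothetical caller that branches on the sentinel error.
func describe(r rune) string {
	n, err := CharToNum(r)
	if errors.Is(err, ErrRuneNotInt) {
		return fmt.Sprintf("%q is not a digit", r)
	}
	if err != nil {
		return "unexpected error: " + err.Error()
	}
	return fmt.Sprintf("%q is the digit %d", r, n)
}

func main() {
	fmt.Println(describe('7')) // prints '7' is the digit 7
	fmt.Println(describe('x')) // prints 'x' is not a digit
}
```

Because the error is a package-level variable, returning it allocates nothing, which is consistent with the 0 allocs/op figure above.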
In your question, you wrote:
func ParseNum(str string) []int {
var nums []int
for _, val := range str {
num, err := CharToNum(val)
if err != nil {
continue
}
nums = append(nums, num)
}
return nums
}
Using idiomatic Go, I wrote,
func ParseNum(s string) []int {
nLen := 0
for i := 0; i < len(s); i++ {
if b := s[i]; '0' <= b && b <= '9' {
nLen++
}
}
var n = make([]int, 0, nLen)
for i := 0; i < len(s); i++ {
if b := s[i]; '0' <= b && b <= '9' {
n = append(n, int(b)-'0')
}
}
return n
}
It minimizes both CPU time and memory allocations; it's fast. Benchmarking the string "000123456789000":
BenchmarkParseNumTschfld 1000000 2475 ns/op 248 B/op 5 allocs/op
BenchmarkParseNumPeter 2000000 610 ns/op 128 B/op 1 allocs/op
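The benchmark code itself is not shown here, so the following is only a sketch of how such a measurement might be set up. The names benchmarkParseNum and sink are assumptions, and testing.Benchmark is used so the program runs without go test; timings will differ from machine to machine.

```go
package main

import (
	"fmt"
	"testing"
)

// ParseNum is the two-pass version from the answer above:
// count the digits first, then fill a slice allocated once.
func ParseNum(s string) []int {
	nLen := 0
	for i := 0; i < len(s); i++ {
		if b := s[i]; '0' <= b && b <= '9' {
			nLen++
		}
	}
	n := make([]int, 0, nLen)
	for i := 0; i < len(s); i++ {
		if b := s[i]; '0' <= b && b <= '9' {
			n = append(n, int(b)-'0')
		}
	}
	return n
}

// sink keeps the result alive so the compiler cannot discard the call.
var sink []int

// benchmarkParseNum is a hypothetical benchmark body (name assumed).
func benchmarkParseNum(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = ParseNum("000123456789000")
	}
}

func main() {
	// testing.Benchmark runs the function outside of "go test".
	res := testing.Benchmark(benchmarkParseNum)
	fmt.Println(res.String(), res.MemString())
}
```

In a real project the same function body would normally live in a _test.go file as BenchmarkParseNum and be run with go test -bench . -benchmem.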
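Indexing the string byte by byte is safe here: in UTF-8, the characters of the ASCII set keep their single-byte ASCII values, and every byte of a multi-byte sequence is 0x80 or greater, so none of them can be mistaken for a digit.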
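To illustrate why scanning bytes gives the same result as decoding runes, here is a small sketch. The function names digitsByByte and digitsByRune and the sample string are assumptions made for this example.

```go
package main

import "fmt"

// digitsByByte scans the raw bytes of s without decoding UTF-8.
func digitsByByte(s string) []int {
	var n []int
	for i := 0; i < len(s); i++ {
		if b := s[i]; '0' <= b && b <= '9' {
			n = append(n, int(b)-'0')
		}
	}
	return n
}

// digitsByRune decodes s rune by rune, as the question's code does.
func digitsByRune(s string) []int {
	var n []int
	for _, r := range s {
		if '0' <= r && r <= '9' {
			n = append(n, int(r)-'0')
		}
	}
	return n
}

func main() {
	// The non-ASCII characters are encoded as bytes >= 0x80,
	// so the byte scan never confuses them with '0'..'9'.
	s := "é1日本2語3"
	fmt.Println(digitsByByte(s)) // prints [1 2 3]
	fmt.Println(digitsByRune(s)) // prints [1 2 3]
}
```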
Instead of this loop:
for i := 48; i <= 57; i++ {
if int(r) == i {
return (int(r) - 48), nil
}
}
This is equivalent:
intval := int(r)
if 48 <= intval && intval <= 57 {
return (intval - 48), nil
}
And if we subtract 48 earlier, the code will start to look more natural:
intval := int(r) - 48
if 0 <= intval && intval <= 9 {
return intval, nil
}
And what is this magical 48 really? The ASCII code of '0' of course! We can write it that way:
intval := int(r) - '0'
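A quick way to convince yourself of this is to print the rune constants. The digitValue helper is hypothetical and assumes its argument is already known to be a digit.

```go
package main

import "fmt"

// digitValue is a hypothetical helper; it assumes r is in '0'..'9'.
func digitValue(r rune) int {
	return int(r) - '0'
}

func main() {
	// Rune constants print as their numeric code points.
	fmt.Println('0', '9')        // prints 48 57
	fmt.Println(digitValue('7')) // prints 7
}
```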
The name ParseNum is a bit misleading. For example, a function called parseInt usually tries to convert a string to an integer, and if the string is not a valid integer, it raises an error. That's not how ParseNum works. ParseNum will happily convert "he1l2l3oth4ere" to [1 2 3 4]. A better name would be StripNonDigit or ExtractDigits.
- Firstly, thank you. Secondly, I'm curious about naming. 'ExtractDigits' seems closer to how the function actually works. Were I to write a package/library which used things like the Read interface to extract information from various data "inputs", would it make the most sense to name it 'extract' or something related? – tschfld, Mar 14, 2016 at 22:51
- That seems reasonable. – janos, Mar 14, 2016 at 22:54