I'd like to read lines from a file and process them concurrently. I came up with the following code to do this:
    package main

    import (
        "bufio"
        "fmt"
        "math/rand"
        "os"
        "sync"
        "time"
    )

    var wg sync.WaitGroup

    func main() {
        file := "data/input.txt"
        reader, _ := os.Open(file)
        scanner := bufio.NewScanner(reader)
        for scanner.Scan() {
            wg.Add(1)
            go processLine(scanner.Text())
        }
        wg.Wait()
    }

    func processLine(line string) {
        // Random sleep to simulate varying processing times.
        time.Sleep(time.Duration(rand.Intn(5)) * time.Second)
        fmt.Println("line:", line)
        wg.Done()
    }
I added the random sleep time in there to simulate potential differences in processing times.
Are there any potential drawbacks that I should be aware of with this method of concurrent processing? Are there any better ways that I should consider for concurrently processing lines in a file?
1 Answer
The general approach seems fine. You may want to benchmark your application based on the type of inputs you'll receive and the type of computation that's done. If the computation is CPU intensive, it may not make sense to have too many goroutines running in parallel since they can't all use the CPU at the same time.
If you find that's the case, it could be better to have a channel into which you send the lines that are being read and have a bunch of worker goroutines that read a line from the channel, process it, read another line and so on. Benchmarking should give you a good idea about the right approach.
If each goroutine makes a request to an external resource (such as a web service or a database) you'll have to think about ways to limit the rate of such requests.
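One simple way to do that is to space out the requests with a time.Ticker. The sketch below assumes a hypothetical callService function in place of the real web-service or database call, and the interval is an arbitrary example value.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// callService is a hypothetical placeholder for a request to an
// external resource such as a web service or a database.
func callService(line string) string {
	return "ok:" + line
}

// processWithRateLimit starts at most one request per interval.
// The interval must be positive, because time.NewTicker panics otherwise.
func processWithRateLimit(lines []string, interval time.Duration) []string {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	results := make([]string, len(lines))
	var wg sync.WaitGroup
	for i, line := range lines {
		<-ticker.C // wait for the next slot before issuing a request
		wg.Add(1)
		go func(i int, line string) {
			defer wg.Done()
			results[i] = callService(line) // each goroutine writes only its own index
		}(i, line)
	}
	wg.Wait()
	return results
}

func main() {
	start := time.Now()
	out := processWithRateLimit([]string{"a", "b", "c"}, 50*time.Millisecond)
	fmt.Println(out, "in", time.Since(start).Round(10*time.Millisecond))
}
```

A ticker limits how often requests start, not how many are in flight at once. If bursts need to be allowed or capped, the token-bucket limiter in golang.org/x/time/rate may be a better fit.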
Some general comments about the code which I think you should fix before putting it into production:
- The code doesn't have any error checking or a way to report errors from the processing of each input line. You'll have to add those in.
- You're also using a global variable to share the WaitGroup, which I think you should avoid. Either pass it in as a function parameter or use an anonymous function that calls processLine() and then wg.Done() (instead of calling wg.Done() in processLine()).
- As I mentioned above, you may want to consider limiting the number of concurrent goroutines depending on your use case.
processLine function empty because it's not relevant to the question I'm asking
Questions should not contain purely generic, hypothetical code, such as
This is clearly not "...purely generic, hypothetical code". The concurrency code (which is what I'm asking about) is what will be used in my production application.