Counting words from standard input in Swift

Question

I know Swift isn't exactly meant for classic stdin-to-stdout scripts, and that Python, Ruby, Perl, bash, awk, and friends are much better in this area, but I'd still like to see how well it can be done.

The problem I chose is: read from standard input, and write to standard output a space-separated report of each word (lowercased) and its count, one per line, sorted by word. For simplicity, a word is defined as a maximal run of the Basic Latin lowercase letters a-z (U+0061 through U+007A) and the apostrophe (U+0027), matched after lowercasing.

In Ruby, just for illustration, we can do this:

counts = Hash.new(0)
ARGF.each do |line|
 line.downcase.scan(/[a-z']+/) { |word| counts[word] += 1 }
end
counts.sort.each do |word, count|
 puts "#{word} #{count}"
end

Now, struggling with Swift, all I can come up with is this mess:

import Foundation
let standardInput = NSFileHandle.fileHandleWithStandardInput()
let input = standardInput.availableData
let text = String(data: input, encoding: NSUTF8StringEncoding)!
var counts = [String: Int]()
func isWordChar (c: Character) -> Bool {
    return "abcdefghijklmnopqrstuvwxyz'".rangeOfString("\(c)") != nil
}
for word in (text.lowercaseString.characters.split{!isWordChar(0ドル)}.map(String.init)) {
    if let count = counts[word] {
        counts[word] = count + 1
    } else {
        counts[word] = 1
    }
}
for (word, count) in (counts.sort { 0ドル.0 < 1ドル.0 }) {
    print("\(word) \(count)")
}

There are so many problems with this, particularly:

I'm not sure how to read "line by line" or at least in chunks from stdin. Ruby and Python give you this for free. My Swift code reads up to the theoretical 8EiB limit, and as we know, with big data we want to process as we read rather than slurping in the whole file first.
It appears Swift has no native regex support, so rather than pulling in Foundation's Objective-C regex support (NSRegularExpression), I went with split, which works on Swift strings. Is there a simple way to use regexes instead of splitting?
I think my use of split with the little helper can be improved.
I'm not happy with my use of ! to unwrap the optional string.
Is there a better way to do the counting with the dictionary? Ruby's default value for hash lookups is fabulous here... Can Swift do the same?

Can the Swift code here be professionalized?
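On the regex and default-value points above, here is a minimal sketch in current Swift (4 or later, with Foundation). It is not code from this thread. Foundation's NSRegularExpression can play the part of Ruby's scan(/[a-z']+/), and the dictionary subscript with a default: value (added in Swift 4) plays the part of Hash.new(0). Swift 5.7 later added a native Regex type, which this sketch does not use. The function name countWords(in:) is an illustrative choice.

```swift
import Foundation

// Sketch for Swift 4+ with Foundation: NSRegularExpression plays the part of
// Ruby's scan(/[a-z']+/), and the default: subscript the part of Hash.new(0).
func countWords(in text: String) -> [String: Int] {
    // The pattern is a fixed literal, so try! can only fail if the pattern is malformed.
    let regex = try! NSRegularExpression(pattern: "[a-z']+")
    let lowered = text.lowercased()
    let fullRange = NSRange(lowered.startIndex..., in: lowered)
    var counts: [String: Int] = [:]
    for match in regex.matches(in: lowered, options: [], range: fullRange) {
        // Convert the NSRange back into a String range before slicing.
        guard let range = Range(match.range, in: lowered) else { continue }
        // default: supplies 0 for a word seen for the first time.
        counts[String(lowered[range]), default: 0] += 1
    }
    return counts
}

let counts = countWords(in: "Yes, sir, yes, sir,/ Three bags full")
for (word, count) in counts.sorted(by: { 0ドル.key < 1ドル.key }) {
    print(word, count)
}
```

This version works on a whole string. To stream large inputs, it would be called once per line and the results accumulated.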

Answer by Milos

Regarding standard input, see Martin's answer below.

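As for splitting strings into words and counting them, that sounds like a useful thing to have around as a String extension: wordCounts returns a dictionary of counts, and sortedWordCounts returns the same counts as an array of (word, count) pairs sorted by word. This is simple to implement, unless you insist on a pure Standard Library solution, because it relies on Foundation's enumerateSubstringsInRange with the .ByWords option: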

import Foundation
public extension String {
    public var wordCounts: [String:Int] {
        var d: [String:Int] = [:]
        enumerateSubstringsInRange(characters.indices, options: .ByWords) { word, _, _, _ in
            guard let word = word?.lowercaseString else { return }
            d[word] = (d[word] ?? 0) + 1
        }
        return d
    }
    public var sortedWordCounts: [(word: String, count: Int)] {
        return wordCounts.sort{ 0ドル.0 < 1ドル.0 }.map{ (word: 0,ドル count: 1ドル) }
    }
}

Usage as follows (to produce the same output as your code):

if let input = readLine() {
 let report = input.sortedWordCounts.map{"\(0ドル) \(1ドル)"}.joinWithSeparator("\n")
 print(report)
}

For example, the following input ↓

Baa, baa, black sheep,/ Have you any wool?/ Yes, sir, yes, sir,/ Three bags full

...yields the following output ↓

any 1
baa 2
bags 1
black 1
full 1
have 1
sheep 1
sir 2
three 1
wool 1
yes 2
you 1

Edit

As Martin points out in the comment, your Ruby code reads multiple lines, or even multiple files. Here is a version that accumulates counts over multiple lines, though you'll of course need to adapt it to your needs. Note that I've added the eof terminating line so that we can play with this interactively in Xcode.

public func += <T> (inout lhs: [T:Int], rhs: [T:Int]) {
    for (k, i) in rhs {
        lhs[k] = (lhs[k] ?? 0) + i
    }
}
print("Type your terminating token or just type return if an empty line works for you:")
let eof = readLine()
print("Enter your lines:")
var wordCounts: [String:Int] = [:]
while let line = readLine() where line != eof {
 wordCounts += line.wordCounts
}
let report = wordCounts
 .sort{ 0ドル.0 < 1ドル.0 }
 .map{"\(0ドル) \(1ドル)"}
 .joinWithSeparator("\n")
print(report)
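Several things changed after this code was written. In Swift 4 and later, Dictionary.merge(_:uniquingKeysWith:) can stand in for the custom += operator. Swift 3 also renamed the other calls: sort became sorted, joinWithSeparator became joined(separator:), and "while let ... where" became "while let line = readLine(), line != eof". Below is a small sketch of the merge step. It uses hard-coded per-line counts in place of line.wordCounts, since the extension above is Swift 2 code.

```swift
// Sketch for Swift 4+: merge(_:uniquingKeysWith:) adds counts for words that
// appear in both dictionaries; words present in only one are copied over.
var wordCounts: [String: Int] = ["baa": 1, "black": 1]   // running totals
let lineCounts: [String: Int] = ["baa": 1, "sheep": 1]   // counts for one new line
wordCounts.merge(lineCounts, uniquingKeysWith: +)

// Swift 3+ spellings of the original sort/map/joinWithSeparator chain.
let report = wordCounts
    .sorted { 0ドル.key < 1ドル.key }
    .map { "\(0ドル.key) \(0ドル.value)" }
    .joined(separator: "\n")
print(report)
// baa 2
// black 1
// sheep 1
```

Passing the + operator as the combining function keeps the intent ("add the counts") visible at the call site, without a global operator overload.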

Comment

From the given Ruby sample program I assume that the counts should be accumulated over all input lines.

Reply from the question author

Right, the Ruby script takes advantage of the wonderful ARGF so that I can pipe standard input in (which is how I intended to run the Swift program; oh man, how did I not know about readLine?). But yes, ARGF is also very general and will read multiple files if file names are given as command-line arguments. For Swift, readLine() will do, since stdin redirection is what I am playing with. These answers have a lot of nice observations and idioms in them.

Answer by Martin

There is a problem with your use of

let input = standardInput.availableData

to read the input. If standard input is a regular file, this reads the entire file contents. But if standard input is a tty (e.g. a Terminal window), it just waits until a single line is entered and returns that line. You have to repeat the call until availableData returns an empty data object. And if standard input is some other communications channel (e.g. a pipe), the only thing you know is that it returns at least one byte (which may be an incomplete UTF-8 sequence).
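Below is a minimal sketch of that repeat-until-empty loop in current Swift (5, with Foundation). The helper name readAll(from:) is hypothetical. The loop collects raw bytes and decodes them only once, at the end, so a UTF-8 sequence split across two chunks is never decoded in halves. String(decoding:as:) also removes the need for the force unwrap in the question, because it replaces invalid bytes with U+FFFD instead of returning nil.

```swift
import Foundation

// Sketch for Swift 5 with Foundation: call availableData until it returns an
// empty Data (end of file), collecting raw bytes, then decode once at the end
// so a UTF-8 sequence split across two chunks is never decoded in halves.
func readAll(from handle: FileHandle) -> String {
    var data = Data()
    while true {
        let chunk = handle.availableData
        if chunk.isEmpty { break }  // empty Data signals end of file
        data.append(chunk)
    }
    // String(decoding:as:) never returns nil: invalid bytes become U+FFFD.
    return String(decoding: data, as: UTF8.self)
}

// Demonstration with a pipe standing in for standard input.
let pipe = Pipe()
pipe.fileHandleForWriting.write(Data("Baa, baa, black sheep\n".utf8))
pipe.fileHandleForWriting.closeFile()  // reader now sees end of file after the data
let text = readAll(from: pipe.fileHandleForReading)
print(text, terminator: "")
```

For real input you would pass FileHandle.standardInput. Foundation's readDataToEndOfFile() also reads until end of file in a single call. Either way the whole input ends up in memory, which is why reading line by line is preferable for large inputs.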

There is already a function for that purpose: readLine() reads from standard input and returns each line as a Swift String (or nil on EOF). So your main loop would be:

while let line = readLine() {
 // count words in `line` ...
}
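Below is one way to fill in that loop body, sketched in current Swift (4 or later) rather than the Swift 2 of this thread. The default: dictionary subscript replaces the if let / else from the question. The closure parameter nextLine is a hypothetical design choice that lets the function run on an in-memory list of lines. For real input, call it as countWords(nextLine: { readLine() }). The closure is needed because readLine has a defaulted strippingNewline parameter.

```swift
// Sketch for Swift 4+: count words over all lines supplied by nextLine,
// stopping when it returns nil (as readLine() does at end of input).
func countWords(nextLine: () -> String?) -> [String: Int] {
    // Word characters as defined in the question: a-z and the apostrophe.
    let wordCharacters: Set<Character> = Set("abcdefghijklmnopqrstuvwxyz'")
    var counts: [String: Int] = [:]
    while let line = nextLine() {
        // split drops the empty pieces between adjacent separators by default.
        let words = line.lowercased().split(whereSeparator: { !wordCharacters.contains(0ドル) })
        for word in words {
            // default: supplies 0 for a word seen for the first time.
            counts[String(word), default: 0] += 1
        }
    }
    return counts
}

// Demonstration with in-memory lines instead of the terminal.
var sampleLines = ["Baa, baa, black sheep,", "Have you any wool?"].makeIterator()
let counts = countWords(nextLine: { sampleLines.next() })
for (word, count) in counts.sorted(by: { 0ドル.key < 1ドル.key }) {
    print(word, count)
}
```

Only one line is held in memory at a time, so this keeps the process-as-you-read behavior the question asks for.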

Your isWordChar() function can be simplified to

func isWordChar (c: Character) -> Bool {
 return "abcdefghijklmnopqrstuvwxyz'".characters.contains(c)
}

instead of converting the Character to String and searching that as a substring. (But using enumerateSubstringsInRange() instead of your own splitting function as suggested in @milos' answer is probably the better way to go.)
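In Swift 4 and later, String is again a Collection of Characters, so the .characters view is no longer needed. Below is a sketch of the same predicate in current Swift, plugged into split(whereSeparator:). The sample text is taken from the example input in the other answer.

```swift
// Swift 4+: String is a Collection of Characters, so .characters is not needed.
func isWordChar(_ c: Character) -> Bool {
    return "abcdefghijklmnopqrstuvwxyz'".contains(c)
}

// The predicate plugs into split(whereSeparator:); empty pieces are dropped.
let words = "Yes, sir, yes, sir,/ Three bags full"
    .lowercased()
    .split(whereSeparator: { !isWordChar(0ドル) })
    .map { String(0ドル) }
print(words)
```

Because contains compares whole Characters, an accented letter such as "é" is rejected, which matches the question's Basic Latin definition.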

Accepted answer: Milos's answer above, posted 2015-10-26 11:27:04Z.
