I know Swift isn't exactly meant to be used to write your classic stdin to stdout scripts, and that Python, Ruby, Perl, bash, awk, and friends are much better in this area, but I'd still like to see how well it can be done.
The problem I chose is: Read from standard input, and write to standard output a space-separated report of each of the words (lowercased) and their counts, sorted by word. For simplicity, words are defined as containing Basic Latin letters (U+0061 through U+007A) and the apostrophe (U+0027) only.
In Ruby, just for illustration, we can do this:
counts = Hash.new(0)
ARGF.each do |line|
  line.downcase.scan(/[a-z']+/) { |word| counts[word] += 1 }
end
counts.sort.each do |word, count|
  puts "#{word} #{count}"
end
Now, struggling with Swift, all I can come up with is this mess:
import Foundation
let standardInput = NSFileHandle.fileHandleWithStandardInput()
let input = standardInput.availableData
let text = String(data: input, encoding: NSUTF8StringEncoding)!
var counts = [String: Int]()
func isWordChar (c: Character) -> Bool {
    return "abcdefghijklmnopqrstuvwxyz'".rangeOfString("\(c)") != nil
}
for word in (text.lowercaseString.characters.split{!isWordChar($0)}.map(String.init)) {
    if let count = counts[word] {
        counts[word] = count + 1
    } else {
        counts[word] = 1
    }
}
for (word, count) in (counts.sort { $0.0 < $1.0 }) {
    print("\(word) \(count)")
}
There are so many problems with this, particularly:
I'm not sure how to read line by line, or at least in chunks, from stdin. Ruby and Python give you this for free. My Swift code reads up to the theoretical 8 EiB limit, but with big data we want to process as we read rather than slurp in the whole file first.
It appears Swift has no native regex support, so rather than using all the Objective-C regex support, I went with split, which works on Swift strings. Is there a simple way to use regexes instead of splitting?
I think my use of split with the little helper can be improved.
I'm not happy with my use of ! to unwrap the optional string.
Is there a better way to do the counting with the dictionary? Ruby's default value for hash lookups is fabulous here... Can Swift do the same?
Can the Swift code here be professionalized?
2 Answers
Regarding standard input, see Martin's answer below.
As for splitting strings into words and returning them as a sorted dictionary with word counts, that sounds like a useful thing to have around as a String extension. This is simple to implement unless you insist on a pure Standard Library solution:
import Foundation
public extension String {
    public var wordCounts: [String:Int] {
        var d: [String:Int] = [:]
        enumerateSubstringsInRange(characters.indices, options: .ByWords) { word, _, _, _ in
            guard let word = word?.lowercaseString else { return }
            d[word] = (d[word] ?? 0) + 1
        }
        return d
    }
    public var sortedWordCounts: [(word: String, count: Int)] {
        return wordCounts.sort{ $0.0 < $1.0 }.map{ (word: $0, count: $1) }
    }
}
Usage as follows (to produce the same output as your code):
if let input = readLine() {
    let report = input.sortedWordCounts.map{"\($0) \($1)"}.joinWithSeparator("\n")
    print(report)
}
For example, the following input ↓
Baa, baa, black sheep,/ Have you any wool?/ Yes, sir, yes, sir,/ Three bags full
...yields the following output ↓
any 1
baa 2
bags 1
black 1
full 1
have 1
sheep 1
sir 2
three 1
wool 1
yes 2
you 1
Edit
As Martin points out in the comment, your Ruby code reads multiple lines, or even multiple files... Here is a version that would do something like that, though you’ll of course need to adapt it to your needs. Note that I’ve added the eof token so that we can play with this in Xcode.
// Dictionary keys must be Hashable, so T needs that constraint.
public func += <T: Hashable> (inout lhs: [T:Int], rhs: [T:Int]) {
    for (k, i) in rhs {
        lhs[k] = (lhs[k] ?? 0) + i
    }
}
print("Type your terminating token or just type return if an empty line works for you:")
let eof = readLine()
print("Enter your lines:")
var wordCounts: [String:Int] = [:]
while let line = readLine() where line != eof {
    wordCounts += line.wordCounts
}
let report = wordCounts
    .sort{ $0.0 < $1.0 }
    .map{"\($0) \($1)"}
    .joinWithSeparator("\n")
print(report)
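For anyone reading this on a current toolchain: the code above is Swift 2. As a minimal sketch, assuming Swift 4 or later, the standard library's default: subscript and merge(_:uniquingKeysWith:) can stand in for the (d[word] ?? 0) + 1 idiom and the custom += operator. The word list and the otherLine dictionary are made-up sample data.

```swift
// Assumes Swift 4 or later (not the Swift 2 used elsewhere in this thread).

// Counting with a default value, much like Ruby's Hash.new(0).
var counts: [String: Int] = [:]
for word in ["yes", "sir", "yes", "sir", "three"] {
    counts[word, default: 0] += 1
}

// Folding another line's counts into the running total.
// `otherLine` is hypothetical sample data standing in for `line.wordCounts`.
let otherLine = ["sir": 1, "bags": 1]
counts.merge(otherLine, uniquingKeysWith: +)

// Sort by word and format one "word count" pair per line.
let report = counts
    .sorted { $0.key < $1.key }
    .map { "\($0.key) \($0.value)" }
    .joined(separator: "\n")
print(report)
```

The closure passed to uniquingKeysWith is only called for keys present in both dictionaries, so + simply adds the two counts.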
- From the given Ruby sample program I assume that the counts should be accumulated over all input lines. – Martin R, Oct 26, 2015 at 20:10
- Right, the Ruby script takes advantage of the wonderful ARGF so that I can pipe standard input in (the way I was intending to do the Swift program -- oh man how did I not know about readLine?) but yes ARGF is also super general and will read multiple files if given multiple file names for command line arguments. So readLine() will do for Swift (stdin redirection is what I am playing with). These answers have a lot of nice observations and idioms in them. – Ray Toal, Oct 27, 2015 at 0:26
There is a problem with your use of
let input = standardInput.availableData
to read the input. If standard input is a regular file then this will read the
entire file contents. But if standard input is a tty (e.g. a Terminal window)
then it will just wait until a single line is entered and return.
You have to repeat the call until availableData
returns an empty
data object. And if standard input is some other communications channel
(e.g. a pipe) then the only thing you know is that it returns at least
one byte (which may be a incomplete UTF-8 sequence).
There is already a function for that purpose: readLine()
reads from
standard input and returns each line as a Swift String
(or nil
on EOF).
So your main loop would be:
while let line = readLine() {
    // count words in `line` ...
}
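To make that loop concrete, here is a rough sketch in current Swift (4 or later, not the Swift 2 of the question) that fills in the body using the question's word definition. The names wordCharacters and countWords(nextLine:) are my own, and taking the line source as a closure is just an assumption that makes the function easy to try without a terminal.

```swift
// Assumes Swift 4 or later. Names here are illustrative, not from the question.

// The characters that may appear in a word: a-z and the apostrophe.
let wordCharacters = Set("abcdefghijklmnopqrstuvwxyz'")

// Pulls lines from `nextLine` until it returns nil and counts words as it goes,
// so only one line is held in memory at a time.
func countWords(nextLine: () -> String?) -> [String: Int] {
    var counts: [String: Int] = [:]
    while let line = nextLine() {
        // Split on every character that is not a word character.
        let words = line.lowercased().split { !wordCharacters.contains($0) }
        for word in words {
            counts[String(word), default: 0] += 1
        }
    }
    return counts
}

// Reading from standard input: readLine() returns nil on EOF.
let counts = countWords { readLine() }
for (word, count) in counts.sorted(by: { $0.key < $1.key }) {
    print("\(word) \(count)")
}
```

Because nothing is force-unwrapped, this also sidesteps the ! from the original code.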
Your isWordChar() function can be simplified to
func isWordChar (c: Character) -> Bool {
    return "abcdefghijklmnopqrstuvwxyz'".characters.contains(c)
}
instead of converting the Character to String and searching that as a substring. (But using enumerateSubstringsInRange() instead of your own splitting function as suggested in @milos' answer is probably the better way to go.)
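On the regex part of the question, which is not covered above: Foundation's NSRegularExpression can do the extraction directly. The following is only a sketch, written in current Swift syntax (4 or later) rather than Swift 2; wordPattern and words(in:) are names I made up for illustration.

```swift
import Foundation

// The pattern is a fixed literal known to be valid, so `try!` cannot fail here.
let wordPattern = try! NSRegularExpression(pattern: "[a-z']+")

// Returns the lowercased words of `line`, in order of appearance.
// `words(in:)` is a hypothetical helper, not part of either answer.
func words(in line: String) -> [String] {
    let text = line.lowercased()
    // NSRegularExpression works in NSRange (UTF-16) terms, so convert both ways.
    let fullRange = NSRange(text.startIndex..., in: text)
    return wordPattern.matches(in: text, range: fullRange).compactMap { match in
        Range(match.range, in: text).map { String(text[$0]) }
    }
}

print(words(in: "Baa, baa, black sheep"))
```

Whether this is nicer than split is a matter of taste; it mirrors the Ruby scan more closely, at the cost of the NSRange conversions.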