问题
After reading a medium sized file (about 500kByte) from a web-service I have a regular Swift String (lines
) originally encoded in .isolatin1
. Before actually splitting it I would like to count the number of lines (quickly) in order to be able to initialise a progress bar.
What is the best Swift idiom to achieve this?
I came up with the following:
let linesCount = lines.reduce(into: 0) { (count, letter) in
if letter == "\r\n" {
count += 1
}
}
This does not look too bad but I am asking myself if there is a shorter/faster way to do it. The characters
property provides access to a sequence of Unicode graphemes which treat \r\n
as only one entity. Checking this with all CharacterSet.newlines
does not work, since CharacterSet
is not a set of Character
but a set of Unicode.Scalar
(a little counter-intuitively in my book) which is a set of code points (where \r\n counts as two code points), not graphemes. Trying
var lines = "Hello, playground\r\nhere too\r\nGalahad\r\n"
lines.unicodeScalars.reduce(into: 0) { (cnt, letter) in
if CharacterSet.newlines.contains(letter) {
cnt += 1
}
}
will count to 6 instead of 3. So this is more general than the above method, but it will not work correctly for CRLF line endings.
Is there a way to allow for more line ending conventions (as in CharacterSet.newlines
) that still achieves the correct result for CRLF? Can the number of lines be computed with less code (while still remaining readable)?
回答1:
If it's ok for you to use a Foundation method on an NSString, I suggest using
enumerateLines(_ block: @escaping (String, UnsafeMutablePointer<ObjCBool>) -> Void)
Here's an example:
import Foundation
let base = "Hello, playground\r\nhere too\r\nGalahad\r\n"
let ns = base as NSString
ns.enumerateLines { (str, _) in
print(str)
}
It separates the lines properly, taking into account all linefeed types, such as "\r\n", "\n", etc:
Hello, playground
here too
Galahad
In my example I print the lines but it's trivial to count them instead, as you need to - my version is just for the demonstration.
回答2:
As I did not find a generic way to count newlines I ended up just solving my problem by iterating through all the characters using
let linesCount = text.reduce(into: 0) { (count, letter) in
if letter == "\r\n" { // This treats CRLF as one "letter", contrary to UnicodeScalars
count += 1
}
}
I was sure this would be a lot faster than enumerating lines for just counting, but I resolved to eventually do the measurement. Today I finally got to it and found ... that I could not have been more wrong.
A 10000 line string counted lines as above in about 1.0 seconds , but counting through enumeration using
var enumCount = 0
text.enumerateLines { (str, _) in
enumCount += 1
}
only took around 0.8 seconds and was consistently faster by a little more than 20%. I do not know what tricks the Swift engineers hide in their sleves, but they sure manage to enumerateLines
very quickly. This just for the record.
来源:https://stackoverflow.com/questions/46490920/count-the-number-of-lines-in-a-swift-string