Question
I have a .docx file in my temporary storage:
let location: NSURL = NSURL.fileURLWithPath(NSTemporaryDirectory())
let file_Name = location.URLByAppendingPathComponent("5 November 2016.docx")
What I now want to do is extract the text inside this document. But I cannot seem to find any converters or methods of doing this.
I have tried this:
let file_Content = try? NSString(contentsOfFile: String(file_Name), encoding: NSUTF8StringEncoding)
print(file_Content)
However it prints nil.
So how do I read the text in a docx file?
Answer 1:
Your initial issue is with how you get the string from the URL. String(file_Name) is not the correct way to convert a file URL into a file path. The proper way is to use the URL's path property.
let location = NSURL.fileURLWithPath(NSTemporaryDirectory())
let fileURL = location.URLByAppendingPathComponent("My File.docx")
let fileContent = try? NSString(contentsOfFile: fileURL.path, encoding: NSUTF8StringEncoding)
Note the other changes as well: follow Swift naming conventions (fileURL rather than file_Name) and give variables names that say what they hold.
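To make the difference concrete, here is a minimal sketch in current Swift syntax (URL instead of NSURL, which is an assumption about the toolchain, not what the question used). It prints the URL's string form next to its path; the file name is simply the one from the question.

```swift
import Foundation

// File name reused from the question, purely for illustration.
let directory = URL(fileURLWithPath: NSTemporaryDirectory())
let fileURL = directory.appendingPathComponent("5 November 2016.docx")

// absoluteString is the full URL: it starts with "file://" and
// percent-encodes the spaces in the file name.
let urlString = fileURL.absoluteString

// path is the plain file-system path, which is what path-based APIs
// such as NSString(contentsOfFile:encoding:) expect.
let filePath = fileURL.path

print(urlString)
print(filePath)
```

Passing the URL string where a path is expected points at a file that does not exist, which is one way to end up with nil.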
Now here's the thing. This still won't work, because a docx file is a zipped-up collection of XML and other files. You can't load a docx file into an NSString. You would need to use NSData to load the zip contents. Then you would need to unzip it. Then you would need to go through all of the files and find the desired text. It's far from trivial, and it is far beyond the scope of a single Stack Overflow post.
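As a rough illustration of the last step only, the sketch below pulls plain text out of the main document part once it has already been unzipped. It rests on several assumptions: the body text lives in word/document.xml inside the archive, text runs are w:t elements inside w:p paragraphs, and the file uses the conventional w: namespace prefix. The names DocxTextCollector and plainText(fromDocumentXML:) are made up for this example, and unzipping itself is not shown because Foundation has no public zip API.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives here on non-Apple platforms
#endif

/// Collects the characters inside <w:t> runs and ends each <w:p> paragraph with a newline.
/// Assumes the conventional "w:" prefix; a namespace-aware version would be more robust.
final class DocxTextCollector: NSObject, XMLParserDelegate {
    private(set) var text = ""
    private var insideTextRun = false

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        if elementName == "w:t" { insideTextRun = true }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // Ignore characters outside text runs (layout whitespace and so on).
        if insideTextRun { text += string }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "w:t" { insideTextRun = false }
        if elementName == "w:p" { text += "\n" }
    }
}

/// Returns the plain text of an already-unzipped word/document.xml, or nil if it is not valid XML.
func plainText(fromDocumentXML data: Data) -> String? {
    let parser = XMLParser(data: data)
    let collector = DocxTextCollector()
    parser.delegate = collector
    guard parser.parse() else { return nil }
    return collector.text.trimmingCharacters(in: .whitespacesAndNewlines)
}
```

This ignores tables, headers, footnotes, tabs and line breaks inside runs, so it is a starting point rather than a converter.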
Answer 2:
Swift 4, Xcode 9.1, OSX targets from 10.10 to 10.13
I have found that the following code extracts text handily from a Word .doc file, which then easily goes into a string. (The attributed string contains formatting information that might be parsed to good effect.) The main info that I wanted to convey was the bit about using .docFormat to specify the document type.
import AppKit

let openPanel = NSOpenPanel()
var fileString = ""
// The panel's url is only set after the user picks a file, so show the panel first.
if openPanel.runModal() == .OK, let fileURL = openPanel.url {
    do {
        let fileData = try Data(contentsOf: fileURL)
        if let attributedString = try? NSAttributedString(data: fileData, options: [
            .documentType: NSAttributedString.DocumentType.docFormat,
            .characterEncoding: String.Encoding.utf8.rawValue
        ], documentAttributes: nil) {
            // .string drops the formatting and keeps only the plain text.
            fileString = attributedString.string
        } else {
            fileString = "Data conversion error."
        }
        fileString = fileString.trimmingCharacters(in: .whitespacesAndNewlines)
    } catch {
        // Reading the file's data failed (missing file, no permission, ...).
        print("Word Document File Not Found")
    }
}
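The code above targets the older .doc format. For the .docx file in the question, the same approach may work on macOS by asking for the .officeOpenXML document type instead. This is a sketch under that assumption: it needs AppKit, so it is not expected to work on iOS, and plainText(fromDocx:) is a made-up helper name.

```swift
import AppKit

/// Reads a .docx file and returns its plain text (macOS only).
/// Throws if the file cannot be read or cannot be parsed as Office Open XML.
func plainText(fromDocx url: URL) throws -> String {
    let data = try Data(contentsOf: url)
    let attributed = try NSAttributedString(
        data: data,
        options: [.documentType: NSAttributedString.DocumentType.officeOpenXML],
        documentAttributes: nil)
    // .string discards the formatting and keeps only the characters.
    return attributed.string.trimmingCharacters(in: .whitespacesAndNewlines)
}
```

Letting the function throw, rather than returning an error message as the string, keeps a failed conversion from being mistaken for document text.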
Source: https://stackoverflow.com/questions/40443609/converting-docx-files-to-text-in-swift