Converting Docx Files To Text In Swift

陌路散爱 submitted on 2021-01-01 08:57:35

Question


I have a .docx file in my temporary storage:

    let location: NSURL = NSURL.fileURLWithPath(NSTemporaryDirectory())
    let file_Name = location.URLByAppendingPathComponent("5 November 2016.docx")

What I now want to do is extract the text inside this document. But I cannot seem to find any converters or methods of doing this.

I have tried this:

    let file_Content = try? NSString(contentsOfFile: String(file_Name), encoding: NSUTF8StringEncoding)
    print(file_Content)

However it prints nil.

So how do I read the text in a docx file?


Answer 1:


Your initial issue is with how you get the string from the URL. String(file_Name) is not the correct way to convert a file URL into a file path. The proper way is to use the URL's path property.

    let location = NSURL.fileURLWithPath(NSTemporaryDirectory())
    let fileURL = location.URLByAppendingPathComponent("My File.docx")
    let fileContent = try? NSString(contentsOfFile: fileURL.path, encoding: NSUTF8StringEncoding)

Note the many changes. Use proper naming conventions. Name variables more clearly.

Now here's the thing. This still won't work, because a docx file is a zipped-up collection of XML and other files. You can't load a docx file into an NSString. You would need to use NSData to load the zip contents, then unzip it, then go through the files inside and find the desired text. It's far from trivial and well beyond the scope of a single Stack Overflow post.
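To make those steps concrete, here is a minimal sketch in current Swift rather than the Swift 2 syntax used above. It rests on several assumptions: the main text lives in the archive entry word/document.xml, text runs are elements named w:t inside w:p paragraphs (the prefix Word normally writes), and the system unzip tool at /usr/bin/unzip is available, which limits the unzip step to macOS because Process does not exist on iOS. The names DocxTextCollector, plainText(fromDocumentXML:) and documentXML(inDocxAt:) are made up for illustration.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML   // XMLParser lives in this module on Linux
#endif

/// Collects the characters inside every <w:t> element and
/// appends a newline at the end of each <w:p> paragraph.
final class DocxTextCollector: NSObject, XMLParserDelegate {
    private(set) var text = ""
    private var insideTextRun = false

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        // Namespace processing is off by default, so elementName keeps its "w:" prefix.
        if elementName == "w:t" { insideTextRun = true }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // May be called several times per element, so append rather than assign.
        if insideTextRun { text += string }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "w:t" { insideTextRun = false }
        if elementName == "w:p" { text += "\n" }
    }
}

/// Hypothetical helper: plain text from the XML of word/document.xml,
/// or nil if the XML cannot be parsed.
func plainText(fromDocumentXML xml: Data) -> String? {
    let parser = XMLParser(data: xml)
    let collector = DocxTextCollector()
    parser.delegate = collector
    return parser.parse() ? collector.text : nil
}

#if os(macOS)
/// Hypothetical helper (macOS only): asks the system unzip tool to write
/// word/document.xml from the archive to standard output and returns the bytes.
func documentXML(inDocxAt url: URL) throws -> Data {
    let unzip = Process()
    unzip.executableURL = URL(fileURLWithPath: "/usr/bin/unzip")
    unzip.arguments = ["-p", url.path, "word/document.xml"]
    let output = Pipe()
    unzip.standardOutput = output
    try unzip.run()
    let data = output.fileHandleForReading.readDataToEndOfFile()
    unzip.waitUntilExit()
    return data
}
#endif
```

On macOS the two helpers would be combined as plainText(fromDocumentXML: try documentXML(inDocxAt: fileURL)). This ignores headers, footers, footnotes, tables and tabs, so treat it as a starting point rather than a full converter. On iOS the unzip step would need a zip library instead of Process.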




Answer 2:


Swift 4, Xcode 9.1, OSX targets from 10.10 to 10.13

I have found that the following code extracts text handily from a Word .doc file, which then easily goes into a string. (The attributed string contains formatting information that might be parsed to good effect.) The main info that I wanted to convey was the bit about using .docFormat to specify the document type.

    let openPanel   = NSOpenPanel()
    var fileString  = ""

    // openPanel.url is nil until the panel has been shown and a file chosen.
    if openPanel.runModal() == .OK, let fileURL = openPanel.url {
        do {
            let fileData = try Data(contentsOf: fileURL)
            // .docFormat tells the importer to treat the data as a Word .doc file.
            if let tryForString = try? NSAttributedString(data: fileData, options: [
                .documentType: NSAttributedString.DocumentType.docFormat,
                .characterEncoding: String.Encoding.utf8.rawValue
                ], documentAttributes: nil) {
                fileString = tryForString.string
            } else {
                fileString = "Data conversion error."
            }
            fileString = fileString.trimmingCharacters(in: .whitespacesAndNewlines)
        } catch {
            print("Word Document File Not Found")
        }
    }
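The code above targets the older .doc format, while the question asks about .docx. A sketch of the same idea for .docx, assuming a macOS target (AppKit) where NSAttributedString.DocumentType.officeOpenXML is available; the function name docxText(at:) is made up for illustration:

```swift
#if canImport(AppKit)
import AppKit

/// Hypothetical helper: loads a .docx file through AppKit's document importer
/// and returns its plain text with surrounding whitespace trimmed.
func docxText(at url: URL) throws -> String {
    let attributed = try NSAttributedString(
        url: url,
        // .officeOpenXML selects the .docx importer instead of .docFormat.
        options: [.documentType: NSAttributedString.DocumentType.officeOpenXML],
        documentAttributes: nil)
    return attributed.string.trimmingCharacters(in: .whitespacesAndNewlines)
}
#endif
```

As with the .doc version, the intermediate attributed string carries formatting that is discarded here. This document type is part of AppKit, so the approach is not expected to carry over to iOS unchanged.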


Source: https://stackoverflow.com/questions/40443609/converting-docx-files-to-text-in-swift
