Parsing Excel Data in Apple Swift

说谎 2020-12-14 05:17

My current workflow involves using AppleScript to essentially delimit Excel data and format it into plain text files. We're pushing towards an all-Swift environment, but I

5 Answers
  • 2020-12-14 05:31

    It's somewhat unclear if you're trying to eliminate Excel as a dependency (which is not unreasonable: it costs money and not everyone has it) or AppleScript as a language (totally understandable, but a bad practical move as Apple's alternatives for application automation all suck).

    There are third-party Excel-parsing libraries available for other languages, e.g. I've used Python's openpyxl (for .xlsx files) and xlrd (for .xls files) libraries successfully in my own projects. And I see through the magicks of Googles that someone's written an ObjC framework, DHlibxls, which [assuming no dynamic trickery] should be usable directly from Swift, but I've not used it myself so can't tell you anything more.

  • 2020-12-14 05:33

    There is no need to export Excel files to CSV for Swift as you can use an existing open-source library for parsing XLSX files. If you use CocoaPods or Swift Package Manager for integrating 3rd-party libraries, CoreXLSX supports those. After the library is integrated, you can use it like this:

    import CoreXLSX
    
    guard let file = XLSXFile(filepath: "./file.xlsx") else {
      fatalError("XLSX file corrupted or does not exist")
    }
    
    for path in try file.parseWorksheetPaths() {
      let ws = try file.parseWorksheet(at: path)
      for row in ws.sheetData.rows {
        for c in row.cells {
          print(c)
        }
      }
    }
    

    This will open file.xlsx and print all cells within that file. You can also filter cells by references and access only cell data that you need for your automation.

  • 2020-12-14 05:35

    1. Export to plaintext CSV

    If all you're trying to do is extract data from Excel to use elsewhere, as opposed to capturing Excel formulas and formatting, then you probably should not try to read the .xls file. XLS is a complex format. It's good for Excel, not for general data interchange.

    Similarly, you probably don't need to use AppleScript or anything else to integrate with Excel, if all you want to do is save the data as plaintext. Excel already knows how to save data as plaintext. Just use Excel's "Save As" command. (That's what it's called on the Mac. I don't know about PCs.)

    The question is then what plaintext format to use. One obvious choice for this is a plaintext comma-separated value file (CSV) because it's a simple de facto standard (as opposed to a complex official standard like XML). This will make it easy to consume in Swift, or in any other language.

    2. Export in UTF-8 encoding if possible, otherwise as UTF-16

    So how do you do that exactly? Plaintext is wonderfully simple, but one subtlety that you need to keep track of is the text encoding. A text encoding is a way of representing characters in a plaintext file. Unfortunately, you cannot reliably tell the encoding of a file just by inspecting the file, so you need to choose an encoding when you save it and remember to use that encoding when you read it. If you mess this up, accented characters, typographer's quotation marks, dashes, and other non-ASCII characters will get mangled. So what text encoding should you use? The short answer is, you should always use UTF-8 if possible.
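
    To make the mangling concrete, here is a minimal sketch (using only Foundation, with a made-up sample string) of what happens when bytes written as UTF-8 are read back with a different encoding. ISO Latin-1 is just one example of a wrong guess; other mismatches mangle the text in other ways.

```swift
import Foundation

// A sample string with one non-ASCII character (é, written as an escape so
// the source file's own encoding cannot affect the result).
let original = "caf\u{E9}"
let utf8Bytes = Data(original.utf8)

// Decoding with the encoding that was used for writing round-trips cleanly.
let decodedCorrectly = String(data: utf8Bytes, encoding: .utf8)

// Decoding the same bytes as ISO Latin-1 "succeeds" but mangles the é,
// because its two UTF-8 bytes are read as two separate characters.
let decodedWrongly = String(data: utf8Bytes, encoding: .isoLatin1)

print(decodedCorrectly ?? "nil")  // prints "café"
print(decodedWrongly ?? "nil")    // prints "cafÃ©"
```

    Note that the wrong decode does not fail; it silently produces different text, which is why the encoding has to be tracked rather than guessed.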

    But if you're working with an older version of Excel, then you may not be able to use UTF-8. In that case, you should use UTF-16. In particular, UTF-16 is, I believe, the only export option in Excel 2011 for Mac which produces a predictable result which will not depend in surprising ways on obscure locale settings or Microsoft-specific encodings.

    So if you're on Excel 2011 for Mac, for instance, choose "UTF-16 Unicode Text" from Excel's Save As command.

    This will cause Excel to save the file so that every row is a line of text, and every column is separated by a tab character. (So technically, this is a tab-separated values file, rather than a comma-separated values file.)

    3. Import with Swift

    Now you have a plaintext file, which you know was saved in a UTF-8 (or UTF-16) encoding. So now you can read it and parse it in Swift.
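
    As a rough sketch of the reading step, the file can be loaded with the encoding stated explicitly. The helper name readExportedText and the temporary sample file are assumptions made for illustration; in practice the URL would point at the file saved from Excel.

```swift
import Foundation

/// Reads an exported text file using the encoding chosen at export time.
/// `readExportedText` is a hypothetical helper name, not part of any library.
func readExportedText(at url: URL, encoding: String.Encoding) throws -> String {
    return try String(contentsOf: url, encoding: encoding)
}

// Demonstration only: write a small tab-separated sample as UTF-16,
// then read it back with the same encoding.
let sampleURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("export-sample.txt")
let sample = "name\tcity\nZo\u{EB}\tM\u{E1}laga\n"

do {
    try sample.write(to: sampleURL, atomically: true, encoding: .utf16)
    let text = try readExportedText(at: sampleURL, encoding: .utf16)
    print(text == sample ? "Round trip succeeded" : "Round trip changed the text")
} catch {
    print("Could not read or write the sample file: \(error)")
}
```

    Passing the encoding as a parameter, rather than trying to detect it, follows the advice above: the encoding is decided once at export time and then stated again when reading.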

    If your Excel data is complicated, you may need a full-featured CSV parser. The best choice is probably CHCSVParser.

    Using CHCSVParser, you can parse the file with the following code:

    #import "CHCSVParser.h"

    // Path to the tab-separated text file exported from Excel
    NSString * const inputFilePath = @"/path/to/exported/file.txt";
    unichar tabCharacter = '\t';
    // Each element of rows is an NSArray holding the fields of one line
    NSArray *rows = [NSArray arrayWithContentsOfCSVFile:inputFilePath options:CHCSVParserOptionsSanitizesFields
                                              delimiter:tabCharacter];
    

    (You could also call it from Swift, of course.)

    On the other hand, if your data is relatively simple (for instance, it has no escaped characters), then you might not need to use an external library at all. You can write some Swift code that parses tab-separated values just by reading in the file as a string, splitting on newlines, and then splitting on tabs.

    This function will take a String representing TSV data and return an array of dictionaries:

    import Foundation

    /**
    Reads a multiline, tab-separated String and returns an Array<NSDictionary>, taking column names from the first line or an explicit parameter
    */
    func JSONObjectFromTSV(tsvInputString: String, columnNames optionalColumnNames: [String]? = nil) -> Array<NSDictionary>
    {
      let lines = tsvInputString.components(separatedBy: "\n")
      guard lines.isEmpty == false else { return [] }

      // Use the explicit column names if given; otherwise treat the first line as the header
      let columnNames = optionalColumnNames ?? lines[0].components(separatedBy: "\t")
      var lineIndex = (optionalColumnNames != nil) ? 0 : 1
      let columnCount = columnNames.count
      var result = Array<NSDictionary>()

      for line in lines[lineIndex ..< lines.count] {
        let fieldValues = line.components(separatedBy: "\t")
        if fieldValues.count != columnCount {
          // Lines whose field count does not match the header are skipped silently
          //      print("WARNING: header has \(columnCount) columns but line \(lineIndex) has \(fieldValues.count) columns. Ignoring this line")
        }
        else
        {
          result.append(NSDictionary(objects: fieldValues, forKeys: columnNames as [NSString]))
        }
        lineIndex = lineIndex + 1
      }
      return result
    }
    

    So you only need to read the file into a string and pass it to this function. That snippet comes from this gist for a tsv-to-json converter. And if you need to know more about which text encodings Microsoft products produce, and which ones Cocoa can auto-detect, then this repo on text encoding contains the research on export specimens which led to the conclusion that UTF-16 is the way to go for old Microsoft products on the Mac.
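
    If Foundation's NSDictionary is not needed, the same idea can be sketched with native Swift collections. This is a variant, not the code from the gist: the name parseTSV is hypothetical, and unlike the function above it accepts "\r" and "\r\n" line endings as well as "\n" and drops blank lines.

```swift
/// Parses tab-separated text into one dictionary per data row, keyed by the
/// column names found in the first line. Rows whose field count differs from
/// the header are skipped. `parseTSV` is a hypothetical name.
func parseTSV(_ text: String) -> [[String: String]] {
    // Character.isNewline matches "\n", "\r" and "\r\n"; split drops blank lines.
    let lines = text.split(whereSeparator: { $0.isNewline })
    guard let headerLine = lines.first else { return [] }

    // Keep empty fields so that column positions stay aligned.
    let header = headerLine
        .split(separator: "\t", omittingEmptySubsequences: false)
        .map(String.init)

    return lines.dropFirst().compactMap { line -> [String: String]? in
        let fields = line
            .split(separator: "\t", omittingEmptySubsequences: false)
            .map(String.init)
        guard fields.count == header.count else { return nil }
        // If a column name repeats, the first value wins.
        return Dictionary(zip(header, fields), uniquingKeysWith: { first, _ in first })
    }
}

// Made-up sample input: two valid rows (one with an empty field) and one short row.
let rows = parseTSV("name\tqty\r\napple\t3\r\npear\t\r\nbad\n")
for row in rows {
    print(row["name"] ?? "", row["qty"] ?? "")
}
```

    Like the original function, this still assumes there are no quoted or escaped fields; data containing embedded tabs or newlines needs a real parser.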

    (I realize I'm linking to my own repos here. Apologies?)

  • 2020-12-14 05:43

    In Mac OS X 10.6 Snow Leopard, Apple introduced the AppleScriptObjC framework, which makes it very easy to interact between Cocoa and AppleScript. AppleScript code and an Objective-C-like syntax can be used in the same source file. It's much more convenient than Scripting Bridge and NSAppleScript.

    AppleScriptObjC cannot be used directly in Swift because the NSBundle method loadAppleScriptObjectiveCScripts is not bridged to Swift.

    However, you can use an Objective-C bridge class, for example:

    ASObjC.h

    @import Foundation;
    @import AppleScriptObjC;
    
    @interface NSObject (Excel)
    - (void)openExcelDocument:(NSString *)filePath;
    - (NSArray *)valueOfUsedRange;
    
    @end
    
    @interface ASObjC : NSObject
    
    + (ASObjC *)sharedASObjC;
    
    @property id Excel;
    
    @end
    

    ASObjC.m

    #import "ASObjC.h"
    
    @implementation ASObjC
    
    + (void)initialize
    {
        if (self == [ASObjC class]) {
            [[NSBundle mainBundle] loadAppleScriptObjectiveCScripts];
        }
    }
    
    + (ASObjC *)sharedASObjC
    {
        static id sharedInstance = nil;
        static dispatch_once_t onceToken;
        dispatch_once(&onceToken, ^{
            sharedInstance = [[ASObjC alloc] init];
        });
    
        return sharedInstance;
    }
    
    - (instancetype)init
    {
        self = [super init];
        if (self) {
            _Excel = NSClassFromString(@"ASExcel");
        }
        return self;
    }
    
    @end
    

    Create an AppleScript source file from the AppleScriptObjC template

    ASExcel.applescript

    script ASExcel
      property parent: class "NSObject"
    
      on openExcelDocument:filePath
        set asFilePath to filePath as text
        tell application "Microsoft Excel"
          set sourceBook to open workbook workbook file name asFilePath
          repeat
            try
              get workbooks
              return
            end try
            delay 0.5
          end repeat
        end tell
      end openExcelDocument:
    
      on valueOfUsedRange()
        tell application "Microsoft Excel"
          tell active sheet
            set activeRange to used range
            return value of activeRange
          end tell
        end tell
      end valueOfUsedRange
    
    end script
    

    Link to the AppleScriptObjC framework if necessary.
    Create the Bridging Header and import ASObjC.h

    Then you can call AppleScriptObjC from Swift with

     ASObjC.sharedASObjC().Excel.openExcelDocument("Macintosh HD:Users:MyUser:Path:To:ExcelFile.xlsx")
    

    or

    let excelData = ASObjC.sharedASObjC().Excel.valueOfUsedRange() as! Array<[String]>
    
  • 2020-12-14 05:44

    You can use ScriptingBridge or NSAppleScript to interact with scriptable applications.

    For ScriptingBridge, a header file can be generated from the application's AppleScript dictionary (with the sdef and sdp command-line tools).

    NSAppleScript can execute any AppleScript for you if you pass it the source as a String.
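
    A minimal sketch of the NSAppleScript route, assuming macOS: the helper name runAppleScript is hypothetical, and the script here is a trivial one that needs no application. For Excel, the source string would contain a tell application "Microsoft Excel" block like the ones in the AppleScriptObjC answer.

```swift
#if os(macOS)
import Foundation

/// Compiles and runs AppleScript source, returning the result descriptor,
/// or nil if compilation or execution failed.
/// `runAppleScript` is a hypothetical helper name.
func runAppleScript(_ source: String) -> NSAppleEventDescriptor? {
    guard let script = NSAppleScript(source: source) else { return nil }
    var errorInfo: NSDictionary?
    let result = script.executeAndReturnError(&errorInfo)
    if let errorInfo = errorInfo {
        // The dictionary carries the AppleScript error number and message.
        print("AppleScript error: \(errorInfo)")
        return nil
    }
    return result
}

// A trivial script that does not talk to any application.
if let sum = runAppleScript("return 2 + 3") {
    print(sum.int32Value)
}
#endif
```

    The result comes back as an NSAppleEventDescriptor, so list values such as a used range have to be unpacked item by item, which is part of why this route is more laborious than AppleScriptObjC. NSAppleScript should also be called from the main thread.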
