Parse CSV, ignoring commas inside string literals in VBA?

青春惊慌失措 2020-12-30 03:27

I have a VBA application that runs every day. It checks a folder where CSVs are downloaded automatically, and adds their contents to a database. When parsing them, I reali

11 answers
  • 2020-12-30 04:14

    I realize this is an old post, but I just bumped into it looking for a solution to the same problem the OP had, so the thread is still relevant.

    To import data from a CSV, I add a query to a worksheet:

    wksTarget.QueryTables.Add(Connection:=strConn, Destination:=wksTarget.Range("A1"))
    

    then set the appropriate QueryTable properties (e.g. Name, FieldNames, RefreshOnFileOpen, etc.)

    QueryTables can handle various delimiters via TextFileCommaDelimiter, TextFileSemicolonDelimiter and others. A number of other properties (TextFilePlatform, TextFileTrailingMinusNumbers, TextFileColumnDataTypes, TextFileDecimalSeparator, TextFileStartRow, TextFileThousandsSeparator) handle source file idiosyncrasies.

    Relevant to the OP, QueryTables also has a property designed to handle commas within double quotes: TextFileTextQualifier = xlTextQualifierDoubleQuote.

    I find QueryTables much simpler than writing code to import the file, split/parse strings or use regular expressions.

    All together, a sample code snippet would look something like this:

        strConn = "TEXT;" & "C:\Desktop\SourceFile.CSV"
        varDataTypes = Array(5, 1, 1, 1, 1, 1, 5, 5)  ' one entry per column: 1 = xlGeneralFormat, 5 = xlYMDFormat
        With wksTarget.QueryTables.Add(Connection:=strConn, _
             Destination:=wksTarget.Range("A1"))
            .Name = "ImportCSV"
            .FieldNames = True
            .RefreshOnFileOpen = False
            .SaveData = True
            .TextFilePlatform = xlMSDOS
            .TextFileStartRow = 1
            .TextFileParseType = xlDelimited
            .TextFileCommaDelimiter = True
            .TextFileTextQualifier = xlTextQualifierDoubleQuote
            .TextFileColumnDataTypes = varDataTypes
            .Refresh BackgroundQuery:=False
        End With
    

    I prefer to delete the QueryTable once the data is imported (wksTarget.QueryTables("ImportCSV").Delete), but I suppose it could be created just once and then simply refreshed if the source and destination for the data don't change.
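
    The cleanup step can also be done by index rather than by name, which may help if the stored name differs from the one you assigned. This is only a sketch: the procedure name RemoveQueryTables is hypothetical, and it assumes the sheet holds no other query tables you want to keep.

        Sub RemoveQueryTables(ByVal wks As Worksheet)
            Dim i As Long
            ' Walk backwards so that deleting an item does not shift the ones still to visit
            For i = wks.QueryTables.Count To 1 Step -1
                wks.QueryTables(i).Delete
            Next i
        End Sub

    Call it right after the import, e.g. RemoveQueryTables wksTarget.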

  • 2020-12-30 04:18

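    If the source CSV has every field in double quotes, then Split(strLine, """,""") may work well. The delimiter is quote-comma-quote with no space in between, and the first and last fields keep their outer quote, which has to be stripped separately.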
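
    A minimal sketch of that idea, assuming every field is quoted and no field contains the sequence quote-comma-quote itself. The function name SplitFullyQuotedLine is hypothetical. It removes the line's outer pair of quotes before splitting, so no field is left with a stray quote:

        Function SplitFullyQuotedLine(ByVal strLine As String) As String()
            ' Drop the opening and closing quote of the line, if both are present
            If Len(strLine) >= 2 And Left$(strLine, 1) = """" And Right$(strLine, 1) = """" Then
                strLine = Mid$(strLine, 2, Len(strLine) - 2)
            End If
            ' The remaining separators are exactly quote-comma-quote
            SplitFullyQuotedLine = Split(strLine, """,""")
        End Function

    For the line "a","b,c","d" this yields the three fields a, b,c and d. Doubled quotes inside a field are left doubled.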

  • 2020-12-30 04:19

    I know this is an old post, but thought this may help others. This was adapted from http://n3wt0n.com/blog/comma-separated-values-and-quoted-commas-in-vbscript/; it works really well and is written as a function that you pass your input line to.

    Function SplitCSVLineToArray(ByVal Line As String, ByVal RemoveQuotes As Boolean) As Variant
        'Pass it a line and whether or not to remove the quotes
        Const ReplacementString As String = "#!#!#"  'Placeholder that should never appear in the file
        Dim InQuotes As Boolean, NewLine As String, CurrentCharacter As String
        Dim LineArray() As String, x As Long
        For x = 1 To Len(Line)
            CurrentCharacter = Mid(Line, x, 1)
            'Each double quote toggles the in-quotes state
            If CurrentCharacter = Chr(34) Then InQuotes = Not InQuotes
            'Hide commas inside quotes behind the placeholder
            If InQuotes And CurrentCharacter = "," Then CurrentCharacter = ReplacementString
            NewLine = NewLine & CurrentCharacter
        Next
        'Only the real delimiters are left as commas now
        LineArray = Split(NewLine, ",")
        For x = 0 To UBound(LineArray)
            'Restore the hidden commas
            LineArray(x) = Replace(LineArray(x), ReplacementString, ",")
            If RemoveQuotes Then LineArray(x) = Replace(LineArray(x), Chr(34), "")
        Next
        SplitCSVLineToArray = LineArray
    End Function
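
    A short usage sketch for the function above. The procedure name and the sample line are made up for illustration:

        Sub DemoSplitCSVLineToArray()
            Dim fields As Variant
            ' Sample line: a,"b,c",d
            fields = SplitCSVLineToArray("a,""b,c"",d", True)
            Debug.Print UBound(fields) + 1   ' number of fields: 3
            Debug.Print fields(1)            ' the quoted field with its comma kept: b,c
        End Sub

    Note that with RemoveQuotes set to True every quote character is removed, so a doubled quote inside a field is lost as well.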
    
  • 2020-12-30 04:19

    I made another variant of a solution for parsing CSV files that contain quoted text strings with delimiters, such as commas, inside the double quotes. This method doesn't require regular expressions or any other add-ons, and it handles multiple commas between the quotes. Here is a subroutine for testing:

    Sub SubstituteBetweenQuotesSub()
    'In-string character replacement function by Maryan Hutsul      1/29/2019
    'Each variable needs its own type: "Dim a, b As Integer" leaves a as Variant
    Dim quote As Long, quoteTwo As Long
    Dim oddEven As Long
    Dim i As Long, counter As Long
    Dim LineItems As String
    
    'LineItems are lines of text read from CSV file, or any other text string
    LineItems = ",,,2019NoApocalypse.ditamap,jesus.christ@sky.com,Approver,""JC, ,Son"",Reviewer,god.allmighty@sky.com,""God, All-Mighty,"",2019-01-29T08:47:29.290-05:00"
    
    quote = 1
    oddEven = 0
    
    Do Until quote = 0
        quote = InStr(quote, LineItems, Chr(34))
        quoteTwo = InStr(quote + 1, LineItems, Chr(34))
    
        oddEven = oddEven + 1
        If oddEven Mod 2 = 1 And quote <> 0 Then
            'An odd-numbered quote opens a quoted section: count the commas up to the next quote
            counter = 0
            For i = quote To quoteTwo
                If Mid$(LineItems, i, 1) = "," Then counter = counter + 1
            Next i
    
            'Replace exactly that many commas, starting at the opening quote
            LineItems = Left(LineItems, quote - 1) & Replace(LineItems, ",", ";", quote, counter)
            quote = quote + 1
        ElseIf quote <> 0 Then
            'An even-numbered quote closes the section: step past it
            quote = quote + 1
        End If
    Loop
    
    Debug.Print LineItems   'Show the result in the Immediate window
    End Sub
    

    Here is a function to which you can pass lines from .csv, .txt or any other text files:

    Function SubstituteBetweenQuotes(ByVal LineItems As String) As String
    'In-string character replacement function by Maryan Hutsul                                          1/29/2019
    'LineItems are lines of text read from CSV file, or any other text string
    'Each variable needs its own type: "Dim a, b As Integer" leaves a as Variant
    Dim quote As Long, quoteTwo As Long
    Dim oddEven As Long
    Dim i As Long, counter As Long
    
    quote = 1
    oddEven = 0
    
    Do Until quote = 0
        quote = InStr(quote, LineItems, Chr(34))
        quoteTwo = InStr(quote + 1, LineItems, Chr(34))
    
        oddEven = oddEven + 1
        If oddEven Mod 2 = 1 And quote <> 0 Then
            'An odd-numbered quote opens a quoted section: count the commas up to the next quote
            counter = 0
            For i = quote To quoteTwo
                If Mid$(LineItems, i, 1) = "," Then counter = counter + 1
            Next i
    
            'Replace exactly that many commas, starting at the opening quote
            LineItems = Left(LineItems, quote - 1) & Replace(LineItems, ",", ";", quote, counter)
            quote = quote + 1
        ElseIf quote <> 0 Then
            'An even-numbered quote closes the section: step past it
            quote = quote + 1
        End If
    Loop
    
    SubstituteBetweenQuotes = LineItems
    
    End Function
    

    And below is code for reading a CSV file using the function:

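    Dim fullFilePath As String
    Dim LineFromFile As String
    Dim LineItems() As String
    Dim row_number As Long
    Dim i As Long
    Dim fileNum As Integer
    
    'fullFilePath - full path to your input CSV file (assign it before opening)
    fileNum = FreeFile   'Next free file number, instead of a hard-coded #1
    Open fullFilePath For Input As #fileNum
    row_number = 0
    'EOF - End Of File
    Do Until EOF(fileNum)
        Line Input #fileNum, LineFromFile
        'Commas inside quotes are already replaced, so a plain Split is safe
        LineItems = Split(SubstituteBetweenQuotes(LineFromFile), ",")
        For i = LBound(LineItems) To UBound(LineItems)
            ActiveCell.Offset(row_number, i).Value = LineItems(i)
        Next i
        row_number = row_number + 1
    Loop
    Close #fileNum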
    

    All delimiters and the replacement character may be modified for your needs. I hope this is useful, as I had quite a journey solving some problems with CSV imports.
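
    For comparison, here is a sketch of a single-pass alternative that tracks whether the current character is inside quotes, so no replacement character is needed and the commas inside quoted fields stay commas. This is not the method above: the function name ParseCsvLine is hypothetical, and it assumes one record per line (no line breaks inside quoted fields) and a doubled quote as the escape for a literal quote.

        Function ParseCsvLine(ByVal textLine As String) As Collection
            Dim fields As New Collection
            Dim current As String, ch As String
            Dim inQuotes As Boolean
            Dim i As Long

            i = 1
            Do While i <= Len(textLine)
                ch = Mid$(textLine, i, 1)
                If inQuotes Then
                    If ch = """" Then
                        If Mid$(textLine, i + 1, 1) = """" Then
                            ' Doubled quote inside a quoted field: keep one literal quote
                            current = current & """"
                            i = i + 1
                        Else
                            inQuotes = False      ' closing quote
                        End If
                    Else
                        current = current & ch    ' commas are ordinary characters here
                    End If
                ElseIf ch = """" Then
                    inQuotes = True               ' opening quote
                ElseIf ch = "," Then
                    fields.Add current            ' a comma outside quotes ends the field
                    current = ""
                Else
                    current = current & ch
                End If
                i = i + 1
            Loop
            fields.Add current                    ' last field (may be empty)
            Set ParseCsvLine = fields
        End Function

    The result is a 1-based Collection, so ParseCsvLine("x,""y, z"",w")(2) returns y, z without the surrounding quotes.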

  • 2020-12-30 04:22

    Try this! Make sure "Microsoft VBScript Regular Expressions 5.5" is ticked under Tools > References. (Because the function creates the object with CreateObject, it also runs late-bound without the reference.)

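    Function Splitter(line As String, n As Integer) As String
    'Returns the n-th field (1-based) of a CSV line; quoted fields keep their quotes
    Dim s() As String
    Dim regex As Object
        Set regex = CreateObject("vbscript.regexp")
        regex.IgnoreCase = True
        regex.Global = True
        'Match a comma only when an even number of quotes follows it, i.e. outside quotes
        regex.Pattern = ",(?=([^\""]*\""[^\""]*\"")*[^\""]*$)"
        'Swap those commas for a marker unlikely to occur in the data, then split on it
        s = Split(regex.Replace(line, "|/||\|"), "|/||\|")
        Splitter = s(n - 1)
    End Function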
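
    A short usage sketch for Splitter. The procedure name and the sample line are made up for illustration:

        Sub DemoSplitter()
            ' Sample line: a,"b,c",d
            Debug.Print Splitter("a,""b,c"",d", 2)   ' second field, quotes retained: "b,c"
            Debug.Print Splitter("a,""b,c"",d", 3)   ' third field: d
        End Sub

    The field index is 1-based, and asking for a field beyond the last one raises a "Subscript out of range" error, so a caller reading untrusted files may want to guard against that.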
    