Regex to split a line (CSV file)

予麋鹿 2020-12-03 12:37

I am not good at regex. Can someone help me write a regex for this?

I may have values like this when reading a CSV file:

"Artist,Name",Album,12-SCS


        
7 Answers
  • 2020-12-03 13:00

    Actually, it's pretty easy to match CSV lines with a regex. Try this one out:

    using System;
    using System.Collections.Specialized;
    using System.Text.RegularExpressions;

    string subjectString = "\"Artist,Name\",Album,12-SCS"; // one line read from the CSV file
    StringCollection resultList = new StringCollection();
    try {
        Regex pattern = new Regex(@"
            # Parse CSV line. Capture next value in named group: 'val'
            \s*                      # Ignore leading whitespace.
            (?:                      # Group of value alternatives.
              ""                     # Either a double quoted string,
              (?<val>                # Capture contents between quotes.
                [^""]*(""""[^""]*)*  # Zero or more non-quotes, allowing 
              )                      # doubled "" quotes within string.
              ""\s*                  # Ignore whitespace following quote.
            |  (?<val>[^,]*)         # Or... zero or more non-commas.
            )                        # End value alternatives group.
            (?:,|$)                  # Match end is comma or EOS", 
            RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
        Match matchResult = pattern.Match(subjectString);
        while (matchResult.Success) {
            resultList.Add(matchResult.Groups["val"].Value);
            // A match that doesn't end in a comma consumed the last value. Stop there,
            // otherwise the pattern also matches an empty string at the end of the line.
            if (!matchResult.Value.EndsWith(",", StringComparison.Ordinal))
                break;
            matchResult = matchResult.NextMatch();
        } 
    } catch (ArgumentException ex) {
        // Syntax error in the regular expression
    }
    

    Disclaimer: The regex has been tested in RegexBuddy (which generated this snippet) and correctly matches the OP's test data, but the C# code logic is untested. (I don't have access to C# tools.)

  • 2020-12-03 13:02

    Take a look at the TextFieldParser class. It's in the Microsoft.VisualBasic assembly and handles both delimited and fixed-width parsing.
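
    A minimal sketch of using TextFieldParser on the line from the question, assuming the input is handled one line at a time. The class lives in the Microsoft.VisualBasic.FileIO namespace; TextFieldParserCsv and ParseLine are hypothetical names chosen for illustration. Reading from a StringReader keeps the example self-contained; for a whole file you could instead pass the file path to the constructor and call ReadFields() in a loop while EndOfData is false.

    ```csharp
    using System.IO;
    using Microsoft.VisualBasic.FileIO; // TextFieldParser (Microsoft.VisualBasic assembly)

    public static class TextFieldParserCsv
    {
        // Hypothetical helper: parse a single CSV line into its fields.
        public static string[] ParseLine(string line)
        {
            using (var parser = new TextFieldParser(new StringReader(line)))
            {
                parser.TextFieldType = FieldType.Delimited; // delimited, not fixed-width
                parser.SetDelimiters(",");
                parser.HasFieldsEnclosedInQuotes = true;    // keeps "Artist,Name" as one field
                return parser.ReadFields();
            }
        }
    }
    ```

    For the sample line "Artist,Name",Album,12-SCS this returns three fields: Artist,Name then Album then 12-SCS, with the enclosing quotes removed.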

  • 2020-12-03 13:03

    A regex might get overly complex here. Split the line on commas, then iterate over the resulting pieces and keep concatenating consecutive pieces while the number of double quotes in the concatenated string is odd.

    "hello,this",is,"a ""test"""

    ...split...

    "hello | this" | is | "a ""test"""

    ...iterate and merge until you have an even number of double quotes...

    "hello,this" - even number of quotes (note that the comma removed by the split is re-inserted between the pieces)

    is - even number of quotes

    "a ""test""" - even number of quotes

    ...then strip the leading and trailing quote if present, and replace "" with ".
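
    The split-and-merge steps above could be sketched in C# roughly like this. It is an illustrative reading of the description, not code from the answer; SplitMergeCsv and ParseLine are hypothetical names, and keeping the leftover text as the last field when a quote is never closed is an assumption, since the answer doesn't cover that case.

    ```csharp
    using System.Collections.Generic;
    using System.Linq;

    public static class SplitMergeCsv
    {
        // Hypothetical helper: split on commas, then re-join pieces of quoted fields.
        public static List<string> ParseLine(string line)
        {
            var fields = new List<string>();
            string pending = null; // field being rebuilt from several comma-split pieces

            foreach (var piece in line.Split(','))
            {
                // Re-insert the comma that Split removed when merging pieces.
                pending = pending == null ? piece : pending + "," + piece;

                // An odd number of quotes means a quoted field is still open.
                if (pending.Count(c => c == '"') % 2 != 0)
                    continue;

                fields.Add(Unquote(pending));
                pending = null;
            }

            // Assumption: an unclosed quote keeps the remaining text as the last field.
            if (pending != null)
                fields.Add(pending);

            return fields;
        }

        // Strip the outer quotes if present, then turn each "" into ".
        private static string Unquote(string field)
        {
            if (field.Length >= 2 && field[0] == '"' && field[field.Length - 1] == '"')
                field = field.Substring(1, field.Length - 2);
            return field.Replace("\"\"", "\"");
        }
    }
    ```

    Because only the parity of the quote count matters while merging, doubled "" quotes inside a field never end it early; they are collapsed only after the outer quotes have been removed.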

  • 2020-12-03 13:17

    Just adding the solution I worked on this morning.

    using System.Text.RegularExpressions;

    var regex = new Regex("(?<=^|,)(\"(?:[^\"]|\"\")*\"|[^,]*)");
    
    foreach (Match m in regex.Matches("<-- input line -->"))
    {
        var s = m.Value; // raw field text; quoted fields keep their quotes
    }
    

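    As you can see, you need to call regex.Matches() once per line. It returns a MatchCollection with one item per column. The Value property of each match holds that column's text; note that a quoted field comes back with its surrounding quotes and doubled "" quotes intact, so those still need to be stripped if you want the bare value.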

    This is still a work in progress, but it happily parses CSV strings like:

    2,3.03,"Hello, my name is ""Joshua""",A,B,C,,,D
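
    If you need the bare values rather than the raw field text, the matches could be post-processed as in this sketch. It reuses the pattern from the answer; LookbehindCsv and ParseLine are hypothetical names, and the unquoting step (strip the outer quotes, collapse "" to ") is an addition, not part of the original solution.

    ```csharp
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    public static class LookbehindCsv
    {
        // Pattern from the answer: a field starts at the line start or right after a comma,
        // and is either a quoted string (with "" escapes) or a run of non-commas.
        private static readonly Regex Field =
            new Regex("(?<=^|,)(\"(?:[^\"]|\"\")*\"|[^,]*)");

        // Hypothetical helper: one list entry per column.
        public static List<string> ParseLine(string line)
        {
            var values = new List<string>();
            foreach (Match m in Field.Matches(line))
            {
                string s = m.Value;
                // Quoted fields come back with their quotes; strip them and undouble "".
                if (s.Length >= 2 && s[0] == '"' && s[s.Length - 1] == '"')
                    s = s.Substring(1, s.Length - 2).Replace("\"\"", "\"");
                values.Add(s);
            }
            return values;
        }
    }
    ```

    With the sample line above this yields nine values, including the two empty ones between C and D, and the quoted column becomes Hello, my name is "Joshua".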
    
  • 2020-12-03 13:18

    Regex is not the right tool for this. Use a CSV parser instead, either the built-in one or a third-party one.

    0 讨论(0)
  • 2020-12-03 13:18

    Give CsvHelper a try (a library I maintain). It's available via NuGet.

    You can easily read a CSV file into a custom class collection. It's also very fast.

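    using System.IO;
    using CsvHelper;

    var streamReader = new StreamReader( "file.csv" ); // placeholder: path to your CSV file
    var csvReader = new CsvReader( streamReader );     // newer versions also take a CultureInfo, e.g. CultureInfo.InvariantCulture
    var myObjects = csvReader.GetRecords<MyObject>();  // MyObject: your own class, one property per column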
    