How to remove the extra double quote?

再見小時候 2021-01-26 02:37

In a malformed .csv file, there is a row of data with extra double quotes, e.g. the last line:

Name,Comment
"Peter","Nice singer"
"Paul","Love "folk" so


        
5 Answers
  • 2021-01-26 03:26

    If you're not on Ruby 1.9, or just get tired of regexes sometimes, split each line on commas, strip the first and last quote from each field, replace any remaining double quotes with underscores, re-quote each field, and join the fields with commas again.

    (We don't always have to worry about efficiency!)
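
The steps above could look roughly like the following sketch. The method name `fix_line` and the sample row are made up for illustration, and it assumes that no field contains a comma.

```ruby
# Hypothetical helper: any double quote that is not the first or last
# character of a field is treated as stray and replaced with "_".
def fix_line(line)
  fields = line.chomp.split(',', -1).map do |field|
    inner = field.sub(/\A"/, '').sub(/"\z/, '')  # strip the outer quotes
    '"' + inner.gsub('"', '_') + '"'             # replace strays, re-quote
  end
  fields.join(',')
end

# Made-up sample row, not taken from the asker's file.
puts fix_line('"Ann","Likes "jazz" a lot"')
# prints "Ann","Likes _jazz_ a lot"
```

Note that this version also wraps fields that were unquoted to begin with (such as a header row) in quotes, and a comma inside a field would be split in the wrong place.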

  • 2021-01-26 03:27
    $str = '"folk"';
    
    $new = str_replace('"', '', $str);
    
    /* $new is now just folk: str_replace removed every double quote, so only use this on a single field, not on a whole CSV line */
    
  • 2021-01-26 03:29

    In Ruby 1.9, the following works:

    result = subject.gsub(/(?<!^|,)"(?!,|$)/, '_')
    

    Earlier versions of Ruby don't support lookbehind assertions.

    Explanation:

    (?<!^|,)  # Assert that we're not at the start of the line or right after a comma
    "         # Match a quote
    (?!,|$)   # Assert that we're not at the end of the line or right before a comma
    

    Of course this assumes that we won't run into pathological cases like

    "Mary",""Oh," she said"
    
  • 2021-01-26 03:29

    Unless you have no other choice, get the file regenerated with correct escaping. Any other approach is asking for trouble, because the insertion of unescaped quotes is lossy, and thus cannot be reliably reversed.

    If you can't get the file fixed from the source, then Tim Pietzcker's regex is better than nothing, but I strongly recommend that you have your script print all "fixed" lines and check them for errors manually.
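
One way to do that manual check is to collect only the lines the regex actually changed and print them before and after. This is a rough sketch: the names `REVIEW_PATTERN` and `review_fixes` and the sample data are hypothetical.

```ruby
# The regex from the earlier answer, reused here for the review pass.
REVIEW_PATTERN = /(?<!^|,)"(?!,|$)/

# Returns [line_number, before, after] for every line the pattern changed.
def review_fixes(lines, pattern = REVIEW_PATTERN)
  report = []
  lines.each_with_index do |line, index|
    fixed = line.gsub(pattern, '_')
    report << [index + 1, line, fixed] unless fixed == line
  end
  report
end

# Made-up sample data, not taken from the asker's file.
sample = ['Name,Comment', '"Peter","Nice singer"', '"Ann","Likes "jazz" a lot"']

review_fixes(sample).each do |number, before, after|
  puts "line #{number}:"
  puts "  before: #{before}"
  puts "  after:  #{after}"
end
```

Only the third sample line is reported, so the list to eyeball stays short even for a large file.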

  • 2021-01-26 03:41

    Meta-strategy:

    The data was likely entered by hand, and inconsistently. CSVs get messy when people type field terminators (double quotes) or separators (commas) into the field itself. If you can have the file regenerated, ask for an extremely unlikely field begin/end marker, such as five tildes (~~~~~); you can then split on "~~~~~,~~~~~" and get the correct number of fields every time.
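
If the file were regenerated that way, reading a line might look like this sketch. The method name `parse_marked_line` and the sample row are assumptions, and `delete_prefix` / `delete_suffix` require Ruby 2.5 or later.

```ruby
MARKER = '~~~~~'

# Hypothetical parser for lines whose fields are wrapped in MARKER
# and separated by commas.
def parse_marked_line(line, marker = MARKER)
  body = line.chomp.delete_prefix(marker).delete_suffix(marker)
  body.split("#{marker},#{marker}", -1)  # -1 keeps trailing empty fields
end

# Made-up sample row: quotes and commas inside a field no longer matter.
p parse_marked_line('~~~~~Ann~~~~~,~~~~~Likes "jazz", a lot~~~~~')
# prints ["Ann", "Likes \"jazz\", a lot"]
```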
