How to remove the extra double quote?

后端未结

关注

 5  1939

In a malformed .csv file, there is a row of data with extra double quotes, e.g. the last line:

Name,Comment
\"Peter\",\"Nice singer\"
\"Paul\",\"Love \"folk\" so


                      
              相关标签:


      
      
        
          5条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  孤街浪徒        
                
              
                            
                2021-01-26 03:26
              
            
            
                                                                       
If you're not on Ruby 1.9, or just get tired of regexes sometimes, split the string on ,, strip the first/last quotes, replace remaining "s with _s, re-quote, and join with ,.

(We don't always have to worry about efficiency!)
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  梦毁少年i        
                
              
                            
                2021-01-26 03:27
              
            
            
                                                                       
$str = '"folk"';

$new = str_replace('"', '', $str);

/* now $new is only folk, without " */

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  面向向阳花        
                
              
                            
                2021-01-26 03:29
              
            
            
                                                                       
In Ruby 1.9, the following works:

result = subject.gsub(/(?<!^|,)"(?!,|$)/, '_')


Previous versions don't have lookbehind assertions.

Explanation:

(?<!^|,)  # Assert that we're not at the start of the line or right after a comma
"         # Match a quote
(?!,|$)   # Assert that we're not at the end of the line or right before a comma


Of course this assumes that we won't run into pathological cases like

"Mary",""Oh," she said"

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  野性不改        
                
              
                            
                2021-01-26 03:29
              
            
            
                                                                       
Unless you have no other choice, get the file regenerated with correct escaping. Any other approach is asking for trouble, because the insertion of unescaped quotes is lossy, and thus cannot be reliably reversed.

If you can't get the file fixed from the source, then Tim Pietzcker's regex is better than nothing, but I strongly recommend that you have your script print all "fixed" lines and check them for errors manually. 
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  野性不改        
                
              
                            
                2021-01-26 03:41
              
            
            
                                                                       
Meta-strategy:

It's likely the case that the data was manually entered inconsistently, CSV's get messy when people manually enter either field terminators (double quote) or separators (comma) into the field itself.  If you can have the file regenerated, ask them to use an extremely unlikely field begin/end marker, like 5 tilde's (~~~~~), and then you can split on "~~~~~,~~~~~" and get the correct number of fields every time. 
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复