How can I parse a CSV string with JavaScript, which contains comma in data?

不知归路 2020-11-22 01:52

I have the following type of string

var string = "\'string, duppi, du\', 23, lala"

I want to split the string into an array on each comma, but only on the commas outside the single quotation marks.

  • 2020-11-22 02:02

    My answer presumes your input reflects code/content from web sources, where single and double quote characters are fully interchangeable provided they occur as a non-escaped matching set.

    You cannot use a regex for this; you actually have to write a micro parser to analyze the string you wish to split. For the sake of this answer, I will call the quoted parts of your strings sub-strings. You need to walk across the string character by character. Consider the following case:

    var a = "some sample string with \"double quotes\" and 'single quotes' and some craziness like this: \\\" or \\'",
        b = "sample of code from JavaScript with a regex containing a comma /\,/ that should probably be ignored.";
    

    In this case you have absolutely no idea where a sub-string starts or ends just by looking for a character pattern in the input. Instead, you have to write logic that decides whether a quote character is being used as a quote character, whether it is itself unquoted, and whether it follows an escape.

    I am not going to write code of that complexity for you, but you can look at something I recently wrote that follows the pattern you need. It has nothing to do with commas, but it is otherwise a valid enough micro-parser to follow when writing your own code. Look at the asifix function in the following application:

    https://github.com/austincheney/Pretty-Diff/blob/master/fulljsmin.js
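    For illustration only, here is a minimal sketch of such a micro parser (it is not the asifix code linked above). It assumes, as this answer does, that single and double quotes are interchangeable, and additionally assumes that a backslash escapes the next character, that the enclosing quotes should be dropped, and that whitespace at both ends of each field should be trimmed. `splitOutsideQuotes` is a hypothetical name.

    ```javascript
    // Hypothetical helper: split on commas that are outside quotes.
    // Assumptions: ' and " are interchangeable, a backslash escapes the next
    // character, enclosing quotes are dropped, and each field is trimmed.
    function splitOutsideQuotes(input) {
      const fields = [];
      let current = '';
      let quote = null; // the quote character we are currently inside, or null

      for (let i = 0; i < input.length; i++) {
        const ch = input[i];
        if (ch === '\\' && i + 1 < input.length) {
          current += input[++i]; // escaped character is taken literally
        } else if (quote) {
          if (ch === quote) quote = null; // matching quote closes the sub-string
          else current += ch;             // anything else, commas included, is data
        } else if (ch === '"' || ch === "'") {
          quote = ch;                     // unescaped, unquoted quote opens a sub-string
        } else if (ch === ',') {
          fields.push(current.trim());    // a comma outside quotes ends the field
          current = '';
        } else {
          current += ch;
        }
      }
      if (quote) throw new SyntaxError('Unterminated ' + quote + ' quote');
      fields.push(current.trim());
      return fields;
    }
    ```

    For the question's input, `splitOutsideQuotes("'string, duppi, du', 23, lala")` returns `['string, duppi, du', '23', 'lala']`. Note that the trim also removes spaces just inside the quotes; drop the `.trim()` calls if that matters.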

  • 2020-11-22 02:02

    I have also faced the same type of problem when I had to parse a CSV file.

    The file has an address column whose values contain commas (',').

    After parsing that CSV file to JSON, the keys were mapped to the wrong values.

    I used Node.js to parse the file, with libraries such as Baby Parse and csvtojson.

    Example file:

    address,pincode
    foo,baar , 123456
    

    When I parsed it directly to JSON without Baby Parse, I got:

    [{
     address: 'foo',
     pincode: 'baar',
     'field3': '123456'
    }]
    

    So I wrote code that replaces the comma inside every field with another delimiter:

    const Baby = require('babyparse')

    /*
     csvString = 'address,pincode\n"foo, bar",123456\n'
     output    = 'address,pincode\nfoo| bar,123456'
    */
    const removeComma = function (csvString, delimiter = '|') {
        // Baby Parse honours CSV quoting, so the quoted address stays one field:
        // [ [ 'address', 'pincode' ], [ 'foo, bar', '123456' ] ]
        // (an unquoted foo, bar would still be split into two fields)
        const arrRow = Baby.parse(csvString, { skipEmptyLines: true }).data
        return arrRow
            // replace every comma inside a field with the delimiter,
            // then join each row's fields back together with plain commas
            .map(singleRow => singleRow.map(singleField => singleField.split(',').join(delimiter)).join(','))
            .join('\n')
    }

    The string returned by the function can then be passed to the csvtojson library, and the result can be used as usual.

    const csv = require('csvtojson')

    const csvString = 'address,pincode\n"foo, bar",123456\n'
    const modifiedCsvString = removeComma(csvString)
    // csvtojson v2: fromString() returns a promise-like converter;
    // checkType: true turns "123456" into a number
    csv({ checkType: true })
      .fromString(modifiedCsvString)
      .then(jsonArray => {
        /* do anything with the JSON array */
      })

    Now the output looks like this (the comma inside the address has been replaced by the delimiter):

    [{
      address: 'foo| bar',
      pincode: 123456
    }]
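    Because `removeComma` swaps each in-field comma for the delimiter, the parsed values carry `|` instead of `,`. A small sketch for putting the commas back afterwards, assuming the delimiter never occurs in the original data (`restoreCommas` is a hypothetical helper, not part of either library):

    ```javascript
    // Hypothetical helper: undo removeComma's substitution in parsed records.
    // Assumes the delimiter character never appears in the original data.
    function restoreCommas(records, delimiter = '|') {
      return records.map(record =>
        Object.fromEntries(
          Object.entries(record).map(([key, value]) => [
            key,
            // only strings can contain the delimiter; numbers etc. pass through
            typeof value === 'string' ? value.split(delimiter).join(',') : value,
          ])
        )
      );
    }
    ```

    Calling `restoreCommas(jsonArray)` inside the `.then` callback turns `'foo| bar'` back into `'foo, bar'`. `Object.fromEntries` needs Node.js 12 or later.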
    
  • 2020-11-22 02:03

    I liked FakeRainBrigand's answer, but it has a few problems: it cannot handle whitespace between a quote and a comma, and it does not support two consecutive commas. I tried editing his answer, but my edit was rejected by reviewers who apparently did not understand my code. Here is my version of FakeRainBrigand's code. There is also a fiddle: http://jsfiddle.net/xTezm/46/

    String.prototype.splitCSV = function() {
            var matches = this.match(/(\s*"[^"]+"\s*|\s*[^,]+|,)(?=,|$)/g);
            for (var n = 0; n < matches.length; ++n) {
                matches[n] = matches[n].trim();
                if (matches[n] == ',') matches[n] = '';
            }
            if (this[0] == ',') matches.unshift("");
            return matches;
    }
    
    var string = ',"string, duppi, du" , 23 ,,, "string, duppi, du",dup,"", , lala';
    var parsed = string.splitCSV();
    alert(parsed.join('|'));
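    The version above patches `String.prototype` and keeps the double quotes around quoted fields. A sketch of the same regex as a plain function that also strips those quotes and tolerates empty input (`splitCsvFields` is a hypothetical name; like the original, it does not handle `""` escapes inside quoted fields):

    ```javascript
    // Hypothetical helper: same regex as splitCSV above, without patching
    // String.prototype, and with the enclosing double quotes removed.
    function splitCsvFields(line) {
      // match() returns null when nothing matches (e.g. an empty string)
      const matches = line.match(/(\s*"[^"]+"\s*|\s*[^,]+|,)(?=,|$)/g) || [];
      const fields = matches.map(m => {
        const field = m.trim();
        if (field === ',') return '';                // a lone comma stands for an empty field
        return field.replace(/^"([\s\S]*)"$/, '$1'); // strip enclosing quotes, if any
      });
      if (line[0] === ',') fields.unshift('');       // leading empty field
      return fields;
    }
    ```

    For example, `splitCsvFields('"a, b" , 23 ,,c')` returns `['a, b', '23', '', 'c']`.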
    
  • 2020-11-22 02:04

    A PEG.js grammar that handles the RFC 4180 examples at http://en.wikipedia.org/wiki/Comma-separated_values:

    start
      = [\n\r]* first:line rest:([\n\r]+ data:line { return data; })* [\n\r]* { rest.unshift(first); return rest; }
    
    line
      = first:field rest:("," text:field { return text; })*
        & { return !!first || rest.length; } // ignore blank lines
        { rest.unshift(first); return rest; }
    
    field
      = '"' text:char* '"' { return text.join(''); }
      / text:[^\n\r,]* { return text.join(''); }
    
    char
      = '"' '"' { return '"'; }
      / [^"]
    

    Test at http://jsfiddle.net/knvzk/10 or https://pegjs.org/online.

    Download the generated parser at https://gist.github.com/3362830.
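    If you would rather not depend on a generated parser, the same rules can be followed by hand. Below is a sketch that closely mirrors the grammar's rules (quoted fields with `""` as an escaped quote, unquoted fields up to the next comma or line break, runs of CR/LF separating lines so blank lines are skipped); it is not the code PEG.js generates, and `parseCsvRfc4180` is a hypothetical name.

    ```javascript
    // Hypothetical hand-written parser following the PEG.js grammar's rules.
    function parseCsvRfc4180(input) {
      const rows = [];
      let i = 0;

      // field = '"' ('""' | [^"])* '"'  /  [^\n\r,]*
      function parseField() {
        if (input[i] !== '"') {
          const start = i;
          while (i < input.length && !'\n\r,'.includes(input[i])) i++;
          return input.slice(start, i);
        }
        let text = '';
        i++; // skip the opening quote
        while (i < input.length) {
          if (input[i] !== '"') {
            text += input[i++];          // any non-quote character, newlines included
          } else if (input[i + 1] === '"') {
            text += '"';                 // doubled quote is an escaped quote
            i += 2;
          } else {
            i++;                         // closing quote
            return text;
          }
        }
        throw new SyntaxError('Unterminated quoted field');
      }

      while (i < input.length) {
        // [\n\r]+ separates lines, so blank lines are skipped here
        while (input[i] === '\n' || input[i] === '\r') i++;
        if (i >= input.length) break;
        // line = field ("," field)*
        const row = [parseField()];
        while (input[i] === ',') {
          i++;
          row.push(parseField());
        }
        if (i < input.length && input[i] !== '\n' && input[i] !== '\r') {
          throw new SyntaxError('Unexpected character at index ' + i);
        }
        rows.push(row);
      }
      return rows;
    }
    ```

    For example, `parseCsvRfc4180('a,b\r\n"x ""y""","1,2"\n\n,z\n')` returns `[['a', 'b'], ['x "y"', '1,2'], ['', 'z']]`.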

  • 2020-11-22 02:04

    If your quote delimiter can be double quotes, then this is a duplicate of "Example JavaScript code to parse CSV data".

    You can either translate all single-quotes to double-quotes first:

    string = string.replace( /'/g, '"' );
    

    ...or you can edit the regex in that question to recognize single-quotes instead of double-quotes:

    // Quoted fields.
    "(?:'([^']*(?:''[^']*)*)'|" +
    

    However, this assumes certain markup that is not clear from your question. Please clarify what all the various possibilities of markup can be, per my comment on your question.
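    As a rough illustration of the single-quote rule on its own, here is a sketch that splits one line, assuming `''` is the escape for a literal single quote inside a quoted field (as in the edited regex fragment) and that whitespace around fields should be dropped. `splitSingleQuoted` is a hypothetical name, and its regex is written for this sketch; it is not the regex from the linked question.

    ```javascript
    // Hypothetical helper: split one line on commas, treating '...' as a quoted
    // field in which '' stands for a literal single quote.
    function splitSingleQuoted(line) {
      // group 1: quoted content, group 2: unquoted content,
      // group 3: the separator ("," or "" at the end of the line)
      const re = /\s*(?:'([^']*(?:''[^']*)*)'|([^,]*?))\s*(,|$)/g;
      const fields = [];
      let m;
      while ((m = re.exec(line)) !== null) {
        fields.push(m[1] !== undefined ? m[1].replace(/''/g, "'") : m[2]);
        if (m[3] === '') break; // matched the end of the line, so stop
      }
      return fields;
    }
    ```

    With the question's string, `splitSingleQuoted("'string, duppi, du', 23, lala")` returns `['string, duppi, du', '23', 'lala']`.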

  • 2020-11-22 02:04

    Regular expressions to the rescue! These few lines of code handle properly quoted fields with embedded commas, quotes, and newlines based on the RFC 4180 standard.

    function parseCsv(data, fieldSep, newLine) {
        fieldSep = fieldSep || ',';
        newLine = newLine || '\n';
        var nSep = '\x1D';
        var qSep = '\x1E';
        var cSep = '\x1F';
        var nSepRe = new RegExp(nSep, 'g');
        var qSepRe = new RegExp(qSep, 'g');
        var cSepRe = new RegExp(cSep, 'g');
        var fieldRe = new RegExp('(?<=(^|[' + fieldSep + '\\n]))"(|[\\s\\S]+?(?<![^"]"))"(?=($|[' + fieldSep + '\\n]))', 'g');
        var grid = [];
        data.replace(/\r/g, '').replace(/\n+$/, '').replace(fieldRe, function(match, p1, p2) {
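            // protect newlines, escaped quotes and field separators inside the quoted field
            return p2.replace(/\n/g, nSep).replace(/""/g, qSep).split(fieldSep).join(cSep);
        }).split(/\n/).forEach(function(line) {
            var row = line.split(fieldSep).map(function(cell) {
                // restore the protected characters (cSep stands for the field separator)
                return cell.replace(nSepRe, newLine).replace(qSepRe, '"').split(cSep).join(fieldSep);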
            });
            grid.push(row);
        });
        return grid;
    }
    
    const csv = 'A1,B1,C1\n"A ""2""","B, 2","C\n2"';
    const separator = ',';      // field separator, default: ','
    const newline = ' <br /> '; // newline representation in case a field contains newlines, default: '\n' 
    var grid = parseCsv(csv, separator, newline);
    // expected: [ [ 'A1', 'B1', 'C1' ], [ 'A "2"', 'B, 2', 'C <br /> 2' ] ]
    

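    Contrary to what is stated elsewhere, you don't need a finite state machine: the regular expression handles RFC 4180 quoting properly thanks to positive lookbehind, negative lookbehind, and positive lookahead. Note that lookbehind assertions require an ES2018-compliant JavaScript engine.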

    Clone/download code at https://github.com/peterthoeny/parse-csv-js
