What is the best way to convert from CSV to JSON when commas and quotations may be in the fields?

野趣味 2021-01-16 15:14

I want to be able to convert a CSV to JSON. The CSV comes in as free text like this (with the newlines):

name,age,booktitle
John,2,Hello World
Mary,3,""Ala

1 Answer
  •  说谎
    2021-01-16 15:38

    My first guess is to use a regular expression. You can try this one I've just whipped up (regex101 link):

    /(?:[\t ]?)+("+)?(.*?)\1(?:[\t ]?)+(?:,|$)/gm
    

    This can be used to extract fields, so you can grab the headers with it as well. The first capture group is an optional quote-grabber used with a backreference (\1), so the actual field value is in capture group 2 of each match returned by line.matchAll(regex). A filter is used to cut off the last match in all cases, since allowing for blank fields (f1,,f3) puts a zero-width match at the end of every line. This was easier to get rid of in JavaScript than in the regex. Note that the input is split on newlines first, so quoted fields that contain line breaks are not supported.
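
    To see the field extraction on its own, here is a small sketch that applies the regex above to single lines. The helper name splitFields is made up for this illustration; it takes capture group 2 of each match and drops the final zero-width match, as described above.

```javascript
// The field regex from the answer above: group 1 optionally captures a run of
// quote characters, group 2 captures the field contents, and \1 requires the
// same run of quotes to close the field.
const fieldRegex = /(?:[\t ]?)+("+)?(.*?)\1(?:[\t ]?)+(?:,|$)/gm;

// Hypothetical helper: return the field values of a single CSV line.
function splitFields(line) {
  const matches = [...line.matchAll(fieldRegex)];
  // Because blank fields (f1,,f3) are allowed, the regex also makes one
  // zero-width match at the very end of the line; drop it.
  return matches.slice(0, -1).map((m) => m[2]);
}

console.log(splitFields('John,,Hello World')); // ['John', '', 'Hello World']
console.log(splitFields('Mary,3,""Alas, What Can I do?""')); // ['Mary', '3', 'Alas, What Can I do?']
```

    A side effect of always dropping the last match is that a trailing empty field is lost as well: splitFields('a,b,') gives ['a', 'b'].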

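    // Escape a string so it can be used literally inside a RegExp
    const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

    const csvToJson = (str, headerList, quotechar = '"', delimiter = ',') => {
      // Drop the zero-width match the regex always produces at the end of a line
      const cutlast = (_, i, a) => i < a.length - 1;
      // const regex = /(?:[\t ]?)+("+)?(.*?)\1(?:[\t ]?)+(?:,|$)/gm; // no variable chars
      const q = escapeRegExp(quotechar);
      const d = escapeRegExp(delimiter);
      const regex = new RegExp(`(?:[\\t ]?)+(${q}+)?(.*?)\\1(?:[\\t ]?)+(?:${d}|$)`, 'gm');
      // Field values are in capture group 2 of each match (also used for the headers)
      const fields = (line) => [...line.matchAll(regex)].filter(cutlast).map((m) => m[2]);

      // Accept \n or \r\n line endings and skip blank lines
      const lines = str.split(/\r?\n/).filter((line) => line.trim() !== '');
      const headers = headerList || fields(lines.shift() || '');

      const list = [];

      for (const line of lines) {
        const val = {};
        for (const [i, field] of fields(line).entries()) {
          // Convert to Number if possible (keeping 0), use null if blank
          const num = Number(field);
          val[headers[i]] = field.length === 0 ? null : Number.isNaN(num) ? field : num;
        }
        list.push(val);
      }

      return list;
    };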
    
    const testString = `name,age,booktitle
    John,,Hello World
    Mary,3,""Alas, What Can I do?""
    Joseph,5,"Waiting, waiting, waiting"
    "Donaldson Jones"   , six,    "Hello, friend!"`;
    
    console.log(JSON.stringify(csvToJson(testString), null, 2));
    console.log(JSON.stringify(csvToJson(testString, ['foo', 'bar', 'baz']), null, 2));

    As a bonus, I've written this so that you can pass a list of strings to use as the headers instead, since I know first-hand that not all CSV files have a header row. When a header list is passed, the first line is treated as data.


    PS: If you don't like my regex, you can check out this much more complex one, which adheres to the CSV standard (RFC 4180) instead of just grabbing everything.
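
    If you would rather not rely on a regex at all, a different technique is a small character-by-character state machine that follows the RFC 4180 quoting rules: a field wrapped in quotes may contain delimiters and line breaks, and a doubled quote inside it stands for one literal quote. This is a sketch, not the regex mentioned above; parseCsv and csvRowsToObjects are hypothetical names, it assumes one-character delimiter and quote characters, and it leaves every value as a string. Note that the question's ""Alas, ...""-style wrapping is not valid RFC 4180, so this parser splits that field at the comma.

```javascript
// A minimal sketch of an RFC 4180-style CSV parser written as a
// character-by-character state machine instead of a regex.
// Assumes a one-character delimiter and quote character.
function parseCsv(text, delimiter = ',', quote = '"') {
  const rows = [];
  let row = [];
  let field = '';
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch !== quote) {
        field += ch; // delimiters and line breaks are literal inside quotes
      } else if (text[i + 1] === quote) {
        field += quote; // a doubled quote is an escaped literal quote
        i++;
      } else {
        inQuotes = false; // closing quote
      }
    } else if (ch === quote) {
      inQuotes = true; // opening quote (quotes mid-field are not rejected here)
    } else if (ch === delimiter) {
      row.push(field);
      field = '';
    } else if (ch === '\n' || ch === '\r') {
      if (ch === '\r' && text[i + 1] === '\n') i++; // treat \r\n as one line break
      row.push(field);
      rows.push(row);
      row = [];
      field = '';
    } else {
      field += ch;
    }
  }
  // Flush the last row unless the text ended with a line break
  if (field !== '' || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

// Turn parsed rows into objects keyed by the first (header) row;
// missing cells become null.
function csvRowsToObjects(rows) {
  if (rows.length === 0) return [];
  const [headers, ...data] = rows;
  return data.map((r) => Object.fromEntries(headers.map((h, i) => [h, r[i] ?? null])));
}

const sample =
  'name,age,booktitle\r\n' +
  'Mary,3,"Alas, What Can I do?"\r\n' +
  'Joseph,5,"He said ""wait""\nand waited"\r\n';

console.log(JSON.stringify(csvRowsToObjects(parseCsv(sample)), null, 2));
```

    Scanning one character at a time makes quoted line breaks and escaped quotes straightforward, at the cost of converting numbers yourself afterwards if you need them.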
