How to find out if CSV file fields are tab delimited or comma delimited

[愿得一人] 2020-12-01 09:46

How can I find out whether the fields in a CSV file are tab delimited or comma delimited? I need PHP validation for this. Can anyone please help? Thanks in advance.

15 answers
  • 2020-12-01 10:11

    In my situation, users supply CSV files which are then loaded into an SQL database. They may save an Excel spreadsheet as either a comma-delimited or a tab-delimited file, so the program converting the spreadsheet to SQL needs to identify automatically whether the fields are tab separated or comma separated.

    Many Excel CSV exports have the field headings as the first line, and heading text is unlikely to contain commas except as delimiters. For my situation, I count the commas and tabs in the first line and take whichever occurs more often as the delimiter.
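
    The first-line count described above could be sketched roughly as follows. The function name `guessDelimiterFromHeader` and the fall-back to a comma on a tie are assumptions made for illustration, not part of the original program.

```php
<?php
// Hypothetical helper: guess the delimiter from the header line only.
function guessDelimiterFromHeader(string $path): ?string
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return null; // file could not be opened
    }

    $header = fgets($handle); // first line, expected to hold the field headings
    fclose($handle);

    if ($header === false) {
        return null; // empty file
    }

    $commas = substr_count($header, ',');
    $tabs   = substr_count($header, "\t");

    // Assumption: on a tie (including zero of each) fall back to a comma.
    return $tabs > $commas ? "\t" : ',';
}
```

    This only looks at the heading line, so a heading that itself contains commas inside quotes could still mislead it.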

  • 2020-12-01 10:11

    You can simply use PHP's native fgetcsv() function like this:

    function getCsvDelimeter($file)
    {
        $delimiter = null;

        // Candidate delimiters to test; add any others you need to check
        $delimiters = array(',', "\t", ';', '|', ':');

        if (($handle = fopen($file, "r")) !== FALSE) {
            foreach ($delimiters as $item) {
                // Go back to the start so every candidate is tested on the first line
                rewind($handle);
                $fields = fgetcsv($handle, 0, $item, '"', '\\');

                // fgetcsv() returns a one-element array when the delimiter is not found
                if (is_array($fields) && count($fields) > 1) {
                    $delimiter = $item;

                    break;
                }
            }

            fclose($handle);
        }

        return $delimiter;
    }
    
  • 2020-12-01 10:16

    If the file is very large (gigabytes, for example), take the first few lines with head, write them to a temporary file, and open that file in vi to see which separator is used (`:set list` displays tabs as ^I):

    head test.txt > te1
    vi te1
    
  • 2020-12-01 10:17

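    How about something simple? Note that this returns the delimiter configured on the SplFileObject (a comma by default), so on its own it will not tell a tab-delimited file from a comma-delimited one.

    function findDelimiter($filePath, $limitLines = 5){
        // $limitLines is accepted but never used
        $file = new SplFileObject($filePath);
        // getCsvControl() returns the configured delimiter, enclosure and escape
        // characters; it does not inspect the file contents
        $delims = $file->getCsvControl();
        return $delims[0];
    }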
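
    A small check of what `getCsvControl()` reports, assuming a throwaway temporary file with tab-separated content. The helper name `configuredDelimiter` is made up for this sketch.

```php
<?php
// Hypothetical helper: return the delimiter configured on the SplFileObject.
function configuredDelimiter(string $path): string
{
    $file = new SplFileObject($path);
    $control = $file->getCsvControl(); // delimiter first, then enclosure
    return $control[0];
}

// Write a tab-separated sample to a temporary file.
$path = tempnam(sys_get_temp_dir(), 'tsv');
file_put_contents($path, "a\tb\tc\n1\t2\t3\n");

// Compares against the default comma, although the data is tab-separated.
var_dump(configuredDelimiter($path) === ',');

// After setCsvControl() the same call reports the tab instead.
$file = new SplFileObject($path);
$file->setCsvControl("\t", '"', '\\');
var_dump($file->getCsvControl()[0] === "\t");

$file = null; // release the file handle before deleting
unlink($path);
```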
    
  • 2020-12-01 10:18

    Aside from the trivial answer that CSV files are always comma-separated (it's in the name), I don't think you can come up with any hard rules. Both TSV and CSV are specified loosely enough that you can construct files that would be acceptable as either.

    A\tB,C
    1,2\t3
    

    (Assuming \t == TAB)

    How would you decide whether this is TSV or CSV?
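
    To make the ambiguity concrete, here is a small sketch that parses the two lines above with each candidate delimiter using `str_getcsv()`. The helper name `columnCounts` is hypothetical. Both delimiters give two columns on every line, so neither reading can be ruled out.

```php
<?php
// Hypothetical helper: number of columns per line for a given delimiter.
function columnCounts(array $lines, string $delimiter): array
{
    $counts = [];
    foreach ($lines as $line) {
        // Enclosure and escape are passed explicitly (the usual CSV defaults).
        $counts[] = count(str_getcsv($line, $delimiter, '"', '\\'));
    }
    return $counts;
}

$lines = ["A\tB,C", "1,2\t3"];

print_r(columnCounts($lines, ','));   // column counts when read as CSV
print_r(columnCounts($lines, "\t"));  // column counts when read as TSV
```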

  • 2020-12-01 10:19

    I used @Jay Bhatt's solution for finding out a csv file's delimiter, but it didn't work for me, so I applied a few fixes and comments for the process to be more understandable.

    See my version of @Jay Bhatt's function:

    function decide_csv_delimiter($file, $checkLines = 10) {

        // use PHP's built-in file class to read the csv or txt file
        $file = new SplFileObject($file);

        // array of predefined delimiters. Add any more delimiters if you wish
        // ("\t" is double-quoted so that it is a real tab character)
        $delimiters = array(',', "\t", ';', '|', ':');

        // total occurrences of each delimiter, keyed by the delimiter itself
        $number_of_delimiter_occurences = array_fill_keys($delimiters, 0);

        $i = 0; // using 'i' for counting the number of rows actually read
        while ($file->valid() && $i < $checkLines) {

            $line = $file->fgets();

            foreach ($delimiters as $delimiter) {

                // add this line's count to the running total, so that every
                // checked line contributes and not just the last one read
                $number_of_delimiter_occurences[$delimiter] += substr_count($line, $delimiter);

            }

            $i++;
        }

        // get the keys with the largest value (comparing only the array values);
        // in our case, the array keys are the delimiters
        $results = array_keys($number_of_delimiter_occurences, max($number_of_delimiter_occurences));

        // on a tie (including when no delimiter was found at all),
        // the first candidate in the list, ',', is returned
        return $results[0];
    }
    

    I personally use this function for helping automatically parse a file with PHPExcel, and it works beautifully and fast.

    I recommend parsing at least 10 lines, for the results to be more accurate. I personally use it with 100 lines, and it is working fast, no delays or lags. The more lines you parse, the more accurate the result gets.

    NOTE: This is just a modified version of @Jay Bhatt's solution to the question. All credit goes to @Jay Bhatt.
