How to find out if CSV file fields are tab delimited or comma delimited

[愿得一人] 2020-12-01 09:46

How can I find out whether the fields in a CSV file are tab delimited or comma delimited? I need PHP validation for this. Can anyone please help? Thanks in advance.

15 answers
  • 2020-12-01 10:20

    There is no 100% reliable way to determine this. What you can do is:

    • If you have a method to validate the fields you read, try reading a few fields with each separator and validate them against your method. If validation fails, try the other separator.
    • Count the occurrences of tabs and commas in the file. Usually one count is significantly higher than the other.
    • Last but not least: ask the user, and allow them to override your guess.
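
    The counting idea above could be sketched roughly as follows. The function name guessTabOrComma and the 10-line sample size are assumptions made for illustration, not part of the original answer, and the tie-breaking rule is arbitrary.

```php
<?php
// Hypothetical helper: guess between tab and comma by counting occurrences
// in the first few lines of the file contents.
function guessTabOrComma(string $contents, int $maxLines = 10): ?string
{
    // Look only at the first $maxLines lines to keep this cheap on large files.
    $lines = array_slice(preg_split('/\R/', $contents), 0, $maxLines);
    $sample = implode("\n", $lines);

    $tabs = substr_count($sample, "\t");
    $commas = substr_count($sample, ',');

    if ($tabs === 0 && $commas === 0) {
        return null; // Neither candidate appears; let the caller ask the user.
    }
    // On a tie this falls back to comma, which is an arbitrary choice.
    return $tabs > $commas ? "\t" : ',';
}
```

    A caller would typically pass in file_get_contents() output and treat a null result as the cue to ask the user.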
  • 2020-12-01 10:20

    You can also use fgetcsv (http://php.net/manual/en/function.fgetcsv.php), passing it a delimiter parameter. Note that fgetcsv returns false only on failure, such as end of file; with the wrong delimiter it returns the whole line as a single field, so check how many fields come back.

    Sample to check whether the delimiter is ';':

    if (($data = fgetcsv($your_csv_handler, 1000, ';')) !== false && count($data) > 1) { $csv_delimiter = ';'; }
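
    To try several candidates instead of just ';', something like the sketch below might work. It assumes an already opened, seekable stream; the function name detectDelimiterWithFgetcsv and the candidate list are made up for illustration. It only looks at the first row, so it can be fooled by a header that happens to contain another candidate character.

```php
<?php
// Hypothetical helper: read the first row with each candidate delimiter
// and keep the one that yields the most fields.
function detectDelimiterWithFgetcsv($handle, array $candidates = [',', "\t", ';']): ?string
{
    $best = null;
    $bestCount = 1; // A single field means the delimiter did not split the row.
    foreach ($candidates as $delimiter) {
        rewind($handle); // Start from the first row for every candidate.
        $row = fgetcsv($handle, 0, $delimiter, '"', '\\');
        if ($row !== false && count($row) > $bestCount) {
            $best = $delimiter;
            $bestCount = count($row);
        }
    }
    rewind($handle); // Leave the stream positioned at the start for the caller.
    return $best; // null when no candidate split the first row.
}
```

    On a tie, the earlier candidate in the list wins, so the order of $candidates acts as a preference.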
    
  • 2020-12-01 10:21

    It's late to answer this question, but I hope it will help someone.

    Here's a simple function that returns the delimiter of a file.

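    function getFileDelimiter($file, $checkLines = 2){
        $file = new SplFileObject($file);
        // Double quotes so that "\t" is a real tab and can be passed on to fgetcsv()
        $delimiters = array(',', "\t", ';', '|', ':');
        $results = array();
        $i = 0;
        while($file->valid() && $i < $checkLines){
            $line = $file->fgets();
            foreach ($delimiters as $delimiter){
                // Quote the delimiter so characters like '|' are matched literally
                $regExp = '/'.preg_quote($delimiter, '/').'/';
                $fields = preg_split($regExp, $line);
                if(count($fields) > 1){
                    // Count the lines on which this delimiter appears
                    $results[$delimiter] = isset($results[$delimiter]) ? $results[$delimiter] + 1 : 1;
                }
            }
            $i++;
        }
        if(empty($results)){
            return null; // No candidate delimiter found in the checked lines
        }
        $results = array_keys($results, max($results));
        return $results[0];
    }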
    

    Use this function as shown below:

    $delimiter = getFileDelimiter('abc.csv'); //Check 2 lines to determine the delimiter
    $delimiter = getFileDelimiter('abc.csv', 5); //Check 5 lines to determine the delimiter
    

    P.S. I have used preg_split() instead of explode() because explode('\t', $value) won't give proper results: inside single quotes, '\t' is a literal backslash followed by t, not a tab character.

    UPDATE: Thanks to @RichardEB for pointing out a bug in the code. I have updated it now.

  • 2020-12-01 10:22

    The easiest way I've found to answer this is to open the file in a plain text editor, or in TextMate.

  • 2020-12-01 10:24

    This is my solution. It works if you know how many columns to expect. At the end, the detected separator is stored in $actual_separation_character.

    $separator_1=",";
    $separator_2=";";
    $separator_3="\t";
    $separator_4=":";
    $separator_5="|";
    
    $separator_1_number=0;
    $separator_2_number=0;
    $separator_3_number=0;
    $separator_4_number=0;
    $separator_5_number=0;
    
    /* YOU NEED TO CHANGE THIS VARIABLE */
    // Expected number of separator characters per row (3 columns ==> 2 separator characters / row)
    $expected_separation_character_number=2;  
    
    
    $file = fopen("upload/filename.csv","r");
    while(! feof($file)) //read file rows
    {
        $row= fgets($file);
    
        $row_1_replace=str_replace($separator_1,"",$row);
        $row_1_length=strlen($row)-strlen($row_1_replace);
    
        if(($row_1_length==$expected_separation_character_number)or($expected_separation_character_number==0)){
        $separator_1_number=$separator_1_number+$row_1_length;
        }
    
        $row_2_replace=str_replace($separator_2,"",$row);
        $row_2_length=strlen($row)-strlen($row_2_replace);
    
        if(($row_2_length==$expected_separation_character_number)or($expected_separation_character_number==0)){
        $separator_2_number=$separator_2_number+$row_2_length;
        }
    
        $row_3_replace=str_replace($separator_3,"",$row);
        $row_3_length=strlen($row)-strlen($row_3_replace);
    
        if(($row_3_length==$expected_separation_character_number)or($expected_separation_character_number==0)){
        $separator_3_number=$separator_3_number+$row_3_length;
        }
    
        $row_4_replace=str_replace($separator_4,"",$row);
        $row_4_length=strlen($row)-strlen($row_4_replace);
    
        if(($row_4_length==$expected_separation_character_number)or($expected_separation_character_number==0)){
        $separator_4_number=$separator_4_number+$row_4_length;
        }
    
        $row_5_replace=str_replace($separator_5,"",$row);
        $row_5_length=strlen($row)-strlen($row_5_replace);
    
        if(($row_5_length==$expected_separation_character_number)or($expected_separation_character_number==0)){
        $separator_5_number=$separator_5_number+$row_5_length;
        }
    
    } // while(! feof($file))  END
    fclose($file);
    
    /* THE FILE ACTUAL SEPARATOR (delimiter) CHARACTER */
    /* $actual_separation_character */
    
    if ($separator_1_number==max($separator_1_number,$separator_2_number,$separator_3_number,$separator_4_number,$separator_5_number)){$actual_separation_character=$separator_1;}
    else if ($separator_2_number==max($separator_1_number,$separator_2_number,$separator_3_number,$separator_4_number,$separator_5_number)){$actual_separation_character=$separator_2;}
    else if ($separator_3_number==max($separator_1_number,$separator_2_number,$separator_3_number,$separator_4_number,$separator_5_number)){$actual_separation_character=$separator_3;}
    else if ($separator_4_number==max($separator_1_number,$separator_2_number,$separator_3_number,$separator_4_number,$separator_5_number)){$actual_separation_character=$separator_4;}
    else if ($separator_5_number==max($separator_1_number,$separator_2_number,$separator_3_number,$separator_4_number,$separator_5_number)){$actual_separation_character=$separator_5;}
    else {$actual_separation_character=";";}
    
    /* 
    if no separator produced the expected number of columns, do something ...
    */
    
    if ($expected_separation_character_number>0){
    if ($separator_1_number==0 and $separator_2_number==0 and $separator_3_number==0 and $separator_4_number==0 and $separator_5_number==0){/* do something ! no row had the expected number of columns ! */}
    }
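
    The same idea can probably be written more compactly with a loop over the candidates. This is only a sketch under some assumptions: the function name detectByExpectedCount is hypothetical, it works on a string rather than a file handle, it expects $expectedSeparators to be at least 1, and it scores each separator by the number of matching rows instead of by the number of separator characters.

```php
<?php
// Hypothetical helper: score each candidate by how many rows contain
// exactly the expected number of that separator.
function detectByExpectedCount(string $contents, int $expectedSeparators,
    array $candidates = [',', ';', "\t", ':', '|']): ?string
{
    $scores = array_fill_keys($candidates, 0);
    foreach (preg_split('/\R/', $contents) as $row) {
        if ($row === '') {
            continue; // Skip blank lines, including the one after a trailing newline.
        }
        foreach ($candidates as $separator) {
            // A row counts for a separator only if it holds exactly the expected number.
            if (substr_count($row, $separator) === $expectedSeparators) {
                $scores[$separator]++;
            }
        }
    }
    arsort($scores); // Highest score first.
    $best = array_key_first($scores);
    return $scores[$best] > 0 ? $best : null; // null: no row matched for any candidate.
}
```

    A null result corresponds to the "do something" branch above, where no row had the expected number of columns.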
    
  • 2020-12-01 10:27

    When I output a TSV file, I write the tabs using \t, the same way one would write a line break as \n. Given that, a method could be as follows:

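    <?php
    // Placeholder path: use file_get_contents() or however you wish to get the source
    $mysource = file_get_contents('yourfile.csv');
     // strpos() returns 0 when the tab is the first character, so compare against false
     if(strpos($mysource, "\t") !== false){
       //We have a tab separator
     }else{
       // it might be CSV
     }
    ?>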
    

    I guess this may not be the right approach, because you could have tabs and commas in the actual content as well. It's just an idea. Using regular expressions may be better, although I am not too clued up on those.
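
    One possible way around tabs and commas inside the content, offered only as a sketch: parse a few lines with str_getcsv, which honours quoted fields, and prefer the delimiter that gives every line the same number of fields. The function name detectConsistentDelimiter and the 20-line sample are assumptions for illustration, and splitting on line breaks means quoted fields that span several lines are not handled.

```php
<?php
// Hypothetical helper: prefer the delimiter that gives every sampled line
// the same number of fields (more than one), parsing quotes with str_getcsv.
function detectConsistentDelimiter(string $contents, array $candidates = [',', "\t"], int $maxLines = 20): ?string
{
    // Drop empty lines, then keep only the first $maxLines lines.
    $lines = array_filter(preg_split('/\R/', $contents), 'strlen');
    $lines = array_slice($lines, 0, $maxLines);
    foreach ($candidates as $delimiter) {
        $counts = [];
        foreach ($lines as $line) {
            $counts[] = count(str_getcsv($line, $delimiter, '"', '\\'));
        }
        // Consistent means: at least one line, all counts equal, and more than one field.
        if ($counts && count(array_unique($counts)) === 1 && $counts[0] > 1) {
            return $delimiter;
        }
    }
    return null; // No candidate split the sample consistently.
}
```

    Candidates are tried in order, so the first consistent one wins; a null result is again a good moment to ask the user.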
