Import CSV file with mixed data types

眼角桃花 2020-11-27 04:13

I've been working with MATLAB for a few days and I'm having difficulty importing a CSV file into a matrix.

My problem is that my CSV-file contains almost only Strings and

9 answers
  • 2020-11-27 04:51

    I recommend looking at the dataset array.

    The dataset array is a data type that ships with Statistics Toolbox. It is specifically designed to store heterogeneous data in a single container.

    The Statistics Toolbox demo page contains a couple of videos that show some of the dataset array features. The first is titled "An Introduction to Dataset Arrays". The second is titled "An Introduction to Joins".

    http://www.mathworks.com/products/statistics/demos.html

  • 2020-11-27 05:05

    For the case when you know how many columns of data there will be in your CSV file, one simple call to textscan like Amro suggests will be your best solution.

    However, if you don't know a priori how many columns are in your file, you can use a more general approach like I did in the following function. I first used the function fgetl to read each line of the file into a cell array. Then I used the function textscan to parse each line into separate strings using a predefined field delimiter and treating the integer fields as strings for now (they can be converted to numeric values later). Here is the resulting code, placed in a function read_mixed_csv:

    function lineArray = read_mixed_csv(fileName, delimiter)
    
      fid = fopen(fileName, 'r');         % Open the file
      lineArray = cell(100, 1);           % Preallocate a cell array (ideally slightly
                                          %   larger than is needed)
      lineIndex = 1;                      % Index of cell to place the next line in
      nextLine = fgetl(fid);              % Read the first line from the file
      while ~isequal(nextLine, -1)        % Loop while not at the end of the file
        lineArray{lineIndex} = nextLine;  % Add the line to the cell array
        lineIndex = lineIndex+1;          % Increment the line index
        nextLine = fgetl(fid);            % Read the next line from the file
      end
      fclose(fid);                        % Close the file
    
      lineArray = lineArray(1:lineIndex-1);              % Remove empty cells, if needed
      for iLine = 1:lineIndex-1                          % Loop over lines
        lineData = textscan(lineArray{iLine}, '%s', ...  % Read strings
                            'Delimiter', delimiter);
        lineData = lineData{1};                          % Remove cell encapsulation
        if strcmp(lineArray{iLine}(end), delimiter)      % Account for when the line
          lineData{end+1} = '';                          %   ends with a delimiter
        end
        lineArray(iLine, 1:numel(lineData)) = lineData;  % Overwrite line data
      end
    
    end
    

    Running this function on the sample file content from the question gives this result:

    >> data = read_mixed_csv('myfile.csv', ';')
    
    data = 
    
      Columns 1 through 7
    
        '04'    'abc'    'def'    'ghj'    'klm'    ''            ''        
        ''      ''       ''       ''       ''       'Test'        'text'    
        ''      ''       ''       ''       ''       'asdfhsdf'    'dsafdsag'
    
      Columns 8 through 10
    
        ''          ''    ''
        '0xFF'      ''    ''
        '0x0F0F'    ''    ''
    

    The result is a 3-by-10 cell array with one field per cell, where missing fields are represented by the empty string ''. You can now access each cell or a combination of cells and format them as you like. For example, to change the fields in the first column from strings to numeric values, you could use the function str2double as follows:

    >> data(:, 1) = cellfun(@(s) {str2double(s)}, data(:, 1))
    
    data = 
    
      Columns 1 through 7
    
        [  4]    'abc'    'def'    'ghj'    'klm'    ''            ''        
        [NaN]    ''       ''       ''       ''       'Test'        'text'    
        [NaN]    ''       ''       ''       ''       'asdfhsdf'    'dsafdsag'
    
      Columns 8 through 10
    
        ''          ''    ''
        '0xFF'      ''    ''
        '0x0F0F'    ''    ''
    

    Note that the empty fields result in NaN values.
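
    If you would rather keep the empty fields as '' instead of NaN, one option is to convert only the cells that actually parse as numbers. The following is a minimal sketch on a small made-up cell array; the sample values and the variable names nums and isNum are illustrative assumptions, not part of the function above:

    ```matlab
    % Hypothetical sample: the first column holds numeric text or empty fields
    data = {'04', 'abc'; '', 'Test'; '', 'asdfhsdf'};

    nums  = str2double(data(:, 1));          % NaN wherever the text is not a number
    isNum = ~isnan(nums);                    % Rows whose first field parsed as a number
    data(isNum, 1) = num2cell(nums(isNum));  % Convert those rows, leave the rest as text
    ```

    With this sample, data{1, 1} becomes the double 4, while data{2, 1} and data{3, 1} remain empty strings.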

  • 2020-11-27 05:08

    Given the sample you posted, this simple code should do the job:

    fid = fopen('file.csv','r');
    C = textscan(fid, repmat('%s',1,10), 'delimiter',';', 'CollectOutput',true);
    C = C{1};
    fclose(fid);
    

    Then you can convert the columns according to their type. For example, if the first column is all integers, we can convert it like this:

    C(:,1) = num2cell( str2double(C(:,1)) )
    

    Similarly, if you wish to convert the 8th column from hexadecimal to decimal, you can use hex2dec:

    C(:,8) = cellfun(@hex2dec, strrep(C(:,8),'0x',''), 'UniformOutput',false);
    

    The resulting cell array looks as follows:

    C = 
        [  4]    'abc'    'def'    'ghj'    'klm'    ''            ''                []    ''    ''
        [NaN]    ''       ''       ''       ''       'Test'        'text'        [ 255]    ''    ''
        [NaN]    ''       ''       ''       ''       'asdfhsdf'    'dsafdsag'    [3855]    ''    ''
    
  • 2020-11-27 05:11

    In R2013b or later you can use a table:

    >> T = readtable('myfile.txt','Delimiter',';','ReadVariableNames',false)
    
    T = 
    
        Var1    Var2     Var3     Var4     Var5        Var6          Var7         Var8      Var9    Var10
        ____    _____    _____    _____    _____    __________    __________    ________    ____    _____
    
          4     'abc'    'def'    'ghj'    'klm'    ''            ''            ''          NaN     NaN  
        NaN     ''       ''       ''       ''       'Test'        'text'        '0xFF'      NaN     NaN  
        NaN     ''       ''       ''       ''       'asdfhsdf'    'dsafdsag'    '0x0F0F'    NaN     NaN  
    

    Here is more info.
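
    As a rough illustration of working with the result, the sketch below writes a small made-up file, reads it with the same options, and pulls the columns back out. The file contents and the names fileName, nums, names and C are assumptions for this example only. Var1 and Var2 are the default variable names that readtable assigns when ReadVariableNames is false.

    ```matlab
    % Write a small hypothetical sample file with two ";"-separated columns
    fileName = [tempname '.csv'];
    fid = fopen(fileName, 'w');
    fprintf(fid, '1;abc\n2;def\n');
    fclose(fid);

    % Read it into a table without treating the first row as a header
    T = readtable(fileName, 'Delimiter', ';', 'ReadVariableNames', false);
    delete(fileName);                  % Clean up the temporary file

    nums  = T.Var1;                    % Numeric column, returned as a double vector
    names = T.Var2;                    % Text column, returned as a cell array
    C     = table2cell(T);             % Whole table as a 2-by-2 cell array
    ```

    Dot indexing such as T.Var1 returns the column in its own type, which avoids the per-cell conversions needed with a plain cell array.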

  • 2020-11-27 05:11

    If your input file has a fixed number of columns separated by commas and you know which columns contain the strings, it might be best to use the function

    textscan()
    

    Note that you can specify a format where you read up to a maximum number of characters in the string or until a delimiter (comma) is found.
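
    A minimal sketch of the fixed-format case, assuming a made-up file with three comma-separated columns (number, string, number). The file contents and the names fileName, ids, names and vals are hypothetical:

    ```matlab
    % Write a small hypothetical sample file
    fileName = [tempname '.csv'];
    fid = fopen(fileName, 'w');
    fprintf(fid, '1,abc,2.5\n2,def,3.5\n');
    fclose(fid);

    % One conversion specifier per column: %f for numbers, %s for strings
    fid = fopen(fileName, 'r');
    C = textscan(fid, '%f %s %f', 'Delimiter', ',');
    fclose(fid);
    delete(fileName);                  % Clean up the temporary file

    ids   = C{1};                      % Double column vector
    names = C{2};                      % Cell array of strings
    vals  = C{3};                      % Double column vector
    ```

    textscan returns one cell per format specifier, so the numeric columns arrive as numeric vectors and need no later conversion.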

  • 2020-11-27 05:11
    % Assuming that the dataset is ";"-delimited and each line ends with ";"
    fid = fopen('sampledata.csv', 'r');
    data = {};                              % One row per line, one cell per field
    ct = 0;                                 % Number of rows stored so far
    tline = fgetl(fid);                     % Read the first line
    while ischar(tline)                     % fgetl returns -1 at the end of the file
        id = strfind(tline, ';');           % Positions of the delimiters
        if ~isempty(id)                     % Skip lines without any delimiter
            ct = ct + 1;
            starts = [1, id(1:end-1) + 1];  % First character of each field
            for I = 1:numel(id)
                data{ct, I} = tline(starts(I):id(I)-1);
            end
        end
        tline = fgetl(fid);                 % Read the next line
    end
    fclose(fid);
    