MATLAB: plot a histogram showing the count of each character in a file

天命终不由人 2021-01-26 12:52

I have 400 files, each containing about 500000 characters, and those 500000 characters consist of only about 20 letters. I want to make a histogram indicating the most 10 le

2 answers
  •  执笔经年
    2021-01-26 13:23

    Note: This answers the original version of the question (the data consists of 10 letters only; a histogram is wanted). The question was completely changed (the data consists of about 20 letters, and a histogram of the 10 most used letters is wanted).


    If the ten letters are arbitrary and not known in advance, you can't use hist(..., 10). Consider the following example with three arbitrary "letters":

    h = hist([1 2 2 10], 3);
    

    The result is not [1 2 1], as you might expect. The problem is that hist chooses equal-width bins across the data range rather than one bin per distinct value.
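
    To see the equal-width binning concretely, the second output of hist can be inspected. This is only a small illustrative sketch using the same toy data as above; the variable name centers is chosen here for illustration:

    ```matlab
    % Ask hist for the bin centers as well as the counts.
    [h, centers] = hist([1 2 2 10], 3);
    % The range 1..10 is split into three bins of width 3,
    % so the values 1, 2, 2 all land in the first bin.
    disp(h)        % counts per equal-width bin
    disp(centers)  % centers of the three bins
    ```

    The values 1 and 2 share a bin, which is why hist with a bin count cannot separate arbitrary letters.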

    Here are three approaches to do what you want:

    1. You can find the letters with unique and then do the sum with bsxfun:

      letters = unique(part(:)).';             % these are the letters in your file
      h = sum(bsxfun(@eq, part(:), letters));   % count occurrences of each letter
      
    2. The second line of the above approach could be replaced by histc specifying the bin edges:

      letters = unique(part(:)).';
      h = histc(part, letters);
      
    3. Or you could use sparse to do the accumulation:

      t = sparse(1, part, 1);
      [~, letters, h] = find(t);
      

    As an example, for part = [1 2 2 10] any of the above approaches gives the expected result:

    letters =
         1     2    10
    h =
         1     2     1
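
    For the updated question (about 20 distinct letters, with only the 10 most used wanted), the counts could be sorted and truncated before plotting. The following is a rough sketch rather than a tested solution for the real files: the sample string txt is made up for illustration (in practice the text might come from something like fileread on each file), it uses unique with accumarray instead of the three approaches above, and the names topLetters and topCounts are hypothetical.

    ```matlab
    % Made-up sample text; replace with the contents of a real file.
    txt = 'abracadabra alakazam';
    part = double(txt(:)).';                  % character codes as a row vector

    [letters, ~, idx] = unique(part);         % distinct codes, and each char's index into them
    counts = accumarray(idx(:), 1).';         % occurrences of each distinct code

    [sortedCounts, order] = sort(counts, 'descend');
    nTop = min(10, numel(letters));           % keep at most the 10 most frequent
    topLetters = char(letters(order(1:nTop)));
    topCounts  = sortedCounts(1:nTop);

    bar(topCounts);                           % bar chart of the top counts
    set(gca, 'XTick', 1:nTop, 'XTickLabel', cellstr(topLetters(:)));
    xlabel('Letter'); ylabel('Count');
    ```

    With 400 files, one option would be to fix the list of letters up front and add up the per-file counts before sorting, so that every file contributes to the same bins.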
    
