What is the best way to find the period of a (repeating) list in Mathematica?

前端 未结 9 1202
没有蜡笔的小新
没有蜡笔的小新 2021-01-01 15:12

What is the best way to find the period in a repeating list?

For example:

a = {4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2}

has repeat

相关标签:
9条回答
  • 2021-01-01 15:52

    Please see the comments interspersed with the code on how it works.

    (* True if a has period p *)
    testPeriod[p_, a_] := Drop[a, p] === Drop[a, -p]
    
    (* are all the list elements the same? *)
    homogeneousQ[list_List] := Length@Tally[list] === 1
    homogeneousQ[{}] := Throw[$Failed] (* yes, it's ugly to put this here ... *)
    
    (* auxiliary for findPeriodOfFirstElement[] *)
    reduce[a_] := Differences@Flatten@Position[a, First[a], {1}]
    
    (* the first element occurs every ?th position ? *)
    findPeriodOfFirstElement[a_] := Module[{nl},
      nl = NestWhileList[reduce, reduce[a], ! homogeneousQ[#] &];
      Fold[Total@Take[#2, #1] &, 1, Reverse[nl]]
      ]
    
    (* the period must be a multiple of the period of the first element *)
    period[a_] := Catch@With[{fp = findPeriodOfFirstElement[a]},
       Do[
        If[testPeriod[p, a], Return[p]],
        {p, fp, Quotient[Length[a], 2], fp}
        ]
       ]
    

    Please ask if findPeriodOfFirstElement[] is not clear. I did this independently (for fun!), but now I see that the principle is the same as in Verbeia's solution, except the problem pointed out by Brett is fixed.

    I was testing with

    b = RandomInteger[100, {1000}];
    a = Flatten[{ConstantArray[b, 1000], Take[b, 27]}];
    

    (Note the low integer values: there will be lots of repeating elements within the same period *)


    EDIT: According to Leonid's comment below, another 2-3x speedup (~2.4x on my machine) is possible by using a custom position function, compiled specifically for lists of integers:

    (* Leonid's reduce[] *)
    
    myPosition = Compile[
      {{lst, _Integer, 1}, {val, _Integer}}, 
      Module[{pos = Table[0, {Length[lst]}], i = 1, ctr = 0}, 
        For[i = 1, i <= Length[lst], i++, 
          If[lst[[i]] == val, pos[[++ctr]] = i]
        ]; 
        Take[pos, ctr]
      ], 
      CompilationTarget -> "C", RuntimeOptions -> "Speed"
    ]
    
    reduce[a_] := Differences@myPosition[a, First[a]]
    

    Compiling testPeriod gives a further ~20% speedup in a quick test, but I believe this will depend on the input data:

    Clear[testPeriod]
    testPeriod = 
     Compile[{{p, _Integer}, {a, _Integer, 1}}, 
      Drop[a, p] === Drop[a, -p]]
    
    0 讨论(0)
  • 2021-01-01 15:52

    The following assumes that the cycle starts on the first element and gives the period length and the cycle.

    findCyclingList[a_?VectorQ] :=
      Module[{repeats1, repeats2, cl, cLs, vec}, 
      repeats1 = Flatten@Differences[Position[a, First[a]]];
      repeats2 = Flatten[Position[repeats1, First[repeats1]]]; 
      If[Equal @@ Differences[repeats2] && Length[repeats2] > 2(* 
       is potentially cyclic - first element appears cyclically *),
       cl = Plus @@@ Partition[repeats1, First[Differences[repeats2]]];
       cLs = Partition[a, First[cl]];
       If[SameQ @@ cLs  (* candidate cycles all actually the same *), 
        vec = First[cLs];
        {Length[vec], vec}, $Failed], $Failed]  ]
    

    Testing

    b = RandomInteger[50, {100}];
    a = Join[b, b, b, b, Take[b, 27]];
    
    findCyclingList[a]
    
    {100, {47, 15, 42, 10, 14, 29, 12, 29, 11, 37, 6, 19, 14, 50, 4, 38, 
      23, 3, 41, 39, 41, 17, 32, 8, 18, 37, 5, 45, 38, 8, 39, 9, 26, 33, 
      40, 50, 0, 45, 1, 48, 32, 37, 15, 37, 49, 16, 27, 36, 11, 16, 4, 28,
       31, 46, 30, 24, 30, 3, 32, 31, 31, 0, 32, 35, 47, 44, 7, 21, 1, 22,
       43, 13, 44, 35, 29, 38, 31, 31, 17, 37, 49, 22, 15, 28, 21, 8, 31, 
      42, 26, 33, 1, 47, 26, 1, 37, 22, 40, 27, 27, 16}}
    
    b1 = RandomInteger[10000, {100}]; 
    a1 = Join[b1, b1, b1, b1, Take[b1, 23]];
    
    findCyclingList[a1]
    
    {100, {1281, 5325, 8435, 7505, 1355, 857, 2597, 8807, 1095, 4203, 
      3718, 3501, 7054, 4620, 6359, 1624, 6115, 8567, 4030, 5029, 6515, 
      5921, 4875, 2677, 6776, 2468, 7983, 4750, 7609, 9471, 1328, 7830, 
      2241, 4859, 9289, 6294, 7259, 4693, 7188, 2038, 3994, 1907, 2389, 
      6622, 4758, 3171, 1746, 2254, 556, 3010, 1814, 4782, 3849, 6695, 
      4316, 1548, 3824, 5094, 8161, 8423, 8765, 1134, 7442, 8218, 5429, 
      7255, 4131, 9474, 6016, 2438, 403, 6783, 4217, 7452, 2418, 9744, 
      6405, 8757, 9666, 4035, 7833, 2657, 7432, 3066, 9081, 9523, 3284, 
      3661, 1947, 3619, 2550, 4950, 1537, 2772, 5432, 6517, 6142, 9774, 
      1289, 6352}}
    

    This case should fail because it isn't cyclical.

    findCyclingList[Join[b, Take[b, 11], b]]
    
    $Failed
    

    I tried to something with Repeated, e.g. a /. Repeated[t__, {2, 100}] -> {t} but it just doesn't work for me.

    0 讨论(0)
  • 2021-01-01 16:04

    This sounds like it might relate to sequence alignment. These algorithms are well studied, and might already be implemented in mathematica.

    0 讨论(0)
  • 2021-01-01 16:06

    Above methods are better if you have no noise. If your signal is only approximate then Fourier transform methods might be useful. I'll illustrate with a "parametrized" setup wherein the length and number of repetitions of the base signal, the length of the trailing part, and a bound on the noise perturbation are all variables one can play with.

    noise = 20;
    extra = 40;
    baselen = 103;
    base = RandomInteger[10000, {baselen}];
    repeat = 5;
    signal = Flatten[Join[ConstantArray[base, repeat], Take[base, extra]]];
    noisysignal = signal + RandomInteger[{-noise, noise}, Length[signal]];
    

    We compute the absolute value of the FFT. We adjoin zeros to both ends. The object will be to threshold by comparing to neighbors.

    sigfft = Join[{0.}, Abs[Fourier[noisysignal]], {0}];
    

    Now we create two 0-1 vectors. In one we threshold by making a 1 for each element in the fft that is greater than twice the geometric mean of its two neighbors. In the other we use the average (arithmetic mean) but we lower the size bound to 3/4. This was based on some experimentation. We count the number of 1s in each case. Ideally we'd get 100 for each, as that would be the number of nonzeros in a "perfect" case of no noise and no tail part.

    In[419]:= 
    thresh1 = 
      Table[If[sigfft[[j]]^2 > 2*sigfft[[j - 1]]*sigfft[[j + 1]], 1, 
        0], {j, 2, Length[sigfft] - 1}];
    count1 = Count[thresh1, 1]
    thresh2 = 
      Table[If[sigfft[[j]] > 3/4*(sigfft[[j - 1]] + sigfft[[j + 1]]), 1, 
        0], {j, 2, Length[sigfft] - 1}];
    count2 = Count[thresh2, 1]
    
    Out[420]= 114
    
    Out[422]= 100
    

    Now we get our best guess as to the value of "repeats", by taking the floor of the total length over the average of our counts.

    approxrepeats = Floor[2*Length[signal]/(count1 + count2)]
    Out[423]= 5
    

    So we have found that the basic signal is repeated 5 times. That can give a start toward refining to estimate the correct length (baselen, above). To that end we might try removing elements at the end and seeing when we get ffts closer to actually having runs of four 0s between nonzero values.

    Something else that might work for estimating number of repeats is finding the modal number of zeros in run length encoding of the thresholded ffts. While I have not actually tried that, it looks like it might be robust to bad choices in the details of how one does the thresholding (mine were just experiments that seem to work).

    Daniel Lichtblau

    0 讨论(0)
  • 2021-01-01 16:07

    I don't know how to solve it in mathematica, but the following algorithm (written in python) should work. It's O(n) so speed should be no concern.

    def period(array):
        if len(array) == 0:
            return False
        else:
            s = array[0]
            match = False
            end = 0
            i = 0
    
            for k in range(1,len(array)):
                c = array[k]
    
                if not match:
                    if c == s:
                        i = 1
                        match = True
                        end = k
                else:
                    if not c == array[i]:
                        match = False
                    i += 1
    
            if match:
                return array[:end]
            else:
                return False
    
    # False         
    print(period([4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2,1]))
    # [4, 5, 1, 2, 3]            
    print(period([4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2]))
    # False
    print(period([4]))
    # [4, 2]
    print(period([4,2,4]))
    # False
    print(period([4,2,1]))
    # False
    print(period([]))
    
    0 讨论(0)
  • 2021-01-01 16:09

    Ok, just to show my own work here:

    ModifiedTortoiseHare[a_List] := Module[{counter, tortoise, hare},
    Quiet[
     Check[
      counter = 1;
      tortoise = a[[counter]];
      hare = a[[2 counter]];
      While[(tortoise != hare) || (a[[counter ;; 2 counter - 1]] != a[[2 counter ;; 3 counter - 1]]),
       counter++;
       tortoise = a[[counter]];
       hare = a[[2 counter]];
      ];
     counter,
    $Failed]]]
    

    I'm not sure this is a 100% correct, especially with cases like {pattern,pattern,different,pattern, pattern} and it gets slower and slower when there are a lot of repeating elements, like so:

    { 1,2,1,1, 1,2,1,1, 1,2,1,1, ...} 
    

    because it is making too many expensive comparisons.

    0 讨论(0)
提交回复
热议问题