How to fill in rows with repeating data in pandas?

感情败类 2021-01-02 02:59

In R, when adding new data of unequal length to a data frame, the values repeat to fill the data frame:

df <- data.frame(first=c(1,2,3,4,5,6))
df$second &         


        
7 Answers
  • 2021-01-02 03:19
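    import pandas as pd
    import numpy as np
    
    def put(df, column, values):
        # np.put needs a real ndarray, so fill a buffer first and assign it afterwards;
        # when values is shorter than the index array, np.put cycles through it
        out = np.zeros(len(df), dtype=np.asarray(values).dtype)
        np.put(out, np.arange(len(df)), values)
        df[column] = out
    
    df = pd.DataFrame({'first':range(1, 8)})
    put(df, 'second', [1,2,3])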
    

    yields

       first  second
    0      1       1
    1      2       2
    2      3       3
    3      4       1
    4      5       2
    5      6       3
    6      7       1
    

    Not particularly beautiful, but one "feature" it possesses is that you do not have to worry if the length of the DataFrame is a multiple of the length of the repeated values. np.put repeats the values as necessary.
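
As a possible alternative to np.put, NumPy's np.resize also recycles the input when the requested length is larger than the input. The sketch below assumes the same DataFrame as above; the name `pattern` is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"first": range(1, 8)})
pattern = [1, 2, 3]  # illustrative name for the values to recycle

# np.resize fills the new array with repeated copies of the input,
# truncating the last copy if len(df) is not a multiple of len(pattern)
df["second"] = np.resize(pattern, len(df))
print(df)
```

Like the np.put version, this does not require the length of the DataFrame to be a multiple of the pattern length.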


    My first answer was:

    import itertools as IT
    df['second'] = list(IT.islice(IT.cycle([1,2,3]), len(df)))
    

    but it turns out this is significantly slower:

    In [312]: df = pd.DataFrame({'first':range(10**6)})
    
    In [313]: %timeit df['second'] = list(IT.islice(IT.cycle([1,2,3]), len(df)))
    10 loops, best of 3: 143 ms per loop
    
    In [316]: %timeit df['second'] = 0; np.put(df['second'], np.arange(N), [1,2,3])
    10 loops, best of 3: 27.9 ms per loop
    
  • 2021-01-02 03:21

    Probably inefficient, but here's sort of a pure pandas solution.

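    import numpy as np
    import pandas as pd
    
    base = [1, 2, 3]
    df = pd.DataFrame(data=None, index=np.arange(10), columns=["filler"])
    # Seed the first len(base) rows; .loc avoids chained assignment
    df.loc[df.index[:len(base)], "filler"] = base
    
    # tmp groups each row with the seed row whose value it should copy
    df["tmp"] = np.arange(len(df)) % len(base)
    # A stable sort keeps the seed row first in each group, so ffill copies it down;
    # assignment realigns on the index, so no sort_index() is needed
    df["filler"] = df.sort_values("tmp", kind="stable")["filler"].ffill()
    print(df)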
    
  • 2021-01-02 03:22

    The cycle function from itertools is good for repeating a common pattern.

    from itertools import cycle
    
    seq = cycle([1, 2, 3])
    df['Seq'] = [next(seq) for count in range(df.shape[0])]
    
  • 2021-01-02 03:23

    In my case the group lengths were not known in advance, so I needed a counter that restarts for every group and runs to that group's length. This was my solution:

    import pandas
    
    df = pandas.DataFrame(['a','a','a','b','b','b','b'], columns=['first'])
    
    # One range per group, in sorted group order
    # (this assumes the rows are already arranged in that order)
    ranges = [range(n) for n in df.groupby('first').size()]
    loop = [val for sublist in ranges for val in sublist]
    df['second'] = loop
    
    df
      first  second
    0     a       0
    1     a       1
    2     a       2
    3     b       0
    4     b       1
    5     b       2
    6     b       3
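
A per-group counter like the one above can also be sketched with pandas' built-in GroupBy.cumcount, which does not depend on the rows being sorted by group. The data below mirrors the example in this answer.

```python
import pandas as pd

df = pd.DataFrame({"first": ["a", "a", "a", "b", "b", "b", "b"]})

# cumcount numbers the rows of each group 0, 1, 2, ... in their original order
df["second"] = df.groupby("first").cumcount()
print(df)
```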
    
  • 2021-01-02 03:26

    There seems to be no elegant way. This is the workaround I just figured out: build a repeating list slightly longer than the original DataFrame, then left-join it.

    import pandas
    df = pandas.DataFrame(range(100), columns=['first'])
    repeat_arr = [1, 2, 3]
    # Integer division (//) keeps the repeat count an int on Python 3
    df = df.join(pandas.DataFrame(repeat_arr * (len(df)//len(repeat_arr)+1),
        columns=['second']))
    
  • 2021-01-02 03:29

    You might want to try using the power of modulo (%). You can take the value (or index) of first and use the length of second as the modulus to get the value (or index) you're looking for. Something like:

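    import pandas
    
    df = pandas.DataFrame([0,1,2,3,4,5], columns=['first'])
    sec = [0,1,2]
    # Use the remainder as a position in sec, so the values of sec are what gets repeated
    df['second'] = df['first'].apply(lambda x: sec[x % len(sec)])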
    print(df)
       first  second
    0      0       0
    1      1       1
    2      2       2
    3      3       0
    4      4       1
    5      5       2
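
The same modulo idea can be applied to the row positions instead of the values in first, which also covers the case where first does not start at 0 or is not an integer column. A minimal sketch, with illustrative data and names that are not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"first": [10, 20, 30, 40, 50, 60, 70]})
sec = np.array(["x", "y", "z"])  # illustrative values to recycle

# Row position modulo len(sec) gives the position in sec for every row
positions = np.arange(len(df)) % len(sec)
# Indexing with an integer array picks the matching value for each row in one step
df["second"] = sec[positions]
print(df)
```

This avoids the per-row Python call made by apply, which may matter for large frames.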
    

    I hope that helps.
