pandas: records with lists to separate rows

花落未央 2021-01-16 07:05

I have a Python Pandas DataFrame like this (UCSC schema for NCBI RefSeq):

chrom   exonStart     exonEnds      name
chr1    100,200,300   110,210,310   gen1
c         


        
3 Answers
  •  粉色の甜心
    2021-01-16 07:47

    This is one way using numpy and itertools.chain.

    The idea is to first split your comma-separated fields into lists, then construct a results dataframe, repeating or chaining values where necessary.

    import numpy as np
    import pandas as pd
    from itertools import chain
    
    df['exonStart'] = df['exonStart'].str.split(',')
    df['exonEnds'] = df['exonEnds'].str.split(',')
    
    lens = list(map(len, df['exonStart']))
    
    res = pd.DataFrame({'chrom': np.repeat(df['chrom'], lens),
                        'exonStart': list(chain.from_iterable(df['exonStart'])),
                        'exonEnds': list(chain.from_iterable(df['exonEnds'])),
                        'name': np.repeat(df['name'], lens)})
    
    print(res)
    
    #   chrom exonEnds exonStart  name
    # 0  chr1      110       100  gen1
    # 0  chr1      210       200  gen1
    # 0  chr1      310       300  gen1
    # 1  chr1      600       500  gen2
    # 1  chr1      800       700  gen2
    # 2  chr2       55        50  gen3
    # 2  chr2       65        60  gen3
    # 2  chr2       75        70  gen3
    # 2  chr2       85        80  gen3
    

    Note that the split values are still strings, so you may wish to convert the exonStart and exonEnds columns to int at the end of this process.
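
    A minimal sketch of that final conversion, assuming the exon columns hold only digit strings after the split. The small `res` frame below is a hypothetical stand-in for the result built above, and resetting the index is an optional extra step rather than part of the original answer.

```python
import pandas as pd

# Hypothetical stand-in for `res` after the split/repeat step:
# the exon coordinates are still strings, and the index repeats.
res = pd.DataFrame(
    {
        'chrom': ['chr1', 'chr1', 'chr1'],
        'exonStart': ['100', '200', '300'],
        'exonEnds': ['110', '210', '310'],
        'name': ['gen1', 'gen1', 'gen1'],
    },
    index=[0, 0, 0],
)

# Cast both coordinate columns from str to int in one call.
res = res.astype({'exonStart': int, 'exonEnds': int})

# Optional: replace the repeated index (0, 0, 0) with a fresh 0..n-1 range.
res = res.reset_index(drop=True)

print(res.dtypes)
```

    With integer columns, arithmetic such as `res['exonEnds'] - res['exonStart']` gives exon lengths directly.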
