Equivalent of Paste R to Python

忘掉有多难 2021-02-01 14:32

I am a new Python aficionado. For R users, there is one function, paste, that helps to concatenate two or more variables in a dataframe. It's very useful. For example, suppose

8 answers
  • 2021-02-01 15:09

    This works very much like the paste command in R. R code:

     words = c("Here", "I","want","to","concatenate","words","using","pipe","delimeter")
     paste(words,collapse="|")
    

    [1] "Here|I|want|to|concatenate|words|using|pipe|delimeter"

    Python:

    words = ["Here", "I","want","to","concatenate","words","using","pipe","delimeter"]
    "|".join(words)
    

    Result:

    'Here|I|want|to|concatenate|words|using|pipe|delimeter'
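
    One caveat when carrying this over: str.join only accepts strings, whereas R's paste coerces numbers for you. A minimal sketch, assuming a mixed-type list called values (a made-up example, not from the question):

```python
# str.join raises TypeError on non-string items, so convert each item first.
values = ["id", 7, 3.5]  # hypothetical mixed-type input
joined = "|".join(map(str, values))
print(joined)  # id|7|3.5
```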

  • 2021-02-01 15:11
    1. You can try pandas.Series.str.cat

      import pandas as pd
      def paste0(ss,sep=None,na_rep=None,):
          '''Analogy to R paste0'''
          ss = [pd.Series(s) for s in ss]
          ss = [s.astype(str) for s in ss]
          s = ss[0]
          res = s.str.cat(ss[1:],sep=sep,na_rep=na_rep)
          return res
      
      pasteA=paste0
      
    2. Or just sep.join()

      #
      def paste0(ss,sep=None,na_rep=None,
          castF=str, # on Python 2 use unicode here: many languages don't work well with str
      ):
          if sep is None:
              sep=''
          res = [castF(sep).join(castF(s) for s in x) for x in zip(*ss)]
          return res
      pasteB = paste0
      
      
      %timeit pasteA([range(1000),range(1000,0,-1)],sep='_')
      # 100 loops, best of 3: 7.11 ms per loop
      %timeit pasteB([range(1000),range(1000,0,-1)],sep='_')
      # 100 loops, best of 3: 2.24 ms per loop
      
    3. I have used itertools to mimic recycling

      import itertools
      def paste0(ss,sep=None,na_rep=None,castF=str):
          '''Analogy to R paste0
          '''
          if sep is None:
              sep=u''
          L = max([len(e) for e in ss])
          # zip is lazy on Python 3 (itertools.izip on Python 2); cycle recycles the shorter inputs
          it = zip(*[itertools.cycle(e) for e in ss])
          res = [castF(sep).join(castF(s) for s in next(it) ) for i in range(L)]
          # res = pd.Series(res)
          return res
      
    4. patsy might be relevant (I am not an experienced user myself).
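
    To illustrate what the na_rep argument from option 1 does, here is a small sketch calling Series.str.cat directly. The series left and right are made-up data; this is an illustration of the pandas behaviour, not part of the answer's own function:

```python
import pandas as pd

left = pd.Series(["x", None, "z"])   # hypothetical data with one missing value
right = pd.Series(["1", "2", "3"])

# Default (na_rep=None): a missing value on either side makes that result missing.
plain = left.str.cat(right, sep="_")

# With na_rep, the missing value is replaced by the given text before joining.
filled = left.str.cat(right, sep="_", na_rep="NA")
print(filled.tolist())  # ['x_1', 'NA_2', 'z_3']
```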

  • 2021-02-01 15:14

    Here's a simple implementation that works on lists, and probably other iterables. Warning: it's only been lightly tested, and only in Python 3.5+:

    from functools import reduce
    
    def _reduce_concat(x, sep=""):
        # Convert every item to str first, so a single-item input is stringified too
        return reduce(lambda a, b: a + sep + b, map(str, x))
            
    def paste(*lists, sep=" ", collapse=None):
        result = map(lambda x: _reduce_concat(x, sep=sep), zip(*lists))
        if collapse is not None:
            return _reduce_concat(result, sep=collapse)
        return list(result)
    
    assert paste([1,2,3], [11,12,13], sep=',') == ['1,11', '2,12', '3,13']
    assert paste([1,2,3], [11,12,13], sep=',', collapse=";") == '1,11;2,12;3,13'
    

    You can also have some more fun and replicate other functions like paste0:

    from functools import partial
    
    paste0 = partial(paste, sep="")
    

    Edit: here's a Repl.it project with type-annotated versions of this code.

  • 2021-02-01 15:15

    If you want to just paste two string columns together, you can simplify @shouldsee's answer because you don't need to create the function. E.g., in my case:

    df['newcol'] = df['id_part_one'].str.cat(df['id_part_two'], sep='_')
    

    Both Series might need to be of dtype object for this to work (I haven't verified).
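
    If one of the columns is numeric, casting both sides with astype(str) first should sidestep the dtype question. A minimal sketch, assuming a made-up DataFrame that reuses the column names above:

```python
import pandas as pd

# Hypothetical data: one string column and one integer column.
df = pd.DataFrame({"id_part_one": ["a", "b"], "id_part_two": [1, 2]})

# Cast both sides to str so the .str accessor and str.cat accept them.
df["newcol"] = df["id_part_one"].astype(str).str.cat(
    df["id_part_two"].astype(str), sep="_"
)
print(df["newcol"].tolist())  # ['a_1', 'b_2']
```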

  • 2021-02-01 15:18

    My answer is loosely based on the original question and was edited from the answer by woles. I would like to illustrate these points:

    • the % operator in Python plays the role of paste
    • using apply, you can build a new value and assign it to a new column

    For R folks: there is no direct equivalent of ifelse (but there are ways to replace it nicely).

    import numpy as np
    import pandas as pd
    
    dates = pd.date_range('20140412',periods=7)
    df = pd.DataFrame(np.random.randn(7,4),index=dates,columns=list('ABCD'))
    df['categorie'] = ['z', 'z', 'l', 'l', 'e', 'e', 'p']
    
    def apply_to_row(x):
        ret = "this is the value i want: %f" % x['A']
        if x['B'] > 0:
            ret = "no, this one is better: %f" % x['C']
        return ret
    
    df['theColumnIWant'] = df.apply(apply_to_row, axis = 1)
    print(df)
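
    One way to replace ifelse without a row-wise apply is numpy.where, which picks between two values element by element. The sketch below uses a small made-up DataFrame rather than the random one above, and is only one possible approach:

```python
import numpy as np
import pandas as pd

# Hypothetical data with the same column names as the example above.
df = pd.DataFrame({"A": [1.0, 2.0], "B": [-0.5, 0.5], "C": [3.0, 4.0]})

# np.where(condition, value_if_true, value_if_false) works element-wise,
# much like ifelse(test, yes, no) in R.
df["theColumnIWant"] = np.where(
    df["B"] > 0,
    "no, this one is better: " + df["C"].map("{:f}".format),
    "this is the value i want: " + df["A"].map("{:f}".format),
)
print(df["theColumnIWant"].tolist())
```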
    
  • 2021-02-01 15:21

    Let's try things with apply.

    df.apply( lambda x: str( x.loc[ desired_col ] ) + "pasting?" , axis = 1 )
    

    You will get a result similar to paste.
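
    For a single column, the same result can probably be had without apply by casting the column to str and using +. A minimal sketch, assuming a made-up DataFrame and taking desired_col to be a column name:

```python
import pandas as pd

df = pd.DataFrame({"num": [1, 2, 3]})  # hypothetical data
desired_col = "num"                    # hypothetical column name

# Row-wise version, as in the answer above.
row_wise = df.apply(lambda x: str(x.loc[desired_col]) + "pasting?", axis=1)

# Vectorised version: cast the column once, then concatenate.
vectorised = df[desired_col].astype(str) + "pasting?"
print(vectorised.tolist())  # ['1pasting?', '2pasting?', '3pasting?']
```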
