I am a new Python aficionado. For R users, there is one function, paste, that helps to concatenate two or more variables in a dataframe. It's very useful. For example, suppose
Python's str.join works very much like the paste command in R. R code:
words = c("Here", "I","want","to","concatenate","words","using","pipe","delimeter")
paste(words,collapse="|")
[1] "Here|I|want|to|concatenate|words|using|pipe|delimeter"
Python:
words = ["Here", "I","want","to","concatenate","words","using","pipe","delimeter"]
"|".join(words)
Result:
'Here|I|want|to|concatenate|words|using|pipe|delimeter'
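One difference worth knowing when moving from paste to join: paste converts its arguments to character for you, while str.join raises a TypeError on non-string items. A minimal sketch, assuming made-up inputs (the names values, a and b are only for illustration):

```python
# str.join only accepts strings, so convert each item first.
values = ["id", 42, 3.5]  # hypothetical mixed-type input
joined = "|".join(str(v) for v in values)
print(joined)  # id|42|3.5

# Element-wise pasting of two equal-length lists,
# roughly what paste(a, b, sep="_") does in R.
a = ["x", "y", "z"]
b = [1, 2, 3]
pairs = ["_".join(map(str, items)) for items in zip(a, b)]
print(pairs)  # ['x_1', 'y_2', 'z_3']
```

Note that zip stops at the shortest input, so this does not recycle shorter vectors the way R does.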
You can try pandas.Series.str.cat
import pandas as pd
def paste0(ss, sep=None, na_rep=None):
    '''Analogy to R paste0'''
    # Wrap each input in a Series and cast it to str so .str.cat accepts it
    ss = [pd.Series(s) for s in ss]
    ss = [s.astype(str) for s in ss]
    s = ss[0]
    res = s.str.cat(ss[1:], sep=sep, na_rep=na_rep)
    return res
pasteA=paste0
Or just sep.join()
def paste0(ss, sep=None, na_rep=None,
           castF=str,  # Python 3 str is Unicode; on Python 2 pass unicode instead
           ):
    # na_rep is accepted only for signature compatibility and is not used here
    if sep is None:
        sep = ''
    res = [castF(sep).join(castF(s) for s in x) for x in zip(*ss)]
    return res
pasteB = paste0
%timeit pasteA([range(1000),range(1000,0,-1)],sep='_')
# 100 loops, best of 3: 7.11 ms per loop
%timeit pasteB([range(1000),range(1000,0,-1)],sep='_')
# 100 loops, best of 3: 2.24 ms per loop
I have used itertools to mimic R's recycling:
import itertools
def paste0(ss, sep=None, na_rep=None, castF=str):
    '''Analogy to R paste0
    '''
    if sep is None:
        sep = ''
    L = max(len(e) for e in ss)
    # zip replaces Python 2's itertools.izip; cycle repeats the shorter inputs
    it = zip(*[itertools.cycle(e) for e in ss])
    res = [castF(sep).join(castF(s) for s in next(it)) for i in range(L)]
    # res = pd.Series(res)
    return res
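To make the recycling behaviour concrete, here is a small self-contained sketch for Python 3. The helper name paste_recycle is hypothetical and not part of any library; it uses itertools.islice to stop at the longest input instead of calling next in a loop:

```python
import itertools

def paste_recycle(*seqs, sep=""):
    """Paste sequences element-wise, recycling shorter ones (hypothetical helper)."""
    if not seqs:
        return []
    longest = max(len(s) for s in seqs)
    # cycle() repeats each sequence endlessly; islice() stops at the longest length.
    rows = itertools.islice(zip(*(itertools.cycle(s) for s in seqs)), longest)
    return [sep.join(str(item) for item in row) for row in rows]

print(paste_recycle(["a", "b", "c", "d"], [1, 2], sep="_"))
# ['a_1', 'b_2', 'c_1', 'd_2']
```

The shorter list [1, 2] is reused from the start once it runs out, which is how R recycles the shorter vector in paste.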
patsy might be relevant (not an experienced user myself.)
Here's a simple implementation that works on lists, and probably other iterables. Warning: it's only been lightly tested, and only in Python 3.5+:
from functools import reduce
def _reduce_concat(x, sep=""):
    return reduce(lambda x, y: str(x) + sep + str(y), x)
def paste(*lists, sep=" ", collapse=None):
    result = map(lambda x: _reduce_concat(x, sep=sep), zip(*lists))
    if collapse is not None:
        return _reduce_concat(result, sep=collapse)
    return list(result)
assert paste([1,2,3], [11,12,13], sep=',') == ['1,11', '2,12', '3,13']
assert paste([1,2,3], [11,12,13], sep=',', collapse=";") == '1,11;2,12;3,13'
You can also have some more fun and replicate other functions like paste0:
from functools import partial
paste0 = partial(paste, sep="")
Edit: here's a Repl.it project with type-annotated versions of this code.
If you want to just paste two string columns together, you can simplify @shouldsee's answer because you don't need to create the function. E.g., in my case:
df['newcol'] = df['id_part_one'].str.cat(df['id_part_two'], sep='_')
Both Series might need to be of dtype object for this to work (I haven't verified).
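If one of the columns is not a string column, casting both sides with astype(str) first is a safe way around the dtype question. A minimal sketch, assuming a made-up frame in which id_part_two holds integers:

```python
import pandas as pd

# Hypothetical frame: one string column and one integer column.
df = pd.DataFrame({"id_part_one": ["a", "b", "c"], "id_part_two": [1, 2, 3]})

# Cast both sides to str so that .str.cat only ever sees strings.
left = df["id_part_one"].astype(str)
right = df["id_part_two"].astype(str)
df["newcol"] = left.str.cat(right, sep="_")
print(df["newcol"].tolist())  # ['a_1', 'b_2', 'c_3']
```

The cast is harmless when the columns already hold strings.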
My answer is loosely based on the original question and was edited from the answer by woles. I would like to illustrate these points:
For R folks: there is no direct equivalent of ifelse (but there are ways to replace it nicely).
import numpy as np
import pandas as pd
dates = pd.date_range('20140412',periods=7)
df = pd.DataFrame(np.random.randn(7,4),index=dates,columns=list('ABCD'))
df['categorie'] = ['z', 'z', 'l', 'l', 'e', 'e', 'p']
def apply_to_row(x):
    # x is one row of df; pick column A by default, column C when B is positive
    ret = "this is the value i want: %f" % x['A']
    if x['B'] > 0:
        ret = "no, this one is better: %f" % x['C']
    return ret
df['theColumnIWant'] = df.apply(apply_to_row, axis=1)
print(df)
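For simple two-way choices, numpy.where is one vectorized stand-in for R's ifelse and avoids the row-by-row apply. This is only a sketch with a small made-up frame (the column name picked is hypothetical), not the approach used in the code above:

```python
import numpy as np
import pandas as pd

# Small hypothetical frame with fixed values so the result is predictable.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0],
    "B": [-1.0, 0.5, 2.0],
    "C": [10.0, 20.0, 30.0],
})

# np.where(condition, value_if_true, value_if_false), like ifelse(B > 0, C, A) in R.
df["picked"] = np.where(df["B"] > 0, df["C"], df["A"])
print(df["picked"].tolist())  # [1.0, 20.0, 30.0]
```

Row-wise apply remains the more flexible option when the logic per row is more involved than a single condition.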
Let's try things with apply.
# desired_col is the name of the column whose values you want to paste
df.apply(lambda x: str(x.loc[desired_col]) + "pasting?", axis=1)
You will get a result similar to what paste gives you.