dataframe

Pandas merge two DF with rows replacement

半腔热情 submitted on 2021-02-17 05:18:25
Question: I ran into an issue merging two DataFrames into one: where an id appears in both, the row from the second DataFrame should be kept. Example:

    df1 = pd.DataFrame({
        'id': ['id1', 'id2', 'id3', 'id4'],
        'com': [134.6, 223, 0, 123],
        'malicious': [False, False, True, False]
    })
    df2 = pd.DataFrame({
        'id': ['id7', 'id2', 'id5', 'id6'],
        'com': [134.6, 27.6, 0, 123],
        'malicious': [False, False, False, False]
    })

    df1
        id    com  malicious
    0  id1  134.6      False
    1  id2  223.0      False
    2  id3    0.0       True
    3  id4  123.0      False

    df2
        id  com  malicious  date
    0
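The question is cut off before the expected output, so the following is only a minimal sketch of one common approach, assuming the goal is "keep every id once, and prefer the df2 row when an id is in both frames". The variable name `merged` is illustrative and not from the original post.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': ['id1', 'id2', 'id3', 'id4'],
    'com': [134.6, 223, 0, 123],
    'malicious': [False, False, True, False],
})
df2 = pd.DataFrame({
    'id': ['id7', 'id2', 'id5', 'id6'],
    'com': [134.6, 27.6, 0, 123],
    'malicious': [False, False, False, False],
})

# Stack df1 on top of df2, then keep only the LAST row seen for each id.
# Because df2 comes second, its row wins whenever an id is duplicated.
merged = (
    pd.concat([df1, df2], ignore_index=True)
      .drop_duplicates(subset='id', keep='last')
      .reset_index(drop=True)
)
print(merged)
```

With this ordering, `id2` ends up with the `com` value from df2 (27.6) rather than the one from df1 (223.0). If the df1 row should win instead, `keep='first'` would be the analogous choice.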

Python - the best way to create a new dataframe from two other dataframes with different shapes?

女生的网名这么多〃 submitted on 2021-02-17 05:13:20
Question: Essentially, I'm trying to build a new dataframe from two others, but the situation is a little complicated and I'm not sure what the best way to do this is. In DF1, each row holds data about an object identified by its ID, and it looks something like this:

    ID  Name  datafield1  datafield2
    1   Foo   info1       info2
    2   bar   info3       info4
    3   Foos  info5       info6

DF2 has monthly data about each object, formatted like this:

    ID  Name  Month  data
    1   Foo   1/20   53.6
    1   Foo   2/20   47.2
    1   Foo   3/20   12.7
    1   Foo   4/20   3.2
    2   Bar   1/20   82.2
    2   Bar   2
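The desired output is not visible in the truncated question, so this is only a sketch of one plausible reading: attach the per-object fields from DF1 to each monthly row of DF2 by joining on `ID`. The frames below contain only the complete rows shown above; the name `combined` and the choice to join on `ID` alone (since `Name` is capitalised differently in the two tables) are assumptions.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Foo', 'bar', 'Foos'],
    'datafield1': ['info1', 'info3', 'info5'],
    'datafield2': ['info2', 'info4', 'info6'],
})
df2 = pd.DataFrame({
    'ID': [1, 1, 1, 1, 2],
    'Name': ['Foo', 'Foo', 'Foo', 'Foo', 'Bar'],
    'Month': ['1/20', '2/20', '3/20', '4/20', '1/20'],
    'data': [53.6, 47.2, 12.7, 3.2, 82.2],
})

# Left-join the static object data onto every monthly row.
# 'Name' is dropped from df1 so the result keeps a single Name column.
combined = df2.merge(df1.drop(columns='Name'), on='ID', how='left')
print(combined)
```

A left join keeps one output row per monthly record; objects that have no monthly data (ID 3 here) do not appear in the result.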

How to print rolling window equation process from pandas dataframe in python?

和自甴很熟 submitted on 2021-02-17 05:11:24
Question: I created a sample pandas dataframe and tried to sum every 3 rows:

    import pandas as pd
    import numpy as np
    d={'A':[100,110,120,175,164,169,155,153,156,200]}
    df=pd.DataFrame(d)

         A
    0  100
    1  110
    2  120
    3  175
    4  164
    5  169
    6  155
    7  153
    8  156
    9  200

    0      NaN
    1      NaN
    2    330.0   #this is the result tho
    3    405.0
    4    459.0
    5    508.0
    6    488.0
    7    477.0
    8    464.0
    9    509.0
    Name: sum, dtype: float64

And I want to display the equation process like this:

    NaN
    NaN
    330.0 = 100+110+120
    405.0 = 110+120+175
    459.0 .
    508.0 .
    488.0 .
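The post does not show the call that produced the sums, so the sketch below assumes it was a 3-row `rolling(...).sum()` on column `A`. It rebuilds each window by position and formats it as an equation; the names `window`, `sums`, and `lines` are illustrative.

```python
import pandas as pd

d = {'A': [100, 110, 120, 175, 164, 169, 155, 153, 156, 200]}
df = pd.DataFrame(d)

window = 3  # assumed window size, matching the sums shown in the question
sums = df['A'].rolling(window).sum()

lines = []
for i, total in enumerate(sums):
    if pd.isna(total):
        # Not enough rows yet to fill a complete window.
        lines.append('NaN')
    else:
        # The window ending at position i covers rows i-window+1 .. i.
        terms = df['A'].iloc[i - window + 1 : i + 1]
        lines.append(f"{total} = " + "+".join(str(v) for v in terms))

print("\n".join(lines))
```

The first two lines print as `NaN`, and the third prints `330.0 = 100+110+120`, which matches the format requested above.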

How to extract exact matches with list from a dataframe column?

99封情书 submitted on 2021-02-17 05:07:02
Question: I have a large dataframe with text that I want to search for matches from a list of words (around 1k words). I have managed to get the presence/absence of the listed words in the dataframe, but it is also important to me to know which word matched. Sometimes there is an exact match with more than one word from the list, and I would like to have them all. I tried the code below, but it gives me partial matches: syllables instead of full words. #this is a code to recreate the
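The asker's own code is cut off, so the following is a sketch under assumed data: a hypothetical `text` column and a tiny stand-in word list. It shows one common way to get whole-word matches only, by wrapping the alternation in `\b` word boundaries and collecting every hit with `Series.str.findall`.

```python
import re
import pandas as pd

# Hypothetical sample data; the real frame and the ~1k-word list are not shown.
df = pd.DataFrame({'text': [
    'the cat sat on the mat',
    'concatenate strings',
    'a dog and a cat',
]})
words = ['cat', 'dog', 'at']

# Escape each word so it is matched literally, and put longer words first
# so a longer alternative is tried before a shorter one it starts with.
alternatives = '|'.join(re.escape(w) for w in sorted(words, key=len, reverse=True))

# \b on both sides forces whole-word matches, so 'cat' does not match
# inside 'concatenate' and 'at' does not match inside 'sat' or 'mat'.
pattern = r'\b(?:' + alternatives + r')\b'

# findall returns a list of all matched words for each row.
df['matches'] = df['text'].str.findall(pattern)
print(df)
```

Rows with no whole-word hit get an empty list, so a presence/absence flag can still be derived with `df['matches'].str.len() > 0`. Matching here is case-sensitive; passing `flags=re.IGNORECASE` to `str.findall` would relax that.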

Pandas DataFrame: how to reference to multiple sub set of row from itself?

霸气de小男生 submitted on 2021-02-17 04:56:48
Question: I want to get a dataframe that includes multiple subsets of itself. For example: DataFrame(data = a[1,2,3,4,5,6,7,8,9]). I want to build a dataframe from iloc[0:3] and iloc[6:9], resulting in: DataFrame(data = a[1,2,3,6,7,8]). Currently I am doing it like this, which keeps copying data and is very slow:

    if my_df is not None:
        domain += 1
        new_domain = df.iloc[begin_iloc: begin_of_next_iloc]
        new_domain['domain'] = domain
        my_df = my_df.append(new_domain)
    else:
        my_df = df.iloc[begin_iloc:
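Appending inside a loop copies the accumulated frame on every iteration, and `DataFrame.append` was removed in pandas 2.0. A minimal sketch of the usual alternative is to collect the slices in a list and concatenate once. The `ranges` list below is a hypothetical stand-in for however `begin_iloc` and `begin_of_next_iloc` are computed in the original code.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7, 8, 9]})

# Hypothetical (start, stop) positions, taken from the iloc ranges in the example.
ranges = [(0, 3), (6, 9)]

# Build every piece first; assign() returns a new frame with the extra
# 'domain' column instead of writing into a slice of df.
pieces = [
    df.iloc[start:stop].assign(domain=domain)
    for domain, (start, stop) in enumerate(ranges)
]

# A single concat copies the data once, rather than once per appended piece.
my_df = pd.concat(pieces)
print(my_df)
```

The original row labels are preserved in `my_df`; passing `ignore_index=True` to `pd.concat` would renumber them from zero if that is preferred.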

How to compare the rows of two dataframes in R

杀马特。学长 韩版系。学妹 submitted on 2021-02-17 03:36:51
Question: I'm trying to compare two columns from different data frames to create a new data frame. If the value in a row of the first column is less than the first value of the second column, a 1 is added to the new column. When the value is greater, a 2 is added, and so on. I'll give you an example. I have this df:

    df1 <- data.frame(col=c(1,seq(1:9),9,10))
    #    col
    # 1    1
    # 2    1
    # 3    2
    # 4    3
    # 5    4
    # 6    5
    # 7    6
    # 8    7
    # 9    8
    # 10   9
    # 11   9
    # 12  10

And this one, which has fewer rows:

    df2<-data.frame(col2=c(3,6,8))
    #   col2
    # 1    3
    # 2    6
    # 3    8