dataframe | 易学教程

Pandas: merge two DataFrames with row replacement

Question: I ran into a problem merging two DataFrames into one: for every id that appears in both, I want to keep the row from the second DataFrame. Example:

    df1 = pd.DataFrame({
        'id': ['id1', 'id2', 'id3', 'id4'],
        'com': [134.6, 223, 0, 123],
        'malicious': [False, False, True, False]
    })
    df2 = pd.DataFrame({
        'id': ['id7', 'id2', 'id5', 'id6'],
        'com': [134.6, 27.6, 0, 123],
        'malicious': [False, False, False, False]
    })

    df1
        id    com  malicious
    0  id1  134.6      False
    1  id2  223.0      False
    2  id3    0.0       True
    3  id4  123.0      False

    df2
        id  com  malicious  date
    0  …
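
A minimal sketch of one way to get that result with pandas, assuming the goal is to keep every id from both frames and let the df2 row win whenever an id appears in both (as id2 does here). It stacks the frames with `pd.concat` and removes duplicate ids with `drop_duplicates(keep='last')`. The `date` column from the truncated df2 printout is left out because the constructor in the question does not define it; `merged` is an illustrative name.

```python
import pandas as pd

# Example frames copied from the question (the "date" column shown in the
# truncated df2 printout is not in the constructor, so it is omitted here).
df1 = pd.DataFrame({
    'id': ['id1', 'id2', 'id3', 'id4'],
    'com': [134.6, 223, 0, 123],
    'malicious': [False, False, True, False],
})
df2 = pd.DataFrame({
    'id': ['id7', 'id2', 'id5', 'id6'],
    'com': [134.6, 27.6, 0, 123],
    'malicious': [False, False, False, False],
})

# Stack df1 on top of df2, then keep the *last* occurrence of each id,
# so the df2 row wins whenever the same id exists in both frames.
merged = (
    pd.concat([df1, df2], ignore_index=True)
      .drop_duplicates(subset='id', keep='last')
      .reset_index(drop=True)
)
print(merged)
```

Switching to `keep='first'` would prefer the df1 row instead. The result follows concatenation order, so the replaced id2 row ends up where it sits in df2.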

Python - the best way to create a new dataframe from two other dataframes with different shapes?

Question: Essentially, I'm trying to build a new DataFrame from two others, but the situation is a little complicated and I'm not sure of the best way to do it. In DF1, each row holds data about an object identified by an ID, and it looks something like this:

    ID  Name  datafield1  datafield2
    1   Foo   info1       info2
    2   bar   info3       info4
    3   Foos  info5       info6

DF2 has monthly data about each object, formatted like this:

    ID  Name  Month  data
    1   Foo   1/20   53.6
    1   Foo   2/20   47.2
    1   Foo   3/20   12.7
    1   Foo   4/20   3.2
    2   Bar   1/20   82.2
    2   Bar   2…
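
The question is cut off before the desired output, so the target shape is unknown. Assuming the goal is one row per object with each month spread into its own column, a minimal sketch pivots DF2 with `DataFrame.pivot` and left-joins the result onto DF1 by `ID`. Only the fully visible DF2 rows are used, and the names `monthly` and `combined` are illustrative.

```python
import pandas as pd

# Rows copied from the question; DF2 is truncated in the excerpt, so only
# the fully visible rows are used here.
df1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Foo', 'bar', 'Foos'],
    'datafield1': ['info1', 'info3', 'info5'],
    'datafield2': ['info2', 'info4', 'info6'],
})
df2 = pd.DataFrame({
    'ID': [1, 1, 1, 1, 2],
    'Name': ['Foo', 'Foo', 'Foo', 'Foo', 'Bar'],
    'Month': ['1/20', '2/20', '3/20', '4/20', '1/20'],
    'data': [53.6, 47.2, 12.7, 3.2, 82.2],
})

# Assumption: one output row per object, one column per month.
# Pivot the long monthly table to wide form and turn ID back into a column.
monthly = df2.pivot(index='ID', columns='Month', values='data').reset_index()

# Left-join on ID only (the Name spellings differ, e.g. 'bar' vs 'Bar');
# objects with no monthly data (ID 3 here) get NaN in the month columns.
combined = df1.merge(monthly, on='ID', how='left')
print(combined)
```

`pivot` raises an error if an (ID, Month) pair occurs twice; `pivot_table` with an aggregation function handles that case. If you instead want one row per month with DF1's fields repeated, `df2.merge(df1, on='ID')` is the simpler direction, and pandas will suffix the two `Name` columns as `Name_x` and `Name_y` by default.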

How to print rolling window equation process from pandas dataframe in python?

Question: I created a sample pandas DataFrame and tried to sum every 3 rows:

    import pandas as pd
    import numpy as np

    d = {'A': [100, 110, 120, 175, 164, 169, 155, 153, 156, 200]}
    df = pd.DataFrame(d)

         A
    0  100
    1  110
    2  120
    3  175
    4  164
    5  169
    6  155
    7  153
    8  156
    9  200

The 3-row sum:

    0      NaN
    1      NaN
    2    330.0    # this is the result
    3    405.0
    4    459.0
    5    508.0
    6    488.0
    7    477.0
    8    464.0
    9    509.0
    Name: sum, dtype: float64

I want to display the calculation behind each value, like this:

    NaN
    NaN
    330.0 = 100+110+120
    405.0 = 110+120+175
    459.0 …
    508.0 …
    488.0 …
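
A minimal sketch, assuming the 3-row sum comes from `Series.rolling(3).sum()` (the question does not show that line) and is stored in a column named `sum` to match the `Name: sum` in the output. `Rolling.apply` is designed to return one numeric value per window, so the equation strings are built in an ordinary loop over row positions; `describe` and `equation` are hypothetical names.

```python
import pandas as pd

d = {'A': [100, 110, 120, 175, 164, 169, 155, 153, 156, 200]}
df = pd.DataFrame(d)

window = 3
# Rolling sum over the current row and the two rows before it.
df['sum'] = df['A'].rolling(window).sum()

def describe(i):
    """Return 'total = a+b+c' for row i, or 'NaN' while the window is incomplete."""
    if i < window - 1:
        return 'NaN'
    # The values that went into this row's sum.
    parts = df['A'].iloc[i - window + 1 : i + 1]
    return f"{df['sum'].iloc[i]} = " + '+'.join(str(v) for v in parts)

df['equation'] = [describe(i) for i in range(len(df))]
print(df['equation'].to_string(index=False))
```

Changing `window` changes both the sum and the number of terms shown, since both use the same slice length.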

How to extract exact matches with list from a dataframe column?

Question: I have a large DataFrame with text that I want to search for matches against a list of words (around 1,000 words). I have managed to detect whether a word from the list is present, but I also need to know which word matched. Sometimes a text exactly matches more than one word from the list, and I would like to get all of them. I tried the code below, but it gives me partial matches (syllables instead of full words):

    # this is code to recreate the …
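
A minimal sketch using a word-boundary regex, assuming "exact match" means whole words. The sample rows and word list are hypothetical because the question's reproduction code is cut off. `Series.str.findall` returns every match per row, so a text containing several listed words keeps all of them.

```python
import re
import pandas as pd

# Hypothetical sample data; the question's own reproduction code is truncated.
df = pd.DataFrame({'text': [
    'the cat sat on the mat',
    'concatenate strings',
    'a dog and a cat',
]})
words = ['cat', 'dog', 'mat']

# \b anchors the alternation at word boundaries, so 'cat' does not match
# inside 'concatenate'. re.escape guards against regex metacharacters in words.
pattern = r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b'

# findall returns a list of every whole-word match per row (possibly empty).
df['matches'] = df['text'].str.findall(pattern)
print(df)
```

For case-insensitive matching, pass `flags=re.IGNORECASE` to `str.findall`.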

Pandas DataFrame: how to reference multiple subsets of rows from itself?

Question: I want to build a DataFrame that combines several row subsets of itself. For example, given a DataFrame holding [1, 2, 3, 4, 5, 6, 7, 8, 9], I want to build a DataFrame from iloc[0:3] and iloc[6:9], which results in [1, 2, 3, 7, 8, 9]. Currently I am doing it like this, which keeps copying data and is very slow:

    if my_df is not None:
        domain += 1
        new_domain = df.iloc[begin_iloc:begin_of_next_iloc]
        new_domain['domain'] = domain
        my_df = my_df.append(new_domain)
    else:
        my_df = df.iloc[begin_iloc: …
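
Two minimal sketches, assuming positional ranges as in the example. `np.r_` turns several slices into one index array, so a single `iloc` call does the selection. When each range needs its own label, like the `domain` counter in the question, collecting the pieces in a list and calling `pd.concat` once avoids the repeated copying of appending inside a loop; the ranges and the names `ranges`, `pieces`, and `labelled` are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7, 8, 9]})

# Option 1: select several positional ranges in one iloc call.
# np.r_ concatenates the slices into one integer array: [0 1 2 6 7 8].
subset = df.iloc[np.r_[0:3, 6:9]]

# Option 2: label each range (like the question's 'domain' counter),
# collect the slices in a list, and concatenate once at the end.
ranges = [(0, 3), (6, 9)]  # hypothetical (begin, end) positions
pieces = [
    df.iloc[begin:end].assign(domain=domain)  # assign returns a new frame
    for domain, (begin, end) in enumerate(ranges, start=1)
]
labelled = pd.concat(pieces)

print(subset)
print(labelled)
```

`DataFrame.append`, used in the question, was deprecated in pandas 1.4 and removed in 2.0, so a single `pd.concat` is also the forward-compatible choice. Using `assign` instead of setting a column on an `iloc` slice also avoids the `SettingWithCopyWarning` that pattern can trigger.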

How to compare the rows of two dataframes in R

Question: I'm trying to compare two columns from different data frames to create a new data frame. If the value in a row of the first column is less than the value in the second, a 1 goes into the new column; when it is greater, a 2, and so on. Here is an example. I have this data frame:

    df1 <- data.frame(col = c(1, seq(1:9), 9, 10))
    #    col
    # 1    1
    # 2    1
    # 3    2
    # 4    3
    # 5    4
    # 6    5
    # 7    6
    # 8    7
    # 9    8
    # 10   9
    # 11   9
    # 12  10

And this one, which has fewer rows:

    df2 <- data.frame(col2 = c(3, 6, 8))
    #   col2
    # 1    3
    # 2    6
    # 3    8