pandas | 易学教程

Text similarity using Word2Vec

Question: I would like to use Word2Vec to check the similarity of texts. I am currently using different logic:

    from fuzzywuzzy import fuzz

    def sim(name, dataset):
        matches = dataset.apply(lambda row: ((fuzz.ratio(row['Text'], name) ) = 0.5), axis=1)
        return

(name is my column). To apply this function I do the following:

    df['Sim'] = df.apply(lambda row: sim(row['Text'], df), axis=1)

Could you please tell me how to replace fuzz.ratio with Word2Vec in order to compare the texts in a dataset? Example of dataset: ...
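One common way to compare texts with word vectors is to average the vectors of the words in each text and take the cosine similarity of the two averages. The sketch below illustrates only that idea. The `toy_vectors` dictionary, the helper names `text_vector` and `cosine_sim`, and the sample strings are all made up for illustration; they are not a trained Word2Vec model and not the asker's data.

```python
import numpy as np

# Hypothetical 3-dimensional vectors standing in for a trained Word2Vec model.
toy_vectors = {
    "cheap": np.array([0.9, 0.1, 0.0]),
    "inexpensive": np.array([0.85, 0.15, 0.05]),
    "flight": np.array([0.1, 0.9, 0.2]),
    "ticket": np.array([0.2, 0.8, 0.3]),
    "banana": np.array([0.0, 0.1, 0.95]),
}


def text_vector(text, vectors):
    """Average the vectors of the known words in `text` (None if none are known)."""
    words = [w for w in text.lower().split() if w in vectors]
    if not words:
        return None
    return np.mean([vectors[w] for w in words], axis=0)


def cosine_sim(a, b, vectors):
    """Cosine similarity between the averaged vectors of two texts."""
    va, vb = text_vector(a, vectors), text_vector(b, vectors)
    if va is None or vb is None:
        return 0.0  # assumption: treat texts with no known words as dissimilar
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))


close = cosine_sim("cheap flight", "inexpensive ticket", toy_vectors)
far = cosine_sim("cheap flight", "banana", toy_vectors)
```

With a real model, the dictionary would be replaced by the model's vector lookup (for example, gensim's `KeyedVectors` supports `word in kv` and `kv[word]`), and the function could be applied per row with something like `df['Text'].apply(lambda t: cosine_sim(t, name, vectors))`.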

Grouping data by id, var1 into consecutive dates in python using pandas

Question: I have some data that looks like:

    df_raw_dates = pd.DataFrame({
        "id": [102, 102, 102, 103, 103, 103, 104],
        "var1": ['a', 'b', 'a', 'b', 'b', 'a', 'c'],
        "val": [9, 2, 4, 7, 6, 3, 2],
        "dates": [pd.Timestamp(2020, 1, 1), pd.Timestamp(2020, 1, 1),
                  pd.Timestamp(2020, 1, 2), pd.Timestamp(2020, 1, 2),
                  pd.Timestamp(2020, 1, 3), pd.Timestamp(2020, 1, 5),
                  pd.Timestamp(2020, 3, 12)]})

I want to group this data by id and var1 where the dates are consecutive; if a day is missed, I want to start a new record ...
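A possible sketch of the "start a new record when a day is missed" idea: within each (id, var1) group, flag rows whose date is not exactly one day after the previous row, then take a cumulative sum of the flags to number the runs. The excerpt is cut off before the desired output, so the final aggregation (start date, end date, sum of val) and the names `run`, `start`, and `end` are assumptions.

```python
import pandas as pd

df_raw_dates = pd.DataFrame({
    "id": [102, 102, 102, 103, 103, 103, 104],
    "var1": ["a", "b", "a", "b", "b", "a", "c"],
    "val": [9, 2, 4, 7, 6, 3, 2],
    "dates": [pd.Timestamp(2020, 1, 1), pd.Timestamp(2020, 1, 1),
              pd.Timestamp(2020, 1, 2), pd.Timestamp(2020, 1, 2),
              pd.Timestamp(2020, 1, 3), pd.Timestamp(2020, 1, 5),
              pd.Timestamp(2020, 3, 12)],
})

df = df_raw_dates.sort_values(["id", "var1", "dates"]).copy()

# True where a row starts a new run: first row of a group, or a gap other than 1 day.
new_run = df.groupby(["id", "var1"])["dates"].diff().dt.days.ne(1)

# Number the runs inside each (id, var1) group.
df["run"] = new_run.groupby([df["id"], df["var1"]]).cumsum()

# Assumed summary: one row per run with its date span and total val.
runs = (
    df.groupby(["id", "var1", "run"], as_index=False)
      .agg(start=("dates", "min"), end=("dates", "max"), val=("val", "sum"))
)
```

Here id 102 / var1 'a' has dates on 1 and 2 January, so both rows fall into a single run.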

Filling missing middle values in pandas dataframe

Question: I have a pandas dataframe df as

    Date  cost  NC
    20    5     NaN
    21    7     NaN
    23    9     78.0
    25    6     80.0

Now what I need to do is fill in the missing dates and fill each column with a value, say x, only if there is a number in the previous row. That is, I want output like

    Date  cost  NC
    20    5     NaN
    21    7     NaN
    22    x     NaN
    23    9     78.0
    24    x     x
    25    6     80.0

Date 22 was missing, and on 21 NC was missing, so on 22 cost is assigned x but NC is assigned NaN. Now setting the Date column to index and reindexing it to missing ...
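The rule described above (a newly inserted date gets x only where the previous row holds a number) can be sketched with a reindex followed by a forward-fill check. This is one possible reading of the excerpt, not the accepted answer; the literal string "x" is the placeholder from the question, and `added`, `fill_here`, and `filled` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date": [20, 21, 23, 25],
    "cost": [5, 7, 9, 6],
    "NC": [np.nan, np.nan, 78.0, 80.0],
})

# Insert the missing dates; the new rows are all-NaN at this point.
full = df.set_index("Date").reindex(range(df["Date"].min(), df["Date"].max() + 1))

# Rows that did not exist in the original frame.
added = ~full.index.isin(df["Date"])

# A cell qualifies if the last known value above it is a number...
fill_here = full.ffill().notna()
# ...and only in the newly inserted rows.
fill_here.loc[~added, :] = False

# Object dtype so numbers and the placeholder string can share a column.
filled = full.astype(object).mask(fill_here, "x")
```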

Python: Numpy and Pandas Transforming timestamp/data into one-hot-encoding

Question: I have a column of a dataframe that looks like this:

        time
    0   2017-03-01 15:30:00
    1   2017-03-01 16:00:00
    2   2017-03-01 16:30:00
    3   2017-03-01 17:00:00
    4   2017-03-01 17:30:00
    5   2017-03-01 18:00:00
    6   2017-03-01 18:30:00
    7   2017-03-01 19:00:00
    8   2017-03-01 19:30:00
    9   2017-03-01 20:00:00
    10  2017-03-01 20:30:00
    11  2017-03-01 21:00:00
    12  2017-03-01 21:30:00
    13  2017-03-01 22:00:00
    . . .

I want to "encode" the time of day. I want to do this by first assigning each half-hour an integer number. Starting ...
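A minimal sketch of the half-hour numbering followed by one-hot encoding. The excerpt is cut off before it says where the numbering starts, so this assumes slot 0 is 00:00 to 00:29 and slot 47 is 23:30 to 23:59; the names `slot` and `one_hot` are illustrative.

```python
import pandas as pd

# The 14 timestamps shown in the question.
times = pd.Series(
    pd.date_range("2017-03-01 15:30", periods=14, freq="30min"), name="time"
)

# Assumed numbering: two slots per hour, counted from midnight.
slot = times.dt.hour * 2 + times.dt.minute // 30

# One indicator column per slot that actually occurs in the data.
one_hot = pd.get_dummies(slot, prefix="slot")
```

If all 48 columns are needed even when some slots never occur, `one_hot` can be reindexed on the full list of column names with a fill value of 0.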

Pandas Dataframe nan values not replacing

Question: I am trying to replace values in my data frame which are listed as 'nan' (note, not 'NaN'). I've read in an Excel file, then tried to replace the nan values like this:

    All_items_df = ALL_df[df_items].fillna(' ')

In the end I get an output that still contains 'nan':

    All_items_df['Colour'].head(10)
    Out[]:
    7     nan
    8     nan
    9     nan
    10    nan
    13    nan
    14    nan
    15    nan
    16    nan
    18    nan
    19    nan
    Name: Colour, dtype: object

Checking the nan values using isna() or isnull().value.all() gives me False for the above values. Why is ...
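One explanation consistent with these symptoms is that the column holds the literal string 'nan' (for example after a conversion to str) rather than a real missing value, in which case `fillna` has nothing to fill. The sketch below assumes that is the cause; the sample Series is made up.

```python
import pandas as pd

# Assumed situation: the text "nan", not a missing value.
colour = pd.Series(["nan", "red", "nan"], index=[7, 8, 9], name="Colour", dtype=object)

# isna() does not see the string "nan" as missing, so fillna leaves it alone.
nothing_missing = not colour.isna().any()
unchanged = colour.fillna(" ")

# Replacing the literal string works instead.
cleaned = colour.replace("nan", " ")
```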

How to handle ValueError: Index contains duplicate entries using df.pivot or pd.pivot_table?

Question: I've got a table showing the accumulated number of hours (dataframe values) that different specialists (ID) have taken to complete a sequence of four tasks ['Task1', 'Task2', 'Task3', 'Task4'], like this:

Input:

       ID  Task1  Task2  Task3  Task4
    0  10      1      3      4      6
    1  11      1      3      4      5
    2  12      1      3      4      6

Now I'd like to reshape that dataframe so that I can easily find out which task each specialist was working on after 1 hour, 2 hours, and so on. So the desired output looks like this:

Desired output: value 1 3 4 5 6 ID 10 ...
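A sketch of one way to get from the input above to an ID-by-hours layout: melt the task columns into long form, then pivot on the hour values. `DataFrame.pivot` raises "Index contains duplicate entries" when the same (ID, value) pair occurs more than once; `pivot_table` with an explicit `aggfunc` avoids that by choosing one entry. Picking "first" here is an assumption, as are the names `long` and `wide`.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [10, 11, 12],
    "Task1": [1, 1, 1],
    "Task2": [3, 3, 3],
    "Task3": [4, 4, 4],
    "Task4": [6, 5, 6],
})

# Long form: one row per (ID, task) with its accumulated hours.
long = df.melt(id_vars="ID", var_name="task", value_name="value")

# One column per distinct hour value; cells hold the task name.
# aggfunc="first" keeps the earliest task column if (ID, value) repeats.
wide = long.pivot_table(index="ID", columns="value", values="task", aggfunc="first")
```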

Faking whether an object is an Instance of a Class in Python

Question: Suppose I have a class FakePerson which imitates all the attributes and functionality of a base class RealPerson without extending it. In Python 3, is it possible to fake isinstance() so that FakePerson is recognised as a RealPerson object by modifying only the FakePerson class? For example:

    class RealPerson():
        def __init__(self, age):
            self.age = age

        def are_you_real(self):
            return 'Yes, I can confirm I am a real person'

        def do_something(self):
            return 'I did something'

        # Complicated ...
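One technique that fits this constraint is to override the `__class__` attribute on FakePerson: `isinstance()` falls back to an object's `__class__` when its real type does not match. The sketch below shows that approach with a shortened RealPerson; the FakePerson body is an assumption, since the excerpt is cut off before it appears.

```python
class RealPerson:
    def __init__(self, age):
        self.age = age

    def are_you_real(self):
        return 'Yes, I can confirm I am a real person'


class FakePerson:
    """Imitates RealPerson without inheriting from it."""

    def __init__(self, age):
        self.age = age

    @property
    def __class__(self):
        # isinstance() consults this attribute after checking type(obj).
        return RealPerson

    def are_you_real(self):
        return 'Yes, I can confirm I am a real person'


fake = FakePerson(30)
looks_real = isinstance(fake, RealPerson)   # True
actual_type = type(fake)                    # still FakePerson
```

Note that `type(fake)` is unaffected, so code that compares types directly will still see FakePerson.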

Decode one-hot dataframe in Pandas

Question: I have 2 dataframes with the data as below:

df1:

    id  name  age  likes
    0   A     21   rose
    1   B     22   apple
    2   C     30   grapes
    4   D     21   lily

df2:

    category  Fruit  Flower
    orange    1      0
    apple     1      0
    rose      0      1
    lily      0      1
    grapes    1      0

What I am trying to do is add another column to df1 which would contain the word 'Fruit' or 'Flower' depending on the one-hot encoding in df2 for that entry. I am looking for a purely pandas/numpy implementation. Any help would be appreciated.
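A minimal pandas-only sketch of one way to do this: `idxmax(axis=1)` turns each one-hot row of df2 into the name of the column holding the 1, and `map` looks that label up for every entry in `df1['likes']`. The new column name `type` is an assumption.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [0, 1, 2, 4],
    "name": ["A", "B", "C", "D"],
    "age": [21, 22, 30, 21],
    "likes": ["rose", "apple", "grapes", "lily"],
})
df2 = pd.DataFrame({
    "category": ["orange", "apple", "rose", "lily", "grapes"],
    "Fruit": [1, 1, 0, 0, 1],
    "Flower": [0, 0, 1, 1, 0],
})

# For each category, the name of the column that contains the 1.
labels = df2.set_index("category")[["Fruit", "Flower"]].idxmax(axis=1)

# Look up each 'likes' value; entries missing from df2 would become NaN.
df1["type"] = df1["likes"].map(labels)
```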