data-analysis | 易学教程

Splitting dictionary/list inside a Pandas Column and convert as new dataframe

Question: I have data saved in an Excel file. I am querying this data using Python 2.7 and turning it into a Pandas DataFrame. I have a column called CATEGORY in my DataFrame. It has a dictionary (or list?) of values within it. The DataFrame looks like this:

[1] df
ID  CATEGORY
1   {60: 'SHOES'}
2   {46: 'HARDWARE'}
3   {60: 'SHOES'}
4   {219: 'GOVERNMENT OFFICE'}
5   {87: 'ARCADES', 60: 'SHOES'}

I need to split this column into separate columns so that the DataFrame looks like this:

[2] df2
CATEGORY_ID  CATEGORY_NAME
60
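A minimal sketch of one way to do the split described in the question. The target table in the excerpt is cut off, so this assumes the desired result has one row per key/value pair; the sample frame is rebuilt by hand from the rows shown above.

```python
import pandas as pd

# Sample data rebuilt from the question; each CATEGORY cell holds a dict.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "CATEGORY": [
        {60: "SHOES"},
        {46: "HARDWARE"},
        {60: "SHOES"},
        {219: "GOVERNMENT OFFICE"},
        {87: "ARCADES", 60: "SHOES"},
    ],
})

# Flatten every dict into (key, value) pairs, one output row per pair.
# Assumption: a cell with two entries should produce two rows.
rows = [
    (category_id, category_name)
    for mapping in df["CATEGORY"]
    for category_id, category_name in mapping.items()
]

df2 = pd.DataFrame(rows, columns=["CATEGORY_ID", "CATEGORY_NAME"])
print(df2)
```

If the cells arrive as strings rather than real dicts (which can happen when reading from Excel), they would need to be parsed first, for example with `ast.literal_eval`.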

Plotting event density in Python with ggplot and pandas

Question: I am trying to visualize data of this form:

   timestamp  senderId
0  735217     106758968942084595234
1  735217     114647222927547413607
2  735217     106758968942084595234
3  735217     106758968942084595234
4  735217     114647222927547413607
5  etc...

geom_density works if I don't separate the senderIds:

df = pd.read_pickle('data.pkl')
df.columns = ['timestamp', 'senderId']
plot = ggplot(aes(x='timestamp'), data=df) + geom_density()
print(plot)

The result looks as expected. However, if I want to show the senderIds

What names can be used in plt.cm.get_cmap?

Question: I have this code:

plt.scatter(data_projected[:, 0], data_projected[:, 1], c=digits.target, edgecolors='none', alpha=0.5, cmap=plt.cm.get_cmap('nipy_spectral', 10));

My confusion comes from plt.cm.get_cmap('nipy_spectral', 10). Sometimes there will be plt.cm.get_cmap('RdYlBu') instead. Are 'RdYlBu' and 'nipy_spectral' the names of colors? Are there any other names I can use instead? Is there a list of all the colors available? I have read the documentation but it does not seem to help or I do not understand
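As a hedged illustration of the names in question: 'nipy_spectral' and 'RdYlBu' are names of registered colormaps, not single colors. The sketch below lists the registered names with `plt.colormaps()` and uses `plt.get_cmap` in place of the `plt.cm.get_cmap` call from the question; the variable names are only for illustration.

```python
import matplotlib.pyplot as plt

# All colormap names currently registered with Matplotlib.
names = plt.colormaps()
print(len(names), "colormaps registered")
print("nipy_spectral" in names, "RdYlBu" in names)

# The second argument asks for a version of the colormap
# resampled to that many discrete colors (here 10).
cmap = plt.get_cmap("nipy_spectral", 10)
print(cmap.name, cmap.N)
```

Any name printed by `plt.colormaps()` can be passed as the `cmap` argument of `plt.scatter`.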

Combine date column and time column into datetime column

Question: I have a Pandas DataFrame like this (obtained by parsing an Excel file):

|       | COMPANY NAME            | MEETING DATE        | MEETING TIME |
|-------|-------------------------|---------------------|--------------|
| YKSGR | YAPI KREDİ SİGORTA A.Ş. | 2013-12-16 00:00:00 | 14:00:00     |
| TRCAS | TURCAS PETROL A.Ş.      | 2013-12-12 00:00:00 | 13:30:00     |

Column MEETING DATE is a timestamp with a representation like Timestamp('2013-12-20 00:00:00', tz=None) and MEETING TIME is a datetime.time object with a representation like
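A minimal sketch of combining the two columns, assuming MEETING DATE holds pandas timestamps and MEETING TIME holds `datetime.time` objects as the question describes. The sample frame is rebuilt by hand, and the column name MEETING DATETIME is a hypothetical choice.

```python
import datetime as dt

import pandas as pd

# Sample data rebuilt from the two rows shown in the question.
df = pd.DataFrame(
    {
        "COMPANY NAME": ["YAPI KREDİ SİGORTA A.Ş.", "TURCAS PETROL A.Ş."],
        "MEETING DATE": pd.to_datetime(["2013-12-16", "2013-12-12"]),
        "MEETING TIME": [dt.time(14, 0), dt.time(13, 30)],
    },
    index=["YKSGR", "TRCAS"],
)

# Pair each date with its time and merge them into one datetime.
# "MEETING DATETIME" is a made-up name for the new column.
df["MEETING DATETIME"] = [
    dt.datetime.combine(date.date(), time)
    for date, time in zip(df["MEETING DATE"], df["MEETING TIME"])
]

print(df[["MEETING DATE", "MEETING TIME", "MEETING DATETIME"]])
```

The list comprehension is easy to read but not vectorized, so for large frames a conversion of the time column to timedeltas may be preferable.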

Combining csv files with mismatched columns

`error: unbalanced parenthesis` while checking if an item presents in a pandas dataframe

Question:

df = pd.DataFrame({"A": ["one", "two", "three"], "B": ["fopur", "give", "six"]})

When I run

df.B.str.contains("six").any()

I get

out[2]=True

When I run

df.B.str.contains("six)").any()

I get the error below:

C:\ProgramData\Anaconda3\lib\sre_parse.py in parse(str, flags, pattern)
    868     if source.next is not None:
    869         assert source.next == ")"
--> 870         raise source.error("unbalanced parenthesis")
    871
    872     if flags & SRE_FLAG_DEBUG:

error: unbalanced parenthesis at position 3

Please help!

Answer 1: You can set
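The error arises because `str.contains` treats its pattern as a regular expression by default, and "six)" contains an unmatched ")". The sketch below shows two possible workarounds, which are not necessarily what the truncated answer above goes on to recommend: a literal match via `regex=False`, or escaping the pattern with `re.escape`.

```python
import re

import pandas as pd

df = pd.DataFrame({"A": ["one", "two", "three"], "B": ["fopur", "give", "six"]})

# Option 1: switch off regex handling so ")" is matched literally.
literal_match = df["B"].str.contains("six)", regex=False).any()

# Option 2: keep regex handling but escape the special characters first.
escaped_match = df["B"].str.contains(re.escape("six)")).any()

print(literal_match, escaped_match)
```

Both calls search for the literal text "six)", which does not occur in column B, so neither raises the parsing error.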

Signal enhancing algorithm

Question: I need an algorithm (preferably in a Pascal-like language, but in the end it doesn't really matter) that will make the "signal" (actually a series of data points) on the left look like the one on the right. Signal origin: the signal is generated by a machine. To oversimplify, the machine measures the density of a liquid flowing through a transparent tube, so the signal is nothing like an electrical signal (audio/radio frequency). The data points could look like this: [1, 2, 1,

pandas: Calculated column based on values in one column

Question: I have columns like this in a CSV file (I load it using read_csv('fileA.csv', parse_dates=['ProcessA_Timestamp'])):

Item  ProcessA_Timestamp
'A'   2014-06-08 03:32:20
'B'   2014-06-08 03:32:20
'A'   2014-06-08 03:33:19
'C'   2014-06-08 03:33:20
'B'   2014-06-08 03:33:40
'D'   2014-06-08 03:38:20

How would I go about creating a column called ProcessA_ProcessingTime, which would be the time difference between the last time an item occurs in the table and the first time it occurs in the table? Similarly, I have other
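One possible approach, sketched with `groupby` and `transform` so that every row receives the span for its own item. The CSV content is rebuilt inline from the rows in the question (without the quotes around the item names, which is an assumption about the real file), and "first" and "last" are taken in table order.

```python
import io

import pandas as pd

# Inline stand-in for fileA.csv, rebuilt from the question's sample rows.
csv_text = """Item,ProcessA_Timestamp
A,2014-06-08 03:32:20
B,2014-06-08 03:32:20
A,2014-06-08 03:33:19
C,2014-06-08 03:33:20
B,2014-06-08 03:33:40
D,2014-06-08 03:38:20
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["ProcessA_Timestamp"])

# For each item, take its last and first timestamp in table order and
# broadcast the difference back onto every row of that item.
per_item = df.groupby("Item")["ProcessA_Timestamp"]
df["ProcessA_ProcessingTime"] = per_item.transform("last") - per_item.transform("first")

print(df)
```

Items that occur only once get a processing time of zero. If the rows are not sorted by time, `"max"` and `"min"` could be swapped in for `"last"` and `"first"`.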
