data-processing | 易学教程

How should I handle duplicate times in time series data with pandas?

I have the following returned from an API call as part of a larger dataset:

    {'Time': datetime.datetime(2017, 5, 21, 18, 18, 1, tzinfo=tzutc()), 'Price': '0.052600'}
    {'Time': datetime.datetime(2017, 5, 21, 18, 18, 1, tzinfo=tzutc()), 'Price': '0.052500'}

Ideally I would use the timestamp as the index of the pandas DataFrame. However, this appears to fail when converting to JSON because the index contains a duplicate:

    df = df.set_index(pd.to_datetime(df['Timestamp']))
    print(new_df.to_json(orient='index'))

    ValueError: DataFrame index must be unique for orient='index'.

Any guidance on the best way to deal with
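One possible way to approach the duplicate-timestamp error is sketched below. It is not taken from an answer on this page. It assumes the column is named `Time` as in the sample rows, uses the standard-library `timezone.utc` in place of `tzutc()`, and assumes that keeping the last price per timestamp is acceptable, which may not be true for your data.

```python
import datetime as dt

import pandas as pd

# Sample rows from the question; dt.timezone.utc stands in for dateutil's tzutc().
rows = [
    {"Time": dt.datetime(2017, 5, 21, 18, 18, 1, tzinfo=dt.timezone.utc), "Price": "0.052600"},
    {"Time": dt.datetime(2017, 5, 21, 18, 18, 1, tzinfo=dt.timezone.utc), "Price": "0.052500"},
]
df = pd.DataFrame(rows)
df["Price"] = df["Price"].astype(float)

# Option 1: keep every row. orient="records" does not require a unique index.
records_json = df.to_json(orient="records", date_format="iso")

# Option 2 (ASSUMPTION: the last price per timestamp is the one to keep):
# drop the duplicated timestamps first, then index by time.
unique_df = df.drop_duplicates(subset="Time", keep="last").set_index("Time")
index_json = unique_df.to_json(orient="index", date_format="iso")
```

Option 1 preserves all observations, while Option 2 discards data, so which one fits depends on whether the duplicates are meaningful trades or repeated readings.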

Conditional merge for CSV files using python (pandas)

I am trying to merge two or more files that share the same schema. The files will contain duplicate entries, but the duplicated rows won't be identical. For example:

file1:

    store_id,address,phone
    9191,9827 Park st,999999999
    8181,543 Hello st,1111111111

file2:

    store_id,address,phone
    9191,9827 Park st Apt82,999999999
    7171,912 John st,87282728282

Expected output:

    9191,9827 Park st Apt82,999999999
    8181,543 Hello st,1111111111
    7171,912 John st,87282728282

As you may notice, 9191,9827 Park st,999999999 and 9191,9827 Park st Apt82,999999999 are similar based on store_id and phone, but I picked the row from file2 since the address was
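A minimal sketch of one way to do this kind of merge follows. The excerpt is cut off before the selection rule is stated, so the code assumes that, among rows sharing `store_id` and `phone`, the row with the longest address should win. That rule is an assumption made for illustration, and the in-memory `StringIO` objects stand in for real file paths.

```python
import io

import pandas as pd

# In-memory stand-ins for file1 and file2 from the question.
file1 = io.StringIO(
    "store_id,address,phone\n"
    "9191,9827 Park st,999999999\n"
    "8181,543 Hello st,1111111111\n"
)
file2 = io.StringIO(
    "store_id,address,phone\n"
    "9191,9827 Park st Apt82,999999999\n"
    "7171,912 John st,87282728282\n"
)

# Read everything as text so ids and phone numbers are not reinterpreted.
frames = [pd.read_csv(f, dtype=str) for f in (file1, file2)]
combined = pd.concat(frames, ignore_index=True)

# ASSUMPTION: for rows with the same store_id and phone, keep the longest address.
combined["_addr_len"] = combined["address"].str.len()
best = combined.groupby(["store_id", "phone"], sort=False)["_addr_len"].idxmax()

# sort=False keeps the groups in order of first appearance.
merged = combined.loc[best.to_numpy(), ["store_id", "address", "phone"]].reset_index(drop=True)
```

If a different rule applies (for example, the most recent file wins), only the column used to rank rows within each group needs to change.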

Lexicon dictionary for synonym words

Question: There are a few dictionaries available for natural language processing, such as dictionaries of positive and negative words. Is there a dictionary that lists synonyms for all dictionary words? For example, synonyms for "nice": enjoyable, pleasant, pleasurable, agreeable, delightful, satisfying, gratifying, acceptable, to one's liking, entertaining, amusing, diverting, marvellous, good.

Answer 1: Although WordNet is a good resource to start with for finding synonyms, one must note its limitations,

Remove rows from dataframe that contains only 0 or just a single 0

Question: I am trying to create a function in R that will let me filter my data set based on whether a row contains a zero in any single column. Furthermore, sometimes I only want to remove rows that are zero in all columns. Also, and this is where it gets fun: not all columns contain numbers, and the number of columns can vary. I have tried to paste some of my data here with the results I want to obtain.

unfiltered:

    ID GeneName DU145small DU145total PC3small PC3total
    1 MIR22HG 33221.5 1224

Regular Expressions to insert “\r” every n characters in a line and before a complete word (basically a wordwrap feature)

I'm new to JavaScript and regular expressions. I'm trying to automatically format a text document to a specific number of characters per line, putting a "\r" before any word that would cross the limit. This is functionally similar to the word wrap found in numerous text editors. E.g., I want 10 characters per line:

    Original: My name is Davey Blue.
    Modified: My name \ris Davey \rBlue.

See, if the 10th character falls inside a word, the entire word is moved down to a new line. I'm thinking the following should work to some degree:

    /.{1,10}/

(This should find any 10 characters, right?) I'm not sure how to go about the rest. Please help. basically text

Python Pandas replace values by their opposite sign

I am trying to "clean" some data. I have values that are negative, which they cannot be, and I would like to replace every negative value with its corresponding positive value.

    A    | B    | C
    -1.9 | -0.2 | 'Hello'
    1.2  | 0.3  | 'World'

I would like this to become

    A   | B   | C
    1.9 | 0.2 | 'Hello'
    1.2 | 0.3 | 'World'

As of now I have just begun writing the replace statement:

    df.replace(df.loc[(df['A'] < 0) & (df['B'] < 0)],df * -1,inplace=True)

Please point me in the right direction.

Answer 1: Just call abs:

    In [349]: df = df.abs()
              df
    Out[349]:
         A    B
    0  1.9  0.2
    1  1.2  0.3

Another method would be to create a
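The answer's output shows only columns A and B, so the text column C appears to have been left out. As a hedged variation on the same `abs` idea, the sketch below restricts the call to numeric columns so that C survives untouched. The DataFrame is rebuilt here from the table in the question.

```python
import pandas as pd

# Rebuild the example table from the question.
df = pd.DataFrame(
    {"A": [-1.9, 1.2], "B": [-0.2, 0.3], "C": ["Hello", "World"]}
)

# Apply abs() only to numeric columns; string columns such as C are not touched.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].abs()
```

Selecting by dtype avoids hard-coding column names, which helps when the set of numeric columns varies between files.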

How to gracefully fallback to `NaN` value while reading integers from a CSV with Pandas?

While using read_csv with Pandas, if I want a given column to be converted to a type, a malformed value will interrupt the whole operation, without any indication of the offending value. For example, running something like:

    import pandas as pd
    import numpy as np

    df = pd.read_csv('my.csv', dtype={ 'my_column': np.int64 })

will lead to a stack trace ending with the error:

    ValueError: cannot safely convert passed user dtype of <i8 for object dtyped data in column ...

If I had the row number, or the offending value in the error message, I could add it to the list of known NaN values, but this
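One way to get both a graceful fallback and the offending values is sketched below: read the column as text, then convert it with `pd.to_numeric(errors="coerce")`. This is a sketch under assumptions and not an answer from this page. The CSV content and the `other` column are made up for illustration, and the nullable `Int64` dtype is used so that missing values can coexist with integers.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'my.csv'; "oops" is the malformed value.
csv_text = "my_column,other\n1,a\n2,b\noops,c\n4,d\n"

# Read the column as text first so parsing cannot fail on a bad value.
df = pd.read_csv(io.StringIO(csv_text), dtype={"my_column": str})

# Values that cannot be parsed as numbers become NaN instead of raising.
converted = pd.to_numeric(df["my_column"], errors="coerce")

# Rows that had a value in the file but failed to convert: the offenders.
bad_rows = df.loc[converted.isna() & df["my_column"].notna(), "my_column"]

# Nullable integer dtype keeps integers as integers alongside missing values.
df["my_column"] = converted.astype("Int64")
```

Here `bad_rows` keeps the original row index and raw text of each malformed entry, which is the information the error message in the question does not provide.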

CKEditor - remove script tag with data processor

I am quite new to CKEditor (I started using it 2 days ago) and I am still fighting with some configuration, such as removing the script tag from the editor. For example, if a user types the following in source mode:

    <script type="text/javascript">alert('hello');</script>

I would like to remove it. Looking at the documentation, I found that this can be done using an HTML filter. I defined one, but it does not work.

    var editor = ev.editor;
    var dataProcessor = editor.dataProcessor;
    var htmlFilter = dataProcessor && dataProcessor.htmlFilter;
    htmlFilter.addRules( {
        elements : {
            script : function(element) {