pandas

INSERT or UPDATE bulk data from dataframe/CSV to PostgreSQL database

帅比萌擦擦* submitted on 2021-02-19 08:57:05
Question: Requirement: insert new data and update existing data in bulk (row count > 1000) from a dataframe/CSV (whichever suits) and save it in a PostgreSQL database.

Table: TEST_TABLE

    CREATE TABLE TEST_TABLE (
        itemid  varchar(100) NOT NULL PRIMARY KEY,
        title   varchar(255),
        street  varchar(10),
        pincode varchar(100)
    );

INSERT:

    ['756252', 'tom title', 'APC Road', '598733'],
    ['75623', 'dick title', 'Bush Road', '598787'],
    ['756211', 'harry title', 'Obama Street', '598733']

dataframe content: data = [[ …
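
A common way to do a bulk upsert in PostgreSQL is INSERT ... ON CONFLICT ... DO UPDATE. Below is a minimal sketch using psycopg2's execute_values; the connection parameters are placeholders, and the column list assumes the TEST_TABLE schema above:

    import pandas as pd
    import psycopg2
    from psycopg2.extras import execute_values

    df = pd.DataFrame(
        [['756252', 'tom title', 'APC Road', '598733'],
         ['75623', 'dick title', 'Bush Road', '598787'],
         ['756211', 'harry title', 'Obama Street', '598733']],
        columns=['itemid', 'title', 'street', 'pincode'])

    conn = psycopg2.connect(dbname='mydb', user='me', host='localhost')  # placeholder credentials

    upsert_sql = """
        INSERT INTO TEST_TABLE (itemid, title, street, pincode)
        VALUES %s
        ON CONFLICT (itemid) DO UPDATE
        SET title   = EXCLUDED.title,
            street  = EXCLUDED.street,
            pincode = EXCLUDED.pincode
    """
    with conn, conn.cursor() as cur:
        # execute_values expands the VALUES %s placeholder for all rows in one round trip.
        execute_values(cur, upsert_sql, df.values.tolist())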

Generate a DataFrame that follows a mathematical function for each column / row

南笙酒味 submitted on 2021-02-19 08:24:22
Question: Is there a way to create/generate a Pandas DataFrame from scratch, such that each record follows a specific mathematical function? Background: in financial mathematics, very basic financial derivatives (e.g. calls and puts) have closed-form pricing formulas (e.g. Black-Scholes). These pricing formulas can be called stochastic functions (because they involve a random term). I'm trying to create a Monte Carlo simulation of a stock price (and subsequently an option payoff and price based on the …
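
One standard approach is to simulate geometric Brownian motion with NumPy and wrap the paths in a DataFrame. A minimal sketch; the drift, volatility, and grid parameters are illustrative assumptions, not values from the question:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(42)
    S0, mu, sigma = 100.0, 0.05, 0.2   # spot, drift, volatility (assumed)
    T, steps, paths = 1.0, 252, 5      # one year of daily steps, 5 sample paths
    dt = T / steps

    # GBM: S_t = S_0 * exp((mu - sigma^2 / 2) * t + sigma * W_t)
    z = rng.standard_normal((steps, paths))
    log_increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    prices = S0 * np.exp(np.cumsum(log_increments, axis=0))

    df = pd.DataFrame(prices, columns=[f'path_{i}' for i in range(paths)])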

Read CSV files in a for loop using pandas

半城伤御伤魂 submitted on 2021-02-19 08:10:23
Question:

    inp_file = os.getcwd()
    files_comp = pd.read_csv(inp_file, "B00234*.csv", na_values=missing_values, nrows=10)
    for f in files_comp:
        df_calculated = pd.read_csv(f, na_values=missing_values, nrows=10)
        col_length = len(df.columns) - 1

Hi folks, how can I read four CSV files in a for loop? I am getting an error when reading the CSVs in the above format. Kindly help me.

Answer 1: You basically need this: get a list of all target files, files = os.listdir(path), and then keep only the filenames that start with your …
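
Completing the answer's approach as a sketch with glob, which handles the filename pattern directly (the pattern and na_values keyword come from the question; the sentinel strings and directory are placeholders):

    import glob
    import os
    import pandas as pd

    missing_values = ['NA', 'n/a', '--']   # assumed sentinel strings
    path = os.getcwd()

    # List only the files matching the question's pattern.
    for f in glob.glob(os.path.join(path, 'B00234*.csv')):
        df_calculated = pd.read_csv(f, na_values=missing_values, nrows=10)
        col_length = len(df_calculated.columns) - 1   # index of the last column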

Pandas fill in missing dates in DataFrame with multiple columns

瘦欲@ submitted on 2021-02-19 07:41:48
Question: I want to add missing dates for a specific date range, but keep all columns. I found many posts using asfreq(), resample(), and reindex(), but they seemed to be for Series and I couldn't get them to work for my DataFrame. Given a sample dataframe:

    data = [{'id': '123', 'product': 'apple', 'color': 'red', 'qty': 10, 'week': '2019-3-7'},
            {'id': '123', 'product': 'apple', 'color': 'blue', 'qty': 20, 'week': '2019-3-21'},
            {'id': '123', 'product': 'orange', 'color': 'orange', 'qty': 8, …
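
A common pattern is to make the date column a DatetimeIndex and reindex it against a full pd.date_range. A sketch on two of the question's sample rows; the weekly frequency and the fill policies are assumptions:

    import pandas as pd

    data = [{'id': '123', 'product': 'apple', 'color': 'red', 'qty': 10, 'week': '2019-3-7'},
            {'id': '123', 'product': 'apple', 'color': 'red', 'qty': 20, 'week': '2019-3-21'}]
    df = pd.DataFrame(data)
    df['week'] = pd.to_datetime(df['week'])

    # Reindex onto the full weekly range; missing weeks appear as NaN rows.
    full_range = pd.date_range(df['week'].min(), df['week'].max(), freq='7D')
    out = df.set_index('week').reindex(full_range)

    # Fill the inserted rows: carry labels forward, zero the quantity (assumed policy).
    out[['id', 'product', 'color']] = out[['id', 'product', 'color']].ffill()
    out['qty'] = out['qty'].fillna(0)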

When using cut in a pandas dataframe to bin it, why is the binning not properly done?

为君一笑 submitted on 2021-02-19 07:40:29
Question: I have a dataframe that I want to bin (i.e., group into sub-ranges) by one column, and take the mean of the second column for each of the bins:

    import pandas as pd
    import numpy as np

    data = pd.DataFrame(columns=['Score', 'Age'])
    data.Score = [1, 1, 1, 1, 0, 1, 2, 1, 0, 1, 1, 0, 2, 1, 1, 2, 1, 0, 1, 1, -1, 1, 0, 1, 1, 0, 1, 0, -2, 1]
    data.Age = [29, 59, 44, 52, 60, 53, 45, 47, 57, 54, 35, 32, 48, 31, 49, 43, 67, 32, 31, 42, 37, 45, 52, 59, 56, 57, 48, 45, 56, 31]
    _, bins = np.histogram(data …
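
For reference, the usual bin-and-aggregate pattern pairs pd.cut with groupby; a sketch on the question's data (the bin edges here are assumed, not from the question):

    import numpy as np
    import pandas as pd

    data = pd.DataFrame({
        'Score': [1, 1, 1, 1, 0, 1, 2, 1, 0, 1, 1, 0, 2, 1, 1, 2, 1, 0, 1, 1, -1, 1, 0, 1, 1, 0, 1, 0, -2, 1],
        'Age':   [29, 59, 44, 52, 60, 53, 45, 47, 57, 54, 35, 32, 48, 31, 49, 43, 67, 32, 31, 42, 37, 45, 52, 59, 56, 57, 48, 45, 56, 31],
    })

    # include_lowest=True keeps a value equal to the lowest edge inside the
    # first bin; values outside the edges become NaN, a common cause of
    # "binning not properly done" surprises with cut.
    bins = np.arange(25, 71, 15)   # assumed edges: 25, 40, 55, 70
    data['age_bin'] = pd.cut(data['Age'], bins=bins, include_lowest=True)
    print(data.groupby('age_bin', observed=False)['Score'].mean())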

What are the efficient ways to parse / process huge JSON files in Python? [closed]

我是研究僧i submitted on 2021-02-19 07:35:07
Question (closed: "This question needs to be more focused. It is not currently accepting answers."): For my project I have to parse two big JSON files, one 19.7 GB and the other 66.3 GB. The structure of the JSON data is very complex: the first level is a dictionary, and at the second level there may be a list or a dictionary. These are all network log files; I have to …
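
With files this large, a streaming parser such as ijson processes records incrementally instead of loading everything into memory. A minimal sketch; the file name is a placeholder and the 'item' prefix assumes a top-level JSON array, which may not match the actual log layout:

    import ijson

    def process(record):
        # Placeholder for the real per-record handling.
        print(record)

    with open('network_log.json', 'rb') as f:
        # ijson.items yields each element of a top-level array (assumed layout)
        # without ever materializing the whole 60+ GB document.
        for record in ijson.items(f, 'item'):
            process(record)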

Create distribution in Pandas

瘦欲@ submitted on 2021-02-19 07:34:38
Question: I want to generate a random/simulated data set with a specific distribution. As an example, the distribution has the following properties:

- A population of 1000
- Gender mix: male 49%, female 50%, other 1%
- Age distribution: 0-30 (30%), 31-60 (40%), 61-100 (30%)

The resulting data frame would have 1000 rows and two columns called gender and age (with the above value distributions). Is there a way to do this in Pandas or another library?

Answer 1: You may try:

    N = 1000 …
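
The answer's N = 1000 presumably continues with NumPy sampling; one way to complete the idea as a sketch (drawing ages uniformly within each band is an assumption about the intended within-band shape):

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    N = 1000

    gender = rng.choice(['male', 'female', 'other'], size=N, p=[0.49, 0.50, 0.01])

    # Pick an age band per row with the stated weights, then a uniform age inside it.
    bands = [(0, 30), (31, 60), (61, 100)]
    band_idx = rng.choice(len(bands), size=N, p=[0.30, 0.40, 0.30])
    age = np.array([rng.integers(lo, hi + 1) for lo, hi in (bands[i] for i in band_idx)])

    df = pd.DataFrame({'gender': gender, 'age': age})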

pyodbc to SQLAlchemy connection

心已入冬 submitted on 2021-02-19 07:16:26
Question: I am trying to switch a pyodbc connection to SQLAlchemy. The working pyodbc connection is:

    import pyodbc
    con = 'DRIVER={ODBC Driver 11 for SQL Server};SERVER=server.com\pro;DATABASE=DBase;Trusted_Connection=yes'
    cnxn = pyodbc.connect(con)
    cursor = cnxn.cursor()
    query = "Select * from table"
    cursor.execute(query)

I tried:

    from sqlalchemy import create_engine
    dns = 'mssql+pyodbc://server.com\pro/DBase?driver=SQL+Server'
    engine = create_engine(dns)
    engine.execute('Select * from table').fetchall( …
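
A common fix is to URL-encode the whole working pyodbc string and hand it to SQLAlchemy via the odbc_connect query parameter, so the instance name and driver survive unescaped. A sketch reusing the question's own connection details, in SQLAlchemy 1.4+ text() style:

    import urllib.parse
    from sqlalchemy import create_engine, text

    # Reuse the already-working pyodbc connection string verbatim.
    con = (
        'DRIVER={ODBC Driver 11 for SQL Server};'
        r'SERVER=server.com\pro;'
        'DATABASE=DBase;'
        'Trusted_Connection=yes'
    )
    params = urllib.parse.quote_plus(con)
    engine = create_engine(f'mssql+pyodbc:///?odbc_connect={params}')

    with engine.connect() as conn:
        rows = conn.execute(text('Select * from table')).fetchall()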

Pandas dataframe to dict, while keeping duplicate rows

一笑奈何 submitted on 2021-02-19 07:02:47
Question: I have a dataframe that looks like this:

      kenteken status code
    0      XYZ      A  123
    1      XYZ      B  456
    2      ABC      C  789

And I want to convert it to a nested dictionary like this:

    {'XYZ': {'code': '123', 'status': 'A'}, {'code': '456', 'status': 'B'},
     'ABC': {'code': '789', 'status': 'C'}}

The closest I've been able to come was the following:

    df.groupby('kenteken')['status', 'code'].apply(lambda x: x.to_dict()).to_dict()

Which yields:

    {'ABC': {'status': {2: 'C'}, 'code': {2: '789'}}, 'XYZ': {'status': {0: 'A', 1: 'B' …
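
A sketch of one way to keep the duplicate kenteken rows: group them into a list of per-row dicts with to_dict('records'). The list-of-dicts shape is an assumption about the intended output, since the target literal above is not valid Python:

    import pandas as pd

    df = pd.DataFrame({'kenteken': ['XYZ', 'XYZ', 'ABC'],
                       'status':   ['A', 'B', 'C'],
                       'code':     ['123', '456', '789']})

    # One list of {'status': ..., 'code': ...} dicts per kenteken,
    # so duplicated keys keep all of their rows.
    out = (df.groupby('kenteken')[['status', 'code']]
             .apply(lambda g: g.to_dict('records'))
             .to_dict())
    # {'ABC': [{'status': 'C', 'code': '789'}],
    #  'XYZ': [{'status': 'A', 'code': '123'}, {'status': 'B', 'code': '456'}]}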