pandas

INSERT or UPDATE bulk data from dataframe/CSV to PostgreSQL database

帅比萌擦擦* submitted on 2021-02-19 08:57:05
Question: Requirement: insert new data and update existing data in bulk (row count > 1000) from a dataframe/CSV (whichever suits) and save it in a PostgreSQL database.

Table: TEST_TABLE

    CREATE TABLE TEST_TABLE (
        itemid  varchar(100) NOT NULL PRIMARY KEY,
        title   varchar(255),
        street  varchar(10),
        pincode varchar(100)
    );

INSERT:

    ['756252', 'tom title', 'APC Road', '598733'],
    ['75623', 'dick title', 'Bush Road', '598787'],
    ['756211', 'harry title', 'Obama Street', '598733']

dataframe content: data = [[ …
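
A common way to do a bulk upsert in PostgreSQL is INSERT ... ON CONFLICT ... DO UPDATE. Below is a minimal sketch using psycopg2's execute_values; the connection parameters are placeholders, and the column list assumes the TEST_TABLE schema above:

    import pandas as pd
    import psycopg2
    from psycopg2.extras import execute_values

    df = pd.DataFrame(
        [['756252', 'tom title', 'APC Road', '598733'],
         ['75623', 'dick title', 'Bush Road', '598787'],
         ['756211', 'harry title', 'Obama Street', '598733']],
        columns=['itemid', 'title', 'street', 'pincode'])

    conn = psycopg2.connect(dbname='mydb', user='me', host='localhost')  # placeholder credentials

    upsert_sql = """
        INSERT INTO TEST_TABLE (itemid, title, street, pincode)
        VALUES %s
        ON CONFLICT (itemid) DO UPDATE
        SET title   = EXCLUDED.title,
            street  = EXCLUDED.street,
            pincode = EXCLUDED.pincode
    """
    with conn, conn.cursor() as cur:
        # execute_values expands the VALUES %s placeholder for all rows in one round trip.
        execute_values(cur, upsert_sql, df.values.tolist())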

Generate a DataFrame that follows a mathematical function for each column / row

南笙酒味 submitted on 2021-02-19 08:24:22
Question: Is there a way to create/generate a Pandas DataFrame from scratch, such that each record follows a specific mathematical function? Background: in financial mathematics, very basic financial derivatives (e.g. calls and puts) have closed-form pricing formulas (e.g. Black-Scholes). These pricing formulas can be called stochastic functions (because they involve a random term). I'm trying to create a Monte Carlo simulation of a stock price (and subsequently an option payoff and price based on the …
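
One standard approach is to simulate geometric Brownian motion with NumPy and wrap the paths in a DataFrame. A minimal sketch; the drift, volatility, and grid parameters are illustrative assumptions, not values from the question:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(42)
    S0, mu, sigma = 100.0, 0.05, 0.2   # spot, drift, volatility (assumed)
    T, steps, paths = 1.0, 252, 5      # one year of daily steps, 5 sample paths
    dt = T / steps

    # GBM: S_t = S_0 * exp((mu - sigma^2 / 2) * t + sigma * W_t)
    z = rng.standard_normal((steps, paths))
    log_increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    prices = S0 * np.exp(np.cumsum(log_increments, axis=0))

    df = pd.DataFrame(prices, columns=[f'path_{i}' for i in range(paths)])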

Read CSV files in a for loop using pandas

半城伤御伤魂 submitted on 2021-02-19 08:10:23
Question:

    inp_file = os.getcwd()
    files_comp = pd.read_csv(inp_file, "B00234*.csv", na_values=missing_values, nrows=10)
    for f in files_comp:
        df_calculated = pd.read_csv(f, na_values=missing_values, nrows=10)
        col_length = len(df.columns) - 1

Hi folks, how can I read four CSV files in a for loop? I am getting an error when reading the CSVs in the above format. Kindly help me.

Answer 1: You basically need this: get a list of all target files, files = os.listdir(path), and then keep only the filenames that start with your …
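
Completing the answer's approach as a sketch with glob, which handles the filename pattern directly (the pattern and na_values keyword come from the question; the sentinel strings and directory are placeholders):

    import glob
    import os
    import pandas as pd

    missing_values = ['NA', 'n/a', '--']   # assumed sentinel strings
    path = os.getcwd()

    # List only the files matching the question's pattern.
    for f in glob.glob(os.path.join(path, 'B00234*.csv')):
        df_calculated = pd.read_csv(f, na_values=missing_values, nrows=10)
        col_length = len(df_calculated.columns) - 1   # index of the last column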

Pandas fill in missing dates in DataFrame with multiple columns

瘦欲@ submitted on 2021-02-19 07:41:48
Question: I want to add missing dates for a specific date range, but keep all columns. I found many posts using asfreq(), resample(), and reindex(), but they seemed to be for Series and I couldn't get them to work for my DataFrame. Given a sample dataframe:

    data = [{'id': '123', 'product': 'apple', 'color': 'red', 'qty': 10, 'week': '2019-3-7'},
            {'id': '123', 'product': 'apple', 'color': 'blue', 'qty': 20, 'week': '2019-3-21'},
            {'id': '123', 'product': 'orange', 'color': 'orange', 'qty': 8, …
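
A common pattern is to make the date column a DatetimeIndex and reindex it against a full pd.date_range. A sketch on two of the question's sample rows; the weekly frequency and the fill policies are assumptions:

    import pandas as pd

    data = [{'id': '123', 'product': 'apple', 'color': 'red', 'qty': 10, 'week': '2019-3-7'},
            {'id': '123', 'product': 'apple', 'color': 'red', 'qty': 20, 'week': '2019-3-21'}]
    df = pd.DataFrame(data)
    df['week'] = pd.to_datetime(df['week'])

    # Reindex onto the full weekly range; missing weeks appear as NaN rows.
    full_range = pd.date_range(df['week'].min(), df['week'].max(), freq='7D')
    out = df.set_index('week').reindex(full_range)

    # Fill the inserted rows: carry labels forward, zero the quantity (assumed policy).
    out[['id', 'product', 'color']] = out[['id', 'product', 'color']].ffill()
    out['qty'] = out['qty'].fillna(0)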

When using cut in a pandas dataframe to bin it, why is the binning not properly done?

为君一笑 submitted on 2021-02-19 07:40:29
Question: I have a dataframe that I want to bin (i.e., group into sub-ranges) by one column, and take the mean of the second column for each of the bins:

    import pandas as pd
    import numpy as np

    data = pd.DataFrame(columns=['Score', 'Age'])
    data.Score = [1, 1, 1, 1, 0, 1, 2, 1, 0, 1, 1, 0, 2, 1, 1, 2, 1, 0, 1, 1, -1, 1, 0, 1, 1, 0, 1, 0, -2, 1]
    data.Age = [29, 59, 44, 52, 60, 53, 45, 47, 57, 54, 35, 32, 48, 31, 49, 43, 67, 32, 31, 42, 37, 45, 52, 59, 56, 57, 48, 45, 56, 31]
    _, bins = np.histogram(data …
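
For reference, the usual bin-and-aggregate pattern pairs pd.cut with groupby; a sketch on the question's data (the bin edges here are assumed, not from the question):

    import numpy as np
    import pandas as pd

    data = pd.DataFrame({
        'Score': [1, 1, 1, 1, 0, 1, 2, 1, 0, 1, 1, 0, 2, 1, 1, 2, 1, 0, 1, 1, -1, 1, 0, 1, 1, 0, 1, 0, -2, 1],
        'Age':   [29, 59, 44, 52, 60, 53, 45, 47, 57, 54, 35, 32, 48, 31, 49, 43, 67, 32, 31, 42, 37, 45, 52, 59, 56, 57, 48, 45, 56, 31],
    })

    # include_lowest=True keeps a value equal to the lowest edge inside the
    # first bin; values outside the edges become NaN, a common cause of
    # "binning not properly done" surprises with cut.
    bins = np.arange(25, 71, 15)   # assumed edges: 25, 40, 55, 70
    data['age_bin'] = pd.cut(data['Age'], bins=bins, include_lowest=True)
    print(data.groupby('age_bin', observed=False)['Score'].mean())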

What are the efficient ways to parse / process huge JSON files in Python? [closed]

我是研究僧i submitted on 2021-02-19 07:35:07
Question (closed: "This question needs to be more focused. It is not currently accepting answers."): For my project I have to parse two big JSON files, one 19.7 GB and the other 66.3 GB. The structure of the JSON data is very complex: the first level is a dictionary, and at the second level there may be a list or a dictionary. These are all network log files; I have to …
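
With files this large, a streaming parser such as ijson processes records incrementally instead of loading everything into memory. A minimal sketch; the file name is a placeholder and the 'item' prefix assumes a top-level JSON array, which may not match the actual log layout:

    import ijson

    def process(record):
        # Placeholder for the real per-record handling.
        print(record)

    with open('network_log.json', 'rb') as f:
        # ijson.items yields each element of a top-level array (assumed layout)
        # without ever materializing the whole 60+ GB document.
        for record in ijson.items(f, 'item'):
            process(record)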

Create distribution in Pandas

瘦欲@ submitted on 2021-02-19 07:34:38
Question: I want to generate a random/simulated data set with a specific distribution. As an example, the distribution has the following properties:

- A population of 1000
- Gender mix: male 49%, female 50%, other 1%
- Age distribution: 0-30 (30%), 31-60 (40%), 61-100 (30%)

The resulting data frame would have 1000 rows and two columns called gender and age (with the above value distributions). Is there a way to do this in Pandas or another library?

Answer 1: You may try:

    N = 1000 …
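
The answer's N = 1000 presumably continues with NumPy sampling; one way to complete the idea as a sketch (drawing ages uniformly within each band is an assumption about the intended within-band shape):

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    N = 1000

    gender = rng.choice(['male', 'female', 'other'], size=N, p=[0.49, 0.50, 0.01])

    # Pick an age band per row with the stated weights, then a uniform age inside it.
    bands = [(0, 30), (31, 60), (61, 100)]
    band_idx = rng.choice(len(bands), size=N, p=[0.30, 0.40, 0.30])
    age = np.array([rng.integers(lo, hi + 1) for lo, hi in (bands[i] for i in band_idx)])

    df = pd.DataFrame({'gender': gender, 'age': age})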

pyodbc to SQLAlchemy connection

心已入冬 submitted on 2021-02-19 07:16:26
Question: I am trying to switch a pyodbc connection to SQLAlchemy. The working pyodbc connection is:

    import pyodbc
    con = 'DRIVER={ODBC Driver 11 for SQL Server};SERVER=server.com\pro;DATABASE=DBase;Trusted_Connection=yes'
    cnxn = pyodbc.connect(con)
    cursor = cnxn.cursor()
    query = "Select * from table"
    cursor.execute(query)

I tried:

    from sqlalchemy import create_engine
    dns = 'mssql+pyodbc://server.com\pro/DBase?driver=SQL+Server'
    engine = create_engine(dns)
    engine.execute('Select * from table').fetchall( …
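
A common fix is to URL-encode the whole working pyodbc string and hand it to SQLAlchemy via the odbc_connect query parameter, so the instance name and driver survive unescaped. A sketch reusing the question's own connection details, in SQLAlchemy 1.4+ text() style:

    import urllib.parse
    from sqlalchemy import create_engine, text

    # Reuse the already-working pyodbc connection string verbatim.
    con = (
        'DRIVER={ODBC Driver 11 for SQL Server};'
        r'SERVER=server.com\pro;'
        'DATABASE=DBase;'
        'Trusted_Connection=yes'
    )
    params = urllib.parse.quote_plus(con)
    engine = create_engine(f'mssql+pyodbc:///?odbc_connect={params}')

    with engine.connect() as conn:
        rows = conn.execute(text('Select * from table')).fetchall()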

Pandas dataframe to dict, while keeping duplicate rows

一笑奈何 submitted on 2021-02-19 07:02:47
Question: I have a dataframe that looks like this:

      kenteken status code
    0      XYZ      A  123
    1      XYZ      B  456
    2      ABC      C  789

And I want to convert it to a nested dictionary like this:

    {'XYZ': {'code': '123', 'status': 'A'}, {'code': '456', 'status': 'B'},
     'ABC': {'code': '789', 'status': 'C'}}

The closest I've been able to come was the following:

    df.groupby('kenteken')['status', 'code'].apply(lambda x: x.to_dict()).to_dict()

Which yields:

    {'ABC': {'status': {2: 'C'}, 'code': {2: '789'}}, 'XYZ': {'status': {0: 'A', 1: 'B' …
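
A sketch of one way to keep the duplicate kenteken rows: group them into a list of per-row dicts with to_dict('records'). The list-of-dicts shape is an assumption about the intended output, since the target literal above is not valid Python:

    import pandas as pd

    df = pd.DataFrame({'kenteken': ['XYZ', 'XYZ', 'ABC'],
                       'status':   ['A', 'B', 'C'],
                       'code':     ['123', '456', '789']})

    # One list of {'status': ..., 'code': ...} dicts per kenteken,
    # so duplicated keys keep all of their rows.
    out = (df.groupby('kenteken')[['status', 'code']]
             .apply(lambda g: g.to_dict('records'))
             .to_dict())
    # {'ABC': [{'status': 'C', 'code': '789'}],
    #  'XYZ': [{'status': 'A', 'code': '123'}, {'status': 'B', 'code': '456'}]}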