
Choosing pandas set_option settings in Python

旧街凉风, submitted on 2019-12-03 23:08:35

1. pd.set_option('expand_frame_repr', False) — True lets a wide DataFrame wrap across multiple lines; False disallows wrapping.
2. pd.set_option('display.max_rows', 10) and pd.set_option('display.max_columns', 10) — the maximum number of rows and columns to display; anything beyond is replaced with an ellipsis. This counts DataFrame columns, so with many columns and wrapping disabled the output can look messy.
3. pd.set_option('precision', 5) — the number of digits shown after the decimal point.
4. pd.set_option('large_repr', A) — 'truncate' truncates the output, 'info' shows a summary instead; 'truncate' is the usual choice.
5. pd.set_option('max_colwidth', 5) — the maximum column width.
6. pd.set_option('chop_threshold', 0.5) — values with absolute value below 0.5 are displayed as 0.0.
7. pd.set_option('colheader_justify', 'left') — whether column headers are centered or left-aligned.
8. pd.set_option('display.width', 200) — the maximum number of characters per line; the default of 80 is a poor fit for wide screens, so 200 is common in practice.

Source: https://www.cnblogs.com/Nicholasdong/p/11810775
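The settings above can be exercised as a short script. This is a sketch: modern pandas spells all of these with the `display.` prefix (the bare names in the post are older aliases, some now deprecated), and `'truncate'` is substituted for the post's placeholder `A`:

```python
import pandas as pd

pd.set_option("display.expand_frame_repr", False)   # don't wrap wide frames
pd.set_option("display.max_rows", 10)               # beyond this: ellipsis
pd.set_option("display.max_columns", 10)
pd.set_option("display.precision", 5)               # digits after the decimal point
pd.set_option("display.large_repr", "truncate")     # 'truncate' or 'info'
pd.set_option("display.max_colwidth", 5)            # max column width
pd.set_option("display.chop_threshold", 0.5)        # |x| < 0.5 prints as 0.0
pd.set_option("display.colheader_justify", "left")  # header alignment
pd.set_option("display.width", 200)                 # characters per line
```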

Earthworms (蚯蚓)

谁说我不能喝, submitted on 2019-12-03 20:07:14

NOIP 2016, Earthworms. Problem link. Approach: keep three queues, holding the original worms, the larger pieces produced by a cut, and the smaller pieces produced by a cut. All three queues turn out to be monotonically non-increasing, so each second it suffices to take the longest worm from among the three queue fronts.

#include <cstdio>
#include <algorithm>
#include <iostream>
#include <cstring>
#include <cmath>
using namespace std;

#define maxn 7000100
int n, m, q;
int u, v;
int t;
double p;
int cnt = 0;
int a[maxn], b[maxn], c[maxn];
int ans1[maxn];
int ans2[maxn];
int tot1 = 1, tot2l = 1, tot2r = 0, tot3l = 1, tot3r = 0;

bool cmp(int x, int y) { return x > y; }

int main() {
    scanf("%d%d%d%d%d%d", &n, &m, &q, &u, &v, &t);
    p = (double)u / v;
    for (int i = 1; i <= n; i++) {
        scanf("%d", &a[i]);
    }
    sort(a +
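The three-queue idea can also be sketched in Python. Everything below (the function name and the growth-offset trick of storing each worm minus the growth accrued so far) is a reconstruction of the standard approach, not code from the post:

```python
from collections import deque
from math import floor

def cut_sequence(worms, m, p, q):
    """Return the lengths of the m worms cut, one per second.

    q1 holds the original worms, q2 the larger and q3 the smaller
    pieces produced by cuts; all three stay non-increasing, so the
    longest living worm is always at one of the three fronts.
    """
    q1 = deque(sorted(worms, reverse=True))
    q2, q3 = deque(), deque()
    delta = 0          # total growth every living worm has accrued
    cuts = []
    for _ in range(m):
        # longest worm = max of the three queue fronts (stored values
        # share the same delta offset, so comparing them directly is fine)
        best = max((qq for qq in (q1, q2, q3) if qq), key=lambda qq: qq[0])
        x = best.popleft() + delta
        cuts.append(x)
        left = floor(p * x)          # the cut yields floor(p*x) and the rest
        right = x - left
        delta += q                   # every other worm grows by q...
        q2.append(left - delta)      # ...but the two new pieces do not,
        q3.append(right - delta)     # so pre-subtract the new offset
    return cuts
```

On the NOIP sample data (worms 3 3 2, p = 1/2, q = 1) the first cut has length 3, matching a direct simulation.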

pd.to_numeric converts entire series to NaN [duplicate]

Anonymous (unverified), submitted on 2019-12-03 02:42:02

Question: This question already has an answer here: Convert Pandas Dataframe to Float with commas and negative numbers (1 answer). I'm trying to convert a column using pd.to_numeric, but for some reason it turns all values (except one) into NaN:

In[]: pd.to_numeric(portfolio["Principal Remaining"], errors="coerce")
Out[]:
1        NaN
2        NaN
3        NaN
4        NaN
5        NaN
6        NaN
7        NaN
8        NaN
9        NaN
10       NaN
11       NaN
12       NaN
13       NaN
14       NaN
15       NaN
16       NaN
17       NaN
18    836.61
19       NaN
20       NaN
...
Name: Principal Remaining, Length: 32314, dtype: float64

Thoughts on why this is happening?
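The linked duplicate points at the usual cause: the column holds strings with thousands separators, which pd.to_numeric cannot parse, so errors="coerce" turns them into NaN. A small sketch with invented data (the real portfolio frame isn't shown in the excerpt):

```python
import pandas as pd

# Invented data mirroring the symptom: comma-formatted numbers coerce
# to NaN, while the one plain number survives.
s = pd.Series(["1,200.50", "3,400", "836.61"], name="Principal Remaining")

coerced = pd.to_numeric(s, errors="coerce")   # NaN, NaN, 836.61
fixed = pd.to_numeric(s.str.replace(",", "", regex=False))
```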

Adding new column to existing DataFrame in Python pandas

Anonymous (unverified), submitted on 2019-12-03 02:33:02

Question: I have the following indexed DataFrame with named columns and rows that are not continuous numbers:

          a         b         c         d
2  0.671399  0.101208 -0.181532  0.241273
3  0.446172 -0.243316  0.051767  1.577318
5  0.614758  0.075793 -0.451460 -0.012493

I would like to add a new column, 'e', to the existing data frame without changing anything else in it (the new column always has the same length as the DataFrame):

0   -0.335485
1   -1.166658
2   -0.385571
dtype: float64

I tried different versions of join, append, and merge, but I did not get the result I wanted.
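The usual answer: plain column assignment aligns on index labels, so a series indexed 0..2 only matches the row labelled 2 in a frame indexed 2, 3, 5; assigning the raw values sidesteps alignment. A sketch using an abridged version of the frame above:

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [0.671399, 0.446172, 0.614758],
     "b": [0.101208, -0.243316, 0.075793]},
    index=[2, 3, 5],                                 # non-continuous row labels
)
e = pd.Series([-0.335485, -1.166658, -0.385571])     # indexed 0, 1, 2

# Label alignment: only label 2 overlaps, so rows 3 and 5 become NaN
# and row 2 receives e's value at label 2 (-0.385571).
df["e_aligned"] = e

# Positional assignment: ignore e's index and attach the values as-is.
df["e"] = e.values
```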

Correlated sub query column in SPARK SQL is not allowed as part of a non-equality predicate

Anonymous (unverified), submitted on 2019-12-03 01:42:02

Question: I am trying to write a subquery in a WHERE clause like the one below, but I am getting "Correlated column is not allowed in a non-equality predicate:"

SELECT *, holidays
FROM (SELECT *, s.holidays, s.entity
      FROM transit_t tt
      WHERE (SELECT Count(thedate) AS holidays
             FROM fact_ent_rt
             WHERE entity = tt.awborigin
               AND (Substring(thedate, 1, 10))
                   BETWEEN (Substring(awbpickupdate, 1, 10))
                       AND (Substring(deliverydate, 1, 10))
               AND (nholidayflag = true OR weekendflag = true))) s

Are there any issues with this query? I thought Spark > 2.0 supported subqueries in WHERE clauses.
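Spark (at least through 2.x) only allows correlated columns in equality predicates, and here `tt.awborigin`'s row also drives the non-equality BETWEEN. One workaround, sketched below and untested against the real schema (table and column names are taken from the question; the join keys are a guess at what uniquely identifies a `transit_t` row), is to precompute the counts with a join plus GROUP BY and join them back:

```sql
SELECT tt.*, COALESCE(h.holidays, 0) AS holidays
FROM transit_t tt
LEFT JOIN (
    SELECT t2.awborigin, t2.awbpickupdate, t2.deliverydate,
           COUNT(f.thedate) AS holidays
    FROM transit_t t2
    JOIN fact_ent_rt f
      ON f.entity = t2.awborigin                      -- equality part
     AND SUBSTRING(f.thedate, 1, 10)
         BETWEEN SUBSTRING(t2.awbpickupdate, 1, 10)   -- non-equality part,
             AND SUBSTRING(t2.deliverydate, 1, 10)    -- now a join filter
    WHERE f.nholidayflag = true OR f.weekendflag = true
    GROUP BY t2.awborigin, t2.awbpickupdate, t2.deliverydate
) h
  ON h.awborigin = tt.awborigin
 AND h.awbpickupdate = tt.awbpickupdate
 AND h.deliverydate = tt.deliverydate
```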

ValueError: array is too big

Anonymous (unverified), submitted on 2019-12-03 01:39:01

Question: I am trying to merge two Excel files using the following code and am getting "ValueError: array is too big; arr.size * arr.dtype.itemsize is larger than the maximum possible size."

import pandas as pd
file1 = pd.read_excel("file1.xlsx")
file2 = pd.read_excel("file2.xlsx")
file3 = file1.merge(file2, on="Input E-mail", how="outer")
file3.to_excel("merged1.xlsx")

The file sizes are ~100MB + ~100MB, and 9GB of RAM (of 16GB) is available.

Answer 1: Your resulting dataframe can be much larger than your two input ones. Simple example:

import pandas as pd
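The answer cuts off at its example, but its point can be illustrated: with duplicated join keys, merge emits one output row per matching pair, so the result can be quadratically larger than the inputs. A toy sketch (column name from the question, data invented):

```python
import pandas as pd

# 1,000 rows on each side, all sharing one key value:
left = pd.DataFrame({"Input E-mail": ["a@x.com"] * 1000, "l": range(1000)})
right = pd.DataFrame({"Input E-mail": ["a@x.com"] * 1000, "r": range(1000)})

# Every left row pairs with every right row: 1,000 * 1,000 rows out
# of only 2,000 rows in.
merged = left.merge(right, on="Input E-mail", how="outer")
```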

'module' object has no attribute 'DataFrame' [closed]

Anonymous (unverified), submitted on 2019-12-03 01:33:01

Question: For the following code:

df = pd.DataFrame(np.random.rand(12, 2), columns=['Apples', 'Oranges'])
df['Categories'] = pd.Series(list('AAAABBBBCCCC'))
pd.options.display.mpl_style = 'default'
df.boxplot(by='Categories')

I get the error: 'module' object has no attribute 'DataFrame'. Any ideas on what is happening and how to fix this problem?

Answer 1: The code presented here doesn't show this discrepancy, but sometimes I get stuck when invoking dataframe in all lower case. Switching to camel case (pd.DataFrame()) cleans up the problem.

Answer 2: The most
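Neither truncated answer mentions it, but in practice the most common cause of this exact error is a local file (for example one named `pandas.py`) shadowing the installed library, so `import pandas` picks up a module without DataFrame. A quick diagnostic (a suggestion, not taken from the answers):

```python
import pandas as pd

# If this prints a path inside your project instead of site-packages,
# a local file is shadowing the real pandas package.
print(pd.__file__)
print(hasattr(pd, "DataFrame"))
```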

concat two pandas DataFrame with same column/index into one DataFrame

Anonymous (unverified), submitted on 2019-12-03 01:05:01

Question: I'm trying to concat multiple pandas.DataFrames to be saved in MongoDB, all in one collection. The dataframes share the same index/columns, and I want to save them as a single document using the to_json() method, so having every cell of the dataframe be a dict is probably a good approach. To accomplish that, I want to combine the dataframes like this:

df1:
index   A     B
1      'A1'  'B1'
2      'A2'  'B2'
3      'A3'  'B3'

df2:
index   A     B
1      'a1'  'b1'
2      'a2'  'b2'
3      'a3'  'b3'

Expected solution:

df_sol:
index   A                    B
1      {d1:'A1', d2:'a1'}   {d1:'B1', d2:'b1'}
2      {d1:'A2',
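One way to build the expected frame (a sketch, not from the truncated answers: it simply zips the shared columns together) is:

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]},
                   index=[1, 2, 3])
df2 = pd.DataFrame({"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]},
                   index=[1, 2, 3])

# For each shared column, pair the two cells into a {d1: ..., d2: ...}
# dict, preserving the common index.
df_sol = pd.DataFrame(
    {col: [{"d1": x, "d2": y} for x, y in zip(df1[col], df2[col])]
     for col in df1.columns},
    index=df1.index,
)
```

df_sol.to_json() should then serialize each cell as a nested object, which is the shape wanted for the MongoDB document.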

TypeError: unsupported operand type(s) for -: 'str' and 'str' in python 3.x Anaconda

Anonymous (unverified), submitted on 2019-12-03 01:00:01

Question: I am trying to count instances per hour in a large dataset. The code below seemed to work fine on Python 2.7, but I had to upgrade to the latest 3.x version of Python with all updated packages on Anaconda. When I try to execute the program, I get the following str error. Code:

import pandas as pd
from datetime import datetime, time
import numpy as np

fn = r'00_input.csv'
cols = ['UserId', 'UserMAC', 'HotspotID', 'StartTime', 'StopTime']
df = pd.read_csv(fn, header=None, names=cols)

df['m'] = df.StopTime + df.StartTime
df['d'] =
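The code stops right where the arithmetic on the time columns begins, but the error itself says the columns are still strings. A sketch of the usual fix, with invented sample rows since the real CSV isn't shown: parse the columns with pd.to_datetime before subtracting.

```python
import pandas as pd

df = pd.DataFrame({
    "StartTime": ["2019-12-03 01:00:00", "2019-12-03 02:30:00"],
    "StopTime":  ["2019-12-03 01:45:00", "2019-12-03 03:00:00"],
})

# Subtracting the raw string columns raises the TypeError from the title;
# after parsing, the subtraction yields a Timedelta column.
df["StartTime"] = pd.to_datetime(df["StartTime"])
df["StopTime"] = pd.to_datetime(df["StopTime"])
df["duration"] = df["StopTime"] - df["StartTime"]
```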

pandas.read_excel parameter “sheet_name” not working

Anonymous (unverified), submitted on 2019-12-03 00:53:01

Question: According to the doc, pandas.read_excel has a parameter sheet_name that allows specifying which sheet is read. But when I try to read the second sheet from an Excel file, no matter how I set the parameter (sheet_name=1, sheet_name='Sheet2'), the dataframe always shows the first sheet, and passing a list of indices (sheet_name=[0, 1]) does not return a dictionary of dataframes but still the first sheet. What might be the problem here?

Answer 1: You can try to use pd.ExcelFile:

xls = pd.ExcelFile('path_to_file.xls')
df1 = pd.read
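A likely explanation, not visible in the truncated answer: before pandas 0.21 the parameter was spelled `sheetname`, and on those versions an unrecognized `sheet_name` keyword could be silently ignored, so the default first sheet always came back. On a current pandas the parameter behaves as documented; a self-contained sketch writing a two-sheet workbook in memory (requires an Excel engine such as openpyxl):

```python
import io
import pandas as pd

buf = io.BytesIO()
with pd.ExcelWriter(buf) as xw:
    pd.DataFrame({"x": [1]}).to_excel(xw, sheet_name="Sheet1", index=False)
    pd.DataFrame({"x": [2]}).to_excel(xw, sheet_name="Sheet2", index=False)
buf.seek(0)

df2 = pd.read_excel(buf, sheet_name="Sheet2")   # second sheet by name
buf.seek(0)
both = pd.read_excel(buf, sheet_name=[0, 1])    # dict of DataFrames
```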