pd

[Repost] Pandas Study Notes (4): Handling Missing Values

徘徊边缘 submitted on 2019-12-02 21:59:32
Original: https://morvanzhou.github.io/tutorials/data-manipulation/np-pd/3-4-pd-nan/ (this repost is abridged)

Creating a matrix that contains NaN. When importing or processing data we sometimes end up with empty or NaN values; how to drop or fill them is the topic of this post. Build a 6x4 DataFrame and set two positions to NaN:

import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6,4)), index=dates, columns=['A','B','C','D'])
df.iloc[0,1] = np.nan
df.iloc[1,2] = np.nan
"""
             A     B     C   D
2013-01-01   0   NaN   2.0   3
2013-01-02   4   5.0   NaN   7
2013-01-03   8   9.0  10.0  11
2013-01-04  12  13.0  14.0  15
2013-01-05  16  17.0  18.0  19
2013-01-06  20  21.0  22.0  23
"""

Note: the functions below do not modify the data in place; each returns a new pandas.DataFrame. df.dropna() — if you want to directly drop rows that have
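The excerpt cuts off right at the dropna discussion. Below is a minimal sketch of the missing-value operations the original tutorial goes on to cover (dropna, fillna, isnull), reusing the same 6x4 DataFrame built above; the arguments shown are common defaults, not necessarily the exact ones used in the original post.

import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

# Drop every row that contains at least one NaN (returns a new DataFrame)
print(df.dropna(axis=0, how='any'))

# Fill NaN cells with a constant value instead of dropping the rows
print(df.fillna(value=0))

# Boolean mask of missing cells, plus a quick "is anything missing at all?" check
print(df.isnull())
print(np.any(df.isnull()))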

Drawing Peppa Pig with Python (turtle)

无人久伴 submitted on 2019-12-01 13:13:04
import turtle as t

t.pensize(4)
t.hideturtle()
t.colormode(255)
t.color((255,155,192), "pink")
t.setup(840,500)
t.speed(10)

# Nose
t.pu()
t.goto(-100,100)
t.pd()
t.seth(-30)
t.begin_fill()
a = 0.4
for i in range(120):
    if 0 <= i < 30 or 60 <= i < 90:
        a = a + 0.08
        t.lt(3)   # turn left 3 degrees
        t.fd(a)   # move forward by step length a
    else:
        a = a - 0.08
        t.lt(3)
        t.fd(a)
t.end_fill()

t.pu()
t.seth(90)
t.fd(25)
t.seth(0)
t.fd(10)
t.pd()
t.pencolor(255,155,192)
t.seth(10)
t.begin_fill()
t.circle(5)
t.color(160,82,45)
t.end_fill()

t.pu()
t.seth(0)
t.fd(20)
t.pd()
t.pencolor(255,155,192)
t.seth(10)
t.begin_fill()
t.circle(5)
t.color(160,82,45)
t.end_fill()

# Head
t
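The nose outline relies on one trick: turn a constant 3 degrees per step while the step length grows and shrinks, which bulges the curve outward and back in. A stripped-down sketch of just that technique, with illustrative numbers rather than the original's exact values:

import turtle as t

t.pensize(3)
t.speed(0)
step = 0.4
for i in range(120):
    # Grow the step on two quarter-arcs and shrink it on the others,
    # so the constant 3-degree turn traces a rounded, lobed outline.
    if 0 <= i < 30 or 60 <= i < 90:
        step += 0.08
    else:
        step -= 0.08
    t.left(3)
    t.forward(step)
t.done()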

[Solved] Error: have mixed types. Specify dtype option on import or set low_memory=False

风格不统一 submitted on 2019-12-01 12:12:12
Code that triggers the warning:

import pandas as pd
pd1 = pd.read_csv('D:/python34/program/wx_chat_single/qq_single.csv')

Warning output:

D:\python34\python.exe D:/python34/program/wx_chat_single/t1.py
sys:1: DtypeWarning: Columns (18) have mixed types. Specify dtype option on import or set low_memory=False.
Process finished with exit code 0

Solution:

import pandas as pd
pd1 = pd.read_csv('D:/python34/program/wx_chat_single/qq_single.csv', low_memory=False)

From: https://blog.csdn.net/u010212101/article/details/78017924
Source: https://www.cnblogs.com/hankleo/p/11684934.html
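The warning message also offers a second fix besides low_memory=False: declare the dtype of the offending column explicitly so pandas does not have to guess while reading chunks. A sketch, where 'msg_type' is a hypothetical stand-in for whatever column 18 is actually named:

import pandas as pd

# Declaring the column dtype up front avoids mixed-type inference entirely;
# 'msg_type' is a placeholder name for the real mixed-type column.
pd1 = pd.read_csv('D:/python34/program/wx_chat_single/qq_single.csv',
                  dtype={'msg_type': str})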

Assignment 2 (pandas)

微笑、不失礼 submitted on 2019-12-01 10:00:55
import pandas as pd

file_path = 'C:\\Users\\ASUS\\Desktop\\lw\\python高级设计test\\数据文件\\titanic.xlsx'
t = pd.DataFrame(pd.read_excel(file_path))

s = t['survived'].value_counts()
# value_counts is indexed by the value itself: 1 = survived, 0 = died
print('Survivors: {}\nDeaths: {}'.format(s[1], s[0]))

s = t['sex'].value_counts()
print('Males: {}\nFemales: {}'.format(s['male'], s['female']))

a = 0
b = 0
for i in t.index:
    if t['alive'][i] == 'yes':
        if t['sex'][i] == 'male':
            a += 1
        elif t['sex'][i] == 'female':
            b += 1
print('Male survivors: {}\nFemale survivors: {}'.format(a, b))

print(t['class'].value_counts())

t = pd.DataFrame(pd.read_excel(file_path))
a = t[['survived', 'pclass']]
print(a.corr())
print(t.boxplot(['fare'], ['pclass']))
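The hand-written loop over t.index can be expressed more idiomatically with boolean indexing and a cross-tabulation. A sketch assuming the same titanic.xlsx file and the same 'sex' and 'alive' columns as above:

import pandas as pd

# Assumes the same titanic.xlsx used above is available at this path.
t = pd.read_excel('C:\\Users\\ASUS\\Desktop\\lw\\python高级设计test\\数据文件\\titanic.xlsx')

# Count survivors per sex in one pass instead of a Python loop.
print(t[t['alive'] == 'yes']['sex'].value_counts())

# A cross-tabulation shows survivors and non-survivors per sex side by side.
print(pd.crosstab(t['sex'], t['alive']))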

R credit

江枫思渺然 submitted on 2019-12-01 07:46:40
library(devtools)
devtools::install_github("ayhandis/creditR")
library(creditR)
ls("package:creditR")

data("germancredit")
str(germancredit)
head(germancredit)

sample_data <- germancredit[, c("duration.in.month", "credit.amount",
                                "installment.rate.in.percentage.of.disposable.income",
                                "age.in.years", "creditability")]
sample_data$creditability <- ifelse(sample_data$creditability == "bad", 1, 0)

missing_ratio(sample_data)

traintest <- train_test_split(sample_data, 123, 0.70)
train <- traintest$train
test <- traintest$test

woerules <- woe.binning(df = train, target.var = "creditability", pred.var = train
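For readers who want the same preparation steps in Python rather than R, here is a rough equivalent using pandas and scikit-learn. This is not the creditR API: the CSV path is a placeholder, and only the column names quoted in the excerpt above are used.

import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder path: assumes the German credit data is available as a CSV
# with the same column names as in the R excerpt.
df = pd.read_csv('germancredit.csv')
sample_data = df[['duration.in.month', 'credit.amount',
                  'installment.rate.in.percentage.of.disposable.income',
                  'age.in.years', 'creditability']].copy()

# Encode the target: "bad" -> 1, everything else -> 0 (mirrors the ifelse above).
sample_data['creditability'] = (sample_data['creditability'] == 'bad').astype(int)

# Missing-value ratio per column (what creditR's missing_ratio reports).
print(sample_data.isnull().mean())

# 70/30 split with a fixed seed, analogous to train_test_split(sample_data, 123, 0.70).
train, test = train_test_split(sample_data, train_size=0.70, random_state=123)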

Mixed deployment of PD for TiDB clusters

↘锁芯ラ submitted on 2019-11-30 13:24:28
1. Summary
1.1 Problem: the PD components of several TiDB clusters were deployed on the same machines with identical PD service definitions, so PD could not start. Version: 2.1.2
1.2 Fix: resolved by changing the port-related parts of the relevant files.

2. Details
2.1 The specific problem
2.1.1 The systemd unit under /etc/systemd/system: pd.service
2.1.2 The PD start/stop scripts

[${deploy_dir}/scripts/start_pd.sh]
#!/bin/bash
set -e
# WARNING: This file was auto-generated. Do not edit!
# All your edit might be overwritten!
sudo systemctl start pd.service

[${deploy_dir}/scripts/stop_pd.sh]
#!/bin/bash
set -e
# WARNING: This file was auto-generated. Do not edit!
# All your edit might be overwritten!
sudo systemctl stop pd.service

2.2 Fix
On the TiDB control machine:
[1. Change the deployed] /work/tidb/tidb-ansible

Hive statements

◇◆丶佛笑我妖孽 submitted on 2019-11-29 19:01:32
1. Modify the hosts file: append the following at the end of the hosts file:

0.0.0.0 account.jetbrains.com
0.0.0.0 www.jetbrains.com

$> spark-sql --queue=dev --num-executors 10 --executor-memory 10G
$> show databases;
$> show tables
$> select * from dwm.cn_tl_base limit 1;
$> spark-sql --queue=dev --num-executors 10 --executor-memory 10G --hiveconf hive.cli.print.header=true
$> beeline -u jdbc:hive2://localhost:10005 -n liyingying

// Import data
// origin
sqoop import -D mapred.job.queue.name=prod \
  --connect 'jdbc:sqlserver://10.1.2.55:1433;database=docdbfamily' \
  --username 'sa' \
  --password 'password123456*' \
  --query 'select id,fid from fid_pn with

Common pandas operations

时间秒杀一切 submitted on 2019-11-29 12:01:09
Data import and export

# Import
pd.read_csv("z:/data.csv", low_memory=True, encoding="utf-8", sep=",")  # read a CSV file
pd.read_table(file_name)               # import from a delimited text file
pd.read_excel(file_name)               # import from an Excel file
pd.read_sql(query, connection_object)  # import from a SQL table/database
pd.read_json(json_string)              # import from a JSON-formatted string
pd.read_html(url)                      # parse a URL, string, or HTML file and extract the tables it contains
pd.DataFrame(dict)                     # build from a dict: keys are column names, values are the data

# Export
df.to_csv("z:/data_new.csv", index=False)     # do not write the row index (index and header both default to True)
df.to_excel(filename)                         # export to an Excel file
df.to_sql(table_name, connection_object)      # export to a SQL table
df.to_json(filename)                          # export to a text file in JSON format

Source:
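A minimal round-trip sketch tying a few of these calls together; the file and column names are placeholders, not taken from the original post:

import pandas as pd

# Build a small DataFrame from a dict: keys become column names, values become the data.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [90, 85, 77]})

# Export without the row index, then read it back.
df.to_csv("data_new.csv", index=False, encoding="utf-8")
df2 = pd.read_csv("data_new.csv", encoding="utf-8", sep=",")
print(df2)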

Pandas Library 10: reading and writing JSON and Excel files

五迷三道 submitted on 2019-11-29 03:16:28
# JSON file: JavaScript Object Notation
import numpy as np
import pandas as pd

t_data = {
    "name": ["唐浩", "小王", "老王", "赵三", "李四", "王姐"],
    "sex": ["男", "女", "男", "女", "男", "女"],
    "year": [37, 22, 15, 18, 33, 25],
    "city": ["成都", "北京", "上海", "成都", "深圳", "北京"]
}

# There are two ways to read a JSON file.
# Method 1 (requires `import json`):
# fj = open("json_data.json", encoding="utf-8")
# obj = fj.read()
# re = json.loads(obj)
# print(re)
# df = pd.DataFrame(re)
# print(df)

# Method 2: read_json() does it directly, a bit simpler than the above
# df_j = pd.read_json(open("json_data.json", encoding="utf-8"))
# print(df_j)
# # Re-sort by index
# df_j.sort_index()
# print(df_j)

# to_json() stores the data as JSON, but Chinese characters end up escaped — still to be solved
# df_j.to_json(open("json_data7.json",
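The escaped-Chinese issue the author leaves open is governed by to_json's force_ascii flag. A short sketch of that fix, reusing part of the t_data dict from the excerpt (the output file name is a placeholder):

import pandas as pd

t_data = {
    "name": ["唐浩", "小王"],
    "city": ["成都", "北京"],
}
df = pd.DataFrame(t_data)

# force_ascii defaults to True, which escapes non-ASCII characters as \uXXXX.
# Setting it to False writes the Chinese characters as-is (the file is UTF-8).
df.to_json("json_data_utf8.json", force_ascii=False)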