pd

pdfbox: trying to decrypt PDF

Anonymous (unverified), submitted 2019-12-03 00:46:02
Question: Following this answer, I'm trying to decrypt a PDF document with pdfbox:

    PDDocument pd = PDDocument.load(path);
    if (pd.isEncrypted()) {
        try {
            pd.decrypt("");
            pd.setAllSecurityToBeRemoved(true);
        } catch (Exception e) {
            throw new Exception("The document is encrypted, and we can't decrypt it.");
        }
    }

This leads to:

    Exception in thread "main" java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
        at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1601)
        at org.apache.pdfbox.pdmodel.PDDocument.decrypt

Second assignment (pandas practice)

Anonymous (unverified), submitted 2019-12-03 00:14:01
    import pandas as pd

    t = pd.DataFrame(pd.read_excel('C:\\Users\\ASUS\\Desktop\\lw\\python高级设计test\\数据文件\\titanic.xlsx'))

    # value_counts() is indexed by the values themselves: 1 = survived, 0 = died
    s = t['survived'].value_counts()
    print('Survivors: {}\nDeaths: {}'.format(s[1], s[0]))

    # Look up by label so the result does not depend on the sort order
    s = t['sex'].value_counts()
    print('Males: {}\nFemales: {}'.format(s['male'], s['female']))

    # Count rescued passengers by sex
    a = 0
    b = 0
    for i in t.index:
        if t['alive'][i] == 'yes':
            if t['sex'][i] == 'male':
                a += 1
            elif t['sex'][i] == 'female':
                b += 1
    print("Males rescued: {}\nFemales rescued: {}".format(a, b))

    print(t['class'].value_counts())
    t = pd.DataFrame(pd.read_excel(
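The row-by-row loop in this snippet can also be written as a filtered count. The following is a minimal sketch, assuming a small made-up frame in place of titanic.xlsx; only the column names `sex` and `alive` come from the snippet, and the rows are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for titanic.xlsx; the values are made up.
t = pd.DataFrame({
    "sex":   ["male", "female", "female", "male", "female", "male"],
    "alive": ["no",   "yes",    "yes",    "yes",  "no",     "no"],
})

# Keep only the rescued passengers, then count them per sex.
rescued = t.loc[t["alive"] == "yes", "sex"].value_counts()

# Look counts up by label rather than by position, so the result does not
# depend on which category happens to be most frequent.
male_rescued = int(rescued.get("male", 0))
female_rescued = int(rescued.get("female", 0))
print("Males rescued: {}\nFemales rescued: {}".format(male_rescued, female_rescued))
```

Filtering first and counting afterwards avoids the explicit Python loop and tends to be easier to read on larger tables.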

R credit

Anonymous (unverified), submitted 2019-12-03 00:14:01
    library(devtools)
    devtools::install_github("ayhandis/creditR")
    library(creditR)
    ls("package:creditR")

    data("germancredit")
    str(germancredit)
    head(germancredit)

    sample_data <- germancredit[, c("duration.in.month", "credit.amount",
                                    "installment.rate.in.percentage.of.disposable.income",
                                    "age.in.years", "creditability")]
    # Recode the target: "bad" -> 1, anything else -> 0
    sample_data$creditability <- ifelse(sample_data$creditability == "bad", 1, 0)
    missing_ratio(sample_data)

    traintest <- train_test_split(sample_data, 123, 0.70)
    train <- traintest$train
    test <- traintest$test

    woerules <- woe.binning(df

Pandas notes: handling missing values

Anonymous (unverified), submitted 2019-12-03 00:03:02
    import pandas as pd
    import numpy as np

    data = pd.read_csv("./data/test.csv")
    print(data)
    print(pd.isnull(data))           # True where a value is missing, False otherwise
    print(np.any(pd.isnull(data)))   # True if anything is missing, False if not
    print(np.all(pd.notnull(data)))  # True if nothing is missing, False otherwise
    print(pd.isnull(data).any())     # per column: is anything missing?
    print(pd.notnull(data).all())    # per column: is nothing missing?
    print(data.dropna())             # drop every row that contains a missing value
    print(data.fillna("NULL"))       # replace missing values with "NULL"

    # Replace the "?" placeholder with NaN so that it counts as missing
    data_new = data.replace("?", value=np.nan)
    print(data_new.dropna())

Output:

       a  b    c    d
    0  1  2  3.0  4.0
    1  1  2  NaN  4.0
    2  1  ?  3.0  4.0
    3  1  2  3.0  4.0
    4  1  2  3.0  NaN
           a      b      c      d
    0  False  False  False  False
    1  False  False   True  False
    2  False  False
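The reason for the replace step is that dropna() only recognizes real NaN values, not placeholders such as "?". A minimal sketch, assuming the table printed in this snippet is rebuilt in memory rather than read from ./data/test.csv (storing column b as strings is an assumption about how the CSV would be parsed):

```python
import numpy as np
import pandas as pd

# Rebuild the printed table; "?" marks a value that is missing
# but was not encoded as NaN.
data = pd.DataFrame({
    "a": [1, 1, 1, 1, 1],
    "b": ["2", "2", "?", "2", "2"],
    "c": [3.0, np.nan, 3.0, 3.0, 3.0],
    "d": [4.0, 4.0, 4.0, 4.0, np.nan],
})

# dropna() only sees real NaN values, so the "?" row survives.
rows_before = len(data.dropna())

# Convert the placeholder to NaN first, then drop.
data_new = data.replace("?", np.nan)
rows_after = len(data_new.dropna())

# Per-column count of missing values after the replacement.
missing_per_column = data_new.isnull().sum()
print(rows_before, rows_after)
print(missing_per_column)
```

When reading from a file, passing `na_values="?"` to `pd.read_csv` is another way to get the same effect at load time.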

Data analysis: indexing time data with pandas

Anonymous (unverified), submitted 2019-12-02 23:42:01
Basic type: a Series indexed by timestamps -> DatetimeIndex (an index made of datetime values)

Creation: either pass a list of datetime objects as the index, or use pd.date_range()

    # Creation
    from datetime import datetime

    import numpy as np
    import pandas as pd

    # 1. Pass a list of datetime objects as the index
    date_list = [datetime(2017, 2, 18), datetime(2017, 2, 19),
                 datetime(2017, 2, 23), datetime(2017, 2, 24),
                 datetime(2017, 3, 3), datetime(2017, 3, 4)]
    time_s = pd.Series(np.random.randn(6), index=date_list)
    print(time_s)
    print(type(time_s.index))

    # 2. Use pd.date_range()
    dates = pd.date_range('2017-02-18',   # start date
                          periods=5,      # number of periods
                          freq='W-SAT')   # frequency: weekly, anchored on Saturday
    print(dates)
    print(pd.Series(np.random.randn(5),
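To make the effect of `freq='W-SAT'` concrete, here is a minimal sketch that builds the same range and then selects by a partial date string. The integer values and the variable names `all_saturdays` and `march` are illustrative, not from the original post.

```python
import numpy as np
import pandas as pd

# Five weekly timestamps anchored on Saturdays, starting 2017-02-18.
dates = pd.date_range("2017-02-18", periods=5, freq="W-SAT")
time_s = pd.Series(np.arange(5), index=dates)

# Every generated date falls on a Saturday (dayofweek == 5).
all_saturdays = bool((dates.dayofweek == 5).all())

# With a DatetimeIndex, a partial date string selects the whole month.
march = time_s.loc["2017-03"]
print(dates)
print(march)
```

Partial-string selection is one of the main conveniences of a DatetimeIndex compared with a plain list of datetime labels.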

pdf.js: manually rendering a PDF by calling its internal methods

Anonymous (unverified), submitted 2019-12-02 22:56:40
1. The tidied-up code; you can work through it yourself

    var url = '//cdn.mozilla.net/pdfjs/tracemonkey.pdf';
    PDFJS.workerSrc = '//mozilla.github.io/pdf.js/build/pdf.worker.js';

    var pdfDoc = null,
        pageNum = 1,
        pageRendering = false,
        pageNumPending = null,
        scale = 0.8,
        canvas = document.getElementById('the-canvas'),
        ctx = canvas.getContext('2d');

    /**
     * Get page info from document, resize canvas accordingly, and render page.
     * @param num Page number.
     */
    // Render one page of the PDF
    function renderPage(num) {
      pageRendering = true;
      pdfDoc.getPage(num).then(function(page) {
        var viewport = page.getViewport(scale);
        canvas.height = viewport.height;
        canvas

Drawing a Peppa Pig portrait in Python: so cute it brought a young fan to tears!

Anonymous (unverified), submitted 2019-12-02 22:51:30
    # coding:utf-8
    import turtle as t

    t.pensize(4)
    t.hideturtle()
    t.colormode(255)
    t.color((255,155,192),"pink")
    t.setup(840,500)
    t.speed(10)

    # Nose
    t.pu()
    t.goto(-100,100)
    t.pd()
    t.seth(-30)
    t.begin_fill()
    a=0.4
    for i in range(120):
        if 0<=i<30 or 60<=i<90:
            a=a+0.08
            t.lt(3)  # turn left by 3 degrees
            t.fd(a)  # move forward by a step of length a
        else:
            a=a-0.08
            t.lt(3)
            t.fd(a)
    t.end_fill()

    t.pu()
    t.seth(90)
    t.fd(25)
    t.seth(0)
    t.fd(10)
    t.pd()
    t.pencolor(255,155,192)
    t.seth(10)
    t.begin_fill()
    t.circle(5)
    t.color(160,82,45)
    t.end_fill()

    t.pu()
    t.seth(0)
    t.fd(20)
    t.pd()
    t.pencolor(255,155,192)
    t.seth(10)
    t.begin_fill()
    t.circle(5)
    t.color(160,82,45)
    t

Concatenating data in Python: pd.concat

Anonymous (unverified), submitted 2019-12-02 22:51:30
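1. concat

The concat function is a pandas method that combines data along a chosen axis.

    pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False)

join: how to join, either inner or outer. (The join_axes parameter shown in this signature was removed in pandas 1.0.) The other parameters are rarely used; descriptions will be added when they come up.

1.1 Stacking tables with the same fields end to end

    # First put the tables in a list, then pass the list to concat
    In [4]: frames = [df1, df2, df3]
    In [5]: result = pd.concat(frames)

To add a level of keys that identifies which table each row came from, pass the keys parameter:

    In [6]: result = pd.concat(frames, keys=['x', 'y', 'z'])

1.2 Side-by-side concatenation (row alignment)

1.2.1 axis

When axis=1, concat aligns rows by index and merges two tables that have different column names:

    In [9]: result = pd.concat([df1, df4], axis=1)

1.2.2 join

With the join parameter set to 'inner', the result is the intersection of the two tables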
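The three variants in this snippet can be compared on small frames. This is a minimal sketch; the names `df1`, `df2`, and `df4` follow the snippet, but their contents here are made up, since the original tables are not shown.

```python
import pandas as pd

# Hypothetical frames; the names follow the snippet, the contents are made up.
df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[0, 1])
df2 = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]}, index=[2, 3])
df4 = pd.DataFrame({"C": ["C1", "C2"]}, index=[1, 2])

# 1.1: stack tables end to end; keys adds an outer index level per source.
stacked = pd.concat([df1, df2], keys=["x", "y"])

# 1.2.1: axis=1 aligns on the row index; the default outer join keeps the
# union of the row labels and fills the gaps with NaN.
side_outer = pd.concat([df1, df4], axis=1)

# 1.2.2: join="inner" keeps only the row labels present in both tables.
side_inner = pd.concat([df1, df4], axis=1, join="inner")

print(stacked.loc["y"])
print(side_outer)
print(side_inner)
```

Selecting `stacked.loc["y"]` returns only the rows that came from the second table, which is the practical use of the keys level.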

Python: extracting and splitting data

Anonymous (unverified), submitted 2019-12-02 22:51:30
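Extracting candlestick (K-line) data

Starting from the format of the original data set, build a new table as required:
1. For each minute, the first, last, maximum, and minimum close values.
2. For each minute, the increase in vol (the last vol value of the minute minus the first).
3. Collect these into a new table (field names: ['time','open','close','high','low','vol']).

    import pandas as pd
    import time

    start = time.time()
    df = pd.read_csv('data.csv')
    df = df.drop('id', axis=1)  # drop the id column
    df1 = pd.DataFrame(columns=['time','open','close','high','low','vol'])  # create the target table
    for i in df.groupby('time'):  # group by time
        new_df = pd.DataFrame(columns=['time','open','close','high','low','vol'])  # empty table that temporarily holds this group's values
        new_df.time = i[1].time[0:1]    # the group's time becomes the new table's time
        new_df.open = i[1].close[0:1]   # the group's first close becomes the new table's open
        new_df.close = i[1]['close'].iloc[-1]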
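The three requirements listed in this snippet map onto a single grouped aggregation. The sketch below is not the original author's method: it swaps the per-group loop for pandas named aggregation, and the tick data is made up in place of data.csv.

```python
import pandas as pd

# Hypothetical tick data standing in for data.csv.
df = pd.DataFrame({
    "time":  ["09:30", "09:30", "09:30", "09:31", "09:31"],
    "close": [10.0, 10.5, 10.2, 10.3, 10.1],
    "vol":   [100, 150, 180, 200, 260],
})

kline = (
    df.groupby("time")
      .agg(
          open=("close", "first"),   # first close of the minute
          close=("close", "last"),   # last close of the minute
          high=("close", "max"),     # highest close of the minute
          low=("close", "min"),      # lowest close of the minute
          # vol increase: last vol of the minute minus the first
          vol=("vol", lambda s: s.iloc[-1] - s.iloc[0]),
      )
      .reset_index()
)
print(kline)
```

Each keyword in `agg` names an output column and pairs it with a source column and a reduction, so the result already has the field order time, open, close, high, low, vol.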

Python data analysis -> Mastering Excel with pandas -> (1) How to create rows, columns, and cells with pandas

Anonymous (unverified), submitted 2019-12-02 22:51:30
    import pandas as pd

    # ------ Method 1 for creating cells: build a dictionary first
    # Create a dictionary
    d = {'x': 100, 'y': 200, 'z': 300}
    # Print the dictionary's keys
    print(d.keys())
    # Print the value stored under one key
    print(d['x'])
    # Pass the dictionary d to Series
    s1 = pd.Series(d)
    print(s1)

    # ------ Method 2 for creating cells: define the Series values and index separately
    L1 = [100, 200, 300]
    L2 = ['x', 'y', 'z']
    s1 = pd.Series(L1, index=L2)
    print(s1)
    s1 = pd.Series([100, 200, 300], index=['x', 'y', 'z'])
    print(s1)

    # ----- Build the actual data table, assigning values, rows, and columns
    s1 = pd.Series([1, 2, 3], index=[1, 2, 3], name='A')
    s2 = pd.Series([10, 20, 30], index=[1, 2, 3], name='B')
    s3 = pd.Series([100, 200, 300], index=[1, 2, 3], name='C')
    df = pd.DataFrame({s1.name: s1, s2.name: s2, s3.name: s3})
    print(df)

Source: 博客园 (cnblogs). Author: 鑫淼森
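Whether a Series ends up as a column or as a row depends on how it is handed to DataFrame. A minimal sketch using the same three Series as the snippet; the variable names `by_column`, `by_row`, and `cell` are illustrative.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=[1, 2, 3], name="A")
s2 = pd.Series([10, 20, 30], index=[1, 2, 3], name="B")
s3 = pd.Series([100, 200, 300], index=[1, 2, 3], name="C")

# A dict of Series: each Series becomes a column, its index the row labels.
by_column = pd.DataFrame({s.name: s for s in (s1, s2, s3)})

# A list of Series: each Series becomes a row, its name the row label.
by_row = pd.DataFrame([s1, s2, s3])

# A single cell is addressed by row label and column label.
cell = by_column.loc[2, "B"]
print(by_column)
print(by_row)
print(cell)
```

The two frames hold the same values with rows and columns swapped, so `by_row.loc["B", 2]` reads the same cell as `by_column.loc[2, "B"]`.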