How to find byte sizes of R figures on pages?

Posted by 落花浮王杯 on 2019-12-06 09:28:31

Download and install the pdftk utility if it is not already on your system, and then try one of the following alternatives from within R.

1) This returns a data frame with the per-page file sizes in bytes, along with other information.

myfile <- "Rplots.pdf"
system(paste("pdftk", myfile, "burst"))
file.info(Sys.glob("pg_*.pdf"))

It will also generate a file doc_data.txt with some miscellaneous information that may or may not be of interest.

1a) This alternative does not generate any files. It simply returns the sizes of the pages in bytes (as counted by wc -c) as a numeric vector.

myfile <- "Rplots.pdf"
pages <- as.numeric(read.dcf(pipe(paste("pdftk", myfile, "dump_data")))[, "NumberOfPages"])
cmds <- sprintf("pdftk %s cat %d output - | wc -c", myfile, seq_len(pages))
unname(sapply(cmds, function(cmd) scan(pipe(cmd), quiet = TRUE)))

The above should work if pdftk and wc are on your path. Note that on Windows you can find wc in the Rtools distribution; it is typically at "C:\\Rtools\\bin\\wc" once Rtools is installed.
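Before running the commands above, it may help to confirm that both tools are visible from R. The following is a minimal sketch using base R's Sys.which(); the variable names tools and not_found are only illustrative, and the exact "not found" indicator can vary by operating system (it is typically an empty string).

```r
# Look up the external tools on the PATH as R sees it.
# Sys.which() typically returns "" for a program it cannot find.
tools <- Sys.which(c("pdftk", "wc"))
print(tools)

# Collect the names of any tools that were not located.
not_found <- names(tools)[is.na(tools) | !nzchar(tools)]
if (length(not_found) > 0) {
  message("Not found on PATH: ", paste(not_found, collapse = ", "))
}
```

If a tool is reported as not found, either add its directory to the PATH or use its full path in the command strings.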

2) This alternative is similar to (1) but uses the animation package:

library(animation)

ani.options(pdftk = "/path/to/pdftk")
pdftk("Rplots.pdf", "burst", "pg_%04d.pdf", "")
file.info(Sys.glob("pg_*.pdf"))

To measure the size of each page in a pdf-file I suggest this:

test_size <- TRUE
pdf_name <- "masterpiece"

if(test_size){
  dir.create("test_page_size_pdf")
  pdf_address <- "./test_page_size_pdf/page%02d.pdf"
} else { pdf_address <- paste0("./", pdf_name, ".pdf")}

pdf(pdf_address, width=10, height=6, onefile=!test_size)
par(mar=c(1,1,1,1), oma=c(1,1,1,1))

  plot(rnorm(10^6, 100, 5), type="l")
  plot(sin, -pi, 2*pi) 
  plot(table(rpois(100, 5)), type = "h", col = "red", lwd = 10,
     main = "rpois(100, lambda = 5)")
  plot(x <- sort(rnorm(47)), type = "s", main = "plot(x, type = \"s\")")
  points(x, cex = .5, col = "dark red")

dev.off()

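if(test_size){
  # full.names = TRUE returns the paths including the folder name
  files <- list.files("./test_page_size_pdf", full.names = TRUE)
  size_bytes <- format(file.size(files), big.mark = ",")
  # unlink() removes the page files and the temporary folder,
  # whereas file.remove() may fail on a directory on some platforms
  unlink("test_page_size_pdf", recursive = TRUE)
  cbind(files, size_bytes)
}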

The size of a pdf-page in R depends on three things: the content of the plot(), the options used in the pdf() function and the plotting options which are here defined in par().
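As one illustration of how the pdf() options influence the size, the sketch below writes the same plot twice, once with and once without the compress argument of pdf(), and compares the file sizes. The helper write_plot() and the data are hypothetical and only serve the example; the exact numbers will depend on the R version and platform.

```r
# Write the same plot with and without stream compression
# and compare the resulting file sizes in bytes.
set.seed(1)
y <- rnorm(1e4)

write_plot <- function(compress) {
  f <- tempfile(fileext = ".pdf")
  on.exit(unlink(f))               # remove the temporary file afterwards
  pdf(f, width = 10, height = 6, compress = compress)
  plot(y, type = "l")
  invisible(dev.off())
  file.size(f)
}

sizes_by_compress <- c(compressed   = write_plot(TRUE),
                       uncompressed = write_plot(FALSE))
sizes_by_compress
```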

All this is difficult to estimate. You also mention that you would like something similar to the shell command ls, which works on files. So in this solution I create a temporary folder with dir.create(), in which every page of the PDF is saved as a separate file. This is done with the option onefile. When the plotting is finished, the per-page files and the temporary folder are deleted, and you can see the result in the console.

Once you are finished with the testing and want the result in a single file, you just have to set the variable test_size <- FALSE in the first line of this script. By the way, I have some doubt that the size of a page is a proxy for the quality of an image. PDF is a vector format, so the size corresponds to the number of elements: see the size of the first page in my example, where I plot 1 million points.
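The relationship between the number of plotted elements and the page size can be checked with a small experiment. This is only a sketch: the helper page_size() and the chosen point counts are made up for illustration, and the absolute sizes will differ between systems.

```r
# Plot n random points as a line into a temporary one-page PDF
# and return the size of that file in bytes.
page_size <- function(n) {
  f <- tempfile(fileext = ".pdf")
  on.exit(unlink(f))               # remove the temporary file afterwards
  pdf(f, width = 10, height = 6)
  plot(rnorm(n), type = "l")
  invisible(dev.off())
  file.size(f)
}

set.seed(1)
n_points <- c(1e2, 1e4, 1e5)
sizes_by_n <- vapply(n_points, page_size, numeric(1))
data.frame(n_points, size_bytes = sizes_by_n)
```

The file size should grow with the number of points, even though the three pages have the same dimensions.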
