I have a very long R script with many if statements and exception cases. As i\'ve been going, if been importing and testing libraries as I\'ve gone and haven\'t really docum
I’ve previously used a shell script for this:
#!/usr/bin/env bash
source_files=($(git ls-files '*.R'))
grep -hE '\b(require|library)\([\.a-zA-Z0-9]*\)' "${source_files[@]}" | \
sed '/^[[:space:]]*#/d' | \
sed -E 's/.*\(([\.a-zA-Z0-9]*)\).*/\1/' | \
sort -uf \
> DEPENDS
This uses Git to collect all R files under version control in a project. Since you should be using version control anyway this is normally a good solution (although you may want to adapt the version control system). For the few cases where the project isn’t under version control you should (1) put it under version control. Or, failing that, (2) use find . -regex '.*\.[rR]'
instead of git ls-files '*.R'
.
And it produces a DEPENDS file containing a very simple list of dependencies.
It only finds direct calls to library
and require
though – if you wrap those calls, the script won’t work.
I found the list.functions.in.file()
function from NCmisc (install.packages("NCmisc")
) quite helpful for this:
list.functions.in.file(filename, alphabetic = TRUE)
For more info see this link: https://rdrr.io/cran/NCmisc/man/list.functions.in.file.html
Based on everyone's response, especially eh21's suggestion of the NCmisc package, I put together a little function that outputs a list of packages used in all your R scripts in a directory, as well as their frequencies.
library(NCmisc)
library(stringr)
library(dplyr)
checkPacks<-function(path){
## get all R files in your directory
## by the way, extract R code from Rmd: http://felixfan.github.io/extract-r-code/
files<-list.files(path)[str_detect(list.files(path), ".R$")]
## extract all functions and which package they are from
## using NCmisc::list.functions.in.file
funs<-unlist(lapply(paste0(path, "/", files), list.functions.in.file))
packs<-funs %>% names()
## "character" functions such as reactive objects in Shiny
characters<-packs[str_detect(packs, "^character")]
## user defined functions in the global environment
globals<-packs[str_detect(packs, "^.GlobalEnv")]
## functions that are in multiple packages' namespaces
multipackages<-packs[str_detect(packs, ", ")]
## get just the unique package names from multipackages
mpackages<-multipackages %>%
str_extract_all(., "[a-zA-Z0-9]+") %>%
unlist() %>%
unique()
mpackages<-mpackages[!mpackages %in% c("c", "package")]
## functions that are from single packages
packages<-packs[str_detect(packs, "package:") & !packs %in% multipackages] %>%
str_replace(., "[0-9]+$", "") %>%
str_replace(., "package:", "")
## unique packages
packages_u<-packages %>%
unique() %>%
union(., mpackages)
return(list(packs=packages_u, tb=table(packages)))
}
checkPacks("~/your/path")
You might want to look at the checkpoint function from Revolution Analytics on GitHub here: https://github.com/RevolutionAnalytics/checkpoint
It does some of this, and solves the problem of reproducibility. But I don't see that it can report a list of what you are using.
However if you looked a the code you probably get some ideas.
I am not sure of a good way to automatize this... but what you could do is:
Check with sessionInfo
that you don't have extra packages loaded.
You could check this using sessionInfo
. If you, by default, load extra packages (e.g. using your .RProfile file) I suggest you avoid doing that, as it's a recipe for disaster.
Normally you should only have the base packages loaded: stats
, graphics
, grDevices
, utils
, datasets
, methods
, and base
.
You can unload any extra libraries using:
detach("package:<packageName>", unload=TRUE)
Now run the script after commenting all of the library
and require
calls and see which functions give an error.
To get which package is required by each function type in the console:
??<functionName>
Load the required packages and re-run steps 3-5 until satisfied.
I had a similar need when I needed to convert my code into a package, thus I need to identify every package dependency and either import or use full qualified name.
In reading book Extending R
I found XRtools::makeImports
can scan a package and find all packages need to be imported. This doesn't solve our problem yet as it only apply to existing package, but it provided the main insight on how to do it.
I made a function and put it into my package mischelper
. You can install the package, either use the RStudio addin menu to scan current file or selected code, or use command line functions. Every external function (fun_inside) and the function that called it (usage) will be listed in table.
You can now go to each function, press F1 to find which package it belongs. I actually have another package that can scan all installed packages for function names and build a database, but that may cause more false positives for this usage because if you only loaded some packages, pressing F1 only search loaded packages.
See details of the usage in my package page
https://github.com/dracodoc/mischelper