Recently I've found myself working with R code that is all over the map in terms of coding style - multiple authors and individual authors who aren't rigorous about sticking to a single structure. There are certain tasks that I'd like to automate better than I currently do.
I'm looking for a tool (or tools) that manage the following tasks - listed in increasing order of desire but also somewhat in increasing order of skepticism of existence.
Basic formatting. Things like converting "if( foo )" to "if (foo)" and achieving uniformity in terms of brace location and that sort of thing.
Converting "foo$blah" to "foo[["blah"]]" for list access. Ideally it'd be able to at least make a guess if an object was really a list and not a data.frame and only convert lists.
Converting '=' to '<-'. Yes, this is a simple search and replace - but not really. The tool (or regexp) needs to be language aware such that it knows to convert "x = 5" but not "foo(x=5)". It'd also be really nice to not simply replace the symbol but also to ensure a single whitespace on both sides of the assignment operator.
Variable renaming, particularly across functions & files. For instance, suppose a list has an element "foo". I'd love to be able to change it to "foobar" once and not have to track down every usage of that list throughout the entire code flow. I'd imagine this would require the tool to be able to trace the entire flow of control in order to identify things such as that list existing under another name in a different function.
Naming conventions. I'd love to be able to define some standard naming convention (e.g. Google's or whatever) and have it identify all of the functions, variables, etc and convert them. Note that this ties in with the previous entry for things like list elements.
Feel free to list basic unix processing commands (e.g. sed) as long as it'll really be smart enough to at least usually not screw things up (e.g. converting "foo(x=5)" to "foo(x<-5)").
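The "x = 5" versus "foo(x=5)" distinction is something R's own parser already tracks, so a plain regexp is not the only option. A minimal sketch in base R, assuming ASCII source passed as one string per line: utils::getParseData() labels an assignment "=" as EQ_ASSIGN, a named-argument "=" as EQ_SUB, and a default-value "=" in a function definition as EQ_FORMALS. The function name equals_to_arrow is made up for illustration.

```r
# Hypothetical helper: rewrite only assignment "=" as " <- ".
# `lines` is a character vector holding one line of source per element.
equals_to_arrow <- function(lines) {
  parsed <- parse(text = lines, keep.source = TRUE)
  tokens <- utils::getParseData(parsed)

  # Keep only the "=" tokens that the parser classified as assignment.
  hits <- tokens[tokens$terminal & tokens$token == "EQ_ASSIGN", ]

  # Work right to left within each line so earlier column numbers stay valid.
  hits <- hits[order(hits$line1, -hits$col1), ]

  for (i in seq_len(nrow(hits))) {
    n <- hits$line1[i]
    col <- hits$col1[i]
    # Strip existing whitespace around "=" so exactly one space ends up on each side.
    before <- sub("[ \t]+$", "", substr(lines[n], 1, col - 1))
    after <- sub("^[ \t]+", "", substr(lines[n], col + 1, nchar(lines[n])))
    lines[n] <- paste0(before, " <- ", after)
  }
  lines
}

equals_to_arrow(c("x = 5", "y=foo(x=5)"))
# "x <- 5" "y <- foo(x=5)"
```

Because the edit is driven by token positions rather than a re-print of the parse tree, comments and the rest of the layout are left alone. Lines that begin or end with the "=" itself would need extra care that this sketch does not take.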
I'm guessing that if such a tool already existed in a perfect state that I'd have heard of it by now, and I'm also realizing that with a language like R it's difficult to do some of these sorts of changes automagically, but one can dream, right? Does anyone have pointers on some/all of these?
Since this still seems relevant, I thought I'd mention styler, which reformats R code according to the tidyverse style guide.
It ticks some of your boxes, e.g. basic formatting, but it doesn't rename variables (although the linter lintr can at least flag them).
styler comes as an R package with functions that accept code (e.g. style_text()), but it can be used on the command line as well.
For example, given this code in tmp.r:
a <-c(1,2,3)
if(foo) {
b=2 }
myVar=2
and running:
Rscript -e 'styler::style_file("tmp.r")'
would overwrite tmp.r with this:
a <- c(1, 2, 3)
if (foo) {
  b <- 2
}
myVar <- 2
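The same thing can be tried from inside an R session without touching any file. This is a small sketch that assumes both styler and lintr are installed; the snake_case convention passed to the linter is only an example choice, not something the question specified.

```r
# Assumes: install.packages(c("styler", "lintr")) has been run.
messy <- c(
  "a <-c(1,2,3)",
  "if(foo) {",
  "b=2 }",
  "myVar=2"
)

# style_text() takes code as a character vector and returns the restyled code
# without writing anything to disk. The code is parsed, not evaluated.
styled <- styler::style_text(messy)
print(styled)

# lintr does not rename anything, but it can report names that break a
# chosen convention (here snake_case, picked purely as an example).
lintr::lint(
  text = messy,
  linters = lintr::object_name_linter(styles = "snake_case")
)
```

style_text() is handy for previewing changes before letting style_file() overwrite anything.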
IMHO, write your own. Writing a pretty printer is actually quite difficult. It requires understanding tokenizing, parsing, building ASTs or other IRs, tracking symbol tables and scopes, templating, etc.
But if you can do it, you'll really learn a lot about programming languages in general. You'll also look pretty impressive to your coworkers and it's amazing to put on a resume. It's also a lot of fun.
I'd recommend "Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages" by Terence Parr. It's a little rough to read, but the content is pretty good. It's a short, introductory-level treatment of parsers, but it contains all the parts you'd need to write this tool yourself.
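For a taste of the AST side before picking up the book: R exposes its own parse tree, so a tree-rewriting pass can be sketched in a few lines of base R. The example below takes the "foo$blah" to "foo[["blah"]]" item from the question and is only a sketch under stated assumptions. The function name dollar_to_bracket is made up, it rewrites every `$` call because it has no way to tell a list from a data.frame, and it has not been hardened against edge cases such as empty arguments in "x[1, ]".

```r
# Hypothetical helper: walk an R expression and turn every `$` call
# into the equivalent `[[` call with a string index.
dollar_to_bracket <- function(expr) {
  if (!is.call(expr)) {
    # Symbols and constants are leaves; return them unchanged.
    return(expr)
  }
  if (identical(expr[[1]], as.name("$")) && length(expr) == 3) {
    # The left side may itself contain `$`, so rewrite it first.
    lhs <- dollar_to_bracket(expr[[2]])
    field <- as.character(expr[[3]])
    return(call("[[", lhs, field))
  }
  # Any other call: rebuild it from its rewritten parts.
  as.call(lapply(expr, dollar_to_bracket))
}

deparse(dollar_to_bracket(quote(foo$blah + bar$baz$qux)))
# prints: "foo[[\"blah\"]] + bar[[\"baz\"]][[\"qux\"]]"
```

Going through deparse() discards comments and the original layout, which is one reason a real tool ends up needing the token-level and pretty-printing machinery described above.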
If you do build it, open source it, come back here and tell us about it, and put up a site with a few ads to make yourself a few bucks. That way everyone can use your awesome creation and you'll get a few dollars in the process.
Best of luck...
Source: https://stackoverflow.com/questions/9105357/r-language-aware-code-reformatting-refactoring-tools