Looping over multiple files using a multi input algorithm with three digit numbers in R

Submitted by 怎甘沉沦 on 2021-02-08 11:25:25

Question


I am using a genetic interpretation software called SAIGE-GENE. Its main function takes several input files whose names contain the chromosome number (1 to 22). The function signature looks like this (full algorithm at https://github.com/weizhouUMICH/SAIGE/wiki/Genetic-association-tests-using-SAIGE#step-2--performing-the-region--or-gene-based-association-tests):

SPAGMMATtest = function(
         vcfFile = "",
         vcfFileIndex = "",
         vcfField = "DS",
         groupFile = "",
         savFile = "",
         savFileIndex = "",
         sampleFile = "",
         idstoExcludeFile = "",
         idstoIncludeFile = "",
         rangestoExcludeFile = "",
         rangestoIncludeFile = "",
         chrom = "",
         start = 1,
         end = 250000000,
         IsDropMissingDosages = FALSE,
         minMAC = 0.5,
         minMAF = 0,
         maxMAFforGroupTest = 0.5,
         minInfo = 0,
         GMMATmodelFile = "",
         varianceRatioFile = "",
         SPAcutoff = 2,
         SAIGEOutputFile = "",
         numLinesOutput = 10000,
         IsSparse = TRUE,

......

I haven't included the whole signature here because the rest isn't relevant. I pass several different files to this function, and I normally name them chr1_file_name.txt ... chr22_file_name.txt.

I then wrap the whole thing in a for loop in R, using the paste function to build the file names from the chromosome number:

for(i in 1:22){SPAGMMATtest = function(
         vcfFile = paste("chr",i,"_file_name.txt", sep=""),
                 vcfFileIndex = "",
         vcfField = "DS",
         savFile = "",
         groupFile ="paste("chr",i,".group_file.txt", sep="")",

etc

This works fine. However, I thought I would be clever and use three-digit numbering in my file names for this experiment: chr001_file_name.txt ... chr022_file_name.txt.

My previous loop no longer works, and changing the start of the loop to for(i in 001:022) doesn't work either.

What am I doing wrong and how can I fix this without renaming all my files?


Answer 1:


Wimpel has suggested the following:

try: vcfFile = paste("chr", sprintf("%03d", i), "_file_name.txt", sep=""). Edit: for shorter code you can use paste0() and drop the sep argument.

This creates character file names that contain three digits with leading zeroes, e.g., 001, 002, ..., 022.

This can be further shortened by creating the filename completely with sprintf() thereby removing the calls to paste() or paste0():

sprintf("chr%03d_file_name.txt", i)

With i <- 1, e.g., sprintf("chr%03d_file_name.txt", i) returns "chr001_file_name.txt".
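As a small illustration of why changing the loop to 001:022 has no effect, and of how sprintf() can build every name at once: numeric literals in R carry no leading zeros, so the padding has to be added when the number is turned into text. The variable name vcf_files below is only an illustrative choice, not something from the original post.

```r
# 001 and 022 are parsed as the numbers 1 and 22, so both sequences are the same.
identical(001:022, 1:22)   # TRUE

# sprintf() is vectorised: one call yields all 22 zero-padded file names.
vcf_files <- sprintf("chr%03d_file_name.txt", 1:22)

vcf_files[1]    # "chr001_file_name.txt"
vcf_files[22]   # "chr022_file_name.txt"
```

The %03d format means "an integer, at least three characters wide, padded with zeros on the left".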


There is a second observation:

The OP has posted the code snippet

for(i in 1:22){SPAGMMATtest = function(
         vcfFile = paste("chr",i,"_file_name.txt", sep=""),
                 vcfFileIndex = "",
         vcfField = "DS",
         savFile = "",
         groupFile ="paste("chr",i,".group_file.txt", sep="")",
         ...

It looks as if the OP has pulled the function definition into the for loop. I believe it is sufficient to call the function from within the for loop:

for (i in 1:22) {
     SPAGMMATtest(
         vcfFile = sprintf("chr%03d_file_name.txt", i),
         vcfFileIndex = "",
         vcfField = "DS",
         savFile = "",
         groupFile = sprintf("chr%03d.group_file.txt", i)
         ...
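The difference between defining and calling a function inside the loop can be tried out without SAIGE installed. The sketch below uses run_test(), a hypothetical stand-in that only echoes its arguments; it is not SAIGE's SPAGMMATtest(), and the results vector is likewise just for illustration.

```r
# Hypothetical stand-in for SPAGMMATtest(); it only reports which files it was given.
run_test <- function(vcfFile = "", groupFile = "") {
  paste(vcfFile, groupFile, sep = " | ")
}

results <- character(22)
for (i in 1:22) {
  # The function is called (not redefined) once per chromosome.
  results[i] <- run_test(
    vcfFile   = sprintf("chr%03d_file_name.txt", i),
    groupFile = sprintf("chr%03d.group_file.txt", i)
  )
}

results[1]   # "chr001_file_name.txt | chr001.group_file.txt"
```

Writing SPAGMMATtest = function(...) inside the loop would only reassign the name 22 times without running any analysis, which is why the call form is used here.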


Source: https://stackoverflow.com/questions/63449447/looping-over-multiple-files-using-a-multi-input-algorithm-with-three-digit-numbe
