Question
I am using a genetic interpretation software called SAIGE-GENE. It takes several input files whose names contain the chromosome number (1 to 22). The algorithm looks like this (full algorithm at https://github.com/weizhouUMICH/SAIGE/wiki/Genetic-association-tests-using-SAIGE#step-2--performing-the-region--or-gene-based-association-tests):
SPAGMMATtest = function(
vcfFile = "",
vcfFileIndex = "",
vcfField = "DS",
groupFile ="",
savFile = "",
savFileIndex = "",
sampleFile = "",
idstoExcludeFile = "",
idstoIncludeFile = "",
rangestoExcludeFile = "",
rangestoIncludeFile = "",
chrom = "",
start = 1,
end = 250000000,
IsDropMissingDosages = FALSE,
minMAC = 0.5,
minMAF = 0,
maxMAFforGroupTest = 0.5,
minInfo = 0,
GMMATmodelFile = "",
varianceRatioFile = "",
SPAcutoff=2,
SAIGEOutputFile = "",
numLinesOutput = 10000,
IsSparse=TRUE,
......
I haven't included the whole thing here, as the rest isn't relevant. I pass several different files to this algorithm, and I normally name them chr1_file_name.txt ... chr22_file_name.txt.
I then wrap the whole algorithm in a for loop in R, using the paste function to build the file names from the chromosome number:
for(i in 1:22){SPAGMMATtest = function(
vcfFile = paste("chr",i,"_file_name.txt", sep=""),
vcfFileIndex = "",
vcfField = "DS",
savFile = "",
groupFile ="paste("chr",i,".group_file.txt", sep="")",
etc
This works fine. However, I thought I would be clever and use three-digit numbering in my file names for this experiment: chr001_file_name.txt ... chr022_file_name.txt.
My previous loop no longer works, and changing the start of the loop to for(i in 001:022) doesn't work either.
What am I doing wrong and how can I fix this without renaming all my files?
Answer 1:
Wimpel has suggested trying:
vcfFile = paste("chr", sprintf("%03d", i), "_file_name.txt", sep = "")
(Edit: for shorter code you can use paste0() and drop the sep argument.)
This creates character file names that include three digits with leading zeroes, e.g., 001, 002, ..., 022.
This can be shortened further by building the whole file name with sprintf(), which removes the calls to paste() or paste0():
sprintf("chr%03d_file_name.txt", i)
With i <- 1, for example, sprintf("chr%03d_file_name.txt", i) returns "chr001_file_name.txt".
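As a minimal sketch of why for(i in 001:022) changes nothing, and of how the padded names can be built in one go: R reads 001 and 022 as the numbers 1 and 22, so the padding has to be applied when the number is turned into text. The variable names chroms, vcf_files, alt1 and alt2 below are illustrative only and do not come from the question.

```r
# Numeric literals drop leading zeros, so 001:022 is the same vector as 1:22.
stopifnot(identical(001:022, 1:22))

# sprintf() is vectorised: one call builds all 22 zero-padded file names.
chroms <- 1:22
vcf_files <- sprintf("chr%03d_file_name.txt", chroms)
print(head(vcf_files, 3))

# Equivalent alternatives: paste0() combined with sprintf() or formatC().
alt1 <- paste0("chr", sprintf("%03d", chroms), "_file_name.txt")
alt2 <- paste0("chr", formatC(chroms, width = 3, flag = "0"), "_file_name.txt")
stopifnot(identical(vcf_files, alt1), identical(vcf_files, alt2))
```

Inside a loop, the same format string works on a single value of i, as in the sprintf() call shown above.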
There is a second observation:
The OP has posted the code snippet
for(i in 1:22){SPAGMMATtest = function(
vcfFile = paste("chr",i,"_file_name.txt", sep=""),
vcfFileIndex = "",
vcfField = "DS",
savFile = "",
groupFile ="paste("chr",i,".group_file.txt", sep="")",
...
It looks as if the OP has pulled the function definition into the for loop. I believe it is sufficient to call the function from within the for loop:
for (i in 1:22) {
  SPAGMMATtest(
    vcfFile = sprintf("chr%03d_file_name.txt", i),
    vcfFileIndex = "",
    vcfField = "DS",
    savFile = "",
    groupFile = sprintf("chr%03d.group_file.txt", i)
    ...
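The call-inside-the-loop pattern can be tried without SAIGE installed. The sketch below uses run_test(), a hypothetical stand-in that only records the arguments it receives; it is not part of SAIGE. The output file name pattern is also an assumption made for illustration.

```r
# Hypothetical stand-in for SPAGMMATtest(): it just returns its inputs as a row.
run_test <- function(vcfFile, groupFile, SAIGEOutputFile) {
  data.frame(vcfFile = vcfFile,
             groupFile = groupFile,
             SAIGEOutputFile = SAIGEOutputFile,
             stringsAsFactors = FALSE)
}

# The function is defined once, outside the loop, and only called inside it.
calls <- vector("list", 22)
for (i in 1:22) {
  calls[[i]] <- run_test(
    vcfFile = sprintf("chr%03d_file_name.txt", i),
    groupFile = sprintf("chr%03d.group_file.txt", i),
    SAIGEOutputFile = sprintf("chr%03d_output.txt", i)  # assumed naming scheme
  )
}
calls <- do.call(rbind, calls)

# Inspect the arguments generated for the first and last chromosome.
print(calls[c(1, 22), ])
```

Checking the generated names this way before swapping in the real function makes it easy to spot a mismatch with the files on disk, for example with file.exists(calls$vcfFile).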
Source: https://stackoverflow.com/questions/63449447/looping-over-multiple-files-using-a-multi-input-algorithm-with-three-digit-numbe