Snakemake: Generic input function for different file locations

Submitted by 生来就可爱ヽ(ⅴ<●) on 2020-02-06 11:00:47

Question


I have two locations where my huge data can be stored: /data and /work.

/data is the folder where (intermediate) results are moved to after quality control. It is mounted read-only for the standard user. /work is the folder where new results are written to. Obviously, it is writable.

I do not want to copy or link data from /data to /work.

So I run Snakemake from within the /work folder and want my input function to first check whether the required file exists in /data. If it does, the function should return the absolute /data path; if not, it should return the relative path in the /work directory.

def in_func(wildcards):
    file_path = apply_wildcards('{id}/{visit}/{id}_{visit}-file_name_1.txt', wildcards)
    full_storage_path = os.path.join('/data', file_path)
    if os.path.isfile(full_storage_path):
        file_path = full_storage_path
    return {'myfile': file_path}

rule do_something:
    input:
        unpack(in_func),
        params = '{id}/{visit}/{id}_{visit}_params.txt',

This works fine, but I would have to define a separate input function for every rule because the file names differ. Is it possible to write a generic input function that takes both the file name pattern, e.g. {id}/{visit}/{id}_{visit}-file_name_1.txt, and the wildcards as input?

I also tried something like

def in_func(file_path):
    full_storage_path = os.path.join('/data', file_path)
    if os.path.isfile(full_storage_path):
        file_path = full_storage_path
    return file_path

rule do_something:
    input:
        myfile = in_func('{id}/{visit}/{id}_{visit}-file_name_1.txt'),
        params = '{id}/{visit}/{id}_{visit}_params.txt',

But then I do not have access to the wildcards in in_func(), do I?

Thanks, Jan


Answer 1:


You could use something like this:

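import os

def handle_storage(pattern):
    # Return an input function bound to this file name pattern.
    def handle_wildcards(wildcards):
        # Fill in the wildcard values, then prefer the copy in /data if present.
        f = pattern.format(**wildcards)
        f_data = os.path.join("/data", f)
        if os.path.exists(f_data):
            return f_data
        return f

    return handle_wildcards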


rule do_something:
    input:
        myfile = handle_storage('{id}/{visit}/{id}_{visit}-file_name_1.txt'),
        params = '{id}/{visit}/{id}_{visit}_params.txt',

In other words, handle_storage returns the inner function handle_wildcards, tailored to the particular pattern (a closure that remembers the pattern it was created with). Snakemake then calls that function automatically once the wildcard values are known. Inside it, we first format the pattern with the wildcard values and then check whether the resulting path exists in /data. If it does, the /data path is returned; otherwise the relative path is returned.
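The behaviour of such a function factory can be tried outside of Snakemake. The following is only a minimal sketch under a few assumptions: a plain dict stands in for Snakemake's wildcards object, the wildcard values p01 and v1 are made up, and storage_root is a hypothetical extra parameter (not part of the answer above) so that a temporary directory can play the role of /data.

```python
import os
import tempfile


def handle_storage(pattern, storage_root="/data"):
    """Return an input function bound to one file name pattern.

    storage_root is a hypothetical parameter, added only so this sketch
    can be run against a temporary directory instead of /data.
    """
    def handle_wildcards(wildcards):
        # Fill in the wildcard values, e.g. {id} and {visit}.
        relative_path = pattern.format(**wildcards)
        stored_path = os.path.join(storage_root, relative_path)
        # Prefer the copy in storage when it exists.
        if os.path.exists(stored_path):
            return stored_path
        return relative_path

    return handle_wildcards


# Stand-in for Snakemake's wildcards object (assumption: a plain dict).
wildcards = {"id": "p01", "visit": "v1"}
pattern = "{id}/{visit}/{id}_{visit}-file_name_1.txt"

with tempfile.TemporaryDirectory() as root:
    lookup = handle_storage(pattern, storage_root=root)
    # Nothing is in storage yet, so the relative path comes back.
    before = lookup(wildcards)

    # Create the file in storage and ask again.
    stored = os.path.join(root, pattern.format(**wildcards))
    os.makedirs(os.path.dirname(stored))
    open(stored, "w").close()
    # The file now exists in storage, so the storage path comes back.
    after = lookup(wildcards)
```

Because the existence check runs each time the returned function is called, not when handle_storage itself is called, the same rule definition picks up files that are later moved to storage.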



Source: https://stackoverflow.com/questions/45728159/snakemake-generic-input-function-for-different-file-locations
