Question
I'm building an SQL script out of text data. The relevant part of the script consists of a CREATE TABLE statement and an optional INSERT INTO statement. The values for the INSERT INTO statement are taken from a list of files, each of which may or may not exist; the values from all existing files are merged. The crucial part is that the INSERT INTO statement must be skipped when none of the data files exists.
I've written a Snakemake workflow that does this. Two ambiguous rules can create the script: one that creates a script for empty data, and one that creates the table and also inserts data (the ambiguity is resolved with a ruleorder statement).
The interesting part is the rule that merges values from the data files. It should create its output whenever at least one input is present, and it should not be considered at all otherwise. There are two difficulties: making each input optional, and preventing Snakemake from using this rule when no files exist. I've done that with a trick:
    import os

    def require_at_least_one(filelist):
        # Keep only the paths that exist as regular files.
        existing = [file for file in filelist if os.path.isfile(file)]
        # With no existing files, return a placeholder path so the rule cannot be satisfied.
        return existing if existing else "non_existing_file"

    rule merge_values:
        input: require_at_least_one(expand("path_to_data/{dataset}/values", dataset=["A", "B", "C"]))
        output: ...
        shell: ...
The require_at_least_one function takes a list of filenames and filters out those that don't refer to an existing file. This makes each input optional. For the corner case where none of the files exists, the function returns a special value that represents a non-existing file. This prunes this branch, so that Snakemake prefers the rule that creates a script without the INSERT statement.
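To make the two cases concrete, here is a small standalone sketch of how the helper behaves outside Snakemake. The temporary directory, the dataset names, and the variable names none_present and one_present are only for illustration and are not part of my workflow; it also assumes that no file named "non_existing_file" exists in the working directory.

```python
import os
import tempfile


def require_at_least_one(filelist):
    # Keep only the paths that point to an existing regular file.
    existing = [path for path in filelist if os.path.isfile(path)]
    # Fall back to a placeholder path that is assumed never to exist.
    return existing if existing else "non_existing_file"


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout mirroring path_to_data/{dataset}/values.
    candidates = [os.path.join(tmp, name, "values") for name in ("A", "B", "C")]

    # No data file exists yet, so the placeholder is returned.
    none_present = require_at_least_one(candidates)

    # Create the data file for dataset "B" only.
    os.makedirs(os.path.dirname(candidates[1]))
    with open(candidates[1], "w") as handle:
        handle.write("1\n")

    # Only the existing file is kept; the missing ones are dropped.
    one_present = require_at_least_one(candidates)

print(none_present)
print(one_present == [candidates[1]])
```

In the first case the rule ends up with a single input that cannot be found, which is what lets Snakemake discard it; in the second case the rule sees only the files that are actually there.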
I feel like I'm reinventing the wheel; moreover, the "non_existing_file" trick looks a little dirty. Is there a better, more idiomatic way to do this in Snakemake?
Source: https://stackoverflow.com/questions/65227729/how-to-make-snakemake-input-optional-but-not-empty