kedro

How to use tf.data.Dataset with kedro?

烈酒焚心 submitted on 2021-02-11 14:58:23
Question: I am using tf.data.Dataset to prepare a streaming dataset which is used to train a tf.keras model. With kedro, is there a way to create a node and return the created tf.data.Dataset to use it in the next training node? The MemoryDataset will probably not work, because a tf.data.Dataset cannot be pickled (deepcopy isn't possible), see also this SO question. According to issue #91, the deep copy in MemoryDataset is done to avoid the data being modified by some other node. Can someone please elaborate a…
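One workaround often suggested for non-copyable objects is to disable the deep copy for that catalog entry via the dataset's copy mode. A minimal sketch, assuming the Kedro 0.17-era `MemoryDataSet` API and a hypothetical catalog entry name `train_dataset`:

```python
from kedro.io import DataCatalog, MemoryDataSet

# Sketch: store the tf.data.Dataset by reference ("assign") instead of
# deep-copying it when it is handed from one node to the next.
# "train_dataset" is a hypothetical dataset name for illustration.
catalog = DataCatalog({
    "train_dataset": MemoryDataSet(copy_mode="assign"),
})
```

With `copy_mode="assign"` the object is shared rather than copied, so downstream nodes must not mutate it in place; that is generally safe for a tf.data.Dataset, since its transformations (`map`, `batch`, etc.) return new dataset objects instead of modifying the original.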

Kedro - how to pass nested parameters directly to node

旧巷老猫 submitted on 2021-02-08 03:41:21
Question: kedro recommends storing parameters in conf/base/parameters.yml. Let's assume it looks like this:

```yaml
step_size: 1
model_params:
    learning_rate: 0.01
    test_data_ratio: 0.2
    num_train_steps: 10000
```

And now imagine I have some data_engineering pipeline whose nodes.py has a function that looks something like this:

```python
def some_pipeline_step(num_train_steps):
    """ Takes the parameter `num_train_steps` as argument. """
    pass
```

How would I go about and pass that nested parameter straight to this function in data…
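One way to do this without changing the function signature is to take the whole `model_params` block as a node input and unpack it in a thin wrapper. A minimal sketch, assuming the standard `params:` prefix Kedro exposes for entries in parameters.yml (the node name here is hypothetical):

```python
from kedro.pipeline import Pipeline, node

def some_pipeline_step(num_train_steps):
    """ Takes the parameter `num_train_steps` as argument. """
    pass

# Sketch: "params:model_params" resolves to the nested dict from
# parameters.yml; the lambda unpacks the one key the function needs.
pipeline = Pipeline([
    node(
        lambda model_params: some_pipeline_step(model_params["num_train_steps"]),
        inputs="params:model_params",
        outputs=None,
        name="train_step",  # hypothetical node name
    ),
])
```

Later Kedro versions also accept dotted access such as `params:model_params.num_train_steps` directly as a node input, though whether that works depends on the Kedro version in use.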

How to run the nodes in sequence as declared in kedro pipeline?

落花浮王杯 submitted on 2019-12-12 17:10:44
Question: In a Kedro pipeline, nodes (something like Python functions) are declared sequentially. In some cases, the input of one node is the output of the previous node. However, sometimes, when the kedro run API is called on the command line, the nodes are not run sequentially. The Kedro documentation says that by default the nodes are run in sequence. My run.py code:

```python
def main(
    tags: Iterable[str] = None,
    env: str = None,
    runner: Type[AbstractRunner] = None,
    node_names: Iterable[str] = None,
    from_nodes:
```
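For context, the default SequentialRunner does run one node at a time, but the order comes from a topological sort of the data dependencies, not from the order in which nodes are declared. To force two otherwise-independent nodes to run in sequence, chain an output of the first into the input of the second. A minimal sketch with hypothetical node functions and dataset names:

```python
from kedro.pipeline import Pipeline, node

def first_step():
    # Return a token purely to create a dependency edge.
    return "done"

def second_step(first_done):
    # Receives first_step's output, so it is scheduled after it.
    return "result"

# The runner topologically sorts nodes on these inputs/outputs, so
# second_step always runs after first_step even though it is declared first.
pipeline = Pipeline([
    node(second_step, inputs="first_done", outputs="result"),
    node(first_step, inputs=None, outputs="first_done"),
])
```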