Obtaining a different set of configs across multiple calls in Ray Tune

Submitted by 可紊 on 2021-01-28 07:55:29

Question


I am trying to make my code reproducible. I have already added np.random.seed(...) and random.seed(...), and at the moment I am not using PyTorch or TensorFlow, so no scheduler or searcher should introduce any randomness. The set of configs produced by the code below should therefore always be the same across multiple calls. However, that is not the case.

Can anyone help with this?

Thank you!

Here is the code:

import ray
from ray import tune
import random
import numpy as np

def training_function(config, data_init):
    print('CONFIG:', config)
    tune.report(end_of_training=1, acc=0, f=0)

if __name__ == '__main__':
    ray.init(num_cpus=12)
    tune_config = {'sentence_classification': False, 
              'norm_word_emb': tune.choice(['True', 'False']), 
              'use_crf': tune.choice(['True', 'False']), 
              'use_char': tune.choice(['True', 'False']), 
              'word_seq_feature': tune.choice(['CNN', 'LSTM', 'GRU']), 
              'char_seq_feature': tune.choice(['CNN', 'LSTM', 'GRU']), 
              'seed_num': 1267}
    data = {'a': 1}
    tune_seed = tune_config['seed_num']
    random.seed(tune_seed)
    np.random.seed(tune_seed)
    n_samples = 15
    exp_name = 'experiment_name'
    analysis = tune.run(
        tune.with_parameters(training_function, data_init={'data': data}),
        name=exp_name,
        metric="f",
        mode="max",
        queue_trials=True,
        config=tune_config,
        num_samples=n_samples,
        resources_per_trial={"cpu": 1},
        checkpoint_at_end=True,
        max_failures=0,
    )

Answer 1:


The function-level API cannot be made reproducible (as of Ray v1.1.0; this may be subject to change).

Wait, but why?

  1. tune.run creates an Experiment object and passes your function to it.
  2. Experiment registers the function as a trainable by calling register_trainable.
  3. register_trainable wraps your function using wrap_function.
  4. wrap_function creates a class-level API (a Ray Actor) by inheriting from the FunctionRunner class.
  5. FunctionRunner doesn't expose any callback hook into the setup method.

The way an Actor works is, oversimplifying, that it gets distributed among workers and is then initialized in different processes using the setup method. This is why it is crucial to pass the seed and implement the initialization logic inside your custom Trainable, as described in this answer. Seeding is needed because tune.choice is just a wrapper around random/np.random functions. You can observe this in tune/sample.py.
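
The per-process point can be illustrated without Ray at all. The following is only a minimal sketch using the standard library: the CHILD program and the draw_in_new_process helper are hypothetical names made up for this illustration, and a fresh Python interpreter stands in for a worker process. It shows that seeding the parent does not seed a newly started process, whereas passing the seed along and seeding inside the new process does make the draws repeat.

```python
import random
import subprocess
import sys

# Child program (hypothetical, for illustration): seed from argv if a seed
# was passed, then print one categorical draw and one float draw.
CHILD = (
    "import random, sys\n"
    "if len(sys.argv) > 1:\n"
    "    random.seed(int(sys.argv[1]))\n"
    "print(random.choice(['CNN', 'LSTM', 'GRU']), random.random())\n"
)


def draw_in_new_process(seed=None):
    """Run CHILD in a fresh interpreter and return what it printed."""
    args = [sys.executable, "-c", CHILD]
    if seed is not None:
        args.append(str(seed))
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Seeding here only affects the RNG state of this (parent) process.
random.seed(1267)

unseeded = [draw_in_new_process() for _ in range(2)]
seeded = [draw_in_new_process(seed=1267) for _ in range(2)]

print("children without a seed:", unseeded)
print("children with a seed:   ", seeded)
```

The two seeded children print the same line, while the unseeded ones will almost certainly differ even though the parent was seeded. This mirrors why the seed is carried in the config and applied inside setup in the Trainable example.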

See the example:


import ray
from ray import tune
import random
import numpy as np

class Tunable(tune.Trainable):
    def setup(self, config):
        self.config = config
        self.seed = config['seed_num']
        random.seed(self.seed)
        np.random.seed(self.seed)
    
    def step(self):
        print('CONFIG:', self.config)
        # Mark the trial as finished after a single step.
        return {tune.result.DONE: True, 'acc': 0, 'f': 0}

if __name__ == '__main__':
    ray.init(num_cpus=12)
    tune_config = {'sentence_classification': False, 
              'norm_word_emb': tune.choice(['True', 'False']), 
              'use_crf': tune.choice(['True', 'False']), 
              'use_char': tune.choice(['True', 'False']), 
              'word_seq_feature': tune.choice(['CNN', 'LSTM', 'GRU']), 
              'char_seq_feature': tune.choice(['CNN', 'LSTM', 'GRU']), 
              'seed_num': 1267}
    data = {'a': 1}
    tune_seed = tune_config['seed_num']
    n_samples = 15
    exp_name = 'experiment_name'
    analysis = tune.run(
        Tunable,
        name=exp_name,
        metric="f",
        mode="max",
        queue_trials=True,
        config=tune_config,
        num_samples=n_samples,
        resources_per_trial={"cpu": 1},
        checkpoint_at_end=False,
        max_failures=0,
    )



Answer 2:


In my case the seeding works. I ran this script:

import ray
from ray import tune
import numpy as np
import random


def training_function(config, data_init):
    print('CONFIG:', config)
    tune.report(end_of_training=1, acc=0, f=0)

if __name__ == '__main__':
    # ray.init(num_cpus=12)
    tune_config = {'sentence_classification': False, 
              'norm_word_emb': tune.choice(['True', 'False']), 
              'use_crf': tune.choice(['True', 'False']), 
              'use_char': tune.choice(['True', 'False']), 
              'word_seq_feature': tune.choice(['CNN', 'LSTM', 'GRU']), 
              'char_seq_feature': tune.choice(['CNN', 'LSTM', 'GRU']), 
              'seed': 1267}
    data = {'a': 1}
    tune_seed = tune_config['seed']
    random.seed(tune_seed)
    np.random.seed(tune_seed)
    n_samples = 15
    analysis = tune.run(
        tune.with_parameters(training_function, data_init={'data': data}),
        #name=exp_name,
        metric="f",
        mode="max",
        queue_trials=True,
        config=tune_config,
        num_samples=n_samples,
        resources_per_trial={"cpu": 1},
        verbose=2,
        max_failures=0,
    )

Here is the output of one run:

Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/27.0 GiB heap, 0.0/9.28 GiB objects
Current best trial: 84b84_00014 with f=0 and parameters={'sentence_classification': False, 'norm_word_emb': 'False', 'use_crf': 'True', 'use_char': 'False', 'word_seq_feature': 'LSTM', 'char_seq_feature': 'GRU', 'seed': 1267}
Number of trials: 15/15 (15 TERMINATED)
+--------------------+------------+-------+--------------------+-----------------+------------+-----------+--------------------+--------+------------------+-------------------+-------+-----+
| Trial name         | status     | loc   | char_seq_feature   | norm_word_emb   | use_char   | use_crf   | word_seq_feature   |   iter |   total time (s) |   end_of_training |   acc |   f |
|--------------------+------------+-------+--------------------+-----------------+------------+-----------+--------------------+--------+------------------+-------------------+-------+-----|
| _inner_84b84_00000 | TERMINATED |       | LSTM               | True            | False      | False     | LSTM               |      1 |       0.00149202 |                 1 |     0 |   0 |
| _inner_84b84_00001 | TERMINATED |       | CNN                | False           | True       | False     | CNN                |      1 |       0.0014801  |                 1 |     0 |   0 |
| _inner_84b84_00002 | TERMINATED |       | GRU                | False           | False      | True      | GRU                |      1 |       0.00152397 |                 1 |     0 |   0 |
| _inner_84b84_00003 | TERMINATED |       | GRU                | False           | False      | False     | GRU                |      1 |       0.00165081 |                 1 |     0 |   0 |
| _inner_84b84_00004 | TERMINATED |       | CNN                | False           | False      | False     | CNN                |      1 |       0.00173998 |                 1 |     0 |   0 |
| _inner_84b84_00005 | TERMINATED |       | LSTM               | True            | True       | True      | CNN                |      1 |       0.00219083 |                 1 |     0 |   0 |
| _inner_84b84_00006 | TERMINATED |       | GRU                | True            | False      | False     | LSTM               |      1 |       0.00192428 |                 1 |     0 |   0 |
| _inner_84b84_00007 | TERMINATED |       | LSTM               | True            | False      | False     | CNN                |      1 |       0.00208902 |                 1 |     0 |   0 |
| _inner_84b84_00008 | TERMINATED |       | LSTM               | True            | True       | True      | GRU                |      1 |       0.00146484 |                 1 |     0 |   0 |
| _inner_84b84_00009 | TERMINATED |       | CNN                | False           | False      | True      | CNN                |      1 |       0.00152087 |                 1 |     0 |   0 |
| _inner_84b84_00010 | TERMINATED |       | LSTM               | False           | True       | False     | CNN                |      1 |       0.00124121 |                 1 |     0 |   0 |
| _inner_84b84_00011 | TERMINATED |       | LSTM               | True            | True       | True      | CNN                |      1 |       0.00124812 |                 1 |     0 |   0 |
| _inner_84b84_00012 | TERMINATED |       | LSTM               | True            | True       | True      | LSTM               |      1 |       0.00133514 |                 1 |     0 |   0 |
| _inner_84b84_00013 | TERMINATED |       | LSTM               | True            | False      | True      | CNN                |      1 |       0.00142407 |                 1 |     0 |   0 |
| _inner_84b84_00014 | TERMINATED |       | GRU                | False           | False      | True      | LSTM               |      1 |       0.00120211 |                 1 |     0 |   0 |
+--------------------+------------+-------+--------------------+-----------------+------------+-----------+--------------------+--------+------------------+-------------------+-------+-----+

And here is the subsequent run:

Current best trial: 84b84_00014 with f=0 and parameters={'sentence_classification': False, 'norm_word_emb': 'False', 'use_crf': 'True', 'use_char': 'False', 'word_seq_feature': 'LSTM', 'char_seq_feature': 'GRU', 'seed': 1267}
Result logdir: /Users/rliaw/ray_results/_inner_2021-01-07_10-45-31
Number of trials: 15/15 (15 TERMINATED)
+--------------------+------------+-------+--------------------+-----------------+------------+-----------+--------------------+--------+------------------+-------------------+-------+-----+
| Trial name         | status     | loc   | char_seq_feature   | norm_word_emb   | use_char   | use_crf   | word_seq_feature   |   iter |   total time (s) |   end_of_training |   acc |   f |
|--------------------+------------+-------+--------------------+-----------------+------------+-----------+--------------------+--------+------------------+-------------------+-------+-----|
| _inner_84b84_00000 | TERMINATED |       | LSTM               | True            | False      | False     | LSTM               |      1 |       0.00149202 |                 1 |     0 |   0 |
| _inner_84b84_00001 | TERMINATED |       | CNN                | False           | True       | False     | CNN                |      1 |       0.0014801  |                 1 |     0 |   0 |
| _inner_84b84_00002 | TERMINATED |       | GRU                | False           | False      | True      | GRU                |      1 |       0.00152397 |                 1 |     0 |   0 |
| _inner_84b84_00003 | TERMINATED |       | GRU                | False           | False      | False     | GRU                |      1 |       0.00165081 |                 1 |     0 |   0 |
| _inner_84b84_00004 | TERMINATED |       | CNN                | False           | False      | False     | CNN                |      1 |       0.00173998 |                 1 |     0 |   0 |
| _inner_84b84_00005 | TERMINATED |       | LSTM               | True            | True       | True      | CNN                |      1 |       0.00219083 |                 1 |     0 |   0 |
| _inner_84b84_00006 | TERMINATED |       | GRU                | True            | False      | False     | LSTM               |      1 |       0.00192428 |                 1 |     0 |   0 |
| _inner_84b84_00007 | TERMINATED |       | LSTM               | True            | False      | False     | CNN                |      1 |       0.00208902 |                 1 |     0 |   0 |
| _inner_84b84_00008 | TERMINATED |       | LSTM               | True            | True       | True      | GRU                |      1 |       0.00146484 |                 1 |     0 |   0 |
| _inner_84b84_00009 | TERMINATED |       | CNN                | False           | False      | True      | CNN                |      1 |       0.00152087 |                 1 |     0 |   0 |
| _inner_84b84_00010 | TERMINATED |       | LSTM               | False           | True       | False     | CNN                |      1 |       0.00124121 |                 1 |     0 |   0 |
| _inner_84b84_00011 | TERMINATED |       | LSTM               | True            | True       | True      | CNN                |      1 |       0.00124812 |                 1 |     0 |   0 |
| _inner_84b84_00012 | TERMINATED |       | LSTM               | True            | True       | True      | LSTM               |      1 |       0.00133514 |                 1 |     0 |   0 |
| _inner_84b84_00013 | TERMINATED |       | LSTM               | True            | False      | True      | CNN                |      1 |       0.00142407 |                 1 |     0 |   0 |
| _inner_84b84_00014 | TERMINATED |       | GRU                | False           | False      | True      | LSTM               |      1 |       0.00120211 |                 1 |     0 |   0 |
+--------------------+------------+-------+--------------------+-----------------+------------+-----------+--------------------+--------+------------------+-------------------+-------+-----+

Notice that the trials and their configs are exactly the same (in the same order).
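
The same check can be done programmatically rather than by eye. Below is a minimal sketch that does not use Ray: SEARCH_SPACE and sample_configs are hypothetical stand-ins for the tune.choice search space and for Tune's sampling, built on random.choice from the standard library. It only assumes, as in the script above, that the global RNG is seeded in the driver process before the configs are drawn.

```python
import random

# Hypothetical stand-in for the search space in the script above:
# plain lists of options take the place of tune.choice(...).
SEARCH_SPACE = {
    'norm_word_emb': ['True', 'False'],
    'use_crf': ['True', 'False'],
    'use_char': ['True', 'False'],
    'word_seq_feature': ['CNN', 'LSTM', 'GRU'],
    'char_seq_feature': ['CNN', 'LSTM', 'GRU'],
}


def sample_configs(seed, num_samples):
    """Seed the global RNG, then draw num_samples configs in order."""
    random.seed(seed)
    return [
        {key: random.choice(options) for key, options in SEARCH_SPACE.items()}
        for _ in range(num_samples)
    ]


first_run = sample_configs(seed=1267, num_samples=15)
second_run = sample_configs(seed=1267, num_samples=15)

# List equality is order-sensitive, so this checks both the configs
# themselves and the order in which they were generated.
print("identical configs:", first_run == second_run)
```

With the same seed, the comparison prints True. With real Tune runs, the two lists would be replaced by the configs collected from each experiment.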



Source: https://stackoverflow.com/questions/65617962/obtaining-different-set-of-configs-across-multiple-calls-in-ray-tune
