Question
I have a TensorBoard log file with 5 million samples. TensorBoard downsamples it when loading so that I don't run out of memory, but it's possible to override this behavior with --samples_per_plugin and load all of them. If I do this, I will run out of memory. Suppose I want to load the first 1000 samples without downsampling (e.g. if I'm interested in the details of what's happening to my network at the beginning of training).
Is there a way to have TensorBoard load only a specified subset of samples? I don't think there's a command line argument as of today, but is there perhaps a way to edit the log files or the TensorBoard code or some other workaround?
Answer 1:
I don't think there is any way to get TensorBoard to do that, but it is possible to "slice" the events files. These files turn out to be record files (only with event data instead of examples), so you can read them as a TFRecordDataset. Apparently, there is a first record indicating the file version number, but other than that it should be straightforward. Assuming the files contain only the events you want to slice, you can use a function like this (TF 1.x, although it would be about the same in 2.x):
import tensorflow as tf

def slice_events(input_path, output_path, skip, take):
    with tf.Graph().as_default():
        ds = tf.data.TFRecordDataset([str(input_path)])
        # The first record holds the file version and is always kept
        rec_first = ds.take(1).make_one_shot_iterator().get_next()
        # Skip the version record plus `skip` events, then keep `take` events
        ds_data = ds.skip(skip + 1).take(take)
        # Read in batches of 1000 to reduce the number of session calls
        rec_data = ds_data.batch(1000).make_one_shot_iterator().get_next()
        with tf.io.TFRecordWriter(str(output_path)) as writer, tf.Session() as sess:
            writer.write(sess.run(rec_first))
            while True:
                try:
                    for ev in sess.run(rec_data):
                        writer.write(ev)
                except tf.errors.OutOfRangeError:
                    break
This makes a new events file from an existing one where the first skip events are discarded and then take events after that are saved. You can use other Dataset operations to choose what data to keep. For example, downsampling could be done as:
ds_data = ds.skip(1).window(1, 5).unbatch() # Takes one in five events
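Since the function above targets TF 1.x, here is a minimal sketch of how the same slicing might look under TensorFlow 2.x with eager execution. The name slice_events_eager is hypothetical, and the sketch assumes the same file layout as above (one leading version record followed by the events to slice). It has not been checked against every TensorFlow release.

```python
def slice_events_eager(input_path, output_path, skip, take):
    # Imported lazily; assumes TensorFlow 2.x with eager execution enabled
    import tensorflow as tf
    ds = tf.data.TFRecordDataset([str(input_path)])
    # Keep the first record (file version), then `take` records after
    # skipping `skip` of the remaining ones
    ds_out = ds.take(1).concatenate(ds.skip(skip + 1).take(take))
    with tf.io.TFRecordWriter(str(output_path)) as writer:
        for rec in ds_out:
            # Each element is a scalar string tensor; .numpy() gives bytes
            writer.write(rec.numpy())
```

It takes the same arguments as slice_events, so it could be swapped in wherever that function is called.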
You can make a script to slice all the events files in a directory and save them into another one with the same structure, for example like this:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# slice_events.py

import sys
import os
from pathlib import Path
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'  # Not necessary to use GPU
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # Avoid log messages

def slice_events(input_path, output_path, skip, take):
    # Import here to avoid loading on error
    import tensorflow as tf
    # Code from before...

def slice_events_dir(input_dir, output_dir, skip, take):
    input_dir = Path(input_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for ev_file in input_dir.glob('**/*.tfevents*'):
        out_file = Path(output_dir, ev_file.relative_to(input_dir))
        out_file.parent.mkdir(parents=True, exist_ok=True)
        slice_events(ev_file, out_file, skip, take)

if __name__ == '__main__':
    if len(sys.argv) != 5:
        print(f'{sys.argv[0]} <input dir> <output dir> <skip> <take>', file=sys.stderr)
        sys.exit(1)
    input_dir, output_dir, skip, take = sys.argv[1:]
    skip = int(skip)
    take = int(take)
    slice_events_dir(input_dir, output_dir, skip, take)
Then you would use it as
$ python slice_events.py log log_sliced 100 1000
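One way to sanity-check the result is to count the records in an original file and in its sliced copy. The sliced file should hold at most take + 1 records (the version record plus the kept events). The sketch below does this without TensorFlow. It assumes the usual TFRecord framing (an 8-byte little-endian length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload) and does not verify the checksums. The helper names read_raw_records and count_records are hypothetical.

```python
import struct

def read_raw_records(path):
    """Yield the payload of each record in a TFRecord-style file."""
    with open(path, 'rb') as f:
        while True:
            header = f.read(12)  # 8-byte length + 4-byte CRC of the length
            if len(header) < 12:
                return  # End of file, or a truncated trailing record
            (length,) = struct.unpack('<Q', header[:8])
            body = f.read(length + 4)  # Payload + 4-byte CRC of the payload
            if len(body) < length + 4:
                return  # Truncated record, e.g. file still being written
            yield body[:length]

def count_records(path):
    """Count the records in the file, including the version record."""
    return sum(1 for _ in read_raw_records(path))
```

Comparing count_records on a file under log with the matching file under log_sliced shows how many events were dropped.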
Note that the slicing above assumes the simple case where you just have a sequence of similar events to slice. If you have other kinds of events (e.g. the graph itself), or multiple types of interleaved events in the same file, or something else, then you would need to adapt the logic accordingly.
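For the interleaved case, one option is to select events by their step value rather than by their position in the file. The following is a sketch only: it assumes that tf.compat.v1.train.summary_iterator is available in the installed TensorFlow, that every summary event carries a meaningful step, and that non-summary events (file version, graph and so on) should be copied unchanged. The names in_step_range and slice_events_by_step are hypothetical.

```python
def in_step_range(step, first_step, last_step):
    """Return True if first_step <= step <= last_step."""
    return first_step <= step <= last_step

def slice_events_by_step(input_path, output_path, first_step, last_step):
    # Imported lazily, as in the script above
    import tensorflow as tf
    with tf.io.TFRecordWriter(str(output_path)) as writer:
        for event in tf.compat.v1.train.summary_iterator(str(input_path)):
            is_summary = event.WhichOneof('what') == 'summary'
            # Assumption: non-summary events are always kept; summary
            # events are kept only when their step is in the range
            if not is_summary or in_step_range(event.step, first_step, last_step):
                writer.write(event.SerializeToString())
```

With this approach, keeping the first 1000 steps of every tag would be slice_events_by_step(input_path, output_path, 0, 999), provided the logged steps start at 0.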
Source: https://stackoverflow.com/questions/58276718/how-to-load-selected-range-of-samples-in-tensorboard