Question
I have a .txt file which contains a different label on each line. I use this file to create a label index lookup file, for example:
label_index = tf.contrib.lookup.index_table_from_file(vocabulary_file = 'labels.txt')
I am wondering how I should package the vocabulary file for Cloud ML Engine. The packaging suggestions are explicit about how to set up the .py files, but I am not sure where I should put relevant .txt files. Should they just be hosted in a storage bucket (i.e. gs://) that the engine has access to, or can they be packaged with the trainer somehow?
Answer 1:
You have multiple options. I think the most straightforward is to store labels.txt in a GCS location.
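As a rough sketch of the GCS option, the vocabulary path can be passed in as a command-line flag so that the same trainer code accepts either a local file or a gs:// URL. The flag name --labels-file, the default bucket path, and the helper names below are hypothetical; the sketch also assumes TensorFlow 1.x, whose file I/O can generally read gs:// paths directly when the job has access to the bucket.

```python
import argparse


def parse_args(argv=None):
    """Parse trainer flags; unknown flags are ignored rather than rejected."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--labels-file',
        default='gs://my-bucket/labels.txt',  # hypothetical bucket path
        help='Local path or gs:// URL of the vocabulary file.')
    args, _ = parser.parse_known_args(argv)
    return args


def build_label_index(labels_file):
    """Build the lookup table from a vocabulary file (assumes TensorFlow 1.x)."""
    import tensorflow as tf  # imported lazily so flag parsing works without TF
    return tf.contrib.lookup.index_table_from_file(vocabulary_file=labels_file)
```

In the trainer, this would be used as build_label_index(parse_args().labels_file). When submitting a job, user arguments placed after the bare -- in the gcloud command are forwarded to the trainer, so the flag can point at whichever bucket holds the file.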
However, if you prefer, you can also package the file up in your setup.py. There are multiple ways to do this, so I'll refer you to the official setuptools documentation.
Let me walk through a quick example:
Create a setup.py in the directory that contains your training package (often called trainer in Cloud ML Engine's samples, so I will proceed as if your code is structured the same way as the samples, including using trainer as the package name). The following is based on the docs you referenced with one important change, namely, the package_data argument instead of include_package_data:
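from setuptools import find_packages
from setuptools import setup

# Extra pip dependencies for the trainer; left empty in this example.
REQUIRED_PACKAGES = []

setup(
    name='my_model',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    # Ship trainer/labels.txt inside the trainer package.
    package_data={'trainer': ['labels.txt']},
    description='My trainer application package.'
)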
If you run python setup.py sdist, you can see that trainer/labels.txt was copied into the tarball.
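To check the tarball without unpacking it by hand, something like the following sketch could be used. The helper name sdist_contains is hypothetical, and the tarball path in the usage note is an assumption based on the name and version in the setup.py above; the actual file name under dist/ may differ depending on the setuptools version.

```python
import tarfile


def sdist_contains(tarball_path, relative_path):
    """Return True if some member of the tarball ends with /relative_path."""
    # 'r:*' lets tarfile detect the compression (gzip for a default sdist).
    with tarfile.open(tarball_path, 'r:*') as tar:
        return any(name.endswith('/' + relative_path)
                   for name in tar.getnames())
```

For example, sdist_contains('dist/my_model-0.1.tar.gz', 'trainer/labels.txt') should return True if package_data was picked up.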
Then in your code, you can access the file like this:
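from pkg_resources import Requirement, resource_filename
# The requirement names the installed distribution ('my_model' in setup.py);
# the resource path is relative to the root of that distribution.
labels_path = resource_filename(Requirement.parse('my_model'), 'trainer/labels.txt')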
Note that to run this code locally, you're going to have to install your package: python setup.py install [--user].
And that's the primary reason I think storing the file on GCS might be easier.
Source: https://stackoverflow.com/questions/45641474/how-to-package-vocabulary-file-for-cloud-ml-engine