How to package vocabulary file for Cloud ML Engine

Posted by 限于喜欢 on 2020-02-25 04:47:50

Question


I have a .txt file that contains a different label on each line. I use this file to create a label index lookup table, for example:

label_index = tf.contrib.lookup.index_table_from_file(vocabulary_file='labels.txt')
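For readers unfamiliar with this lookup: by default, index_table_from_file treats each whole line as a key and uses its zero-based line number as the id. A minimal plain-Python sketch of that mapping follows; it is an illustration only, not TensorFlow's implementation, and build_label_index is a hypothetical helper name.

```python
import io


def build_label_index(lines):
    """Map each label (one per line) to its zero-based line number."""
    index = {}
    for line_number, line in enumerate(lines):
        # Drop the trailing newline so the key is just the label text.
        label = line.rstrip("\r\n")
        index[label] = line_number
    return index


# In-memory stand-in for labels.txt (sample labels are made up).
sample = io.StringIO("cat\ndog\nbird\n")
label_index = build_label_index(sample)
print(label_index["dog"])  # prints 1
```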

How should I package the vocabulary file with my Cloud ML Engine trainer? The packaging suggestions are explicit about how to set up the .py files, but I am not sure where to put the relevant .txt files. Should they just be hosted in a storage bucket (i.e. gs://) that the engine has access to, or can they be packaged with the trainer somehow?


Answer 1:


You have multiple options. I think the most straightforward is to store labels.txt in a GCS location.
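One way to make the GCS option convenient is to pass the vocabulary location to the trainer as a command-line argument, so the same code runs against a local file or a gs:// URI. The following is a minimal sketch under that assumption; the --vocab-file flag name and the bucket path are hypothetical, and the parsed value would then be handed to index_table_from_file as vocabulary_file.

```python
import argparse


def parse_args(argv=None):
    """Parse trainer arguments; --vocab-file is a hypothetical flag name."""
    parser = argparse.ArgumentParser(description="Trainer entry point (sketch).")
    parser.add_argument(
        "--vocab-file",
        default="labels.txt",
        help="Local path or gs:// URI of the vocabulary file.",
    )
    return parser.parse_args(argv)


# Example invocation with a made-up bucket path.
args = parse_args(["--vocab-file", "gs://my-bucket/data/labels.txt"])
print(args.vocab_file)
```

With this layout, local runs can rely on the default labels.txt, while a cloud job supplies the gs:// path in its user arguments.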

However, if you prefer, you can also package the file with your trainer through setup.py. There are multiple ways to do this, so I'll refer you to the official setuptools documentation.

Let me walk through a quick example:

Create a setup.py in the parent directory of your training package (often called trainer in Cloud ML Engine's samples, so I will proceed as if your code is structured the same way as the samples, including using trainer as the package name). The following is based on the docs you referenced, with one important change: it uses the package_data argument instead of include_package_data:

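from setuptools import find_packages
from setuptools import setup

# Extra PyPI dependencies of the trainer; left empty here as a placeholder.
REQUIRED_PACKAGES = []

setup(
    name='my_model',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    # Ship labels.txt from inside the trainer package directory.
    package_data={'trainer': ['labels.txt']},
    description='My trainer application package.'
)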

If you run python setup.py sdist, you can see that trainer/labels.txt was copied into the tarball.
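If you would rather check the tarball programmatically than inspect it by hand, a small sketch using the standard tarfile module could look like this. The helper name is hypothetical, and the tarball path in the comment assumes the name and version from the setup.py above.

```python
import tarfile


def sdist_members_ending_with(tarball_path, suffix):
    """Return the archive member names that end with `suffix`."""
    # "r:*" lets tarfile detect the compression used by the sdist.
    with tarfile.open(tarball_path, "r:*") as archive:
        return [name for name in archive.getnames() if name.endswith(suffix)]


# Example call (path assumed, adjust to your own build output):
# sdist_members_ending_with("dist/my_model-0.1.tar.gz", "trainer/labels.txt")
```

An empty list means the file was not included, which usually points at a package_data entry that does not match the package name or file location.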

Then in your code, you can access the file like this:

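from pkg_resources import Requirement, resource_filename

# The requirement names the installed distribution (name='my_model' in setup.py);
# the resource path is relative to the distribution root.
labels_path = resource_filename(Requirement.parse('my_model'), 'trainer/labels.txt')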

Note that to run this code locally, you're going to have to install your package: python setup.py install [--user].

And that's the primary reason I think storing the file on GCS might be easier.



Source: https://stackoverflow.com/questions/45641474/how-to-package-vocabulary-file-for-cloud-ml-engine
