How to load sparse data with TensorFlow?

故里飘歌 · 2021-02-12 23:23

There is a small snippet about loading sparse data but I have no idea how to use it.

SparseTensors don't play well with queues. If you use SparseTensors you have to decode the string records using tf.parse_example after batching (instead of using tf.parse_single_example before batching).

5 Answers
  • 2021-02-12 23:28

    First, to explain what that documentation means:

    1. For dense data usually you are doing:

      Serialized Example (from reader) -> parse_single_example -> batch queue -> use it.

    2. For sparse data you currently need to do:

      Serialized Example (from reader) -> batch queue -> parse_example -> use it.

    An example of this would be:

    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    # Note: shuffle_batch lives under tf.train and also needs
    # capacity and min_after_dequeue arguments.
    batch_serialized_examples = tf.train.shuffle_batch(
        [serialized_example], batch_size,
        capacity=10000, min_after_dequeue=1000)
    feature_to_type = {
        'label': tf.FixedLenFeature([1], dtype=tf.int64),
        'sparse_feature': tf.VarLenFeature(dtype=tf.int64)
    }
    features = tf.parse_example(batch_serialized_examples, feature_to_type)
    

    Note that tf.train.shuffle_batch takes a list of string tensors and returns a batch of strings, and that in this example 'label' must be a fixed-length feature of rank 1.
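    For intuition about what tf.parse_example yields for a VarLenFeature — a COO-style sparse value with indices, flat values, and a dense shape — here is a pure-Python sketch (the helper name is illustrative, not TensorFlow API) that builds the same triple from a batch of variable-length rows:

    ```python
    def batch_to_coo(rows):
        """Mimic the SparseTensorValue that parse_example yields for a
        VarLenFeature: COO indices, flat values, and the batch's dense shape."""
        indices, values = [], []
        for i, row in enumerate(rows):
            for j, v in enumerate(row):
                indices.append((i, j))
                values.append(v)
        max_len = max((len(r) for r in rows), default=0)
        return indices, values, (len(rows), max_len)

    indices, values, shape = batch_to_coo([[3, 1, 4], [1, 5], [9]])
    print(indices)  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (2, 0)]
    print(values)   # [3, 1, 4, 1, 5, 9]
    print(shape)    # (3, 3)
    ```
    
    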

  • 2021-02-12 23:30

    If you are passing sparse values as inputs, you need to create sparse placeholders using tf.sparse_placeholder.

    You can then convert the sparse tensor to a dense tensor using tf.sparse_to_dense.

    To do this, explicitly pass the sparse tensor's indices, values, and shape in feed_dict, and apply tf.sparse_to_dense in the graph.

    In the graph :

    dense = tf.sparse_to_dense(
        sparse_indices=sparse_placeholder.indices,
        output_shape=sparse_placeholder.dense_shape,
        sparse_values=sparse_placeholder.values,
        validate_indices=False)
    

    In the feed_dict:

    sparse_placeholder: tf.SparseTensorValue(indices=indices, values=sparse_values, dense_shape=sparse_shape)
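
    The mechanics of tf.sparse_to_dense are easy to mirror in plain Python; this hypothetical helper (not TensorFlow API) shows what the op computes from the same indices/values/shape triple you feed through the placeholder:

    ```python
    def sparse_to_dense(indices, dense_shape, values, default=0):
        """Scatter (row, col) -> value into a dense 2-D list,
        mirroring what tf.sparse_to_dense computes."""
        rows, cols = dense_shape
        dense = [[default] * cols for _ in range(rows)]
        for (r, c), v in zip(indices, values):
            dense[r][c] = v
        return dense

    dense = sparse_to_dense(indices=[(0, 1), (2, 0)], dense_shape=(3, 2), values=[7, 5])
    print(dense)  # [[0, 7], [0, 0], [5, 0]]
    ```
    
    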
    
  • 2021-02-12 23:45

    For the libsvm format you can write and parse as below, if you want a sparse-tensor result (as opposed to a dense result using a padding strategy):

    # --- write
    # (Assumes `l` is the whitespace-split libsvm line, `start` is the index of
    # the first index:value pair, and `label`, `num_features`, `writer` are
    # defined elsewhere.)
    _float_feature = lambda v: tf.train.Feature(float_list=tf.train.FloatList(value=v))
    _int_feature = lambda v: tf.train.Feature(int64_list=tf.train.Int64List(value=v))

    indexes = []
    values = []

    for item in l[start:]:
        index, value = item.split(':')
        indexes.append(int(index))
        values.append(float(value))

    example = tf.train.Example(features=tf.train.Features(feature={
        'label': _int_feature([label]),
        'num_features': _int_feature([num_features]),
        'index': _int_feature(indexes),
        'value': _float_feature(values)
    }))

    writer.write(example.SerializeToString())

    # --- read
    def decode(batch_serialized_examples):
        features = tf.parse_example(
            batch_serialized_examples,
            features={
                'label': tf.FixedLenFeature([], tf.int64),
                'index': tf.VarLenFeature(tf.int64),
                'value': tf.VarLenFeature(tf.float32),
            })

        label = features['label']
        index = features['index']
        value = features['value']

        return label, index, value
    

    This way you get label as a dense tensor, and index and value as two sparse tensors. You can find a self-contained example of writing the libsvm format to TFRecord and reading it back for MLP classification at:

    https://github.com/chenghuige/tensorflow-example/tree/master/examples/tf-record/sparse
    https://github.com/chenghuige/tensorflow-example/tree/master/examples/text-classification
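    The write-side parsing above is plain string handling; here is a self-contained sketch of splitting one libsvm line into the label, index list, and value list used by the writer (the helper name is illustrative):

    ```python
    def parse_libsvm_line(line):
        """Split 'label idx:val idx:val ...' into (label, indexes, values)."""
        parts = line.split()
        label = int(parts[0])
        indexes, values = [], []
        for item in parts[1:]:
            index, value = item.split(':')
            indexes.append(int(index))
            values.append(float(value))
        return label, indexes, values

    label, indexes, values = parse_libsvm_line('1 3:0.5 7:1.25')
    print(label, indexes, values)  # 1 [3, 7] [0.5, 1.25]
    ```
    
    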

  • 2021-02-12 23:49

    Store indices and values in your TFRecords Examples, and parse with SparseFeature. For example, to store and load a sparse representation for:

    [[0, 0, 0, 0, 0, 7],
     [0, 5, 0, 0, 0, 0],
     [0, 0, 0, 0, 9, 0],
     [0, 0, 0, 0, 0, 0]]
    

    This creates a TFRecords Example:

    my_example = tf.train.Example(features=tf.train.Features(feature={
        'index_0': tf.train.Feature(int64_list=tf.train.Int64List(value=[0, 1, 2])),
        'index_1': tf.train.Feature(int64_list=tf.train.Int64List(value=[5, 1, 4])),
        'values': tf.train.Feature(int64_list=tf.train.Int64List(value=[7, 5, 9]))
    }))
    my_example_str = my_example.SerializeToString()
    

    And this parses it with SparseFeature:

    my_example_features = {'sparse': tf.SparseFeature(index_key=['index_0', 'index_1'],
                                                      value_key='values',
                                                      dtype=tf.int64,
                                                      size=[4, 6])}
    serialized = tf.placeholder(tf.string)
    parsed = tf.parse_single_example(serialized, features=my_example_features)
    session.run(parsed, feed_dict={serialized: my_example_str})
    
    ## {'sparse': SparseTensorValue(indices=array([[0, 5], [1, 1], [2, 4]]),
    ##                              values=array([7, 5, 9]),
    ##                              dense_shape=array([4, 6]))}
    

    More exposition: Sparse Tensors and TFRecords
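    As a sanity check, the three parallel lists stored above really do encode the matrix shown at the top; a pure-Python reconstruction (illustrative helper, not TensorFlow API):

    ```python
    def coo_to_dense(index_0, index_1, values, size):
        """Rebuild the dense matrix from the parallel row/col/value lists
        that SparseFeature describes."""
        rows, cols = size
        dense = [[0] * cols for _ in range(rows)]
        for r, c, v in zip(index_0, index_1, values):
            dense[r][c] = v
        return dense

    print(coo_to_dense([0, 1, 2], [5, 1, 4], [7, 5, 9], (4, 6)))
    # [[0, 0, 0, 0, 0, 7],
    #  [0, 5, 0, 0, 0, 0],
    #  [0, 0, 0, 0, 9, 0],
    #  [0, 0, 0, 0, 0, 0]]
    ```
    
    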

  • 2021-02-12 23:50

    You can use weighted_categorical_column to parse index and value, e.g.:

    categorical_column = tf.feature_column.categorical_column_with_identity(
        key='index', num_buckets=your_feature_dim)
    sparse_columns = tf.feature_column.weighted_categorical_column(
        categorical_column=categorical_column, weight_feature_key='value')
    

    Then feed sparse_columns to a linear model estimator. Before feeding a DNN, wrap it in an embedding, e.g.:

    dense_columns = tf.feature_column.embedding_column(sparse_columns, your_embedding_dim)
    

    Then feed dense_columns to your DNN estimator.
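    What a linear model does with a weighted categorical column is just a sparse dot product over the (index, value) pairs; a minimal pure-Python sketch (the weights and helper name are made up for illustration):

    ```python
    def linear_score(weights, indexes, values, bias=0.0):
        """Sparse dot product: sum(weights[i] * v) + bias, which is what a
        linear estimator computes over a weighted_categorical_column."""
        return sum(weights[i] * v for i, v in zip(indexes, values)) + bias

    weights = [0.0, 2.0, 0.0, -1.0]  # one weight per bucket (feature dim 4)
    print(linear_score(weights, [1, 3], [0.5, 2.0]))  # 2.0*0.5 + (-1.0)*2.0 = -1.0
    ```
    
    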
