TensorFlow.js Dataset to Tensor?

ⅰ亾dé卋堺 submitted on 2021-02-11 14:55:45

Question


Is there a recommended/efficient way to convert a tf.data.Dataset to a Tensor when the underlying 'data examples' in the Dataset are flat arrays?

I am using tf.data.csv to read and parse a CSV, but then I want to use the TensorFlow.js Core API to process the data as tf.Tensors.


Answer 1:


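tf.data.Dataset.iterator() returns a Promise that resolves to an iterator. Each call to the iterator's next() returns a Promise of { value, done }, so you can read elements one at a time and concatenate them: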

const it = await flattenedDataset.iterator()
const t = []
// Read only the first 5 batches. There is no need to read
// all the data at once, since that could consume a lot of memory.
for (let i = 0; i < 5; i++) {
  const e = await it.next()
  if (e.done) break   // stop early if the dataset has fewer batches
  t.push(e.value)     // each value is a batched tf.Tensor
}
const result = tf.concat(t, 0)
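The loop above can be wrapped in a small reusable helper. This is a minimal sketch, not part of the tf.data API: takeN is a hypothetical name, and it only assumes the iterator contract described above (an object whose async next() resolves to { value, done }). A plain mock iterator stands in for flattenedDataset.iterator() so the sketch runs without TensorFlow.js.

```javascript
// Hypothetical helper: collect at most `n` values from any iterator whose
// async next() resolves to { value, done }, as tf.data iterators do.
async function takeN(iterator, n) {
  const values = [];
  for (let i = 0; i < n; i++) {
    const { value, done } = await iterator.next();
    if (done) break; // the dataset ran out before n elements
    values.push(value);
  }
  return values;
}

// Mock iterator standing in for `await flattenedDataset.iterator()`;
// it yields the given rows and then reports done.
function mockIterator(rows) {
  let index = 0;
  return {
    async next() {
      return index < rows.length
        ? { value: rows[index++], done: false }
        : { value: undefined, done: true };
    },
  };
}

takeN(mockIterator([[1, 2], [3, 4], [5, 6]]), 5).then((rows) => {
  console.log(rows.length); // 3: fewer rows than requested
});
```

With a real dataset the collected values are tf.Tensors, so the returned array would be passed to tf.concat(values, 0).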

The same idea written with for await...of (it is the iterator obtained above):

// Async iterable that yields at most the first 5 elements of `it`
const asyncIterable = {
  [Symbol.asyncIterator]() {
    return {
      i: 0,
      async next() {
        if (this.i < 5) {
          this.i++
          const e = await it.next()
          if (!e.done) {
            return { value: e.value, done: false }
          }
        }
        // Either 5 elements were read or the dataset is exhausted
        return { value: undefined, done: true }
      }
    };
  }
};

const t = []
for await (const e of asyncIterable) {
  t.push(e)
}
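The hand-written iterator object above can also be expressed as an async generator, which implements the async-iterator protocol automatically. A minimal sketch, assuming the same { value, done } iterator contract; limit and countingIterator are hypothetical names, and the mock iterator stands in for a real tf.data iterator.

```javascript
// Hypothetical adapter: expose at most `n` elements of a tf.data-style
// iterator (async next() -> { value, done }) as an async iterable.
async function* limit(iterator, n) {
  for (let i = 0; i < n; i++) {
    const { value, done } = await iterator.next();
    if (done) return; // end early when the underlying iterator is exhausted
    yield value;
  }
}

// Mock iterator producing the numbers 0..count-1, standing in for
// `await flattenedDataset.iterator()`.
function countingIterator(count) {
  let i = 0;
  return {
    async next() {
      return i < count ? { value: i++, done: false } : { value: undefined, done: true };
    },
  };
}

async function main() {
  const t = [];
  for await (const e of limit(countingIterator(10), 5)) {
    t.push(e);
  }
  console.log(t); // [ 0, 1, 2, 3, 4 ]
}

main();
```

Because the generator returns as soon as next() reports done, it also stops cleanly when the dataset has fewer than n elements.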

A complete example using the Boston housing CSV (the HTML page that loads TensorFlow.js follows the script):

const csvUrl =
  'https://storage.googleapis.com/tfjs-examples/multivariate-linear-regression/data/boston-housing-train.csv';

(async function run() {
  // We want to predict the column "medv", which represents the median value
  // of a home (in $1000s), so we mark it as a label.
  const csvDataset = tf.data.csv(
    csvUrl, {
      columnConfigs: {
        medv: {
          isLabel: true
        }
      }
    });

  // The number of features is the number of column names minus one for the
  // label column.
  const numOfFeatures = (await csvDataset.columnNames()).length - 1;

  // Prepare the Dataset for training.
  const flattenedDataset =
    csvDataset
      .map(([rawFeatures, rawLabel]) =>
        // Convert rows from object form (keyed by column name) to array form.
        [...Object.values(rawFeatures), ...Object.values(rawLabel)])
      .batch(1);

  const it = await flattenedDataset.iterator();

  // Async iterable that yields at most the first 5 batches.
  const asyncIterable = {
    [Symbol.asyncIterator]() {
      return {
        i: 0,
        async next() {
          if (this.i < 5) {
            this.i++;
            const e = await it.next();
            if (!e.done) {
              return { value: e.value, done: false };
            }
          }
          return { value: undefined, done: true };
        }
      };
    }
  };

  const t = [];
  for await (const e of asyncIterable) {
    t.push(e);
  }
  console.log(tf.concat(t, 0).shape);
})();

<html>
  <head>
    <!-- Load TensorFlow.js -->
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@0.14.1"> </script>
  </head>

  <body>
  </body>
</html>
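The .map step in the example above relies on Object.values, which returns values in the object's own property order. Keys that look like array indices (for example a hypothetical column named "2019") are listed first in ascending numeric order, not in insertion order. This does not affect the Boston housing columns, but a more explicit conversion avoids the question entirely. A minimal sketch, assuming the column names are known (for example from await csvDataset.columnNames()); rowToArray is a hypothetical helper and the record values are made up.

```javascript
// Hypothetical helper: build a flat numeric row from a parsed CSV record,
// using an explicit column order instead of Object.values() order.
function rowToArray(features, label, featureNames, labelNames) {
  return [
    ...featureNames.map((name) => features[name]),
    ...labelNames.map((name) => label[name]),
  ];
}

// A made-up record whose keys include an integer-like column name "2019".
const features = { crim: 0.1, 2019: 5, zn: 18 };
const label = { medv: 24 };

console.log(Object.values(features)); // [ 5, 0.1, 18 ]: "2019" moves first
console.log(rowToArray(features, label, ["crim", "2019", "zn"], ["medv"]));
// [ 0.1, 5, 18, 24 ]
```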



Answer 2:


Beware that this workflow is not typically recommended, because materializing all the data in the main JavaScript memory may not work for large CSV datasets.

You can use the toArray() method of tf.data.Dataset objects. For example:

const csvUrl =
  'https://storage.googleapis.com/tfjs-examples/multivariate-linear-regression/data/boston-housing-train.csv';

(async function run() {
  const csvDataset = tf.data.csv(
    csvUrl, {
      columnConfigs: {
        medv: {
          isLabel: true
        }
      }
    }).batch(4);

  // toArray() reads the whole dataset into memory and resolves to an
  // array with one element per batch of 4 rows.
  const tensors = await csvDataset.toArray();
  console.log(tensors.length);
  console.log(tensors[0][0]);
})();
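If the dataset yields flat arrays (for example the unbatched flattenedDataset from the first answer), the rows returned by toArray() can be packed into one contiguous buffer with an explicit [rows, columns] shape, which is a form tf.tensor2d(values, shape) accepts. A minimal sketch in plain JavaScript; packRows is a hypothetical helper, the sample rows are made up, and the tf.tensor2d call is left as a comment so the sketch runs without TensorFlow.js. The byte count also makes the memory warning concrete: each value costs 4 bytes as float32.

```javascript
// Hypothetical helper: pack equally long numeric rows into a Float32Array
// plus a [numRows, numCols] shape.
function packRows(rows) {
  const numRows = rows.length;
  const numCols = numRows > 0 ? rows[0].length : 0;
  const values = new Float32Array(numRows * numCols);
  rows.forEach((row, r) => {
    if (row.length !== numCols) {
      throw new Error(`row ${r} has ${row.length} values, expected ${numCols}`);
    }
    values.set(row, r * numCols); // copy row r into its slice of the buffer
  });
  return { values, shape: [numRows, numCols] };
}

const { values, shape } = packRows([
  [0.1, 18, 2.3, 24],
  [0.2, 0, 7.1, 21.6],
  [0.03, 0, 7.1, 34.7],
]);
console.log(shape);             // [ 3, 4 ]
console.log(values.byteLength); // 48 (12 values * 4 bytes each)
// With TensorFlow.js loaded: const x = tf.tensor2d(values, shape);
```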


Source: https://stackoverflow.com/questions/54955341/tensorflow-js-dataset-to-tensor
