Question
Is there a recommended/efficient way to convert a tf.data.Dataset to a Tensor when the underlying 'data examples' in the Dataset are flat arrays?
I am using tf.data.csv to read and parse a CSV, but then want to use the TensorFlow.js Core API to process the data as tf.Tensors.
Answer 1:
tf.data.Dataset.iterator() returns a promise of an iterator, so await it before calling next().
const it = await flattenedDataset.iterator()
const t = []
// Read only the first 5 rows. Reading the whole dataset
// at once could consume a lot of memory.
for (let i = 0; i < 5; i++) {
  const e = await it.next()
  if (e.done) break  // the dataset has fewer than 5 rows
  // e.value is one batched tensor; a tensor is not iterable,
  // so push it as is instead of spreading it
  t.push(e.value)
}
tf.concat(t, 0)
Using for await...of:
const asyncIterable = {
  [Symbol.asyncIterator]() {
    return {
      i: 0,
      async next() {
        if (this.i < 5) {
          this.i++
          const e = await it.next()
          // Stop early if the dataset runs out of rows.
          if (e.done) return { value: undefined, done: true }
          // An async function already wraps its return value in a promise.
          return { value: e.value, done: false }
        }
        return { value: undefined, done: true }
      }
    };
  }
};
const t = []
for await (const e of asyncIterable) {
  t.push(e)
}
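The hand-written async iterable above can also be written as an async generator. This is a minimal sketch, assuming only that the iterator exposes an async next() resolving to {value, done}, which is the contract the snippets above rely on. The names take, collect and makeStubIterator are hypothetical, and the stub stands in for a real dataset iterator so the snippet runs without TensorFlow.js.

```javascript
// Hypothetical helper: yields at most `n` values from any object that
// exposes an async next() resolving to {value, done}.
async function* take(iterator, n) {
  for (let i = 0; i < n; i++) {
    const { value, done } = await iterator.next();
    if (done) return; // source exhausted before n items were read
    yield value;
  }
}

// Stand-in for `await dataset.iterator()` (assumption: same next() contract).
function makeStubIterator(rows) {
  let i = 0;
  return {
    async next() {
      return i < rows.length
        ? { value: rows[i++], done: false }
        : { value: undefined, done: true };
    },
  };
}

// Gathers the yielded values into a plain array.
async function collect(iterator, n) {
  const out = [];
  for await (const v of take(iterator, n)) out.push(v);
  return out;
}
```

With a real dataset, collect(await flattenedDataset.iterator(), 5) should resolve to at most five batched tensors, which can then be passed to tf.concat(..., 0).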
const csvUrl =
  'https://storage.googleapis.com/tfjs-examples/multivariate-linear-regression/data/boston-housing-train.csv';
(async function run() {
  // We want to predict the column "medv", which represents a median value of
  // a home (in $1000s), so we mark it as a label.
  const csvDataset = tf.data.csv(
    csvUrl, {
      columnConfigs: {
        medv: {
          isLabel: true
        }
      }
    });
  // Number of features is the number of column names minus one for the label
  // column.
  const numOfFeatures = (await csvDataset.columnNames()).length - 1;
  // Prepare the Dataset for training.
  const flattenedDataset =
    csvDataset
      .map(([rawFeatures, rawLabel]) =>
        // Convert rows from object form (keyed by column name) to array form.
        [...Object.values(rawFeatures), ...Object.values(rawLabel)])
      .batch(1)
  const it = await flattenedDataset.iterator()
  // Async iterable that yields at most the first 5 batches.
  const asyncIterable = {
    [Symbol.asyncIterator]() {
      return {
        i: 0,
        async next() {
          if (this.i < 5) {
            this.i++
            const e = await it.next()
            // Stop early if the dataset runs out of rows.
            if (e.done) return { value: undefined, done: true }
            return { value: e.value, done: false }
          }
          return { value: undefined, done: true }
        }
      };
    }
  };
  const t = []
  for await (const e of asyncIterable) {
    t.push(e)
  }
  // Concatenate the collected batches along the first axis.
  console.log(tf.concat(t, 0).shape)
})()
<html>
<head>
<!-- Load TensorFlow.js -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@0.14.1"> </script>
</head>
<body>
</body>
</html>
Answer 2:
Beware that this workflow is not typically recommended, because materializing all the data in main JavaScript memory may not work for large CSV datasets.
You can use the toArray() method of tf.data.Dataset objects. For example:
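const csvUrl =
  'https://storage.googleapis.com/tfjs-examples/multivariate-linear-regression/data/boston-housing-train.csv';
const csvDataset = tf.data.csv(
  csvUrl, {
    columnConfigs: {
      medv: {
        isLabel: true
      }
    }
  }).batch(4);
// Run this inside an async function (or a module with top-level await).
// toArray() resolves to an array with one element per batch of 4 rows.
const tensors = await csvDataset.toArray();
console.log(tensors.length);
// First item of the first batch.
console.log(tensors[0][0]);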
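For the flat-array case from the question, the rows can also be packed into a single typed array plus a shape before a tensor is created. This is a minimal sketch with no TensorFlow.js dependency, assuming every row is a plain numeric array of the same length. packRows is a hypothetical helper name, and the tf.tensor2d call appears only as a comment.

```javascript
// Hypothetical helper: packs equal-length numeric rows into one
// Float32Array in row-major order and reports the [rows, cols] shape.
function packRows(rows) {
  const numRows = rows.length;
  const numCols = numRows > 0 ? rows[0].length : 0;
  const values = new Float32Array(numRows * numCols);
  rows.forEach((row, r) => {
    if (row.length !== numCols) {
      throw new Error(`Row ${r} has ${row.length} values, expected ${numCols}`);
    }
    // Copy this row into its slot in the flat buffer.
    values.set(row, r * numCols);
  });
  return { values, shape: [numRows, numCols] };
}

const { values, shape } = packRows([[1, 2, 3], [4, 5, 6]]);
console.log(shape); // [ 2, 3 ]
// With TensorFlow.js loaded: const t = tf.tensor2d(values, shape);
```

The memory caveat above still applies: the whole dataset ends up in one buffer, so this only suits data that comfortably fits in memory.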
Source: https://stackoverflow.com/questions/54955341/tensorflow-js-dataset-to-tensor