I am trying to use the 128-byte embeddings produced by the pre-trained base of the VGGish model for transfer learning on audio data. Using python vggish_inference_demo.py --wav_file ...
to encode my training data to a tfrecord worked fine, but now I want to use this as an input to another model (e.g. a neural network I create with Keras or something else). Using some similar questions and the documentation, I got this far with the first embedding record of one file:
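import tensorflow as tf
tfrecords_filename = 'example1.tfrecord'
record_iterator = tf.python_io.tf_record_iterator(path=tfrecords_filename)
string_record = next(record_iterator)
example = tf.train.SequenceExample()
example.ParseFromString(string_record)
print(example.feature_lists.feature_list['audio_embedding'].feature[0].bytes_list.value)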
This produces a value of the form b'...'. I am not even sure what it is (there are more than 64 and fewer than 128 x's in it, so I am not sure how this lines up with anything).
Maybe I am missing some basic Python knowledge here, but how do I turn this into a numeric array that I can use as an input to some other model?
It turns out that these are raw bytes. They can be converted to hex, which in turn can be converted to an array of integers between 0 and 255.
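import tensorflow as tf
import numpy as np
tfrecords_filename = 'example1.tfrecord'
# Read the first serialized record from the file
record_iterator = tf.python_io.tf_record_iterator(path=tfrecords_filename)
string_record = next(record_iterator)
# Parse the serialized record into a SequenceExample; without this step the example stays empty
example = tf.train.SequenceExample()
example.ParseFromString(string_record)
# Take the first embedding frame as a hex string (two hex digits per byte)
hexembed = example.feature_lists.feature_list['audio_embedding'].feature[0].bytes_list.value[0].hex()
# Convert each pair of hex digits to an integer in the range 0..255
arrayembed = [int(hexembed[i:i+2],16) for i in range(0,len(hexembed),2)]
print(arrayembed)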
This produces output in the format that I desired:
[153, 7, 170, 62, 210, 95, 82, 95, 159, 187, 113, 78, 153, 161, 56, 86, 173, 127, 147, 240, 41, 221, 52, 128, 126, 176, 164, 100, 142, 133, 182, 136, 163, 63, 85, 166, 81, 91, 155, 3, 56, 255, 0, 69, 69, 62, 79, 74, 165, 184, 130, 56, 41, 151, 94, 138, 170, 18, 104, 255, 255, 195, 57, 206, 155, 19, 128, 0, 106, 202, 90, 172, 255, 255, 15, 172, 28, 144, 38, 210, 46, 98, 226, 123, 193, 21, 233, 186, 237, 212, 169, 255, 220, 181, 153, 93, 33, 4, 202, 255, 166, 59, 98, 224, 25, 191, 87, 235, 80, 33, 255, 197, 255, 130, 255, 26, 190, 236, 45, 104, 255, 141, 255, 13, 150, 0, 0, 255]
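As a possible simplification of the conversion above: a Python 3 bytes object already yields integers from 0 to 255 when iterated, so the hex round-trip may not be needed. The sketch below uses a short stand-in value instead of a real record, and the helper name decode_frames is hypothetical, introduced here only for illustration.

```python
# Stand-in for one bytes_list.value[0] entry (illustrative values only);
# a real VGGish embedding frame is 128 bytes long.
raw = bytes([153, 7, 170, 62, 255, 0])

# Iterating a bytes object yields ints in the range 0..255 directly.
direct = list(raw)

# The hex round-trip produces the same numbers.
hexembed = raw.hex()
via_hex = [int(hexembed[i:i + 2], 16) for i in range(0, len(hexembed), 2)]


def decode_frames(frames):
    """Turn a sequence of bytes objects into a list of integer lists, one per frame."""
    return [list(frame) for frame in frames]


# Hypothetical usage: two identical frames stand in for a file's embedding frames.
frames = decode_frames([raw, raw])
print(direct)
print(direct == via_hex)
print(len(frames), len(frames[0]))
```

If NumPy is available, np.frombuffer(raw, dtype=np.uint8) should give the same values as an array, which is usually more convenient as model input.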