Question
I'm working with the MNIST data set. I downloaded the original binary files (i.e. the -ubyte files; 784 columns × 60,000 rows for the training set) and converted them to CSV so I could do some processing on them.
Now I want to convert the CSV files back to ubyte, to upload them to a pipeline I'm testing.
I found this code, but I would have thought converting .csv to ubyte would be a common task, particularly as the MNIST data set is so well known. Am I missing something? Is there a simpler solution that someone knows of, for example something in pandas or numpy?
Edit 1: I tried this:
import sys
output_file = open(sys.argv[2], 'wb')
for line in open(sys.argv[1]):
    output_file.write(line)
output_file.close()
But I got:
  File "4.convert_to_binary.py", line 4, in <module>
    output_file.write(line)
TypeError: a bytes-like object is required, not 'str'
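For context, here is a minimal sketch of why that attempt fails. The file name `demo.csv` and its three values are made up for illustration. Iterating over a file opened in text mode yields `str` lines, while a file opened with `'wb'` only accepts bytes. Encoding the line would silence the error, but it would write the CSV text itself rather than one byte per pixel value.

```python
import os
import tempfile

# Hypothetical one-line CSV, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(path, "w") as f:
    f.write("0,255,17\n")

# Text mode (the default) yields str objects, which a 'wb' file rejects.
with open(path) as text_file:
    line = next(text_file)

# Encoding makes the write legal, but the result is still the ASCII text.
encoded = line.encode("ascii")

# What a ubyte file needs instead: one raw byte per pixel value.
pixel_bytes = bytes(int(value) for value in line.split(","))

print(type(line).__name__, encoded, pixel_bytes)
```
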
Edit 2: I think it's clear because I said I'm using the MNIST data set, but to be explicit: I'm talking about the specific binary format that MNIST uses, as described here
Edit 3: I've made a little progress: I think the format I'm looking for is the IDX format described here, so my question is now clearer: how do I convert from CSV to IDX binary (assuming this post is right that MNIST uses IDX binary)?
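To make the target concrete, here is a rough sketch of what a CSV-to-IDX conversion could look like with numpy and `struct`. It is not a confirmed answer. It assumes the CSV has no header row and no label column, with exactly 784 pixel values (0 to 255) per line. The function names `write_idx_ubyte` and `csv_to_idx_images` are hypothetical. As I understand the IDX layout, the header is two zero bytes, a type code (0x08 for unsigned byte), the number of dimensions, and then each dimension size as a 4-byte big-endian integer, followed by the raw data.

```python
import struct

import numpy as np


def write_idx_ubyte(array, path):
    """Write an array to `path` as unsigned bytes in IDX layout (hypothetical helper)."""
    data = np.ascontiguousarray(array, dtype=np.uint8)
    # Magic number: two zero bytes, 0x08 = unsigned byte, then the dimension count.
    header = struct.pack(">BBBB", 0, 0, 0x08, data.ndim)
    # Each dimension size is stored as a 4-byte big-endian unsigned integer.
    header += struct.pack(">" + "I" * data.ndim, *data.shape)
    with open(path, "wb") as f:
        f.write(header)
        f.write(data.tobytes())


def csv_to_idx_images(csv_path, idx_path, rows=28, cols=28):
    """Convert a header-less CSV of flattened images to an idx3-ubyte file."""
    # Assumption: every line holds rows * cols pixel values and nothing else.
    pixels = np.loadtxt(csv_path, delimiter=",", ndmin=2).astype(np.uint8)
    write_idx_ubyte(pixels.reshape(-1, rows, cols), idx_path)
```

With hypothetical file names, a call would look like `csv_to_idx_images("train_images.csv", "train-images-idx3-ubyte")`. A label file would go through `write_idx_ubyte` as a 1-D array, which gives a header with a single dimension. If the CSV does carry a label column or a header row, those would need to be split off before reshaping.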
Source: https://stackoverflow.com/questions/65143325/convert-mnist-data-set-from-csv-to-ubyte-format