How to unpack pkl file?

后端未结

关注

 4  1899

陌清茗 2021-01-29 23:05

I have a pkl file from MNIST dataset, which consists of handwritten digit images.

I\'d like to take a look at each of those digit images, so I need to unpack the pkl fil

4条回答

离开以前 (楼主)

2021-01-30 00:06

In case you want to work with the original MNIST files, here is how you can deserialize them.

If you haven't downloaded the files yet, do that first by running the following in the terminal:

wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz

Then save the following as deserialize.py and run it.

import numpy as np
import gzip

IMG_DIM = 28

def decode_image_file(fname):
    result = []
    n_bytes_per_img = IMG_DIM*IMG_DIM

    with gzip.open(fname, 'rb') as f:
        bytes_ = f.read()
        data = bytes_[16:]

        if len(data) % n_bytes_per_img != 0:
            raise Exception('Something wrong with the file')

        result = np.frombuffer(data, dtype=np.uint8).reshape(
            len(bytes_)//n_bytes_per_img, n_bytes_per_img)

    return result

def decode_label_file(fname):
    result = []

    with gzip.open(fname, 'rb') as f:
        bytes_ = f.read()
        data = bytes_[8:]

        result = np.frombuffer(data, dtype=np.uint8)

    return result

train_images = decode_image_file('train-images-idx3-ubyte.gz')
train_labels = decode_label_file('train-labels-idx1-ubyte.gz')

test_images = decode_image_file('t10k-images-idx3-ubyte.gz')
test_labels = decode_label_file('t10k-labels-idx1-ubyte.gz')

The script doesn't normalize the pixel values like in the pickled file. To do that, all you have to do is

train_images = train_images/255
test_images = test_images/255

0 讨论(0)

查看其它4个回答