How to unpack a pkl file?

陌清茗 2021-01-29 23:05

I have a pkl file from the MNIST dataset, which consists of handwritten digit images.

I'd like to take a look at each of those digit images, so I need to unpack the pkl file.

4 answers
  •  离开以前
    2021-01-30 00:06

    In case you want to work with the original MNIST files, here is how you can deserialize them.

    If you haven't downloaded the files yet, do that first by running the following in the terminal:

    wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
    wget http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
    wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
    wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
    

    Then save the following as deserialize.py and run it.

    import gzip
    
    import numpy as np
    
    IMG_DIM = 28  # MNIST images are 28x28 pixels
    
    def decode_image_file(fname):
        n_bytes_per_img = IMG_DIM * IMG_DIM
    
        with gzip.open(fname, 'rb') as f:
            bytes_ = f.read()
    
        # Skip the 16-byte header (magic number, image count, rows, columns).
        data = bytes_[16:]
    
        if len(data) % n_bytes_per_img != 0:
            raise ValueError('Something wrong with the file')
    
        # One row per image, one column per pixel (784 columns).
        return np.frombuffer(data, dtype=np.uint8).reshape(
            len(data) // n_bytes_per_img, n_bytes_per_img)
    
    def decode_label_file(fname):
        with gzip.open(fname, 'rb') as f:
            bytes_ = f.read()
    
        # Skip the 8-byte header (magic number, label count).
        data = bytes_[8:]
    
        # One unsigned byte per label, each a digit from 0 to 9.
        return np.frombuffer(data, dtype=np.uint8)
    
    train_images = decode_image_file('train-images-idx3-ubyte.gz')
    train_labels = decode_label_file('train-labels-idx1-ubyte.gz')
    
    test_images = decode_image_file('t10k-images-idx3-ubyte.gz')
    test_labels = decode_label_file('t10k-labels-idx1-ubyte.gz')
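
The offsets 16 and 8 in the script come from the IDX header at the start of each decompressed file: two zero bytes, one data-type byte (0x08 for unsigned bytes), one byte giving the number of dimensions, then one big-endian 32-bit size per dimension. As a rough sketch, the header could be read instead of hard-coding the offsets. The helper name `parse_idx_header` is made up for this illustration and is not part of the script above.

```python
import struct


def parse_idx_header(raw):
    """Return (dims, offset) for the decompressed bytes of an IDX file.

    dims is a tuple with the size of each dimension, and offset is the
    index of the first data byte. Hypothetical helper, for illustration.
    """
    if len(raw) < 4:
        raise ValueError('Too short to be an IDX file')

    # Bytes 0-1 are zero, byte 2 is the data type, byte 3 the dimension count.
    zero, dtype_code, n_dims = struct.unpack('>HBB', raw[:4])
    if zero != 0 or dtype_code != 0x08:
        raise ValueError('Not an unsigned-byte IDX file')

    offset = 4 + 4 * n_dims
    if len(raw) < offset:
        raise ValueError('Header is truncated')

    # Each dimension size is a big-endian unsigned 32-bit integer.
    dims = struct.unpack('>' + 'I' * n_dims, raw[4:offset])
    return dims, offset
```

For an image file this should give three dimensions (image count, 28, 28) and an offset of 16; for a label file, one dimension (label count) and an offset of 8.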
    

    Unlike the pickled file, this script does not normalize the pixel values. To scale them to the range [0, 1], divide by 255:

    train_images = train_images/255
    test_images = test_images/255
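
Since the goal in the question is to look at the digits, here is a minimal sketch that prints one image as text, without any plotting library. The function `render_ascii` and its parameters are invented for this example; it assumes each image is a flat sequence of 784 pixel values, which is what one row of `train_images` is.

```python
def render_ascii(pixels, side=28, max_value=255):
    """Return a text picture of one flattened side x side image.

    pixels: flat sequence of side*side intensities in [0, max_value].
    Pass max_value=1 for images that were already divided by 255.
    """
    ramp = ' .:-=+*#%@'  # darkest to brightest
    if len(pixels) != side * side:
        raise ValueError('Expected %d pixels, got %d' % (side * side, len(pixels)))

    lines = []
    for r in range(side):
        row = pixels[r * side:(r + 1) * side]
        # Map each intensity onto an index into the character ramp.
        chars = [ramp[min(len(ramp) - 1, int(float(p) / max_value * len(ramp)))]
                 for p in row]
        lines.append(''.join(chars))
    return '\n'.join(lines)
```

For example, `print(render_ascii(train_images[0], max_value=1))` should draw the first training digit after normalization, and `train_labels[0]` gives the digit it is meant to be. Before normalization, leave `max_value` at its default of 255.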
    
