How to unpack pkl file?

陌清茗 2021-01-29 23:05

I have a pkl file from MNIST dataset, which consists of handwritten digit images.

I'd like to take a look at each of those digit images, so I need to unpack the pkl file.

4 Answers
  • 2021-01-29 23:45

    You need the pickle module (and gzip, if the file is compressed).

    NOTE: Both modules are part of the Python standard library, so there is no need to install anything new.
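A minimal sketch of how the two modules fit together: it writes a small object to a plain and to a gzip-compressed pickle, then reads both back. The file names and the sample dictionary are made up for illustration.

```python
import gzip
import pickle

# Hypothetical object standing in for whatever was pickled
data = {"digits": [0, 1, 2], "source": "example"}

# Plain pickle: dump to and load from an uncompressed .pkl file
with open("example.pkl", "wb") as f:
    pickle.dump(data, f)
with open("example.pkl", "rb") as f:
    restored = pickle.load(f)

# Compressed pickle: gzip.open returns a file object pickle can use directly
with gzip.open("example.pkl.gz", "wb") as f:
    pickle.dump(data, f)
with gzip.open("example.pkl.gz", "rb") as f:
    restored_gz = pickle.load(f)

print(restored == data, restored_gz == data)  # True True
```

The only difference between the two cases is which function opens the file; `pickle.load` itself is the same.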

  • 2021-01-29 23:51

    Generally

    Your pkl file is, in fact, a serialized pickle file, which means it has been dumped using Python's pickle module.

    To un-pickle the data you can:

    import pickle
    
    
    with open('serialized.pkl', 'rb') as f:
        data = pickle.load(f)
    

    For the MNIST data set

    Note: gzip is only needed if the file is compressed:

    import gzip
    import pickle
    
    
    with gzip.open('mnist.pkl.gz', 'rb') as f:
        # The file was pickled under Python 2; encoding='latin1' lets Python 3 read it
        train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
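If you are not sure whether a given .pkl file is compressed, one option is to check for the two-byte gzip signature (0x1f 0x8b) before choosing how to open it. This is a sketch: `load_pickle` is a hypothetical helper name, and any extra keyword arguments are passed straight through to `pickle.load`.

```python
import gzip
import pickle

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def load_pickle(path, **load_kwargs):
    """Load a pickle file, decompressing it first if it is gzipped."""
    with open(path, "rb") as f:
        is_gzipped = f.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzipped else open
    with opener(path, "rb") as f:
        return pickle.load(f, **load_kwargs)
```

For example, `load_pickle('mnist.pkl.gz', encoding='latin1')` should return the same three sets; the `encoding='latin1'` argument is only needed when Python 3 fails to decode a pickle written under Python 2.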
    

    Each set can be further divided (e.g. for the training set):

    train_x, train_y = train_set
    

    These are the inputs (the digit images) and outputs (the labels) of each set.
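To see what an unpickled object contains before indexing into it, a small recursive outline can help. In this sketch (the `describe` name is hypothetical), anything with both `shape` and `dtype` attributes is treated as an array, which covers NumPy arrays without importing NumPy. It prints every element of a list or tuple, so it is meant for small containers such as the three MNIST sets, not for long lists.

```python
def describe(obj, indent=0):
    """Return a multi-line outline of nested tuples/lists and array-like leaves.

    Objects with both ``shape`` and ``dtype`` attributes (e.g. NumPy arrays)
    are summarised by shape and dtype; other leaves are shown by type name.
    """
    pad = "  " * indent
    if hasattr(obj, "shape") and hasattr(obj, "dtype"):
        return f"{pad}array shape={tuple(obj.shape)} dtype={obj.dtype}"
    if isinstance(obj, (tuple, list)):
        lines = [f"{pad}{type(obj).__name__} of {len(obj)}"]
        lines += [describe(item, indent + 1) for item in obj]
        return "\n".join(lines)
    return f"{pad}{type(obj).__name__}"
```

For example, `print(describe((train_set, valid_set, test_set)))` prints one line per set and one per array, showing each array's shape and dtype.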

    If you want to display the digits:

    import matplotlib.cm as cm
    import matplotlib.pyplot as plt
    
    
    plt.imshow(train_x[0].reshape((28, 28)), cmap=cm.Greys_r)
    plt.show()
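If matplotlib is not available (on a remote terminal, for instance), a rough text rendering is often enough to eyeball a digit. This sketch assumes each image is a flat row of 28*28 values scaled to [0, 1], as in the pickled file; `ascii_digit` is a hypothetical name and the 0.5 threshold is an arbitrary choice.

```python
def ascii_digit(pixels, width=28, threshold=0.5):
    """Render a flattened grayscale image as text, one character per pixel.

    Pixels brighter than ``threshold`` become '#', all others '.'.
    """
    pixels = list(pixels)
    if len(pixels) % width != 0:
        raise ValueError("pixel count is not a multiple of width")
    rows = []
    for start in range(0, len(pixels), width):
        row = pixels[start:start + width]
        rows.append("".join("#" if p > threshold else "." for p in row))
    return "\n".join(rows)
```

Usage would be `print(ascii_digit(train_x[0]))`; it also accepts NumPy rows, since it only iterates over the values.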
    

    [Image: MNIST digit]

    Alternatively, you can look at the original data:

    http://yann.lecun.com/exdb/mnist/

    But that is harder, because you'll need to write a program to read the binary data in those files. So I recommend using Python and loading the data with pickle. As you've seen, it's very easy. ;-)
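To give an idea of what reading the binary data involves, here is a standard-library sketch of an IDX parser. It assumes the layout described on the MNIST page: two zero bytes, a type code (0x08 for unsigned bytes), the number of dimensions, then each dimension as a big-endian 32-bit integer, followed by the raw pixel or label bytes. `parse_idx` and `read_idx` are hypothetical names.

```python
import gzip
import math
import struct


def parse_idx(raw):
    """Split raw IDX bytes into (dims, data) for an unsigned-byte file."""
    # Header: two zero bytes, a type code (0x08 = unsigned byte), number of dims
    zero, type_code, n_dims = struct.unpack(">HBB", raw[:4])
    if zero != 0 or type_code != 0x08:
        raise ValueError("not an unsigned-byte IDX file")
    # Each dimension size is a big-endian 32-bit unsigned integer
    dims = struct.unpack(f">{n_dims}I", raw[4:4 + 4 * n_dims])
    data = raw[4 + 4 * n_dims:]
    if len(data) != math.prod(dims):
        raise ValueError(f"expected {math.prod(dims)} data bytes, got {len(data)}")
    return dims, data


def read_idx(path):
    """Read an IDX file from disk, gunzipping it if the name ends in .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        return parse_idx(f.read())
```

With the training images, `dims, data = read_idx('train-images-idx3-ubyte.gz')` should give `dims == (60000, 28, 28)`, and image `i` is then `data[i * 784:(i + 1) * 784]`.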

  • 2021-01-29 23:55

    Handy one-liner

    pkl() (
      python -c 'import pickle,sys;d=pickle.load(open(sys.argv[1],"rb"));print(d)' "$1"
    )
    pkl my.pkl
    

    This prints the __str__ of the unpickled object.

    Visualizing an arbitrary object is, of course, not a well-defined problem, so if __str__ is not enough, you will need a custom script.
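One possible starting point for such a script is a depth-limited pretty-print, so that deeply nested objects stay readable. This is a sketch: `peek` is a hypothetical name, and the depth and width defaults are arbitrary. `pprint` replaces anything nested deeper than `depth` with `...`.

```python
import pickle
import pprint
import sys


def peek(path, depth=2, width=100):
    """Load a pickle and return a depth-limited, pretty-printed view of it."""
    with open(path, "rb") as f:
        obj = pickle.load(f)
    return pprint.pformat(obj, depth=depth, width=width)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Command-line use: python peek.py my.pkl
    print(peek(sys.argv[1]))
```

Saved as, say, `peek.py`, it can be run as `python peek.py my.pkl`, much like the shell function above but with nesting cut off.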

  • 2021-01-30 00:06

    In case you want to work with the original MNIST files, here is how you can deserialize them.

    If you haven't downloaded the files yet, do that first by running the following in the terminal:

    wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
    wget http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
    wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
    wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
    

    Then save the following as deserialize.py and run it.

    import gzip
    
    import numpy as np
    
    IMG_DIM = 28
    
    
    def decode_image_file(fname):
        """Decode a gzipped IDX image file into an (n_images, 784) uint8 array."""
        n_bytes_per_img = IMG_DIM * IMG_DIM
    
        with gzip.open(fname, 'rb') as f:
            bytes_ = f.read()
    
        # Skip the 16-byte header (magic number, image count, rows, columns)
        data = bytes_[16:]
    
        if len(data) % n_bytes_per_img != 0:
            raise ValueError('Something is wrong with the file')
    
        # Count images from len(data), not len(bytes_), so the header is excluded
        return np.frombuffer(data, dtype=np.uint8).reshape(
            len(data) // n_bytes_per_img, n_bytes_per_img)
    
    
    def decode_label_file(fname):
        """Decode a gzipped IDX label file into a 1-D uint8 array of digits."""
        with gzip.open(fname, 'rb') as f:
            bytes_ = f.read()
    
        # Skip the 8-byte header (magic number, label count)
        return np.frombuffer(bytes_[8:], dtype=np.uint8)
    
    
    train_images = decode_image_file('train-images-idx3-ubyte.gz')
    train_labels = decode_label_file('train-labels-idx1-ubyte.gz')
    
    test_images = decode_image_file('t10k-images-idx3-ubyte.gz')
    test_labels = decode_label_file('t10k-labels-idx1-ubyte.gz')
    

    Unlike the pickled file, the script doesn't normalize the pixel values to the range [0, 1]. To do that, divide by 255:

    train_images = train_images/255
    test_images = test_images/255
    