How to read MNIST database in R?

迷失自我 2021-02-05 22:31

I'm currently working on a case study for which I need to work with the MNIST database.
The files in this site are said to be in IDX file format. I tried to take a look at th

5 answers
  • 2021-02-05 23:14

    I tried the above, using:

    data <- readBin(to.read, integer(), size = 1, n = 784, endian="big")
    

    but ended up with both positive and negative integers in the image, because readBin treats one-byte integers as signed by default. Consequently, when plotted, using:

    plot(as.cimg(data))
    

    I get a grey background with the character in pixels that are darker or lighter than the background.

    I then used (see https://tensorflow.rstudio.com/tfestimators/articles/examples/mnist.html):

    data <- readBin(to.read, what = "raw", n = 784, endian="big")
    conv <- as.integer(data)
    mm <- matrix(conv, 28, 28)
    

    Now I have only non-negative values (0 to 255), and the plot gives a proper white character on a black background, which is what I wanted.
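
    The difference between the two reads can be reproduced without the MNIST files. The following is a minimal sketch, assuming a hand-made raw vector (`bytes`, an illustrative name) in place of real pixel data. It relies on the fact that readBin accepts a raw vector as its input and has a `signed` argument for one- and two-byte integers.

    ```r
    # Four sample bytes standing in for pixel values (illustrative data, not MNIST).
    bytes <- as.raw(c(0, 127, 128, 255))

    # Default: one-byte integers are read as signed, so values above 127 wrap negative.
    signed_vals <- readBin(bytes, integer(), size = 1, n = 4)

    # signed = FALSE reads the same bytes as unsigned values in 0..255.
    unsigned_vals <- readBin(bytes, integer(), size = 1, n = 4, signed = FALSE)

    # Converting raw bytes with as.integer() gives the same unsigned values.
    from_raw <- as.integer(bytes)

    print(signed_vals)
    print(unsigned_vals)
    ```

    Either `signed = FALSE` or the raw-then-as.integer route should give pixel values suitable for plotting.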

  • 2021-02-05 23:15

    Here's how you can do it using the darch package:

    Run readMNIST('C:/Users/pj_/Dir/')

    This stores test.RData and train.RData in the directory you passed. When you load these two files into your workspace, you will see 'testData', 'testLabels', 'trainData' and 'trainLabels' in your Global Environment.

  • 2021-02-05 23:21

    The MNIST dataset is also available in the keras package.

    library(keras)
    mnist <- dataset_mnist()
    x_train <- mnist$train$x
    y_train <- mnist$train$y
    x_test <- mnist$test$x
    y_test <- mnist$test$y
    
  • Following up on the darch (not ~Darch~) package mentioned above:

    The package is called darch. It has been moved to MRAN (Microsoft R Application Network) but is available on CRAN as well.

    It provides two functions for the MNIST data:

    readMNIST which reads the ubyte files stored in your hard drive and saves them as test.Rdata and train.Rdata archives.

    provideMNIST which will download the files and call readMNIST on them.

    When calling these functions you need to give the directory names separated by a single slash, e.g. readMNIST("../MNIST/") (the trailing slash is required).

    If you download the files yourself you will need to change the file names: the gz archives contain files with extensions, like t10k-labels.idx1-ubyte but readMNIST looks for files without extension, like t10k-labels-idx1-ubyte, so you have to change the dot to a dash (with darch version 0.12.0, maybe they'll fix this).
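
    The renaming step can be scripted. This is a rough sketch, assuming the extracted files end in `.idxN-ubyte` as described above; `rename_idx_files` is a hypothetical helper (not part of darch), and the demo runs on an empty placeholder file in a temporary directory rather than on real MNIST data.

    ```r
    # Hypothetical helper: turn the ".idx" in each file name into "-idx".
    rename_idx_files <- function(dir) {
      old <- list.files(dir, pattern = "\\.idx[0-9]-ubyte$", full.names = TRUE)
      new <- file.path(dirname(old), sub("\\.idx", "-idx", basename(old)))
      file.rename(old, new)
      invisible(new)
    }

    # Demo on a throwaway directory holding one empty placeholder file.
    demo_dir <- tempfile("mnist_")
    dir.create(demo_dir)
    file.create(file.path(demo_dir, "t10k-labels.idx1-ubyte"))

    rename_idx_files(demo_dir)
    print(list.files(demo_dir))
    ```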

    To load the files in R you need to use the load function (e.g. load("..\\MNIST\\test.Rdata")). This will create the matrices trainData and testData in the environment.

    For some reason I did not get any dimnames for the matrices.

  • 2021-02-05 23:27

    endian="big", not "high":

    > to.read = file("~/Downloads/t10k-images-idx3-ubyte", "rb")
    

    magic number:

    > readBin(to.read, integer(), n=1, endian="big")
    [1] 2051
    

    number of images:

    > readBin(to.read, integer(), n=1, endian="big")
    [1] 10000
    

    number of rows:

    > readBin(to.read, integer(), n=1, endian="big")
    [1] 28
    

    number of columns:

    > readBin(to.read, integer(), n=1, endian="big")
    [1] 28
    

    here comes the data:

    > readBin(to.read, integer(), n=1, endian="big")
    [1] 0
    > readBin(to.read, integer(), n=1, endian="big")
    [1] 0
    

    as per the training set image data description on the web site.

    Now you just need to loop and read 28*28 byte chunks into matrices.

    Start again:

     > to.read = file("~/Downloads/t10k-images-idx3-ubyte", "rb")
    

    skip header:

    > readBin(to.read, integer(), n=4, endian="big")
    [1]  2051 10000    28    28
    

    should really get the 28,28 from the header read but hard-coded here:

     > m = matrix(readBin(to.read,integer(), size=1, n=28*28, endian="big"),28,28)
     > image(m)
    

    Might need to transpose or flip the matrix; I think it's an upside-down "7".
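
    The reason for the flip is that image() puts the first matrix index along the x axis and draws the second index upward along the y axis. As a small sketch of the general case, assume a matrix stored the usual way (rows by columns, first row at the top); `orient_for_image` is a hypothetical helper name, and the 2 x 3 matrix is made-up data.

    ```r
    # Hypothetical helper: rearrange a rows-by-columns picture so that
    # image() draws it upright (first row at the top, first column at the left).
    orient_for_image <- function(img) {
      t(img)[, nrow(img):1, drop = FALSE]
    }

    # Made-up 2 x 3 "picture": top row 1 2 3, bottom row 4 5 6.
    img <- matrix(1:6, nrow = 2, byrow = TRUE)
    oriented <- orient_for_image(img)
    print(oriented)

    # To plot it: image(oriented)
    ```

    The matrix `m` built above is already filled column by column from row-ordered bytes, so it is effectively transposed and only needs the column reversal, `m[, 28:1]`.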

    par(mfrow=c(5,5))
    par(mar=c(0,0,0,0))
    for (i in 1:25) {
      m = matrix(readBin(to.read, integer(), size=1, n=28*28, endian="big"), 28, 28)
      image(m[,28:1])
    }
    

    gets you a 5 × 5 grid of digit images (screenshot not preserved).

    Oh, and Google leads me to: http://www.inside-r.org/packages/cran/darch/docs/readMNIST which might be useful.
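
    Putting the steps above together, here is a rough sketch of a reader that takes the dimensions from the header instead of hard-coding 28. `read_idx_images` is a hypothetical name, and it reads pixels with `signed = FALSE` (my assumption, to get values in 0..255). The demo writes a tiny synthetic file with the same layout (four big-endian 32-bit header integers, then one byte per pixel) so it runs without the real download.

    ```r
    # Hypothetical helper: read an IDX image file into an images x rows x cols array.
    read_idx_images <- function(path) {
      con <- file(path, "rb")
      on.exit(close(con))
      # Header: magic number, image count, rows, columns (32-bit, big-endian).
      header <- readBin(con, integer(), n = 4, size = 4, endian = "big")
      stopifnot(header[1] == 2051)
      n_images <- header[2]
      n_rows <- header[3]
      n_cols <- header[4]
      # One unsigned byte per pixel, stored image by image, row by row.
      pixels <- readBin(con, integer(), n = n_images * n_rows * n_cols,
                        size = 1, signed = FALSE)
      # array() fills the first index fastest, so fill as cols x rows x images,
      # then reorder to images x rows x cols.
      aperm(array(pixels, dim = c(n_cols, n_rows, n_images)), c(3, 2, 1))
    }

    # Synthetic file: 2 images of 2 rows x 3 columns (made-up pixel values).
    tmp <- tempfile()
    out <- file(tmp, "wb")
    writeBin(c(2051L, 2L, 2L, 3L), out, size = 4, endian = "big")
    writeBin(as.raw(c(0:5, 250:255)), out)
    close(out)

    imgs <- read_idx_images(tmp)
    print(dim(imgs))
    print(imgs[1, , ])
    ```

    With the real file, `imgs[i, , ]` would be the i-th digit as a 28 x 28 matrix with the first row at the top, so it still needs reorienting before being passed to image().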
