How to read MNIST database in R?

迷失自我 2021-02-05 22:31

I'm currently working on a case study for which I need to work with the MNIST database.
The files in this site are said to be in IDX file format. I tried to take a look at th

5 answers
  •  孤城傲影
    2021-02-05 23:27

    endian="big", not "high":

    > to.read = file("~/Downloads/t10k-images-idx3-ubyte", "rb")
    

    magic number:

    > readBin(to.read, integer(), n=1, endian="big")
    [1] 2051
    

    number of images:

    > readBin(to.read, integer(), n=1, endian="big")
    [1] 10000
    

    number of rows:

    > readBin(to.read, integer(), n=1, endian="big")
    [1] 28
    

    number of columns:

    > readBin(to.read, integer(), n=1, endian="big")
    [1] 28
    

    here comes the data:

    > readBin(to.read, integer(), n=1, endian="big")
    [1] 0
    > readBin(to.read, integer(), n=1, endian="big")
    [1] 0
    

    as per the training set image data description on the web site.

    Now you just need to loop and read 28*28 byte chunks into matrices.
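
    As a rough sketch of that loop, the whole file can be read in one call and reshaped, with the dimensions taken from the header rather than hard-coded. `read_idx_images` is a hypothetical helper name, not something from a package, and it assumes an uncompressed idx3-ubyte image file (magic number 2051) with unsigned-byte pixels, as the site describes:

```r
# Hypothetical helper (an illustration, not part of any package):
# read every image from an IDX3 file into a 3-D array.
read_idx_images <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))

  # Header: four 4-byte big-endian integers
  # (magic number, number of images, rows, columns)
  header <- readBin(con, integer(), n = 4, size = 4, endian = "big")
  if (header[1] != 2051) {
    stop("not an IDX3 image file, magic number was ", header[1])
  }
  n_img <- header[2]
  n_row <- header[3]
  n_col <- header[4]

  # Pixels are single unsigned bytes; signed = FALSE keeps them in 0..255
  px <- readBin(con, integer(), n = n_img * n_row * n_col,
                size = 1, signed = FALSE)

  # The file stores each image row by row, and R fills arrays with the
  # first index varying fastest, so the result is indexed [column, row, image]
  array(px, dim = c(n_col, n_row, n_img))
}
```

    With the test file used above, `imgs <- read_idx_images("~/Downloads/t10k-images-idx3-ubyte")` should give a 28 x 28 x 10000 array, and `imgs[, , 1]` is the first image in the same orientation as the `m` matrices in this answer.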

    Start again:

     > to.read = file("~/Downloads/t10k-images-idx3-ubyte", "rb")
    

    skip header:

    > readBin(to.read, integer(), n=4, endian="big")
    [1]  2051 10000    28    28
    

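    should really get the 28,28 from the header read but hard-coded here (the pixels are unsigned bytes, so pass signed=FALSE, otherwise values above 127 come back negative):

     > m = matrix(readBin(to.read,integer(), size=1, n=28*28, signed=FALSE, endian="big"),28,28)
     > image(m)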
    

    Might need to transpose or flip the matrix, I think it's an upside-down "7".

    par(mfrow=c(5,5))
    par(mar=c(0,0,0,0))
    for(i in 1:25){
      m = matrix(readBin(to.read,integer(), size=1, n=28*28, signed=FALSE, endian="big"),28,28)
      image(m[,28:1])
    }
    

    gets you:

    [Figure: 5x5 grid of the 25 digit images plotted by the loop above]
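
    On why the flip is needed: `image()` puts the first matrix index on the x-axis and the second on the y-axis, with the origin at the bottom left, so a matrix filled straight from the file is drawn with image row 1 at the bottom. A small sketch on a toy matrix; `flip_for_image` is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: reverse the second index so that image() draws
# the first image row at the top instead of the bottom.
flip_for_image <- function(m) m[, ncol(m):1, drop = FALSE]

# Toy "image", 3 pixels wide and 2 pixels tall, stored row by row as in
# the IDX file: row 1 = 1 2 3, row 2 = 4 5 6
m <- matrix(1:6, nrow = 3, ncol = 2)  # m[x, y]: each column of m is one image row

f <- flip_for_image(m)
# image(f) would now draw image row 1 (1 2 3) along the top edge
```

    This is the same operation as the `m[,28:1]` in the loop, just without the hard-coded 28.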

    Oh, and google leads me to: http://www.inside-r.org/packages/cran/darch/docs/readMNIST which might be useful.
