I am trying to run the following code for a brief machine learning algorithm:
import re
import argparse
import csv
from collections import Counter
from sklea
Apart from what @szymon has mentioned, you can alternatively load the dataset using:
from six.moves import urllib
from sklearn.datasets import fetch_mldata
from scipy.io import loadmat
mnist_alternative_url = "https://github.com/amplab/datascience-sp14/raw/master/lab7/mldata/mnist-original.mat"
mnist_path = "./mnist-original.mat"
response = urllib.request.urlopen(mnist_alternative_url)
with open(mnist_path, "wb") as f:
    content = response.read()
    f.write(content)
mnist_raw = loadmat(mnist_path)
mnist = {
    "data": mnist_raw["data"].T,
    "target": mnist_raw["label"][0],
    "COL_NAMES": ["label", "data"],
    "DESCR": "mldata.org dataset: mnist-original",
}
I experienced the same issue and found that mnist-original.mat had a different file size at different times while I was on poor WiFi. I switched to LAN and it worked fine. It may be a networking issue.
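If flaky networking is the cause, one way to avoid ending up with a half-written mnist-original.mat is to write the download to a temporary file and rename it only after the copy has finished. This is a minimal sketch, not part of scikit-learn: save_atomically is a hypothetical helper name, and it accepts any file-like object (for example the response returned by urllib.request.urlopen).

```python
import os
import shutil
import tempfile


def save_atomically(stream, dest_path):
    """Copy a file-like stream to dest_path without leaving a partial file there.

    Hypothetical helper for illustration, not a scikit-learn API.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Use a temporary file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            shutil.copyfileobj(stream, tmp)
        # Reached only if the copy finished without raising.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Interrupted or failed: drop the partial file and re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return dest_path
```

With this approach, pressing Ctrl+C or hitting a read error in the middle of the copy leaves no file at dest_path, so a later run starts from a clean state.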
If you didn't pass data_home, the program looks for ${yourprojectpath}/mldata/mnist-original.mat. You can download the file manually and put it at that path.
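To check whether the file is already where the loader will look, the path can be built by hand. This is only a sketch that follows the mldata/mnist-original.mat layout described above; expected_mnist_path and is_cached are hypothetical helper names, and data_home is whatever directory you pass (or your project path if you pass none, as described in this answer).

```python
import os


def expected_mnist_path(data_home):
    """Return where the cached file is expected under data_home."""
    return os.path.join(data_home, "mldata", "mnist-original.mat")


def is_cached(data_home):
    """Return True if the file is already in place."""
    return os.path.isfile(expected_mnist_path(data_home))
```

If is_cached returns False, create the mldata directory and copy the manually downloaded file to the path returned by expected_mnist_path.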
Try it like this:
dataDict = fetch_mldata('MNIST original')
This worked for me. Since you used the from ... import ... syntax, you shouldn't prepend datasets. when you call it.
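The import rule behind this can be checked with a small experiment. The sketch below uses collections.Counter from the standard library as a stand-in for sklearn.datasets.fetch_mldata (an assumption made so that it runs without scikit-learn); it executes the import in an empty namespace and inspects which names were bound.

```python
# Run the import in a fresh namespace so we can see exactly which names it binds.
namespace = {}
exec("from collections import Counter", namespace)

# The imported name is bound directly, so it is called without a prefix.
has_counter = "Counter" in namespace

# The module name itself is NOT bound, so "collections.Counter" would
# raise NameError in that namespace.
has_module = "collections" in namespace
```

The same applies to from sklearn.datasets import fetch_mldata: only fetch_mldata is bound, so datasets.fetch_mldata(...) fails unless datasets is imported separately.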
I also had this problem in the past. It happens because the dataset is quite large (about 55.4 MB). I ran fetch_mldata, but because of my internet connection it took a while to download everything. I did not know that and interrupted the process.
The dataset was left corrupted, and that is why the error happened.
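If an interrupted download has already left a corrupted file behind, deleting it forces a clean download on the next run. A minimal sketch follows; remove_if_incomplete is a hypothetical helper, and expected_size (in bytes) is an assumption you have to supply yourself, for example from the server's Content-Length header, since only the approximate size is known here.

```python
import os


def remove_if_incomplete(path, expected_size):
    """Delete path when its size in bytes differs from expected_size.

    Hypothetical helper for illustration. Returns True if a file was removed.
    """
    if not os.path.isfile(path):
        return False  # nothing cached, nothing to clean up
    if os.path.getsize(path) == expected_size:
        return False  # size matches, keep the file
    os.remove(path)
    return True
```

A size check only catches truncated files; it does not prove the contents are valid, so a failing load after this check still points to a bad download.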