From my understanding, scikit-learn accepts data in (n_samples, n_features) format, which is a 2D array. Assuming I have data in the form ...
Stock prices in
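As a quick illustration of the (n_samples, n_features) convention, here is a minimal sketch using made-up numbers (the arrays below are hypothetical, not the data from the question). It also shows that a single feature stored as a 1D array has to be reshaped into a 2D column first.

```python
import numpy as np

# Hypothetical data: 4 samples, 2 features each -> shape (n_samples, n_features)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 8.0]])
print(X.shape)  # (4, 2)

# A single feature stored as a 1D array has shape (n_samples,);
# reshape(-1, 1) turns it into a 2D column with shape (n_samples, 1)
single_feature = np.array([10.0, 20.0, 30.0, 40.0])
X_single = single_feature.reshape(-1, 1)
print(X_single.shape)  # (4, 1)
```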
This is not a CSV file; it is just a space-separated file. Assuming there are no missing values, you can load it into a NumPy array named data as follows:
import numpy as np

with open("filename.txt") as f:
    f.readline()  # skip the header line
    data = np.loadtxt(f)  # splits on whitespace by default
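To try the loading step without a real file, here is a minimal sketch that feeds a small, made-up sample through io.StringIO. The column names and values are hypothetical stand-ins for filename.txt, and it uses loadtxt's skiprows=1 argument instead of the manual readline() call; both approaches skip the header.

```python
import io
import numpy as np

# Hypothetical stand-in for filename.txt: one header line, then
# whitespace-separated numeric rows (price followed by two features)
sample = io.StringIO(
    "price feature1 feature2\n"
    "10.5 1.0 200\n"
    "11.0 1.5 210\n"
    "10.8 1.2 205\n"
)

# skiprows=1 drops the header; loadtxt splits on whitespace by default
data = np.loadtxt(sample, skiprows=1)
print(data.shape)  # (3, 3)
```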
If the stock price is what you want to predict (your y value, in scikit-learn terms), split data into the features X and the target y:
X = data[:, 1:]  # columns 1 through the end: the features
y = data[:, 0]   # column 0: the stock price (the target)
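The split can be checked on a small hypothetical array (values made up, column 0 assumed to be the price). X comes out 2D with shape (n_samples, n_features) and y comes out 1D with shape (n_samples,).

```python
import numpy as np

# Hypothetical loaded data: column 0 is the stock price, the rest are features
data = np.array([[10.5, 1.0, 200.0],
                 [11.0, 1.5, 210.0],
                 [10.8, 1.2, 205.0]])

X = data[:, 1:]  # features, shape (n_samples, n_features)
y = data[:, 0]   # target, shape (n_samples,)

print(X.shape)  # (3, 2)
print(y.shape)  # (3,)
```

These are the shapes that scikit-learn estimators expect in fit(X, y), for example LinearRegression().fit(X, y).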
Alternatively, you might be able to use the standard-library csv module by setting the delimiter to a space, although you then have to convert each field from a string to a number yourself.
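A minimal sketch of the csv route, assuming the same kind of hypothetical sample as above. It passes delimiter=" " together with skipinitialspace=True, which tells csv to ignore spaces immediately following a delimiter, so a run of spaces acts as one separator. Each field is converted to float by hand before building the NumPy array.

```python
import csv
import io
import numpy as np

# Hypothetical sample; note the double space before 200 and 210
sample = io.StringIO(
    "price feature1 feature2\n"
    "10.5 1.0  200\n"
    "11.0 1.5  210\n"
)

# delimiter=" " splits on spaces; skipinitialspace=True ignores
# extra spaces that follow a delimiter
reader = csv.reader(sample, delimiter=" ", skipinitialspace=True)
header = next(reader)  # read (and skip) the header row
rows = [[float(value) for value in row] for row in reader if row]
data = np.array(rows)
print(header)      # ['price', 'feature1', 'feature2']
print(data.shape)  # (2, 3)
```

For purely numeric files, np.loadtxt is usually simpler; the csv module is more useful when you need row-by-row control.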