I've read a CSV file (tab-separated) into a DataFrame, and I now need it as a NumPy array for clustering, without changing the data type.
It seems you need read_csv to load the file into a DataFrame, using usecols to keep only the second and third columns, and then values to convert the result to a NumPy array:
import pandas as pd
from sklearn.cluster import KMeans
# pandas.compat.StringIO was removed in newer pandas versions; use the standard library
from io import StringIO
temp=u"""col,iid,rat
4,1,0
5,2,4
6,3,3
7,4,1"""
# after testing, replace StringIO(temp) with 'filename.csv' (pass sep='\t' if the file is tab-separated)
df = pd.read_csv(StringIO(temp), usecols = [1,2])
print (df)
   iid  rat
0    1    0
1    2    4
2    3    3
3    4    1
X = df.values
print (X)
[[1 0]
 [2 4]
 [3 3]
 [4 1]]
kmeans = KMeans(n_clusters=2)
a = kmeans.fit(X)
print (a)
KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
n_clusters=2, n_init=10, n_jobs=1, precompute_distances='auto',
random_state=None, tol=0.0001, verbose=0)
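Since the question mentions a tab-separated file and keeping the type unchanged, here is a minimal sketch of that case. The sample data is a hypothetical tab-separated stand-in for the real file, and to_numpy() is used as an alternative to values. Checking dtype before and after the conversion shows whether the type was preserved.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Hypothetical tab-separated sample standing in for the real file.
temp = "col\tiid\trat\n4\t1\t0\n5\t2\t4\n6\t3\t3\n7\t4\t1"

# sep="\t" tells read_csv the input is tab-separated;
# usecols=[1, 2] keeps only the second and third columns.
df = pd.read_csv(StringIO(temp), sep="\t", usecols=[1, 2])

# to_numpy() returns the same data as .values.
X = df.to_numpy()

# Compare the column dtypes with the array dtype.
print(df.dtypes)
print(X.dtype, X.shape)
```

If all selected columns share one dtype (here both are integers), the array keeps it. If the columns mix integers and floats, NumPy has to pick a common dtype, so the integers would be upcast to float64.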