DataFrame into NumPy array with values comma-separated

-上瘾入骨i 2021-01-14 21:53

The Scenario

I've read a CSV file (which is tab-separated, \t) into a DataFrame, and it now needs to be converted to a NumPy array for clustering, without changing the type.
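A minimal sketch of the setup described above, assuming the file is tab-separated with a header row. The inline sample and its column names (col, iid, rat) are hypothetical stand-ins for the real file. read_csv needs sep="\t" for such a file, and to_numpy() keeps the column dtype when all columns share one type.

```python
from io import StringIO

import pandas as pd

# Hypothetical tab-separated sample standing in for the real file.
tsv_text = "col\tiid\trat\n4\t1\t0\n5\t2\t4\n6\t3\t3\n7\t4\t1\n"

# sep="\t" tells read_csv that the fields are tab-separated.
df = pd.read_csv(StringIO(tsv_text), sep="\t")

# Convert to a NumPy array; with all-integer columns the dtype stays integer.
X = df.to_numpy()

print(X.dtype)  # int64
print(X.shape)  # (4, 3)
```

If the selected columns mix types (say int and float), to_numpy() upcasts them to a common dtype such as float64, and mixing numbers with strings gives an object array; selecting only the numeric columns before converting keeps the type predictable.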

3 Answers
  •  野的像风
    2021-01-14 22:48

    It seems you need to load the file with read_csv first, keeping only the second and third columns (usecols=[1, 2]), and then convert the DataFrame to a NumPy array with values:

    import pandas as pd
    from sklearn.cluster import KMeans
    from io import StringIO  # pandas.compat.StringIO no longer exists in current pandas

    temp=u"""col,iid,rat
    4,1,0
    5,2,4
    6,3,3
    7,4,1"""
    # after testing, replace StringIO(temp) with 'filename.csv'; for a tab-separated file also pass sep='\t'
    df = pd.read_csv(StringIO(temp), usecols=[1, 2])
    print(df)
    #    iid  rat
    # 0    1    0
    # 1    2    4
    # 2    3    3
    # 3    4    1
    
    X = df.values  # df.to_numpy() is the newer equivalent
    print(X)
    # [[1 0]
    #  [2 4]
    #  [3 3]
    #  [4 1]]
    
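    kmeans = KMeans(n_clusters=2, n_init=10)  # n_init set explicitly; its default changed in scikit-learn 1.4
    a = kmeans.fit(X)  # fit() returns the fitted estimator itself
    print(a)
    # Output from an older scikit-learn release (newer releases print only non-default parameters):
    # KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
    #     n_clusters=2, n_init=10, n_jobs=1, precompute_distances='auto',
    #     random_state=None, tol=0.0001, verbose=0)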
    
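The title asks for the values to be comma-separated. A NumPy array has no separator of its own; print() simply writes array elements without commas. A minimal sketch, assuming the commas are only needed for display: tolist() and np.array2string(separator=", ") produce comma-separated output while X itself keeps its integer dtype, so the same array can still be passed to KMeans. The labels_ attribute then gives the cluster index of each row; random_state=0 is an arbitrary choice added here for repeatability.

```python
from io import StringIO

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Same sample data as in the answer, keeping only the iid and rat columns.
temp = "col,iid,rat\n4,1,0\n5,2,4\n6,3,3\n7,4,1\n"
df = pd.read_csv(StringIO(temp), usecols=[1, 2])
X = df.to_numpy()

# print(X) shows no commas; these representations do.
as_list = X.tolist()                          # nested Python lists
as_text = np.array2string(X, separator=", ")  # NumPy-formatted string
print(as_list)  # [[1, 0], [2, 4], [3, 3], [4, 1]]
print(as_text)

# X is unchanged by the formatting above, so it can be clustered directly.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_  # one cluster index (0 or 1) per row
print(labels)
```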
