How to Load CSV Data in scikit-learn and Use It for Naive Bayes Classification

隐瞒了意图╮ 2021-02-10 09:42

I'm trying to load custom data to perform Naive Bayes (NB) classification in scikit-learn. I need help loading the sample data into scikit-learn and then running NB. How do I load categorical values for the target…

1 Answer
  •  生来不讨喜
    2021-02-10 10:14

    The following should get you started. You will need pandas and NumPy. Load your .csv file into a DataFrame and use that as the input to the model. You also need to define the targets (0 for negatives and 1 for positives, assuming binary classification), depending on what you are trying to separate.
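Since the question asks specifically about categorical target values: if the class column in your CSV holds strings rather than 0/1, one way to build `targets` is sketched below. It assumes a hypothetical column named `label` with values `yes`/`no`, and it reads an inline CSV string in place of your file. Listing the categories explicitly pins the encoding (`no` becomes 0, `yes` becomes 1); by contrast, `pd.factorize` numbers classes in order of first appearance, which may not match your 0/1 convention.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV contents standing in for '/your/path/yourFile.csv';
# 'label' is an assumed name for the column holding each row's class.
csv_text = """f1,f2,label
1.0,2.1,yes
0.5,1.9,no
1.2,2.3,yes
0.4,1.7,no
"""
df = pd.read_csv(io.StringIO(csv_text))

# Map string classes to integer codes in a fixed order:
# index 0 -> 'no' (negatives), index 1 -> 'yes' (positives).
# Any value not listed in categories would get the code -1.
target_names = np.array(['no', 'yes'])
targets = pd.Categorical(df['label'], categories=target_names).codes.astype(int)

print(targets)  # [1 0 1 0]
```

The resulting `targets` array can then be assigned to `df['Targets']` exactly as in the answer's code.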

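    from sklearn.naive_bayes import GaussianNB
    import pandas as pd
    import numpy as np
    
    # create a data frame containing your data; each column can be accessed
    # by df['column name']
    df = pd.read_csv('/your/path/yourFile.csv')
    
    # define your targets: 0 for negatives, 1 for positives.
    # 'label' is a hypothetical column name; replace it with the column
    # in your CSV that holds the class of each row
    targets = df['label'].to_numpy().astype(int)
    target_names = np.array(['Negatives', 'Positives'])  # code 0 -> Negatives, 1 -> Positives
    
    # columns you want to model (chosen before any helper columns are added),
    # excluding the target column itself
    features = df.columns.drop('label')[0:7]
    
    # add columns to your data frame
    df['is_train'] = np.random.uniform(0, 1, len(df)) <= 0.75  # ~75% of rows for training
    df['Type'] = pd.Categorical.from_codes(targets, target_names)  # pd.Factor no longer exists
    df['Targets'] = targets
    
    # define training and test sets
    train = df[df['is_train']]
    test = df[~df['is_train']]
    
    trainTargets = train['Targets'].to_numpy().astype(int)
    testTargets = test['Targets'].to_numpy().astype(int)
    
    # call the Gaussian Naive Bayes classifier with default parameters
    gnb = GaussianNB()
    
    # train the model on the training set, then predict the held-out test set
    y_gnb = gnb.fit(train[features], trainTargets).predict(test[features])
    print('test accuracy:', np.mean(y_gnb == testTargets))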
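For comparison, here is a minimal end-to-end sketch of the same workflow that uses scikit-learn's `train_test_split` and `accuracy_score` instead of a hand-made `is_train` mask. It generates synthetic two-feature data (the column names `f1`, `f2` and `label` are hypothetical) and round-trips it through CSV text so it runs without a file; for real data, replace that part with `pd.read_csv('/your/path/yourFile.csv')`.

```python
import io

import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the real CSV: two numeric features and an
# assumed string-valued 'label' column; positives are shifted by +2.
rng = np.random.default_rng(0)
n = 200
labels = rng.choice(['no', 'yes'], size=n)
shift = np.where(labels == 'yes', 2.0, 0.0)
df = pd.DataFrame({
    'f1': rng.normal(size=n) + shift,
    'f2': rng.normal(size=n) + shift,
    'label': labels,
})
csv_text = df.to_csv(index=False)        # write to CSV text ...
df = pd.read_csv(io.StringIO(csv_text))  # ... and read it back, as from a file

# Features are every column except the target; target encoded as 0/1.
X = df.drop(columns='label')
y = pd.Categorical(df['label'], categories=['no', 'yes']).codes.astype(int)

# 75/25 split, stratified so both sets keep a similar class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

gnb = GaussianNB().fit(X_train, y_train)
y_pred = gnb.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f'test accuracy: {acc:.2f}')
```

Scoring on the held-out test rows, rather than on the rows the model was fit on, gives a more honest estimate of how the classifier will do on new data.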
    
