training-data

R - factor examcard has new levels

只谈情不闲聊 submitted on 2019-12-13 02:15:33
Question: I built a classification model in R using C5.0, shown below:

    library(C50)
    library(caret)

    a <- read.csv("All_SRN.csv")
    set.seed(123)
    inTrain <- createDataPartition(a$anatomy, p = .70, list = FALSE)
    training <- a[inTrain, ]
    test <- a[-inTrain, ]

    Tree <- C5.0(anatomy ~ ., data = training,
                 trControl = trainControl(method = "repeatedcv",
                                          repeats = 10, classProb = TRUE))
    TreePred <- predict(Tree, test)

The training set has features such as examcard, coil_used, anatomy_region, bodypart_anatomy and …
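The error in the title usually means the test partition contains examcard values that never appeared in training, so predict() has no encoding for them. In R the typical fix is to align the factor levels of test to those of training before predicting. For illustration only, here is a hedged scikit-learn analog of the same pitfall and one way around it (the column values are hypothetical):

    import pandas as pd
    from sklearn.preprocessing import OneHotEncoder

    # Hypothetical stand-ins for the training/test splits above.
    train = pd.DataFrame({"examcard": ["brain", "knee", "brain"]})
    test = pd.DataFrame({"examcard": ["brain", "spine"]})  # "spine" is unseen

    # handle_unknown="ignore" encodes unseen levels as all zeros instead of
    # raising, one common way to survive categories seen only at test time.
    enc = OneHotEncoder(handle_unknown="ignore")
    enc.fit(train[["examcard"]])
    print(enc.transform(test[["examcard"]]).toarray())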

PredictionIO evaluation fails with empty.maxBy exception and training fails with java.lang.OutOfMemoryError

大城市里の小女人 submitted on 2019-12-12 18:18:02
Question: I downloaded the latest update of the text classification template. I created a new app and imported stopwords.json and emails.json, specifying the app id:

    $ pio import --appid <appID> --input data/stopwords.json
    $ pio import --appid <appID> --input data/emails.json

Then I changed engine.json and set my app name in it:

    {
      "id": "default",
      "description": "Default settings",
      "engineFactory": "org.template.textclassification.TextClassificationEngine",
      "datasource": {
        "params": {
          "appName": "…
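For the OutOfMemoryError part, one commonly suggested workaround is to give Spark more memory: arguments after `--` in a pio command are passed through to spark-submit. The sizes below are illustrative, not a recommendation:

    $ pio train -- --driver-memory 8G --executor-memory 8G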

How to do a random search within a specified grid in the caret package?

空扰寡人 submitted on 2019-12-12 01:26:20
Question: I wonder whether it is possible to use random search within a predefined grid. For example, my grid has alpha and lambda for the glmnet method: alpha is between 0 and 1, and lambda is between -10 and 10. I want random search to try 5 points drawn at random from these bounds. I wrote the following code for grid search and it works fine, but I cannot modify it to do a random search within a bound:

    rand_ctrl <- trainControl(method = "repeatedcv", repeats = 5,
                              search = "random")
    grid <- expand.grid(alpha = seq(0, 1, 0.1) …
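For comparison, here is a hedged scikit-learn analog of the same idea (not caret): RandomizedSearchCV draws a fixed number of points from distributions bounded exactly like the grid above, with l1_ratio playing glmnet's alpha and the penalty strength playing lambda:

    from scipy.stats import loguniform, uniform
    from sklearn.datasets import make_regression
    from sklearn.linear_model import ElasticNet
    from sklearn.model_selection import RandomizedSearchCV

    X, y = make_regression(n_samples=200, n_features=10, noise=1.0,
                           random_state=1)

    # l1_ratio is bounded 0..1 like glmnet's alpha; alpha here is sampled
    # log-uniformly, standing in for lambda over 10^-10 .. 10^10.
    param_dist = {"l1_ratio": uniform(0, 1),
                  "alpha": loguniform(1e-10, 1e10)}

    search = RandomizedSearchCV(ElasticNet(max_iter=5000), param_dist,
                                n_iter=5, cv=5, random_state=123)
    search.fit(X, y)
    print(search.best_params_)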

Huge model output size from the train function in the R caret package

不打扰是莪最后的温柔 submitted on 2019-12-12 01:26:12
Question: I am training a bagFDA model using the train() function in the R caret package and saving the model output as an .Rdata file. The input file has about 300k records with 26 variables, but the output .Rdata file is about 3 GB. I simply run the following on a Windows system:

    modelout <- train(x, y, method = "bagFDA")
    save(file = "myout.Rdata", modelout)

Questions: (1) why is myout.Rdata so big? (2) how can I reduce the size of the file? Thanks in advance! JT

Answer 1: In trainControl, set returnData = FALSE for …
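The same disease exists outside caret: fitted ensembles are large, and a fitted object can drag a copy of the training data along. As a loose scikit-learn analog (not the caret fix the answer describes), compressed serialization can shrink the file on disk; the file name is illustrative:

    import joblib
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier

    X, y = make_classification(n_samples=1000, n_features=26, random_state=0)
    model = BaggingClassifier().fit(X, y)

    # compress=3 trades a little save/load time for a much smaller file,
    # similar in spirit to dropping stored data via returnData in caret.
    joblib.dump(model, "myout.joblib", compress=3)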

DescriptorMatcher OpenCV train()

久未见 submitted on 2019-12-11 11:43:41
Question: The OpenCV documentation mentions a "train()" function on DescriptorMatcher:

    virtual void cv::cuda::DescriptorMatcher::train()   // pure virtual

"Trains a descriptor matcher (for example, the flann index). In all methods to match, the method train() is run every time before matching." (docs)

That is all that is said there. Does someone know how it works? In particular, what does the DescriptorMatcher need in order to train itself? A short example in some OOP language would be …
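A minimal sketch of the add/train/match cycle, using the CPU FlannBasedMatcher from OpenCV's Python bindings rather than the CUDA class quoted above (image paths are hypothetical): train() builds the FLANN index over all descriptors previously passed to add(), and match() would call it implicitly if you skipped it.

    import cv2

    orb = cv2.ORB_create()
    train_img = cv2.imread("train.png", cv2.IMREAD_GRAYSCALE)
    query_img = cv2.imread("query.png", cv2.IMREAD_GRAYSCALE)

    _, train_des = orb.detectAndCompute(train_img, None)
    _, query_des = orb.detectAndCompute(query_img, None)

    # LSH index parameters, the usual choice for binary descriptors like ORB.
    index_params = {"algorithm": 6, "table_number": 6,
                    "key_size": 12, "multi_probe_level": 1}
    matcher = cv2.FlannBasedMatcher(index_params, {})

    matcher.add([train_des])   # what the matcher "trains" on
    matcher.train()            # builds the flann index over added descriptors
    matches = matcher.match(query_des)
    print(len(matches))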

Ideas to improve Haar training results

杀马特。学长 韩版系。学妹 submitted on 2019-12-11 09:24:56
Question: Please help me get more insight into my first-time Haar training results. I want to train a Haar classifier to recognize a simple pen, following Dileep Kumar's article. Using my cellphone I took 14 pictures of the pen; these pictures are large, about 263x2814. Then I collected negative pictures, some downloaded from the web at 640x480 and some taken with my phone camera at 1920x1080 and 5313x2388. Some of these negative images are really big. I have a total of 158 negative …
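One concrete fix for the oversized samples is to downscale everything to a sane, uniform size before training; cascade training only samples windows a little larger than the detection window, so huge frames mostly waste time. A hedged sketch (directory names are hypothetical):

    import glob
    import os

    import cv2

    os.makedirs("neg_small", exist_ok=True)
    for path in glob.glob("negatives/*.jpg"):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        # 640x480-ish is plenty for negatives; 5313x2388 just slows sampling.
        small = cv2.resize(img, (640, 480))
        cv2.imwrite(os.path.join("neg_small", os.path.basename(path)), small)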

Training and evaluating spaCy model by sentences or paragraphs

柔情痞子 submitted on 2019-12-11 06:15:38
Question: Observation:

    Paragraph: I love apple. I eat one banana a day
    Sentences: I love apple. / I eat one banana a day

There are two sentences in this paragraph: "I love apple" and "I eat one banana a day". If I put the whole paragraph into spaCy, it recognizes only one entity, for example apple, but if I feed the sentences of the paragraph one by one, spaCy can recognize two entities, apple and banana. (This is just an example to illustrate my point; the actual recognition results could differ.)
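A minimal sketch of the two approaches described above; the model name is an assumption, and entity output will depend on the model:

    import spacy

    nlp = spacy.load("en_core_web_sm")
    text = "I love apple. I eat one banana a day"

    # Whole-paragraph pass: one doc, entities found over all sentences at once.
    doc = nlp(text)
    print([(ent.text, ent.label_) for ent in doc.ents])

    # Sentence-by-sentence pass: re-run the pipeline on each sentence alone.
    for sent in doc.sents:
        sub = nlp(sent.text)
        print(sent.text, [(ent.text, ent.label_) for ent in sub.ents])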

Same form of dataset has 2 different shapes

会有一股神秘感。 submitted on 2019-12-11 06:09:47
Question: I am quite new to machine learning and am just grasping the techniques. I am trying to train a model on the following classifiers, using a dataset that has 4 features plus the target feature/class (the truth value, 1 or 0).

Classifiers:
- SGD Classifier
- Random Forest Classifier
- Linear Support Vector Classifier
- Gaussian Process Classifier

I am training the model on the following dataset [part of the dataset is shown below].

Training set: train_sop_truth.csv

    Subject,Predicate,Object …
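When "the same form of dataset" yields two different shapes, the first thing to check is how each file was actually parsed. A hedged sketch using the file name from the question (column names beyond "Object" are truncated above, so the selection is illustrative and the target column name is an assumption):

    import pandas as pd

    train = pd.read_csv("train_sop_truth.csv")
    print(train.shape)   # (rows, columns); expect 5 columns: 4 features + truth
    print(train.dtypes)  # a delimiter or quoting problem usually shows up here

    # Illustrative split into features and target.
    X = train[["Subject", "Predicate", "Object"]]
    y = train["Truth"]   # hypothetical name for the 1/0 truth column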

Bfloat16 training in GPUs

独自空忆成欢 submitted on 2019-12-11 05:18:37
Question: Hi, I am trying to train a model using the new bfloat16 datatype for variables. I know this is supported on Google TPUs. I was wondering if anyone has tried training with it on GPUs (for example, a GTX 1080 Ti). Is that even possible; do the GPU tensor cores support it? If anyone has any experience, please share your thoughts. Many thanks!

Answer 1: I had posted this question in the TensorFlow GitHub community. Here is their response so far: "bfloat16 support isn't complete for GPUs, as it's not …
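A quick way to probe what actually runs is to cast a small computation to bfloat16 and inspect where it is placed. This sketch assumes TensorFlow; behavior is version- and device-dependent, since (per the answer above) GPU bfloat16 kernel coverage was incomplete:

    import tensorflow as tf

    x = tf.cast(tf.random.normal([4, 4]), tf.bfloat16)
    y = tf.cast(tf.random.normal([4, 4]), tf.bfloat16)

    # If no bfloat16 matmul kernel is registered for the GPU, TF may fall
    # back to the CPU or raise, depending on soft device placement.
    z = tf.matmul(x, y)
    print(z.dtype, z.device)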

Where to get negative sample images for Haar training? [closed]

三世轮回 submitted on 2019-12-11 04:09:42
Question: I need a collection of sample images to train a Haar-based classifier for face detection. I read that a ratio of 2 negative examples for each positive example is acceptable. I searched around the web and found many databases containing positive examples to train my classifier (that is, images that contain faces) …
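Whichever negative collection you end up with, opencv_traincascade consumes it through a plain background description file listing one image path per line. A hedged sketch that generates it (directory and file names are hypothetical):

    import glob
    import os

    # Write the background file expected by opencv_traincascade: one
    # absolute path to a negative image per line.
    with open("bg.txt", "w") as f:
        for path in sorted(glob.glob("negatives/*.jpg")):
            f.write(os.path.abspath(path) + "\n")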