Sample.split in R - SplitRatio parameter

Anonymous (unverified), submitted 2019-12-03 02:43:01

Question:

After consulting the online caTools documentation and the error message itself, I confirmed that my SplitRatio was correctly set to 0.7, a number between 0 and 1. But no matter which decimal I changed it to, I still got this error message.

> split = sample.split(tweetsSparse$Negative, SplitRatio=0.7)
Error in sample.split(tweetsSparse$Negative, SplitRatio = 0.7) :
  Error in sample.split: 'SplitRatio' parameter has to be i [0, 1] range or [1, length(Y)] range

http://cran.r-project.org/web/packages/caTools/caTools.pdf

Answer 1:

Short Story:

  • change this one line: tweetsSparse$Negative = tweets$Negative
  • to: tweetsSparse$Negative = tweets$negative

(Too) Long Story:

As MrFlick notes, you should post enough code for the issue to be reproducible, including some information about tweetsSparse. Luckily I am in the same MOOC and can offer some assistance without it; even with this limited information, MrFlick hit upon the issue.

If you run colnames(tweets) on the original data frame from which we created the tweetsSparse dataframe, you'll see: [1] "Tweet" "Avg" "negative"

But when generating tweetsSparse's "Negative" column, the Prof. typed: tweetsSparse$Negative = tweets$Negative

R looked for that column in the tweets data frame but did not find it, because we told it to look for the capitalized "Negative" when the actual column we wanted was the lowercase "negative". As a result, no column was added; run colnames(tweetsSparse) and you will see that Negative is not listed.
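The behaviour described above can be reproduced with a toy data frame. This is only a minimal sketch: the contents of tweets and tweetsSparse below (including the word1 column) are invented for illustration and are not the course data.

```r
# Toy stand-ins for the course data (all values are invented for illustration)
tweets <- data.frame(Tweet = c("a", "b", "c"),
                     Avg = c(-1, 0.5, -2),
                     negative = c(TRUE, FALSE, TRUE))
tweetsSparse <- data.frame(word1 = c(1, 0, 1))

# Wrong case: column names are case-sensitive, so tweets$Negative is NULL
typo_is_null <- is.null(tweets$Negative)

# Assigning NULL to a column that does not exist adds nothing
tweetsSparse$Negative <- tweets$Negative
cols_after_typo <- colnames(tweetsSparse)

# Correct case: the lowercase column exists and is copied over
tweetsSparse$Negative <- tweets$negative
cols_after_fix <- colnames(tweetsSparse)

print(typo_is_null)
print(cols_after_typo)
print(cols_after_fix)
```

Comparing cols_after_typo with cols_after_fix shows that the column only appears once the lowercase name is used.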

When we then called sample.split with tweetsSparse$Negative as the dependent variable, R looked for the column but got NULL, which is what $ returns for a column the data frame lacks (for instance, tweets$missing_col also returns NULL). sample.split expected a vector, I think, and so it threw an error. Type sample.split at the prompt (without parentheses) to see its code: it performs a sanity check that compares the length of its first argument with SplitRatio to detect an input error. Here length(tweetsSparse$Negative) is zero (length(NULL) == 0), which is less than the SplitRatio of 0.7, so the check fails. Clearly you can't meaningfully split zero items into one subset with 70% of the items and another with 30%.
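To make the failure easier to spot, one could run a small guard before calling sample.split. The sketch below uses only base R; check_split_input is a hypothetical helper, not part of caTools, and it assumes the sanity check works the way the paragraph above describes (comparing the ratio against the length of the dependent variable).

```r
# Hypothetical helper (NOT part of caTools): mirrors the kind of length check
# described above, but reports an empty or NULL input in plainer terms.
check_split_input <- function(y, split_ratio) {
  n <- length(y)  # length(NULL) is 0
  if (n == 0) {
    stop("dependent variable is empty or NULL; check the column name")
  }
  if (abs(split_ratio) >= n) {
    stop("split_ratio must be smaller than length(y)")
  }
  invisible(TRUE)
}

# A NULL input, as produced by a misspelled column name, is caught here
result <- tryCatch(check_split_input(NULL, 0.7),
                   error = function(e) conditionMessage(e))
print(result)

# A real vector passes the check
ok <- check_split_input(c(TRUE, FALSE, TRUE), 0.7)
print(ok)
```

Calling the guard on tweetsSparse$Negative before the split would point at the column name rather than at SplitRatio.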


