After consulting the online caTools documentation and the error message itself, I confirmed that my SplitRatio was correctly set to a number between 0 and 1 (0.7). But no matter which decimal value I tried, I still got this error message:
> split = sample.split(tweetsSparse$Negative, SplitRatio=0.7)
Error in sample.split(tweetsSparse$Negative, SplitRatio = 0.7) :
  Error in sample.split: 'SplitRatio' parameter has to be i [0, 1] range or [1, length(Y)] range
http://cran.r-project.org/web/packages/caTools/caTools.pdf
Short Story:
- change this one line: tweetsSparse$Negative = tweets$Negative
- to: tweetsSparse$Negative = tweets$negative
(Too) Long Story:
As MrFlick notes, please post enough code for the issue to be reproducible, including some info about tweetsSparse. Luckily I am in the same MOOC and can offer some assistance without additional info; even with this limited info, though, MrFlick hit upon the issue.
If you run colnames(tweets) on the original data frame from which we created the tweetsSparse data frame, you'll see:
[1] "Tweet" "Avg" "negative"
But when generating tweetsSparse's "Negative" column, the Prof. typed: tweetsSparse$Negative = tweets$Negative
R looked for that column in the tweets data frame but did not find it, because we directed it to look for the capitalized "Negative" when the actual column we wanted was the lowercase "negative" (column names are case-sensitive). As a result, no column was added; run colnames(tweetsSparse) and "Negative" is not listed.
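
The behavior described above can be checked with a small sketch. The two data frames here are toy stand-ins with made-up values (the `freq` column and all cell contents are assumptions, not the course data); only the column names of `tweets` follow the output shown earlier.

```r
# Toy stand-ins for the course data (hypothetical values, not the real tweets).
tweets <- data.frame(Tweet = c("a", "b", "c"),
                     Avg = c(-1.5, 0.2, 1.0),
                     negative = c(TRUE, FALSE, FALSE))
tweetsSparse <- data.frame(freq = c(1, 0, 2))

# `$` with a wrongly capitalized column name returns NULL and raises no error.
print(is.null(tweets$Negative))

# Assigning NULL to a column that does not exist leaves the data frame unchanged.
tweetsSparse$Negative <- tweets$Negative
print(colnames(tweetsSparse))

# With the correct lowercase name, the column is actually added.
tweetsSparse$Negative <- tweets$negative
print(colnames(tweetsSparse))
```

Printing `colnames(tweetsSparse)` after each assignment makes the silent failure visible: the first assignment adds nothing, the second adds the "Negative" column.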
When we then called sample.split with tweetsSparse$Negative as the dependent variable, R looked for the column and got NULL (the value returned when $ is used to look up a column that a data frame lacks; for instance, tweets$missing_col also returns NULL). sample.split expected a vector, I think, and so it threw an error.
To see the function's code, type sample.split without parentheses after it. You'll see a sanity check that compares the length of its first argument, here tweetsSparse$Negative, to the SplitRatio to detect an input error. length(tweetsSparse$Negative) is zero (length(NULL) == 0), which is less than the SplitRatio of 0.7; clearly you can't meaningfully split zero items into one subset with 70% of the items and another with 30%. That is why the error message complains about SplitRatio even though 0.7 is a perfectly valid value.
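
To make the length comparison concrete, here is a minimal sketch that does not need caTools installed. The function `check_split_ratio` is hypothetical: it only imitates the kind of range check described above and is not the package's actual source, and the exact comparison it uses is an assumption.

```r
# NULL has length zero, so even a small SplitRatio exceeds it.
y <- NULL
print(length(y))

# Hypothetical imitation of the range check described above;
# this is NOT the real caTools code, and the `>=` comparison is assumed.
check_split_ratio <- function(Y, SplitRatio) {
  if (SplitRatio >= length(Y)) {
    stop("'SplitRatio' has to be in the [0, 1] range or the [1, length(Y)] range")
  }
  invisible(TRUE)
}

# With a NULL input the check fails, whatever ratio between 0 and 1 is used.
result <- tryCatch(check_split_ratio(y, 0.7),
                   error = function(e) conditionMessage(e))
print(result)

# With a real vector of labels the same ratio passes.
ok <- check_split_ratio(c(TRUE, FALSE, FALSE), 0.7)
print(ok)
```

Under these assumptions the ratio itself is never the problem; the check fails only because the vector being split is empty.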