I have a dataset as below.
Because the dataset is large, I loaded it through the sparklyr
package, so I can only use pipe statements.
Although this isn't the most elegant string of code, it should get the job done. Since no sample dataset is provided other than a screenshot, I just created a sample with the important elements you were interested in.
library(dplyr)  # provides tibble(), mutate(), select() and %>%

csj <- tibble(helpful = rep(c("[0,0]","[0,1]","[0,2]","[1,3]"),100),
              overall = rep(c(5,4,3,2),100))
# split helpful into two numeric columns and create the help column
csj %>%
  mutate(col1 = as.numeric(stringi::stri_extract_first_regex(helpful, pattern = "[0-9]+")), # extract first number
         col2 = as.numeric(stringi::stri_extract_last_regex(helpful, pattern = "[0-9]+")),  # extract second number
         col3 = ifelse(col2 == 0, 1, col2), # replace a 0 denominator with 1
         help = col1/col3) %>%              # divide col1 by col3
  select(helpful, help) # select the columns you wish to keep
This should work as long as you adapt the functions to your dataset. Also note that helpful is a character column in your dataset, which is why it has to be converted to numeric.
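To make the character-to-numeric step concrete, here is a minimal base R sketch of the same calculation. The helper name `helpful_ratio` is hypothetical (it is not part of the code above), and it assumes every well-formed value contains exactly two numbers; anything else is returned as NA.

```r
# Hypothetical helper: parse strings such as "[1, 3]" and return the first
# number divided by the second, treating a zero denominator as 1.
helpful_ratio <- function(helpful) {
  vapply(helpful, function(s) {
    if (is.na(s)) return(NA_real_)                 # missing input stays NA
    # Pull every run of digits out of the string, e.g. "[1, 3]" -> c("1", "3")
    p <- regmatches(s, gregexpr("[0-9]+", s))[[1]]
    if (length(p) != 2) return(NA_real_)           # malformed input
    num <- as.numeric(p[1])
    den <- as.numeric(p[2])
    if (den == 0) den <- 1                         # avoid dividing by zero
    num / den
  }, numeric(1), USE.NAMES = FALSE)
}

helpful_ratio(c("[0,0]", "[0,1]", "[1,3]", "[10, 25]"))
```

Because it matches runs of digits rather than single digits, this sketch also copes with multi-digit counts such as "[10, 25]".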
EDIT: I looked into sparklyr and realized why the code above isn't working (sparklyr translates the pipeline to Spark SQL, so the stringi functions are not available there), so I created an example for myself to test. Although I didn't replicate your data completely, I came up with enough to hopefully provide a working solution.
library(sparklyr)
library(dplyr)
library(ggplot2)
library(magrittr)
sc <- spark_connect(master="local")
#create dataframe
cjs <- tibble(helpful = rep(c("[0, 0]","[0, 1]","[0, 2]","[1, 3]","[,1]",NA,"a"),100),
overall = rep(c(6,5,4,3,2,1,0),100))
#transfer to Spark
csj <- copy_to(sc, cjs, "cjs")
#this should do the trick
csj %>%
mutate(newcol2 = regexp_replace(helpful, "[^0-9,]", " "),
newcol3 = as.numeric(substring_index(newcol2, ",", 1)),
newcol4 = as.numeric(substring_index(newcol2,",",-1)),
newcol5 = ifelse(newcol4 == 0, 1, newcol4),
help = newcol3/newcol5) %>%
select(starts_with("new"),help) #select the columns you need with help calculated appropriately
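To build some intuition for what the two `substring_index` calls return, the string steps can be imitated locally in base R. This is only a rough sketch: `substring_index_local` is a hypothetical stand-in that covers just the two cases used above (count 1 and count -1), and it is not the Spark function itself.

```r
# Hypothetical local stand-in for Spark SQL's substring_index(), limited to
# count = 1 (text before the first delimiter) and count = -1 (text after the
# last delimiter). If the delimiter is absent, the input is returned unchanged.
substring_index_local <- function(x, delim, count) {
  stopifnot(length(x) == 1, count %in% c(1, -1))
  hits <- gregexpr(delim, x, fixed = TRUE)[[1]]  # start positions of delim
  if (hits[1] == -1) return(x)                   # no delimiter found
  if (count == 1) {
    substr(x, 1, hits[1] - 1)
  } else {
    substr(x, max(hits) + nchar(delim), nchar(x))
  }
}

# Same pattern as the regexp_replace() call above, applied to one value
cleaned <- gsub("[^0-9,]", " ", "[1, 3]")
first   <- substring_index_local(cleaned, ",", 1)
second  <- substring_index_local(cleaned, ",", -1)
as.numeric(first) / as.numeric(second)
```

The surrounding spaces left by the replacement are harmless in this local version because `as.numeric` ignores leading and trailing whitespace.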