I need to split a query into words wherever a non-word character occurs. For example:
query = "I am a great, boy's and I like! to have: a lot-of-fun
I am adding this answer as @sawa's did not exactly reproduce the desired output:
# Split using any single non-word character:
query.split(/\W/) #=> ["I", "am", "a", "great", "", "boy", "s", "and", "I", "like", "", "to", "have", "", "a", "lot", "of", "fun", "and", "", "do", "", "nice", "acti", "vities", "enjoy", "good", "", "times"]
If you do not want the empty strings in the result, just use sawa's answer.
Splitting on a single non-word character produces many empty strings when the string contains consecutive spaces, because each extra space is matched again and creates a new split point. To avoid that, we can add an alternation (an "or" condition):
# Split using any number of spaces or a single non-word character:
query.split(/\s+|\W/)
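To see what the alternation changes, here is a small sketch on a made-up string (`sample` is an assumption for illustration, not the query from the question). The `\s+` branch collapses a run of spaces into one split point, but a punctuation character followed by a space still matches twice and leaves an empty string:

```ruby
# Hypothetical sample string, not the query from the question:
# two spaces after "a", and a comma followed by a space after "b".
sample = "a  b, c"

single = sample.split(/\W/)      # every non-word character is its own split point
either = sample.split(/\s+|\W/)  # a run of whitespace counts as one split point

p single  # => ["a", "", "b", "", "c"]
p either  # => ["a", "b", "", "c"]
```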
query.split(/\W+/)
# => ["I", "am", "a", "great", "boy", "s", "and", "I", "like", "to", "have", "a", "lot", "of", "fun", "and", "do", "nice", "acti", "vities", "enjoy", "good", "times"]
query.scan(/\w+/)
# => ["I", "am", "a", "great", "boy", "s", "and", "I", "like", "to", "have", "a", "lot", "of", "fun", "and", "do", "nice", "acti", "vities", "enjoy", "good", "times"]
This is different from the expected output in that it does not include empty strings.
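A short sketch of how these variants compare, again on a made-up string (`sample` is an assumption, not the original query). One thing to watch for: when the input starts with a non-word character, `split(/\W+/)` still returns a leading empty string, while `scan(/\w+/)` never does. Rejecting the empty strings after `split(/\W/)` should give the same words as `scan`:

```ruby
# Hypothetical sample string with leading spaces, not the query from the question.
sample = "  great, boy's lot-of-fun"

by_split  = sample.split(/\W+/)                  # leading delimiter leaves one "" at the front
by_scan   = sample.scan(/\w+/)                   # collects runs of word characters only
by_reject = sample.split(/\W/).reject(&:empty?)  # split on single characters, then drop ""

p by_split   # => ["", "great", "boy", "s", "lot", "of", "fun"]
p by_scan    # => ["great", "boy", "s", "lot", "of", "fun"]
p by_reject  # => ["great", "boy", "s", "lot", "of", "fun"]
```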