I currently have a file that contains a list that looks like:
example = ['Mary had a little lamb' ,
'Jack went up the hill' ,
'Ji
This can also be done with PyTorch's torchtext:
from torchtext.data import get_tokenizer
tokenizer = get_tokenizer('basic_english')
example = ['Mary had a little lamb' ,
'Jack went up the hill' ,
'Jill followed suit' ,
'i woke up suddenly' ,
'it was a really bad dream...']
tokens = []
for s in example:
    tokens += tokenizer(s)  # extend the flat list with this sentence's tokens
# ['mary', 'had', 'a', 'little', 'lamb', 'jack', 'went', 'up', 'the', 'hill', 'jill', 'followed', 'suit', 'i', 'woke', 'up', 'suddenly', 'it', 'was', 'a', 'really', 'bad', 'dream', '.', '.', '.']
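If installing torchtext is not an option, the same flattening idea can be sketched with the standard library alone. This is only a rough stand-in, not the torchtext tokenizer: `simple_tokenize` is a hypothetical helper that lowercases and splits on whitespace, so punctuation stays attached to the word (`'dream...'` stays one token instead of being split into `'dream', '.', '.', '.'`).

```python
from itertools import chain

example = ['Mary had a little lamb',
           'Jack went up the hill',
           'Jill followed suit',
           'i woke up suddenly',
           'it was a really bad dream...']


def simple_tokenize(sentence):
    # Hypothetical helper (not part of torchtext): lowercase, then split
    # on whitespace. Punctuation is NOT separated from words.
    return sentence.lower().split()


# Tokenize each sentence and flatten the per-sentence lists into one list.
tokens = list(chain.from_iterable(simple_tokenize(s) for s in example))
print(tokens)
```

`chain.from_iterable` does the same job as the `tokens += ...` loop above, just without building the list incrementally by hand.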