Tokenize words in a list of sentences Python

广开言路 2021-02-04 06:41

I currently have a file that contains a list that looks like this:

example = ['Mary had a little lamb',
           'Jack went up the hill',
           'Ji


        
7 answers
  •  抹茶落季
    2021-02-04 07:24

    This can also be done with PyTorch's torchtext:

    # get_tokenizer is defined in torchtext.data.utils
    from torchtext.data.utils import get_tokenizer
    
    # 'basic_english' lowercases the text and splits off punctuation such as '.'
    tokenizer = get_tokenizer('basic_english')
    example = ['Mary had a little lamb',
               'Jack went up the hill',
               'Jill followed suit',
               'i woke up suddenly',
               'it was a really bad dream...']
    tokens = []
    for s in example:
        # extend the flat list with this sentence's tokens
        tokens += tokenizer(s)
    print(tokens)
    # ['mary', 'had', 'a', 'little', 'lamb', 'jack', 'went', 'up', 'the', 'hill', 'jill', 'followed', 'suit', 'i', 'woke', 'up', 'suddenly', 'it', 'was', 'a', 'really', 'bad', 'dream', '.', '.', '.']
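    If installing torchtext is not an option, a rough stand-in can be built from the standard library's `re` module. This is only a sketch: `simple_tokenize` and `TOKEN_RE` are hypothetical names, and the rule (lowercase, then take runs of word characters or single punctuation marks) is an approximation, not torchtext's actual `basic_english` rule set. For the sentences above it happens to produce the same list.

```python
import re
from typing import List

# Hypothetical pattern: a run of word characters, or one non-space,
# non-word character (punctuation). Approximates 'basic_english' only.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(sentence: str) -> List[str]:
    """Lowercase the sentence and return word and punctuation tokens."""
    return TOKEN_RE.findall(sentence.lower())


example = ['Mary had a little lamb',
           'Jack went up the hill',
           'Jill followed suit',
           'i woke up suddenly',
           'it was a really bad dream...']

# Flatten all sentences into a single token list.
tokens = [tok for s in example for tok in simple_tokenize(s)]
print(tokens)
# ['mary', 'had', 'a', 'little', 'lamb', 'jack', 'went', 'up', 'the', 'hill', 'jill', 'followed', 'suit', 'i', 'woke', 'up', 'suddenly', 'it', 'was', 'a', 'really', 'bad', 'dream', '.', '.', '.']
```

    Results may differ from `basic_english` on other input, for example text with apostrophes or quotation marks.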
    
