How do I split a very long string into a list of shorter strings in Python

二次信任 submitted on 2019-12-12 15:15:35

Question


In my current Django project I have a model that stores very long strings (5000-10000 or even more characters per DB entry), and I need to split them when a user requests the record (it really needs to stay in one record in the DB). What I need is a list (a queryset? It depends on whether the splitting happens in the "SQL" part or I fetch the whole string as is and do the parsing in the view) of shorter strings (100-500 characters per string in the list I return to the template).

I couldn't find a Python split command, an example, or any kind of answer for that anywhere.

I could always count words and append, but I am sure there has to be some kind of function for that sort of thing.

EDIT: Thank you everyone, but I guess I wasn't understood.

Example:

The String: "This is a very long string with many many many many and many more sentences and there is not one character that i can use to split by, just by number of words"

The string is a TextField of a Django model.

I need to split it, let's say every 5 words, so I will get:

['This is a very long string','with many many many many','and many more sentences and','there is not one character','that i can use to','split by, just by number',' of words']

The thing is that almost every programming language has a "split per number of words" kind of utility function, but I can't find one in Python.

thanks, Erez


Answer 1:


>>> s = "This is a very long string with many many many many and many more sentences and there is not one character that i can use to split by, just by number of words"
>>> l = s.split()
>>> n = 5
>>> [' '.join(l[x:x+n]) for x in range(0, len(l), n)]
['This is a very long',
 'string with many many many',
 'many and many more sentences',
 'and there is not one',
 'character that i can use',
 'to split by, just by',
 'number of words']



Answer 2:


Here is an idea:

def split_chunks(s, chunksize):
    pos = 0
    while pos != -1:
        # Find the last space inside the window [pos, pos + chunksize).
        new_pos = s.rfind(" ", pos, pos + chunksize)
        if new_pos == pos:
            new_pos += chunksize  # force a split inside the word
        yield s[pos:new_pos]
        pos = new_pos

This tries to split strings into chunks at most chunksize in length. It tries to split at spaces, but if it can't it splits in the middle of a word:

>>> foo = "asdf qwerty sderf sdefw regf"
>>> list(split_chunks(foo, 6))
['asdf', ' qwert', 'y', ' sderf', ' sdefw', ' regf', '']

I guess it requires some tweaking though (for instance how to handle splits that occur inside words), but it should give you a starting point.
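One possible way to do that tweaking is sketched below. It is only an illustration, not the answerer's code: the name split_chunks_strict is made up here, and the choices to drop the spaces at chunk boundaries and to hard-split words longer than chunksize are assumptions about the desired behaviour.

```python
def split_chunks_strict(s, chunksize):
    """Yield pieces of s that are each at most chunksize characters long.

    Hypothetical variant of split_chunks: it prefers to cut at a space,
    drops the spaces at chunk boundaries, and only cuts inside a word
    when that word is longer than chunksize.
    """
    if chunksize < 1:
        raise ValueError("chunksize must be at least 1")
    pos = 0
    n = len(s)
    while pos < n:
        # Skip the space(s) left over from the previous cut.
        while pos < n and s[pos] == " ":
            pos += 1
        if pos >= n:
            break
        end = pos + chunksize
        if end >= n:
            # Whatever is left fits into a single chunk.
            yield s[pos:].rstrip(" ")
            break
        # Last space at or before index `end`; a space exactly at `end`
        # means the window already stops on a word boundary.
        cut = s.rfind(" ", pos, end + 1)
        if cut == -1:
            cut = end  # no space in the window: split inside the word
        yield s[pos:cut].rstrip(" ")
        pos = cut


foo = "asdf qwerty sderf sdefw regf"
print(list(split_chunks_strict(foo, 6)))
# ['asdf', 'qwerty', 'sderf', 'sdefw', 'regf']
print(list(split_chunks_strict("abcdefghij", 3)))
# ['abc', 'def', 'ghi', 'j']
```

If splitting on whitespace by character width is all that is needed, the standard library's textwrap.wrap(s, width=chunksize) does much the same job and may be enough on its own.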


To split by number of words, do this (each chunk is yielded as a list of words, so pass it to " ".join() if you need a string):

def split_n_chunks(s, words_per_chunk):
    s_list = s.split()
    pos = 0
    while pos < len(s_list):
        yield s_list[pos:pos+words_per_chunk]
        pos += words_per_chunk
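As a usage sketch, the generator above can be combined with " ".join() to get the list of strings the question asks for. The sample text comes from the question; the variable names text and chunks are only illustrative.

```python
def split_n_chunks(s, words_per_chunk):
    # Same generator as above: yields lists of at most words_per_chunk words.
    s_list = s.split()
    pos = 0
    while pos < len(s_list):
        yield s_list[pos:pos + words_per_chunk]
        pos += words_per_chunk


text = ("This is a very long string with many many many many and many more "
        "sentences and there is not one character that i can use to split by, "
        "just by number of words")

# Join each list of words back into one string per chunk.
chunks = [" ".join(words) for words in split_n_chunks(text, 5)]
print(chunks[0])   # This is a very long
print(chunks[-1])  # number of words
```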


Source: https://stackoverflow.com/questions/6186746/how-do-i-split-a-very-long-string-into-a-list-of-shorter-strings-in-python
