Titlecasing a string with exceptions

栀梦 2020-11-28 06:28

Is there a standard way in Python to titlecase a string (i.e. words start with uppercase characters, all remaining cased characters are lowercase) but leave articles and similar minor words in lowercase?

9 answers
  • 2020-11-28 07:09

    There are these methods:

    >>> mytext = 'i am a foobar bazbar'
    >>> print(mytext.capitalize())
    I am a foobar bazbar
    >>> print(mytext.title())
    I Am A Foobar Bazbar
    

    Neither method has an option to keep articles lowercase. You'd have to code that yourself, probably with a list of the articles you want to keep lowercase.
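    A minimal sketch of that do-it-yourself approach, assuming a hand-picked set of minor words. The names `MINOR_WORDS` and `title_except` and the word list itself are illustrative only, not from any library. The first word is always capitalized; later words found in the set are lowercased.

```python
# Hypothetical list of "minor" words to keep lowercase (adjust to your style guide).
MINOR_WORDS = {"a", "an", "the", "and", "or", "of", "in", "on", "to"}


def title_except(text: str, minor_words=MINOR_WORDS) -> str:
    """Title-case text, keeping minor words lowercase except as the first word."""
    words = text.split()  # note: split() collapses runs of whitespace
    result = []
    for i, word in enumerate(words):
        if i > 0 and word.lower() in minor_words:
            result.append(word.lower())
        else:
            # capitalize() uppercases the first character and lowercases the rest
            result.append(word.capitalize())
    return " ".join(result)


if __name__ == "__main__":
    print(title_except("i am a foobar bazbar"))   # I Am a Foobar Bazbar
    print(title_except("the lord of the rings"))  # The Lord of the Rings
```

    Because `str.capitalize()` lowercases everything after the first character, acronyms such as "CDC" come out as "Cdc", and repeated spaces are collapsed to one.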

  • 2020-11-28 07:09

    One important case that hasn't been considered here is acronyms (the python-titlecase solution can handle acronyms if you explicitly provide them as exceptions). I prefer instead to simply avoid lowercasing anything: with this approach, acronyms that are already uppercase stay uppercase. The following code is a modification of the code originally provided by dheerosaur.

    # This is an attempt to provide an alternative to ''.title() that works with
    # acronyms.
    # There are several tricky cases to worry about, in typical order of importance:
    # 0. Uppercase the first letter of each word that is not a 'minor' word.
    # 1. Always uppercase the first word.
    # 2. Do not lowercase acronyms.
    # 3. Quotes
    # 4. Hyphenated words: drive-in
    # 5. Titles within titles: 2001 A Space Odyssey
    # 6. Maintain leading spacing
    # 7. Maintain given spacing: This is a test.  This is only a test.
    
    # The following code addresses 0-3 & 7. It was felt that addressing the others
    # would add considerable complexity.
    
    
    def titlecase(
        s,
        exceptions=(
            'and', 'or', 'nor', 'but', 'a', 'an', 'the', 'as', 'at', 'by',
            'for', 'in', 'of', 'on', 'per', 'to'
        )
    ):
        # Remove leading and trailing spaces (needed for first-word casing),
        # then split on single spaces so that repeated spaces are preserved.
        words = s.strip().split(' ')
    
        def upper(word):
            # Uppercase the first character, skipping any leading quote marks.
            # The rest of the word is left untouched, so acronyms survive.
            if word:
                if word[0] in '‘“"‛‟' + "'":
                    return word[0] + upper(word[1:])
                return word[0].upper() + word[1:]
            return ''
    
        # Always capitalize the first word.
        first = upper(words[0])
    
        # Leave exception words exactly as given; never lowercase anything.
        return ' '.join([first] + [
            word if word.lower() in exceptions else upper(word)
            for word in words[1:]
        ])
    
    
    cases = '''
        CDC warns about "aggressive" rats as coronavirus shuts down restaurants
        L.A. County opens churches, stores, pools, drive-in theaters
        UConn senior accused of killing two men was looking for young woman
        Giant asteroid that killed the dinosaurs slammed into Earth at ‘deadliest possible angle,’ study reveals
        Maintain given spacing: This is a test.  This is only a test.
    '''.strip().splitlines()
    
    for case in cases:
        print(titlecase(case))
    

    When run, it produces the following:

    CDC Warns About "Aggressive" Rats as Coronavirus Shuts Down Restaurants
    L.A. County Opens Churches, Stores, Pools, Drive-in Theaters
    UConn Senior Accused of Killing Two Men Was Looking for Young Woman
    Giant Asteroid That Killed the Dinosaurs Slammed Into Earth at ‘Deadliest Possible Angle,’ Study Reveals
    Maintain Given Spacing: This Is a Test.  This Is Only a Test.
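    The comment block leaves cases 4 (hyphenated words) and 6 (leading spacing) unaddressed. A sketch of one way to extend the same never-lowercase approach to both follows. It is not dheerosaur's or this answer's code. The names `upper_first` and `titlecase_hyphens` are hypothetical, and capitalizing every hyphen-separated part ("Drive-In") is an assumed style choice.

```python
import re

# Same quote characters and exception words as the answer's titlecase().
QUOTES = "‘“\"‛‟'"
EXCEPTIONS = frozenset({
    "and", "or", "nor", "but", "a", "an", "the", "as", "at", "by",
    "for", "in", "of", "on", "per", "to",
})


def upper_first(word: str) -> str:
    # Skip leading quote marks, then uppercase the next character only.
    i = 0
    while i < len(word) and word[i] in QUOTES:
        i += 1
    return word[:i] + word[i:i + 1].upper() + word[i + 1:]


def titlecase_hyphens(s: str, exceptions=EXCEPTIONS) -> str:
    # Separate leading/trailing whitespace so it can be restored unchanged (case 6).
    lead, body, trail = re.match(r"^(\s*)(.*?)(\s*)$", s, re.DOTALL).groups()
    out = []
    # Splitting on a single space keeps repeated inner spaces (case 7).
    for i, word in enumerate(body.split(" ")):
        if i > 0 and word.lower() in exceptions:
            out.append(word)  # exception words are left exactly as given
        else:
            # Capitalize each hyphen-separated part: "drive-in" -> "Drive-In" (case 4).
            out.append("-".join(upper_first(part) for part in word.split("-")))
    return lead + " ".join(out) + trail


if __name__ == "__main__":
    print(repr(titlecase_hyphens("  L.A. County opens drive-in theaters")))
    # '  L.A. County Opens Drive-In Theaters'
```

    Exception words inside a hyphenated compound are still capitalized here. A style guide that prefers "Drive-in" would need the exception check applied per part as well.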
    
  • 2020-11-28 07:10

    Use the titlecase.py module! It works only for English.

    >>> from titlecase import titlecase
    >>> titlecase('i am a foobar bazbar')
    'I Am a Foobar Bazbar'
    

    GitHub: https://github.com/ppannuto/python-titlecase
