.NET method to convert a string to sentence case

一向 2020-11-30 09:47

I'm looking for a function to convert a string of text that is in UpperCase to SentenceCase. All the examples I can find turn the text into TitleCase.

9 answers
  • 2020-11-30 10:09

    If your input string is not a sentence, but many sentences, this becomes a very difficult problem.

    Regular expressions will prove an invaluable tool, but (1) you'll have to know them quite well to be effective, and (2) they might not be up to doing the job entirely on their own.

    Consider this sentence

    "Who's on 1st," Mr. Smith -- who wasn't laughing -- replied.

    This sentence doesn't start with a letter. It contains a digit, various punctuation, a proper name, and a period in the middle (after "Mr").

    The complexities are enormous, and this is one sentence.

    One of the most important things when using RegEx is to "know your data." If you know the breadth of types of sentences you'll be dealing with, your task will be more manageable.

    In any event, you'll have to toy with your implementation until you are satisfied with your results. I suggest writing some automated tests with some sample input -- as you work on your implementation, you can run the tests regularly to see where you're getting close and where you're still missing the mark.
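
    As a rough illustration of the testing idea above, here is a minimal table-driven sketch in C#. Everything in it is hypothetical: the class name, the sample cases, and the naive ToSentenceCase placeholder are made up for illustration and should be replaced with your own implementation and data.

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class SentenceCaseChecks
    {
        // Hypothetical candidate under test: lower-cases everything, then
        // upper-cases the first character. Swap in your own implementation.
        public static string ToSentenceCase(string s) =>
            string.IsNullOrEmpty(s) ? s : char.ToUpperInvariant(s[0]) + s.Substring(1).ToLowerInvariant();

        // Sample inputs paired with the output we would like to get (assumed data).
        public static readonly (string Input, string Expected)[] Cases =
        {
            ("HELLO WORLD.", "Hello world."),
            ("FIRST ONE. SECOND ONE.", "First one. Second one."),
            ("\"WHO'S ON 1ST,\" HE REPLIED.", "\"Who's on 1st,\" he replied."),
        };

        // Returns a description of every failing case, so you can see
        // where the implementation is still missing the mark.
        public static List<string> Run(Func<string, string> convert)
        {
            var failures = new List<string>();
            foreach (var (input, expected) in Cases)
            {
                var actual = convert(input);
                if (actual != expected)
                    failures.Add($"{input} => {actual} (wanted {expected})");
            }
            return failures;
        }
    }
    ```

    Calling SentenceCaseChecks.Run(SentenceCaseChecks.ToSentenceCase) with the naive placeholder reports the second and third cases as failures, which is the kind of feedback to iterate against.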

  • 2020-11-30 10:09

    A solution in F#:

    open System
    
    // Note: this capitalizes the first letter of every word and lower-cases
    // the rest, i.e. it produces proper (title) case for each word.
    let proper (x : string) =
        // Split on spaces, dropping the empty entries produced by repeated spaces.
        x.Split([|' '|], StringSplitOptions.RemoveEmptyEntries)
        |> Array.map (fun t -> string (Char.ToUpper t.[0]) + t.Substring(1).ToLower())
        |> String.concat " "
    
  • 2020-11-30 10:11

    This is what I use (VB.NET). It works in most situations, including:

    • multiple sentences
    • sentences beginning and ending with spaces
    • sentences beginning with characters other than A-Z. For example it will work for: "if you want $100.00 then just ask me".

      <Extension()>
      Public Function ToSentanceCase(ByVal s As String) As String
          ' Written by Jason. Inspired from: http://www.access-programmers.co.uk/forums/showthread.php?t=147680
      
          Dim SplitSentence() As String = s.Split(".")
      
          For i = 0 To SplitSentence.Count - 1
              Dim st = SplitSentence(i)
      
              If st.Trim = "" Or st.Trim.Count = 1 Then Continue For ' ignore empty sentences or sentences with only 1 character.
      
              ' skip past characters that are not A-Z, 0-9 (ASCII) at start of sentence.
              Dim y As Integer = 1
              Do Until y > st.Count
                  If (Asc(Mid(st, y, 1)) >= 65 And Asc(Mid(st, y, 1)) <= 90) Or _
                        (Asc(Mid(st, y, 1)) >= 97 And Asc(Mid(st, y, 1)) <= 122) Or _
                       (Asc(Mid(st, y, 1)) >= 48 And Asc(Mid(st, y, 1)) <= 57) Then
                      GoTo Process
                  Else
                      y += 1
                  End If
              Loop
              Continue For
      
      Process:
              Dim sStart As String = ""
              If y > 1 Then sStart = Left(st, 0 + (y - 1))
      
              Dim sMid As String = UCase(st(y - 1)) ' capitalise the first non-space character in sentence.
      
              Dim sEnd As String = Mid(st, y + 1, st.Length)
      
              SplitSentence(i) = sStart & sMid & sEnd
      
          Next
      
          ' rejoin sentences back together; Join restores the "." separators
          ' removed by Split without adding a trailing one:
          Dim concat As String = String.Join(".", SplitSentence)

          Return concat
      
      End Function
      

    But as for proper nouns and acronyms, well... there are always going to be situations in the English language where punctuation is not as simple. For example, this script won't detect an ellipsis ("..."), or abbreviations (e.g. "Mr. Jones lived on Magnolia Blvd. near Chris' house").

    To address the problem completely, you would need to build a dictionary of all the possible abbreviations and punctuation patterns for the language, and keep that dictionary up to date! After considering this, most people will be happy with a compromise; otherwise, just use Microsoft Word.
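
    To give a feel for the dictionary idea, here is a small C# sketch (not the VB.NET code above). It assumes a tiny, hypothetical abbreviation list and treats '.', '!' and '?' at the end of a token as sentence terminators unless the token is a known abbreviation. The class and member names are made up, and proper nouns are still lower-cased, so this is a compromise rather than a complete solution.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text;

    public static class AbbreviationAwareCasing
    {
        // Hypothetical, deliberately tiny list; a real one would be far larger.
        private static readonly HashSet<string> Abbreviations =
            new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "mr", "mrs", "dr", "blvd", "e.g", "etc" };

        public static string ToSentenceCase(string text)
        {
            var sb = new StringBuilder(text.ToLowerInvariant());
            bool capitalizeNext = true; // the first letter or digit starts a sentence
            int wordStart = 0;          // start of the current whitespace-delimited token

            for (int i = 0; i < sb.Length; i++)
            {
                char c = sb[i];
                if (char.IsWhiteSpace(c)) { wordStart = i + 1; continue; }

                if (capitalizeNext && char.IsLetterOrDigit(c))
                {
                    sb[i] = char.ToUpperInvariant(c);
                    capitalizeNext = false;
                }
                else if (c == '.' || c == '!' || c == '?')
                {
                    // Only punctuation at the end of a token can end a sentence,
                    // so the '.' inside "$100.00" is ignored.
                    bool endOfToken = i + 1 == sb.Length || char.IsWhiteSpace(sb[i + 1]);
                    string token = sb.ToString(wordStart, i - wordStart);
                    // A period after a known abbreviation does not end the sentence.
                    if (endOfToken && !(c == '.' && Abbreviations.Contains(token)))
                        capitalizeNext = true;
                }
            }
            return sb.ToString();
        }
    }
    ```

    With this sketch, "MR. JONES LIVES ON MAGNOLIA BLVD. NEAR TOWN. CALL HIM!  OK?" becomes "Mr. jones lives on magnolia blvd. near town. Call him!  Ok?". The abbreviations no longer start new sentences, but "jones" and "magnolia" show why proper nouns need a dictionary of their own.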

  • 2020-11-30 10:13

    There is a built-in ToTitleCase() method, which will be extended to support multiple cultures in the future.

    Example from MSDN:

    using System;
    using System.Globalization;
    
    public class Example
    {
       public static void Main()
       {
          string[] values = { "a tale of two cities", "gROWL to the rescue",
                              "inside the US government", "sports and MLB baseball",
                              "The Return of Sherlock Holmes", "UNICEF and children"};
    
          TextInfo ti = CultureInfo.CurrentCulture.TextInfo;
          foreach (var value in values)
             Console.WriteLine("{0} --> {1}", value, ti.ToTitleCase(value));
       }
    }
    // The example displays the following output:
    //    a tale of two cities --> A Tale Of Two Cities
    //    gROWL to the rescue --> Growl To The Rescue
    //    inside the US government --> Inside The US Government
    //    sports and MLB baseball --> Sports And MLB Baseball
    //    The Return of Sherlock Holmes --> The Return Of Sherlock Holmes
    //    UNICEF and children --> UNICEF And Children
    

    While it is generally useful it has some important limitations:

    Generally, title casing converts the first character of a word to uppercase and the rest of the characters to lowercase. However, this method does not currently provide proper casing to convert a word that is entirely uppercase, such as an acronym. The following table shows the way the method renders several strings.

    ...the ToTitleCase method provides an arbitrary casing behavior which is not necessarily linguistically correct. A linguistically correct solution would require additional rules, and the current algorithm is somewhat simpler and faster. We reserve the right to make this API slower in the future.

    Source: http://msdn.microsoft.com/en-us/library/system.globalization.textinfo.totitlecase.aspx
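
    To see how the acronym limitation affects the all-uppercase input from the question, here is a short C# sketch. The class name is hypothetical, and InvariantCulture is an assumption made so that the output does not depend on machine settings.

    ```csharp
    using System;
    using System.Globalization;

    public static class TitleCaseDemo
    {
        public static string[] Run()
        {
            // InvariantCulture is assumed here for reproducible output.
            TextInfo ti = CultureInfo.InvariantCulture.TextInfo;
            string input = "THIS IS A GROUP. OF CAPITALIZED. LETTERS.";

            return new[]
            {
                // All-uppercase words are treated as acronyms and left untouched.
                ti.ToTitleCase(input),
                // Lower-casing first capitalizes every word: title case, not sentence case.
                ti.ToTitleCase(ti.ToLower(input))
            };
        }
    }
    ```

    The first result is the input unchanged, and the second is "This Is A Group. Of Capitalized. Letters.", so ToTitleCase alone does not give sentence case either way.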

  • 2020-11-30 10:17
    public string GetSentenceCase(string ReqdString) {
        // Treats the whole input as a single sentence: upper-cases the
        // first character and lower-cases everything after it.
        if (string.IsNullOrEmpty(ReqdString)) {
            return ReqdString;
        }
        return char.ToUpper(ReqdString[0]) + ReqdString.Substring(1).ToLower();
    }
    
  • 2020-11-30 10:18

    There isn't anything built in to .NET - however, this is one of those cases where regular expression processing actually may work well. I would start by first converting the entire string to lower case, and then, as a first approximation, you could use regex to find all sequences like [a-z]\.\s+(.), and use ToUpper() to convert the captured group to upper case. The RegEx class has an overloaded Replace() method which accepts a MatchEvaluator delegate, which allows you to define how to replace the matched value.

    Here's a code example of this at work:

    using System.Text.RegularExpressions;

    var sourcestring = "THIS IS A GROUP. OF CAPITALIZED. LETTERS.";
    // start by converting entire string to lower case
    var lowerCase = sourcestring.ToLower();
    // matches the first letter of the string, as well as the start of each subsequent sentence
    var r = new Regex(@"(^[a-z])|\.\s+(.)", RegexOptions.ExplicitCapture);
    // MatchEvaluator delegate upper-cases each whole match, i.e. each sentence start
    var result = r.Replace(lowerCase, s => s.Value.ToUpper());

    // result is: "This is a group. Of capitalized. Letters."
    

    This could be refined in a number of different ways to better match a broader variety of sentence patterns (not just those ending in a letter+period).
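
    As one possible refinement (a sketch, not a complete solution), the pattern below also treats '!' and '?' as sentence terminators and skips leading quotes or other symbols before the first letter or digit. The class name and the pattern are assumptions made for illustration. Abbreviations and ellipses are still not handled.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class RegexSentenceCase
    {
        // Matches the start of the text, or '.', '!' or '?' followed by whitespace,
        // then any run of symbols (quotes, brackets, ...) and the first letter or digit.
        private static readonly Regex SentenceStart =
            new Regex(@"(?:^|[.!?]\s+)[^\p{L}\p{N}\s]*[\p{L}\p{N}]");

        // Lower-cases the text, then upper-cases each matched sentence start.
        // Upper-casing the whole match is safe: punctuation and digits are unaffected.
        public static string ToSentenceCase(string text) =>
            SentenceStart.Replace(text.ToLowerInvariant(), m => m.Value.ToUpperInvariant());
    }
    ```

    For example, "WHAT? NO WAY! IT COSTS $100.00. \"OK\" THEN." becomes "What? No way! It costs $100.00. \"Ok\" then." because the period inside "$100.00" is not followed by whitespace and so is not treated as a sentence end.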
