Splitting a string at all whitespace

刺人心 2021-01-17 11:06

I need to split a string at all whitespace; the result should contain ONLY the words themselves.

How can I do this in VB.NET?

Tabs, newlines, etc. must all be treated as split points.

7 answers
  • 2021-01-17 11:32

    So, after seeing Adam Ralph's post, I suspected his solution of being faster than the Regex solution. Just thought I'd share the results of my testing since I did find it was faster.


    There are really two factors at play (ignoring system variables): the number of sub-strings extracted (determined by the number of delimiters) and the total string length. The very simple scenario plotted below uses "A" as the sub-string, delimited by two whitespace characters (a space followed by a tab). This accentuates the effect of the number of sub-strings extracted. I went ahead and did some multi-variable testing to arrive at the following general equations for my operating system.

    Regex()
    t = (28.33*SSL + 572)(SSN/10^6)

    Split().Where()
    t = (6.23*SSL + 250)(SSN/10^6)

    Where t is the execution time in milliseconds, SSL is the average sub-string length, and SSN is the number of sub-strings delimited in the string.

    These equations can also be written as

    t = (28.33*SL + 572*SSN)/10^6

    and

    t = (6.23*SL + 250*SSN)/10^6

    where SL is total string length (SL = SSL * SSN)
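
    To make the equivalence of the two forms explicit, the expansion can be written out as below. The symbols a and b are shorthand introduced here for illustration (they are not in the original post) and stand for the fitted coefficients: 28.33 and 572 for Regex(), 6.23 and 250 for Split().Where().

    ```latex
    \begin{align*}
    t &= (a \cdot SSL + b)\,\frac{SSN}{10^{6}} \\
      &= \frac{a \cdot SSL \cdot SSN + b \cdot SSN}{10^{6}} \\
      &= \frac{a \cdot SL + b \cdot SSN}{10^{6}}, \qquad SL = SSL \cdot SSN
    \end{align*}
    ```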

    Conclusion: The Split().Where() solution is faster than Regex(). The number of sub-strings is the major factor, while string length plays a minor role. Comparing coefficients, the performance gain is about 2x on the per-sub-string term (572 vs. 250) and about 5x on the string-length term (28.33 vs. 6.23).
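
    As a rough illustration of what the fitted equations predict, here they are evaluated for two hypothetical inputs with SSN = 10^6 sub-strings. These inputs are chosen only for the arithmetic and are not measured cases from the test runs.

    ```latex
    \begin{align*}
    SSL = 1:   \quad & t_{\mathrm{Regex}} = (28.33 \cdot 1 + 572) \cdot 1 = 600.33\ \mathrm{ms},
                     & t_{\mathrm{Split}} &= (6.23 \cdot 1 + 250) \cdot 1 = 256.23\ \mathrm{ms} \\
    SSL = 100: \quad & t_{\mathrm{Regex}} = (28.33 \cdot 100 + 572) \cdot 1 = 3405\ \mathrm{ms},
                     & t_{\mathrm{Split}} &= (6.23 \cdot 100 + 250) \cdot 1 = 873\ \mathrm{ms}
    \end{align*}
    ```

    The predicted ratio is about 2.3 for one-character sub-strings and about 3.9 for 100-character sub-strings, so under this model the advantage grows with sub-string length toward the 28.33/6.23 limit of roughly 4.5.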



    Here's my testing code (probably way more material than necessary, but it's set up for collecting the multi-variable data I talked about)

    using System;
    using System.Linq;
    using System.Diagnostics;
    using System.Text.RegularExpressions;
    using System.Windows.Forms;
    namespace ConsoleApplication1
    {
        class Program
        {
            public enum TestMethods {regex, split};
            [STAThread]
            static void Main(string[] args)
            {
                //Compare TestMethod execution times and output result information
                //to the console at runtime and to the clipboard at program finish (so that data is ready to paste into analysis environment)
                #region Config_Variables
                //Choose test method from TestMethods enumerator (regex or split)
                TestMethods TestMethod = TestMethods.split;
                //Configure RepetitionString
                String RepetitionString =  string.Join(" \t", Enumerable.Repeat("A",100));
                //Configure initial and maximum count of string repetitions (final count may not equal max)
                int RepCountInitial = 100;
                int RepCountMax = 1000 * 100;
    
                //Step increment to next RepCount (calculated as 20% increase from current value)
                Func<int, int> Step = x => (int)Math.Round(x / 5.0, 0);
                //Execution count used to determine average speed (calculated to adjust down to 1 execution at long execution times)
                Func<double, int> ExecutionCount = x => (int)(1 + Math.Round(500.0 / (x + 1), 0));
                #endregion
    
                #region NonConfig_Variables
                string s; 
                string Results = "";
                string ResultInfo; 
                double ResultTime = 1;
                #endregion
    
                for (int RepCount = RepCountInitial; RepCount < RepCountMax; RepCount += Step(RepCount))
                {
                    s = string.Join("", Enumerable.Repeat(RepetitionString, RepCount));
                    //Capture the execution count before ResultTime is overwritten, so the logged count matches the one actually used
                    int Executions = ExecutionCount(ResultTime);
                    ResultTime = Test(s, Executions, TestMethod);
                    ResultInfo = ResultTime.ToString() + "\t" + RepCount.ToString() + "\t" + Executions.ToString() + "\t" + TestMethod.ToString();
                    Console.WriteLine(ResultInfo); 
                    Results += ResultInfo + "\r\n";
                }
                Clipboard.SetText(Results);
            }
            public static double Test(string s, int iMax, TestMethods Method)
            {
                switch (Method)
                {
                    case TestMethods.regex:
                        return Math.Round(RegexRunTime(s, iMax),2);
                    case TestMethods.split:
                        return Math.Round(SplitRunTime(s, iMax),2);
                    default:
                        return -1;
                }
            }
            private static double RegexRunTime(string s, int iMax)
            {
                Stopwatch sw = new Stopwatch();
                sw.Restart();
                for (int i = 0; i < iMax; i++)
                {
                    System.Collections.Generic.IEnumerable<string> ens = Regex.Split(s, @"\s+");
                }
                sw.Stop();
                return Math.Round(sw.ElapsedMilliseconds / (double)iMax, 2);
            }
            private static double SplitRunTime(string s,int iMax)
            {
                Stopwatch sw = new Stopwatch();
                sw.Restart();
                for (int i = 0; i < iMax; i++)
                {
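                    //Note: Where() is lazily evaluated, so the filter does not run unless ens is enumerated;
                    //as written, this loop times Split() alone. Append .ToArray() to include the filtering cost.
                    System.Collections.Generic.IEnumerable<string> ens = s.Split().Where(x => x != string.Empty);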
                }
                sw.Stop();
                return Math.Round(sw.ElapsedMilliseconds / (double)iMax, 2);
            }
        }
    }
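
For readers who only want the split itself rather than the benchmark, the two approaches compared above can be sketched as follows. This is a minimal illustration, not the answer author's code: the class name `WhitespaceSplitDemo`, the method names, and the sample string are hypothetical, and `ToArray()` is added so the lazily evaluated `Where()` filter actually runs.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class; names are illustrative only.
public static class WhitespaceSplitDemo
{
    // Regex approach: split on runs of whitespace, then drop the empty
    // entries that appear when the input starts or ends with whitespace.
    public static string[] SplitWithRegex(string input) =>
        Regex.Split(input, @"\s+").Where(w => w != string.Empty).ToArray();

    // Split().Where() approach: Split() with no arguments uses whitespace
    // characters as delimiters; consecutive delimiters yield empty strings,
    // which the filter removes.
    public static string[] SplitWithSplit(string input) =>
        input.Split().Where(w => w != string.Empty).ToArray();

    public static void Main()
    {
        string text = " one \t two\r\nthree   four ";
        Console.WriteLine(string.Join("|", SplitWithRegex(text)));
        Console.WriteLine(string.Join("|", SplitWithSplit(text)));
    }
}
```

Both calls should yield the same four words for the sample string. The same `String.Split` and `Regex.Split` methods are available from VB.NET, since they are part of the .NET base class library.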
    