I am currently making my own library, called TextCheckerExtension, which basically tries to check the format of a text before further processing (short code snippet shown below).
It does make a difference.
To my surprise, as I continued this project out of curiosity, I found out that the choice between actually parsing a string and simply checking whether it is of a certain format makes a significant difference in time performance.
In my experiment below, by creating a checker without a parser, we could gain 33.77% to 58.26% in time as compared to using the built-in TryParse. In addition, I also compare my extension with VB.Net IsNumeric from Microsoft.VisualBasic.Information (in Microsoft.VisualBasic.dll).
Here are the (1) tested code, (2) testing scenario, (3) testing code, and (4) testing result (notes are added in each part whenever necessary):
Here is the tested code, my extension code named Extension.Checker.Text. So far, I have only tested scenarios for generic integer and float/double (with/without dot - perhaps better termed fraction-ed number). By generic integer I mean that the maximum and minimum value range (such as -128 to 127 for an 8-bit signed integer) is unchecked. This code just determines whether a text is an integer as a human understands it, without looking at its range. The same goes for float/double.
Compare this with this post, which has 400+ upvotes on its answer at the time this answer is posted. I believe it is safe to assume that, as a first try, we will generally use int.TryParse to test whether a text is an integer or not (albeit its range is limited to roughly -2e9 to 2e9) for generic integer text. Some other posts show the same trend. Another way which we can see from those posts is to check by Visual Basic IsNumeric. Thus, I included that method in the benchmarking too.
public static bool IsFloatOrDoubleByDot(string str) { //another criterion for float, giving "f" in the last part?
    if (string.IsNullOrWhiteSpace(str))
        return false;
    int dotCounter = 0;
    int digitCounter = 0; //at least one digit is required, so that "-", "." and "-." are rejected
    for (int i = str[0] == '-' ? 1 : 0; i < str.Length; i++) { //Check if it is float
        if (char.IsDigit(str, i))
            ++digitCounter;
        else if (str[i] == '.') {
            if (++dotCounter > 1) //If there is more than one dot for whatever reason, return error
                return false;
        } else
            return false;
    }
    return digitCounter > 0;
}
public static bool IsDigitsOnly(string str) {
    foreach (char c in str)
        if (c < '0' || c > '9')
            return false;
    return str.Length >= 1; //there must be at least one character here to continue
}
public static bool IsInt(string str) { //null, empty, or whitespace-only input returns false
    if (string.IsNullOrWhiteSpace(str))
        return false;
    return str[0] == '-' && str.Length > 1 ? IsDigitsOnly(str.Substring(1)) : IsDigitsOnly(str);
}
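So far, I have tested four different scenarios: (1) integer, (2) float by dot (max of 7-digit precision, in the accurate parse-able range by float.TryParse), (3) double by dot (max of 11-digit precision, in the accurate parse-able range by double.TryParse), and (4) integer as double by dot.
And for each scenario, I have four cases to test: (1) positive valid text, (2) negative valid text, (3) positive invalid text, and (4) negative invalid text.
And for each case I tested the time needed to do the checking by: (1) the built-in TryParse, (2) my Extension.Checker.Text, and (3) Visual Basic IsNumeric.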
To test the above scenarios, I use the following data:
string intpos = "1342517340";
string intneg = "-1342517340";
string intfalsepos = "134251734u";
string intfalseneg = "-134251734u";
string floatpos = "56.34251";
string floatneg = "-56.34251";
string floatfalsepos = "56.3425h";
string floatfalseneg = "-56.3425h";
string doublepos = "56.342515312";
string doubleneg = "-56.342515312";
string doublefalsepos = "56.34251531y";
string doublefalseneg = "-56.34251531y";
List<string> liststr = new List<string>() {
    intpos, intneg, intfalsepos, intfalseneg,
    floatpos, floatneg, floatfalsepos, floatfalseneg,
    doublepos, doubleneg, doublefalsepos, doublefalseneg
};
List<string> liststrcode = new List<string>() {
    "i+", "i-", "if+", "if-",
    "f+", "f-", "ff+", "ff-",
    "d+", "d-", "df+", "df-"
};
bool parsed = false; //to store checking result
int intval; //for int.TryParse result
float fval; //for float.TryParse result
double dval; //for double.TryParse result
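Each text code in liststrcode labels the test string at the same position in liststr: the type (i, f, or d), then an f if the text is invalid (false), then the sign (+ or -). Examples: if- labels intfalseneg ("-134251734u") and d+ labels doublepos ("56.342515312").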
And I use the following testing loop to get the time performance of each method per case:
//time snap
for (int i = 0; i < 10000000; ++i) //for integer case
    parsed = int.TryParse(str, out intval); //built-in TryParse
//time snap
//Print the result
//time snap
for (int i = 0; i < 10000000; ++i)
    parsed = Extension.Checker.Text.IsInt(str); //extension Text checker
//time snap
//Print the result
//time snap
for (int i = 0; i < 10000000; ++i)
    parsed = Information.IsNumeric(str); //Microsoft.VisualBasic
//time snap
//Print the result
//time snap
for (int i = 0; i < 10000000; ++i)
    parsed = str[0] == '-' ? str.Substring(1).All(char.IsDigit) : str.All(char.IsDigit); //misc methods
//time snap
//Print the result
//Print the result difference
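The "time snap" steps above are left as comments. As a minimal sketch of how one such measurement could be taken, the following uses System.Diagnostics.Stopwatch. The names TimingSketch and TimeIt are hypothetical and not part of my library, and calling the check through a delegate adds some overhead that the plain loops above do not have, so absolute numbers will not match the log below.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch {
    // Hypothetical helper: runs check(text) `iterations` times and
    // returns the elapsed wall-clock time in milliseconds.
    public static long TimeIt(Func<string, bool> check, string text, int iterations, out bool lastResult) {
        lastResult = false;
        Stopwatch sw = Stopwatch.StartNew(); // first time snap
        for (int i = 0; i < iterations; ++i)
            lastResult = check(text);
        sw.Stop();                           // second time snap
        return sw.ElapsedMilliseconds;
    }

    public static void Main() {
        string str = "1342517340";
        bool parsed;
        long ms = TimeIt(s => { int v; return int.TryParse(s, out v); }, str, 10000000, out parsed);
        Console.WriteLine("TryParse i+: {0} ms Result: {1} Text: {2}", ms, parsed, str);
    }
}
```

The same TimeIt call can be repeated with any other checker passed as the first argument, so that every method is timed in the same way.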
I ran 10 million iterations per testing case per method on my laptop.
Note: the behavior of my Extension.Checker.Text is not completely equivalent to the built-in TryParse. For example, it does not check the range of the numerical value of the string, and it rejects strings in other formats which might be acceptable for TryParse. This is because the main purpose of my Extension.Checker.Text is not to convert the given text into a certain C# data type, as the built-in TryParse does. And that is the very point of my Extension.Checker.Text. The comparison here is made merely to compare - in terms of time performance - (1) the popular way of checking a certain text format with (2) the extension method we could possibly make, given that we do not need the result of TryParse, but only whether a text is of a certain format or not. The same goes for the comparison with VB IsNumeric.
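To make the non-equivalence concrete, here is a small sketch. IsIntText is a simplified, hypothetical stand-in for the IsInt checker (optional leading '-', then ASCII digits only), and the invariant culture is passed to int.TryParse so that the outcome does not depend on the machine's regional settings.

```csharp
using System;
using System.Globalization;

public static class CheckerVsTryParse {
    // Hypothetical stand-in for the format checker: optional '-', then ASCII digits only.
    public static bool IsIntText(string str) {
        if (string.IsNullOrEmpty(str))
            return false;
        int start = str[0] == '-' ? 1 : 0;
        if (start == str.Length) // a lone "-" is not a number
            return false;
        for (int i = start; i < str.Length; i++)
            if (str[i] < '0' || str[i] > '9')
                return false;
        return true;
    }

    // Built-in parse with the default integer styles and a fixed culture.
    public static bool TryParseInt(string str) {
        int value;
        return int.TryParse(str, NumberStyles.Integer, CultureInfo.InvariantCulture, out value);
    }

    public static void Main() {
        string[] samples = { "1342517340", "99999999999999999999", " 42", "+42" };
        foreach (string s in samples)
            Console.WriteLine("\"{0}\": TryParse={1}, checker={2}", s, TryParseInt(s), IsIntText(s));
    }
}
```

Here "99999999999999999999" is accepted by the checker but rejected by TryParse because it overflows the int range, while " 42" and "+42" are accepted by TryParse (leading white space and a leading sign are allowed by NumberStyles.Integer) but rejected by the checker.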
I printed out the parse/check result to ensure that my extension gives the same result as the built-in TryParse, VB.Net IsNumeric, and the other alternative tricks for the given cases. I also printed the original text for easy reading/checking. Then, from the time snaps taken between the tests, I could get the time performance as well as the time difference for each testing case, which I also printed out. The time gain comparison, however, is only done against TryParse. Here is the complete result.
[2016-01-05 06:04:25.466 UTC] Integer:
[2016-01-05 06:04:26.999 UTC] TryParse i+: 1531 ms Result: True Text: 1342517340
[2016-01-05 06:04:27.639 UTC] Extension i+: 639 ms Result: True Text: 1342517340
[2016-01-05 06:04:30.345 UTC] VB.IsNumeric i+: 2705 ms Result: True Text: 1342517340
[2016-01-05 06:04:31.468 UTC] All is digit i+: 1124 ms Result: True Text: 1342517340
[2016-01-05 06:04:31.469 UTC] Gain on TryParse i+: 892 ms Percent: -58.26%
[2016-01-05 06:04:31.469 UTC]
[2016-01-05 06:04:32.996 UTC] TryParse i-: 1527 ms Result: True Text: -1342517340
[2016-01-05 06:04:33.846 UTC] Extension i-: 849 ms Result: True Text: -1342517340
[2016-01-05 06:04:36.413 UTC] VB.IsNumeric i-: 2566 ms Result: True Text: -1342517340
[2016-01-05 06:04:37.693 UTC] All is digit i-: 1280 ms Result: True Text: -1342517340
[2016-01-05 06:04:37.694 UTC] Gain on TryParse i-: 678 ms Percent: -44.40%
[2016-01-05 06:04:37.694 UTC]
[2016-01-05 06:04:39.058 UTC] TryParse if+: 1364 ms Result: False Text: 134251734u
[2016-01-05 06:04:39.845 UTC] Extension if+: 786 ms Result: False Text: 134251734u
[2016-01-05 06:04:42.436 UTC] VB.IsNumeric if+: 2590 ms Result: False Text: 134251734u
[2016-01-05 06:04:43.540 UTC] All is digit if+: 1103 ms Result: False Text: 134251734u
[2016-01-05 06:04:43.540 UTC] Gain on TryParse if+: 578 ms Percent: -42.38%
[2016-01-05 06:04:43.540 UTC]
[2016-01-05 06:04:44.937 UTC] TryParse if-: 1397 ms Result: False Text: -134251734u
[2016-01-05 06:04:45.745 UTC] Extension if-: 807 ms Result: False Text: -134251734u
[2016-01-05 06:04:48.275 UTC] VB.IsNumeric if-: 2530 ms Result: False Text: -134251734u
[2016-01-05 06:04:49.541 UTC] All is digit if-: 1267 ms Result: False Text: -134251734u
[2016-01-05 06:04:49.542 UTC] Gain on TryParse if-: 590 ms Percent: -42.23%
[2016-01-05 06:04:49.542 UTC]
[2016-01-05 06:04:49.542 UTC] Float by Dot:
[2016-01-05 06:04:51.136 UTC] TryParse f+: 1594 ms Result: True Text: 56.34251
[2016-01-05 06:04:51.967 UTC] Extension f+: 830 ms Result: True Text: 56.34251
[2016-01-05 06:04:54.328 UTC] VB.IsNumeric f+: 2360 ms Result: True Text: 56.34251
[2016-01-05 06:04:54.329 UTC] Time Gain f+: 764 ms Percent: -47.93%
[2016-01-05 06:04:54.329 UTC]
[2016-01-05 06:04:55.962 UTC] TryParse f-: 1634 ms Result: True Text: -56.34251
[2016-01-05 06:04:56.790 UTC] Extension f-: 827 ms Result: True Text: -56.34251
[2016-01-05 06:04:59.102 UTC] VB.IsNumeric f-: 2313 ms Result: True Text: -56.34251
[2016-01-05 06:04:59.103 UTC] Time Gain f-: 807 ms Percent: -49.39%
[2016-01-05 06:04:59.103 UTC]
[2016-01-05 06:05:00.623 UTC] TryParse ff+: 1519 ms Result: False Text: 56.3425h
[2016-01-05 06:05:01.429 UTC] Extension ff+: 802 ms Result: False Text: 56.3425h
[2016-01-05 06:05:03.730 UTC] VB.IsNumeric ff+: 2301 ms Result: False Text: 56.3425h
[2016-01-05 06:05:03.730 UTC] Time Gain ff+: 717 ms Percent: -47.20%
[2016-01-05 06:05:03.731 UTC]
[2016-01-05 06:05:05.312 UTC] TryParse ff-: 1581 ms Result: False Text: -56.3425h
[2016-01-05 06:05:06.147 UTC] Extension ff-: 835 ms Result: False Text: -56.3425h
[2016-01-05 06:05:08.485 UTC] VB.IsNumeric ff-: 2337 ms Result: False Text: -56.3425h
[2016-01-05 06:05:08.486 UTC] Time Gain ff-: 746 ms Percent: -47.19%
[2016-01-05 06:05:08.486 UTC]
[2016-01-05 06:05:08.487 UTC] Double by Dot:
[2016-01-05 06:05:10.341 UTC] TryParse d+: 1854 ms Result: True Text: 56.342515312
[2016-01-05 06:05:11.492 UTC] Extension d+: 1151 ms Result: True Text: 56.342515312
[2016-01-05 06:05:14.035 UTC] VB.IsNumeric d+: 2541 ms Result: True Text: 56.342515312
[2016-01-05 06:05:14.035 UTC] Time Gain d+: 703 ms Percent: -37.92%
[2016-01-05 06:05:14.036 UTC]
[2016-01-05 06:05:15.916 UTC] TryParse d-: 1879 ms Result: True Text: -56.342515312
[2016-01-05 06:05:17.051 UTC] Extension d-: 1133 ms Result: True Text: -56.342515312
[2016-01-05 06:05:19.542 UTC] VB.IsNumeric d-: 2492 ms Result: True Text: -56.342515312
[2016-01-05 06:05:19.543 UTC] Time Gain d-: 746 ms Percent: -39.70%
[2016-01-05 06:05:19.543 UTC]
[2016-01-05 06:05:21.210 UTC] TryParse df+: 1667 ms Result: False Text: 56.34251531y
[2016-01-05 06:05:22.315 UTC] Extension df+: 1104 ms Result: False Text: 56.34251531y
[2016-01-05 06:05:24.797 UTC] VB.IsNumeric df+: 2481 ms Result: False Text: 56.34251531y
[2016-01-05 06:05:24.798 UTC] Time Gain df+: 563 ms Percent: -33.77%
[2016-01-05 06:05:24.798 UTC]
[2016-01-05 06:05:26.509 UTC] TryParse df-: 1711 ms Result: False Text: -56.34251531y
[2016-01-05 06:05:27.596 UTC] Extension df-: 1086 ms Result: False Text: -56.34251531y
[2016-01-05 06:05:30.039 UTC] VB.IsNumeric df-: 2442 ms Result: False Text: -56.34251531y
[2016-01-05 06:05:30.040 UTC] Time Gain df-: 625 ms Percent: -36.53%
[2016-01-05 06:05:30.041 UTC]
[2016-01-05 06:05:30.041 UTC] Integer as Double by Dot:
[2016-01-05 06:05:31.794 UTC] TryParse (doubled) i+: 1752 ms Result: True Text: 1342517340
[2016-01-05 06:05:32.904 UTC] Extension (doubled) i+: 1109 ms Result: True Text: 1342517340
[2016-01-05 06:05:35.590 UTC] VB.IsNumeric (doubled) d+: 2684 ms Result: True Text: 1342517340
[2016-01-05 06:05:35.590 UTC] Time Gain d+: 643 ms Percent: -36.70%
[2016-01-05 06:05:35.591 UTC]
[2016-01-05 06:05:37.390 UTC] TryParse (doubled) i-: 1799 ms Result: True Text: -1342517340
[2016-01-05 06:05:38.515 UTC] Extension (doubled) i-: 1125 ms Result: True Text: -1342517340
[2016-01-05 06:05:41.139 UTC] VB.IsNumeric (doubled) d-: 2623 ms Result: True Text: -1342517340
[2016-01-05 06:05:41.139 UTC] Time Gain d-: 674 ms Percent: -37.47%
[2016-01-05 06:05:41.140 UTC]
[2016-01-05 06:05:42.840 UTC] TryParse (doubled) if+: 1700 ms Result: False Text: 134251734u
[2016-01-05 06:05:43.933 UTC] Extension (doubled) if+: 1092 ms Result: False Text: 134251734u
[2016-01-05 06:05:46.575 UTC] VB.IsNumeric (doubled) df+: 2642 ms Result: False Text: 134251734u
[2016-01-05 06:05:46.576 UTC] Time Gain df+: 608 ms Percent: -35.76%
[2016-01-05 06:05:46.577 UTC]
[2016-01-05 06:05:48.328 UTC] TryParse (doubled) if-: 1750 ms Result: False Text: -134251734u
[2016-01-05 06:05:49.434 UTC] Extension (doubled) if-: 1106 ms Result: False Text: -134251734u
[2016-01-05 06:05:52.042 UTC] VB.IsNumeric (doubled) df-: 2607 ms Result: False Text: -134251734u
[2016-01-05 06:05:52.042 UTC] Time Gain df-: 644 ms Percent: -36.80%
[2016-01-05 06:05:52.043 UTC]
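For reading the "Gain" and "Percent" lines in the log: the numbers are consistent with the gain being the TryParse time minus the extension time, and the percentage being the relative change of the extension time against the TryParse time (hence the negative sign). A small sketch of that arithmetic follows; GainSketch and PercentChange are hypothetical names, not the code that produced the log.

```csharp
using System;
using System.Globalization;

public static class GainSketch {
    // Relative change of the extension time against the TryParse time, in percent.
    // A negative value means the extension needed less time.
    public static double PercentChange(long tryParseMs, long extensionMs) {
        return (extensionMs - tryParseMs) * 100.0 / tryParseMs;
    }

    public static void Main() {
        long tryParseMs = 1531, extensionMs = 639; // the i+ case from the log
        Console.WriteLine("Gain on TryParse i+: {0} ms Percent: {1}%",
            tryParseMs - extensionMs,
            PercentChange(tryParseMs, extensionMs).ToString("F2", CultureInfo.InvariantCulture));
        // prints: Gain on TryParse i+: 892 ms Percent: -58.26%
    }
}
```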
The conclusions I got from the results so far: (1) in all tested cases, checking the text format without parsing is faster than the built-in TryParse; (2) VB IsNumeric is slower than the rest in all cases (this is also to my surprise, because according to the benchmarking in this post, VB seems to be pretty fast - though not the best).
One possible use of this extension checking is the case where you receive a certain string and you know that it can be of more than one format type (say, integer or double), but you want to check the actual text type first without actually parsing it at the time of checking. For such a case, an extension method may speed up the process.
Another use is in the computational linguistics area, where you often want to know the type of a text without actually parsing it to be used computationally.
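As a rough sketch of the first use case, the following classifies a string as integer-like, fraction-like, or neither in a single pass without parsing it. TextKind and TextClassifier are hypothetical names for illustration only and are not part of Extension.Checker.Text. The rules assumed here are the same as above: an optional leading '-', ASCII digits, and at most one dot.

```csharp
using System;

public enum TextKind { Integer, Fraction, Other }

public static class TextClassifier {
    // Single pass over the text: no number is built, only the format is inspected.
    public static TextKind Classify(string str) {
        if (string.IsNullOrEmpty(str))
            return TextKind.Other;
        int digits = 0, dots = 0;
        for (int i = str[0] == '-' ? 1 : 0; i < str.Length; i++) {
            char c = str[i];
            if (c >= '0' && c <= '9')
                digits++;
            else if (c == '.' && dots == 0)
                dots++;
            else
                return TextKind.Other; // any other character, or a second dot
        }
        if (digits == 0)
            return TextKind.Other;     // "-", "." and "-." are not numbers
        return dots == 0 ? TextKind.Integer : TextKind.Fraction;
    }

    public static void Main() {
        foreach (string s in new[] { "-1342517340", "56.34251", "56.3425h" })
            Console.WriteLine("{0} -> {1}", s, Classify(s));
    }
}
```

The caller can then decide which parser, if any, to run based on the returned kind.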