text-parsing | 易学教程

Why does this regular expression test give different results for what should be the same body text?

Question: Here's the pertinent code, which gives different results on the regular expression test for the message body depending on whether I launch it using TestLaunchURL or Outlook passes the message to it when an incoming message arrives:

    Public Sub OpenLinksMessage(olMail As Outlook.MailItem)
        Dim Reg1 As RegExp
        Dim AllMatches As MatchCollection
        Dim M As Match
        Dim strURL As String
        Dim RetCode As Long
        Set Reg1 = New RegExp
        With Reg1
            .Pattern = "(https?[:]//([0-9a-z=\?:/\.&-^!#$;_])*)"

Character strings in Fortran: Portable LEN_TRIM and LNBLNK?

Question: I need a portable function/subroutine to locate the position of the last non-blank character in a string. I've found two options: LEN_TRIM and LNBLNK. However, different compilers seem to have different standards. The official documentation for the following compilers suggests that LEN_TRIM is part of the Fortran 95 standard on the following platforms:

IBM: LEN_TRIM
Intel: LNBLNK and LEN_TRIM
gfortran: LNBLNK and LEN_TRIM
PGI: LEN_TRIM

However, it appears that nothing is guaranteed on

Replace HTML links with text

Question: How can I replace links with their anchor text in HTML (Python)? For example, given the input:

    Hello <a href="http://example.com">link text1</a> and <a href="http://example.com">link text2</a> !

I want the result to keep the p tag and remove only the a tags:

    Hello link text1 and link text2 !

Answer 1: You could do this with a simple regex and the sub function:

    import re
    text = ' Hello <a href="http://example.com">link text1</a> and <a href="http://example.com">link text2</a> ! '
    pattern = r'<(a|/a).*?>'
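Since the answer excerpt is cut off before the substitution itself, here is a minimal sketch of how the regex approach could be completed. The names `A_TAG` and `strip_links` are illustrative and not from the original answer, and the pattern is slightly tightened with `\b` (an assumption on my part) so that tags such as `<abbr>` are left alone. A regex like this is only suitable for simple, well-formed input, not as a general HTML parser.

```python
import re

# Sample input taken from the question above.
text = 'Hello <a href="http://example.com">link text1</a> and <a href="http://example.com">link text2</a> !'

# Match an opening <a ...> or closing </a> tag. The \b keeps the pattern
# from also matching other tags that merely start with "a", such as <abbr>.
A_TAG = re.compile(r'</?a\b[^>]*>', re.IGNORECASE)


def strip_links(html):
    """Remove <a> tags but keep the text between them."""
    return A_TAG.sub('', html)


print(strip_links(text))
```

Any surrounding markup, such as a wrapping p tag, is left untouched because only the a tags match the pattern.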

Convertfrom-string removes leading zeros

Question: I'm having a problem with the ConvertFrom-String cmdlet:

    $value = 'something:009'
    $value | ConvertFrom-String -Delimiter ':'

Output:

    P1        P2
    --        --
    something 9

The output I want is:

    P1        P2
    --        --
    something 009

Does anyone have any ideas? Thanks in advance.

Answer 1: I suggest avoiding ConvertFrom-String altogether: it performs type conversions that you cannot control when you use -Delimiter, as you've experienced, and its example-driven template-based parsing is awkward. On a side note: ConvertFrom-String is
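To illustrate the underlying point (splitting on a delimiter without any type conversion keeps the leading zeros), here is a small sketch in Python rather than PowerShell. It is only an analogy and not the approach the truncated answer goes on to recommend; the variable names `first` and `second` are illustrative.

```python
value = "something:009"

# Splitting on the delimiter returns plain strings; nothing is converted
# to a number, so the leading zeros survive.
first, second = value.split(":", 1)

print(first)   # something
print(second)  # 009
```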

How to parse unstructured table-like data?

Question: I have a text file that holds the result of an operation. The data is displayed in a human-readable format (like a table). How do I parse this data so that I can build a data structure, such as a dictionary, from it? An example of the unstructured data is shown below.

===============================================================
Title
===============================================================
Header Header Header Header Header Header
1 2 3 4 5 6
-----------------------------------

Parse in C# with Dictionary<string, string>

Question: I am new to programming and have been trying hard to parse a file. I initially tried to parse it in a certain way, but that didn't end up working correctly. I want to parse the following lines into a Dictionary<string, string>.

Network Card(s): 7 NIC(s) Installed.
[01]: Broadcom
Connection Name: Local Area Connection
DHCP Enabled: No
IP address(es)
[01]: abc.de.xyz.
[02]: Broadcom
Connection Name: eth1
Status: Media disconnected
[03]: Broadcom
Connection Name: eth0
Status: Media

Simple get string (ignore numbers at end) in C#

Question: I figure regex is overkill, and it also takes me some time to write (I guess I should learn it now that I know some regex). What's the simplest way to separate the letters from the digits in an alphanumeric string? It will always be LLLLDDDDD. I only want the letters (the L's); typically it's only 1 or 2 letters.

Answer 1: TrimEnd:

    string result = input.TrimEnd(new char[]{'0','1','2','3','4','5','6','7','8','9'});
    // I'm sure using LINQ and Range can simplify that.
    // also note that a string like "abc123def456" would
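For comparison, the same trailing-digit trim can be sketched in Python, where `str.rstrip` plays the role of C#'s `TrimEnd`. The function name `letters_prefix` is illustrative and not from the answer. As with `TrimEnd`, only digits at the very end are removed, so digits in the middle of the string stay in place.

```python
import string


def letters_prefix(value):
    """Strip trailing ASCII digits, mirroring the TrimEnd call above."""
    return value.rstrip(string.digits)


print(letters_prefix("AB12345"))       # AB
print(letters_prefix("abc123def456"))  # abc123def
```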

Compare dynamic XML/JSON content with static tokenised payload and retrieve token values

Question: I am implementing a mock HTTP response server. The server has to validate the incoming request URL and payload, match the request to a configured response, and return that response to the caller. I need help validating the dynamic content of the HTTP request payload against a static tokenised payload. So when I receive a request payload, say JSON, I compare it with the configured tokenised content and return a failure if it does not match. For example, I am doing the same for the request URL with the code below.

    import java.util

Given upper case names transform to Proper Case, handling “O'Hara”, “McDonald” “van der Sloot” etc

Question: I am provided a list of names in upper case. For the purpose of a salutation in an email, I would like them to be Proper Cased. That is easy enough to do using PHP's ucwords. But I feel I need some regex function to handle common exceptions, such as "O'Hara", "McDonald", "van der Sloot", etc. It's not so much that I need help constructing a regex statement to handle the three examples above (though that would be nice), as it is that I don't know what all the common exceptions might be. Surely

Extracting specific tags from arbitrary plain text

Question: I want to parse plain text comments and look for certain tags within them. The tags I'm looking for look like:

    <name#1234>

where "name" is a [a-z] string (from a fixed list) and "1234" represents a [0-9]+ number. These tags can occur within a string zero or more times and be surrounded by arbitrary other text. For example, the following strings are all valid:

    "Hello <foo#56> world!"
    "<bar#1>!"
    "1 < 2"
    "+<baz#99>+<squid#0> and also<baz#99>.\n\nBy the way, maybe <foo#9876>"

The
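One way the tag format described in the question could be matched is with a regular expression built from the fixed name list. This is a sketch under assumptions: the question does not show the real list, so `ALLOWED_NAMES` below uses only the four names that appear in its examples, and `TAG` and `find_tags` are illustrative names.

```python
import re

# Assumption: the fixed list of tag names. These four are taken from the
# example strings in the question; the real list is not shown.
ALLOWED_NAMES = ("foo", "bar", "baz", "squid")

# <name#digits>, where name must be one of the allowed names.
TAG = re.compile(r"<(%s)#([0-9]+)>" % "|".join(map(re.escape, ALLOWED_NAMES)))


def find_tags(comment):
    """Return (name, number) pairs for every tag, in order of appearance."""
    return [(name, int(number)) for name, number in TAG.findall(comment)]


print(find_tags("Hello <foo#56> world!"))  # [('foo', 56)]
print(find_tags("1 < 2"))                  # []
```

Text such as "1 < 2" yields no matches because a bare `<` is not followed by an allowed name, `#`, digits, and `>`.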