.net regex - strings that don't contain full stop preceding last `<item>` - Attempt 2

╄→гoц情女王★ posted on 2020-01-25 06:44:35

Question


This question follows from .net regex - strings that don't contain full stop on last list item

The problem is now as described below. Note that the examples have been amended and more have been added; all of them need to be satisfied. Good examples should return no matches, and bad examples should return matches.

I'm trying to use a .NET regex to identify strings in XML data that don't contain a full stop before the last </item> tag. I don't have much experience with regex, and I'm not sure what I need to change, or why, to get the result I'm looking for.

There are line breaks and carriage returns at the end of each line in the data.

A schema is used for the XML. We have no access to the .NET code; we are just users of a custom-built application.

Example 1 of bad XML Data - should give 1 match:

<randlist prefix="unorder">
    <item>abc</item>
    <item>abc</item>
    <item>abc</item>
</randlist>

Example 2 of bad XML Data - should give 1 match:

<randlist prefix="unorder">
    <item>abc. abc</item>
    <item>abc. abc</item>
    <item>abc. abc</item>
</randlist>

Example 1 of good XML Data - regexp should give no matches - full stop preceding last </item>:

<randlist prefix="unorder">
    <item>abc</item>
    <item>abc</item>
    <item>abc.</item>
</randlist>

Example 2 of good XML Data - regexp should give no matches - full stop preceding last </item>:

<randlist prefix="unorder">
    <item>abc. abc</item>
    <item>abc. abc</item>
    <item>abc. abc.</item>
</randlist>

Regex patterns I tried that didn't work for the bad XML data above (they gave either false positives or no matches on https://regex101.com/; not tested on the good XML data):

^<randlist \w*=[\S\s]*\.*[^.]*<\/item>[\n]*<\/randlist>$
^\s+<item>[^<]*?(?<=\.)<\/item>$

Answer 1:


Seeing how you are using .NET, you could:

  1. Load the XML file in an XML Document.
  2. Use the GetElementsByTagName method to get all your item tags within the randlist element.
  3. Get the last element returned by [2].
  4. Check whether its text ends with the period character.

The above should be more readable, and if the structure of the XML changes, you won't have to rewrite half your script.
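
The four steps above can be sketched roughly as follows. Note that this is Python's `xml.dom.minidom`, not the .NET XML classes the answer has in mind; it is used here only because it offers a `getElementsByTagName` method of the same name. The function name and the sample strings are made up for illustration. The check looks at the end of the text rather than anywhere in it, because the question's second bad example contains full stops in the middle of the last item.

```python
from xml.dom import minidom

# Sample data copied from the question (good example 2 and bad example 2).
GOOD = """<randlist prefix="unorder">
    <item>abc. abc</item>
    <item>abc. abc</item>
    <item>abc. abc.</item>
</randlist>"""

BAD = """<randlist prefix="unorder">
    <item>abc. abc</item>
    <item>abc. abc</item>
    <item>abc. abc</item>
</randlist>"""


def last_item_ends_with_full_stop(xml_text):
    """Hypothetical helper: True if the last <item> in the first <randlist> ends with '.'."""
    doc = minidom.parseString(xml_text)                 # step 1: load the XML
    randlist = doc.getElementsByTagName("randlist")[0]
    items = randlist.getElementsByTagName("item")       # step 2: all item tags
    if not items:
        return False
    last = items[-1]                                    # step 3: the last one
    text = "".join(
        node.data for node in last.childNodes
        if node.nodeType == node.TEXT_NODE
    )
    return text.rstrip().endswith(".")                  # step 4: check the full stop


print(last_item_ends_with_full_stop(GOOD))
print(last_item_ends_with_full_stop(BAD))
```

A document with several randlist elements would need a loop over all of them instead of taking index 0.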




Answer 2:


The regexp pattern below works for us - tested in Notepad++

[^.]<\/item>\s{1,2}<\/randlist>
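
As a quick sanity check, the pattern above can be run against the four examples from the question. This sketch uses Python's `re` module as a stand-in, not the .NET engine or Notepad++; the pattern only uses constructs that these engines are expected to treat the same way, but that is an assumption worth verifying in the real application. `make_list` and `count_matches` are hypothetical helpers, and the `\r\n` line endings are assumed from the question's remark about carriage returns.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(r"[^.]<\/item>\s{1,2}<\/randlist>")


def make_list(items, newline="\r\n"):
    """Hypothetical helper: build a randlist block shaped like the question's examples."""
    lines = ['<randlist prefix="unorder">']
    lines += [f"    <item>{text}</item>" for text in items]
    lines.append("</randlist>")
    return newline.join(lines) + newline


def count_matches(xml_text):
    """Number of places where the last item lacks a closing full stop."""
    return len(PATTERN.findall(xml_text))


bad_1 = make_list(["abc", "abc", "abc"])
bad_2 = make_list(["abc. abc", "abc. abc", "abc. abc"])
good_1 = make_list(["abc", "abc", "abc."])
good_2 = make_list(["abc. abc", "abc. abc", "abc. abc."])

for name, data in [("bad 1", bad_1), ("bad 2", bad_2),
                   ("good 1", good_1), ("good 2", good_2)]:
    print(name, count_matches(data))
```

The `\s{1,2}` part allows for either a single line feed or a carriage return plus line feed between the closing tags. It would stop matching if `</randlist>` were indented or separated from the last item by more whitespace.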


Source: https://stackoverflow.com/questions/59858437/net-regex-strings-that-dont-contain-full-stop-preceding-last-item-atte
