So I've got a big text file which looks like the following:
A little after the fact, but in case it's useful to anyone: I was able to follow one of the examples on here (by sdgfsdg) and quickly pick up regular expressions for Notepad++.
I had to similarly pull out some redundant data from a list of HTML select dropdown options, of the form:
<select>
<option value="AC">saint_helena">Ascension Island</option>
<option value="AD">andorra">Andorra</option>
<option value="AE">united_arab_emirates">United Arab Emirates</option>
<option value="AF">afghanistan">Afghanistan</option>
...
</select>
And what I really wanted was:
<select>
<option value="AC">Ascension Island</option>
<option value="AD">Andorra</option>
<option value="AE">United Arab Emirates</option>
<option value="AF">Afghanistan</option>
...
</select>
After some hair-pulling I realized that, as of version 5.8.5 (Sep. 2010), Notepad++'s regular expressions still don't seem to allow certain loops in an expression (unless there is another syntax). For example, the following should match even ">united_arab_emirates"> despite its additional separating underscores:
(">)([a-z]+([_]*[a-z]*)*)(">)
This query worked in most generic regex tools, but within Notepad++ I had to account for the maximum number of underscores in a name (which unfortunately was 8) by hand, using the much uglier:
(">)([a-z]+[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*)(">)
If someone knows a way to simulate a Regex loop in Notepad++'s replace feature, please let me know.
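One way to sidestep the nested loop, offered as a sketch rather than a tested Notepad++ recipe: put the underscore inside the character class, so a single [a-z_]+ covers any number of underscore-separated words. The Python snippet below only illustrates the pattern. The name strip_redundant_key and the sample list are made up for the example, and whether Notepad++ 5.8.5 accepts the same pattern is an assumption.

```python
import re

# Sample lines copied from the dropdown above.
LINES = [
    '<select>',
    '<option value="AC">saint_helena">Ascension Island</option>',
    '<option value="AE">united_arab_emirates">United Arab Emirates</option>',
    '</select>',
]

# One character class replaces the nested loop: lowercase letters and
# underscores, one or more times, sandwiched between two `">` markers.
REDUNDANT_KEY = re.compile(r'(">)[a-z_]+">')


def strip_redundant_key(line):
    """Drop the redundant lowercase key, keeping the first `">`."""
    return REDUNDANT_KEY.sub(r'\1', line)


cleaned = [strip_redundant_key(line) for line in LINES]
for line in cleaned:
    print(line)
```

If the editor's regex engine supports character classes and numbered groups, the matching dialog entries would presumably be Find what: (">)[a-z_]+"> and Replace with: \1.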
In vim
:%s/<option value='.\{1,}' >//
or
:%s/<option value='.\+' >//
In Vim regular expressions you have to escape the one-or-more symbol (\+), capturing parentheses (\( and \)), the opening curly brace of a bounded repetition (\{), and some others.
See :help /magic
for which special characters need to be escaped (and how to change that).
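For comparison, here is how the same substitution might look in a regex flavor where + and {1,} need no backslash. This is a Python sketch, not Vim. The helper strip_prefix and the sample line are illustrative assumptions about the input.

```python
import re

# Assumed sample line, modeled on the pattern the Vim commands target.
SAMPLE = "<option value='1' >A"


def strip_prefix(line, pattern):
    """Remove the first match of `pattern` from `line`."""
    return re.sub(pattern, "", line, count=1)


# Vim: <option value='.\{1,}' >   Python: no backslash before the brace.
with_braces = strip_prefix(SAMPLE, r"<option value='.{1,}' >")

# Vim: <option value='.\+' >      Python: no backslash before the plus.
with_plus = strip_prefix(SAMPLE, r"<option value='.+' >")

print(with_braces, with_plus)
```

Both calls remove the same prefix; count=1 mirrors a Vim :s command without the g flag, which replaces only the first match on each line.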
Vim:
:%s/.* >//
In Notepad++ (with the search mode set to regular expression), starting from:
<option value value='1' >A
<option value value='2' >B
<option value value='3' >C
<option value value='4' >D
Find what: (.*)(>)(.)
Replace with: \3
Replace All, which leaves:
A
B
C
D
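To see why replacing with \3 keeps only the label, the same find/replace can be traced in Python. This is a sketch of the idea, not Notepad++'s own engine; the variable names are illustrative.

```python
import re

# Sample lines copied from the Notepad++ example above.
LINES = [
    "<option value value='1' >A",
    "<option value value='2' >B",
    "<option value value='3' >C",
    "<option value value='4' >D",
]

# Group 1: everything up to the last '>' that is followed by a character
# (greedy). Group 2: that '>'. Group 3: the single character after it.
PATTERN = re.compile(r"(.*)(>)(.)")

# Replacing the whole match with group 3 keeps only the character after '>'.
result = [PATTERN.sub(r"\3", line) for line in LINES]
print(result)  # ['A', 'B', 'C', 'D']
```

Only groups 1 and 2 are discarded, so a label longer than one character would also survive: the characters after group 3 are never part of the match.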
It may help if you're less specific. Your expression there is "greedy", which may be interpreted in different ways by different programs. Try this in Vim:
:%s/^<[^>]\+>//
That removes everything before the A, B, C, etc.
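The difference between a greedy .* and a negated class like [^>] shows up as soon as a line contains more than one >. A small Python sketch, using a hypothetical line with a closing tag that is not in the original file:

```python
import re

# Hypothetical line with a closing tag, to make the difference visible.
LINE = "<option value='1' >A</option>"

# Greedy: '.*' runs to the last '>' on the line, so the label is lost too.
greedy = re.sub(r"<.*>", "", LINE)

# Negated class: '[^>]+' cannot cross a '>', so only the first tag goes.
first_tag_only = re.sub(r"^<[^>]+>", "", LINE)

print(repr(greedy))
print(repr(first_tag_only))
```

On lines with a single tag the two patterns behave the same, which is why either works for the file as described.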
That seems so simple I must be misinterpreting you. It's just
:%s/<.*>//