How can I fix this wiki link parsing regular expression?

Submitted by 放荡痞女 on 2019-12-12 01:56:28

Question


I've got an old wiki that I'm converting to a new wiki which uses Markdown and the [[]] wiki link format. Unfortunately, the old wiki is really old and had many ways of producing links, including CamelCase and single-bracket ([]) wiki links, among others.

I'm converting with regular expressions in sed, and I use the following substitution to convert stand-alone CamelCase links to double-bracket ([[]]) wiki links:

s/([^[|])([A-Z][a-z]+[A-Z][A-Za-z]+)([^]|])/\1\[\[\2\]\]\3/g

Unfortunately, the one problem with the above (in my attempt to not convert CamelCase in existing single-bracket wiki links, since there's a mix of both) is that something like [BluetoothConnection|UsingBluetoothIndex] will get converted to [BluetoothConnection|Using[[BluetoothInde]]x].

How can I resolve this issue so that the match fails outright in that case, instead of backtracking to a shorter match, and no substitution is made? If sed's extended regular expressions turn out to be too limiting, I'm willing to pass the text through perl instead of sed.


Answer 1:


Alright, can you try this:

$ echo "UsingBluetoothIndex" | sed -E 's!([^\[\|]?)([A-Z][a-z]+[A-Z][A-Za-z]+)($|\b|[]|])!\1\[\[\2\]\]\3!g'
Output: [[UsingBluetoothIndex]]

$ echo "[BluetoothConnection|UsingBluetoothIndex]" | sed -E 's!([^\[\|]?)([A-Z][a-z]+[A-Z][A-Za-z]+)($|\b|[]|])!\1\[\[\2\]\]\3!g'
Output: [[[BluetoothConnection]]|[[UsingBluetoothIndex]]]

Update:

Alright, I believe I now have a regex for your problem using perl's negative lookahead assertions. Here it is:

perl -pe 's#(^|\b)((?![|\[])[A-Z][a-z]+[A-Z][A-Za-z]+(?![|\]]))($|\b)#\[\[$2\]\]#g'

echo "BluetoothConnection" | perl -pe 's#(^|\b)((?![|\[])[A-Z][a-z]+[A-Z][A-Za-z]+(?![|\]]))($|\b)#\[\[$2\]\]#g'
Output: [[BluetoothConnection]]

echo "[BluetoothConnection|UsingBluetoothIndex]" | perl -pe 's#(^|\b)((?![|\[])[A-Z][a-z]+[A-Z][A-Za-z]+(?![|\]]))($|\b)#\[\[$2\]\]#g'
Output: [BluetoothConnection|UsingBluetoothIndex]

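The pattern matches a CamelCase word that sits on word boundaries, and the trailing negative lookahead (?![|\]]) rejects the match when the word is immediately followed by | or ]. The leading (?![|\[]) is also a lookahead, so it inspects the word's first letter rather than the character before it and never rejects anything; checking the preceding character would take a lookbehind such as (?<![|\[]). Whatever passes is wrapped in [[ and ]].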



Source: https://stackoverflow.com/questions/5442432/how-can-i-fix-this-wiki-link-parsing-regular-expression
