regexReplace in String Manipulation KNIME

Submitted by 放荡痞女 on 2020-02-24 09:01:41

Question


I'm trying to remove the content of all cells that start with a character that is not a number using KNIME (v3.2.1). I have different ideas but nothing works.

1) String Manipulation Node: regexReplace($column$,"^[^0-9].*","")

The cells contain multiple lines, however only the first line is removed by this approach.

2) String Manipulation Node: regexMatcher($casrn_new$,"^[^0-9].*") followed by Rule Engine Node to remove all columns that are "TRUE".

The regexMatcher gives me "False" even for columns that should be "True" though.

3) String Replacer Node: I inserted the expression ^[^0-9].* into the Pattern column and selected "Replace whole String" but the regex is not recognised by that node so nothing gets replaced.

Does anyone have a solution for any of those approaches or knows another Node that might do the job? Help is much appreciated!


Answer 1:


I would go with your first solution, since it already works for the first line; you just have to expand your regex to include newlines. I would try something like this:

regexReplace($column$,"^[^0-9](.|\n)*","")

This should match any text starting with a character that is not a number, followed by any number of occurrences of any character or a newline. Depending on the line endings, you might need (.|\n|\r) instead of (.|\n).
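
To see the difference outside KNIME, here is a minimal plain-Java sketch, assuming regexReplace behaves like Java's replaceAll (which matches what the question describes). The class name, method name, and sample cell values are made up for illustration.

```java
import java.util.regex.Pattern;

public class NewlineAlternationDemo {
    // Pattern from the question: '.' stops at the first line terminator.
    static final Pattern FIRST_LINE_ONLY = Pattern.compile("^[^0-9].*");
    // Expanded pattern: the group also consumes \n and \r,
    // so the match runs to the end of the cell.
    static final Pattern WHOLE_CELL = Pattern.compile("^[^0-9](.|\\n|\\r)*");

    // Replace every match of the pattern in the cell with an empty string.
    static String clear(Pattern pattern, String cell) {
        return pattern.matcher(cell).replaceAll("");
    }

    public static void main(String[] args) {
        String cell = "n/a\n50-00-0\n64-17-5"; // hypothetical multi-line cell
        // Only "n/a" is removed; the remaining lines survive.
        System.out.println("[" + clear(FIRST_LINE_ONLY, cell) + "]");
        // The whole cell is removed.
        System.out.println("[" + clear(WHOLE_CELL, cell) + "]");
        // A cell that starts with a digit is left untouched.
        System.out.println("[" + clear(WHOLE_CELL, "50-00-0\nn/a") + "]");
    }
}
```

Without the MULTILINE flag, ^ only anchors at the very start of the cell, so a later line that begins with a letter does not trigger a replacement on its own.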




Answer 2:


You should use the following expression:

"(?s)^\D.*$"

The (?s) flag turns on DOTALL mode, so the dot matches newlines as well. (Based on this: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#DOTALL)
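
A small plain-Java sketch of the same expression, in case you want to check it outside KNIME. The class name, method name, and sample values are hypothetical; the doubled backslash is just Java string-literal escaping.

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    // (?s) is the embedded form of Pattern.DOTALL; \D means "any non-digit".
    static final Pattern NON_NUMERIC_CELL = Pattern.compile("(?s)^\\D.*$");

    // Blank the cell if its first character is not a digit, else keep it.
    static String clearIfNotNumeric(String cell) {
        return NON_NUMERIC_CELL.matcher(cell).replaceAll("");
    }

    public static void main(String[] args) {
        // Starts with a letter: the whole multi-line cell is blanked.
        System.out.println("[" + clearIfNotNumeric("n/a\r\n50-00-0") + "]");
        // Starts with a digit: the cell is kept, including its second line.
        System.out.println("[" + clearIfNotNumeric("50-00-0\nn/a") + "]");
    }
}
```

Because DOTALL makes the dot match every character, this works for \n, \r\n, and \r line endings without listing them in the pattern.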

In case you need to only change the content of the cells that do not start with a number, I do not think you need to filter any columns or rows. (BTW in case you want to remove rows, there are the Rule-based Row Filter/Splitter nodes which also support regular expressions with the MATCHES predicate.)
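
If rows should be dropped rather than cells blanked, the same pattern can serve as a predicate. The Java sketch below is only an analogy for what a row filter would do, not KNIME code. It assumes the predicate is a whole-string match, and the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RowFilterSketch {
    // Same expression as above; DOTALL lets it cover multi-line cells.
    static final Pattern STARTS_WITH_NON_DIGIT = Pattern.compile("(?s)^\\D.*$");

    // Keep only the cells that do NOT match, i.e. those starting with a digit
    // (empty cells are kept too, since \D needs at least one character).
    static List<String> keepNumericRows(List<String> cells) {
        return cells.stream()
                .filter(cell -> !STARTS_WITH_NON_DIGIT.matcher(cell).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical column values, one string per row.
        List<String> cells = Arrays.asList("50-00-0", "n/a\n64-17-5", "64-17-5\nn/a");
        // The middle row starts with a letter and is filtered out.
        System.out.println(keepNumericRows(cells).size());
    }
}
```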



Source: https://stackoverflow.com/questions/40003509/regexreplace-in-string-manipulation-knime
