Ruby - Parse a multi-line tab-delimited string into an array of arrays

Submitted by 送分小仙女 on 2019-12-12 03:53:09

Question


My apologies if this has already been asked in a Ruby context. I checked before posting, but to be perfectly honest it has been a very long day, and if I am missing the obvious, I apologize in advance!

I have the following string, which contains a list of software packages installed on a system, and for some reason I am having the hardest time parsing it. I know there has got to be a straightforward way of doing this in Ruby, but I keep coming up short.

I would like to parse the multi-line, tab-delimited string below into an array of arrays, which I can then loop through with each_with_index to spit out the HTML in my Rails app.

str = 'Product and/or Software Full Name 5242     [version 6.5.24]     [Installed on: 12/31/2015]

 Product and/or Software Full Name 5426     [version 22.4]     [Installed on: 06/11/2013]

 Product and/or Software Full Name 2451     [version 1.63]     [Installed on: 12/17/2015]

 Product and/or Software Full Name 5225     [version 43.22.51]     [Installed on: 11/15/2011]

 Product and/or Software Full Name 2420     [version 43.51-r2]     [Installed on: 12/31/2015]'

The end result would be an array of 5 arrays, one per line, like so:

[["Product and/or Software Full Name 5242", "version 6.5.24", "Installed on: 12/31/2015"], ["Product and/or Software Full Name 5426", "version 22.4", "Installed on: 06/11/2013"], ["Product and/or Software Full Name 2451", "version 1.63", "Installed on: 12/17/2015"]]

Note: only 3 of the 5 arrays are shown for brevity.

I would prefer to strip the brackets from both the 'version' and 'Installed on' fields, but I can do that separately with gsub if it can't easily be baked into an answer.

One last thing: not every line in the multi-line string will have an 'Installed on' entry, so the answer needs to handle that case.
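For the HTML part mentioned above, here is a rough sketch of the each_with_index loop, assuming the rows have already been parsed into [name, version, installed_on] arrays where installed_on may be nil. The sample rows and the "(not recorded)" placeholder are made up for illustration. It uses ERB::Util.html_escape from Ruby's standard erb library; in an actual Rails view you would more likely write the loop in an ERB template and let Rails escape the values.

```ruby
require 'erb'

# Made-up parsed rows: [name, version, installed_on]; installed_on may be nil.
rows = [
  ["Product and/or Software Full Name 5242", "version 6.5.24", "Installed on: 12/31/2015"],
  ["Product and/or Software Full Name 2420", "version 43.51-r2", nil]
]

lines = ["<table>"]
rows.each_with_index do |(name, version, installed), i|
  # Fall back to a placeholder when there is no "Installed on" entry.
  cells = [i + 1, name, version, installed || "(not recorded)"]
  # Escape each value before interpolating it into the markup.
  tds = cells.map { |v| "<td>#{ERB::Util.html_escape(v.to_s)}</td>" }.join
  lines << "  <tr>#{tds}</tr>"
end
lines << "</table>"

html = lines.join("\n")
puts html
```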


Answer 1:


This ought to do:

expr = /(.+?)\s+\[([^\]]+)\](?:\s+\[([^\]]+)\])?/
str.scan(expr)

The expression is actually a lot less complex than it looks. It looks complex because we're matching square brackets, which have to be escaped, and also using character classes, which are themselves enclosed in square brackets in regular expression syntax. Altogether it adds a lot of noise.

Here it is split up:

expr = /
  (.+?)  # Capture #1: Any characters (non-greedy)

  \s+    # Whitespace
  \[     # Literal '['
    (      # Capture #2:
      [^\]]+   # One or more characters that aren't ']'
    )
  \]     # Literal ']'

  (?:    # Non-capturing group
    \s+    # Whitespace
    \[     # Literal '['
      ([^\]]+) # Capture #3 (same as #2)
    \]     # Literal ']'
  )?     # Preceding group is optional
/x

As you can see, the third part is identical to the second part, except it's in a non-capture group followed by a ? to make it optional.
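To see what the capture groups produce, here is a quick check on a two-line sample (made up for illustration; the second line has no 'Installed on' part). Each match becomes one three-element array, the brackets are already excluded by the captures, and the optional third capture comes back as nil when it doesn't match:

```ruby
# Made-up two-line sample in the question's format; the second line
# has no "[Installed on: ...]" part.
str = "Product and/or Software Full Name 5242     [version 6.5.24]     [Installed on: 12/31/2015]\n" \
      "Product and/or Software Full Name 2420     [version 43.51-r2]\n"

expr = /(.+?)\s+\[([^\]]+)\](?:\s+\[([^\]]+)\])?/

# String#scan returns one array of captures per match.
rows = str.scan(expr)
rows.each { |row| p row }
# ["Product and/or Software Full Name 5242", "version 6.5.24", "Installed on: 12/31/2015"]
# ["Product and/or Software Full Name 2420", "version 43.51-r2", nil]
```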

It's worth noting that this may fail if, e.g., the product name contains square brackets. If that's a possibility, one potential solution is to include the 'version' and 'Installed' text in the match, e.g.:

expr = /(.+?)\s+\[(version [^\]]+)\](?:\s+\[(Installed [^\]]+)\])?/
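Here is a small illustration of that failure mode, using a made-up product name that contains brackets. The first pattern stops at '[Pro]' and produces two bogus matches, while the keyword-anchored version keeps the full name:

```ruby
# Made-up product name containing square brackets.
line = "Widget [Pro] Edition 7     [version 2.0]     [Installed on: 01/02/2016]"

loose  = /(.+?)\s+\[([^\]]+)\](?:\s+\[([^\]]+)\])?/
strict = /(.+?)\s+\[(version [^\]]+)\](?:\s+\[(Installed [^\]]+)\])?/

p line.scan(loose)
# [["Widget", "Pro", nil], [" Edition 7", "version 2.0", "Installed on: 01/02/2016"]]
p line.scan(strict)
# [["Widget [Pro] Edition 7", "version 2.0", "Installed on: 01/02/2016"]]
```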

P.S. Here's a solution that uses String#split instead:

expr = /\]?\s+\[|\]$/
res = str.each_line.map {|ln| ln.strip.split(expr) }
        .reject {|arr| arr.empty? }
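A quick check of the split version on a made-up sample that includes a blank line and a line without an 'Installed on' part. Unlike scan, split simply returns a shorter array (no nil) when the last field is missing, and the blank line becomes an empty array that reject removes:

```ruby
# Made-up sample: a full line, a blank line, and a line without "Installed on".
str = "Product and/or Software Full Name 5242     [version 6.5.24]     [Installed on: 12/31/2015]\n" \
      "\n" \
      "Product and/or Software Full Name 2420     [version 43.51-r2]\n"

# Split on the first "   [", on "]   [" between fields, and on the final "]".
expr = /\]?\s+\[|\]$/
res = str.each_line.map { |ln| ln.strip.split(expr) }
         .reject { |arr| arr.empty? }

res.each { |arr| p arr }
# ["Product and/or Software Full Name 5242", "version 6.5.24", "Installed on: 12/31/2015"]
# ["Product and/or Software Full Name 2420", "version 43.51-r2"]
```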

If you have brackets in your product names, a possible workaround here is to specify a minimum number of spaces between parts, e.g.:

expr = /\]?\s{3,}\[|\]$/

...which of course depends on product names never containing three or more consecutive spaces.
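To illustrate the trade-off, here is the same made-up bracketed name run through both split patterns. With \s+ the single space before '[Pro]' counts as a separator, while \s{3,} only splits on the wider gaps:

```ruby
# Made-up product name containing square brackets.
line = "Widget [Pro] Edition 7     [version 2.0]     [Installed on: 01/02/2016]"

p line.split(/\]?\s+\[|\]$/)
# ["Widget", "Pro] Edition 7", "version 2.0", "Installed on: 01/02/2016"]
p line.split(/\]?\s{3,}\[|\]$/)
# ["Widget [Pro] Edition 7", "version 2.0", "Installed on: 01/02/2016"]
```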



Source: https://stackoverflow.com/questions/35713260/ruby-parse-a-multi-line-tab-delimited-string-into-an-array-of-arrays
