I\'m trying to parse a table but I don\'t know how to save the data from it. I want to save the data in each row row to look like:
[\'Raw name 1\', 2,094, 0,017,
The key of the problem is that calling #text
on multiple results will return the concatenation of the #text
of each individual element.
Lets examine what each step does:
# Finds all s with class open
# I'm assuming you have only one so
# you don't actually have to loop through
# all tables, instead you can just operate
# on the first one. If that is not the case,
# you can use a loop the way you did
tables = doc.css('table.open')
# The text of all s in one in the table
title = table.css('tr[1] > th').text
# The text of all s in all s in the table
# You obviously wanted just the s in one
cell_data = table.css('tr > td').text
# The text of all s in all s in the table
# You obviously wanted just the s in one
raw_name = table.css('tr > th').text
Now that we know what is wrong, here is a possible solution:
html = <
Table name
Column name 1
Column name 2
Column name 3
Column name 4
Column name 5
Raw name 1
1001
1002
1003
1004
1005
Raw name 2
2001
2002
2003
2004
2005
Raw name 3
3001
3002
3003
3004
3005
EOT
doc = Nokogiri::HTML(html, nil, 'UTF-8')
# Fetches only the first . If you have
# more than one, you can loop the way you
# originally did.
table = doc.css('table.open').first
# Fetches all rows (s)
rows = table.css('tr')
# The column names are the first row (shift returns
# the first element and removes it from the array).
# On that row we get the text of each individual
# This will be Table name, Column name 1, Column name 2...
column_names = rows.shift.css('th').map(&:text)
# On each of the remaining rows
text_all_rows = rows.map do |row|
# We get the name ( )
# On the first row this will be Raw name 1
# on the second - Raw name 2, etc.
row_name = row.css('th').text
# We get the text of each individual value ( )
# On the first row this will be 1001, 1002, 1003...
# on the second - 2001, 2002, 2003... etc
row_values = row.css('td').map(&:text)
# We map the name, followed by all the values
[row_name, *row_values]
end
p column_names # => ["Table name", "Column name 1", "Column name 2",
# "Column name 3", "Column name 4", "Column name 5"]
p text_all_rows # => [["Raw name 1", "1001", "1002", "1003", "1004", "1005"],
# ["Raw name 2", "2001", "2002", "2003", "2004", "2005"],
# ["Raw name 3", "3001", "3002", "3003", "3004", "3005"]]
# If you want to combine them
text_all_rows.each do |row_as_text|
p column_names.zip(row_as_text).to_h
end # =>
# {"Table name"=>"Raw name 1", "Column name 1"=>"1001", "Column name 2"=>"1002", "Column name 3"=>"1003", "Column name 4"=>"1004", "Column name 5"=>"1005"}
# {"Table name"=>"Raw name 2", "Column name 1"=>"2001", "Column name 2"=>"2002", "Column name 3"=>"2003", "Column name 4"=>"2004", "Column name 5"=>"2005"}
# {"Table name"=>"Raw name 3", "Column name 1"=>"3001", "Column name 2"=>"3002", "Column name 3"=>"3003", "Column name 4"=>"3004", "Column name 5"=>"3005"}
- 热议问题