Nokogiri: Select content between element A and B

误落风尘 2020-12-15 01:40

What's the smartest way to have Nokogiri select all content between a start element and a stop element (including the start and stop elements themselves)?

Check example code below to unde

3 Answers
  • 2020-12-15 02:39
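    require 'nokogiri'

    # monkeypatches for Nokogiri::NodeSet
    # note: versions of these functions will be in Nokogiri 1.3;
    # the method_defined? guards skip the patch when they already exist
    class Nokogiri::XML::NodeSet
      unless method_defined?(:index)
        def index(node)
          each_with_index { |member, j| return j if member == node }
          nil # not found (each_with_index would otherwise return the set itself)
        end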
      end
    
      unless method_defined?(:slice)
        def slice(start, length)
          new_set = Nokogiri::XML::NodeSet.new(self.document)
          length.times { |offset| new_set << self[start + offset] }
          new_set
        end
      end
    end
    
    #
    #  solution #1: picking elements out of node children
    #  NOTE that this will also include whitespace-only text nodes between the <p> elements.
    #  (parent is assumed to be the common parent node of @start_element and @end_element)
    #
    possible_matches = parent.children
    start_index = possible_matches.index(@start_element)
    stop_index = possible_matches.index(@end_element)
    answer_1 = possible_matches.slice(start_index, stop_index - start_index + 1)
    
    #
    #  solution #2: picking elements out of a NodeSet
    #  this will only include elements, not text nodes.
    #  (value is assumed to be the parsed document)
    #
    possible_matches = value.xpath("//body/*")
    start_index = possible_matches.index(@start_element)
    stop_index = possible_matches.index(@end_element)
    answer_2 = possible_matches.slice(start_index, stop_index - start_index + 1)
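
The index-and-length arithmetic used in both solutions can be tried out on a plain Ruby Array, which offers the same `index` and `slice(start, length)` calls. This is only a sketch: the `siblings` array and its string values are made-up stand-ins for nodes, and `slice_between` is a hypothetical helper name, not part of Nokogiri.

```ruby
# Hypothetical stand-ins for sibling nodes; any objects comparable with == work.
siblings = %w[para-1 para-3 that para-4 para-5 para-7 para-8]

# Returns the inclusive run from start to stop, or nil if either element
# is missing or stop comes before start.
def slice_between(list, start, stop)
  start_index = list.index(start)
  stop_index  = list.index(stop)
  return nil if start_index.nil? || stop_index.nil? || stop_index < start_index

  # length is stop - start + 1 so that the stop element itself is included
  list.slice(start_index, stop_index - start_index + 1)
end

between = slice_between(siblings, "para-3", "para-7")
```

Checking for a `nil` index before slicing avoids an arithmetic error when one of the two elements is not among the candidates.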
    
  • 2020-12-15 02:41

    For the sake of completeness, an XPath-only solution :)
    It builds the intersection of two sets: the following siblings of the start element and the preceding siblings of the end element.

    In XPath 1.0, the intersection of two node-sets $a and $b can be written as:

      $a[count(.|$b) = count($b)]
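
The logic of this idiom can be mirrored in plain Ruby with `Set`: a node of `$a` is kept only if adding it to `$b` does not change the size of `$b`, which means it was already a member. The sketch below is an analogy rather than XPath itself, and the node names in the two sets are invented for illustration.

```ruby
require "set"

# Hypothetical node names standing in for the two XPath node-sets.
a = Set["p4", "p5", "p6", "p7", "p8"]        # e.g. following siblings of the start
b = Set["p1", "p2", "p3", "p4", "p5", "p6"]  # e.g. preceding siblings of the end

# XPath: $a[count(.|$b) = count($b)]
# Keep a node from a only if the union of that node with b is no larger than b,
# i.e. the node is already contained in b.
intersection = a.select { |node| (Set[node] | b).size == b.size }.to_set
```

The result is the same set that Ruby's own `a & b` would produce; XPath 1.0 has no intersection operator, which is why the `count` comparison is needed there.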
    

    Split into variables for readability:

    @start_element = "//p[@id='para-3']"
    @end_element = "//p[@id='para-7']"
    @set_a = "#@start_element/following-sibling::*"
    @set_b = "#@end_element/preceding-sibling::*"
    
    @my_content = value.xpath("#@set_a[ count(.|#@set_b) = count(#@set_b) ]
                             | #@start_element | #@end_element")
    

    Siblings don't include the element itself, so the start and end elements must be included in the expression separately.

    Edit: Easier solution:

    @start_element = "p[@id='para-3']"
    @end_element = "p[@id='para-7']"
    @my_content = value.xpath("//*[preceding-sibling::#@start_element and
                                   following-sibling::#@end_element]
                             | //#@start_element | //#@end_element")
    
  • 2020-12-15 02:44

    A way-too-smart one-liner that uses recursion:

    def collect_between(first, last)
      first == last ? [first] : [first, *collect_between(first.next, last)]
    end
    

    An iterative solution:

    def collect_between(first, last)
      result = [first]
      until first == last
        first = first.next
        result << first
      end
      result
    end
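
Both versions assume that `last` can be reached from `first` by calling `next` repeatedly. If it cannot, `next` will typically return `nil` once the siblings run out and the following call raises a `NoMethodError`. A defensive variant might look like the sketch below; `FakeNode` and `collect_between_safe` are hypothetical names used only for illustration, with `FakeNode` standing in for any object that responds to `next`.

```ruby
# Hypothetical stand-in for a node: it only needs a #next that returns the
# following sibling, or nil at the end of the chain.
FakeNode = Struct.new(:name, :next)

# Walks from first to last (inclusive). Returns nil instead of raising
# when last is never reached.
def collect_between_safe(first, last)
  result = []
  node = first
  while node
    result << node
    return result if node == last
    node = node.next
  end
  nil
end

c = FakeNode.new("c", nil)
b = FakeNode.new("b", c)
a = FakeNode.new("a", b)

names = collect_between_safe(a, b).map(&:name)
```

Returning `nil` for an unreachable end element lets the caller tell "nothing between" apart from "elements given in the wrong order".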
    

    EDIT: (Short) explanation of the asterisk

    It's called the splat operator. It "unrolls" an array:

    array = [3, 2, 1]
    [4, array]  # => [4, [3, 2, 1]]
    [4, *array] # => [4, 3, 2, 1]
    
    some_method(array)  # => some_method([3, 2, 1])
    some_method(*array) # => some_method(3, 2, 1)
    
    def other_method(*array); array; end
    other_method(1, 2, 3) # => [1, 2, 3] 
    