Parsing street addresses in Ruby

萌比男神i 2021-02-09 18:08

I am parsing addresses into their respective fields for the database. I can extract the house number and the street type, but I am trying to determine the best method to get the

5 answers
  •  余生分开走
    2021-02-09 18:36

    I'd recommend using a library for this if possible, since address parsing can be difficult. Check out the Indirizzo Ruby gem, which makes this easy:

    require 'Indirizzo'
    address = Indirizzo::Address.new("7707 Foo Bar Blvd")
    address.number
     => "7707"
    address.street
     => ["foo bar blvd", "foo bar boulevard"] 
    

    Even if you don't use the Indirizzo library itself, reading through its source code is a useful way to see how it solves the problem. For instance, it has finely tuned regular expressions to match different parts of an address:

    Match = {
      # FIXME: shouldn't have to anchor :number and :zip at start/end
      :number   => /^(\d+\W|[a-z]+)?(\d+)([a-z]?)\b/io,
      :street   => /(?:\b(?:\d+\w*|[a-z'-]+)\s*)+/io,
      :city     => /(?:\b[a-z][a-z'-]+\s*)+/io,
      :state    => State.regexp,
      :zip      => /\b(\d{5})(?:-(\d{4}))?\b/o,
      :at       => /\s(at|@|and|&)\s/io,
      :po_box => /\b[P|p]*(OST|ost)*\.*\s*[O|o|0]*(ffice|FFICE)*\.*\s*[B|b][O|o|0][X|x]\b/
    }
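
    To see what two of these patterns capture, here is a minimal gem-free sketch. It copies the :number and :zip expressions from the hash above (dropping the /o flag, which is not needed here). The constant names, the split_number helper, and the sample strings are illustrative assumptions, not part of Indirizzo:

    ```ruby
    # Copies of the :number and :zip patterns quoted above, minus the /o flag.
    NUMBER = /^(\d+\W|[a-z]+)?(\d+)([a-z]?)\b/i
    ZIP    = /\b(\d{5})(?:-(\d{4}))?\b/

    # Hypothetical helper: returns [house_number, remainder].
    # house_number is nil when the pattern does not match.
    def split_number(text)
      m = NUMBER.match(text)
      return [nil, text] unless m
      # The second capture group holds the digits of the house number.
      [m[2], m.post_match.strip]
    end

    number, rest = split_number("7707 Foo Bar Blvd")
    puts number
    puts rest

    zip = ZIP.match("Springfield, IL 62704-1234")
    puts zip[1] if zip
    ```

    This only shows the patterns in isolation. The library combines them with state and suffix lookups, so treat it as a way to experiment rather than a replacement.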
    

    These files from its source code can give more specifics:

    • https://github.com/daveworth/Indirizzo/blob/master/lib/indirizzo/address.rb
    • https://github.com/daveworth/Indirizzo/blob/master/lib/indirizzo/constants.rb
    • https://github.com/daveworth/Indirizzo/blob/master/lib/indirizzo/numbers.rb

    (But I would also generally agree with @drhenner's comment that, in order to make this easier on yourself, you could probably just accept these data inputs in separate fields.)

    Edit: To give a more specific answer about how to remove the street suffix (e.g., "Blvd"), you could use Indirizzo's regular expression constants (such as Suffix_Type from constants.rb) like so:

    address = Indirizzo::Address.new("7707 Foo Bar Blvd", :expand_streets => false)
    address.street.map {|street| street.gsub(Indirizzo::Suffix_Type.regexp, '').strip }
     => ["foo bar"]
    

    (Notice I also passed :expand_streets => false to the initializer, to avoid having both "Blvd" and "Boulevard" alternatives expanded, since we're discarding the suffix anyway.)
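
    If you cannot add the gem, the same idea can be sketched in plain Ruby. The suffix list below is a deliberately tiny, hypothetical stand-in for Indirizzo's Suffix_Type, and strip_suffix is an illustrative name, so extend the list before relying on it:

    ```ruby
    # Hypothetical, incomplete suffix list (an assumption for illustration only).
    SUFFIXES = %w[blvd boulevard st street ave avenue rd road dr drive ln lane].freeze

    # Match one suffix, with an optional trailing period, at the end of the string.
    # .source is used so the outer /i flag applies to the interpolated alternatives.
    SUFFIX_RE = /\s+(?:#{Regexp.union(SUFFIXES).source})\.?\z/i

    # Removes a trailing street suffix; returns the input (stripped) if none is found.
    def strip_suffix(street)
      street.strip.sub(SUFFIX_RE, '')
    end

    puts strip_suffix("foo bar blvd")
    puts strip_suffix("Main Street")
    ```

    Because the pattern requires whitespace before the suffix, a street whose whole name is a suffix word (for example "Boulevard") is left unchanged.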
