Rails friendly_id with arabic slug

问题

My question is closely related to this one Rails friendly id with non-Latin characters. Following the suggested answer there, I implemented a little bit different solution (I know, it's primitive, but I just want to make sure it works before adding complex behavior).

In my user model I have:

extend FriendlyId
friendly_id :slug_candidates, :use => [:slugged]

def slug_candidates
  [ 
    [:first_name, :last_name],
    [:first_name, :last_name, :uid]
    ]
end

def should_generate_new_friendly_id?
  first_name_changed? || last_name_changed? || uid_changed? || super
end

def normalize_friendly_id(value)
  ERB::Util.url_encode(value.to_s.gsub("\s","-"))
end

now when I submit "مرحبا" as :first_name through the browser, slug value is set to "%D9%85%D8%B1%D8%AD%D8%A8%D8%A7-" in the database, which is what I expect (apart from the trailing "-").

However the url shown in the browser looks like this: http://localhost:3000/en/users/%25D9%2585%25D8%25B1%25D8%25AD%25D8%25A8%25D8%25A7- , which is not what I want. Does anyone know where these extra %25s are coming from and why?

回答1:

Meanwhile, I came a bit further, so I put my solution here maybe it could be helpful for someone else. The 25s in the url seem to be the result of url_encoding the '%' in my slug. I don't know where this happens, but I modified my normalize_friendly_id function, so that it doesn't affect me anymore. Here it is:

def normalize_friendly_id(value)
  sep = '-'
  #strip out tashkeel etc...
  parameterized_string = value.to_s.gsub(/[\u0610-\u061A\u064B-\u065F\u06D6-\u06DC\u06DF-\u06E8\u06EA-\u06ED]/,''.freeze)
  # Turn unwanted chars into the separator
  parameterized_string.gsub!(/[^0-9A-Za-zÀ-ÖØ-öø-ÿ\u0620-\u064A\u0660-\u0669\u0671-\u06D3\u06F0-\u06F9\u0751-\u077F]+/,sep)
  unless sep.nil? || sep.empty?
    re_sep = Regexp.escape(sep)
    # No more than one of the separator in a row.
    parameterized_string.gsub!(/#{re_sep}{2,}/, sep)
    # Remove leading/trailing separator.
    parameterized_string.gsub!(/^#{re_sep}|#{re_sep}$/, ''.freeze)
  end
  parameterized_string.downcase
end

Some comments on that:

I took only Latin and Arabic alphabets into account
I decided that if I allowed arabic characters in the url, then there is no sense to keep the friendly_id behavior of converting e.g. "ü" to "ue", "ö" to "oe", etc. So I leave such characters in the url.
I tried also to keep characters which might not be used in Arabic, but in other languages which use the Arabic alphabet such as Farsi or Urdu. I speak Arabic only, so I did a guess of which characters might be regarded as regular in other languages. For example is "ڿ" a regular character in any language? I have no idea, but I guess it could well be.
again, since I speak arabic, I stripped the "Tashkil" out of the text. I would say, that texts without tashkil are in general easier to read than the ones with. However, I don't know if I should take care of some similar stuff in other languages. Any hints are much appreciated.
Last: adding another alphabet would be as easy as adding the appropriate sequences to the regex. One only needs to know which characters should be white-listed.

I appreciate any comments or improvement suggestions.

来源：https://stackoverflow.com/questions/43482523/rails-friendly-id-with-arabic-slug

标签

ruby-on-rails

url

arabic

friendly-id