问题
My question is closely related to this one Rails friendly id with non-Latin characters. Following the suggested answer there, I implemented a little bit different solution (I know, it's primitive, but I just want to make sure it works before adding complex behavior).
In my user model I have:
extend FriendlyId
friendly_id :slug_candidates, :use => [:slugged]
def slug_candidates
[
[:first_name, :last_name],
[:first_name, :last_name, :uid]
]
end
def should_generate_new_friendly_id?
first_name_changed? || last_name_changed? || uid_changed? || super
end
def normalize_friendly_id(value)
ERB::Util.url_encode(value.to_s.gsub("\s","-"))
end
now when I submit "مرحبا" as :first_name through the browser, slug value is set to "%D9%85%D8%B1%D8%AD%D8%A8%D8%A7-" in the database, which is what I expect (apart from the trailing "-").
However the url shown in the browser looks like this: http://localhost:3000/en/users/%25D9%2585%25D8%25B1%25D8%25AD%25D8%25A8%25D8%25A7- , which is not what I want. Does anyone know where these extra %25s are coming from and why?
回答1:
Meanwhile, I came a bit further, so I put my solution here maybe it could be helpful for someone else. The 25s in the url seem to be the result of url_encoding the '%' in my slug. I don't know where this happens, but I modified my normalize_friendly_id function, so that it doesn't affect me anymore. Here it is:
def normalize_friendly_id(value)
sep = '-'
#strip out tashkeel etc...
parameterized_string = value.to_s.gsub(/[\u0610-\u061A\u064B-\u065F\u06D6-\u06DC\u06DF-\u06E8\u06EA-\u06ED]/,''.freeze)
# Turn unwanted chars into the separator
parameterized_string.gsub!(/[^0-9A-Za-zÀ-ÖØ-öø-ÿ\u0620-\u064A\u0660-\u0669\u0671-\u06D3\u06F0-\u06F9\u0751-\u077F]+/,sep)
unless sep.nil? || sep.empty?
re_sep = Regexp.escape(sep)
# No more than one of the separator in a row.
parameterized_string.gsub!(/#{re_sep}{2,}/, sep)
# Remove leading/trailing separator.
parameterized_string.gsub!(/^#{re_sep}|#{re_sep}$/, ''.freeze)
end
parameterized_string.downcase
end
Some comments on that:
- I took only Latin and Arabic alphabets into account
- I decided that if I allowed arabic characters in the url, then there is no sense to keep the friendly_id behavior of converting e.g. "ü" to "ue", "ö" to "oe", etc. So I leave such characters in the url.
- I tried also to keep characters which might not be used in Arabic, but in other languages which use the Arabic alphabet such as Farsi or Urdu. I speak Arabic only, so I did a guess of which characters might be regarded as regular in other languages. For example is "ڿ" a regular character in any language? I have no idea, but I guess it could well be.
- again, since I speak arabic, I stripped the "Tashkil" out of the text. I would say, that texts without tashkil are in general easier to read than the ones with. However, I don't know if I should take care of some similar stuff in other languages. Any hints are much appreciated.
- Last: adding another alphabet would be as easy as adding the appropriate sequences to the regex. One only needs to know which characters should be white-listed.
I appreciate any comments or improvement suggestions.
来源:https://stackoverflow.com/questions/43482523/rails-friendly-id-with-arabic-slug