How to get rid of non-ASCII characters in Ruby

遥遥无期 2020-11-30 18:55

I have a Ruby CGI script (not Rails) that picks up photos and captions from a web form. My users are very keen on using smart quotes and ligatures, because they are pasting from other sources.

7 answers
  • 2020-11-30 19:48

    With a bit of help from @masakielastic, I have solved this problem for my personal purposes using the String#chars method.

    The trick is to handle each character in its own separate block, so that Ruby can fail on one character at a time.

    Ruby needs to fail when it confronts binary data and the like. If you don't allow Ruby to go ahead and fail, it's a tough road when it comes to this stuff. So I use the String#chars method to break the given string into an array of characters. Then I pass each character into a sanitizing method that allows the code to have "microfailures" (my coinage) within the string.

    So, given a "dirty" string, let's say one you got by calling File#read on a picture (my case):

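    # Defined before it is called: true for ASCII letters, digits,
    # and a small set of punctuation marks; false otherwise.
    def num_or_letter?(char)
      if char =~ /[a-zA-Z0-9]/
        true
      elsif char =~ Regexp.union(" ", ".", "?", "-", "+", "/", ",", "(", ")")
        true
      else
        false
      end
    end

    dirty = File.read(filepath) # reads the whole file and closes it
    clean_chars = dirty.chars.select do |c|
      begin
        num_or_letter?(c)
      rescue ArgumentError
        # The regex match raises on an invalid byte sequence; drop that character.
        false
      end
    end
    clean = clean_chars.join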
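
    To try the same per-character idea without a file on disk, here is a minimal sketch that runs on an in-memory string. The names ALLOWED, allowed_char? and sanitize are hypothetical (they are not part of the answer above), and the sample string is made up: it mixes an accented letter, smart quotes, and one invalid byte.

```ruby
# Whitelist of single characters to keep; it mirrors the set used in the
# answer above. ALLOWED, allowed_char? and sanitize are hypothetical names.
ALLOWED = /\A[a-zA-Z0-9 .?\-+\/,()]\z/

def allowed_char?(char)
  char.match?(ALLOWED)
end

# Keeps only whitelisted characters. If matching a character raises
# (for example on an invalid byte sequence), that character is dropped.
def sanitize(dirty)
  kept = dirty.chars.select do |c|
    begin
      allowed_char?(c)
    rescue ArgumentError
      false
    end
  end
  kept.join
end

# Made-up sample: "Café", smart quotes around "ok", a stray 0xFF byte, "(1+2)".
sample = "Caf\u00E9 \u201Cok\u201D" + "\xFF" + "(1+2)"
puts sanitize(sample) # prints: Caf ok(1+2)
```

    Note that this drops accented letters along with the smart quotes, so "Café" becomes "Caf". If that is too aggressive for captions, the whitelist would need to grow.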
    