How to get rid of non-ASCII characters in Ruby

遥遥无期 2020-11-30 18:55

I have a Ruby CGI script (not Rails) that picks up photos and captions from a web form. My users are very keen on using smart quotes and ligatures, because they paste from other sources.

7 answers
  • 2020-11-30 19:32

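    Here's my suggestion using Iconv. (Iconv was removed from the standard library in Ruby 2.0; on newer Rubies it is available as the iconv gem, or see the String#encode answer.)

    # Part of the standard library up to Ruby 1.9; install the 'iconv' gem on 2.0+.
    require 'iconv'

    class String
      # The //IGNORE suffix tells iconv to silently drop characters
      # that cannot be represented in ASCII.
      def remove_non_ascii
        Iconv.conv('ASCII//IGNORE', 'UTF-8', self)
      end
    end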
    
  • 2020-11-30 19:37
    
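    class String
      # Replace each non-ASCII character (anything above U+007F) with
      # +replacement+. The originally posted class [\u0080-\u00ff] covered
      # only Latin-1, so smart quotes and ligatures slipped through.
      def remove_non_ascii(replacement = "")
        gsub(/[^\x00-\x7F]/, replacement)
      end
    end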
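The range \u0080-\u00ff is only the Latin-1 Supplement block, while smart quotes (U+201C, U+201D) and ligatures such as "ﬁ" (U+FB01) live far above it. A small check, using a made-up caption, contrasts that class with the negated ASCII class [^\x00-\x7F]:

```ruby
# Smart quotes (U+201C, U+201D), the "fi" ligature (U+FB01) and an
# accented Latin-1 letter (U+00E9), as a user might paste them.
caption = "\u201C\uFB01le caf\u00E9\u201D"

# Latin-1-only class: removes just the e-acute; quotes and ligature survive.
latin1_only = caption.gsub(/[\u0080-\u00ff]/, "")

# Negated ASCII class: removes every code point above U+007F.
ascii_only = caption.gsub(/[^\x00-\x7F]/, "")

p latin1_only
p ascii_only # => "le caf"
```

Only the second result is pure ASCII, which is what the question asks for.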
    
  • 2020-11-30 19:37

    No, there isn't, short of removing all characters besides the basic ones (as recommended above). The best solution would be to handle these names properly, since most filesystems today have no problems with Unicode names. If your users paste in ligatures, they sure as hell will want to get them back too. If the filesystem is your problem, abstract it away and set the filename to an MD5 hash. This also lets you easily shard uploads into buckets, which scan very quickly because none of them ever holds too many entries.
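A minimal sketch of the MD5 naming and sharding idea. The answer does not say what to hash, so hashing the file contents is an assumption here, as are the hypothetical helper name storage_path_for and the "uploads" root; the user's original Unicode filename or caption would be kept separately (for example in a database row) and shown back to them.

```ruby
require 'digest'

# Hypothetical helper (not from the answer): name the stored file after the
# MD5 of its contents, and shard by the first two hex digits so there are at
# most 256 bucket directories, none of which grows too large.
def storage_path_for(contents, root = "uploads")
  digest = Digest::MD5.hexdigest(contents)
  File.join(root, digest[0, 2], digest)
end

puts storage_path_for("raw photo bytes would go here")
```

Because the stored name is pure hex, the filesystem never sees the smart quotes or ligatures at all.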

  • 2020-11-30 19:39

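    A quick Google search turned up a discussion that suggests the following method. As originally posted it only worked on Ruby 1.8, where b[0] returned a byte value. On Ruby 1.9+, b[0] is a one-character string whose to_i is at most 9, so every character failed the test and was replaced. A version using ord:

    class String
      # Replace every character outside printable ASCII (space, 32, through
      # '~', 126) with +replacement+ and return a new string instead of
      # emptying and refilling the receiver. The originally posted range
      # 33..127 also replaced spaces and kept DEL (127).
      def remove_nonascii(replacement = "")
        each_char.map { |c| (32..126).cover?(c.ord) ? c : replacement }.join
      end
    end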
    
  • 2020-11-30 19:45

    Use String#encode

    The official way to convert between string encodings as of Ruby 1.9 is to use String#encode.

    To simply remove non-ASCII characters, you could do this:

    some_ascii   = "abc"
    some_unicode = "áëëçüñżλφθΩ
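The example above is cut off in this copy of the answer. As a minimal sketch of the same String#encode approach (the helper name strip_non_ascii and the exact option set are choices made here, not recovered from the truncated answer), transcode to US-ASCII and replace anything undefined or invalid with an empty string:

```ruby
# Sketch: transcode to US-ASCII. Characters with no ASCII equivalent
# (:undef) and malformed bytes in the source (:invalid) are replaced
# with "", i.e. dropped.
def strip_non_ascii(str)
  str.encode(Encoding::US_ASCII, invalid: :replace, undef: :replace, replace: "")
end

puts strip_non_ascii("abc \u00E1\u00EB\u03BB \u201Cquoted\u201D")
```

Note that this drops characters outright; it does not transliterate smart quotes into straight quotes.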
  • 2020-11-30 19:47
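
    class String
      # Remove ASCII control characters (codes 0 to 31, plus DEL, 127);
      # printable ASCII and all non-ASCII characters are kept.
      def strip_control_characters
        chars.reject { |char| char.ascii_only? && (char.ord < 32 || char.ord == 127) }.join
      end
    end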
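The method in this answer removes only ASCII control characters; accented letters and smart quotes pass through untouched. If the goal is to keep only printable ASCII, a single negated character class handles both control and non-ASCII characters. The helper name printable_ascii is hypothetical, and note that it also drops tabs and newlines, which may not be wanted in multi-line captions.

```ruby
# Hypothetical helper: keep only printable ASCII (space 0x20 through
# tilde 0x7E). Control characters (including tab and newline) and every
# non-ASCII character are removed.
def printable_ascii(str)
  str.gsub(/[^\x20-\x7E]/, "")
end

p printable_ascii("\u201Ccaf\u00E9\u201D\tmenu\u0007\n") # => "cafmenu"
```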
    