I'm practicing with Ruby and regex to delete certain unwanted characters. For example:
input = input.gsub(/<\/?[^>]*>/, '')
and fo
First of all, I think it might be easier to define what constitutes "correct input" and remove everything else. For example:
input = input.gsub(/[^0-9A-Za-z]/, '')
If that's not what you want (you want to support non-latin alphabets, etc.), then I think you should make a list of the glyphs you want to remove (like ™ or ☻), and remove them one-by-one, since it's hard to distinguish between a Chinese, Arabic, etc. character and a pictograph programmatically.
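As a rough sketch of the "list of glyphs" approach: the constant `UNWANTED_GLYPHS` and the helper `strip_glyphs` below are made-up names for illustration, and the two glyphs in the list are just the examples mentioned above. You would extend the list with whatever you actually want to drop.

```ruby
# Hypothetical blacklist; the contents are only the examples from the answer.
UNWANTED_GLYPHS = "™☻".freeze

# Hypothetical helper: removes every listed glyph and leaves all other
# characters (including non-Latin alphabets) untouched.
def strip_glyphs(text, glyphs = UNWANTED_GLYPHS)
  # Regexp.union escapes each entry, so regex metacharacters in the list are safe.
  text.gsub(Regexp.union(glyphs.chars), "")
end

puts strip_glyphs("Ruby™ 日本語☻")  # prints "Ruby 日本語"
```

Building the pattern with Regexp.union rather than a hand-written character class means you don't have to worry about escaping entries such as `-` or `]` if they end up in the list.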
Finally, you might want to normalize your input by converting to or from HTML escape sequences.
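One way this normalization could look, assuming the input arrives with HTML-escaped markup and that the standard library's CGI helpers are acceptable for your case (the sample string is invented for the example):

```ruby
require "cgi"

raw = "&lt;b&gt;bold&lt;/b&gt; &amp; plain"

# Decode the escape sequences first, so that escaped tags become real tags...
decoded = CGI.unescapeHTML(raw)            # "<b>bold</b> & plain"

# ...then the tag-stripping regex from the question can see them.
cleaned = decoded.gsub(/<\/?[^>]*>/, "")   # "bold & plain"

puts cleaned
```

Going the other direction, CGI.escapeHTML turns `<`, `>` and `&` back into escape sequences if the cleaned text is going to be embedded in a page.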
An easier way to do this, inspired by Can Berk Güder's answer, is:
In order to delete special characters:
input = input.gsub(/\W/, '')
In order to keep word characters:
input = input.scan(/\w/).join
Either way, input ends up the same (scan returns an array of matches, hence the join). Try it on: http://rubular.com/
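A quick check of the two variants side by side, using a made-up sample string. Note that scan returns an array of matches, so it has to be joined before it can be compared with the gsub result, and that `\w` counts the underscore as a word character:

```ruby
input = "Hello, World! 123_ok"

deleted = input.gsub(/\W/, "")    # drop every non-word character
kept    = input.scan(/\w/).join   # collect the word characters, then join them

puts deleted           # prints "HelloWorld123_ok"
puts deleted == kept   # prints "true"
```

In Ruby 1.9 and later, `\w` matches only ASCII letters, digits and the underscore, so accented or non-Latin letters are removed by both variants.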
You can match all the characters you want, and then join them together, like this:
original = "aøbæcå"
stripped = original.scan(/[a-zA-Z]/).join
puts stripped
which outputs "abc"
If you just wanted ASCII characters, then you can use:
original = "aøbauhrhræoeuacå"
cleaned = ""
original.each_byte { |x| cleaned << x unless x > 127 }
cleaned # => "abauhrhroeuac"
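For comparison, here are two other ways to get the same ASCII-only result without looping over bytes. This is only a sketch of alternatives, not a claim that either is better than the each_byte version; the variable names are invented for the example:

```ruby
original = "aøbauhrhræoeuacå"

# Remove every character outside the 7-bit ASCII range.
via_regex = original.gsub(/[^\x00-\x7F]/, "")

# Transcode to US-ASCII, replacing anything that has no ASCII equivalent with "".
via_encode = original.encode("US-ASCII", invalid: :replace, undef: :replace, replace: "")

puts via_regex    # prints "abauhrhroeuac"
puts via_encode   # prints "abauhrhroeuac"
```

The gsub version keeps the string's original encoding, while the encode version returns a string tagged as US-ASCII, which may matter if you concatenate it with other text later.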
If you are using Rails (or ActiveSupport on its own), you can use parameterize:
'@!#$%^&*()111'.parameterize
=> "111"