How can you strip non-ASCII characters from a string? (in C#)

前端未结

关注

 11  1087

迷失自我

相关标签:

11条回答

爱一瞬间的悲伤

2020-11-22 17:29
I believe MonsCamus meant:
```
parsememo = Regex.Replace(parsememo, @"[^\u0020-\u007E]", string.Empty);
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
陌清茗

2020-11-22 17:33

If you want not to strip, but to actually convert latin accented to non-accented characters, take a look at this question: How do I translate 8bit characters into 7bit characters? (i.e. Ü to U)

0 讨论(0)
发布评论:

提交评论
- 加载中...
鱼传尺愫

2020-11-22 17:33
I came here looking for a solution for extended ascii characters, but couldnt find it. The closest I found is bzlm's solution. But that works only for ASCII Code upto 127(obviously you can replace the encoding type in his code, but i think it was a bit complex to understand. Hence, sharing this version). Here's a solution that works for extended ASCII codes i.e. upto 255 which is the ISO 8859-1

It finds and strips out non-ascii characters(greater than 255)
```
Dim str1 as String= "â, ??î or ôu                                                                    
                                                        
            
```
0 讨论(0) 发布评论: 提交评论加载中...


          	          
            
           
            
                              
                
              
              
                
                  遇见更好的自我        
                
              
                            
                2020-11-22 17:34
              
            
            
                                                                       
This is not optimal performance-wise, but a pretty straight-forward Linq approach:

string strippedString = new string(
    yourString.Where(c => c <= sbyte.MaxValue).ToArray()
    );


The downside is that all the "surviving" characters are first put into an array of type char[] which is then thrown away after the string constructor no longer uses it.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  爱一瞬间的悲伤        
                
              
                            
                2020-11-22 17:38
              
            
            
                                                                       
I used this regex expression:

    string s = "søme string";
    Regex regex = new Regex(@"[^a-zA-Z0-9\s]", (RegexOptions)0);
    return regex.Replace(s, "");

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  清歌不尽        
                
              
                            
                2020-11-22 17:40
              
            
            
                                                                       
string s = "søme string";
s = Regex.Replace(s, @"[^\u0000-\u007F]+", string.Empty);

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     1
2
下一页