PHP nested array search

前端未结


I'm new to PHP.

I have an array like this:

$suspiciousList = array(
array ("word" => "badword1", "score" => 400, "type" => 1),
array


                      
5 Answers

                  轻奢々        
                
              
                            
                2021-01-15 17:47
              
            
            
                                                                       
As Jirka Helmich suggested, you could remove whitespace (and maybe other special characters) and then search the string for the words from your array.

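public function searchForBadWords($strippedText) {
     // Assumption: the list from the question is stored on the object as $this->suspiciousList.
     $counts = array();
     foreach ($this->suspiciousList as $suspiciousPart) {
          // Count how often each listed word occurs in the stripped text.
          $counts[$suspiciousPart['word']] = substr_count($strippedText, $suspiciousPart['word']);
          // You can use str_replace here or something, it depends on what you want to achieve.
     }
     return $counts;
}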


The problem is that once you remove the spaces, normal words can turn into bad words: with text like blablabad wordblabla you end up with blablabadwordblabla (know what I mean?) :D

Cheers

Edit: So Ahmad, I see you recognize words by the " " at their beginning/end (in short). Maybe you should try to implement both methods: yours with single words, and the one above with substring searching. It also depends on how much you care about performance. Maybe you should do some research or testing to see how effective it is? :D
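
Trying both methods side by side could look roughly like the sketch below. The function name scoreText and the shape of the returned array are made up for illustration, and the list entries are assumed to follow the word/score layout from the question.

```php
<?php
// Hypothetical helper, not from the original posts: scores a text with both methods.
function scoreText(string $text, array $suspiciousList): array
{
    $lower = mb_strtolower($text, 'UTF-8');
    // Method 1: whole words, split on whitespace.
    $words = preg_split('/\s+/u', $lower, -1, PREG_SPLIT_NO_EMPTY);
    // Method 2: substring search on the text with all whitespace removed.
    $stripped = preg_replace('/\s+/u', '', $lower);

    $result = ['words' => 0, 'substrings' => 0];
    foreach ($suspiciousList as $entry) {
        // Number of tokens that are exactly the listed word.
        $wordHits = count(array_keys($words, $entry['word'], true));
        $result['words'] += $wordHits * $entry['score'];
        // Number of occurrences anywhere in the stripped text.
        $result['substrings'] += substr_count($stripped, $entry['word']) * $entry['score'];
    }
    return $result;
}

$suspiciousList = [
    ['word' => 'badword1', 'score' => 400, 'type' => 1],
];
$scores = scoreText('some b a d w o r d 1 here, plus badword1', $suspiciousList);
```

The word method only catches the plain occurrence, while the substring method also catches the spaced-out one, so comparing the two totals gives a feel for how much each method contributes.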
                                                                        
                                                        
            
            
              
                
                  一生所求        
                
              
                            
                2021-01-15 17:50
              
            
            
                                                                       

1) Strip spaces.
2) Search with ONE regular expression containing all your keywords, like this: (word1|word2|word3)
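
The two steps above could be sketched like this. buildPattern is a hypothetical helper name, and the case-insensitive and Unicode flags are my own assumption about what you'd want.

```php
<?php
// Hypothetical helper: builds one alternation pattern from the keyword list.
function buildPattern(array $words): string
{
    // preg_quote() escapes regex metacharacters inside each keyword.
    $escaped = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words);
    // i = case-insensitive, u = treat pattern and subject as UTF-8 (assumed flags).
    return '/(' . implode('|', $escaped) . ')/iu';
}

$pattern = buildPattern(['word1', 'word2', 'word3']);
// Strip whitespace first, then run the single regular expression.
$stripped = preg_replace('/\s+/u', '', 'w o r d 1 and WORD3');
$count = preg_match_all($pattern, $stripped, $matches);
```

Here $count holds the number of matches and $matches[0] the matched keywords, so the list is scanned in one pass instead of one substr_count() call per word.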

                                                                        
                                                        
            
            
              
                
                  旧巷少年郎        
                
              
                            
                2021-01-15 17:55
              
            
            
                                                                       
Anyway, you can strip whitespace characters and use (mb_)substr_count(), but this leads to false positives.
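
A small illustration of such a false positive, with made-up values ('badword' just stands in for a listed word; mb_substr_count() needs the mbstring extension):

```php
<?php
// Illustrative values only, not from the original question.
$word = 'badword';
$text = 'not bad wordplay at all';

// Before stripping, the harmless text does not contain the listed word.
$before = mb_substr_count($text, $word, 'UTF-8');

// After stripping whitespace, "bad wordplay" becomes "badwordplay".
$stripped = preg_replace('/\s+/u', '', $text);
$after = mb_substr_count($stripped, $word, 'UTF-8');
```

$before is 0 and $after is 1, even though nobody wrote the listed word.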
                                                                        
                                                        
            
            
              
                
                  生来不讨喜        
                
              
                            
                2021-01-15 18:00
              
            
            
                                                                       
@f1ames: I'm using the following code to turn the text into an array.

    $words = mb_strtolower($words, 'UTF-8');
    $words = $this->removeUniCharCategories($words);
    $words = explode(" ", $words);
    // Remove empty elements and reindex the array.
    // Note: array_filter() without a callback also drops the string "0".
    $words = array_values(array_filter($words));


But I'm still looking for the best solution.
                                                                        
                                                        
            
            
              
                
                  北海茫月        
                
              
                            
                2021-01-15 18:08
              
            
            
                                                                       
This question is a good start: How do you implement a good profanity filter? I agree with its conclusion, i.e. the detection will always give poor results.

I would try these approaches:

1) Simply detect words that are vulgar according to your dictionary.

2) Come up with a few heuristics like "continuous sequence of 'words' composed of one letter" (b a d w o r d) and use them to evaluate users' posts. Then you can compute the expected number of vulgar words: \sum_{i=1}^{k} P_i N_i, where k is the number of your heuristics, P_i is the probability that a word found by heuristic i really is a vulgar one, and N_i is the number of words found by heuristic i. I think the probabilistic approach is better than simply stating "this post does (not) contain a vulgar word".

3) Let a moderator decide if a post is really vulgar or not. Otherwise the imperfection of your automatic replacing method will most probably make your users mad.

4) I think it's useless to look up words in an English (or Turkish?) dictionary in order to find words that are not really English words because people misspell words too much these days. 
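
As a rough sketch of approach 2, the expected count could be computed like this. Both function names are hypothetical, the regular expression is just one possible way to detect spelled-out words, and the probabilities are made-up placeholders rather than measured values.

```php
<?php
// Hypothetical heuristic: counts runs of three or more single-letter "words",
// e.g. "b a d w o r d".
function countSpelledOutRuns(string $text): int
{
    return (int) preg_match_all('/(?<!\S)(?:\p{L}\s+){2,}\p{L}(?!\S)/u', $text);
}

// Expected number of vulgar words: sum over all heuristics of P_i * N_i.
function expectedVulgarWords(array $heuristics): float
{
    $expected = 0.0;
    foreach ($heuristics as $h) {
        $expected += $h['p'] * $h['n'];
    }
    return $expected;
}

$text = 'this is b a d w o r d and badword1';
$heuristics = [
    // 'p' values are placeholders, not measured probabilities.
    ['p' => 0.75, 'n' => substr_count($text, 'badword1')], // dictionary hit
    ['p' => 0.5,  'n' => countSpelledOutRuns($text)],      // spelled-out run
];
$expected = expectedVulgarWords($heuristics);
```

A post could then be sent to a moderator once $expected passes some threshold, instead of replacing words automatically.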
                                                                        
                                                        
            
            
              
                