Detecting specific words in a textarea submission

深忆病人 2021-01-26 07:07

I have a new feature on my site where users can submit any text (I stopped all HTML entries) via a textarea. The main problem I still have, though, is that they could type "htt

3 answers
  • 2021-01-26 07:14

    This is a job for Regular Expressions.

    What you need to do is something like this:

    // A list of words you don't allow
    $disallowedWords = array(
      'these',
      'words',
      'are',
      'not',
      'allowed'
    );
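    // Search for disallowed words.
    // \b marks a word boundary, so 'are' matches anywhere in the text
    // (including at the very start or end, or next to punctuation)
    // but not inside 'care' or 'stare'. preg_quote() escapes any regex
    // metacharacters that a listed word might contain.
    foreach ($disallowedWords as $word) {
      if (preg_match('/\b' . preg_quote($word, '/') . '\b/i', $entry)) {
        die("The word '$word' is not allowed...");
      }
    }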
    
    // This variable should contain a regex that will match URLs.
    // There are thousands out there, take your pick; I have just
    // used an arbitrary one I found with Google. preg_match() needs
    // delimiters around the pattern: '!' is used because it does not
    // occur inside the pattern, and the 'i' flag makes it case-insensitive.
    $urlRegex = '!(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*!i';
    
    // Search for URLs
    if (preg_match($urlRegex, $entry)) {
      die("URLs are not allowed...");
    }
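The loop above runs one preg_match() per word. As a minimal sketch (the helper name `findDisallowedWord` is hypothetical, not from the original answer; arrow functions need PHP 7.4+), the whole list can instead be folded into a single case-insensitive pattern of the form `\b(?:these|words|...)\b`, with each word passed through preg_quote() so characters such as `.` or `+` are matched literally. It assumes every listed word starts and ends with a letter or digit, since `\b` only marks a boundary next to a word character; entries like '.com' need a different check.

```php
<?php

/**
 * Hypothetical helper: returns the first disallowed word found in $text
 * (as the user typed it), or null if none is present.
 * Assumes each entry in $words begins and ends with a word character.
 */
function findDisallowedWord(string $text, array $words): ?string
{
    if ($words === []) {
        return null;
    }
    // Escape each word so regex metacharacters are matched literally.
    $escaped = array_map(fn (string $w): string => preg_quote($w, '/'), $words);
    // \b...\b matches whole words only; the i flag ignores case.
    $pattern = '/\b(?:' . implode('|', $escaped) . ')\b/i';

    if (preg_match($pattern, $text, $m) === 1) {
        return $m[0]; // the matched text, in its original case
    }
    return null;
}

$disallowedWords = ['these', 'words', 'are', 'not', 'allowed'];

var_dump(findDisallowedWord('Are you there?', $disallowedWords)); // string(3) "Are"
var_dump(findDisallowedWord('Take care', $disallowedWords));      // NULL
```

Returning the matched word rather than a boolean lets the caller build the same "The word '...' is not allowed" message the answer uses.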
    
  • 2021-01-26 07:25

    A simple way to do this is to put all the words not allowed into an array and loop through them to check each one.

    $banned = array('http://', 'https://', '.com', '.net', 'www.', '.org'); // Add more
    foreach ($banned as $word):
        // stripos() is case-insensitive, so 'HTTP://' or 'WWW.' is caught too
        if (stripos($entry, $word) !== false) die('Contains banned word');
    endforeach;
    

    The problem with this is that if you get too carried away and start banning the string 'com' on its own, perfectly legitimate words that contain those letters, such as 'computer' or 'welcome', will trigger false positives. You could use regular expressions to search for strings that look like URLs, but users can easily get around that by breaking the URL up (for example, writing 'example dot com' instead of 'example.com'). There is no effective way to completely stop people from posting links in a comment. If you don't want them there, you will ultimately have to rely on moderation. Community moderation works very well; look at Stack Overflow, for instance.
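To make the false-positive and "breaking up" points concrete, here is a rough sketch. The helpers `normalizeForUrlCheck()` and `looksLikeUrl()`, the obfuscated spellings they undo ("dot", "[dot]", "(dot)") and the short .com/.net/.org list are illustrative assumptions, not a complete defence. It first shows a plain substring check flagging "compute", then normalizes a few obfuscations before applying a loose, word-boundary URL pattern.

```php
<?php

// Plain substring matching flags innocent words: "compute" contains "com".
var_dump(stripos('Please compute the total', 'com') !== false); // bool(true)

/**
 * Hypothetical helper: undoes a few common ways of "breaking up" a URL,
 * turning "example dot com", "example [dot] com" or "example (dot) com"
 * into "example.com". The spellings handled are illustrative only.
 */
function normalizeForUrlCheck(string $text): string
{
    // \bdot\b matches "dot" as a whole word only, so "dotted" is left alone;
    // surrounding whitespace is swallowed so the result reads "example.com".
    return preg_replace('/\s*(?:\[dot\]|\(dot\)|\bdot\b)\s*/i', '.', $text);
}

/**
 * Hypothetical helper: a deliberately loose URL-like pattern. It matches
 * a scheme (http://, https://, ftp://), a "www." prefix, or a name ending
 * in .com, .net or .org, using word boundaries so "compute" is not flagged.
 */
function looksLikeUrl(string $text): bool
{
    $pattern = '~(?:\b(?:https?|ftp)://|\bwww\.|\b[a-z0-9-]+\.(?:com|net|org)\b)~i';
    return preg_match($pattern, normalizeForUrlCheck($text)) === 1;
}

var_dump(looksLikeUrl('visit example dot com'));    // bool(true)
var_dump(looksLikeUrl('Please compute the total')); // bool(false)
```

This only catches the spellings it anticipates, and the normalization misfires on ordinary text too: "the dot com bubble" becomes "the.com bubble" and is flagged. That trade-off is why moderation remains the backstop.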

  • 2021-01-26 07:28

    You must call strpos more than once. The way you wrote it, the or expression is evaluated first, which returns true or false, and that boolean is passed to strpos instead of the strings you meant to search for.

    This way it should work:

    if (strpos($entry, "http://") !== false
        || strpos($entry, "https://") !== false
        || strpos($entry, ".com") !== false) {
        // Reject the entry here, e.g. with die() or an error message
    }
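Chaining one strpos() call per needle gets unwieldy as the list grows. A minimal sketch of the same check written as a loop (the name `containsAny` is hypothetical), using stripos() so that "HTTP://" is caught as well as "http://". On PHP 8.0+, str_contains() is a readable alternative for case-sensitive checks.

```php
<?php

/**
 * Hypothetical helper: true if $haystack contains any of $needles,
 * ignoring case. Equivalent to chaining stripos(...) !== false with ||.
 */
function containsAny(string $haystack, array $needles): bool
{
    foreach ($needles as $needle) {
        // stripos() returns a position (possibly 0) or false, so compare
        // with !== false rather than relying on truthiness.
        if (stripos($haystack, $needle) !== false) {
            return true;
        }
    }
    return false;
}

$entry = 'Check out HTTP://example.org';
if (containsAny($entry, ['http://', 'https://', '.com'])) {
    echo "URLs are not allowed\n"; // prints "URLs are not allowed"
}
```

The !== false comparison matters: a needle found at position 0 returns 0, which a plain if (stripos(...)) would treat as "not found".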
    