How to Block Spam Referrers like darodar.com from Accessing a Website?

北海茫月 2020-11-22 16:22

I have several websites that get around 5% of their daily visits from spam referrers. There is one strange thing I noticed about these referrers: they show up in Google Analytics, but …

14 answers
  • 2020-11-22 16:57

    LunaMetrics published a good article on solving this with Google Tag Manager: http://www.lunametrics.com/blog/2014/03/11/goodbye-to-exclude-filters-google-analytics/

  • 2020-11-22 16:57

    You can filter both future and historical GA spam of all types using the guide linked below. Hostname filtering is particularly easy.

    https://www.ohow.co/ultimate-guide-to-removing-irrelevant-traffic-in-google-analytics/
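    Hostname filtering works because ghost spam usually reports a hostname that is not one of yours, or none at all. As a minimal sketch, assuming your site runs on the hypothetical hostnames example.com and www.example.com, this builds the anchored regex you would paste into an "Include" filter on the Hostname field, and applies the same test to a few made-up hits (buildHostnamePattern and filterHits are hypothetical helpers):

```javascript
// Escape regex metacharacters so "example.com" matches a literal dot.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build an anchored alternation such as ^(example\.com|www\.example\.com)$
// for a hostname "Include" filter. The hostnames used below are hypothetical.
function buildHostnamePattern(validHosts) {
  return "^(" + validHosts.map(escapeRegex).join("|") + ")$";
}

// Keep only hits whose hostname is one of ours; everything else is treated as spam.
function filterHits(hits, validHosts) {
  const pattern = new RegExp(buildHostnamePattern(validHosts), "i");
  return hits.filter((hit) => pattern.test(hit.hostname || ""));
}

const validHosts = ["example.com", "www.example.com"];
const sampleHits = [
  { hostname: "www.example.com", source: "google" },
  { hostname: "darodar.com", source: "darodar.com" },
  { hostname: "", source: "semalt.com" },
];

console.log(buildHostnamePattern(validHosts));
console.log(filterHits(sampleHits, validHosts).map((h) => h.source));
```

    Anchoring with ^ and $ matters: without it, a spammer could report a hostname like example.com.spam.site and still pass the filter.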

  • 2020-11-22 17:00

    Yes, you can block them with .htaccess, and you should.

    Your .htaccess file could look like this:

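    <IfModule mod_setenvif.c>
    # Flag requests whose Referer header matches a known spam domain
    SetEnvIfNoCase Referer "darodar\.com" spambot=yes
    SetEnvIfNoCase Referer "7makemoneyonline\.com" spambot=yes
    ## add as many as you find
    </IfModule>

    # Apache 2.4+
    <IfModule mod_authz_core.c>
    <RequireAll>
    Require all granted
    Require not env spambot
    </RequireAll>
    </IfModule>

    # Apache 2.2
    <IfModule !mod_authz_core.c>
    Order allow,deny
    Allow from all
    Deny from env=spambot
    </IfModule>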
    

    When a request carries one of these referrers, the rule denies it (403 Forbidden), so the HTML is never served and the GA script never fires for that request.
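    SetEnvIfNoCase tests its pattern as a case-insensitive, unanchored regular expression against the Referer header, so in a pattern like darodar.com the dot matches any character. The sketch below mirrors that check in JavaScript purely for illustration (isSpamReferrer is a hypothetical name; Apache does this matching itself) and shows why escaping the dot, as in darodar\.com, is stricter:

```javascript
// Mirrors what `SetEnvIfNoCase Referer <regex> spambot=yes` does:
// an unanchored, case-insensitive regex search on the Referer header.
function isSpamReferrer(referer, patterns) {
  if (!referer) return false; // no Referer header, nothing to match
  return patterns.some((p) => new RegExp(p, "i").test(referer));
}

const loose = ["darodar.com"];    // "." matches any character
const strict = ["darodar\\.com"]; // "\." matches only a literal dot

console.log(isSpamReferrer("http://DARODAR.com/?ref=1", strict));  // true (case-insensitive)
console.log(isSpamReferrer("http://darodar-com.example/", loose));  // true: "." matched "-"
console.log(isSpamReferrer("http://darodar-com.example/", strict)); // false
```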

    They want traffic from you: when you see the incoming referrals in Google Analytics and try to find out where they come from, you end up visiting that URL. It is harmless to your site, except that your statistics fill up with junk data.

    Google Analytics should prevent this, the same way Gmail filters out spam email.

  • 2020-11-22 17:02

    I used this mod_rewrite rule for semalt (it matches semalt.com and any of its subdomains, over http or https):

    RewriteEngine On
    # Referer is semalt.com or a subdomain of it; NC = case-insensitive
    RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)*semalt\.com [NC]
    # Refuse the request with 403 Forbidden
    RewriteRule .* - [F]
    

    Or, with mod_setenvif in .htaccess:

    SetEnvIfNoCase Referer "semalt\.com" spambot=yes
    SetEnvIf Remote_Addr "^217\.23\.11\.15$" spambot=yes
    SetEnvIf Remote_Addr "^217\.23\.7\.144$" spambot=yes
    
    # Apache 2.2 access syntax; on Apache 2.4 it needs mod_access_compat
    Order allow,deny
    Allow from all
    Deny from env=spambot
    

    I also created an Apache, Nginx and Varnish blacklist, plus a Google Analytics segment, to block referrer spam traffic. You can find it here:

    https://github.com/Stevie-Ray/referrer-spam-blocker/
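    Maintaining such a list by hand gets tedious, so one option is to generate the .htaccess lines from a plain list of domains. A minimal sketch, assuming the list lives in a JavaScript array; toSetEnvIfLines is a hypothetical helper, not code from the linked project:

```javascript
// Escape regex metacharacters so each domain is matched literally.
function escapeDomain(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Turn a list of spam domains into SetEnvIfNoCase lines for .htaccess,
// trimming, lower-casing and de-duplicating entries first.
function toSetEnvIfLines(domains) {
  const unique = [...new Set(domains.map((d) => d.trim().toLowerCase()))].filter(Boolean);
  return unique.map((d) => `SetEnvIfNoCase Referer "${escapeDomain(d)}" spambot=yes`);
}

const lines = toSetEnvIfLines(["darodar.com", "semalt.com", "Semalt.com ", ""]);
console.log(lines.join("\n"));
```

    The generated lines still need one of the access-control blocks shown in the answers above to actually deny flagged requests.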

  • 2020-11-22 17:04

    This blog post suggests that the spam referrers manipulate Google Analytics directly and never actually visit your site, so blocking them on the server is pointless. Google Analytics offers filtering if you want to exclude these fake hits.
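    If you go the filter route, a practical snag is that a single Google Analytics filter pattern is length-limited (255 characters, as far as I know, so treat MAX_PATTERN_LENGTH below as an assumption), so a long spam list has to be split across several exclude filters, for example on the Campaign Source field. A minimal sketch of that splitting; buildExcludePatterns is a hypothetical helper:

```javascript
// Assumed maximum length of one Google Analytics filter pattern.
const MAX_PATTERN_LENGTH = 255;

// Escape regex metacharacters so each domain is matched literally.
function escapeForPattern(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Greedily pack escaped domains into "a|b|c" alternations, starting a new
// pattern whenever adding the next domain would exceed the limit.
function buildExcludePatterns(domains, maxLength = MAX_PATTERN_LENGTH) {
  const patterns = [];
  let current = "";
  for (const domain of domains) {
    const piece = escapeForPattern(domain);
    if (piece.length > maxLength) {
      throw new Error(`Domain too long for one pattern: ${domain}`);
    }
    const candidate = current ? current + "|" + piece : piece;
    if (candidate.length > maxLength) {
      patterns.push(current); // current is full; start a new pattern
      current = piece;
    } else {
      current = candidate;
    }
  }
  if (current) patterns.push(current);
  return patterns;
}

// A tiny limit of 30 is used here only so the split is visible.
const excludePatterns = buildExcludePatterns(["darodar.com", "semalt.com", "7makemoneyonline.com"], 30);
console.log(excludePatterns);
```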

  • 2020-11-22 17:05

    2019 update

    I may have a solution to this problem, as I find none of the other solutions effective.

    Let me address the problems with the existing solutions first:

    1. Add a filter for each referrer spam domain.
        • How many domains will you add?
        • Most of these referrer spam domains exist for some time and then disappear.
    2. Maintain a blacklist of referrer spam domains.
        • This gets even more complicated, as they are basically endless in number.
        • You would have to keep updating the blacklist.
        • Also, the bigger the blacklist, the more time you need to scan it.
    3. Anything else, such as maintaining .htaccess rules by hand, requires manual intervention, which will not scale as your site becomes more popular.
    4. Anything automatic, such as using AI to detect patterns in how referrer spam domains appear, will be hit-or-miss.

    How do these bots work?

    First, it is crucial to understand how these bots work:

    1. At the very least, they run regex patterns such as /UA-\d{6}/ over a page's HTML to harvest tracking IDs, and they crawl recursively after starting at a seed website.
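    To make that premise concrete: a bot that only runs a regex such as /UA-\d{6}/ over the raw HTML finds a literal tracking ID, but not one stored as character codes. A minimal sketch, using the hypothetical helper findTrackingIds and the placeholder ID UA-111111111-2 rather than a real one:

```javascript
// What a naive harvester might do: search raw HTML for the pattern above.
function findTrackingIds(html) {
  return html.match(/UA-\d{6}/g) || [];
}

// A page that embeds the ID literally, as the standard snippet does.
const plainPage = `<script>gtag('config', 'UA-111111111-2');</script>`;

// The same ID stored as character codes, decoded only when the script runs.
const obfuscatedPage =
  "<script>var a = [85, 65, 45, 49, 49, 49, 49, 49, 49, 49, 49, 49, 45, 50];</script>";

console.log(findTrackingIds(plainPage));      // [ 'UA-111111' ]
console.log(findTrackingIds(obfuscatedPage)); // []
```

    Real harvesters presumably use fuller patterns, but the principle is the same: a regex over static HTML cannot see a value that only exists after the script runs.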

    I believe I have a solution that offers the following advantages:

    1. No need to maintain whitelists or blacklists.
    2. It works against 99% of them easily and can always be modified to take it to 100%.
    3. It requires almost NO manual intervention.

    The idea is to NOT have the tracking ID appear anywhere in the script as a literal string.

    Here is an example (from a Pug template):

    script.
          // Google Analytics tracking ID as char codes (decodes to the placeholder "UA-111111111-2")
          var a = [85, 65, 45, 49, 49, 49, 49, 49, 49, 49, 49, 49, 45, 50];
          // Rebuild the ID once at runtime so it never appears as a literal string in the HTML
          var gaId = a.map(function (i) { return String.fromCharCode(i); }).join("");

          // Load gtag.js asynchronously
          var newScript = document.createElement("script");
          newScript.async = true;
          newScript.src = "https://www.googletagmanager.com/gtag/js?id=" + gaId;
          document.head.appendChild(newScript);

          window.dataLayer = window.dataLayer || [];
          function gtag(){dataLayer.push(arguments);}
          gtag('js', new Date());
          gtag('config', gaId, { 'send_page_view': false });
          // Feature-detect High Resolution Time (performance.now) support
          if (window.performance && performance.now) {
            // Milliseconds since navigation start, rounded because the value must be an integer
            var timeSincePageLoad = Math.round(performance.now());
            // Send the timing event to Google Analytics
            gtag('event', 'timing_complete', {
              'name': 'load',
              'value': timeSincePageLoad,
              'event_category': '#{title}'
            });
          }
    
    1. We take a very simple approach: break the tracking ID (here the placeholder 'UA-111111111-2') into an array of character codes.

    2. We then rebuild the tracking ID from that array at runtime, wherever we need a reference to it.

    3. The approach can be made much more elaborate: encode the numbers in base 8 or hexadecimal, add a fixed offset, add a random offset on each run, or even RSA-encrypt the tracking ID with a private key on the server and decrypt it with the public key in the browser. But the basic approach is REALLY fast, since it only maps over a tiny array, and it can easily beat 99% of the bots.
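    As an example of the "fixed offset" variant mentioned above, the array can be generated at build time with every char code shifted by a constant and shifted back in the browser. A minimal sketch; encodeTrackingId, decodeTrackingId and the offset of 7 are arbitrary choices for illustration:

```javascript
// Arbitrary constant shift; any value works as long as encoder and decoder agree.
const OFFSET = 7;

// Build time (e.g. in your template engine): turn the ID into shifted char codes.
function encodeTrackingId(id, offset = OFFSET) {
  return Array.from(id, (ch) => ch.charCodeAt(0) + offset);
}

// In the browser: undo the shift and rebuild the string.
function decodeTrackingId(codes, offset = OFFSET) {
  return String.fromCharCode(...codes.map((c) => c - offset));
}

const encoded = encodeTrackingId("UA-111111111-2");
console.log(JSON.stringify(encoded));
console.log(decodeTrackingId(encoded)); // UA-111111111-2
```

    A fixed offset only stops bots that grep the static HTML; anything that executes the page's JavaScript will still see the decoded ID.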
