I have several websites that get around 5% of their daily visits from spam referrers. There is one strange thing I noticed about these referrers: they show in Google Analytics, bu
Lunametrics posted a nice article to solve this issue using Google Tag Manager: http://www.lunametrics.com/blog/2014/03/11/goodbye-to-exclude-filters-google-analytics/
You can filter future and historical GA spam of all types using the guide linked below. Hostname filtering is particularly easy.
https://www.ohow.co/ultimate-guide-to-removing-irrelevant-traffic-in-google-analytics/
Yes, you can block them with .htaccess, and you actually should.
Your .htaccess file could look like this:
<IfModule mod_setenvif.c>
# Set spammers referral as spambot
SetEnvIfNoCase Referer darodar.com spambot=yes
SetEnvIfNoCase Referer 7makemoneyonline.com spambot=yes
## add as many as you find
Order allow,deny
Allow from all
Deny from env=spambot
</IfModule>
When traffic comes from these sites, it is blocked by this .htaccess, so the HTML is never loaded and the GA script never fires for those visits.
They are trying to get traffic from you: once you see the incoming traffic in Google Analytics and try to find out what the source is, you visit that URL. It is harmless to your site, except that your statistics fill up with junk data.
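To check which referrers a rule like the ones above would flag, the matching can be sketched in JavaScript. This is only an illustration of the idea, not how Apache is implemented: `isSpamReferrer` and `spamPatterns` are hypothetical names, and the dots are escaped here so that they match a literal dot (in the .htaccess above, the unescaped `.` matches any character).

```javascript
// Hypothetical helper mirroring "SetEnvIfNoCase Referer <pattern> spambot=yes".
// Each pattern is treated as an unanchored, case-insensitive regular expression.
const spamPatterns = ["darodar\\.com", "7makemoneyonline\\.com"];

function isSpamReferrer(referer, patterns = spamPatterns) {
  if (!referer) return false; // no Referer header, nothing to match
  return patterns.some((p) => new RegExp(p, "i").test(referer));
}

console.log(isSpamReferrer("http://forum.DARODAR.com/some/page")); // true
console.log(isSpamReferrer("https://www.example.org/"));           // false
```

Because the patterns are unanchored, a subdomain such as forum.darodar.com is caught as well.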
Google Analytics should prevent this, the same way GMail prevents spam email.
I used these mod_rewrite methods for semalt:
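RewriteEngine On
RewriteCond %{HTTP_REFERER} ^https?://(www\.)?semalt\.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://(.*\.)?semalt\..*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)*semalt\.com [NC]
# Assumed action (not in the original snippet): answer matches with 403 Forbidden
RewriteRule ^ - [F]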
or, in .htaccess, with the Apache module mod_setenvif:
SetEnvIfNoCase Referer semalt.com spambot=yes
SetEnvIfNoCase REMOTE_ADDR "217\.23\.11\.15" spambot=yes
SetEnvIfNoCase REMOTE_ADDR "217\.23\.7\.144" spambot=yes
Order allow,deny
Allow from all
Deny from env=spambot
I even created an Apache, Nginx & Varnish blacklist plus a Google Analytics segment to prevent referrer spam traffic. You can find it here:
https://github.com/Stevie-Ray/referrer-spam-blocker/
This blog post suggests that the spam referrers manipulate Google Analytics and never actually visit your site, so blocking them is pointless. Google Analytics offers filtering if you want to mitigate fake site hits.
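If you go the filtering route, the list of spam domains has to be turned into a filter pattern. A minimal sketch, assuming the filter field accepts a regular expression; `buildSpamFilterPattern` is a hypothetical helper, not part of any Google Analytics API:

```javascript
// Hypothetical helper: turn a list of spam domains into one regular
// expression that can be pasted into an exclude filter.
function buildSpamFilterPattern(domains) {
  return domains
    .map((d) => d.trim().toLowerCase())
    .filter((d) => d.length > 0)
    // Escape regex metacharacters so "." only matches a literal dot.
    .map((d) => d.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join("|");
}

const pattern = buildSpamFilterPattern([
  "darodar.com",
  "semalt.com",
  "7makemoneyonline.com",
]);
console.log(pattern); // darodar\.com|semalt\.com|7makemoneyonline\.com
```

New domains only need to be appended to the array and the pattern regenerated.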
2019 update
I may have a solution to this problem as I find none of the other solutions to be effective.
Let me address the problems of the existing solutions first
How do these bots work?
First, it is crucial to understand how these bots work
I believe I have a solution that offers the following advantages
Here is an example
script.
  // Google Analytics ID, stored as an array of char codes
  var a = [85, 65, 45, 49, 49, 49, 49, 49, 49, 49, 49, 49, 45, 50];
  // Load gtag.js, rebuilding the ID from the array for the URL
  var newScript = document.createElement("script");
  newScript.type = "text/javascript";
  newScript.async = true;
  newScript.src = "https://www.googletagmanager.com/gtag/js?id=" + a.map(i => String.fromCharCode(i)).join("");
  // document.head is always an element; documentElement.firstChild may be a text or comment node
  document.head.appendChild(newScript);
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());
  // Configure the property without sending the automatic page view
  gtag('config', a.map(i => String.fromCharCode(i)).join(""), { 'send_page_view': false });
  // Feature detects Navigation Timing API support.
  if (window.performance) {
    // Gets the number of milliseconds since page load
    // (and rounds the result since the value must be an integer).
    var timeSincePageLoad = Math.round(performance.now());
    console.log(timeSincePageLoad);
    // Sends the timing event to Google Analytics.
    gtag('event', 'timing_complete', {
      'name': 'load',
      'value': timeSincePageLoad,
      'event_category': '#{title}'
    });
  }
We take a very simple approach: break a tracking ID of the form 'UA-1111111-1' into a char code array.
We then construct the tracking ID dynamically from the char code array at every point where we need a reference to it.
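The two steps can be sketched as a pair of helpers, one to produce the array once (for pasting into the page) and one to rebuild the ID at runtime. `encodeId` and `decodeId` are hypothetical names, and the ID is a placeholder rather than a real property:

```javascript
// Hypothetical helpers; "UA-1111111-1" is a placeholder tracking ID.

// Run once, offline, to get the array that goes into the page source.
function encodeId(id) {
  return Array.from(id, (ch) => ch.charCodeAt(0));
}

// Run in the page wherever the tracking ID is needed.
function decodeId(codes) {
  return codes.map((c) => String.fromCharCode(c)).join("");
}

const codes = encodeId("UA-1111111-1");
console.log(codes);
console.log(decodeId(codes)); // UA-1111111-1
```

Only the array and `decodeId` need to ship to the browser, so the literal ID string never appears in the page source.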
The approach can be made arbitrarily more complex: write the numbers in base 8 or hexadecimal, add a fixed offset, add a random offset on each run, or RSA-encrypt the tracking ID with a private key on the server and decrypt it with a public key. But the basic approach is really fast, since arrays in JS are fast, and it can easily beat 99% of the bots.
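As one illustration of the variations mentioned, here is a minimal sketch that combines a fixed offset with hexadecimal encoding. `OFFSET`, `obfuscate` and `reveal` are hypothetical names and the offset value is arbitrary; this is obfuscation, not encryption:

```javascript
// Assumed value: any integer works as long as both functions use the same one.
const OFFSET = 7;

// Run offline: shift each char code by the offset and write it in hex.
function obfuscate(id, offset = OFFSET) {
  return Array.from(id, (ch) => (ch.charCodeAt(0) + offset).toString(16));
}

// Run in the page: parse the hex, undo the offset, rebuild the string.
function reveal(hexCodes, offset = OFFSET) {
  return hexCodes
    .map((h) => String.fromCharCode(parseInt(h, 16) - offset))
    .join("");
}

const hidden = obfuscate("UA-1111111-1"); // placeholder ID
console.log(hidden);
console.log(reveal(hidden)); // UA-1111111-1
```

A bot that simply decodes every numeric array it finds as char codes would no longer recover a string starting with "UA-".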