How to Block Semalt from Crawling Your Site

Don’t know what Semalt is? Well, you are about to learn what it is and why it is a huge pain. More importantly, you will learn the detailed steps to get rid of it for good. If you have ever tried to get rid of Semalt, this blog post is the answer to your prayers.

What is Semalt?

Semalt is a site crawler, which is a robot that looks at everything on your site. However, Semalt doesn’t behave like a normal site crawler. It just makes itself at home on your site and completely ignores anything you politely ask it to do with robots.txt. A normal site crawler skips the pages it is told to skip, or stays away entirely if robots.txt says it is not welcome.

Why is Semalt bad?

It skews your analytics data, which is not exactly a good thing. Sure, it brings your total visits up, but those visits are fake. Semalt also brings your bounce rate up, because it spends less than a second on the site. This is horrible, because you can’t tell whether what you’re doing to improve your site is actually working.
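To put a number on the skew, here is a tiny sketch of the arithmetic. All of the figures are made up for illustration, and the assumption that every bot visit counts as a bounce is mine, not something measured.

```python
def bounce_rate(bounces, sessions):
    """Share of sessions that left after viewing a single page."""
    return bounces / sessions

# Hypothetical numbers, chosen only to illustrate the effect.
real_sessions, real_bounces = 1000, 400
bot_sessions = 200  # assumed: every bot visit is a bounce

real = bounce_rate(real_bounces, real_sessions)
skewed = bounce_rate(real_bounces + bot_sessions, real_sessions + bot_sessions)

print(f"real visitors only: {real:.0%}")
print(f"with bot visits:    {skewed:.0%}")
```

With these invented figures, the reported bounce rate rises from 40% to 50% even though nothing about the real visitors changed.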

How do we stop it?

I can’t tell you the answer just yet. First, I need to tell you what doesn’t work. You might think you could just filter out visits from this site in analytics, and you can, but there is a catch. Semalt keeps creating new subdomains that do the same thing, so a filter for one of them misses the next. No matter what you do in analytics or with robots.txt, it will keep skewing your results.

The Answer to Blocking Semalt

I guess I’ll tell you the secret since you made it this far. The trick is to use .htaccess to block any visit whose referrer has Semalt in the name. I achieved it with this, and yes, it is messy.

# block spammers
RewriteCond %{HTTP_REFERER} (.)?(semalt).com$ [NC]
RewriteRule . - [F]

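HTTP_REFERER lets us block traffic based on the address of the referrer, [NC] makes the match ignore upper and lower case, and the [F] flag answers matching requests with 403 Forbidden. With this chunk of code we are telling the server to refuse any request whose referrer ends in semalt.com, including semalt.semalt.com and any other subdomain that they come up with. These rules rely on mod_rewrite, so RewriteEngine On has to appear earlier in the file for them to take effect.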
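To see which referrers the rule above catches, here is a minimal sketch that runs the same pattern through Python's re module. Python's regex engine is standing in for Apache's, which is an assumption, though for a pattern this simple the two should agree. The sample URLs are made up, and the second pattern (dots escaped, no $ at the end) is a hypothetical variant, not part of the original rule.

```python
import re

# The pattern exactly as posted; IGNORECASE plays the role of [NC].
POSTED = re.compile(r"(.)?(semalt).com$", re.IGNORECASE)

# Hypothetical variant: literal dot, no end-of-string anchor.
LOOSE = re.compile(r"semalt\.com", re.IGNORECASE)

def matches(pattern, referer):
    """True when the pattern is found anywhere in the referrer string."""
    return pattern.search(referer) is not None

# Made-up referrer strings for illustration.
samples = [
    "http://semalt.com",
    "http://semalt.semalt.com",
    "http://semalt.semalt.com/crawler.php",
    "http://semalt.com/",
    "https://www.google.com/",
]

for referer in samples:
    print(referer, "| posted:", matches(POSTED, referer),
          "| loose:", matches(LOOSE, referer))
```

In this sketch the posted pattern matches only the first two samples, where the referrer stops right at .com. The (.)? part is optional, so it does not restrict anything; it is the $ that ties the match to the end of the string. A referrer with a trailing slash or a path is only caught by the looser variant.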

How to Edit Your .htaccess File

The most common way to do this is to use an FTP program, such as FileZilla. So we will be discussing how to set this up through FileZilla. We must also point out that we are not responsible for any damage you do to your site while messing with these files.

  1. Get your FTP login information. How you get this will vary by host. Since we cannot cover every host, we recommend calling yours if you cannot find the information. To log in you will need:
    • Host: The IP address of the server you’re using or ftp.yourdomain.com
    • Username and Password: These are sometimes different than what you use to log in to the hosting provider’s website. For example, the username and password you use to get into GoDaddy might be different than your FTP ones.
  2. After you get the above information go ahead and install the FileZilla client from here: https://filezilla-project.org/.
  3. After installing and opening FileZilla, fill out each field with the information you found in step one. For Port, enter 21 and hit Enter.
  4. On the right side you will now see a new list of files. Look for public_html, or sometimes there is a folder named WWW. Double click the folder.
  5. In here you will find a file named .htaccess. First, download a copy of .htaccess.
    • Save it somewhere you can easily get back to. This will be a backup copy of your .htaccess in case something goes wrong.
    • The easiest way to do this is to simply drag the file to a folder on the left.
  6. Now right click the first .htaccess file on the right side of FileZilla. Click View/Edit.
    • It may ask you what program to open it with; any basic text editor, such as Notepad, will be fine.
  7. Add the Semalt blocking code that I discussed earlier to the bottom of the file and save it. Here it is again for your convenience:
    • # block spammers
      RewriteCond %{HTTP_REFERER} (.)?(semalt).com$ [NC]
      RewriteRule . - [F]
  8. Go back to FileZilla. It will ask whether to upload the changed file; select Yes and it will upload the changes you made.
  9. Now go make sure your website is working.
    • If your website isn’t working, then upload that backup of .htaccess that we made earlier and overwrite the file on your server when asked.
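Beyond loading the site in a browser, one way to check the block is to request a page while claiming to arrive from a blocked referrer and look at the status code. The sketch below does that with Python's standard urllib. The function names are hypothetical, and the domain in the usage comment is a placeholder for your own.

```python
import urllib.error
import urllib.request

def build_request(url, referer):
    """Create a GET request that carries the given Referer header."""
    return urllib.request.Request(url, headers={"Referer": referer})

def status_with_referer(url, referer, timeout=10):
    """Return the HTTP status code the server sends for this referrer."""
    try:
        with urllib.request.urlopen(build_request(url, referer), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is what we want to see.
        return err.code

# Usage (placeholder domain, replace with your own site):
#   status_with_referer("https://www.yourdomain.com/", "http://semalt.com")
#   status_with_referer("https://www.yourdomain.com/", "https://www.google.com/")
```

If the rules are active, the first call should come back as 403 and the second as whatever your site normally returns, usually 200. A 500 from either call suggests a syntax problem in .htaccess, which is the moment to restore the backup.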

Final Thoughts

If you are still having problems or need some more clarification, then I recommend checking out these related posts by popular hosting companies:

You could also try calling your hosting company’s support to do this, or just hire us at In2itive Search. 😉

Update!

Since writing this guide a few more spam sites have popped up. I have updated the code to block these other sites as well.

# block spammers
RewriteCond %{HTTP_REFERER} (.)?(semalt|kambasoft|savetubevideo|seoanalyses|buttons-for-website|7makemoneyonline).com$ [NC,OR]
RewriteCond %{HTTP_REFERER} ilovevitaly.co
RewriteRule . - [F]

You will notice in the code above that I have put “semalt|kambasoft|savetubevideo|seoanalyses|buttons-for-website|7makemoneyonline”. The code blocks all of these sites. To add another site that you want to block, simply add | followed by the site name without .com at the end. So if you want to add a site called theworstspam.com, your code would look like this:

# block spammers
RewriteCond %{HTTP_REFERER} (.)?(semalt|kambasoft|savetubevideo|seoanalyses|buttons-for-website|7makemoneyonline|theworstspam).com$ [NC,OR]
RewriteCond %{HTTP_REFERER} ilovevitaly.co
RewriteRule . - [F]
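If the list keeps growing, typing the | separators by hand gets error-prone. As a rough sketch, the .com line could be generated from a plain list of names. The function name and its `last` parameter are hypothetical; the output format simply copies the line shown above.

```python
def com_condition(names, last=False):
    """Build one RewriteCond line for a list of .com site names.

    names: site names without the .com ending, e.g. "semalt".
    last:  True when no further RewriteCond follows, so OR is left off.
    """
    if not names:
        raise ValueError("at least one site name is required")
    flags = "[NC]" if last else "[NC,OR]"
    return "RewriteCond %{HTTP_REFERER} (.)?(" + "|".join(names) + ").com$ " + flags

spam_sites = [
    "semalt", "kambasoft", "savetubevideo", "seoanalyses",
    "buttons-for-website", "7makemoneyonline", "theworstspam",
]
print(com_condition(spam_sites))
```

Adding a new .com site then means appending one name to `spam_sites` and pasting the printed line over the old one.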

If the site does not end in .com, another line of code will need to be added, like we did for ilovevitaly.co:

RewriteCond %{HTTP_REFERER} ilovevitaly.co

For example, if we want to block spammy.org, we would add a line like this to the code. One important thing not to forget is that [NC,OR], with no space after the comma, needs to be at the end of every condition except the last, as shown below.

# block spammers
RewriteCond %{HTTP_REFERER} (.)?(semalt|kambasoft|savetubevideo|seoanalyses|buttons-for-website|7makemoneyonline).com$ [NC,OR]
RewriteCond %{HTTP_REFERER} ilovevitaly.co [NC,OR]
RewriteCond %{HTTP_REFERER} spammy.org
RewriteRule . - [F]
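The flag placement is the easiest part to get wrong, so here is a small sketch that assembles the whole block from a list of patterns and puts OR on every condition except the last. The function name is hypothetical, and adding [NC] to the final condition is my own choice; the block above leaves the last line without flags.

```python
def build_block(conditions):
    """Assemble a referrer-blocking block from a list of regex patterns.

    Every condition except the last gets [NC,OR]; the last gets [NC] only,
    so the chain of ORs ends before the RewriteRule.
    """
    if not conditions:
        # A RewriteRule with no conditions would block every request.
        raise ValueError("refusing to build a rule with no conditions")
    lines = ["# block spammers"]
    for index, pattern in enumerate(conditions):
        is_last = index == len(conditions) - 1
        flags = "[NC]" if is_last else "[NC,OR]"
        lines.append(f"RewriteCond %{{HTTP_REFERER}} {pattern} {flags}")
    lines.append("RewriteRule . - [F]")
    return "\n".join(lines)

print(build_block([
    "(.)?(semalt|kambasoft).com$",
    "ilovevitaly.co",
    "spammy.org",
]))
```

The empty-list check matters: a RewriteRule with no RewriteCond in front of it applies to every request, which would lock everyone out of the site.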

About the Author

Nick Footer

Nick Footer is an entrepreneur and founder of Intuitive Digital, a national award-winning digital marketing agency in Portland, Oregon. With over 15 years of experience, he has helped hundreds of businesses improve their online presence through search engine optimization, paid advertising, and website design.

4 thoughts on “How to Block Semalt from Crawling Your Site”

  1. Hi Brandon,

    Great tip to combat an ever increasing problem! I have a couple of questions as a non coder:

    Why is (.) only used after the first {HTTP_REFERER} and the dollar sign at the end only of the first one?

    Thanks,

    Richard
