Soft 404 Crawl Error Search Spam
You’re more than likely here because you were in your Google Search Console account and saw a message that said “Increase in “soft-404” pages on (insert your URL here)”. Upon further investigation, you probably saw something that looked a lot like the screenshot below, right?
Don’t panic. Your site hasn’t been hacked. It looks like another pesky spammer has decided to target yet another Google product in order to lure users to their piss-poor website; in this case, the owner of Q82019309.com is the culprit. Before I go on: don’t visit that site. I haven’t visited it myself, though a user on the Google Product Forums said they received the following message:
Reported Attack Page!
This web page at q82019309.com has been reported as an attack page and has been blocked based on your security preferences.
Attack pages try to install programs that steal private information, use your computer to attack others, or damage your system.
Some attack pages intentionally distribute harmful software, but many are compromised without the knowledge or permission of their owners.
A quick search for “Q82019309.com” in Google will also show you that these search result crawl errors are getting indexed, so it’s best to take care of the problem now before it gets worse.
How To Deal With This Soft 404 Crawl Error Spam
I haven’t had a chance to test anything out yet as I just barely noticed this today, though I do have a few ideas on how this problem could be solved. Once I’ve successfully solved the problem, I’ll update this article stating exactly what I did to remove the soft 404 page errors from my account and what steps I’ve taken to prevent any future occurrences of this spam.
What Worked For Me
While I’ve listed multiple solutions below, the best solution to this problem so far is to combine solution 1 and solution 2.
Solution 1: Disallow Search In Robots.Txt
After some research, I found an article on a site called Woorkup about soft 404 errors in WordPress. It appears that Brian Jackson had the same issue as me, and he offered the solution of disallowing search by adding the search-related Disallow lines to your robots.txt file, as shown below.
User-agent: *
Disallow: /wp-admin/
Disallow: /?s=
Disallow: /search/
Allow: /wp-admin/admin-ajax.php
I combined this with the disable search plugin, so I’ll update this article if it works.
January 17th Update: Success! You might be able to get away with just disallowing search via robots.txt, though I’ll need to test it out by disabling the WordPress plugin I installed first.
January 20th Update: I enabled my search box again yesterday by disabling the plugin I downloaded, and the old occurrences of the spam came back.
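If you want to sanity-check the robots.txt rules above before waiting on Google, here is a minimal Python sketch using the standard library's robots.txt parser. The example.com URLs are placeholders I made up, and Python's parser does not behave exactly like Googlebot (for instance, it takes the first matching rule rather than the most specific one), so treat the result as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# The rules from the robots.txt shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /?s=
Disallow: /search/
Allow: /wp-admin/admin-ajax.php
"""


def is_crawlable(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain, not a real site from the article.
    for url in (
        "https://example.com/?s=q82019309.com",
        "https://example.com/search/q82019309.com/",
        "https://example.com/about/",
    ):
        print(url, "->", "crawlable" if is_crawlable(url) else "blocked")
```

Both search-style URLs should come back as blocked while the ordinary page stays crawlable.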
Solution 2: Disable Search On Your Site
If your visitors never use the search function on your website, then this is probably the easiest option to consider. If this sounds like a solution you’re interested in, then download the Disable Search plugin.
January 15th Update: While this prevented any new occurrences of the spam from rearing their ugly heads in my Search Console account, the old occurrences still came back.
Solution 3: Fix WordPress’ Issue Directly
I was reading an article on the site Oxhow about soft 404 errors on WordPress. “Case 1” in the article describes our situation regarding WordPress’ internal search feature as being the cause of this issue (well, you know, other than the spammer), and the solution the author presented was to add the code below into your theme’s functions.php file.
function force_to_404()
{
    // Nothing to do when the query returned posts.
    if ( have_posts() )
    {
        return FALSE;
    }
    // No posts: send a real 404 status and flag the query as a 404.
    header( 'HTTP/1.0 404 Not Found' );
    $GLOBALS['wp_query']->is_404 = TRUE;
    return TRUE;
}
function oxh_404soft(){
    echo genesis_html5() ? '<article class="entry">' : '<div class="post hentry">';
    printf( '<h1 class="entry-title">%s</h1>', __( 'Not found, error 404', 'genesis' ) );
    echo '<div class="entry-content">';
    if ( genesis_html5() ) :
        echo '<p>' . sprintf( __( 'The page you are looking for no longer exists. Perhaps you can return back to the site\'s <a href="%s">homepage</a> and see if you can find what you are looking for. Or, you can try finding it by using the search form below.', 'genesis' ), home_url() ) . '</p>';
        // Pass false so the form is returned instead of echoed immediately.
        echo '<p>' . get_search_form( false ) . '</p>';
    else :
?>
<p><?php printf( __( 'The page you are looking for no longer exists. Perhaps you can return back to the site\'s <a href="%s">homepage</a> and see if you can find what you are looking for. Or, you can try finding it with the information below.', 'genesis' ), home_url() ); ?></p>
<?php
    endif;
    echo '</div>';
    echo genesis_html5() ? '</article>' : '</div>';
}
// Force 404 Soft Errors to 404 Not Found Errors
add_action( 'get_header', 'oxh_soft_404' );
function oxh_soft_404(){
    if ( is_search() ) {
        if ( force_to_404() ) {
            remove_action( 'genesis_before_loop', 'genesis_do_search_title' );
            add_action( 'genesis_loop', 'oxh_404soft' );
            return; // done: a real 404 is sent, so the soft 404 will not occur
        }
    }
    elseif ( is_archive() ) {
        if ( force_to_404() ) {
            add_action( 'genesis_loop', 'oxh_404soft' );
            return; // done: a real 404 is sent, so the soft 404 will not occur
        }
    }
}
The author wrote the code above for Genesis themes, though you should be able to tweak it in order to use it with other themes.
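To confirm that a fix like the one above is actually doing its job, you can check what status code an empty search page returns. The Python sketch below is only a rough illustration: the hint phrases are my own guesses at what a theme might print on an empty results page, and the function names are hypothetical, so adjust both to your site.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

# Assumed phrases a theme might print on an empty results page; edit to match yours.
EMPTY_RESULT_HINTS = ("nothing found", "no results", "not found")


def classify_response(status: int, body: str) -> str:
    """Label a response as 'hard-404', 'soft-404', 'ok', or 'other'."""
    if status in (404, 410):
        return "hard-404"
    if status == 200:
        text = body.lower()
        # A 200 page that says nothing was found is what gets reported as a soft 404.
        if any(hint in text for hint in EMPTY_RESULT_HINTS):
            return "soft-404"
        return "ok"
    return "other"


def check_url(url: str) -> str:
    """Fetch `url` and classify it; HTTP error responses are classified by status."""
    try:
        with urlopen(url, timeout=10) as response:
            body = response.read().decode("utf-8", "replace")
            return classify_response(response.status, body)
    except HTTPError as error:
        return classify_response(error.code, "")
```

Calling check_url() on one of the spammy search URLs from your own Search Console report should give "soft-404" before the fix and "hard-404" afterwards, assuming your theme's wording matches one of the hints.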
Solution 4: Change Your Search Parameters
During my research, I found someone speculated that these spammers are targeting sites that still use WordPress’ default search parameter (‘s’). A man named Cody shared the code below on Zurb.com along with instructions in the comment tags.
// Add this code to your functions.php file in your active theme folder
// Allow WordPress to access "search" in the query string
function ds_whitelist_new_search_parameter( $allowed_query_vars ) {
    $allowed_query_vars[] = 'search';
    return $allowed_query_vars;
}
add_filter( 'query_vars', 'ds_whitelist_new_search_parameter' );
// populate s parameter with value of search
function ds_swap_search_parameter( $query_string ) {
    $query_string_array = array();
    // convert the query string to an array
    parse_str( $query_string, $query_string_array );
    // if "search" is in the query string
    if ( isset( $query_string_array['search'] ) ) {
        $query_string_array['s'] = $query_string_array['search']; // replace "s" with value of "search"
        unset( $query_string_array['search'] ); // delete "search" from query string
    }
    return http_build_query( $query_string_array, '', '&' ); // Return our modified query variables
}
add_filter( 'query_string', 'ds_swap_search_parameter' );
After adding the code, you’ll need to update your search forms to use the new parameter (that is, change the name of the search input from s to search).
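To see what the swap filter above does to a query string, here is a small Python mirror of its logic. This is only an illustration of the transformation, not something to install on your site, and the function name is my own.

```python
from urllib.parse import parse_qsl, urlencode


def swap_search_parameter(query_string: str) -> str:
    """Move the value of `search` into `s` and drop `search`, like the PHP filter."""
    params = dict(parse_qsl(query_string, keep_blank_values=True))
    if "search" in params:
        params["s"] = params.pop("search")
    return urlencode(params)


if __name__ == "__main__":
    print(swap_search_parameter("search=shoes&paged=2"))
    print(swap_search_parameter("s=q82019309.com"))
```

One thing the sketch makes visible: a request that already uses s passes through untouched. So, as far as I can tell, the code adds the new parameter but does not by itself stop requests to ?s=, which is a good reason to pair it with the robots.txt rule from solution 1.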
Solution 5: Add “q82019309.com” To Your Spam Filter
I’m not much of a coder, but I’d imagine that you can set up a honey pot by blacklisting q82019309.com from your site and blocking any IP addresses that use it simultaneously. However, this is a short term strategy since, if this spam turns out to be anything like the Google Analytics referral spam issue that’s been going on for over a decade, other spammers with different websites will eventually use this method.
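As a rough idea of what the filtering part could look like, here is a minimal Python sketch that flags search queries containing a blocked domain. The blocklist and function name are hypothetical, and you would still need to wire something like this into your site or firewall yourself.

```python
from urllib.parse import unquote_plus

# Hypothetical blocklist; add other spam domains as they show up in your reports.
BLOCKED_DOMAINS = ("q82019309.com",)


def is_spam_query(query: str, blocked: tuple = BLOCKED_DOMAINS) -> bool:
    """Return True if the decoded, lowercased search query mentions a blocked domain."""
    normalized = unquote_plus(query).lower()
    return any(domain in normalized for domain in blocked)
```

Decoding and lowercasing first means that variants like Q82019309.com or a percent-encoded version are caught too, but a spammer who switches domains will still slip through until the list is updated.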
Since my ads were interrupting the code snippets I’ve provided in this article, I’ve disabled ads on this post (you’re welcome).
Update May 2017
This update is for those of you wondering if the soft 404 error spam ever came back on this site.
It did not.
The data in the screenshot above doesn’t show the soft 404 spam from January. Instead, what you’re looking at is data from the time when I was testing out some other potential solutions to this problem.
Thank you for your guideline,
The best solution for the moment is to disable the search box on your blog and disallow search in robots.txt.
There is also a plugin that can fix all the 404 page errors.
The name of the plugin : All 404 Redirect to Homepage
To download the plugin from : https://wordpress.org/plugins/all-404-redirect-to-homepage/
Thank you for your guide.
Badr
Yeah, no problem. Disabling my search box and disallowing search via robots.txt has worked for me.
That 404 plugin seems like it could be used as a temporary solution, though I personally wouldn’t use it as a long term solution as Google recommends against redirecting non-existent pages to the homepage for the reason below:
Source: https://support.google.com/webmasters/answer/181708?hl=en
Disallow: /search/
Disallow: /?
And all ok ;)
Hey, that’s a good one (especially the first one)! I’ll add it to the article.
Thanks!
i tried all but not working
Hey Drill SEO,
I’m sorry to hear that. I wonder if the soft 404 errors on your site have a different cause than the one I experienced on this site and my clients’ sites.