Ever click on a page in a search result listing and get a “404 – Page Not Found” error? It probably hasn’t happened much to you since the search engines do a fairly good job of not ranking pages with 404 errors, or even sites that have “coming soon” pages.
There are a couple of common ways you as a site owner can inadvertently generate these types of pages, and you want to make sure they are not indexed in the search engines.
The first way is probably the most common – you changed the URL and forgot to redirect the old one to the new one. So you might have changed a page from “/relevance-of-404-errors/” to “/importance-of-404-errors/”. The problem is that without permanently redirecting the old URL, it could still be visible in the search results, leading to that “404 – Page Not Found” error. Whoops.
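As a rough illustration of the redirect idea, here is a minimal Python sketch. The `REDIRECTS` table and the `resolve` function are hypothetical names invented for this example; on a real site the same rule would normally live in the web server or CMS configuration. It only shows the decision being made: a retired URL answers with a 301 (permanent redirect) pointing at its replacement, instead of falling through to a 404.

```python
# Hypothetical table of retired paths and their replacements (illustration only).
REDIRECTS = {
    "/relevance-of-404-errors/": "/importance-of-404-errors/",
}


def resolve(path, existing_paths):
    """Return (status_code, redirect_target) for a requested path."""
    if path in REDIRECTS:
        # Old URL: permanently redirect visitors and search engines to the new one.
        return 301, REDIRECTS[path]
    if path in existing_paths:
        # The page exists: serve it normally.
        return 200, None
    # Nothing lives here and there is no replacement: a genuine 404.
    return 404, None
```

With this in place, a request for the old “/relevance-of-404-errors/” path is answered with a 301 and the new location, so the old listing in the search results still leads somewhere useful.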
The second way is when you simply remove pages from your website, not realizing the pages are still indexed in Google or other search engines. This is common with special promotional pages for marketing, or landing pages you might be temporarily using for paid search efforts.
The ideal 404 response:
In the examples that follow, /abc.html, /pqr.html and /xyz.html stand for pages that don’t exist on your site.
There are two components to this:
1. Search engine component: To avoid the SEO implications of 404 errors (which we will discuss below), make sure that when a page that doesn’t exist is requested, the web server returns a ‘404 not found’ status code in the response header.
2. Usability component: The browser should preferably render a custom 404 page. From a user’s perspective, once we reach a page that doesn’t exist, there should be a way to get back to the main page without hitting the back button.
If your domain doesn’t handle number 1, you risk running into duplicate content issues. The reason: if the server doesn’t return a ‘404 not found’, you are giving the search engine a green light to index the page. And since the same page is displayed whenever anyone requests a URL that doesn’t exist on your domain (theoretically, infinite variations are possible), that same page can be indexed under multiple non-existent URLs. This is a duplicate content issue, and the search engine could possibly put a small red flag on your site. Something you definitely want to avoid.
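To make the two components concrete, here is a small sketch using Python’s built-in `http.server` module. The `PAGES` table, the `NOT_FOUND_HTML` markup, and the `SiteHandler` and `make_server` names are assumptions made up for this example, not a recommendation for production hosting. The point is simply that the custom page and the ‘404 not found’ status code travel together in the same response.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content: path -> HTML body (illustration only).
PAGES = {"/": "<h1>Home</h1>"}

# Custom error page that offers a way back to the main page.
NOT_FOUND_HTML = (
    "<h1>404 - Page Not Found</h1>"
    '<p><a href="/">Back to the home page</a></p>'
)


class SiteHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGES.get(self.path)
        status = 200
        if body is None:
            # Unknown URL: show the friendly page AND send the 404 status.
            status = 404
            body = NOT_FOUND_HTML
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the example quiet; no per-request logging


def make_server(port=0):
    """Create the example server on localhost (port 0 picks a free port)."""
    return HTTPServer(("127.0.0.1", port), SiteHandler)
```

Calling `make_server(8000).serve_forever()` would serve the home page at “/” with a ‘200 OK’, while /abc.html, /pqr.html and /xyz.html would each get the custom page together with a 404 status.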
The 404 myth:
The most common case is when someone thinks they have a valid 404 because they have a custom 404 page, but their server is not actually returning a ‘404 not found’. A common scenario looks like this: the server returns a ‘200 OK’ for /abc.html, /pqr.html and /xyz.html, and shows the same custom 404 page for each. In this case we are giving the search engine a green light to index the 404 page (which we don’t want) under all three URLs: a potential duplicate content issue.
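One way to catch this situation is to request a URL that cannot possibly exist and look at the status code that comes back. The sketch below assumes you supply your own `fetch_status` callable (a hypothetical hook that takes a path and returns the HTTP status code your site gives for it); `probe_path` and `returns_real_404` are likewise names invented for this example.

```python
import uuid


def probe_path():
    """Build a random path that is very unlikely to exist on any site."""
    return "/" + uuid.uuid4().hex + ".html"


def returns_real_404(fetch_status):
    """Return True if the site answers a made-up URL with a 404 status.

    fetch_status is a caller-supplied function: path -> HTTP status code.
    """
    return fetch_status(probe_path()) == 404
```

If `returns_real_404` comes back `False` because the made-up URL received a ‘200 OK’, the custom page is being served without the 404 status, which is exactly the myth described above.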
How to check for 404’s:
Run an analysis of your site (it takes 30 seconds) on our Free Website Analyzer; it identifies 404 errors among other SEO factors.
There is a really useful Firefox plugin called ‘Live HTTP Headers’ where you can check the status code in the response header to see if it’s a ‘404 not found’.
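If you would rather check from a script, the same status code can be read with Python’s standard library. This is only a sketch: `status_of` is a name made up for this example, and it assumes the server answers HEAD requests the same way it answers normal page requests, which is usual but not guaranteed.

```python
import http.client
from urllib.parse import urlsplit


def status_of(url):
    """Return the HTTP status code a server sends for the given URL."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # HEAD asks for the headers only, which is all we need for the status.
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()
```

Because it uses `http.client` directly, redirects are not followed, so a 301 on an old URL shows up as 301 rather than as the status of the page it points to.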