How to Fix Index Coverage Issues in Google Search Console

Google may be the ultimate arbiter of whether an online business succeeds or fails, but they aren’t making their decisions haphazardly. They set forth guidelines for how sites should function for accessibility and usability, and they provide a wealth of tools to help site owners diagnose and fix any issues that can crop up.

One of those tools is the Google Search Console (formerly Webmaster Tools), which provides dozens of different kinds of reports about pretty much anything relevant to Google’s analysis of your site. Among the many different reports they provide is the Index Coverage report.

The index coverage report is pretty important. So much so that if pages end up with errors on them, Google will even send you an email letting you know and asking you to fix them.

So, if you’ve received that kind of email, or you’re just concerned about issues you see in your report, here’s how to handle it.

What is the Index Coverage Report?

The index coverage report is a simple report that shows the number of URLs on your site that are known to Google.

These URLs are divided into four categories:

  • Valid. These are pages that have no issues, are indexed just fine, and are not causing problems.
  • Excluded. These are pages that don’t need to be indexed, like 404 pages, pages with redirects, and non-canonical URLs.
  • Valid, With Warnings. These are pages that may have issues, but the issues aren’t enough to prevent indexation, so they’re still indexed.
  • Errors. These are pages that have an error on them preventing indexation for one reason or another.

Each type has sub-categories. For example, “excluded” pages can be 404 pages, pages with redirects on them, pages specifically excluded with a “noindex” tag, and pages that have a canonical URL pointing to a different version.
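
For reference, here's what those two signals look like in a page's HTML head (the URL is a placeholder, and a page would normally carry one or the other, not both):

    <head>
      <!-- Tells Google not to add this page to its index -->
      <meta name="robots" content="noindex">
      <!-- Tells Google a different URL is the preferred version of this page -->
      <link rel="canonical" href="https://www.example.com/preferred-page/">
    </head>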

There are three “phases” to Google indexing and ranking a webpage. Just because Google knows about your site (and the pages on it) doesn’t mean they’ve added it to their index and given it a rank.

  • Discovery. Google uses things like internal and external links, sitemaps, and live URL tests to discover pages they might potentially want to add to their index.
  • Crawling. Google visits the URL and does a deep-dive analysis, checking for technical errors and for issues that might violate content or SEO policies.
  • Indexing. This is where Google breaks down the page into component elements, does language analysis, and figures out where it would slot into the search results for various queries. Only after this is complete can your pages show up in search results.

To view your index coverage report, simply log into your Google Search Console account, choose your website property, and click the Index subheading on the left, then the Coverage report. On the report, you can click each of the four categories to see how many URLs on your site fall into each and scroll down to a table further dividing them into sub-categories.

What to Look For in the Index Coverage Report

What should you look for, and what should be concerning in an index coverage report?

First and most obviously, errors. Any page that has a glaring error on it should be diagnosed and fixed or noindexed, so it doesn’t show up as an error but as an excluded page instead.

The different kinds of errors include:

  • Server Errors. If Google tries to index your page and the server is down, that’s obviously a problem. You’ll need to talk to your web host about this one.
  • Redirect Errors. Redirects are common online, but sometimes they can end up in long chains, chains that break, or loops (see the sketch after this list for a quick way to trace a chain). Fixing these is important for pages you need to keep available.
  • Submitted URL Blocked. If you have a page in your sitemap, but your robots.txt file blocks access to the page, this shows up as an error. Remove one of the two, depending on whether or not you want the page to be found.
  • Submitted URL Marked "noindex". Same as above, except the block is a noindex directive in the page metadata or an HTTP header.
  • 4XX Errors. Soft 404s, hard 404s, 401s, 403s, and other 4XX issues all cause errors you'll need to fix if you want the pages to be visible.
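
One quick way to trace a redirect chain is from the command line with curl (the URL here is a placeholder):

    # -I fetches headers only, -L follows redirects, -s hides progress output
    curl -sIL https://www.example.com/old-page

Each hop prints its status code and Location header in order, so a long chain, a broken hop, or a loop is immediately obvious.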

Once you’ve fixed pages with errors on them – or noindexed them – you can move on to warnings.

Pages with warnings fall into two categories. The first is pages that are blocked with robots.txt directives but are still indexed because another website linked to them. If you want the page to be visible, remove it from robots.txt. If you want it blocked, remove it from robots.txt and add a noindex tag to the page itself.
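
If your server is Apache with mod_headers enabled (an assumption; other servers have equivalents), the noindex signal can also be sent as an HTTP header from .htaccess, which works even for non-HTML files like PDFs:

    # Send a noindex header for all PDF files
    <FilesMatch "\.pdf$">
      Header set X-Robots-Tag "noindex"
    </FilesMatch>

Either way, remember that Google has to be able to crawl the page to see the directive, which is why the robots.txt block has to come out.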

The second is pages that are indexed without content on them.

Either the page is empty, or the content is somehow blocked or cloaked from Google. For example, content generated solely by scripts may not render when Google visits. You'll want to, again, either noindex the page or make the content visible to Google.
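
A quick sanity check is to fetch the raw HTML and look for your content (placeholder URL and text):

    # If the text doesn't appear in the raw HTML, it's probably
    # being injected by JavaScript after the page loads
    curl -s https://www.example.com/some-page | grep -i "your headline text"

Google can render JavaScript, but rendering is slower and less reliable than plain HTML, so critical content should appear in the initial response.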

Next, you can dig into your excluded pages.

This is where most of the work will happen since sites usually don’t run into actual errors outside of gross misconfigurations, but exclusions are extremely common. So, rather than cover it here, we’ll cover it in the next section.

Finally, valid pages are fine to ignore; they're indexed and have no issues.

You may see some that say "Indexed, not submitted in sitemap," which simply means the page isn't in your sitemap. This usually happens when your sitemap regenerates more slowly than you publish, so Google finds content published after the last sitemap update. That's not a problem, and it usually goes away once the pages are added to the sitemap.
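
For context, each page in an XML sitemap is just a <url> entry like the one below (placeholder URL and date); this status simply means Google found the page before such an entry existed for it:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://www.example.com/new-post/</loc>
        <lastmod>2022-08-01</lastmod>
      </url>
    </urlset>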

You can check to make sure these pages are pages you want to be indexed, and if they aren’t, you can remove them from the index. This may be relevant in the case of Attachment pages, Tag pages, or System pages, but usually, a good configuration will keep those pages hidden anyway.

Handling Excluded Pages

Pages that are excluded from the index for one reason or another are all lumped together, but there are actually a ton of different reasons why they may be excluded. Sometimes they’re valid reasons, and sometimes they’re issues you want to fix. What are the different exclusions, and how can you handle them?

1. Excluded by NoIndex Tag

If the page has a noindex tag on it, it won’t be indexed. If you don’t want the page indexed, that’s fine.

If you do want the page indexed, you’ll need to find and remove the tag and request indexation.

2. Blocked by Page Removal Tool

Google provides a URL Removal Tool designed as a quick way to pull sensitive content, pages that were hit by a hacker, or anything else you don't want indexed out of the search results.

You submit the URL for removal, and Google purges it from their index and removes it from search results… temporarily. The removal request expires after 90 days. Unfortunately, this tool is often misused; you can read more about it in Google's documentation.

3. Blocked by Robots.txt

Robots.txt tells Google which pages it shouldn't crawl. This keeps pages out of the index as long as external links don't lead Google to them anyway.

It's pretty easy to verify that everything in your robots.txt file belongs there and to remove any rules blocking pages you want indexed.
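
As a minimal sketch (the paths are hypothetical), the audit is just a matter of reading through rules like these and asking whether each blocked path should really be blocked:

    # Applies to all crawlers
    User-agent: *
    # Probably fine: internal search results add no value to the index
    Disallow: /search/
    # Worth questioning: is anything under /resources/ meant to rank?
    Disallow: /resources/
    Sitemap: https://www.example.com/sitemap.xml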

4. Blocked by 401

The 401 error is “unauthorized access.” If Google tries to access a page that requires a login, for example, they’ll get an unauthorized access error.

This is usually fine, which is why it's an exclusion rather than an error; these are typically pages that require a login or payment to see, or that you just don't want the public to access at all.

5. Discovered or Crawled but Not Indexed

These are two different statuses ("Discovered - currently not indexed" and "Crawled - currently not indexed") that mean Google got partway through the process but stopped for one reason or another.

Usually, they’ll re-check the URL and index it eventually, so you can leave these alone unless your pages persist in this state for weeks or months.

6. Canonicalization

When duplicate content appears, Google will look for a canonical URL in your page’s metadata.

You’ll end up with four possibilities.

  • Alternate page with proper canonical URL. The URL in the report has a canonical tag pointing to a different URL, and that different URL is the one that's indexed. This is fine, and how it's supposed to work.
  • Duplicate without user-selected canonical URL. This means you have duplicate pages and haven't marked a canonical version. You'll want to go through and specify canonical URLs.
  • Duplicate, Google chose a different canonical URL than you did. If you have two identical pages, A and B, and you put a canonical tag on B saying that A is canonical, but Google thinks B should be canonical, the URL ends up here. Usually, you'll want to swap your canonicalization to match unless you have a good reason not to (like Google somehow choosing the HTTP version over the HTTPS version as canonical).
  • Duplicate, submitted URL not canonical. This happens when you submit a URL as canonical, but Google already has a different one marked as canonical, and they think they know better than you, so the one you submitted ends up here. Not usually an issue, just more canonicalization to sort out.

Canonicalization can be pretty tricky to manage, and sometimes Google gets stuck on choosing the wrong pages. That's why these are all marked as non-errors: Google will eventually sort it out, but it can take some wrangling.
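
To make the A-and-B scenario above concrete, the declaration is a single tag in the duplicate page's head pointing at the version you want indexed (hypothetical URLs):

    <!-- Placed on https://www.example.com/page-b/ -->
    <link rel="canonical" href="https://www.example.com/page-a/">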

7. 404 Pages

If someone links to a page that doesn’t exist, that URL will show up here as a 404 page.

Unless it’s a URL you thought existed and should exist, you can mostly ignore this. It’s working as intended.

8. Pages with Redirects

If a page has a redirect on it, that page shouldn’t be indexed since users can’t land on it anyway. Again, working as intended.

One thing that may be concerning: you might see some very important pages here, like your homepage, and wonder why they aren't indexed. The key is in the details, though. Did you know that:

  • https://www.example.com
  • http://www.example.com
  • https://example.com
  • http://example.com

…are all the same page? This is one page, your homepage, with four URLs assigned to it. To Google, that means four duplicate pages. This isn't a duplicate content issue (Google is smart enough to know what's going on here); they simply pick the version you want as canonical and index it while putting the other three in exclusions.

The canonical version should usually be the HTTPS version, and it's up to you whether you want the www or not. The other versions should be excluded to avoid duplicates in the index. Since you should have a .htaccess rule (or your server's equivalent) redirecting users to the right one, they're categorized as redirects.
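
As a sketch, assuming an Apache server with mod_rewrite and a preference for the https://www version, the consolidation rule in .htaccess might look like this:

    RewriteEngine On
    # Permanently (301) redirect any request that isn't already
    # https://www.example.com to that version
    RewriteCond %{HTTPS} off [OR]
    RewriteCond %{HTTP_HOST} !^www\. [NC]
    RewriteRule ^(.*)$ https://www.example.com/$1 [L,R=301]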

What Should You Do Next?

The biggest thing to be concerned about with the index coverage report is actual errors preventing indexation of important pages. Once you’ve fixed those, you can address any warnings. These should all be relatively simple issues to fix and are usually just the result of minor misconfiguration.

From there, you may want to export a CSV of all of your excluded URLs and figure out if any of them need attention. 90% of them won't, but now and then, you might find a few pages that have fallen through the cracks. In those cases, it's important to sort them out.

Do you have a specific issue you can’t seem to solve? If so, let us know in the comments, and we’ll be happy to see if we can give you some tips.

David Curtis

David Curtis is the founder and CEO of Blue Pig Media. With twenty years of successful execution in sales, marketing, and operations, for both clients and vendors, he has a bottom-line, ROI-driven mentality rooted in metrics-driven performance across highly competitive global corporate initiatives.
