Noindex, Robots.txt, and Canonicals: The Hidden Indexing Blockers
May 23, 2026 · 5 min read
The short answer
Three technical signals keep pages out of Google: a meta noindex (or X-Robots-Tag) blocks indexing outright, a robots.txt disallow blocks crawling (the URL can still be indexed with no content), and a rel=canonical pointing elsewhere tells Google to index the other URL instead. Find each one, remove the wrong one, then resubmit the page with URL Indexer.
If a page will not get indexed, the cause is often a single line of code you forgot was there: a noindex tag, a robots.txt disallow, or a canonical pointing at a different URL. These three blockers are silent, they ship with most CMS templates, and they override any indexing request you send. Before you spend hours guessing, check them first, fix the wrong one, then resubmit the corrected page with URL Indexer's free indexing tool. This guide explains exactly what each blocker does, because they do very different things and people mix them up constantly.
Here is the key distinction up front. Crawling, rendering, and indexing are separate steps. Crawling is Google fetching the page. Rendering is Google processing what it fetched, including any JavaScript. Indexing is Google storing the page so it can show in results. A noindex tag stops indexing. A robots.txt disallow stops crawling. A canonical points indexing credit at another URL. Confuse them and you will apply the wrong fix.
What does the noindex tag actually do?
A noindex tag tells Google to keep a page out of its index even after crawling it, and Google obeys it reliably. It appears in one of two places: a meta robots tag in the page head, written as <meta name="robots" content="noindex">, or an X-Robots-Tag in the HTTP response header, which does the same thing and also works for non-HTML files like PDFs. This is the most decisive blocker of the three. If Google can crawl the page and sees noindex, the page will not be indexed, full stop.
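As a rough illustration of the two places a noindex can live, here is a minimal Python sketch that checks an HTML string and a dictionary of response headers. The names `RobotsMetaParser` and `has_noindex` are hypothetical helpers invented for this example, and the sketch assumes you have already fetched the page and its headers yourself. It also treats the `none` value as a noindex, since `none` is shorthand for `noindex, nofollow`.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            # content looks like "noindex, follow": split it into single directives
            self.directives.extend(
                part.strip().lower() for part in attr.get("content", "").split(",")
            )


def has_noindex(html, headers):
    """Return True if the HTML or the response headers carry a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    if "noindex" in parser.directives or "none" in parser.directives:
        return True
    # Header names are case-insensitive, so compare them in lowercase.
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            tokens = [t.strip().lower() for t in value.split(",")]
            if "noindex" in tokens or "none" in tokens:
                return True
    return False
```

This is a simplification: it ignores X-Robots-Tag values scoped to one crawler (for example a `googlebot:` prefix) and any tag injected by JavaScript after the page loads, so a clean result here does not guarantee Google sees no noindex.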
Accidental noindex is the single most common reason a healthy page never appears in search. WordPress has a "Discourage search engines" checkbox that injects site-wide noindex, often left on after a launch. SEO plugins, staging environments copied to production, and theme defaults all add it too. The fix is to remove the tag, then trigger a recrawl so Google sees the change.
Does robots.txt stop a page from being indexed?
No, robots.txt blocks crawling, not indexing, and that surprises people. A Disallow rule in robots.txt tells Google not to fetch the page content. But Google can still index the URL itself if other pages link to it, showing it in results with no description and a note that says the content cannot be shown. So robots.txt is the wrong tool for keeping a page out of the index. It is the right tool for saving crawl budget on low-value sections like faceted filters or internal search results.
The reverse problem is more damaging. An overly broad Disallow rule can block crawling of pages you do want indexed. A single line like Disallow: / blocks your entire site. Disallow: /blog/ blocks every post under that path. New pages then never get crawled, so they never get indexed, and the report in Search Console will say the page is blocked by robots.txt.
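To see how a broad Disallow rule catches pages you did not mean to block, the sketch below feeds robots.txt text to Python's standard-library `urllib.robotparser`. The function name `is_crawl_blocked` and the example.com URLs are assumptions made for illustration. The standard-library parser is not guaranteed to match Googlebot's rule matching exactly (wildcards and rule precedence in particular), so treat its answer as a first check, not a verdict.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_blocked(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text disallows crawling of url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network fetch is needed
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


rules = "User-agent: *\nDisallow: /blog/\n"
print(is_crawl_blocked(rules, "https://example.com/blog/my-post"))
print(is_crawl_blocked(rules, "https://example.com/about"))
```

With these rules, the first URL sits under /blog/ and is reported as blocked, while the second is not.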
| Signal | Blocks crawling? | Blocks indexing? | Use it to |
|---|---|---|---|
| Meta noindex / X-Robots-Tag noindex | No | Yes | Keep a crawlable page out of results |
| robots.txt Disallow | Yes | No (URL can still index) | Save crawl budget on low-value paths |
| rel=canonical to another URL | No | Not directly (a hint to index the canonical URL instead) | Consolidate duplicate or near-duplicate pages |
How do canonical tags affect indexing?
A rel=canonical tag tells Google which URL is the preferred version of a page, and Google usually indexes that canonical instead of the page it found. It looks like <link rel="canonical" href="https://example.com/preferred"> in the page head. When the canonical points to the same page, it is harmless and correct. The problem is when a page canonicalizes to a different URL by mistake, because you are then telling Google to index the other page and ignore this one.
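One way to spot a canonical that points elsewhere is to pull the href out of the page source and compare it to the URL you are checking. The Python sketch below does that; `CanonicalParser`, `canonical_status`, and the return values "self", "other", and "missing" are names invented for this example. The comparison is deliberately strict and is only an approximation of how Google compares URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = {k: (v or "") for k, v in attrs}
        if "canonical" in attr.get("rel", "").lower().split() and attr.get("href"):
            self.canonical = attr["href"]


def _normalize(url):
    # Scheme and host are case-insensitive; an empty path means "/".
    parts = urlsplit(url)
    return (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query)


def canonical_status(page_url, html):
    """Return "missing", "self", or "other" for the page's declared canonical."""
    parser = CanonicalParser()
    parser.feed(html)
    if parser.canonical is None:
        return "missing"
    # Resolve relative hrefs such as "/preferred" against the page URL.
    target = urljoin(page_url, parser.canonical)
    return "self" if _normalize(target) == _normalize(page_url) else "other"
```

A result of "other" is not automatically a mistake. It only flags that the page asks Google to index a different URL, which you then have to judge as intended or not.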
This happens more than you would expect. A migrated site can carry over canonicals pointing at the old domain. A template can hardcode the homepage as the canonical for every page. Pagination or parameter pages can all point at page one. The result shows up in Search Console as "Alternate page with proper canonical tag" or "Duplicate, Google chose different canonical than user." The first means your tag is working as written, which may or may not be what you intended. The second means Google overrode your tag, usually because it judged the pages too similar.
How do you find which blocker is hitting a page?
Start with the page source and Google Search Console, in that order. The fastest manual check is to view the page source and search for the three signals. Then confirm with Search Console, which tells you what Google actually decided rather than what your code says.
1. Open the live page, view source, and search for "noindex" in the head. Also check the HTTP response headers for X-Robots-Tag (browser dev tools, Network tab).
2. Search the source for "canonical" and read the href. Confirm it points to the URL you actually want indexed.
3. Visit yoursite.com/robots.txt and look for any Disallow rule whose path matches the page.
4. In Search Console, run the page through the URL Inspection tool. It reports crawl status, the user-declared and Google-selected canonical, and whether indexing is allowed.
5. Cross-reference with the page indexing report in Search Console to see the exact reason Google assigned to the URL.
If you are working through a broader case where a page simply will not index, the full diagnostic checklist in why is my page not indexed walks every cause in order, not just these three technical ones.
What do you fix, and then what?
Fix the one signal that is wrong, leave the others alone, then prompt Google to recrawl. Match the fix to the symptom so you do not introduce a new problem.
- Accidental noindex: remove the meta tag or X-Robots-Tag header. In WordPress, uncheck "Discourage search engines" under Settings > Reading.
- Over-broad robots.txt block: remove or narrow the Disallow rule so the page can be crawled. Keep blocking truly low-value paths.
- Wrong canonical: change the href to point at the page itself (self-referencing) or to the genuinely preferred URL.
- Page you want gone: keep it crawlable and add noindex. Do not block it in robots.txt, because Google cannot see a noindex on a page it is not allowed to crawl, and the URL may stay indexed.
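The fixes above can be read as a small decision table. The Python sketch below encodes that reading; `recommend_fix` and its parameters are hypothetical names for this example, and the `canonical` argument is assumed to be one of "self", "other", or "missing", worked out by whatever check you prefer. It is an illustration of the matching logic, not a complete diagnostic.

```python
def recommend_fix(want_indexed, has_noindex, crawl_blocked, canonical):
    """Map the three signals to the fixes listed above.

    canonical is "self", "other", or "missing" (an assumption of this sketch).
    """
    fixes = []
    if want_indexed:
        if crawl_blocked:
            fixes.append("remove or narrow the robots.txt Disallow rule")
        if has_noindex:
            fixes.append("remove the noindex tag or X-Robots-Tag header")
        if canonical == "other":
            fixes.append("point the canonical at the page itself or the preferred URL")
    else:
        # To drop a page from results, Google must be able to crawl it and see noindex.
        if crawl_blocked:
            fixes.append("unblock the page in robots.txt so Google can see the noindex")
        if not has_noindex:
            fixes.append("add a noindex tag")
    return fixes


print(recommend_fix(want_indexed=True, has_noindex=True, crawl_blocked=False, canonical="self"))
```

An empty list means none of the three signals explains the problem, and the cause lies elsewhere.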
After fixing, you have to tell Google to look again, because Google may take days or weeks to recrawl a corrected page on its own. Submit the URL with URL Indexer to send a fresh indexing-request signal, then watch the per-batch status page to confirm Google revisits and the page moves to indexed. Google still makes the final call, and confirmed indexing can take days to a couple of weeks, but resubmitting after a fix is what shortens the wait. If you want to verify the current state first, here is how to check if a URL is indexed.
Frequently asked questions
Does robots.txt prevent a page from being indexed?
No. Robots.txt blocks Google from crawling a page's content, but the URL can still be indexed if other pages link to it, appearing in results with no description. To keep a page out of the index, use a meta noindex tag and leave the page crawlable so Google can see it.
Why is my page noindexed when I never added a noindex tag?
Most accidental noindex tags come from a CMS or plugin, not from you. WordPress has a "Discourage search engines" setting that injects site-wide noindex, and SEO plugins, themes, and staging environments add it too. View the page source, search for "noindex," and remove the source that injected it.
What does "Alternate page with proper canonical tag" mean in Search Console?
It means the page declares a canonical pointing to a different URL, so Google indexed that other URL instead of this one. This is correct behavior if the canonical is intentional. If you wanted this page indexed on its own, change the canonical to point to itself.
If I remove a noindex tag, will Google index the page automatically?
Eventually, but not quickly on its own. Google has to recrawl the page to see the tag is gone, which can take days or weeks. Resubmitting the URL after the fix sends a fresh indexing-request signal so Google revisits sooner, though it still decides whether to index.
Can a canonical tag and a noindex tag conflict?
Yes, and you should avoid combining them on the same page. A canonical asks Google to consolidate signals to the canonical URL, while noindex asks Google to drop the page entirely, which sends mixed instructions. Pick one based on your goal: canonical to consolidate duplicates, noindex to remove a page from results.