Field notes
Fix Shopify Indexation Issues: The 2026 Triage Guide
August 27, 2025
The average Shopify store we audit in 2026 has roughly 4,200 URLs submitted in the sitemap and only 1,100 of them actually indexed in Google. That is a 74 percent gap between what the store is asking Google to rank and what Google is willing to rank. The gap is almost never a content quality problem. It is an indexation hygiene problem, and on Shopify it comes from the same five or six places every single time.
This guide walks through exactly where the bloat comes from, how to find it inside Google Search Console, and how to fix each cause without breaking anything that is currently ranking. No fluff, no theory. If you run a Shopify store and your Coverage report looks noisy, this is the triage sequence.
TL;DR
- Shopify auto-generates dozens of URL variants per product and collection, and most of them should not be in Google's index.
- The five traps that cause 90 percent of Shopify indexation waste are filter URLs, tag pages, search pages, cart and account pages, and duplicate product routes.
- You cannot edit robots.txt the way you can on WordPress, but as of 2026 you can override it with robots.txt.liquid.
- Fix canonicals and noindex rules in theme templates, then use GSC's URL Inspection tool to validate one URL from each template before assuming a sitewide fix worked.
What Shopify indexes by default
Before you fix anything, understand what Shopify is actually telling Google to do. Out of the box, a Shopify store generates URLs from several sources:
Products. Every product has a canonical URL at /products/{handle}. But the same product is also reachable through /collections/{collection-handle}/products/{handle} for every collection it belongs to. A single product in five collections is six URLs from Google's point of view. Shopify handles this correctly by default: the <link rel="canonical"> on every collection-prefixed product URL points back to the /products/{handle} version. This is the one thing Shopify gets right without configuration.
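To make that concrete, take a hypothetical product with the handle blue-tee that sits in a summer-sale collection (handles and domain invented for illustration). The collection-prefixed page at /collections/summer-sale/products/blue-tee renders a head tag like:

<link rel="canonical" href="https://example-store.com/products/blue-tee">

so Google consolidates the collection-prefixed copies onto the bare product URL.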
Collections. Every collection has a URL at /collections/{handle}. So far so good. But collections also generate filter URLs (?filter.v.option.color=red), sort URLs (?sort_by=price-ascending), pagination URLs (?page=2), and tag URLs (/collections/{handle}/{tag}). Each of these can be crawled and, in some cases, indexed.
Pages and blog posts. These are straightforward. /pages/{handle} and /blogs/{handle}/{post-handle} are canonical by default.
The noise. Shopify also generates URLs for cart (/cart), checkout (/checkouts/...), customer account (/account, /account/login, /account/register), search (/search?q=...), and the catch-all /collections/all. Some of these are blocked in the default robots.txt. Some are not. Some have noindex meta tags, some do not. This inconsistency is where most of the trouble starts.
The mental model to hold: Shopify generates URLs aggressively, canonicalizes some of them correctly, noindexes some of them correctly, and leaves the rest as a mess for you to clean up. Your job is to find the mess.
The five common indexation traps
Call this "The 5-trap triage." Every Shopify store we audit has at least three of these live in production. Fix them in this order because the gains compound.
Trap 1: Duplicate collection filter URLs
This is the single biggest cause of Shopify indexation bloat. When a shopper clicks "Color: Red" on a collection page, Shopify appends ?filter.v.option.color=red to the URL. If the theme renders anchor tags for these filters (most do), Googlebot will crawl every combination: color, size, price range, availability, plus every combination of combinations.
A collection with 4 filter categories and 5 options per category generates roughly 3,000 crawlable URLs. (If each category can be unset or set to one of its five values, that is 6^4 = 1,296 filter states before sort and pagination parameters multiply the count further.) Multiply by 20 collections and you have 60,000 URLs from a catalog of 500 products.
Default Shopify behavior. These URLs have a canonical tag pointing to the unfiltered collection page. In theory Google should consolidate them. In practice, Google crawls them all, wastes your crawl budget, sometimes indexes them anyway, and flags "Alternate page with proper canonical tag" in Coverage. Every URL Google crawls is crawl budget spent not indexing your real pages.
The fix. Add a noindex meta tag to any collection URL with filter parameters, and block the ?filter.* parameter in robots.txt.liquid. We cover the exact code in the robots.txt section below.
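A minimal sketch of the noindex half, assuming your theme uses Shopify's Storefront Filtering (so collection.filters is populated) and that the snippet sits in the head of theme.liquid. Price-range filters expose their active state through min_value and max_value rather than active_values, so extend the check if you use them:

{% if request.page_type == 'collection' %}
  {% comment %} Render noindex only when at least one storefront filter value is active {% endcomment %}
  {% assign filters_active = false %}
  {% for filter in collection.filters %}
    {% if filter.active_values.size > 0 %}
      {% assign filters_active = true %}
    {% endif %}
  {% endfor %}
  {% if filters_active %}
    <meta name="robots" content="noindex, follow">
  {% endif %}
{% endif %}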
Trap 2: Tag pages
Shopify generates a URL for every product tag you use. If you tag a product "summer-sale," Shopify creates /collections/{handle}/summer-sale as a filtered collection view. Tag pages are thin, usually duplicate content, and they multiply fast. A store with 50 tags and 20 collections has up to 1,000 tag pages, most of them nearly empty or duplicating the parent collection.
Default Shopify behavior. Tag pages are fully indexable. No canonical override, no noindex. Shopify treats them as real pages.
The fix. Add a conditional noindex in theme.liquid for any collection page that has a tag active. The Liquid variable current_tags will be truthy on tag-filtered collection URLs. Tag pages almost never deserve to rank, and the ones that might deserve to rank (like a well-curated "best-sellers" view) should be built as proper collections instead.
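A minimal sketch, placed in the head of your theme's layout file (theme.liquid in most themes):

{% if current_tags %}
  {% comment %} Tag-filtered collection (and blog tag) views should not be indexed {% endcomment %}
  <meta name="robots" content="noindex, follow">
{% endif %}

current_tags is nil on an unfiltered collection, so the tag only renders on tag-filtered views; it is also set on blog tag pages, which usually deserve the same treatment.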
Trap 3: Internal search result pages
Shopify's search URL is /search?q={query}. Every search your visitors run is a crawlable URL. If you have any internal links that point to search URLs (some themes do this with "popular searches" widgets), Google will follow them.
Default Shopify behavior. Search pages are not blocked in the default robots.txt. Shopify does render a <meta name="robots" content="noindex"> on the search template in most themes, but not all, and not consistently. Third-party search apps (like Searchanise or Boost) often override the default template and lose the noindex in the process.
The fix. Verify the noindex is on your search template. Open templates/search.liquid (or the search section file) and confirm. If you use a search app, check the app's generated template too. Then disallow /search in robots.txt.liquid for good measure. Both layers.
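If the tag turns out to be missing, this is roughly what to add, either inside the search template itself or conditionally in theme.liquid (a sketch, not theme-specific code):

{% if request.page_type == 'search' %}
  {% comment %} Internal search results should never be indexed {% endcomment %}
  <meta name="robots" content="noindex">
{% endif %}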
Trap 4: Cart, checkout, account pages
These should never be indexed. They have no SEO value, they contain user-specific content, and they pollute your site's quality signals if Google sees a bunch of thin, transactional-UX pages.
Default Shopify behavior. The default robots.txt blocks /cart, /orders, /checkout, /checkouts/, /carts, /account, and a few others. This is correct. The problem is that some themes add internal links to these pages with crawlable anchors (for example, a "View Cart" link in the header that Googlebot can follow), and some apps inject new account-related URLs that are not in the default blocklist.
The fix. Run a crawl of your site (Screaming Frog, Sitebulb, or Ahrefs Site Audit) and filter for URLs containing cart, account, checkout. Any you find that are not in the default blocklist, add them to robots.txt.liquid.
Trap 5: Duplicate product URLs through collection prefixes
We mentioned this above: the same product is reachable through /products/{handle} and /collections/{collection-handle}/products/{handle}. Shopify canonicalizes correctly by default, but some stores break this by accident.
How it breaks. Custom theme modifications sometimes overwrite the canonical tag with the current URL. A developer removes {{ canonical_url }} from theme.liquid and replaces it with {{ request.path }} or similar. Now every collection-prefixed product URL is self-canonical, and you have the same product indexed four or five times under slightly different URLs.
The fix. In theme.liquid, confirm the canonical tag uses Shopify's built-in canonical_url object:
<link rel="canonical" href="{{ canonical_url }}">
That object returns /products/{handle} even when the current URL is /collections/{x}/products/{handle}. Do not replace it with anything custom.
Canonical rules on Shopify
Canonicals on Shopify follow predictable rules once you understand them.
Rule 1: The canonical_url object is your friend. Use it. Do not overwrite it. On product pages it returns the bare product URL. On collection pages with filters it returns the unfiltered collection URL. On paginated collections it returns the current page (which is correct behavior, since paginated collections are distinct).
Rule 2: Filters canonicalize up. ?filter.v.option.color=red canonicalizes to the base collection URL. ?sort_by=price-ascending also canonicalizes up. Shopify handles this automatically.
Rule 3: Pagination is self-canonical. /collections/shoes?page=2 canonicalizes to itself, not to page 1. This is correct. If you canonicalize paginated pages to page 1, Google will drop page 2 onwards from the index, and if you have long collections, products on deeper pages will lose internal link equity.
Rule 4: Tag pages do not canonicalize up. This is the big one. A tag URL like /collections/shoes/summer self-canonicalizes. It does not point back to /collections/shoes. You have to fix this manually with noindex (from Trap 2) or with a custom canonical override in your collection template.
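If you prefer the canonical route over noindex, here is a sketch of a head override; it assumes collection.url returns the unfiltered collection path and shop.url the store's primary domain, and it falls back to Shopify's default everywhere else:

{% if current_tags and collection %}
  {% comment %} Point tag-filtered views back at the parent collection {% endcomment %}
  <link rel="canonical" href="{{ shop.url }}{{ collection.url }}">
{% else %}
  <link rel="canonical" href="{{ canonical_url }}">
{% endif %}

Pick one approach per template; combining noindex with a cross-URL canonical sends Google conflicting signals.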
Rule 5: Custom theme edits break everything. If a developer has modified theme.liquid in the last two years, assume the canonical tag is broken and verify manually.
robots.txt on Shopify (what you can and can't change)
Until mid-2021, Shopify's robots.txt was a locked file. You got what Shopify gave you and that was that. Since then, you can override it by creating a template called robots.txt.liquid in your theme's templates directory.
What you can do.
- Add new Disallow rules for URL patterns Shopify does not block by default.
- Add User-agent rules for specific bots (block AhrefsBot, allow Googlebot, etc.).
- Add a Sitemap directive (though Shopify does this automatically).
What you cannot do.
- Remove Shopify's default blocks. If you try to Allow /checkout, Shopify's default rules take precedence for core routes. You can add but not subtract.
- Block pages that Shopify considers essential for the storefront (product pages, collection pages).
A practical robots.txt.liquid template for a well-run Shopify store in 2026 looks like this:
{% for group in robots.default_groups %}
{{- group.user_agent }}
{%- for rule in group.rules %}
{{ rule }}
{%- endfor %}
{%- if group.user_agent.value == '*' %}
Disallow: /*?filter*
Disallow: /*?sort_by*
Disallow: /*?q=*
Disallow: /search
Disallow: /*/tag/*
{%- endif %}
{%- if group.sitemap != blank %}
{{ group.sitemap }}
{%- endif %}
{% endfor %}
This adds five disallow rules on top of Shopify's defaults: filter parameters, sort parameters, search queries, the search endpoint, and tag pages. If you run programmatic landing pages or rely on any of these parameters for legitimate indexable content, adjust accordingly.
Before pushing this live, test the rules against a handful of URLs you do want indexed (Google retired the standalone robots.txt Tester, so use a third-party robots.txt checker or GSC's robots.txt report). It is easy to accidentally block a legitimate parameter you forgot about (for example, ?variant= on product URLs).
Using Google Search Console Coverage
Google Search Console is where you find out what Google is actually doing with your URLs. The "Pages" report (formerly Coverage) is where the signal lives.
Open GSC, go to Indexing, then Pages. You will see two buckets: indexed pages and not indexed pages. Most of your attention should be on the not-indexed bucket.
The categories that matter.
"Crawled - currently not indexed." Google saw the page, decided it was not worth indexing. This is usually thin content, duplicate content, or low-quality signals. On a Shopify store, this bucket is where tag pages, thin collection pages, and low-value blog archives end up. Triage by exporting the URLs and checking for patterns. If 80 percent of the URLs contain /collections/ with a tag, you have a tag page problem.
"Discovered - currently not indexed." Google knows about the URL but has not crawled it yet. Usually a crawl budget issue. If this number is large and growing, your site is asking Google to crawl more than it wants to. The fix is reducing URL count (the traps above), not pushing Google to crawl harder.
"Page with redirect." URLs that 301 somewhere else. Normal. Review for patterns (old product URLs redirecting to new ones should be clean). Abnormal if you see redirect chains (URL A redirects to URL B which redirects to URL C). Fix chains manually by pointing A directly to C.
"Duplicate without user-selected canonical." Google found two or more versions of the same page and does not know which to use. On Shopify, this often means your canonical tag is broken. Go to URL Inspection, test one of the URLs, and see what Google thinks the canonical is. If it is different from what your HTML says, debug the theme.
"Duplicate, Google chose different canonical than user." Same as above but Google made a choice. Sometimes correct, sometimes not. If Google is consolidating a product variant URL to the main product (correct), leave it. If Google is consolidating a real page to a collection page (wrong), fix the canonical in the theme.
"Alternate page with proper canonical tag." This is usually fine. These are pages you correctly canonicalized somewhere else. Shopify generates a lot of these from collection-prefixed product URLs.
"Blocked by robots.txt." Verify these are URLs you intended to block. If not, fix your robots.txt.liquid.
"Excluded by noindex tag." Same check. These should be cart, account, search, filters, tag pages. If you see a collection you care about, something is wrong with your theme.
Weekly triage rhythm. Open Pages report. Sort by impression count. Investigate any URL with impressions that is in a "not indexed" bucket. Those are pages Google thinks people want but Google is refusing to serve. That is where your fastest wins are.
When to use noindex vs canonical vs disallow
These three controls do different things and solve different problems. Using the wrong one is a common mistake.
Use canonical when. Two URLs show substantially the same content and you want Google to pick one. Canonicals are a hint to Google, not a directive. Google can ignore them if it thinks you are wrong. Canonicals also pass most link equity from the non-canonical URL to the canonical one, which is why they are better than noindex for duplicate-but-valuable URLs.
Use noindex when. A URL should not be in Google's index, but you still want visitors to reach it and internal links from it to flow. Noindex is a directive, not a hint. Google will respect it. Noindexed pages still pass link equity for a while, then eventually Google stops crawling them much. Use for: tag pages, filter pages, search result pages, login pages, thin blog archives.
Use robots.txt disallow when. You want to stop Google from crawling a URL at all. This is different from noindex. Disallowed URLs can still appear in the index (as URL-only entries with no description) if they have external links pointing to them. Disallow also prevents Google from seeing your noindex tag on those pages, so disallow and noindex together on the same URL is a logical conflict.
The right sequence for filter URLs. Add noindex first in the template. Let Google crawl and see the noindex. Once those URLs drop out of the index (two to six weeks), add the disallow to robots.txt to save crawl budget going forward. If you disallow first, Google cannot read the noindex, and any URLs already indexed will stay indexed longer.
| Problem | Default Shopify behavior | Fix |
|---|---|---|
| Filter URLs (?filter.v.option.color=red) | Canonicalize to parent collection, but crawled and sometimes indexed | Add noindex in collection template for any URL with filter params, then disallow /*?filter* in robots.txt.liquid |
| Tag pages (/collections/shoes/summer) | Fully indexable, no canonical override | Add conditional noindex in theme.liquid when current_tags is truthy |
| Internal search (/search?q=...) | Noindexed in most themes, not blocked in default robots.txt | Verify noindex on search template, disallow /search in robots.txt.liquid |
| Cart, account, checkout | Blocked in default robots.txt | Verify. Add any app-injected account URLs not covered by defaults |
| Duplicate product URLs via collection prefix | Canonicalized to /products/{handle} by default | Confirm {{ canonical_url }} is used in theme.liquid and not overwritten |
| Sort URLs (?sort_by=price-ascending) | Canonicalize up | Add disallow to robots.txt.liquid for ?sort_by* to save crawl budget |
| Pagination (?page=2) | Self-canonical (correct) | Leave alone. Do not canonicalize to page 1 |
| Product variant URLs (?variant=12345) | Canonicalize to product page | Leave alone. Shopify handles correctly |
What to do this week
- Open GSC Pages report, export the full not-indexed list, and group URLs by pattern (filter, tag, search, account, other).
- Run one collection URL and one product URL through URL Inspection. Confirm Google's reported canonical matches your intended canonical.
- Open theme.liquid in your theme editor. Verify the canonical tag uses {{ canonical_url }} with no modifications.
- Create or update templates/robots.txt.liquid with filter, sort, and search disallows. Test the rules with a robots.txt testing tool before publishing.
- Add a conditional noindex for tag pages in your collection template. One line of Liquid, high impact.
- Re-submit the sitemap from GSC after changes and monitor Pages report weekly for the next month.
FAQ
Q: How long does it take for Google to drop noindexed URLs from the index?
A: Two to six weeks typically. Google has to re-crawl each URL, see the new noindex tag, and then re-process it. You can speed this up for a specific URL by submitting it through URL Inspection and requesting indexing, which for noindexed pages triggers a re-crawl and removal. This only scales to a handful of URLs, though, so batch-requesting is not worth the effort.
Q: Should I use the disavow tool for bad backlinks pointing to indexation-trap URLs?
A: No. The disavow tool is for manual penalties and only in rare cases. Backlinks pointing to filter URLs or tag pages are mostly harmless. Fix the canonical or noindex on the destination URL and the problem solves itself.
Q: My Shopify store has 10,000 product URLs in the sitemap but only 2,000 indexed. Is this normal?
A: It depends on product quality. If the 8,000 missing URLs are variant products, discontinued items, or products with thin descriptions copied from the manufacturer, yes, that gap is expected. Google is increasingly reluctant to index thin product pages. The fix is either better product content (original copy, unique imagery, reviews) or accepting that not every SKU needs to rank.
Q: Can I just delete tag pages instead of noindexing them?
A: Not really. Tag pages are generated by Shopify automatically from product tags. If you remove a tag from a product, that tag URL returns a 404 (or 301 depending on your setup). You can selectively remove tags you do not want, but most stores want some tags for merchandising purposes. Noindexing the resulting URLs is the cleaner solution.
Q: Does fixing indexation actually improve rankings?
A: Indirectly. Cleaning up thousands of thin URLs sends better quality signals to Google about your site overall, and reclaimed crawl budget lets Google re-crawl your real pages more often. We typically see a 15 to 30 percent lift in indexed page impressions within 60 days after a thorough indexation cleanup on Shopify stores over 1,000 products. Smaller stores see less, because smaller stores rarely have crawl budget problems to begin with.
Indexation work is not glamorous, but it compounds. Every noisy URL you remove makes the remaining URLs slightly more valuable in Google's eyes. If you want help with a full audit, our Shopify SEO service and Shopify development team run this triage on every engagement. For adjacent reading, see our Shopify speed optimization playbook, the full D2C ecommerce SEO guide for 2026, and our content SEO strategy for Shopify.