How to fix duplicate content

The Brimm team · grounded in the docs

Ordinary duplicate content does not trigger a penalty. The real damage is quieter: every duplicate URL splits ranking signals across copies and burns crawl budget on pages Google has already seen. The fix is to consolidate every variant into one canonical URL using permanent redirects and rel=canonical. Google's guide on consolidating duplicate URLs says the point is to let Googlebot “spend time crawling new (or updated) pages on your site, rather than crawling duplicate versions of the same content.”

The symptom: the same page lives at five URLs

You search for your own page and the wrong version shows up. Search Console reports "Duplicate without user-selected canonical" on pages you consider important. A URL with a tracking parameter bolted on is the one Google chose to index, and the clean one is nowhere. Or the ranking simply feels weaker than the content deserves, because links that should all point at one page are pointing at four near-identical copies of it.

None of these are dramatic failures. That is what makes duplication easy to ignore. Every copy loads fine and reads fine. The cost only shows up in aggregate: the crawler spends its visit re-reading pages it already knows, and the signals that would make one strong page get divided among several weak ones.

The cause: your site multiplies URLs on its own

Almost nobody writes the same page twice on purpose. Duplicates come from the plumbing:

www and non-www. If example.com/pricing and www.example.com/pricing both return the page, that is two URLs for one document.
http and https. If the old http version still answers instead of redirecting, every page on the site just doubled.
URL parameters. Tracking tags, session IDs, and sort orders (?utm_source=, ?ref=, ?sort=price) mint a new URL for the same content every time someone shares a link.
Print and alternate views. A /print/ version or a mobile-specific path is a full copy of the page at a second address.
Staging and dev copies. A staging.example.com left open to crawlers duplicates the entire site.

Google resolves this by picking one version itself, a process the documentation calls canonicalization. If you do not tell it which URL you prefer, it guesses. Sometimes it guesses the parameterized mess.
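To make the variant problem concrete, here is a minimal Python sketch that collapses the variants listed above into one URL. It illustrates the idea and does not reproduce how Google canonicalizes. The preferred host, the set of site hosts, and the list of parameters treated as content-neutral are all assumptions for a hypothetical example.com site. On a real site, check that a parameter such as sort really leaves the content unchanged before stripping it.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumptions for a hypothetical site: adjust to your own hosts and parameters.
SITE_HOSTS = {"example.com", "www.example.com"}
CANONICAL_HOST = "www.example.com"
NEUTRAL_PARAMS = {"ref", "sort"}   # assumed not to change page content
NEUTRAL_PREFIXES = ("utm_",)       # tracking tags such as utm_source


def canonicalize(url: str) -> str:
    """Collapse protocol, host, and parameter variants into one URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host not in SITE_HOSTS:
        return url  # not our site: leave it alone
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in NEUTRAL_PARAMS and not key.startswith(NEUTRAL_PREFIXES)
    ]
    # Force https and the preferred host, drop the fragment.
    return urlunsplit(("https", CANONICAL_HOST, parts.path or "/", urlencode(kept), ""))
```

Run a crawl export through a function like this and count how many distinct inputs map to each output. Any output with more than one input is a page living at several addresses.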

The fix: consolidate every variant into one URL

Pick one canonical form for the whole site, one protocol and one host, and enforce it. The docs rank redirects as the strongest signal you can send: “A strong signal that the target of the redirect should become canonical.” So redirect what you can, and annotate what you cannot.

# 1. Permanent (301) redirects for protocol and host variants
http://example.com/*       301 →  https://www.example.com/*
http://www.example.com/*   301 →  https://www.example.com/*
https://example.com/*      301 →  https://www.example.com/*

<!-- 2. rel="canonical" on pages that must keep variant URLs -->
<!-- (parameters, print views): point every variant at the clean URL -->
<link rel="canonical" href="https://www.example.com/pricing" />

Then make the rest of the site agree with the choice. Internal links should point at the canonical URL, not a variant, because inconsistent internal linking is itself a canonicalization signal in the wrong direction. Your sitemap should list only canonical URLs. And a staging site should not be reachable by crawlers at all. Put it behind authentication. A noindex on staging works until someone copies the template to production with the tag still in it, which we see more often than you would hope.
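The sitemap rule is easy to check mechanically. The sketch below flags sitemap entries that are not in canonical form, using only the Python standard library. It assumes, for illustration, that the canonical form is https on www.example.com with no query string. Your own definition may allow some parameters.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Assumed canonical form for a hypothetical site.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.example.com"


def non_canonical_sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return sitemap <loc> entries that are not in the canonical form."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        parts = urlsplit(url)
        if (
            parts.scheme != CANONICAL_SCHEME
            or parts.netloc != CANONICAL_HOST
            or parts.query  # any query string counts as a variant here
        ):
            flagged.append(url)
    return flagged
```

An empty result means the sitemap agrees with your canonical choice. Anything it returns is a URL you are asking Google to crawl while also telling it, elsewhere, to prefer a different one.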

Do not use robots.txt to hide duplicates. A blocked page cannot be read, so its canonical annotation cannot be seen, and the signals never consolidate. Let Google crawl the variant, read the annotation, and fold it into the preferred URL.
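You can check for this mistake before a crawler finds it. The following sketch uses Python's standard urllib.robotparser to list which variant URLs a robots.txt file would block. The robots.txt text and the URLs are made-up examples, and the standard-library parser does simple prefix matching, so treat the result as a first pass and not as an exact model of Googlebot.

```python
from urllib.robotparser import RobotFileParser


def blocked_variants(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given robots.txt disallows for the agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]
```

Any variant this returns is one whose canonical annotation cannot be read, so its signals cannot consolidate.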

Be honest: there is no duplicate content penalty

The "duplicate content penalty" is SEO folklore, and we will not sell you a fix for a punishment that does not exist. Google's own documentation frames consolidation as housekeeping, not damage control: you consolidate so signals merge into one URL and so the crawler spends its budget on new pages instead of known copies. Your www and non-www variants are not getting your site demoted.

There is a real line, though, and it sits somewhere else. Google's spam policies cover scraped content: republishing other sites' pages without adding value, or "spinning" them through synonyms or automated rewrites, can violate policy and can get a site removed. That is not the same problem as your print page duplicating your article page. One is plumbing. The other is taking someone else's work. If your duplication is internal, consolidate it and move on. If your content strategy is copying, no canonical tag will save it, and the honest fix is written up in our guide on thin content.

Verify: measure the overlap yourself

First check the plumbing by hand. Request the http, non-www, and trailing-slash variants of your homepage and confirm each answers with a single 301 to the canonical URL, not a chain of hops. View source on a parameterized URL and confirm the canonical link element points at the clean address. Then check Search Console's page indexing report and watch the "duplicate" buckets shrink over the following weeks.
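Both hand checks can be scripted once you have the raw material. The sketch below is one possible approach and makes no network requests. It assumes you have already saved the page HTML and noted the redirect hops (status code and Location header) yourself, for example from your browser's network panel. The function names are illustrative.

```python
from html.parser import HTMLParser


class _CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


def find_canonicals(html: str) -> list[str]:
    """Return all canonical hrefs found in the HTML, in document order."""
    finder = _CanonicalFinder()
    finder.feed(html)
    return finder.hrefs


def single_301_to(hops: list[tuple[int, str]], canonical: str) -> bool:
    """True if the recorded redirect chain is exactly one 301 to the canonical URL."""
    return len(hops) == 1 and hops[0] == (301, canonical)
```

A page should yield exactly one canonical href. Zero means the annotation is missing, and more than one means the page is sending conflicting signals.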

Then measure the part hand-checks miss: how much your own pages overlap each other. Brimm's corpus check crawls your pages together and measures the real shared text between them, so a boilerplate-heavy template or a near-identical city-page series shows up as a number instead of a hunch. Paste your link into the full Brimm audit and we will map the variants, read the redirects and canonicals the way a crawler does, and print the consolidation gaps in fix order. For the wider context on how engines choose what to keep, start with the fix library or our guide to answer engine optimization.
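If you want a rough feel for what "shared text as a number" means, here is a small Python sketch using word shingles and Jaccard similarity. This is a common textbook technique, offered as an illustration. It is not a description of how Brimm's corpus check works, and the shingle size is an arbitrary assumption.

```python
import re


def shingles(text: str, size: int = 5) -> set[tuple[str, ...]]:
    """Split text into overlapping runs of `size` consecutive words."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap(a: str, b: str, size: int = 5) -> float:
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

Two city pages that differ only in the city name score high on a measure like this, which is the pattern worth catching. Strip shared navigation and footer text first, or the template will dominate the score.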