
How to fix XML sitemap errors

The Brimm team · grounded in the docs

You fix XML sitemap errors by serving one reachable file of at most 50,000 URLs and 50MB uncompressed, listing only canonical pages that return 200, and stamping honest dates. Google's sitemap documentation is blunt about the date field: “Google uses the lastmod value if it's consistently and verifiably accurate.” Identical or faked dates get discounted, and the same doc says outright that priority and changefreq are ignored.

The symptom: the sitemap exists and helps nothing

Search Console shows “Couldn't fetch,” or it fetches fine and the numbers still look wrong: hundreds of URLs submitted, a fraction indexed, new pages sitting in “Discovered, currently not indexed” for weeks. Or the sitemap simply is not there. The CMS was migrated, the URL changed, and the address in robots.txt now returns a 404 that nobody has looked at since launch.

A sitemap is a small thing, and a broken one fails quietly. Google does not need it to find your site, so nothing visibly breaks. What you lose is the fast lane: the one file where you tell crawlers exactly what exists and what changed, so new and updated pages get picked up promptly instead of whenever a crawler happens by.

The cause: the file lies, or nobody can reach it

Sitemap failures come in two families. The first is plumbing. The file 404s because it moved, the robots.txt Sitemap: line points at the old address, or the file is blocked, is not well-formed XML, or is too big. Google caps a single sitemap at 50,000 URLs and 50MB uncompressed. Past either limit you must split it and list the pieces in a sitemap index file.
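Splitting is mechanical enough to script. The sketch below is a minimal illustration using Python's standard library, not a reference implementation. The child filenames (sitemap-1.xml and so on), the sitemap_index.xml name, and the base_url argument are hypothetical choices, and it assumes you already have the final list of canonical URLs.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000              # per-file URL cap
MAX_BYTES = 50 * 1024 * 1024   # 50MB uncompressed (52,428,800 bytes)


def build_urlset(urls):
    """Serialize a list of URLs as one <urlset> document (bytes)."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(root, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes &, <, >
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def split_sitemaps(urls, base_url, max_urls=MAX_URLS):
    """Return {filename: xml_bytes}: numbered child sitemaps plus an index.

    The filenames are hypothetical; use the paths your server really serves.
    """
    files = {}
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n, start in enumerate(range(0, len(urls), max_urls), start=1):
        name = f"sitemap-{n}.xml"
        body = build_urlset(urls[start:start + max_urls])
        if len(body) > MAX_BYTES:
            # Long URLs can hit the byte cap before the URL cap.
            raise ValueError(f"{name} is over 50MB uncompressed; lower max_urls")
        files[name] = body
        ref = ET.SubElement(index, "sitemap")
        ET.SubElement(ref, "loc").text = f"{base_url}/{name}"
    files["sitemap_index.xml"] = ET.tostring(index, encoding="utf-8", xml_declaration=True)
    return files
```

With 120,000 URLs and the default cap, this yields three child files plus one index, and the index is the address you then reference in robots.txt and submit in Search Console.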

The second family is dishonesty, usually accidental. The file lists URLs that redirect, pages carrying a noindex, 404s, and parameter duplicates. Every one of those entries tells a crawler “index this” about a page that says the opposite when fetched, and every contradiction teaches the crawler to trust your file a little less. The most common lie is the lastmod field. Site generators love stamping every URL with the build time, so all 400 pages claim they changed at the same second of the same day. No site works that way, and a date field that is obviously mechanical is a date field a crawler learns to skip.
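One rough way to catch the build-timestamp pattern is to count how many entries share the single most common lastmod value. The Python sketch below is a heuristic, not a documented Google test. The 90% threshold is an arbitrary assumption, and a tiny site that genuinely updated everything at once would trip it.

```python
from collections import Counter
from xml.etree import ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def looks_mechanical(sitemap_xml, threshold=0.9):
    """Heuristic: True if most <lastmod> values are one identical stamp.

    threshold is an assumed cut-off, not a value from any spec.
    """
    root = ET.fromstring(sitemap_xml)
    stamps = [el.text.strip() for el in root.findall("sm:url/sm:lastmod", NS) if el.text]
    if len(stamps) < 2:
        return False  # nothing to compare against
    _, count = Counter(stamps).most_common(1)[0]
    return count / len(stamps) >= threshold
```

Comparing the full strings matters here: build stamps usually share the exact time as well as the date, which is the giveaway.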

The fix: one clean file of true statements

Put the sitemap at a stable URL, reference it in robots.txt, and submit it in Search Console. Then make every line in it true:

Only canonical URLs. No redirects, no parameter variants, no pages carrying noindex. If a URL should not be indexed, it does not belong in the file. Canonical hygiene is its own job, covered in our guide to canonical tag errors.
Only URLs that return 200. Crawl your own sitemap and remove anything that answers with a redirect or an error.
Real lastmod values. The date should change when the page meaningfully changes, and only then. Wire it to your content's actual updated-at field, not to the build timestamp.

<!-- A sitemap entry that tells the truth -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/blue-widgets</loc>
    <lastmod>2026-07-02</lastmod>
    <!-- no <priority>, no <changefreq>: Google ignores both -->
  </url>
</urlset>
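If your pages live in a CMS or database, an entry like the one above can be generated rather than hand-written. This Python sketch assumes a hypothetical Page record with url, canonical_url, noindex, and updated_at fields, so map those onto whatever your content model really exposes. It skips non-canonical and noindex pages and takes lastmod from the content's own updated-at value. It does not check HTTP status, so the output still needs a crawl.

```python
from dataclasses import dataclass
from datetime import datetime
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Page:
    """Hypothetical shape of one CMS record."""
    url: str
    canonical_url: str
    noindex: bool
    updated_at: datetime  # when the content last meaningfully changed


def build_sitemap(pages):
    """Emit a <urlset> containing only self-canonical, indexable pages."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if page.noindex or page.url != page.canonical_url:
            continue  # does not belong in the file
        entry = ET.SubElement(root, "url")
        ET.SubElement(entry, "loc").text = page.url
        # Date-only W3C Datetime, taken from the content, never the build clock.
        ET.SubElement(entry, "lastmod").text = page.updated_at.date().isoformat()
    return ET.tostring(root, encoding="unicode")
```

Leaving out priority and changefreq is deliberate, for the reason given in the example's own comment.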

Reference it from robots.txt with a full absolute URL on its own Sitemap: line, and make sure that address is the one that actually serves the file today, not the one from two platforms ago. If your robots.txt has deeper problems than a stale sitemap line, our guide on robots.txt blocking AI crawlers covers the file end to end.
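Python's standard urllib.robotparser already collects Sitemap: lines through its site_maps() method (Python 3.8+), so a stale or relative reference is easy to flag. A minimal sketch, assuming you serve a single sitemap or sitemap index. live_sitemap_url is a placeholder for whatever address actually serves your file today.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def sitemap_line_problems(robots_txt, live_sitemap_url):
    """List issues with the Sitemap: lines in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    declared = parser.site_maps() or []  # site_maps() returns None if there are none
    if not declared:
        return ["no Sitemap: line"]
    problems = []
    for url in declared:
        parts = urlparse(url)
        if not (parts.scheme and parts.netloc):
            problems.append(f"not an absolute URL: {url}")
        elif url != live_sitemap_url:
            problems.append(f"points somewhere other than the live file: {url}")
    return problems
```

If you legitimately list several sitemaps, compare against a set of live URLs instead of a single string.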

The myth to kill: priority and changefreq do nothing

This one is folklore with a version number. The sitemap spec from 2005 defined <priority> and <changefreq> fields, and a generation of SEO tools has been emitting them ever since. Google's current documentation states plainly that it ignores both values. Setting every page to priority 1.0 does not make Google crawl it harder. Declaring changefreq daily does not summon the bot each morning. The fields are dead weight, and any audit or plugin that scores you on tuning them is scoring you on nothing. We say this plainly because our checks are grounded in the documentation, not in inherited rituals. Spend the effort where the doc says it counts: an accurate lastmod. That field is used exactly to the degree it proves honest, and it is the only scheduling signal in the file that earns anything.
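If a plugin you cannot reconfigure keeps emitting the two fields, stripping them is harmless cleanup rather than a ranking fix. A Python sketch using only the standard library, assuming a plain urlset document. Note that ElementTree's default parser discards XML comments, so any comments in the input will not survive.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IGNORED = {f"{{{SITEMAP_NS}}}priority", f"{{{SITEMAP_NS}}}changefreq"}


def strip_ignored_fields(sitemap_xml):
    """Remove <priority> and <changefreq> from every <url> entry."""
    root = ET.fromstring(sitemap_xml)
    for url_el in root.findall(f"{{{SITEMAP_NS}}}url"):
        for child in list(url_el):  # copy, since we mutate while iterating
            if child.tag in IGNORED:
                url_el.remove(child)
    # default_namespace keeps output as <urlset xmlns="..."> instead of ns0: prefixes.
    return ET.tostring(root, encoding="unicode", default_namespace=SITEMAP_NS)
```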

Verify: fetch it like a crawler, then watch the report

Fetch the sitemap URL yourself and confirm a 200 with valid XML. Crawl the URLs in it and confirm every one answers 200 and none carries a noindex. Check robots.txt points at the live address. Then submit it in Search Console and watch the Sitemaps report: it will show the fetch status, the URL count it discovered, and any parsing errors, in plain terms.
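The crawl step can be scripted with Python's standard library. This is a minimal sketch, not the Brimm audit. It refuses to follow redirects so a 301 shows up as a failure, checks the X-Robots-Tag header, and uses a rough regular expression for a robots or googlebot meta noindex that will miss unusual markup (for example, content= written before name=). It handles a plain urlset only, not a sitemap index, and the fetcher parameter exists so you can swap in a stub for testing.

```python
import re
import urllib.error
import urllib.request
from xml.etree import ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Rough heuristic: assumes name= appears before content= in the tag.
NOINDEX_META = re.compile(
    r"""<meta[^>]+name=["'](?:robots|googlebot)["'][^>]*content=["'][^"']*noindex""",
    re.IGNORECASE,
)


class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # surface the 3xx as an HTTPError instead of following it


_opener = urllib.request.build_opener(NoRedirect)


def fetch_no_redirect(url, timeout=10):
    """Return (status, X-Robots-Tag value, body bytes) without following redirects."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return resp.status, resp.headers.get("X-Robots-Tag", ""), resp.read()
    except urllib.error.HTTPError as err:
        return err.code, (err.headers or {}).get("X-Robots-Tag", ""), b""


def entry_problems(status, x_robots_tag, body):
    problems = []
    if status != 200:
        problems.append(f"status {status}")
    if "noindex" in (x_robots_tag or "").lower():
        problems.append("noindex header")
    if NOINDEX_META.search(body.decode("utf-8", "replace")):
        problems.append("noindex meta")
    return problems


def audit_sitemap(sitemap_url, fetcher=fetch_no_redirect):
    """Map each failing sitemap entry to its problems; an empty dict means clean."""
    status, _, body = fetcher(sitemap_url)
    if status != 200:
        return {sitemap_url: [f"sitemap status {status}"]}
    root = ET.fromstring(body)  # raises ET.ParseError on malformed XML
    report = {}
    for loc in root.findall("sm:url/sm:loc", NS):
        url = loc.text.strip()
        problems = entry_problems(*fetcher(url))
        if problems:
            report[url] = problems
    return report
```

Network errors such as DNS failures or timeouts are not caught and will propagate, which is usually what you want from a one-off check.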

Then verify it the way the engines will. Re-run an audit on the live URL and confirm the sitemap reads as reachable, well-formed, and honest. You can do this by hand, or paste your link into the full Brimm audit and we will fetch your sitemap the way a crawler does, sample its URLs, and flag dead entries, mechanical lastmod stamps, and a missing robots.txt reference. Honest dates in the sitemap only matter if the pages behind them are honestly current, which is the subject of our guide on stale content, and the rest of the fix library covers the failures beyond that.