Part of the Sitemap Series
You have built your sitemap. Maybe a plugin generates it on the fly. Maybe you wrote the XML by hand. Either way, the file now exists somewhere on your server, and that is where most people stop thinking about it. The build is done, the audit is done; it’s time to move on.
But existing on the server is not the same as being discoverable. A sitemap that crawlers cannot find is no better than no sitemap at all, and it might be worse, because you assume the work is done when it actually is not. Most sites get the file built correctly and then stumble on hosting details, robots.txt declarations, or server quirks that quietly prevent search engines and AI crawlers from reading the file.
This lesson covers where your sitemap should live, how to declare it properly so crawlers find it, the HTTPS and protocol consistency rules that catch real sites out, the hosting quirks worth knowing about on different platforms, the verification steps that confirm everything is working, and the common discoverability mistakes that show up repeatedly across audits.
The standard sitemap location
Sitemaps live at the root of your domain by convention. The default path is /sitemap.xml, and search engines and AI crawlers look here first when they have no other information about where the sitemap might be.
“Domain root” means the top level of your site’s URL structure. If your site lives at https://example.com/, the sitemap should be accessible at https://example.com/sitemap.xml. Not in a subfolder, not on a CDN domain, not behind any routing logic that requires authentication. Just sitting at the root.
If you are using an SEO plugin, the path will often differ from the convention. Yoast and Rank Math generate the sitemap at /sitemap_index.xml. WordPress 5.5 and later use /wp-sitemap.xml when no SEO plugin is active. All in One SEO uses /sitemap.xml. These non-default paths are fine as long as the URL is reachable and your robots.txt declares the location. The convention is just where crawlers look first when no other path has been specified.
What you should not do is host the sitemap on a different domain or subdomain. A sitemap at sitemap.example.com is on a separate host in protocol terms, and crawlers will not trust it for URLs at example.com unless example.com’s own robots.txt points to it (the cross-submission mechanism in the sitemaps.org protocol). The simple rule is one sitemap per origin, hosted at that origin. If your site spans multiple domains, each domain needs its own sitemap declared in its own robots.txt.
Declaring your sitemap in robots.txt
The robots.txt file is the standard way to declare your sitemap’s location. Crawlers fetch robots.txt before they crawl anything else, see the Sitemap directive, and use it to find your sitemap regardless of where it lives.
The basic declaration looks like this:
Sitemap: https://example.com/sitemap.xml
A few rules matter. Use the absolute URL including the protocol (https://). Relative paths are not allowed in the Sitemap directive even though they work elsewhere in robots.txt. One directive per sitemap, one sitemap per line. The directive can appear anywhere in the robots.txt file; the order does not affect anything.
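Those rules are easy to check mechanically before a deploy. The sketch below is a minimal Python example. It assumes you have the robots.txt body as a string. The function names sitemap_directives and problems_with are hypothetical, and the HTTPS check anticipates the protocol rules later in this lesson.

```python
from urllib.parse import urlsplit


def sitemap_directives(robots_txt: str) -> list[str]:
    """Return the value of every Sitemap line in a robots.txt body."""
    urls = []
    for raw_line in robots_txt.splitlines():
        # Drop anything after a comment marker, then surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Split on the first colon only, so the "https:" inside the URL survives.
        field, sep, value = line.partition(":")
        # Field names are case-insensitive ("Sitemap", "sitemap", ...).
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def problems_with(url: str) -> list[str]:
    """List rule violations for one Sitemap value (an empty list means it passes)."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        return ["not an absolute URL"]
    if parts.scheme != "https":
        return ["not served over HTTPS"]
    return []
```

Run problems_with over every value that sitemap_directives returns, and fail the deploy if any list is non-empty. A relative value such as /sitemap.xml is reported as not absolute, which is exactly the form the Sitemap directive does not accept.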
For sites with multiple sitemaps, you have two choices. List each one separately:
Sitemap: https://example.com/sitemap-pages.xml
Sitemap: https://example.com/sitemap-blog.xml
Sitemap: https://example.com/sitemap-products.xml
Or, preferably, point to a sitemap index file that references all the individual sitemaps:
Sitemap: https://example.com/sitemap_index.xml
The second approach is cleaner because adding or removing child sitemaps later does not require updating robots.txt. The index file handles the references, and robots.txt just points to the index.
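If you generate the index yourself rather than through a plugin, it is a small XML document. Here is a minimal Python sketch using only the standard library. It assumes the child sitemap URLs are already known. build_sitemap_index is a hypothetical helper name, and real generators usually also add a lastmod for each child.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(child_urls: list[str]) -> str:
    """Return a sitemap index document that references each child sitemap."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in child_urls:
        # One <sitemap><loc>...</loc></sitemap> entry per child sitemap.
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Called with the three child URLs above, it produces a sitemapindex root containing one sitemap/loc entry per child. That is the file robots.txt then points to.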
Some crawlers will also try the conventional /sitemap.xml location without the declaration, but you should not rely on that. The declaration is still worth including. It catches sites where the sitemap lives at a non-standard location, it speeds up discovery, and it is part of the sitemaps.org protocol covered back in Module One, Lesson 6: Sitemaps.org Protocol Explained. A line or two of robots.txt to save crawlers a guess is a fair trade.
HTTPS and protocol consistency
The sitemap should be served over HTTPS, and the URLs inside it should match. Protocol mismatches cause more problems than people realise, and they happen most often during migrations from HTTP to HTTPS that were never quite finished.
Three rules apply. The sitemap itself must be served over HTTPS. The URLs listed inside the sitemap must use HTTPS, assuming your site is HTTPS-only. The robots.txt declaration must point to the HTTPS sitemap URL.
If your site still has both HTTP and HTTPS versions because of an incomplete migration, make sure the HTTPS version has the canonical sitemap, HTTP requests for the sitemap redirect to HTTPS, and robots.txt on both versions declares the HTTPS sitemap URL. Crawlers will work it out from these signals, but every signal that points to HTTPS makes the canonical version clearer.
Mixed protocol within the sitemap (some URLs HTTP, some HTTPS) is a sign of a deeper site configuration problem. Fix the site canonicalisation first. The sitemap will follow naturally once the underlying URLs are clean.
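A quick way to spot the mixed-protocol symptom is to count the URL schemes used inside the sitemap. This is a minimal Python sketch. It assumes a urlset sitemap (not an index) that you have already downloaded as bytes. scheme_counts is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlsplit

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def scheme_counts(sitemap_xml: bytes) -> Counter:
    """Count the URL schemes used by the <loc> entries of a urlset sitemap."""
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.findall("sm:url/sm:loc", NS) if el.text]
    return Counter(urlsplit(loc).scheme for loc in locs)
```

On an HTTPS-only site the result should contain a single "https" key. Any "http" count is the signal to go back and fix canonicalisation rather than hand-editing the sitemap.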
Hosting quirks worth knowing about
Most hosts serve XML files without any special configuration. Some do not, and knowing which category your host falls into saves debugging time later.
- Managed WordPress hosting (Kinsta, WP Engine, and SiteGround managed plans) usually works out of the box. The sitemap plugin handles routing, so no server configuration is needed. The one thing to check is caching: if you use a caching plugin such as WP Rocket or a service such as Cloudflare APO, make sure the sitemap is not cached too aggressively. The sitemap has to change when your content changes, and a stale cache breaks that.
- Static site hosts like Netlify, Vercel, and Cloudflare Pages all serve .xml files as long as the file is in your build output. The catch is that the file needs to actually be in your build. If you are using a generator (next-sitemap for Next.js, gatsby-plugin-sitemap for Gatsby, or similar tools elsewhere), check that the build process runs successfully and the resulting sitemap ends up in the deployed output directory. A silently failing build is the most common reason a static site’s sitemap goes missing.
- Shared hosting on traditional cPanel-style setups (SiteGround basic plans, Bluehost, Hostinger) is usually fine for sitemap hosting. Upload the file to the right directory (usually public_html or www), and the server will serve it. The most common problem on shared hosts is forgetting which folder is the actual site root, especially when subdomains or addon domains are involved.
- CDN considerations apply if you use Cloudflare, Fastly, or similar. The sitemap will be cached at edge servers. Set a reasonable cache TTL (one hour to one day is sensible). Confirm the CDN does not strip the Content-Type header. Verify the sitemap is actually being served rather than cached as a 404 from an initial failed fetch.
- Headless CMS setups typically generate the sitemap at build time. The whole pipeline (content change triggers rebuild, rebuild regenerates sitemap, new sitemap deploys to host) needs to work end to end. The usual failure mode is a build that reports success while the sitemap step quietly failed. The deploy completes, but the sitemap is stale because the generation step hit an error nobody noticed.
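For static and headless setups, a post-build check can turn a silently failing sitemap step into a failed deploy. Here is a minimal Python sketch. It assumes the generator writes a single sitemap.xml into one publish directory. check_build_output and the default file name are assumptions to adapt to your own generator.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Root elements allowed by the sitemaps.org protocol, in Clark notation.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
VALID_ROOTS = {SITEMAP_NS + "urlset", SITEMAP_NS + "sitemapindex"}


def check_build_output(output_dir: str, name: str = "sitemap.xml") -> list[str]:
    """Return problems with the sitemap in a build output folder (empty list = OK)."""
    path = Path(output_dir) / name
    if not path.is_file():
        return [f"{path} is missing from the build output"]
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError as exc:
        return [f"{path} is not well-formed XML: {exc}"]
    if root.tag not in VALID_ROOTS:
        # Catches HTML error pages or placeholder files saved as sitemap.xml.
        return [f"{path} has unexpected root element {root.tag}"]
    if len(root) == 0:
        return [f"{path} contains no entries"]
    return []
```

Call it as the last step of the build, for example check_build_output("dist"), where "dist" stands in for your actual publish directory. Make the build exit with an error if the returned list is not empty. The deploy then fails loudly instead of shipping a stale or missing sitemap.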
Verifying your sitemap is actually accessible
Before submitting your sitemap to search engines in the next lessons, verify it is reachable and the file is what you think it is. Three quick checks catch most problems.
1. Browser Check
The browser check is the simplest. Open the sitemap URL in a new tab. You should see the raw XML or a styled rendering of it, not an error page. Browsers do not show the status code in the address bar, so the exact 200 status is confirmed in the header check. The content should be the sitemap you expect, with the URLs you want crawlers to know about.
2. Header Check
The header check confirms the server is identifying the file correctly. Open your browser’s developer tools, switch to the Network tab, and refresh the sitemap URL. Look at the response headers. The status should be 200. The Content-Type should be application/xml or text/xml, often followed by a charset parameter such as ; charset=UTF-8. If the Content-Type is text/html or text/plain, your server is labelling the .xml file as the wrong format, and crawlers may fail to parse it even though browsers render it.
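The same check can be scripted, which helps when you want to run it on every deploy. Here is a minimal Python sketch using the standard library. header_problems and fetch_headers are hypothetical names. Note that urlopen follows redirects, so the status reported is the final one: an HTTP-to-HTTPS redirect that ends in 200 passes.

```python
import urllib.error
import urllib.request
from typing import Optional

XML_TYPES = {"application/xml", "text/xml"}


def header_problems(status: int, content_type: Optional[str]) -> list[str]:
    """Apply the header-check rules to a status code and Content-Type value."""
    problems = []
    if status != 200:
        problems.append(f"status is {status}, expected 200")
    # Ignore parameters such as "; charset=UTF-8" when comparing media types.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if media_type not in XML_TYPES:
        problems.append(f"Content-Type is {content_type!r}, expected an XML type")
    return problems


def fetch_headers(url: str) -> tuple[int, Optional[str]]:
    """Fetch a URL (following redirects) and return (status, Content-Type)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status, response.headers.get("Content-Type")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError but still carry headers.
        return err.code, err.headers.get("Content-Type")
```

Usage is header_problems(*fetch_headers("https://example.com/sitemap.xml")), with example.com replaced by your own domain. An empty list means the status and Content-Type both look right.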
3. Incognito Check
The incognito check confirms there is no authentication or session-based access getting in the way. Open the sitemap URL in a private browsing window. If it asks for a login or returns an error, something on the site is gating access to the sitemap. Crawlers do not carry your login cookies, so an incognito window is a close approximation of what they see.
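The incognito check can be approximated in code as well. Python’s default urllib opener does not store or send cookies, so a plain request behaves like a first-time visitor. This is a minimal sketch. gating_reason, fetch_without_cookies, and the login-path hints are hypothetical heuristics, not a complete detector.

```python
import urllib.error
import urllib.request
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical path fragments that suggest a redirect to a login page.
LOGIN_HINTS = ("login", "signin", "sign-in")


def gating_reason(requested_url: str, status: int, final_url: str,
                  www_authenticate: Optional[str]) -> Optional[str]:
    """Explain why a cookie-less fetch looks gated, or return None if it looks open."""
    if status == 401 or www_authenticate:
        return "server demands credentials (HTTP 401 / WWW-Authenticate)"
    if status == 403:
        return "server refuses access (HTTP 403)"
    requested_path = urlsplit(requested_url).path.lower()
    final_path = urlsplit(final_url).path.lower()
    # A redirect away from the sitemap to a login-like path is a gating signal.
    if final_path != requested_path and any(h in final_path for h in LOGIN_HINTS):
        return f"redirected to what looks like a login page: {final_url}"
    return None


def fetch_without_cookies(url: str) -> tuple[int, str, Optional[str]]:
    """Fetch like a first-time visitor; the default urllib opener keeps no cookies."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status, response.url, response.headers.get("WWW-Authenticate")
    except urllib.error.HTTPError as err:
        # For error responses, report the requested URL as the final URL.
        return err.code, url, err.headers.get("WWW-Authenticate")
```

Pass the three values from fetch_without_cookies into gating_reason, along with the URL you requested. A non-None result means a crawler would probably hit the same wall.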
Common discoverability mistakes
The same handful of mistakes appear repeatedly across audits. They are worth checking explicitly even if everything looks fine on the surface, because most of them produce broken sitemaps that still appear to work from inside your normal browser session.
1. Sitemap blocked by robots.txt
The sitemap file itself can be blocked from crawlers by an over-aggressive robots.txt rule. This happens most often in two ways. One is a developer adding Disallow: /sitemap.xml in the belief that it will stop the sitemap from being indexed; sitemaps are not indexed as content in the first place, so the rule only stops crawlers from reading the file. The other is a broader rule such as Disallow: / from a staging configuration accidentally leaking into production. Always check your robots.txt against your sitemap URL to confirm nothing blocks access.
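Python’s standard urllib.robotparser can confirm whether a robots.txt body blocks the sitemap URL. This is a minimal sketch. sitemap_is_blocked is a hypothetical helper, and the Googlebot default is only an example user agent. Python’s parser applies matching rules in file order rather than Google’s most-specific-match precedence, so treat the result as a first pass when Allow and Disallow rules overlap.

```python
from urllib import robotparser


def sitemap_is_blocked(robots_txt: str, sitemap_url: str,
                       user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt body disallows fetching the sitemap URL."""
    parser = robotparser.RobotFileParser()
    # parse() accepts the file as a list of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, sitemap_url)
```

Run it against the exact robots.txt you are about to deploy. It catches both the explicit Disallow: /sitemap.xml rule and a leaked staging-wide Disallow: /.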
2. Wrong domain in sitemap URLs
The sitemap lives at https://example.com/sitemap.xml but lists URLs at https://www.example.com/, or the other way around. Or staging URLs end up in a production sitemap because a database was cloned without updating the URL references. Or HTTP URLs appear in the sitemap when the site is fully on HTTPS. The URLs inside the sitemap must match the canonical version of your site exactly, including the protocol and the www-or-not-www choice.
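Checking for the wrong-domain problem comes down to comparing each URL’s scheme and host with the canonical origin. Here is a minimal Python sketch. It assumes you have already extracted the loc values and know your canonical origin. off_canonical_urls is a hypothetical name.

```python
from urllib.parse import urlsplit


def off_canonical_urls(locs: list[str], canonical_origin: str) -> list[str]:
    """Return every sitemap URL whose scheme or host differs from the canonical origin."""
    want = urlsplit(canonical_origin)
    mismatched = []
    for loc in locs:
        got = urlsplit(loc)
        # Hostnames are case-insensitive; urlsplit already lowercases the scheme.
        if (got.scheme, got.netloc.lower()) != (want.scheme, want.netloc.lower()):
            mismatched.append(loc)
    return mismatched
```

Anything it returns, whether a www variant, an http URL, or a staging host, needs fixing at the source that generates the sitemap rather than in the file itself.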
3. Sitemap behind authentication
The sitemap is gated behind basic auth, a login wall, or some other access control. Common in staging environments. Crawlers see the auth prompt instead of the sitemap and report “Couldn’t fetch” in Search Console. Test the sitemap in an incognito browser window to catch this. If it requires a login when no cookies are present, every crawler that visits will hit the same wall.
4. Server returning the wrong content type
The server treats .xml files as text/html or text/plain instead of application/xml. Browsers render the file anyway because they are forgiving. Crawlers are less forgiving and may refuse to parse it as a sitemap. Check the response headers using your browser’s developer tools or curl -I from the command line. If the Content-Type is wrong, the fix is usually in your server config or a CDN rule.
5. Sitemap not regenerating
The sitemap exists, but it contains old data and never updates. The cause is usually a caching layer holding a stale version, a broken build process for static sites, or a plugin that is silently failing on regeneration. Compare the Last-Modified response header on the sitemap file itself (visible with curl -I or in your browser’s developer tools, where the server sends one) with the newest lastmod dates inside the file. If they are months apart, or neither reflects content you published recently, something in the regeneration pipeline is broken even though the file appears to exist normally.
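The date comparison can be scripted too. Here is a minimal Python sketch. It assumes the server sends a Last-Modified header and the lastmod values use the full-date or full-datetime forms of W3C Datetime; year-only or year-month values are not handled. parse_lastmod and regeneration_gap_days are hypothetical names.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

LASTMOD_TAG = "{http://www.sitemaps.org/schemas/sitemap/0.9}lastmod"


def parse_lastmod(value: str) -> datetime:
    """Parse a lastmod value; date-only values are read as midnight UTC."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 onwards.
    parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def regeneration_gap_days(sitemap_xml: bytes, last_modified_header: str) -> float:
    """Days between the file's Last-Modified header and its newest lastmod entry."""
    root = ET.fromstring(sitemap_xml)
    stamps = [parse_lastmod(el.text) for el in root.iter(LASTMOD_TAG) if el.text]
    if not stamps:
        raise ValueError("sitemap contains no lastmod values")
    served = parsedate_to_datetime(last_modified_header)
    if served.tzinfo is None:
        # parsedate_to_datetime returns a naive value for "-0000"; treat it as UTC.
        served = served.replace(tzinfo=timezone.utc)
    return (served - max(stamps)).total_seconds() / 86400
```

Investigate a large gap, or a Last-Modified date that is itself months old. Read both against your publishing history, since a quiet site can legitimately have old dates.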
Where this leaves us
You now have a sitemap that is built, hosted at the right location, declared in robots.txt, served correctly over HTTPS, and verified accessible to crawlers. The file is sitting where it needs to sit, identified by the headers it needs to send, and reachable from anywhere on the open web.
What is still missing is telling search engines about it directly. Crawlers will eventually find your sitemap through the robots.txt declaration, but direct submission to Google Search Console and Bing Webmaster Tools speeds up discovery, surfaces errors in a structured way, and lets you monitor how each search engine processes the sitemap over time. The next two lessons cover submission to Google Search Console and to Bing Webmaster Tools (which also feeds many AI search tools).
Up next: Submitting Your Sitemap to Google Search Console →
This is Module 2: Lesson 6 of The Sitemap Series, a Technical SEO series on sitemaps from first principles, built for the AI Search era.