How to Build a Website Sitemap Manually with XML

Total control, total responsibility. How to write a sitemap by hand for the sites that need it.

By Victor Ijomah, Technical SEO Specialist
Highlights
  • The minimum viable sitemap is an XML declaration, a urlset element with the protocol namespace, and at least one url block with a loc element inside.
  • Include lastmod for each URL but skip changefreq and priority because Google and the AI Search era crawlers ignore them both.
  • URLs with special characters like ampersand need entity encoding to keep the XML valid; an unescaped ampersand breaks the whole file.
  • Sitemap index files are only needed when you exceed 50,000 URLs or 50MB, which manual builders practically never do.
  • The maintenance burden is sustainable for small static sites but becomes unsustainable for sites with frequent updates, at which point a generator tool or a plugin is the better path.

Part of The Sitemap Series

You picked the manual route from Lesson 3: Choosing How to Build a Website Sitemap. Maybe your site is static and a plugin would be overkill. Maybe you are on a headless CMS where the standard sitemap plugins do not apply. Maybe you are on a custom-coded site where no plugin route exists at all. Whatever brought you here, you are going to write the sitemap by hand.

Manual XML gives you total control over what goes into the sitemap and how. No plugin defaults to fight against, no template assumptions to override. The trade-off is total responsibility. Every new page, every removed page, every meaningful content update has to be reflected in the file you maintain. The maintenance burden is real, but for the right kind of site, the precision is worth it.

This lesson walks through writing the XML from scratch, adding URLs as your site grows, splitting into multiple files if you ever need to (most manual builders never will), getting the file onto your server, and the maintenance approach that keeps a hand-written sitemap useful over time.

Writing the minimum viable sitemap

The smallest valid sitemap is shorter than you might expect. Three things are required: an XML declaration line, a urlset root element with the protocol namespace, and at least one url block with a loc element inside. Everything else is optional.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
  </url>
</urlset>

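The first line is the XML declaration. It tells anything reading the file that this is XML and that the character encoding is UTF-8. The XML specification itself treats the declaration as optional, but the sitemap protocol expects UTF-8 and its own examples all open with this line, so treat it as required in practice. Stating the encoding explicitly also removes any guesswork for stricter parsers.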

The second line opens the urlset root element with the sitemaps.org namespace declaration. The namespace URL identifies the protocol and its version (0.9), and even though the protocol has been stable for nearly twenty years, the namespace is still required. Crawlers use it to confirm the file follows the expected structure.

Inside urlset, you put url blocks. Each url block describes one page on your site. The only mandatory element inside is loc, which holds the full URL of the page, including the protocol (https://). Relative URLs are not allowed.

The closing </urlset> tag ends the file.

Save this as plain text, encoded as UTF-8, with a .xml extension. The filename does not strictly matter, but /sitemap.xml is the convention and the location crawlers and tools are most likely to try when no other path is specified.
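If you would rather not type the tags by hand, the same minimal file can be produced with a few lines of Python. This is only a sketch using the standard library; the function name build_minimal_sitemap and the output filename are illustrative assumptions, not part of the protocol.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_minimal_sitemap(urls):
    """Return sitemap XML as bytes, with one <url><loc> block per URL.

    build_minimal_sitemap is a hypothetical helper name for this sketch.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for page_url in urls:
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & when it serialises text.
        ET.SubElement(url_el, "loc").text = page_url
    ET.indent(urlset, space="  ")  # pretty-printing; needs Python 3.9+
    # xml_declaration=True writes the <?xml ...?> line at the top.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    with open("sitemap.xml", "wb") as fh:
        fh.write(build_minimal_sitemap(["https://example.com/"]))
```

The output is equivalent to the hand-written file above. ElementTree writes the declaration with single quotes rather than double quotes, which XML parsers treat identically.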

Adding more URLs to your sitemap

To list more pages, repeat the url block. A real sitemap has dozens or thousands of url blocks, one for each page you want crawlers to know about.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-06-04</lastmod>
  </url>
  <url>
    <loc>https://example.com/about/</loc>
    <lastmod>2026-05-20</lastmod>
  </url>
  <url>
    <loc>https://example.com/services/</loc>
    <lastmod>2026-04-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/contact/</loc>
    <lastmod>2026-03-10</lastmod>
  </url>
</urlset>

Each url block needs exactly one loc element, and lastmod is the one optional element worth including. lastmod tells crawlers when the page was last meaningfully changed. Use the W3C Datetime format, which is a profile of ISO 8601: either a simple date (YYYY-MM-DD) or, when precision matters, a full timestamp with a UTC offset (YYYY-MM-DDThh:mm:ss±hh:mm).
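Both lastmod formats can be produced from Python's datetime module without any manual string building. The sketch below assumes you track modification times yourself; the helper names lastmod_date and lastmod_timestamp are made up for illustration.

```python
from datetime import date, datetime, timedelta, timezone


def lastmod_date(d):
    """Format a date as YYYY-MM-DD."""
    return d.isoformat()


def lastmod_timestamp(dt):
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ss±hh:mm."""
    if dt.tzinfo is None:
        # A timestamp without an offset is ambiguous, so refuse it.
        raise ValueError("lastmod timestamps need an explicit UTC offset")
    # Drop microseconds so the output matches the format shown above.
    return dt.replace(microsecond=0).isoformat()


print(lastmod_date(date(2026, 6, 4)))
# → 2026-06-04

british_summer_time = timezone(timedelta(hours=1))
print(lastmod_timestamp(datetime(2026, 6, 4, 14, 30, tzinfo=british_summer_time)))
# → 2026-06-04T14:30:00+01:00
```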

Skip changefreq and priority. Module One Lesson 9: Does Google Actually Use Priority and Changefreq in XML Sitemaps covered why Google ignores both, and the same logic applies to the AI Search era crawlers. The elements take up space in the file, they signal nothing useful, and they make the file harder to maintain. The protocol allows them, but using them is busywork that produces no benefit.

One thing to watch for is URLs that contain special characters. The ampersand (&) needs to be encoded as &amp;. The same goes for less-than (&lt;), greater-than (&gt;), single quote (&apos;), and double quote (&quot;). Most modern URLs are clean, but query parameters can introduce these characters, and an unescaped ampersand will break the XML and invalidate the whole file.
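To see what the encoding looks like in practice, here is a small Python sketch that escapes a URL before it is pasted into a loc element. It uses the standard library's xml.sax.saxutils.escape; the helper name escape_loc and the example query string are assumptions for illustration.

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; quotes are added explicitly.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_loc(url):
    """Return the URL with the five XML special characters entity-encoded."""
    return escape(url, QUOTE_ENTITIES)


raw = "https://example.com/search?category=shoes&colour=red"
print(escape_loc(raw))
# → https://example.com/search?category=shoes&amp;colour=red
```

Run the escaping once, on the raw URL. Escaping a URL that already contains &amp; would turn it into &amp;amp;, which points at a different address.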

When to split into multiple sitemap files

The sitemaps.org protocol caps individual sitemap files at 50,000 URLs or 50MB uncompressed, whichever you hit first. For most manual builders, neither limit is relevant. Handwriting 50,000 url blocks is not something anyone actually does. If you find yourself approaching that scale, manual is probably the wrong choice for your site, and Lesson 3: Choosing How to Build a Website Sitemap is worth a second look.

If you do need to split, the structure is a sitemap index file that references the individual sitemaps. The index uses a different root element (sitemapindex instead of urlset) and points to each child sitemap rather than to pages.

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2026-06-04</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2026-06-04</lastmod>
  </sitemap>
</sitemapindex>

Inside the index, each sitemap block points to a regular sitemap file (like the ones you wrote earlier). The sitemap index itself contains no page URLs, just references to other sitemaps. Module One Lesson 7: Anatomy of an XML Sitemap (with Example) walked through this structure in detail, including how crawlers fetch the index first and then follow the references to the actual sitemap files.
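If a site ever does reach that scale, the split is mechanical enough to script. The sketch below chunks a list of URLs by the 50,000 limit and builds an index that points at the child files. It checks the URL count only, not the 50MB size limit, and the function names and child filenames are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # protocol limit per sitemap file


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into consecutive lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(child_sitemap_urls, lastmod=None):
    """Return sitemap index XML (bytes) referencing each child sitemap."""
    index = ET.Element("sitemapindex", {"xmlns": SITEMAP_NS})
    for child_url in child_sitemap_urls:
        sitemap_el = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap_el, "loc").text = child_url
        if lastmod is not None:
            ET.SubElement(sitemap_el, "lastmod").text = lastmod
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


# Example: 120,000 placeholder URLs end up in three child files.
all_urls = [f"https://example.com/page-{n}/" for n in range(120_000)]
chunks = chunk_urls(all_urls)
children = [f"https://example.com/sitemap-{i}.xml" for i in range(1, len(chunks) + 1)]
index_xml = build_sitemap_index(children, lastmod="2026-06-04")
```

Each chunk would then be written out as its own urlset file under the matching child filename.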

File this section as something to come back to if your manual site grows past the point where one file is enough. Most manual builders never reach that point.

How to host your website’s sitemap file

Writing the XML is half the job. The file has to actually live on your server in a location that browsers and crawlers can access.

Upload the file to your domain root. If your site lives at https://example.com/, the sitemap should be accessible at https://example.com/sitemap.xml. This is the convention, and search engines look here first when no other location has been specified.

Verify the file is accessible by visiting the URL in your browser. You should see the raw XML or a styled version of it (some browsers render XML with collapsible elements). A 404 error means the file is not where you think it is, or your server is not serving .xml files correctly.
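A browser check confirms the file is reachable, but it will not always tell you whether the contents are sound. As a rough complement, the following Python sketch runs a few structural checks on the file before or after upload. The function name check_sitemap and the specific checks are assumptions for this example, not an official validator.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def check_sitemap(xml_bytes):
    """Return a list of problems; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        # Typical cause: an unescaped ampersand or an unclosed tag.
        return [f"not well-formed XML: {exc}"]

    if root.tag != NS + "urlset":
        return [f"unexpected root element: {root.tag}"]

    problems = []
    url_blocks = root.findall(NS + "url")
    if not url_blocks:
        problems.append("no <url> blocks found")

    for position, block in enumerate(url_blocks, start=1):
        locs = block.findall(NS + "loc")
        if len(locs) != 1:
            problems.append(f"url block {position}: expected exactly one <loc>")
            continue
        parsed = urlparse((locs[0].text or "").strip())
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"url block {position}: loc is not an absolute URL")
    return problems
```

Calling check_sitemap(open("sitemap.xml", "rb").read()) on a local copy before uploading catches malformed XML and relative URLs early.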

Add a Sitemap directive to your robots.txt file pointing to the sitemap location:

Sitemap: https://example.com/sitemap.xml

This is the proper standardised declaration covered back in Module One, Lesson 6: Sitemaps.org Protocol Explained. Crawlers will find the sitemap with or without this line if they check the conventional /sitemap.xml location, but the declaration is part of the protocol and worth including.
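To confirm the directive is written in a form parsers can read, Python's built-in robots.txt parser can list the sitemaps a file declares. This is a minimal sketch that works on the text of robots.txt; the function name declared_sitemaps is an assumption, and site_maps() requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser


def declared_sitemaps(robots_txt):
    """Return the sitemap URLs declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap lines are present.
    return parser.site_maps() or []


robots = """User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""
print(declared_sitemaps(robots))
# → ['https://example.com/sitemap.xml']
```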

The next lesson goes deeper on hosting considerations, including HTTPS requirements, host quirks for static site platforms like Netlify and Vercel, and the verification step that catches the most common mistakes. For now, the basic upload-and-declare flow is enough to get your hand-written sitemap online.

The maintenance burden you are signing up for

Manual XML works as long as you maintain it. The moment the file falls out of sync with your real site, the sitemap becomes a liability rather than an asset. Crawlers learn that the sitemap is unreliable, and over time they trust the URLs inside it less, even the ones that are still accurate.

Every new page you publish needs a new url block in the file. Every page you remove needs the corresponding url block deleted. Every meaningful content update needs the lastmod date refreshed to match. There is no automation watching for you.
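Since nothing watches the file for you, a small comparison script can at least tell you when it has drifted. The sketch below compares the loc values in a sitemap against a list of live page URLs that you supply yourself, for example from your own page inventory. The function names and the shape of the result are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_locs(xml_bytes):
    """Return the set of URLs listed in <loc> elements."""
    root = ET.fromstring(xml_bytes)
    return {(loc.text or "").strip() for loc in root.iter(NS + "loc")}


def sitemap_drift(xml_bytes, live_urls):
    """Compare sitemap entries against the pages that actually exist."""
    listed = sitemap_locs(xml_bytes)
    live = set(live_urls)
    return {
        # Published pages that still need a url block.
        "missing_from_sitemap": sorted(live - listed),
        # url blocks for pages that no longer exist.
        "stale_in_sitemap": sorted(listed - live),
    }
```

This only catches added and removed pages. Whether a lastmod date is still accurate is a judgment about content changes, which the script cannot make.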

For a small static site with a dozen or so pages that change a few times a year, this is sustainable. You set a reminder to review the sitemap every quarter, you update it when you make significant changes, and the burden is real but manageable. For a blog with weekly publishing, the manual approach will exhaust you within a month. Every new post is a separate edit, and the maintenance overhead eventually pushes you back to a plugin-based or generator-based approach.

Generator tools sit between fully manual and fully automated, and they are worth knowing about even if you have committed to the manual route. Screaming Frog can crawl your site and export a clean XML sitemap as a one-off, which you then take ownership of and edit further. Online generators like xml-sitemaps.com do the same for free up to a few hundred URLs. You run them whenever your site changes significantly, download the resulting file, and host it yourself.

This is still manual in the sense that you control what gets included and when the sitemap updates. You skip the line-by-line writing, but the maintenance burden becomes “remember to re-run the generator” rather than “remember to edit the file”. For sites in the middle ground between fully static and fully dynamic, this hybrid approach is often the most sustainable path.

Where this leaves us

You now have a hand-written sitemap. You have it saved as XML, hosted on your server, and declared in your robots.txt file. You have a maintenance plan, however ambitious or modest. What you do not yet have is a reliable way of making sure search engines know the sitemap exists, or a check on the hosting considerations you might be missing.

The next lesson covers where your sitemap should live in detail, including the conventions you have already touched, the robots.txt declaration in depth, HTTPS requirements, server configuration considerations, and the verification steps that catch the most common mistakes before they become problems. It applies whether you built your sitemap manually, through a plugin, or with a generator tool.

Up next: Where Your Sitemap Should Live →


This is Module 2: Lesson 5 of The Sitemap Series, a Technical SEO series on sitemaps from first principles, built for the AI Search era.

Victor Ijomah
Technical SEO Specialist
Victor Afamefuna Ijomah is a UK-based Technical SEO Specialist focused on how Google and AI engines like ChatGPT, Perplexity, and AI Overviews decide what gets discovered, understood, and cited. He holds an M.Sc in Digital Marketing from the University of Chester and is the editor of The Technical SEO Library, a publication on crawl systems, schema, entity SEO, AI crawler management, and the technical foundations of visibility in the AI Search era.