I spent 3 hours debugging why Google Search Console kept showing "Couldn't fetch" on a sitemap that looked completely normal in my browser. The XML rendered fine. Every URL in it worked when I clicked them one by one. The actual problem was one unescaped & buried in product entry #4,312. That single character was enough to invalidate the entire file, not just that one row.
By the end of this post, you'll know how to generate a sitemap in TypeScript that can't break that way, how to scale it past the 50,000-URL limit, and how to validate it before Google ever sees it.
Why a Hand-Rolled Sitemap Breaks Silently
Most sitemaps start the same way: map over an array of routes, template-string them into XML, write the result to a file. It works fine right up until your content includes data your team doesn't fully control: product names, blog titles, anything with an ampersand, a quote, or an angle bracket in it.
XML parsers don't gracefully skip a malformed character. They reject the whole document. Once that happens, Search Console reports every URL in that sitemap as unfetched, including the ones that had nothing to do with the actual problem.
Here's the version that looks fine and isn't:
// sitemap.ts: works in dev, fails in prod
const routes = ['/', '/about', '/products/salt-&-pepper-shakers'];
const xml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${routes.map(r => `<url><loc>https://example.com${r}</loc></url>`).join('\n')}
</urlset>`;
console.log(xml);
Result: this compiles and prints output that looks like valid XML. Paste it into any XML validator and it fails on the &. That character is reserved in XML and has to be written as &amp;. One bad product name and your whole sitemap is invalid.
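If you want to catch this before a validator does, a rough check is to scan the generated string for any & that doesn't start an entity. This is a minimal sketch, not a real XML parser: findBareAmpersands is a name I made up for illustration, and it will also flag a legitimate & inside a CDATA section.

```typescript
// Hypothetical helper: list the position of every "&" that does not begin
// a predefined entity (&amp; &lt; &gt; &quot; &apos;) or a numeric reference.
const BARE_AMPERSAND = /&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9a-fA-F]+);)/g;

function findBareAmpersands(xml: string): number[] {
  const positions: number[] = [];
  // Fresh regex per call so lastIndex state never leaks between calls.
  const pattern = new RegExp(BARE_AMPERSAND.source, 'g');
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(xml)) !== null) {
    positions.push(match.index);
  }
  return positions;
}

const broken = '<loc>https://example.com/products/salt-&-pepper-shakers</loc>';
const fixed = '<loc>https://example.com/products/salt-&amp;-pepper-shakers</loc>';

console.log(findBareAmpersands(broken).length); // 1
console.log(findBareAmpersands(fixed).length); // 0
```

It only reports positions, so you can decide whether to fail the build or just log the offending entry.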
Building a Sitemap That Escapes Properly
The fix is straightforward: escape every dynamic value before it goes into the XML, and include the fields Google actually still uses. Worth knowing upfront: Google has said for years that it largely ignores priority and changefreq. lastmod is the field worth getting right.
// sitemap-generator.ts
import { writeFileSync } from 'fs';

interface SitemapUrl {
  loc: string;
  lastmod?: string;
  changefreq?: 'always' | 'hourly' | 'daily' | 'weekly' | 'monthly' | 'yearly' | 'never';
  priority?: number;
}

// Replace the five characters XML reserves with their predefined entities.
// The ampersand goes first so the entities added afterwards aren't escaped twice.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

function buildSitemap(hostname: string, urls: SitemapUrl[]): string {
  const entries = urls
    .map(({ loc, lastmod, changefreq, priority }) => {
      const fullLoc = escapeXml(`${hostname}${loc}`);
      // Optional tags are only emitted when a value is present.
      const tags = [
        lastmod ? `<lastmod>${lastmod}</lastmod>` : '',
        changefreq ? `<changefreq>${changefreq}</changefreq>` : '',
        priority !== undefined ? `<priority>${priority}</priority>` : '',
      ].join('');
      return `<url><loc>${fullLoc}</loc>${tags}</url>`;
    })
    .join('\n');
  return `<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${entries}\n</urlset>`;
}

const urls: SitemapUrl[] = [
  { loc: '/', lastmod: '2026-06-01', changefreq: 'weekly', priority: 1.0 },
  { loc: '/products/salt-&-pepper-shakers', lastmod: '2026-05-28' },
];

writeFileSync('./public/sitemap.xml', buildSitemap('https://example.com', urls));
Result: the ampersand now comes out as &amp;, the file validates cleanly, and Search Console stops flagging that route. For a five-page site or a small blog, this is genuinely enough. You don't need a dependency for this.
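Since lastmod is the field worth getting right, it helps to generate it from a real Date instead of hand-typing strings. The sitemap protocol expects W3C Datetime, and the date-only form YYYY-MM-DD is allowed. Here's a small sketch; toLastmod is a hypothetical helper name, and it assumes you're fine reporting the date in UTC.

```typescript
// Hypothetical helper: format a Date as the date-only W3C Datetime form.
function toLastmod(date: Date): string {
  if (Number.isNaN(date.getTime())) {
    throw new Error('toLastmod: invalid Date');
  }
  // toISOString() is always UTC, e.g. "2026-05-28T13:45:00.000Z",
  // so the first 10 characters are the UTC calendar date.
  return date.toISOString().slice(0, 10);
}

console.log(toLastmod(new Date('2026-05-28T13:45:00Z'))); // "2026-05-28"
```

One caveat: because this uses UTC, a page edited late in the evening in a western timezone can come out with the next day's date. If that matters for your content, format from your own timezone instead.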
Scaling Past 50,000 URLs Without Crashing Your Build
The sitemap protocol caps a single file at 50,000 URLs or 50MB uncompressed. That limit hasn't moved in years, so it's safe to build automation around it. Once your catalog or CMS crosses that line, you need multiple sitemap files plus one index file that points to all of them.
// sitemap-index.ts
// Reuses SitemapUrl and buildSitemap from sitemap-generator.ts above;
// fetchAllProductUrls stands in for your own data-access function.
import { writeFileSync } from 'fs';

// Split an array into consecutive slices of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

function buildSitemapIndex(hostname: string, filenames: string[]): string {
  const entries = filenames
    .map((file) => `<sitemap><loc>${hostname}/${file}</loc></sitemap>`)
    .join('\n');
  return `<?xml version="1.0" encoding="UTF-8"?>\n<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${entries}\n</sitemapindex>`;
}

const allUrls: SitemapUrl[] = await fetchAllProductUrls(); // could be 200,000+
const batches = chunk(allUrls, 45000); // stay safely under the 50k cap
const filenames: string[] = [];

batches.forEach((batchUrls, i) => {
  const filename = `sitemap-${i}.xml`;
  writeFileSync(`./public/${filename}`, buildSitemap('https://example.com', batchUrls));
  filenames.push(filename);
});

writeFileSync('./public/sitemap-index.xml', buildSitemapIndex('https://example.com', filenames));
Result: a 200,000-product catalog now produces several small sitemap files instead of one giant file that either gets rejected outright or times out mid-fetch.
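Chunking by count handles the URL cap, but the 50MB cap is measured in bytes, so very long URLs can still push a file over. A rough guard might look like the sketch below. checkSitemapLimits is a hypothetical name, and it measures UTF-8 bytes of the uncompressed string with TextEncoder, which is available in Node and in most edge runtimes.

```typescript
const MAX_URLS = 50000;
const MAX_BYTES = 50 * 1024 * 1024; // 52,428,800 bytes, uncompressed

// Hypothetical helper: return a list of limit violations (empty means OK).
function checkSitemapLimits(xml: string, urlCount: number): string[] {
  const problems: string[] = [];
  // Count UTF-8 bytes, not string length: non-ASCII characters take more than one byte.
  const bytes = new TextEncoder().encode(xml).length;
  if (urlCount > MAX_URLS) {
    problems.push(`${urlCount} URLs exceeds the ${MAX_URLS} limit`);
  }
  if (bytes > MAX_BYTES) {
    problems.push(`${bytes} bytes exceeds the ${MAX_BYTES} limit`);
  }
  return problems;
}

console.log(checkSitemapLimits('<urlset></urlset>', 10)); // []
```

Running this on each batch right before writing it to disk turns an over-limit file into a build error instead of a Search Console surprise.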
Where I Stopped Writing This From Scratch Every Project
The functions above cover most real-world cases. What they don't cover: keeping memory flat when you're streaming from a database cursor instead of an in-memory array, image/video sitemap extensions, validating entries in CI before they ship, and running on edge runtimes that don't have fs or path at all.
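To make the first of those gaps concrete: keeping memory flat mostly means yielding one entry at a time instead of joining one giant string. Here's a minimal hand-rolled sketch using an async generator. The names sitemapChunks, StreamedUrl, escapeXmlText, and collectSitemap are all made up for this example, and the demo feeds it a plain array where a real setup would pass a database cursor.

```typescript
interface StreamedUrl {
  loc: string;
  lastmod?: string;
}

function escapeXmlText(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Yields the sitemap piece by piece, so only one entry is in memory at a time.
async function* sitemapChunks(
  hostname: string,
  urls: AsyncIterable<StreamedUrl> | Iterable<StreamedUrl>,
): AsyncGenerator<string> {
  yield '<?xml version="1.0" encoding="UTF-8"?>\n';
  yield '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n';
  for await (const { loc, lastmod } of urls) {
    const lastmodTag = lastmod ? `<lastmod>${escapeXmlText(lastmod)}</lastmod>` : '';
    yield `<url><loc>${escapeXmlText(hostname + loc)}</loc>${lastmodTag}</url>\n`;
  }
  yield '</urlset>\n';
}

// Demo only: collects the chunks into a string. In production you would
// write each chunk to a response or file stream instead of concatenating.
async function collectSitemap(): Promise<string> {
  let xml = '';
  const demoUrls: StreamedUrl[] = [{ loc: '/salt-&-pepper', lastmod: '2026-05-28' }];
  for await (const piece of sitemapChunks('https://example.com', demoUrls)) {
    xml += piece;
  }
  return xml;
}

collectSitemap().then((xml) => console.log(xml));
```

This sketch doesn't enforce the 50,000-URL split, so for a large catalog you'd still need to start a new file when the count gets close to the cap.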
Last time I needed this for a catalog with six figures of product pages, I reached for @power-seo/sitemap instead of rewriting these utilities again. It's the same chunking and escaping logic above, packaged with the parts that get tedious to maintain.
Streaming keeps memory flat when pulling from a cursor instead of building one giant string:
import { streamSitemap } from '@power-seo/sitemap';
const urlCursor = db.collection('products').find().stream(); // async iterable
const stream = streamSitemap('https://example.com', urlCursor);
for await (const chunk of stream) {
  response.write(chunk);
}
response.end();
And validating entries before they ever reach a deploy:
import { validateSitemapUrl } from '@power-seo/sitemap';
const result = validateSitemapUrl({ loc: '/blog/post-1', priority: 1.8 });
if (!result.valid) {
  console.error(result.errors); // ['priority must be between 0.0 and 1.0']
  process.exit(1); // fail the build instead of shipping a bad value
}
Result: catching a priority of 1.8 (the spec caps it at 1.0) in CI costs you a failed build. Shipping it costs you nothing visible immediately, which is exactly the problem: it just quietly erodes how much Google trusts your file over time.
The package is published on npm as @power-seo/sitemap, has no runtime dependencies beyond its own core package, and avoids Node-specific APIs internally, which matters if you're deploying to Cloudflare Workers or Vercel Edge instead of a traditional server.
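If you'd rather not add a dependency just for this check, the same idea can be sketched by hand. To be clear, this is not the library's implementation: checkSitemapUrl and CandidateUrl are hypothetical names, and the rules are my own assumptions (site-relative loc, date-only lastmod, which is stricter than the spec since W3C Datetime also allows full timestamps).

```typescript
interface CandidateUrl {
  loc: string;
  lastmod?: string;
  priority?: number;
}

// Assumption: only the date-only form is accepted here.
const DATE_ONLY = /^\d{4}-\d{2}-\d{2}$/;

// Hypothetical validator: returns a list of problems (empty means valid).
function checkSitemapUrl(url: CandidateUrl): string[] {
  const errors: string[] = [];
  if (!url.loc.startsWith('/')) {
    errors.push('loc must be a path starting with "/"');
  }
  if (
    url.priority !== undefined &&
    (Number.isNaN(url.priority) || url.priority < 0 || url.priority > 1)
  ) {
    errors.push('priority must be between 0.0 and 1.0');
  }
  if (url.lastmod !== undefined && !DATE_ONLY.test(url.lastmod)) {
    errors.push('lastmod must be formatted as YYYY-MM-DD');
  }
  return errors;
}

console.log(checkSitemapUrl({ loc: '/blog/post-1', priority: 1.8 }));
// [ 'priority must be between 0.0 and 1.0' ]
```

In a CI script you'd run this over every entry and exit non-zero if any list comes back non-empty.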
My Pre-Deploy SEO Checklist for Sitemaps
Here's the short version, the actual SEO checklist I run before shipping anything sitemap-related now:
- Escape every dynamic value: One unescaped &, <, or ' can invalidate the whole file, not just that entry.
- Stop optimizing priority and changefreq: Google has said for years it largely ignores both. Spend that energy on accurate lastmod dates instead.
- Build the 50,000-URL split logic before you need it: Retrofitting a sitemap index after Search Console starts throwing errors is a worse afternoon than building it up front.
- Validate in CI, not in production: A bad URL caught at build time is a five-second fix. The same bug caught a week later is a week of degraded crawl trust you can't get back instantly.
If you want to try this approach, here's the full guide: https://ccbd.dev/blog/seo-sitemap-library-for-typescript-a-complete-guide-to-power-seositemap.
Have you ever shipped a sitemap that silently broke without you noticing? What caught it for you: Search Console, an annoyed client, or pure luck? And if you've used a different TypeScript sitemap tool, I'd genuinely like to know how it compares. Drop your sitemap war stories in the comments.