
Sitemap index for multi-tenant SaaS: how Google discovers your tenants without you lifting a finger

The one mechanism in the sitemaps protocol designed for scale — and almost nobody uses it.

A multi-tenant SaaS has an SEO problem Google's tutorials never mention: every tenant is a new subdomain, and Google doesn't know it exists.

The default workflow you find in Search Console is: go in, click "Add a new sitemap," paste the URL, hit Submit. Works for one site. Works poorly for five. Past that, every new user signup becomes manual friction — somebody has to remember to submit their sitemap, each time, so Google indexes it this week instead of next.

If your business model depends on volume (a free blog, an open-source editor, a landing-page builder), this process isn't just annoying — it's a growth bottleneck. Every step between "user creates their site" and "site shows up in Google" kills conversions. And nobody has time to submit each tenant by hand.

The good news: there's an official sitemaps mechanism built for exactly this, called a sitemap index. The bad news: it's so poorly documented that most SaaS teams end up building fragile alternatives (cron scripts, onboarding hacks) when a clean, direct solution exists.

This is what we did at Agentikas.

The real problem: Google doesn't auto-discover subdomains

The belief that "Google finds everything on its own" is only half true. Google discovers URLs through a handful of paths:

  1. External links — some site points at your subdomain. May never happen for a new tenant.
  2. Internal links from already-indexed pages — if your main domain links to maria.your-saas.com, Google follows and crawls. But only when it revisits your domain, and only for explicitly linked subdomains.
  3. Search Console submissions — manual, covers only what you give it.
  4. Sitemap: directive in the subdomain's robots.txt — Google reads it, but only once it already knows the subdomain (chicken-and-egg).

None of these scale. The closest to "automatic" is option (2), but it requires every tenant to be linked from an already-indexed page. In practice, SaaS products list tenants on "explore other blogs" or "discover" pages that feature only a subset (featured, recent), not the full catalog. Unlisted tenants stay invisible.

The solution: sitemap index

A sitemap index is an XML file with a specific structure listing other sitemaps. It doesn't contain page URLs — it contains pointers to more sitemaps. Example:

<?xml version="1.0" encoding="utf-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://your-saas.com/sitemap.xml</loc>
    <lastmod>2026-04-21T11:42:23.017Z</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://maria.your-saas.com/sitemap.xml</loc>
    <lastmod>2026-04-20T09:15:00.000Z</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://juan.your-saas.com/sitemap.xml</loc>
    <lastmod>2026-04-19T14:00:00.000Z</lastmod>
  </sitemap>
  <!-- ... one per tenant -->
</sitemapindex>

The critical difference from a normal sitemap is that you submit this URL ONCE in Search Console and Google discovers every sub-sitemap itself. When you add a new tenant, your endpoint regenerates the index including it, Google re-reads it on the next crawl, and the new tenant enters the indexing pipeline — nobody touches anything.

It's the only mechanism in the sitemaps protocol designed specifically for scale. And hardly anyone uses it.

Building it in Next.js

The sitemap index must be a dynamic route, not a static file — because every new tenant changes the XML. In Next.js App Router:

// app/sitemap-index.xml/route.ts
import { NextResponse } from "next/server";
import { getPublicTenants } from "@/data/tenants";

// Render on every request: freshness comes from the DB,
// caching from the CDN via the Cache-Control header below.
export const dynamic = "force-dynamic";

export async function GET() {
  // Fresh read on every render, no in-memory cache.
  const tenants = await getPublicTenants();
  const now = new Date().toISOString();

  const entries = [
    // The root domain's own sitemap. Stamping it with "now" is a
    // simplification; see the lastmod caveat at the end of the post.
    `  <sitemap>
    <loc>https://your-saas.com/sitemap.xml</loc>
    <lastmod>${now}</lastmod>
  </sitemap>`,
    // One entry per public tenant, stamped with that tenant's own lastmod.
    ...tenants.map(t => `  <sitemap>
    <loc>https://${t.slug}.your-saas.com/sitemap.xml</loc>
    <lastmod>${t.updatedAt}</lastmod>
  </sitemap>`),
  ];

  const xml = `<?xml version="1.0" encoding="utf-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${entries.join("\n")}
</sitemapindex>`;

  return new NextResponse(xml, {
    headers: {
      "Content-Type": "application/xml; charset=utf-8",
      // Serve from the edge for an hour, then refresh in the background.
      "Cache-Control": "public, s-maxage=3600, stale-while-revalidate=7200",
    },
  });
}

The key pattern is the combination of two things:

  1. Query the database on every render: getPublicTenants() does a select * from tenants where public = true with no in-memory cache (a minimal sketch follows this list).
  2. Edge-level cache with short TTL (s-maxage=3600) so we don't hit the DB on every request.
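
For reference, getPublicTenants can stay boring. A minimal sketch, assuming a Postgres client (postgres.js here) and an illustrative tenants table with slug, public, and updated_at columns; the schema and client choice are assumptions of this sketch, not the post's actual code:

// data/tenants.ts: minimal sketch, schema and client choice are assumptions.
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL!);

export interface PublicTenant {
  slug: string;
  updatedAt: string; // ISO 8601, consumed as <lastmod> by the index route
}

export async function getPublicTenants(): Promise<PublicTenant[]> {
  // Deliberately no in-memory cache: freshness is the edge cache's job.
  const rows = await sql<{ slug: string; updated_at: Date }[]>`
    select slug, updated_at
    from tenants
    where public = true
    order by slug
  `;
  return rows.map((r) => ({
    slug: r.slug,
    updatedAt: r.updated_at.toISOString(),
  }));
}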

With a 3600-second TTL, a new tenant shows up in the index 0 to 1 hour after creation. An hour is acceptable for most SaaS — Google won't crawl the new sitemap in less than that anyway.

The two-level rule

Google allows ONE level of nesting in sitemaps. You can have:

  • sitemap-index.xml → N sitemap.xml (urlsets) ✅
  • sitemap.xml (urlset) with URLs directly ✅

But NOT:

  • sitemap-index.xml → another-index.xml → sitemap.xml ❌

This matters if your SaaS scales to tenants with thousands of pages each. The official limit per sitemap is 50,000 URLs and 50 MB uncompressed. If a tenant exceeds that, its sitemap should be split into several files: not under a nested index of its own, but as sibling sitemaps that all hang from the root index. In practice, very few tenants hit that volume, so the simple index → urlset structure covers 99% of cases.
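
If a tenant does cross the limit, the fan-out stays flat. A sketch of how the root index could emit sibling entries for an oversized tenant; the sitemap-N.xml naming and the pageCount field are assumptions of this sketch, not something the protocol prescribes:

// Sketch: one tenant, N sibling sitemap entries, all flat in the root index.
const MAX_URLS_PER_SITEMAP = 50_000; // the official per-sitemap URL limit

interface TenantEntry {
  slug: string;
  pageCount: number; // assumed to be tracked per tenant
  updatedAt: string;
}

function sitemapEntriesFor(t: TenantEntry): string[] {
  const parts = Math.max(1, Math.ceil(t.pageCount / MAX_URLS_PER_SITEMAP));
  // Common case: a single urlset per tenant.
  if (parts === 1) {
    return [entry(`https://${t.slug}.your-saas.com/sitemap.xml`, t.updatedAt)];
  }
  // Oversized tenant: sitemap-1.xml ... sitemap-N.xml as siblings,
  // never a nested index.
  return Array.from({ length: parts }, (_, i) =>
    entry(`https://${t.slug}.your-saas.com/sitemap-${i + 1}.xml`, t.updatedAt),
  );
}

function entry(loc: string, lastmod: string): string {
  return `  <sitemap>
    <loc>${loc}</loc>
    <lastmod>${lastmod}</lastmod>
  </sitemap>`;
}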

robots.txt: the safety net

The sitemap index in Search Console is the "push" channel — you tell Google where to look. There's also a "pull" channel: the Sitemap: line in each subdomain's robots.txt. When Google crawls maria.your-saas.com for any reason (external link, sitemap index, scheduled recrawl), it reads robots.txt and discovers the sitemap there.

Defensive config declares BOTH:

In your-saas.com/robots.txt:

Sitemap: https://your-saas.com/sitemap.xml
Sitemap: https://your-saas.com/sitemap-index.xml

In <tenant>.your-saas.com/robots.txt (each subdomain):

Sitemap: https://<tenant>.your-saas.com/sitemap.xml

With this, any of the three paths (direct index submit, discovery via links to the root domain, discovery via the subdomain itself) leads Google to the right sitemap. Resilience through redundancy.
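
If root and tenant subdomains are served by the same Next.js app (an assumption; your tenant sites may be a separate service), one host-aware route can emit both variants:

// app/robots.txt/route.ts: sketch of a host-aware robots.txt.
import { NextRequest, NextResponse } from "next/server";

const ROOT_HOST = "your-saas.com";

export function GET(req: NextRequest) {
  const host = req.headers.get("host") ?? ROOT_HOST;
  const lines = [
    "User-agent: *",
    "Allow: /",
    `Sitemap: https://${host}/sitemap.xml`,
  ];
  // Only the root domain advertises the index itself.
  if (host === ROOT_HOST) {
    lines.push(`Sitemap: https://${ROOT_HOST}/sitemap-index.xml`);
  }
  return new NextResponse(lines.join("\n") + "\n", {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}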

The TTL trade-off

Shortening the edge-cache TTL makes new tenants appear sooner but increases DB load. For a SaaS with 10 tenants and 100 requests/day to the index, that's trivial. At 10,000 tenants and 100,000 requests/day, every extra request counts.

The alternative is on-demand cache purge: when your "create blog" endpoint completes successfully, make one extra call to Cloudflare's API (or your CDN's equivalent) to invalidate the sitemap index. Pro: the tenant is in the index within seconds. Con: you couple your creation flow to the CDN, though if the purge fails the sitemap merely goes stale until the TTL expires; nothing critical breaks.
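
The on-demand variant is a single HTTP call and should be allowed to fail quietly. A sketch against Cloudflare's purge_cache endpoint; purgeSitemapIndex, CF_ZONE_ID, and CF_API_TOKEN are our names, only the endpoint and body shape are Cloudflare's:

// Called after a successful tenant creation. Failure is logged, never thrown:
// a missed purge only means the index stays stale until the TTL runs out.
export async function purgeSitemapIndex(): Promise<void> {
  try {
    const res = await fetch(
      `https://api.cloudflare.com/client/v4/zones/${process.env.CF_ZONE_ID}/purge_cache`,
      {
        method: "POST",
        headers: {
          Authorization: `Bearer ${process.env.CF_API_TOKEN}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          files: ["https://your-saas.com/sitemap-index.xml"],
        }),
      },
    );
    if (!res.ok) console.warn("sitemap-index purge failed:", res.status);
  } catch (err) {
    console.warn("sitemap-index purge failed:", err);
  }
}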

We picked the simple one-hour TTL. When metrics tell us it matters (Google slow to index new tenants, users complaining), we'll switch to on-demand purge. YAGNI first, optimization later.

What Google actually does with the index

Google doesn't process the sitemap index instantly. The real flow is roughly:

  1. You submit sitemap-index.xml once in Search Console
  2. Google fetches it within minutes to hours
  3. Enqueues each listed sub-sitemap for later crawl
  4. Crawls each sub-sitemap in its own order (hours to days)
  5. Enqueues each URL from the sub-sitemap for indexing
  6. Indexes URLs according to its own internal priority (days to weeks)

On each re-crawl of the index, Google uses each <lastmod> entry to decide whether that sub-sitemap is worth re-fetching. So the lastmod of each entry should reflect when that tenant's content actually changed, not simply "now". If you always write new Date().toISOString(), Google learns your lastmod values are noise, starts ignoring them, and re-crawls less often.

Correct pattern: lastmod of each entry = max(updated_at) of that tenant's content. If nothing in maria.your-saas.com changes for a week, its <lastmod> doesn't move, and Google won't re-crawl its sitemap unnecessarily.
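
In query form, reusing the postgres.js client from the earlier sketch and assuming an illustrative posts table with tenant_id and updated_at columns:

// Sketch: <lastmod> derived from content, not from the clock. Tenants with
// no posts fall back to the tenant row's own updated_at.
export async function getPublicTenantsWithLastmod() {
  return sql<{ slug: string; last_content_update: Date }[]>`
    select t.slug,
           coalesce(max(p.updated_at), t.updated_at) as last_content_update
    from tenants t
    left join posts p on p.tenant_id = t.id
    where t.public = true
    group by t.slug, t.updated_at
  `;
}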


Agentikas is open source. The sitemap index this post describes is live at agentikas.ai/sitemap-index.xml and the source is in github.com/agentikas/agentikas-blog. The PR that introduced it is #40, and the buildSitemapIndex helper lives in @agentikas/blog-ui, reusable in any Next.js app.
