The question is not why Google has not indexed these pages yet. The question is whether Google has decided not to index them or simply has not gotten to them. “Discovered – currently not indexed” is one of the most ambiguous status messages in Search Console because it covers two entirely different situations: pages Google has discovered but lacks the crawl demand to fetch, and pages Google has fetched but judged not worth indexing, which can appear under this status rather than under a more specific exclusion reason. The diagnostic workflow must distinguish between these two states because they require different remediation strategies.
Distinguish between unfetched and fetched-but-not-indexed using server logs
Search Console does not differentiate between URLs that have never been crawled and URLs that have been crawled but not indexed. The “Discovered – currently not indexed” label applies to both. Server log analysis provides the critical distinction that Search Console cannot.
Extract all Googlebot requests from server access logs for the affected URL set. For each URL in the “Discovered – currently not indexed” list, check whether Googlebot has ever requested it and what HTTP status code was returned:
- URL never appears in Googlebot logs: The page has not been crawled. Google knows it exists (from the sitemap or a link) but has not allocated crawl demand to fetch it. This is a crawl demand problem.
- URL appears in logs with a 200 response: Google crawled the page, received valid content, evaluated it, and decided not to index it. This is a quality or authority threshold problem.
- URL appears in logs with a non-200 response (5xx, timeout, blocked): Google attempted to crawl but failed. This is a technical access problem that presents as a discovery issue.
The log analysis should cover at least 90 days to account for Google’s re-crawl intervals. A URL not fetched in 90 days has very low crawl priority. A URL fetched once in 90 days with a 200 response was evaluated and rejected.
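The three-way classification above can be sketched as a short script. This is a minimal sketch, assuming combined-format access logs and a plain list of affected paths; the user-agent substring check is illustrative, and a production version should also verify Googlebot via reverse DNS:

```python
import re
from collections import defaultdict

# Minimal combined-log-format parser: request path and HTTP status code.
LOG_LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

def classify_urls(log_lines, affected_paths):
    """Bucket 'Discovered - currently not indexed' URLs by Googlebot crawl history."""
    statuses = defaultdict(set)  # path -> set of HTTP status codes served to Googlebot
    for line in log_lines:
        if "Googlebot" not in line:  # illustrative; verify via reverse DNS in production
            continue
        m = LOG_LINE.search(line)
        if m:
            statuses[m.group("path")].add(int(m.group("status")))

    buckets = {"never_crawled": [], "crawled_200": [], "crawled_error": []}
    for path in affected_paths:
        seen = statuses.get(path)
        if not seen:
            buckets["never_crawled"].append(path)   # crawl demand problem
        elif 200 in seen:
            buckets["crawled_200"].append(path)     # quality/authority threshold problem
        else:
            buckets["crawled_error"].append(path)   # technical access problem
    return buckets
```

Running this against 90 days of logs yields the three remediation groups directly.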
For sites without server log access, the URL Inspection tool provides a partial alternative. Inspecting a “Discovered – currently not indexed” URL shows whether Google has a cached crawl of the page. If the inspection shows “URL is not on Google” with no last crawl date, the page was likely never fetched. If it shows a last crawl date, the page was fetched and rejected.
One important caveat: the “Discovered – currently not indexed” status can fluctuate. Google’s John Mueller has confirmed that pages may move in and out of this status as Google’s systems re-evaluate priorities. A page that was “Discovered – currently not indexed” this week may be crawled and indexed next week without any intervention, or it may remain in this status indefinitely.
Crawl demand deficiency: the URL exists in the sitemap but lacks supporting signals
When server logs confirm a URL has never been fetched by Googlebot, the root cause is almost always insufficient crawl demand signals. The sitemap alone provides a discovery hint but does not generate enough scheduling priority for Google to allocate a crawl slot when competing against millions of other queued URLs.
Crawl demand for a specific URL is influenced by multiple signals that the sitemap cannot provide:
Internal link presence and depth. A URL with no internal links pointing to it relies entirely on the sitemap for discovery. Google’s crawling systems give higher scheduling priority to URLs reachable through the link graph because link presence correlates with page importance. Check whether the affected URLs receive internal links from other indexed pages. If internal link count is zero or near-zero, the URL is effectively orphaned despite its sitemap presence.
External backlinks. URLs with external links from other domains receive higher crawl demand because the external links signal third-party endorsement. For new pages with zero backlinks, crawl demand is derived entirely from internal signals and sitemap presence, which may be insufficient.
Site-wide quality assessment. Google evaluates new URLs partially based on the quality of similar URLs already indexed from the same domain. Onely’s research confirms that Google classifies URLs by pattern and assesses quality at the pattern level. If the site has many thin or duplicate pages in the same URL pattern, new pages matching that pattern inherit lower crawl priority regardless of their individual quality.
Crawl rate availability. On large sites where Googlebot already uses its full crawl rate limit processing existing URLs, new URLs compete for limited remaining capacity. If the site has 500,000 existing indexed URLs and a daily crawl rate of 10,000 requests, new URLs from the sitemap enter a queue behind existing URL maintenance crawls.
The signal audit for each unfetched URL should check: (1) internal link count from indexed pages, (2) external backlink count, (3) position in the site’s URL hierarchy (click depth from homepage), and (4) whether the URL pattern has a history of quality issues.
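A hypothetical sketch of that four-point audit, assuming the internal link graph is available as a dict (page to linked pages) and backlink counts come from a backlink tool's export:

```python
from collections import deque

def click_depth(link_graph, start="/"):
    """BFS click depth from the homepage over the internal link graph
    (dict: page -> list of internally linked pages)."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

def audit_url(url, link_graph, backlink_counts, flagged_patterns):
    """Run the four signal checks for one unfetched URL."""
    depths = click_depth(link_graph)
    inlinks = sum(url in targets for targets in link_graph.values())
    return {
        "internal_links": inlinks,                       # check (1)
        "backlinks": backlink_counts.get(url, 0),        # check (2)
        "click_depth": depths.get(url),                  # check (3); None => orphaned
        "risky_pattern": any(url.startswith(p) for p in flagged_patterns),  # check (4)
    }
```

A `click_depth` of `None` combined with zero internal links confirms the URL is orphaned and relies entirely on the sitemap.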
Quality threshold failures and site-level crawl budget constraints on indexation
When server logs confirm Googlebot fetched the page with a 200 response, the page was evaluated by Google’s indexing system and did not meet the threshold for inclusion. This outcome occurs even with unique, high-quality content when other factors suppress indexing eligibility.
Insufficient authority signals. A new page on a new or low-authority domain faces a higher indexing threshold than the same content on an established, high-authority domain. Google’s indexing system evaluates whether adding the page to the index provides value relative to existing indexed pages covering the same topic. If established competitors already cover the topic comprehensively, a new page from a lower-authority domain may not pass the marginal value threshold.
Topical isolation. A high-quality page that exists in isolation — with no supporting content on the same topic elsewhere on the site — lacks topical cluster signals. Google evaluates topical depth at the site level. A single excellent article on a topic the site has no other content about receives less indexing confidence than the same article on a site with 20 related articles forming a topical cluster.
Duplicate or near-duplicate detection. Google’s duplicate detection operates at the content level, not just the URL level. If the page’s content is substantially similar to content already indexed (either on the same site or on a different, higher-authority site), Google may suppress indexing of the duplicate. This can occur even when the content was independently created, if the factual coverage overlaps significantly with existing indexed pages.
Site-level quality concerns. If a site has a high proportion of low-quality or thin pages in its index, Google may apply a higher indexing threshold to new pages from that site. The May 2025 Google indexing purge specifically targeted “poor performing” pages, and the quality threshold for new page indexation appears to have increased. Improving the quality ratio of the overall site — by removing or noindexing low-quality pages — can improve the indexing likelihood for new high-quality pages.
On sites with more than 100,000 URLs, crawl budget becomes a binding constraint. Google allocates a finite number of daily crawl requests per host, and existing URLs consume the majority of this budget for maintenance crawling (re-fetching to check for updates).
Botify’s data shows that refresh crawling (Googlebot re-crawling known pages) typically consumes 75-95% of total crawl budget, leaving only 5-25% for discovery of new URLs. On a site with a daily crawl rate of 15,000 requests, this means only 750-3,750 requests per day are available for new URL discovery and indexation.
If the site published 500 new URLs this month and has 2,000 “Discovered – currently not indexed” URLs in the backlog, the combined queue of 2,500 URLs shares the daily discovery budget of 750-3,750 requests. At the low end, working through that queue takes 3-4 days even if every discovery request goes to these URLs; in practice the budget is also shared with other newly discovered URLs, so the backlog can grow faster than it is processed.
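The arithmetic generalizes into a small helper (the 75-95% refresh share is Botify's range quoted above; the queue size and crawl rate are the worked example from the text):

```python
def discovery_days(queue_size, daily_crawl_rate, refresh_share=0.95):
    """Days needed to work through a discovery queue, assuming refresh
    crawling consumes a fixed share of the daily crawl budget
    (Botify's observed range is 75-95%)."""
    discovery_budget = daily_crawl_rate * (1 - refresh_share)
    return queue_size / discovery_budget
```

With a 95% refresh share and 15,000 daily requests, a 2,500-URL queue takes roughly 3.3 days to work through; at a 75% share it clears in under a day.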
Diagnosing crawl budget constraint requires examining the Crawl Stats report in Search Console:
- Total crawl requests per day relative to the total URL count. If the ratio is below 1% (fewer than 1,000 daily requests for a 100,000-URL site), crawl budget is constrained.
- Response time distribution. High average response times (above 500ms) indicate the server is limiting crawl rate, reducing the budget available for new URL discovery.
- Crawl request distribution by URL segment. If server logs show that 80% of Googlebot requests go to URL patterns with low indexing value (faceted navigation, pagination, parameter variations), the budget is being consumed by low-priority URLs at the expense of new content.
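The third check can be run directly against the same Googlebot log extract. The segment patterns below are illustrative; substitute the parameter names and path conventions your site actually uses:

```python
import re
from collections import Counter

# Illustrative URL segments; order matters, first match wins.
SEGMENTS = [
    ("faceted", re.compile(r"[?&](color|size|sort|filter)=")),
    ("pagination", re.compile(r"[?&]page=\d+|/page/\d+")),
    ("content", re.compile(r".")),  # catch-all for everything else
]

def segment_crawl_requests(paths):
    """Count Googlebot requests per URL segment to see where budget goes."""
    counts = Counter()
    for path in paths:
        for name, pattern in SEGMENTS:
            if pattern.search(path):
                counts[name] += 1
                break
    return counts
```

If the faceted and pagination buckets dominate, crawl budget is being spent on low-indexing-value patterns instead of new content.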
Remediation strategies ranked by effectiveness for each root cause
For crawl demand deficiency (URL never crawled):
- Add internal links from high-authority pages. Linking from the homepage, category pages, or high-traffic blog posts to the unfetched URLs signals importance and creates a crawl pathway independent of the sitemap. This is the single most effective intervention, with observed crawl pickup within 1-2 weeks.
- Ensure accurate sitemap lastmod. If the sitemap’s lastmod signals are not trusted by Google (due to historical inaccuracy), the sitemap provides minimal scheduling benefit. Fix lastmod accuracy across the entire sitemap.
- Request indexing via URL Inspection. Manual indexing requests through the URL Inspection tool bypass the normal scheduling queue. This works for individual URLs but is not scalable beyond 10-20 URLs per day.
- Build external links. Even a few links from relevant external sites create crawl demand signals that accelerate discovery.
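The lastmod accuracy recommendation lends itself to a quick audit script. This is a minimal sketch using only the standard library, checking the two most obvious problems (missing and future-dated values); genuinely stale lastmod values require comparing against actual content change dates:

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def audit_lastmod(sitemap_xml, today=None):
    """Flag sitemap entries with missing or implausible lastmod values."""
    today = today or date.today()
    issues = []
    root = ET.fromstring(sitemap_xml)
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if lastmod is None:
            issues.append((loc, "missing lastmod"))
        elif date.fromisoformat(lastmod[:10]) > today:
            issues.append((loc, "lastmod in the future"))
    return issues
```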
For quality threshold failure (URL crawled but not indexed):
- Strengthen the topical cluster. Add supporting content around the same topic to build topical authority. A single page in isolation faces a higher indexing threshold than a page within a cluster of related content.
- Improve content differentiation. If the content overlaps significantly with existing indexed pages (on other sites), add unique data, original analysis, or proprietary insights that differentiate the page.
- Improve site-wide quality ratio. Noindex or remove low-quality pages to improve the proportion of high-quality pages in the site’s index. This raises Google’s confidence in new pages from the same domain.
- Add structured data and rich content signals. While structured data does not directly influence indexing decisions, it adds quality signals that can marginally improve indexing eligibility.
For crawl budget constraint (site-level throttling):
- Implement crawl budget prioritization. Block or noindex low-value URL patterns to free crawl capacity for new content.
- Reduce server response time. Faster TTFB increases the crawl rate limit, providing more daily requests for new URL discovery.
- Clean up redirect chains and soft 404s. Both consume crawl budget without producing indexed pages, reducing the capacity available for new content.
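For the first item, a robots.txt sketch might look like the following. The disallowed paths are purely illustrative; confirm which patterns actually consume crawl budget before blocking anything, and remember that blocked URLs can still be indexed from links, so use noindex where that matters:

```text
User-agent: *
# Illustrative low-value patterns; replace with your site's own.
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /search/
```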
Does the “Discovered, currently not indexed” status mean Google has determined the page is low quality?
This status is not, by itself, a quality judgment. In most cases it means Google knows the URL exists but has not allocated crawl resources to fetch it yet. The URL sits in the crawl queue without sufficient demand signals to prioritize it above other pending URLs. Low quality can be one reason Google deprioritizes a URL, but the same status appears for high-quality pages on sites with crawl budget constraints, insufficient internal linking, or new domains that have not yet built enough crawl demand.
Does manually requesting indexing through Search Console resolve persistent “Discovered, currently not indexed” status?
Manual indexing requests can trigger a one-time priority crawl, but they do not address the underlying cause. If the page lacks sufficient demand signals (internal links, external links, sitemap presence with accurate lastmod), it may return to the “Discovered” state after the manual crawl. Sustained indexation requires building the demand signals that keep the URL in Google’s active crawl rotation. Manual submission is a diagnostic tool, not a permanent solution.
Does the number of URLs in “Discovered, currently not indexed” status correlate with overall site crawl budget health?
A growing count of “Discovered” URLs relative to total site URLs indicates either insufficient crawl demand for new content or crawl budget being consumed by low-value URL patterns elsewhere on the site. For sites under 10,000 pages, any substantial count in this status suggests a specific technical or quality issue. For sites with millions of pages, a percentage-based threshold is more meaningful: more than 30% of sitemap-listed URLs in “Discovered” status signals a systemic crawl budget problem requiring structural intervention.
Sources
- Onely. “How to Fix ‘Discovered – Currently Not Indexed’ in GSC.” https://www.onely.com/blog/how-to-fix-discovered-currently-not-indexed-in-google-search-console/
- Search Engine Land. “Understanding and Resolving ‘Discovered – Currently Not Indexed.’” https://searchengineland.com/understanding-resolving-discovered-currently-not-indexed-392659
- Embarque. “How We Resolve ‘Discovered – Currently Not Indexed’ on Google Search Console.” https://www.embarque.io/post/how-we-resolve-discovered-currently-not-indexed-on-google-search-console
- Google Developers. “Troubleshoot Google Search Crawling Errors.” https://developers.google.com/search/docs/crawling-indexing/troubleshoot-crawling-errors
- Botify. “Crawl Budget: How Many Pages Search Engines Will Crawl on Your Site.” https://www.botify.com/insight/crawl-budget-how-many-pages-search-engines-will-crawl-on-your-site-how-to-optimize-it