seo / Jun 2, 2026 / 9 min
249 articles published on april 24. google indexed all of them, then dropped 247 within 2 weeks. the git history and gsc data made the cause clear.
I was working on a content publishing project when I noticed the indexed page count in Search Console dropping sharply. The domain had been live a few weeks. Around 300 articles had been published before I joined. The drop started about 10 days after launch and kept going.
I spent a day investigating with a Claude session that had full access to the Git history and Search Console data. The results were precise enough to be useful beyond this specific site.
Two distinct batches of content on the same domain, published within weeks of each other:
Bulk batch: 249 articles published on April 24 in 5 manual commits, before the content engine was running. Dates were then modified multiple times over the following days. Google indexed all 249. Two weeks later, 247 were gone.
Engine batch: articles published by the content engine from late April onwards, in runs of 3-5 articles at a time, with real publication dates. Every article the engine has published is still indexed.
Same domain. Same technical setup. Same Next.js site. The indexing rate difference is explained entirely by how the two batches were published.
Google Search Console showed 247 pages in the "Crawled - currently not indexed" state. No manual actions, no penalties. The pages were accessible, returned 200, and passed the URL inspection tool's check. Google had crawled them and chosen not to index them.
The GSC impressions chart showed the full shape of the drop: a peak of 4,222 impressions per day on April 27, when all 249 were indexed, then 509 by May 20 and 260 by June 1. That is a 94% drop, driven entirely by going from 249 indexed pages to 2.
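The 94% figure follows directly from the two endpoints of the chart. A minimal sketch of the arithmetic, using only the impression counts quoted above (the function and constant names are mine, chosen for illustration):

```python
def percent_drop(peak: float, current: float) -> float:
    """Relative decline from peak to current, in percent."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return (peak - current) / peak * 100


# Daily impressions quoted from the GSC chart.
PEAK_APR_27 = 4222
MAY_20 = 509
JUN_1 = 260

drop_by_may_20 = percent_drop(PEAK_APR_27, MAY_20)
drop_by_jun_1 = percent_drop(PEAK_APR_27, JUN_1)
print(f"by May 20: {drop_by_may_20:.0f}%, by June 1: {drop_by_jun_1:.0f}%")
```

Rounded to whole percentages, this gives an 88% drop by May 20 and a 94% drop by June 1.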
The Git history made this precise:
| Commit | Articles added |
|---|---|
| "Add 20 articles: TCO, pricing, strategy guides" | 20 |
| "Articles registry: add 6 new strategy articles" | 6 |
| "Add 28 articles: strategy, compliance, security" | 28 |
| "Add 50 articles, updated case studies" | 50 |
| "Add 144 new articles across 13 content clusters" | 144 |
| "Add on-device AI education article" | 1 |
| Total | 249 |
All 5 commits on April 24. Google discovered and crawled them April 25-26: a brand new domain that had published 249 articles in a single night.
Google's helpful content documentation lists "mass-produced content by automation" as a quality signal. 144 articles in a single commit, regardless of individual quality, reads as exactly that.
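One way to catch this before it ships is a batch-size check on each commit. The sketch below is hypothetical: the commit data is copied from the table above, and the threshold of 5 is my assumption, taken from the content engine's largest run size rather than from any limit Google publishes.

```python
# Commit messages and article counts copied from the April 24 table.
APRIL_24_COMMITS = [
    ("Add 20 articles: TCO, pricing, strategy guides", 20),
    ("Articles registry: add 6 new strategy articles", 6),
    ("Add 28 articles: strategy, compliance, security", 28),
    ("Add 50 articles, updated case studies", 50),
    ("Add 144 new articles across 13 content clusters", 144),
    ("Add on-device AI education article", 1),
]

# Assumption: 5 mirrors the engine's runs of 3-5 articles.
# It is not a documented Google threshold.
MAX_BATCH = 5


def oversized_commits(commits, max_batch=MAX_BATCH):
    """Return the (message, count) pairs that add more than max_batch articles."""
    return [(message, count) for message, count in commits if count > max_batch]


total_articles = sum(count for _, count in APRIL_24_COMMITS)
flagged = oversized_commits(APRIL_24_COMMITS)

print(f"{total_articles} articles, {len(flagged)} commits over the batch limit")
for message, count in flagged:
    print(f"  {count:>3}  {message}")
```

Under that threshold, every April 24 commit except the single-article one would have been flagged.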
Google gives new pages a trial window of roughly 7-10 days where it shows them in results and measures engagement. CTR during this window determines which pages survive.
GSC data for the non-indexed pages during April 28 to May 5:
| Page | Impressions | Clicks | CTR |
|---|---|---|---|
| native-offline-vs-pwa-field-teams | 2,059 | 1 | 0.05% |
| on-device-ai-vs-cloud-api | 958 | 1 | 0.10% |
| mobile-apps-frontline-workers | 591 | 1 | 0.17% |
| app-store-rejection-prevention | 551 | 1 | 0.18% |
Normal CTR at position 6-7 is around 3%. These pages were getting 0.05-0.18%, roughly 17-60x below normal. The pages that survived had CTRs of 1-5% during the same window.
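The gap to the baseline can be recomputed from the raw impressions and clicks. This is a small sketch, assuming the 3% baseline quoted above for position 6-7; the helper name `ctr_shortfall` is hypothetical.

```python
# (slug, impressions, clicks) copied from the GSC table, April 28 to May 5.
DROPPED_PAGES = [
    ("native-offline-vs-pwa-field-teams", 2059, 1),
    ("on-device-ai-vs-cloud-api", 958, 1),
    ("mobile-apps-frontline-workers", 591, 1),
    ("app-store-rejection-prevention", 551, 1),
]

# Assumption: the article's "around 3%" figure for position 6-7.
BASELINE_CTR = 0.03


def ctr_shortfall(impressions: int, clicks: int, baseline: float = BASELINE_CTR):
    """Return (ctr, factor) where factor is how many times below baseline the CTR is."""
    if impressions <= 0 or clicks <= 0:
        raise ValueError("need at least one impression and one click")
    ctr = clicks / impressions
    return ctr, baseline / ctr


for slug, impressions, clicks in DROPPED_PAGES:
    ctr, factor = ctr_shortfall(impressions, clicks)
    print(f"{slug}: CTR {ctr:.2%}, about {factor:.0f}x below baseline")
```

Computed from the unrounded CTRs, the shortfall runs from about 17x for the best of these pages to about 62x for the worst.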
The titles were the problem. "Mobile App Development for US Construction Companies: Field Apps, Safety Compliance, and AI 2026" describes the article. It does not give someone searching "construction field service app" a reason to click. Google showed the page to thousands of people, almost no one clicked, and the evaluation failed.
The most striking finding came from joining the Git commit history against GSC's index status per cluster:
| Cluster | Indexed | Total | Rate |
|---|---|---|---|
| Offline / Connectivity | 2 | 5 | 40% |
| US Industry Verticals | 7 | 23 | 30% |
| Agency Selection / Outsourcing | 12 | 48 | 25% |
| Vendor / Team Management | 8 | 36 | 22% |
| AI Features / Strategy | 3 | 14 | 21% |
| On-Device AI | 3 | 15 | 20% |
| Tech Stack / Framework | 5 | 27 | 19% |
| Cost / Pricing / ROI | 4 | 29 | 14% |
| Field Service / Logistics | 2 | 17 | 12% |
| Compliance and Security | 1 | 24 | 4% |
Every cluster shows the same pattern. Google indexed a minority of the articles in each cluster (between 1 and 12, never more than 40%) and dropped the rest. Within each cluster, the articles targeting the highest search-demand queries survived. The rest were treated as redundant.
The compliance cluster is the clearest example: 24 articles published simultaneously, 1 survived (HIPAA, the highest-demand query in the cluster). FERPA, FedRAMP, financial services compliance, insurance compliance: all dropped. Google picked one winner for the cluster and stopped indexing the others.
This is not a content quality problem. Several of the dropped articles were well-researched. Google did not evaluate them individually. It evaluated the cluster as a group, picked the strongest match for the cluster's primary query, and pruned the rest as near-duplicates.
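The per-cluster table comes from a simple join: a mapping of article to cluster (from the repository) against the set of URLs GSC reports as indexed. A minimal sketch of that join, with made-up slugs standing in for the real export (none of these names come from the actual data):

```python
from collections import Counter


def index_rate_by_cluster(cluster_of: dict, indexed: set) -> dict:
    """Map each cluster to (indexed_count, total_count, rate)."""
    totals = Counter(cluster_of.values())
    kept = Counter(
        cluster for slug, cluster in cluster_of.items() if slug in indexed
    )
    return {
        cluster: (kept[cluster], total, kept[cluster] / total)
        for cluster, total in totals.items()
    }


# Hypothetical sample: slug -> cluster, as it might be derived from Git.
cluster_of = {
    "hipaa-compliance": "Compliance and Security",
    "ferpa-compliance": "Compliance and Security",
    "fedramp-compliance": "Compliance and Security",
    "insurance-compliance": "Compliance and Security",
    "offline-sync": "Offline / Connectivity",
    "offline-maps": "Offline / Connectivity",
}
# Hypothetical sample: slugs GSC reports as indexed.
indexed = {"hipaa-compliance", "offline-sync"}

rates = index_rate_by_cluster(cluster_of, indexed)
for cluster, (hit, total, rate) in sorted(rates.items(), key=lambda kv: -kv[1][2]):
    print(f"{cluster}: {hit}/{total} ({rate:.0%})")
```

Sorting by rate, as the table does, makes the clusters where nearly everything was pruned stand out at the bottom.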
The publication dates were changed 4 times after publishing, three of them within 48 hours, while Google was actively crawling the new domain:
| Date | Commit | What changed |
|---|---|---|
| Apr 25 | f9c9bd0 | All dates pushed back to Feb 2025 |
| Apr 25 | c9141e2 | Same day: dates rolled back to Jan 2026 |
| Apr 26 | 95b275b | Dates changed to Oct 2025 spread |
| May 1 | b0334f1 | Second batch: all backdated to 2025-2026 |
The intent was to make the content library look like it had publishing history rather than a site that launched 249 articles overnight. Google crawled the pages on April 25, then re-crawled them on April 26 and saw different datePublished values in the Article JSON-LD. Then saw them change again the following day. Across 249 pages simultaneously, this pattern is detectable at the domain level.
Google's helpful content guidelines list "manipulating freshness dates artificially" as a quality signal. The commit history shows this happened to the entire batch within 48 hours of the initial crawl.
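Date drift of this kind is easy to detect from the outside, because `datePublished` sits in the page's Article JSON-LD. The sketch below is one possible check, not the tooling used in the investigation: it reads `datePublished` from rendered HTML with the Python standard library and compares it against a stored first-seen value. The slugs, dates, and the `page` helper are illustrative assumptions.

```python
import json
from html.parser import HTMLParser
from typing import Optional


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks = []
        self._buffer = None  # None while outside a JSON-LD script

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def date_published(html: str) -> Optional[str]:
    """Return the first datePublished found in the page's JSON-LD, if any."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than fail the whole check
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and "datePublished" in item:
                return item["datePublished"]
    return None


def changed_dates(first_seen: dict, current_html: dict) -> dict:
    """Map slug -> (original, current) for pages whose datePublished has moved."""
    changed = {}
    for slug, original in first_seen.items():
        current = date_published(current_html[slug])
        if current != original:
            changed[slug] = (original, current)
    return changed


def page(published: str) -> str:
    """Build a tiny illustrative page carrying Article JSON-LD."""
    payload = json.dumps({"@type": "Article", "datePublished": published})
    return f'<html><head><script type="application/ld+json">{payload}</script></head></html>'


# Illustrative values only: one page kept its date, one was backdated.
first_seen = {"article-a": "2026-04-24", "article-b": "2026-04-24"}
current_html = {"article-a": page("2026-04-24"), "article-b": page("2025-02-10")}
print(changed_dates(first_seen, current_html))
```

Run as a deploy check, an empty result would mean no published date has moved since it was first recorded.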
The control case is in the same Git history. A run on May 4-6 published 14 articles on the same domain: small batch, no date changes after publishing. 13 of 14 are indexed. 93% rate versus 0.8% for the April batch.
The domain authority did not change between April 24 and May 4. The content quality is comparable. The only variables are batch size and date stability.
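From the investigation, the failure modes are straightforward to avoid:
- Publish in small batches. The engine's runs of 3-5 articles stayed indexed; 144 articles in one commit did not.
- Use the real publication date and do not change it after the first crawl.
- Write titles that give a searcher a reason to click, not titles that only describe the article.
- Publish one strong article per query cluster instead of many near-duplicate variants at once.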
These are not speculative. Each one maps directly to a measurable cause in the data above. I've been building a content engine that operates on exactly these constraints. I'll write about how that's designed separately.
does google index all articles from a content pipeline on launch day?
yes, but only temporarily. google gives new content a honeymoon window of roughly 7-10 days where it indexes and ranks everything to measure engagement. during this window it collects click-through rate data. if pages get shown in search results but users do not click, google treats that as a signal the pages do not satisfy search intent. at the end of the window, pages with very low CTR get marked as 'crawled - currently not indexed' and removed from results. the initial index is a trial, not a stable state.
what is google cluster pruning and how does it cause deindexing?
when multiple articles targeting the same keyword cluster are published simultaneously, google evaluates them as a group. it identifies the article with the strongest relevance and authority signals and treats it as the representative for that cluster. the others get marked as near-duplicates and dropped. this is not a penalty: the dropped articles are accessible and have no manual action against them. google simply decided one article is enough to represent the cluster. the compliance cluster in our data had 24 articles published simultaneously and 23 were dropped.
does changing publication dates after google crawls an article hurt indexing?
yes, especially if done multiple times. when google first crawls a page it records the publication date from the article JSON-LD. if that date changes in a subsequent crawl, google sees a mismatch. changing dates repeatedly within 48 hours while google is actively crawling is detectable as a systematic pattern across the whole domain. google's helpful content documentation lists 'manipulating freshness dates artificially' as a quality signal. using the real publication date (the date the article was actually deployed) avoids this entirely.
if 247 articles got deindexed, can they be recovered?
some can. the ones that are genuinely unique topics (not cluster variations) are recoverable through google search console url inspection and request indexing, batched at 10-15 per day. google re-evaluates and may index them if the domain has accumulated more authority since the original drop. the cluster variants are harder: google already has an indexed article representing that cluster, and the near-duplicate will keep losing the same evaluation unless it is made meaningfully different. for those, the practical options are rewriting to target a distinct angle, or consolidating into the indexed article with a 301 redirect.
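the 10-15 per day batching is just a queue split by date. a minimal sketch of the planning step, assuming hypothetical URLs and a start date of my choosing; it only builds the schedule and does not call any google API, since request indexing is done by hand in url inspection.

```python
from datetime import date, timedelta


def plan_index_requests(urls: list, start: date, per_day: int = 10) -> dict:
    """Split urls into consecutive daily batches of at most per_day each."""
    if per_day < 1:
        raise ValueError("per_day must be at least 1")
    plan = {}
    for offset in range(0, len(urls), per_day):
        day = start + timedelta(days=offset // per_day)
        plan[day] = urls[offset:offset + per_day]
    return plan


# Hypothetical recoverable URLs; example.com stands in for the real domain.
recoverable = [f"https://example.com/articles/post-{i}" for i in range(25)]

schedule = plan_index_requests(recoverable, start=date(2026, 6, 3), per_day=10)
for day, batch in schedule.items():
    print(day.isoformat(), len(batch), "urls")
```

with 25 urls and 10 per day, the plan covers three consecutive days, with the last day carrying the remaining 5.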