seo / Jun 2, 2026 / 9 min

why 247 of 299 articles got deindexed in 2 weeks

249 articles published on april 24. google indexed all of them, then dropped 247 within 2 weeks. the git history and gsc data made the cause clear.

I was working on a content publishing project when I noticed the indexed page count in Search Console dropping sharply. The domain had been live for a few weeks. Around 300 articles had been published before I joined. The drop started about 10 days after launch and kept going.

I spent a day investigating with a Claude session that had full access to the Git history and Search Console data. The results were precise enough to be useful beyond this specific site.

The setup

Two distinct batches of content on the same domain, published within weeks of each other:

Bulk batch: 249 articles published on April 24 in 6 manual commits, before the content engine was running. Dates were then modified multiple times over the following days. Google indexed all 249. Two weeks later, 247 were gone.

Engine batch: articles published by the content engine from late April onwards, in runs of 3-5 articles at a time, with real publication dates. Almost every article the engine has published is still indexed.

Same domain. Same technical setup. Same Next.js site. The indexing rate difference is explained entirely by how the two batches were published.

What the data showed

Google Search Console showed 247 pages in the "Crawled - currently not indexed" state. No manual actions, no penalties. The pages were accessible, returned HTTP 200, and passed the URL Inspection tool's live test. Google had crawled them and chosen not to index them.

The GSC impressions chart showed the full shape: a peak of 4,222 impressions per day on April 27, when all 249 were indexed, down to 509 by May 20 and 260 by June 1. That 94% drop was driven entirely by going from 249 indexed pages to 2.
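The percentages follow directly from the chart's daily values. A minimal Python check, using only the three impression figures quoted in the paragraph above:

```python
# Daily impression figures quoted from the GSC chart.
PEAK_IMPRESSIONS = 4222  # April 27, all 249 bulk articles indexed
SNAPSHOTS = {"May 20": 509, "June 1": 260}


def drop_from_peak(current: int, peak: int = PEAK_IMPRESSIONS) -> float:
    """Return the fractional decline from the peak (0.0 = no change, 1.0 = zero)."""
    return 1 - current / peak


for label, value in SNAPSHOTS.items():
    print(f"{label}: {value} impressions/day, {drop_from_peak(value):.1%} below peak")
```

By June 1 the decline rounds to 94%; the May 20 value works out to roughly 88%.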

Cause 1: 249 articles in one day

The Git history made this precise:

Commit	Articles added
"Add 20 articles: TCO, pricing, strategy guides"	20
"Articles registry: add 6 new strategy articles"	6
"Add 28 articles: strategy, compliance, security"	28
"Add 50 articles, updated case studies"	50
"Add 144 new articles across 13 content clusters"	144
"Add on-device AI education article"	1
Total	249

All 6 commits landed on April 24. Google discovered and crawled them on April 25-26: a brand-new domain that had published 249 articles in a single night.
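Per-commit counts like these can be pulled from `git log`. A minimal sketch, assuming each article is its own file under a hypothetical `content/articles/` directory; if some articles are entries in a registry file (as the second commit's message suggests), you would count added registry entries instead. `--diff-filter=A` limits the file list to files each commit added, and the `COMMIT` prefix in the format string is an arbitrary marker that keeps header lines distinct from file paths.

```python
import subprocess
from collections import Counter

# Hypothetical location of article files; adjust to the real repo layout.
ARTICLE_PREFIX = "content/articles/"
HEADER_MARK = "COMMIT\t"  # arbitrary marker so headers can't be mistaken for paths


def parse_added_articles(log_output: str, prefix: str = ARTICLE_PREFIX) -> Counter:
    """Count article files added per commit from `git log --name-only` output."""
    counts: Counter = Counter()
    current = None
    for line in log_output.splitlines():
        if line.startswith(HEADER_MARK):
            # Header line: COMMIT<TAB>short hash<TAB>date<TAB>subject
            _, short_hash, day, subject = line.split("\t", 3)
            current = (day, short_hash, subject)
            counts[current] += 0  # listed commits with no article files show as 0
        elif line.strip() and current is not None and line.startswith(prefix):
            counts[current] += 1
    return counts


def added_articles_between(since: str, until: str, repo: str = ".") -> Counter:
    """Run git log over a date range and count added article files per commit."""
    out = subprocess.run(
        [
            "git", "-C", repo, "log",
            f"--since={since}", f"--until={until}",
            "--diff-filter=A", "--name-only", "--date=short",
            "--pretty=format:COMMIT%x09%h%x09%ad%x09%s",
        ],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_added_articles(out)
```

Parsing is kept separate from the `subprocess` call so it can be checked against saved `git log` output without a repository.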

Google's helpful content documentation lists "mass-produced content by automation" as a quality signal. 144 articles in a single commit, regardless of individual quality, reads as exactly that.

Cause 2: CTR during the honeymoon window

Google gives new pages a trial window of roughly 7-10 days during which it shows them in results and measures engagement. CTR during this window determines which pages survive.

GSC data for the non-indexed pages from April 28 to May 5:

Page	Impressions	Clicks	CTR
native-offline-vs-pwa-field-teams	2,059	1	0.05%
on-device-ai-vs-cloud-api	958	1	0.10%
mobile-apps-frontline-workers	591	1	0.17%
app-store-rejection-prevention	551	1	0.18%

Normal CTR at position 6-7 is around 3%. These pages were getting 0.05-0.18%, roughly 17-60x below normal. The pages that survived had CTRs of 1-5% during the same window.
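The comparison is simple arithmetic: CTR is clicks divided by impressions, and the shortfall is the ~3% baseline divided by that CTR. A minimal sketch using the four rows from the table; the 10x flagging threshold is a hypothetical choice, not something the data prescribes.

```python
# Impressions and clicks for April 28 to May 5, from the table.
PAGES = {
    "native-offline-vs-pwa-field-teams": (2059, 1),
    "on-device-ai-vs-cloud-api": (958, 1),
    "mobile-apps-frontline-workers": (591, 1),
    "app-store-rejection-prevention": (551, 1),
}
BASELINE_CTR = 0.03  # rough expectation at positions 6-7
FLAG_FACTOR = 10  # hypothetical: flag pages at least 10x below the baseline


def ctr(impressions: int, clicks: int) -> float:
    """Click-through rate as a fraction (0.03 == 3%)."""
    return clicks / impressions if impressions else 0.0


def flag_low_ctr(pages, baseline=BASELINE_CTR, factor=FLAG_FACTOR):
    """Return (slug, ctr, times_below_baseline) for pages far under the baseline."""
    flagged = []
    for slug, (impressions, clicks) in pages.items():
        rate = ctr(impressions, clicks)
        shortfall = baseline / rate if rate else float("inf")
        if shortfall >= factor:
            flagged.append((slug, rate, shortfall))
    # Worst offenders first.
    return sorted(flagged, key=lambda row: row[2], reverse=True)


for slug, rate, shortfall in flag_low_ctr(PAGES):
    print(f"{slug}: CTR {rate:.2%}, ~{shortfall:.0f}x below baseline")
```

Run against a GSC performance export during the first week or two, a check like this surfaces titles worth rewriting before the window closes.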

The titles were the problem. "Mobile App Development for US Construction Companies: Field Apps, Safety Compliance, and AI 2026" describes the article. It does not give someone searching "construction field service app" a reason to click. Google showed the page to thousands of people, almost no one clicked, and the evaluation failed.

Cause 3: cluster pruning

The most striking finding came from joining the Git commit history against GSC's index status per cluster:

Cluster	Indexed	Total	Rate
Offline / Connectivity	2	5	40%
US Industry Verticals	7	23	30%
Agency Selection / Outsourcing	12	48	25%
Vendor / Team Management	8	36	22%
AI Features / Strategy	3	14	21%
On-Device AI	3	15	20%
Tech Stack / Framework	5	27	18%
Cost / Pricing / ROI	4	29	13%
Field Service / Logistics	2	17	11%
Compliance and Security	1	24	4%

Every cluster shows the same pattern. Google kept a minority of each cluster (between 4% and 40% of its articles) and dropped the rest. Within each cluster, the articles targeting the highest search-demand queries survived. The rest were treated as redundant.

The compliance cluster is the clearest example: 24 articles published simultaneously, 1 survived (HIPAA, the highest-demand query in the cluster). FERPA, FedRAMP, financial services compliance and insurance compliance were all dropped. Google picked one winner for the cluster and stopped indexing the others.

This is not a content quality problem. Several of the dropped articles were well-researched. Google did not evaluate them individually. It evaluated the cluster as a group, picked the strongest match for the cluster's primary query, and pruned the rest as near-duplicates.
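Reproducing the per-cluster table needs two inputs: a slug-to-cluster mapping taken from the repository (for example from the article registry) and the set of indexed slugs from a Search Console page-indexing export. A minimal sketch of the join, assuming both are already loaded into plain Python structures; the slugs in the usage lines are hypothetical.

```python
from collections import defaultdict


def index_rate_by_cluster(article_clusters: dict[str, str], indexed_slugs: set[str]):
    """Join slug->cluster (from the repo) with indexed slugs (from a GSC export).

    Returns rows of (cluster, indexed, total, rate), highest rate first.
    """
    totals: dict[str, int] = defaultdict(int)
    indexed: dict[str, int] = defaultdict(int)
    for slug, cluster in article_clusters.items():
        totals[cluster] += 1
        if slug in indexed_slugs:
            indexed[cluster] += 1
    rows = [
        (cluster, indexed[cluster], total, indexed[cluster] / total)
        for cluster, total in totals.items()
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Hypothetical slugs, for illustration only.
clusters = {"hipaa-app": "Compliance", "ferpa-app": "Compliance", "pwa-offline": "Offline"}
for cluster, n_indexed, total, rate in index_rate_by_cluster(clusters, {"hipaa-app", "pwa-offline"}):
    print(f"{cluster}\t{n_indexed}\t{total}\t{rate:.0%}")
```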

Cause 4: date manipulation while google was crawling

The publication dates were changed 4 times after publishing, 3 of them within 48 hours, while Google was actively crawling the new domain:

Date	Commit	What changed
Apr 25	f9c9bd0	All dates pushed back to Feb 2025
Apr 25	c9141e2	Same day: dates rolled back to Jan 2026
Apr 26	95b275b	Dates changed to Oct 2025 spread
May 1	b0334f1	Second batch: all backdated to 2025-2026

The intent was to make the content library look like it had a publishing history, rather than like a site that launched 249 articles overnight. Google crawled the pages on April 25, then re-crawled them on April 26 and saw different datePublished values in the Article JSON-LD. It then saw them change again the following day. Across 249 pages simultaneously, this pattern is detectable at the domain level.

Google's helpful content guidelines list "manipulating freshness dates artificially" as a quality signal. The commit history shows this happened to the entire batch within 48 hours of the initial crawl.
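The same kind of change is visible from the outside if you keep snapshots of a page's HTML over time (the snapshot source is an assumption here) and compare the datePublished in its Article JSON-LD. A minimal standard-library sketch; the regex is a shortcut that works for simply structured pages, and it ignores `@graph` wrappers and Article subtypes such as BlogPosting.

```python
import json
import re

# Matches <script type="application/ld+json">...</script> blocks.
# Fine for simple markup; a real crawler should use an HTML parser.
JSONLD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)


def date_published(html: str):
    """Return the first Article datePublished found in the page's JSON-LD, or None."""
    for block in JSONLD_RE.findall(html):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks
        for item in data if isinstance(data, list) else [data]:
            if isinstance(item, dict) and item.get("@type") == "Article":
                return item.get("datePublished")
    return None


def date_changes(snapshots: list[tuple[str, str]]):
    """Given (label, html) pairs in crawl order, list every datePublished change."""
    changes = []
    previous = None
    for label, html in snapshots:
        current = date_published(html)
        if current is None:
            continue  # no Article JSON-LD in this snapshot
        if previous is not None and current != previous:
            changes.append((label, previous, current))
        previous = current
    return changes
```

For a page whose date never moves after deploy, `date_changes` returns an empty list; anything else is the pattern described above.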

What the data says about prevention

The control case is in the same Git history. A run on May 4-6 published 14 articles on the same domain: a small batch, with no date changes after publishing. 13 of 14 are indexed, a 93% index rate versus 0.8% for the April batch.

The domain authority did not change between April 24 and May 4. The content quality is comparable. The only variables are batch size and date stability.

From the investigation, the failure modes are straightforward to avoid:

Publish one article per keyword cluster per run. Google's cluster evaluation happens when multiple articles targeting the same primary intent appear simultaneously. Give it time to evaluate one before the next arrives.
Use the real publication date. The date the article actually deploys. No backdating to simulate publishing history.
Write titles for the specific query someone would type. CTR during the 7-10 day honeymoon window determines survival. A title that describes the article instead of answering the query will fail that evaluation.
Keep runs small. 3-5 articles, not 249. The honeymoon window needs time to do its job before the next batch competes for the same crawl attention.
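The batching rules translate into a small scheduling constraint. A minimal sketch, not the author's content engine: `Draft`, its `priority` score and `plan_run` are hypothetical names, slugs are assumed unique, and the default cap of 5 comes from the 3-5 range above.

```python
from dataclasses import dataclass


@dataclass
class Draft:
    slug: str
    cluster: str
    priority: int  # hypothetical: higher = stronger search demand


def plan_run(queue: list[Draft], max_articles: int = 5) -> tuple[list[Draft], list[Draft]]:
    """Pick at most one draft per cluster, strongest first, capped at max_articles.

    Returns (to_publish, deferred); deferred drafts wait for a later run.
    """
    to_publish: list[Draft] = []
    used_clusters: set[str] = set()
    for draft in sorted(queue, key=lambda d: d.priority, reverse=True):
        if len(to_publish) >= max_articles:
            break
        if draft.cluster not in used_clusters:
            to_publish.append(draft)
            used_clusters.add(draft.cluster)
    chosen = {d.slug for d in to_publish}
    deferred = [d for d in queue if d.slug not in chosen]
    return to_publish, deferred
```

Each run is deployed with that day's date, and the deferred list feeds the next run once the earlier articles have had time to be evaluated.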

These are not speculative. Each one maps directly to a measurable cause in the data above. I've been building a content engine that operates on exactly these constraints. I'll write about how that's designed separately.

frequently asked questions

does google index all articles from a content pipeline on launch day?

yes, but only temporarily. google gives new content a honeymoon window of roughly 7-10 days during which it indexes and ranks everything to measure engagement. during this window it collects click-through rate data. if pages get shown in search results but users do not click, google treats that as a signal that the pages do not satisfy search intent. at the end of the window, pages with very low CTR get marked as 'crawled - currently not indexed' and removed from results. the initial index is a trial, not a stable state.

what is google cluster pruning and how does it cause deindexing?

when multiple articles targeting the same keyword cluster are published simultaneously, google evaluates them as a group. it identifies the article with the strongest relevance and authority signals and treats it as the representative for that cluster. the others get marked as near-duplicates and dropped. this is not a penalty: the dropped articles are accessible and have no manual action against them. google simply decided one article is enough to represent the cluster. the compliance cluster in our data had 24 articles published simultaneously and 23 were dropped.

does changing publication dates after google crawls an article hurt indexing?

yes, especially if done multiple times. when google first crawls a page it records the publication date from the article JSON-LD. if that date changes in a subsequent crawl, google sees a mismatch. changing dates 4 times in a week, 3 of them within 48 hours, while google is actively crawling is detectable as a systematic pattern across the whole domain. google's helpful content documentation lists 'manipulating freshness dates artificially' as a quality signal. using the real publication date (the date the article was actually deployed) avoids this entirely.
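One way to make the real date stick is to record each article's first deploy date once and always render the JSON-LD from that record. A minimal sketch in Python, assuming a hypothetical `registry` dict persisted alongside the articles; in a Next.js site the same object would be serialized into the page's `application/ld+json` script.

```python
import json
from datetime import date
from typing import Optional


def record_first_publish(registry: dict[str, str], slug: str, deploy_day: date) -> str:
    """Store the first deploy date for a slug; later deploys never overwrite it."""
    return registry.setdefault(slug, deploy_day.isoformat())


def article_jsonld(registry: dict[str, str], slug: str, headline: str,
                   modified: Optional[date] = None) -> str:
    """Build Article JSON-LD whose datePublished always comes from the registry."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": registry[slug],
    }
    # Only set dateModified when the content actually changed.
    if modified is not None:
        data["dateModified"] = modified.isoformat()
    return json.dumps(data)
```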

if 247 articles got deindexed, can they be recovered?

some can. the ones that are genuinely unique topics (not cluster variations) are recoverable through google search console url inspection and request indexing, batched at 10-15 per day. google re-evaluates and may index them if the domain has accumulated more authority since the original drop. the cluster variants are harder: google already has an indexed article representing that cluster, and the near-duplicate will keep losing the same evaluation unless it is made meaningfully different. for those, the practical options are rewriting to target a distinct angle, or consolidating into the indexed article with a 301 redirect.
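The triage above can be sketched as a small planning step: unique-topic pages go into daily batches for manual submission through the URL Inspection tool's request-indexing button, and cluster variants go to a rewrite-or-merge list. `DroppedPage` and its `is_cluster_variant` flag are hypothetical; deciding which pages are variants is still a manual judgment.

```python
from dataclasses import dataclass


@dataclass
class DroppedPage:
    url: str
    is_cluster_variant: bool  # hypothetical: same primary intent as an indexed page


def recovery_plan(pages: list[DroppedPage], per_day: int = 12):
    """Split dropped pages into daily re-indexing batches and a rewrite/merge list."""
    unique = [p.url for p in pages if not p.is_cluster_variant]
    variants = [p.url for p in pages if p.is_cluster_variant]
    # per_day sits inside the 10-15 range; each inner list is one day's submissions.
    batches = [unique[i:i + per_day] for i in range(0, len(unique), per_day)]
    return batches, variants
```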

ABHK®
