Brad Holmes, web developer, designer and digital strategist.

Dad, husband and dog owner. Most days I’m trying to create fast, search-friendly websites that balance UX, Core Web Vitals, and digital strategy from my studio in Kettering, UK.

If you’re here, you either found something I built on Google or you’re just being nosey. Either way, this is me, the work, the thinking, and the bits in between.



How Junk URLs Quietly Kill Organic Growth

By Brad Holmes
14 min read

Search Console creates the illusion that Google is monitoring your site.

It is not.

Your site is not the Field of Dreams. It is not a case of build it and they will come. Search Console is showing you how much attention Google is actually giving you.

Every report is a visibility ledger. It shows where Google spends time, what it keeps revisiting, and what it quietly ignores. It shows which parts of your site are trusted enough to be refreshed often and which parts are left to decay between crawls.

This is why two sites with similar content can behave completely differently.

I don’t care what people say: crawl budget is not a myth. That budget controls three things:

  • How often your pages are visited
  • How quickly Google trusts changes
  • How stable your rankings are over time

You can see it directly inside Crawl Stats and Index Coverage.

When crawl budget is diluted, your important pages are crawled less often. That slows reindexing, weakens freshness signals, and reduces how aggressively Google tests your pages in competitive results.

Google is like any other company. It does not have endless resources and it cannot give every website the same level of attention. A small hobby site about knitting mittens for sick kittens is not going to receive the same crawl priority as a large recipe platform that publishes constantly and drives real search demand. Not because one is better than the other, but because one justifies more attention. Search Console is showing you where your site sits inside that attention economy.

This is where growth quietly stalls.

Most sites are not losing because their content is bad. They are losing because their best pages are not being revisited often enough to accumulate trust.

What To Look For In Search Console

Query strings, author pages, and feed URLs are some of the most common ways crawl budget gets quietly burned.

Most CMS platforms generate them automatically. They look harmless because they are part of the system. But they often create large numbers of low-value URLs that Google keeps discovering and revisiting.

Individually they do not look like a problem. Collectively they consume a meaningful share of your crawl capacity.

The result is that Google spends more time discovering and re-discovering junk URLs than revisiting your real commercial pages. Your important pages stay indexed, but they are refreshed less often, tested less aggressively, and compound more slowly than they should.

This is one of the most common structural reasons mature sites stall even when content quality is strong.


Open Indexing → Pages in Search Console and review the reasons listed under “Why pages aren’t indexed”.

If large portions of your crawl activity are being spent on:

  • Filtered URLs
  • Internal search pages
  • Tag archives
  • Tracking parameters
  • Duplicated category paths

your crawl budget is being wasted.

Those URLs are stealing crawl priority from your real commercial and routing pages.
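If you want to audit this at scale rather than eyeballing the report, the patterns above can be turned into a simple classifier and run over an export of your crawled or excluded URLs. A sketch — `isJunkUrl` and the pattern lists are my own illustrative assumptions, not a standard API, so tune them to your site:

```javascript
// Sketch: classify a URL as likely crawl waste using the patterns above.
// The pattern lists are illustrative assumptions - tune them to your own site.
const wasteParams = new Set(["filter", "sort", "orderby", "facet", "page", "s", "q"]);
const trackingParams = new Set(["utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid", "ref"]);
const wastePaths = [/^\/search\//, /^\/tag\//, /^\/feed\//];

function isJunkUrl(urlString) {
  const url = new URL(urlString);
  // Path-based junk: internal search, tag archives, feeds
  if (wastePaths.some((re) => re.test(url.pathname))) return true;
  // Query-based junk: filters, sorting, tracking parameters
  for (const key of url.searchParams.keys()) {
    if (wasteParams.has(key) || trackingParams.has(key)) return true;
  }
  return false;
}
```

Run it over a URL export and the share of `true` results tells you roughly how much of Google's discovery surface is junk.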

How To Fix Crawl Budget Dilution (Without Nuking Your Site)

This is the minimum-force, maximum-impact fix stack. These are structural controls, not SEO tricks.

1. Kill Parameter Crawl at the Edge

What to do

Do this at CDN or server level. Do not reach for a WordPress plugin or an application-layer fix. That sits too far down the request chain and allows Google to continue discovering and crawling junk URLs before any rules are applied. You want this intercepted as early as possible so those URLs are never treated as crawlable pages in the first place. This is where you get the largest reduction in crawl waste and the fastest recovery in crawl priority.

Block or canonicalise:

  • ?filter=
  • ?sort=
  • ?page=
  • ?utm_
  • ?ref=
  • Any non-core query strings

Why: Google will discover infinite URL variations. You must end that discovery loop. If you do not stop it at the edge, Search Console cannot save you.

Cloudflare Workers (best for “edge-first” control)

This is my favourite approach, but not everyone uses Cloudflare:

  • 301 redirects marketing/tracking params away (keeps one clean canonical URL)
  • Blocks known crawl-waste params (filter/sort/page etc.) with 410
export default {
  async fetch(request) {
    const url = new URL(request.url);

    // Params that create infinite URL variants
    const wasteParams = new Set(["filter", "sort", "page", "orderby", "facet", "q", "s"]);

    // Params that are safe to strip (marketing only)
    const stripParams = ["utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content", "gclid", "fbclid", "ref"];

    // If URL contains waste params, return 410 to discourage recrawl
    for (const key of url.searchParams.keys()) {
      if (wasteParams.has(key)) {
        return new Response("Gone", { status: 410 });
      }
    }

    // If URL contains strip params, redirect to clean URL
    let stripped = false;
    for (const p of stripParams) {
      if (url.searchParams.has(p)) {
        url.searchParams.delete(p);
        stripped = true;
      }
    }

    if (stripped) {
      url.search = url.searchParams.toString();
      return Response.redirect(url.toString(), 301);
    }

    return fetch(request);
  },
};


If you do not want to 410 anything, swap the 410 block for a redirect to the clean URL instead. You should also consider adding noindex directives to account pages such as logins and password recovery.
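Here is what that softer variant looks like. Instead of returning 410, `cleanUrl` strips both waste and tracking params, and the Worker 301s to the clean URL whenever anything was removed. This is a sketch, not a drop-in replacement — the param lists mirror the ones above and should match your own:

```javascript
// Sketch: the softer alternative to the 410 block in the Worker above.
// cleanUrl() strips both waste and tracking params; when anything was
// removed, the Worker should 301 to the returned clean URL.
const removableParams = [
  "filter", "sort", "page", "orderby", "facet", "q", "s",
  "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
  "gclid", "fbclid", "ref",
];

function cleanUrl(urlString) {
  const url = new URL(urlString);
  let changed = false;
  for (const p of removableParams) {
    if (url.searchParams.has(p)) {
      url.searchParams.delete(p);
      changed = true;
    }
  }
  url.search = url.searchParams.toString();
  return { changed, href: url.toString() };
}

// Inside the Worker's fetch handler:
//   const { changed, href } = cleanUrl(request.url);
//   if (changed) return Response.redirect(href, 301);
//   return fetch(request);
```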

Nginx (server-level)

Option A: 301 strip tracking params

# Redirect URLs containing utm_* or gclid/fbclid/ref to the clean URL
if ($args ~* "(^|&)(utm_source|utm_medium|utm_campaign|utm_term|utm_content|gclid|fbclid|ref)=") {
  return 301 $scheme://$host$uri;
}

Option B: 410 filter/sort/page style params

# Treat these as crawl-waste
if ($args ~* "(^|&)(filter|sort|orderby|facet|page|s|q)=") {
  return 410;
}

Apache .htaccess (server-level)

Strip tracking params with a 301

RewriteEngine On

# If query contains utm_* or gclid/fbclid/ref, redirect to the same path without query string
RewriteCond %{QUERY_STRING} (^|&)(utm_source|utm_medium|utm_campaign|utm_term|utm_content|gclid|fbclid|ref)= [NC]
RewriteRule ^ %{REQUEST_URI}? [R=301,L]

410 common crawl-waste params

RewriteCond %{QUERY_STRING} (^|&)(filter|sort|orderby|facet|page|s|q)= [NC]
RewriteRule ^ - [G,L]

If you got this far and your mind is blown, it is all code, and you do not know what you are doing, please do not start messing around with stuff. Speak to your host or someone who understands this; making mistakes at this point is a bad move. You can easily reach me through my contact form if you are really stuck.

2. Noindex All Internal Search Results

Noindexing internal search results does not make Search Console go completely quiet. You will still see discovery and crawl noise around those URLs, especially on WordPress and other CMS platforms that expose search endpoints by default.

That noise is intentional.

You are not trying to hide these URLs. You are teaching Google that they are not indexable content. Google will still discover them, but it will stop trying to treat them as part of your searchable surface area. Over time this dramatically reduces how much crawl attention they consume compared to leaving them indexable.

The goal is not silence.

The goal is containment.

What to do

Add noindex, follow to:

  • /search/
  • ?s=
  • Any internal results system

Why: Internal search URLs multiply geometrically and get re-discovered forever. They are the single most common crawl budget leak.
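One way to apply this at the edge, consistent with the Worker approach from step 1, is an `X-Robots-Tag: noindex, follow` response header on anything that looks like internal search. A sketch — the `/search/` path and `?s=` param mirror the list above; swap in your own search endpoints:

```javascript
// Sketch: detect internal search URLs so they can be served with an
// "X-Robots-Tag: noindex, follow" header at the edge. The /search/ path
// and ?s= param are the common WordPress-style defaults - an assumption.
function isInternalSearch(urlString) {
  const url = new URL(urlString);
  return url.pathname.startsWith("/search/") || url.searchParams.has("s");
}

// Worker wiring (sketch): fetch the origin response, then, when the URL is
// an internal search page, copy it and set the header before returning:
//
//   const response = await fetch(request);
//   if (isInternalSearch(request.url)) {
//     const copy = new Response(response.body, response);
//     copy.headers.set("X-Robots-Tag", "noindex, follow");
//     return copy;
//   }
//   return response;
```

The header approach has one advantage over a meta tag: it works even for responses your theme or templates do not control.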

3. Collapse Author, Tag, and Archive Pages

This is not a rule. It is a test.

Some sites genuinely use author, tag, and archive pages as real routing layers. Most do not. Most of them exist because the CMS generates them, not because the site architecture requires them.

If those pages own a clear intent, attract links, and rank independently, keep them and treat them as real hubs.

If they do not, they are not content. They are crawl sinks.

Do not assume either way. Test it.

Noindex or redirect a controlled group, then watch crawl stats, index coverage, and ranking stability on your real pages over the following weeks. If crawl frequency on your money pages increases and rankings stabilise, you have your answer.

This is how you collapse crawl waste without guessing.

What to do

Choose one:

  • Noindex them
  • Or redirect them to primary category/topic pages
  • Or canonicalise them into the main content hub

Do not leave them floating.

Why: These pages look like content but do not own intent. They create duplicate topic clusters and split authority.
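If you choose the redirect option, the collapse can be expressed as a small pattern-to-hub map. Everything here is an assumption you would tailor to your own architecture — the paths, the destinations, and which archives go into the test group:

```javascript
// Sketch: collapse archive-style URLs into their primary hubs.
// The mapping is illustrative - choose your own controlled test group.
const archiveRedirects = [
  { pattern: /^\/author\/[^/]+\/?$/, target: "/blog/" },               // author pages
  { pattern: /^\/tag\/([^/]+)\/?$/, target: "/blog/category/$1/" },   // tag archives
  { pattern: /^\/\d{4}\/\d{2}\/?$/, target: "/blog/" },               // date archives
];

function collapseArchive(pathname) {
  for (const { pattern, target } of archiveRedirects) {
    if (pattern.test(pathname)) return pathname.replace(pattern, target);
  }
  return null; // not an archive URL - serve as normal
}
```

Anything that returns a path gets a 301; `null` means serve the page normally. Start with a controlled group, as above, before mapping everything.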

4. Block Feed Endpoints

Feeds are one of the most pointless crawl drains still left on modern CMS platforms.

They are legacy distribution systems that made sense before social, before APIs, and before modern syndication. Today they rarely serve a real discovery purpose, but they continue to generate constantly refreshing URLs that Google treats as crawlable HTML.

For that reason, I do not stop at noindex.

I block them at the server layer.

If a URL should never be indexed, it should never be crawled repeatedly either. Feeds do both. They change constantly, they get rediscovered endlessly, and they provide no indexable value.

Removing them at the server or edge layer eliminates a permanent crawl leak rather than just telling Google to ignore the output.

This is one of the fastest ways to reduce crawl noise on CMS-driven sites.

What to do

Noindex or block:

  • /feed/
  • /rss/
  • /atom/
  • Any legacy syndication endpoints

Why: Feeds get crawled constantly and return low-value HTML. They consume crawl without adding index value.
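At the edge, this is a one-pattern check plus a 410, in the same shape as the Worker from step 1. The paths mirror the list above — keep any feed you genuinely syndicate through before blocking:

```javascript
// Sketch: detect feed endpoints so they can be answered with 410 at the
// edge and stop being recrawled. Paths mirror the list above.
const feedPattern = /\/(feed|rss|atom)\/?$/;

function isFeedUrl(urlString) {
  return feedPattern.test(new URL(urlString).pathname);
}

// Worker wiring (sketch):
//   if (isFeedUrl(request.url)) return new Response("Gone", { status: 410 });
```

Note this also catches WordPress's per-post comment feeds (`/post-name/feed/`), which are a common hidden multiplier.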

5. Canonicalise Duplicated Category Paths

This is also architectural, not universal. Some sites intentionally use category and topic paths as curated routing layers with their own context, copy, and internal linking logic. In those cases, multiple category paths may be valid and should be treated as real hubs.

If your category paths are simply uncurated content feeds generated by the CMS, they are not hubs. They are duplicate discovery layers. That is where canonicalising and collapsing them makes sense. Test it in controlled groups and watch crawl stats and ranking stability on your real pages before rolling it out site-wide.

What to do

If:

  • /blog/category/seo/
  • /category/seo/
  • /topics/seo/

Pick one canonical version and redirect the rest.

Why: Multiple category paths fragment crawl frequency and link equity.
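As a sketch of the redirect side, using the example paths above: here `/blog/category/` is treated as the canonical version (an assumption — pick whichever variant you are keeping) and the others are mapped onto it for a 301:

```javascript
// Sketch: map duplicate category paths onto one canonical version.
// /blog/category/ as the keeper is an assumption - pick your own.
function canonicalCategoryPath(pathname) {
  const match = pathname.match(/^\/(?:category|topics)\/([^/]+)\/?$/);
  if (match) return `/blog/category/${match[1]}/`;
  return null; // already canonical, or not a category URL
}
```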

6. Reduce Crawl Depth for Money Pages


Depth is critical. If you have read this far, you are already looking at your site architecture more seriously than most people ever do. Google allocates crawl priority based on perceived structural importance, and structural importance is defined by depth. Pages that sit one or two clicks from the homepage are treated as core assets. Pages that are buried behind pagination, filters, and tag trails are treated as secondary, even if they are commercially important. If your money pages live deep in the site tree, they will always be crawled later, tested less aggressively, and compound more slowly than pages that are structurally close to the root.

What to do

Ensure:

  • Core commercial and routing pages are within 1–2 internal clicks from the homepage
  • They are not buried inside pagination, filters, or tag trails

Why: Google allocates crawl priority by perceived structural importance. If a page is buried, it is treated as secondary.
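Click depth is easy to measure yourself: a breadth-first walk over your internal-link graph, starting from the homepage. The link map below is a hand-built illustration — in practice you would generate it from a crawl of your own site:

```javascript
// Sketch: compute click depth from the homepage with a breadth-first walk
// over an internal-link map. The map here is an illustrative assumption.
function clickDepths(linkMap, start = "/") {
  const depths = { [start]: 0 };
  const queue = [start];
  while (queue.length > 0) {
    const page = queue.shift();
    for (const target of linkMap[page] || []) {
      if (!(target in depths)) {
        depths[target] = depths[page] + 1; // one click deeper than its parent
        queue.push(target);
      }
    }
  }
  return depths;
}

const linkMap = {
  "/": ["/services/", "/blog/"],
  "/blog/": ["/blog/page/2/"],
  "/blog/page/2/": ["/blog/old-money-page/"],
};
```

In this sketch the commercial page sits three clicks deep, behind pagination — exactly the pattern to fix, usually with direct links from the homepage or a hub page.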

7. Shrink Your Sitemap to Only Real Pages


This looks obvious on paper, but it is one of the most consistently abused parts of modern sites. Many sitemaps are nothing more than a full export of the CMS database. They include tag pages, filtered URLs, archives, feeds, and internal search results. That turns the sitemap into a list of everything that exists rather than a description of what actually matters. A sitemap should represent your real site architecture. If a page is not part of your routing or commercial layer, it does not belong in your sitemap.

What to do

Only include:

  • Money pages
  • Routing pages
  • Core topic cluster pages

Remove:

  • Tags
  • Feeds
  • Filters
  • Archives
  • Search results

Why: A sitemap should describe your site architecture, not your CMS exhaust fumes.
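The cleanest way to enforce this is to build the sitemap from an explicit allow-list rather than exporting the CMS database. A sketch — the URLs are examples, and on a real site the list would come from your routing and commercial page definitions:

```javascript
// Sketch: generate a sitemap from an explicit allow-list of real pages,
// instead of everything the CMS knows about. URLs are illustrative.
function buildSitemap(origin, paths) {
  const urls = paths
    .map((p) => `  <url><loc>${origin}${p}</loc></url>`)
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
    `${urls}\n</urlset>`;
}

const realPages = ["/", "/services/", "/blog/", "/blog/category/seo/"];
```

Tags, feeds, filters, archives, and search results simply never make it into the list, so they never make it into the file.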

8. Watch Crawl Stats After Fixes

This is not a one-time clean-up. It becomes part of your ongoing workflow. Crawl stats are your confirmation layer. After changes, you should see overall crawl volume drop, crawl frequency on your real commercial pages increase, index coverage stabilise, and ranking volatility reduce. That is how you know crawl attention has been redirected back onto the pages that actually drive growth.

Within 2–4 weeks you should see:

  • Crawl volume drop
  • Crawl frequency on money pages increase
  • Index coverage stabilise
  • Rankings become less volatile

This is the confirmation layer.
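If you want harder numbers than the Crawl Stats charts give you, the same check runs over your server logs: classify each Googlebot-crawled path and compute the junk share before and after the fixes. A sketch — the patterns and the `junkCrawlShare` helper are illustrative assumptions:

```javascript
// Sketch: estimate what share of crawl activity lands on junk URL patterns
// from a list of Googlebot-crawled paths. Patterns are illustrative.
const junkPatterns = [/[?&](filter|sort|page|s|q)=/, /\/(feed|tag|search)\//];

function junkCrawlShare(crawledPaths) {
  const junk = crawledPaths.filter((p) =>
    junkPatterns.some((re) => re.test(p))
  ).length;
  return junk / crawledPaths.length;
}
```

A falling junk share alongside rising crawl frequency on money pages is the clearest confirmation the budget has been redirected.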


This is not the complete picture of Search Console, and it is not meant to be. It is the small part that actually affects how your site compounds over time.

Search Console contains dozens of reports, warnings, and enhancement prompts, and most of them are hygiene rather than leverage. They help keep your site technically tidy, but they do not change how much attention Google gives you or how aggressively your pages are tested. Crawl priority, evaluation-stage page experience, and query-to-page alignment sit underneath all of that.

They shape what gets revisited, what gets trusted, and what is allowed to scale. If those layers are working properly, most of the remaining reports become background noise rather than growth drivers.


Frequently Asked Questions

Does crawl budget matter for small sites, or only large ones?

Yes, it matters for small sites, but it shows up differently. On smaller sites the issue is rarely total crawl volume. It is crawl distribution. A handful of junk URL patterns can quietly dominate discovery and keep real pages from being revisited often enough to compound properly.

Can too many low-value URLs actually slow down ranking improvements?

Yes. When Google spends most of its crawl time discovering and re-discovering low-value URLs, it revisits your important pages less often. That delays how quickly changes are trusted and how aggressively pages are tested in competitive results.

Is noindex enough, or do URLs need to be blocked completely?

Noindex tells Google not to index a page, but it does not stop repeated crawling. If a URL pattern creates infinite variants, blocking or collapsing it at the server or edge layer produces much stronger crawl recovery.

Why do impressions increase while clicks stay flat on some sites?

That pattern often indicates crawl dilution and query mismatch. Google is testing pages less frequently and across fragmented intent clusters, so pages never accumulate enough trust to move into dominant positions.

How do I know if my site has already hit a crawl ceiling?

You will see rising numbers of excluded URLs, long tails of pages crawled once and never revisited, unstable rankings on your most important pages, and slower-than-expected compounding even when content quality is strong.


Brad Holmes

Web developer, designer and digital strategist.

Brad Holmes is a full-stack developer and designer based in the UK with over 20 years’ experience building websites and web apps. He’s worked with agencies, product teams, and clients directly to deliver everything from brand sites to complex systems—always with a focus on UX that makes sense, architecture that scales, and content strategies that actually convert.
