Common Crawl WebGraph Rankings • 2023-2025
Harmonic Centrality (HC) measures how close a domain is to all other domains in the web graph. A domain with a better (lower-numbered) HC rank can be reached from many other domains in fewer link hops.
HC is particularly good at identifying domains that serve as central hubs in the web's link structure — sites that are well-connected and easily reachable from anywhere on the web.
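Here is a minimal Python sketch of the metric on a toy graph. It assumes the standard harmonic centrality definition: a domain's score is the sum of 1/d over every other domain, where d is the number of link hops from that domain to it, and unreachable domains contribute 0. It uses exact breadth-first search, which is far too slow for Common Crawl's graph, so treat it as an illustration of the metric and not of their pipeline. The function names and example domains are hypothetical.

```python
from collections import deque


def harmonic_centrality(edges):
    """Score each node x as the sum over other nodes y of 1/d(y, x); unreachable y adds 0."""
    nodes = {n for edge in edges for n in edge}
    # Reverse the links so a BFS starting at x walks backwards and finds d(y, x) for every y.
    incoming = {n: [] for n in nodes}
    for src, dst in edges:
        incoming[dst].append(src)
    scores = {}
    for x in nodes:
        dist = {x: 0}
        queue = deque([x])
        while queue:
            node = queue.popleft()
            for prev in incoming[node]:
                if prev not in dist:
                    dist[prev] = dist[node] + 1
                    queue.append(prev)
        scores[x] = sum(1.0 / d for n, d in dist.items() if n != x)
    return scores


def rank(scores):
    """Convert scores to 1-based ranks: rank 1 is the highest score (ties broken by name)."""
    ordered = sorted(scores, key=lambda n: (-scores[n], n))
    return {n: i + 1 for i, n in enumerate(ordered)}


edges = [("a.com", "hub.org"), ("b.com", "hub.org"), ("c.com", "b.com"), ("hub.org", "a.com")]
scores = harmonic_centrality(edges)
ranks = rank(scores)
print(ranks)  # {'hub.org': 1, 'a.com': 2, 'b.com': 3, 'c.com': 4}
```

hub.org, which two domains link to directly, ends up at rank 1, while c.com, which nothing links to, scores 0 and ranks last.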
PageRank measures a domain's authority based on the quality and quantity of links pointing to it. Domains linked by other high-authority sites get higher PageRank scores.
Originally developed by Google co-founders Larry Page and Sergey Brin, PageRank identifies domains that are trusted and referenced by other important websites, which is a signal of credibility and influence.
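As a rough illustration of how authority flows along links, below is a textbook power-iteration PageRank in Python on a toy graph. The damping factor of 0.85 comes from the original PageRank paper. Whether Common Crawl uses the same damping factor, iteration count, or handling of domains with no outgoing links is not stated here, so those choices are assumptions of this sketch.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over a directed edge list; the scores always sum to 1."""
    nodes = sorted({n for edge in edges for n in edge})
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    pr = {v: 1.0 / n for v in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        # ASSUMPTION: domains with no outgoing links ("dangling") spread their score evenly.
        dangling = sum(pr[v] for v in nodes if not out_links[v])
        # Every domain gets the random-jump share plus its slice of the dangling mass.
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                # Each link passes on an equal share of its source's damped score.
                new[dst] += damping * pr[src] / len(targets)
        pr = new
    return pr


edges = [("a.com", "hub.org"), ("b.com", "hub.org"), ("c.com", "b.com"), ("hub.org", "a.com")]
pr = pagerank(edges)
print(max(pr, key=pr.get))  # hub.org
```

hub.org comes out on top because it receives links from two domains, and a.com ranks second because its single inbound link comes from the highest-scoring domain.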
Both metrics are computed by Common Crawl from its WebGraph dataset, which is built from billions of links across the web.
This tool:
18.2 million unique domains
Top 10 million per time period
5 periods: 2023-2025
Instant lookups via Cloudflare KV
Full Common Crawl dataset:
94-163 million domains per period
607 million total records
Raw .gz files (~2-3 GB each)
Available at commoncrawl.org
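If you download one of the raw files yourself, you can stream it without decompressing it to disk. The Python sketch below assumes a tab-separated layout whose first line is a '#'-prefixed header, and it assumes the column names harmonicc_pos (HC rank) and host_rev (a reversed domain name such as com.example). Check both against the header of the file you actually download. read_ranks is a hypothetical helper, not part of any Common Crawl tooling.

```python
import gzip
import itertools


def read_ranks(source, hc_col="harmonicc_pos", name_col="host_rev", limit=None):
    """Stream (domain, hc_rank) pairs from a gzipped, tab-separated ranks file.

    source may be a path or a binary file object. ASSUMPTION: the first line is a
    '#'-prefixed header, and the default column names are guesses to verify.
    """
    with gzip.open(source, "rt", encoding="utf-8") as f:
        header = [h.lstrip("#") for h in f.readline().rstrip("\n").split("\t")]
        hc_idx, name_idx = header.index(hc_col), header.index(name_col)
        rows = (line.rstrip("\n").split("\t") for line in f if line.strip())
        # limit keeps only the first N data rows; that equals the top N only if the
        # file is sorted by HC rank (an assumption about the file, not a guarantee).
        for fields in itertools.islice(rows, limit):
            # ASSUMPTION: names are stored reversed ("com.example"), so flip them back.
            domain = ".".join(reversed(fields[name_idx].split(".")))
            yield domain, int(fields[hc_idx])
```

Because it is a generator, only one line is held in memory at a time, which matters for files in the multi-gigabyte range. Passing limit=10, for example, yields just the first ten rows as (domain, rank) pairs.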
This tool indexes the top 10 million domains per time period (by HC rank), which yields about 18.2 million unique domains across all periods. If your domain isn't found, it most likely ranks outside the top 10,000,000 in every indexed period.
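The indexing rule described here can be sketched in a few lines of Python. This is an in-memory illustration under assumed inputs (one dict per period mapping domain to HC rank). It does not model how the tool actually stores records in Cloudflare KV, and build_index and the sample data are hypothetical.

```python
def build_index(periods, top_n):
    """Merge per-period HC ranks into {domain: {period: rank}}, keeping only each period's top_n."""
    index = {}
    for period, ranks in periods.items():
        for domain, hc_rank in ranks.items():
            if hc_rank <= top_n:
                index.setdefault(domain, {})[period] = hc_rank
    return index


periods = {
    "'23 Mid": {"a.com": 1, "b.com": 2, "c.com": 3},
    "'24 Early": {"b.com": 1, "d.com": 2, "a.com": 4},
}
index = build_index(periods, top_n=2)  # top_n=2 stands in for 10,000,000
print(len(index))          # 3 unique domains from 4 indexed rows
print(index.get("c.com"))  # None: c.com is outside the top 2 in every period
```

The same overlap explains the real numbers: 5 periods of 10 million rows each would be 50 million rows, but a domain that appears in several periods is stored only once, which is how the index ends up with about 18.2 million unique domains.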
Domains outside the top 10M per period are in the "long tail" — they exist in Common Crawl's full dataset but aren't indexed here for performance reasons.
Common Crawl releases new WebGraph data every month. This tool includes:
'23 Mid (Mar-May-Oct) • '23 Late (Sep-Nov-Feb) • '24 Early (Feb-Apr-May) • '24 Mid (May-Jun-Jul) • '25 Late (Oct-Nov-Dec)
Domain rankings fluctuate as the web evolves: new links are created, old ones disappear, and new domains emerge. A domain that gains quality backlinks typically moves up the rankings (to a lower rank number), while one that loses links tends to move down.
👑 Elite = Top 100 (#1-100) | 🥇 Top 1K = #101-1,000 | 🥈 Top 10K = #1,001-10,000
🥉 Top 100K = #10,001-100,000 | 📈 Top 1M = #100,001-1,000,000 | 📊 Long Tail = #1,000,001+
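The tier boundaries amount to a simple threshold lookup. Here is a minimal Python sketch using bisect over the upper bound of each tier. The tier names come from the legend; TIER_BOUNDS and tier() are hypothetical names for illustration.

```python
import bisect

# Upper rank bound of each tier from the legend (ranks are 1-based); anything above is Long Tail.
TIER_BOUNDS = [100, 1_000, 10_000, 100_000, 1_000_000]
TIER_NAMES = ["Elite", "Top 1K", "Top 10K", "Top 100K", "Top 1M", "Long Tail"]


def tier(rank):
    """Return the tier name for a 1-based rank."""
    if rank < 1:
        raise ValueError("ranks start at 1")
    # bisect_left finds the first tier whose upper bound is >= rank.
    return TIER_NAMES[bisect.bisect_left(TIER_BOUNDS, rank)]


print(tier(100))        # Elite
print(tier(101))        # Top 1K
print(tier(1_000_001))  # Long Tail
```

Using bisect_left keeps boundary ranks such as #100 and #1,000 in the higher tier, matching the ranges in the legend.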
Special thanks to Greg Lindahl, CTO of Common Crawl, for sharing information about their new Host Index data product. I'm currently working on integrating this feature!
The Host Index is a new Common Crawl data product containing one row per web host with rich aggregated data: