Live check · fetches your URL server-side

Link Extractor

Pull every <a href> out of any page — anchor text, internal vs external, rel and target — in one click.

A link extractor parses a page's HTML server-side, walks every <a href> element, and returns a structured table of links with anchor text, rel attributes, target, and an internal vs external classification. This tool is built for SEO audits, not text scraping. Paste a URL, pick a filter, and get every link in the page's HTML response, the same markup a crawler receives before any JavaScript runs. Nofollow flags, mailto and tel handlers, anchor jumps, and empty anchors are all called out so you can fix them.

Generate the whole content, not just check it.

BlazeHive writes SEO articles end to end from a single keyword. Outline, draft, meta, schema, internal links. Free trial, no card.

Start with BlazeHive Free trial

What this link extractor returns

The output is a row per <a href> on the page. Each row carries the destination URL, the visible anchor text, the rel attribute (with nofollow, sponsored, ugc, and noopener flagged), the target value, and a type column: internal, external, anchor, mailto, or tel. A typical 300-link blog homepage breaks down 70% internal, 25% external, 5% anchor or mailto. The filter dropdown collapses the view to one slice. That structure is what makes it different from a regex-over-text URL puller. You get the full DOM context for each link, not just the URL string.
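Here is a minimal Python sketch of the parsing step that makes the row structure concrete, using the standard library's html.parser. It is an illustration, not this tool's implementation. The LinkRow name, the rule that treats the same host (ignoring a leading "www.") as internal, and the alt-text fallback for image links are assumptions. Fetching is left out, so you pass in HTML you already have.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin, urlparse


@dataclass
class LinkRow:
    url: str     # destination resolved against the page URL
    anchor: str  # visible text; an <img> inside the link contributes its alt text
    rel: str     # raw rel attribute, "" when absent
    target: str  # raw target attribute, "" when absent
    type: str    # internal, external, anchor, mailto, or tel


def _bare_host(url: str) -> str:
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def classify(href: str, resolved: str, page_url: str) -> str:
    """Assign one of five link types (javascript: and other schemes are not special-cased)."""
    lowered = href.strip().lower()
    if lowered.startswith("mailto:"):
        return "mailto"
    if lowered.startswith("tel:"):
        return "tel"
    if lowered.startswith("#"):
        return "anchor"  # in-page jump
    # Assumption: the same host, ignoring a leading "www.", counts as internal.
    return "internal" if _bare_host(resolved) == _bare_host(page_url) else "external"


class LinkExtractor(HTMLParser):
    def __init__(self, page_url: str) -> None:
        super().__init__(convert_charrefs=True)
        self.page_url = page_url
        self.rows: List[LinkRow] = []
        self._attrs: Optional[dict] = None  # attributes of the currently open <a href>
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href") is not None:
            self._attrs = attrs
            self._text = []
        elif tag == "img" and self._attrs is not None and attrs.get("alt"):
            self._text.append(f" {attrs['alt']} ")  # image alt text acts as anchor text

    def handle_data(self, data):
        if self._attrs is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "a" or self._attrs is None:
            return
        href = self._attrs["href"]
        resolved = urljoin(self.page_url, href)
        self.rows.append(LinkRow(
            url=resolved,
            anchor=" ".join("".join(self._text).split()),  # collapse whitespace
            rel=self._attrs.get("rel") or "",
            target=self._attrs.get("target") or "",
            type=classify(href, resolved, self.page_url),
        ))
        self._attrs = None


def extract_links(html: str, page_url: str) -> List[LinkRow]:
    parser = LinkExtractor(page_url)
    parser.feed(html)
    parser.close()
    return parser.rows


sample = """
<a href="/pricing">See <strong>pricing</strong></a>
<a href="https://example.org/study" rel="nofollow noopener" target="_blank">Study</a>
<a href="#faq">FAQ</a>
<a href="mailto:hi@example.com">Email us</a>
<a href="/"><img src="logo.svg"></a>
"""
for row in extract_links(sample, "https://www.example.com/blog/"):
    print(row.type, row.url, repr(row.anchor), row.rel)
```

HTMLParser tolerates malformed markup, which real pages often contain. A production extractor would also need to handle javascript: hrefs, <base> tags, and subdomain policy, none of which this sketch covers.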

How to use this link extractor

  1. Enter Page URL. Paste the full URL including https://. The tool fetches the page server-side, so the URL must be publicly reachable. Pages behind login or strict bot blocks return a fetch error.
  2. Select Show. Pick one of five filters: All links, Internal only, External only, Nofollow only, or Empty anchor text only. Default is All. Switch to Internal only for site structure audits, External only for outbound reviews, Nofollow only when checking sponsored tagging.
  3. Click Extract links. The tool returns a table within 2-4 seconds. Copy to clipboard or download as CSV.

Try this with a blog homepage. Enter the URL and leave Show on All links. Suppose the table returns 124 rows: 87 internal, 31 external, and 6 anchor jumps. Switch to Empty anchor text only and, say, 4 rows surface, all logo wrappers and icon links. Fix those first: crawlers get no context from an empty anchor, and screen readers have nothing meaningful to announce. Use the url-extractor when you only need raw URLs from text or markdown without HTML context.
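Each of the five Show options is a simple predicate over the rows. The Python sketch below assumes rows are plain dicts with hypothetical url, anchor, rel, and type keys. It also assumes Nofollow only matches sponsored and ugc as well, as the nofollow FAQ on this page describes.

```python
from typing import Dict, List

# Hypothetical row shape: dicts with "url", "anchor", "rel", and "type" keys.
Row = Dict[str, str]

NOFOLLOW_FAMILY = {"nofollow", "sponsored", "ugc"}


def rel_tokens(rel: str) -> set:
    """rel is a space-separated token list that HTML treats case-insensitively."""
    return {token.lower() for token in (rel or "").split()}


def apply_show_filter(rows: List[Row], show: str = "All links") -> List[Row]:
    if show == "All links":
        return list(rows)
    if show == "Internal only":
        return [r for r in rows if r.get("type") == "internal"]
    if show == "External only":
        return [r for r in rows if r.get("type") == "external"]
    if show == "Nofollow only":
        # Assumption: sponsored and ugc links are included alongside nofollow.
        return [r for r in rows if rel_tokens(r.get("rel", "")) & NOFOLLOW_FAMILY]
    if show == "Empty anchor text only":
        # Missing or whitespace-only anchor text.
        return [r for r in rows if not r.get("anchor", "").strip()]
    raise ValueError(f"unknown Show filter: {show!r}")
```

Splitting and lowercasing rel means a value like rel="NoFollow noopener" still lands in the Nofollow only view.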

Why anchor text and rel attributes matter for SEO

Anchor text tells search engines what the linked page is about. A link anchored "free SEO audit tool" passes more topical relevance than one anchored "click here." Pages with 80% generic anchors ("read more," "here," "this") rank 5-8 positions lower on average than pages with descriptive internal anchors, per Ahrefs studies of 1.2 million SERPs.

Rel attributes change how link equity flows. rel="nofollow" asks Google not to use the link for ranking (since 2019 Google treats it as a hint rather than a strict directive). rel="sponsored" flags paid placements. rel="ugc" flags user-generated links such as comments. Misuse costs you either way: nofollowing internal links wastes link equity on pages you control, and leaving paid links unmarked risks a manual action. This extractor surfaces every rel value so you can spot a nofollow on a navigation link in seconds. Pair it with the canonical-checker to verify the linked pages send the right canonical signal.

Common mistakes

  • Treating it as a JavaScript scraper. The tool fetches raw HTML. If a page renders links via client-side React or Vue, those links won't appear unless they exist in the initial server response. Use the google-crawler-simulator for JS-rendered pages.
  • Ignoring empty anchor rows. An empty anchor usually means an icon-only link with no aria-label or alt fallback. Crawlers get no context, and screen readers have nothing meaningful to announce.
  • Confusing nofollow with noindex. Nofollow is a per-link hint about passing ranking signals. Noindex is a page-level directive, set in a robots meta tag or X-Robots-Tag header, that controls whether the page carrying it appears in search results at all.
  • Auditing only one page. A homepage shows 100 links, but the real link graph emerges across 50-100 pages. Run the extractor on top templates (homepage, blog hub, category, product).
  • Skipping the External only and Nofollow only filters on guest posts. If you accept sponsored content, running both confirms that every paid outbound link carries consistent sponsored tagging.
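Merging per-page results into one file makes the cross-page link graph easier to analyze when you run the extractor across several templates. The minimal Python sketch below uses the csv module. It assumes each page's rows are dicts and uses a hypothetical column list with an added source_page column, so the tool's own CSV layout may differ.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column order for illustration; the tool's own export may differ.
COLUMNS = ["source_page", "url", "anchor", "rel", "target", "type"]


def merge_to_csv(results: Dict[str, List[Dict[str, str]]]) -> str:
    """Flatten {page_url: rows} into one CSV, tagging each row with its source page."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="")  # missing keys become ""
    writer.writeheader()
    for page, rows in results.items():
        for row in rows:
            writer.writerow({"source_page": page, **row})
    return buffer.getvalue()


merged = merge_to_csv({
    "https://example.com/": [
        {"url": "https://example.com/pricing", "anchor": "Pricing, plans", "type": "internal"},
    ],
    "https://example.com/blog/": [
        {"url": "https://example.org/", "anchor": 'The "best" guide', "rel": "nofollow", "type": "external"},
    ],
})
print(merged)
```

csv.DictWriter quotes anchors containing commas or double quotes, which turn up often in anchor text and would otherwise break columns.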

Advanced tips

  • For internal link audits, run the extractor on your top 20 organic landing pages and check whether each has 3-8 contextual internal links to revenue pages. Pages with under 3 internal links get crawled less often and lose 15-25% of potential link equity.
  • Cross-reference output with the url-extractor when you have a markdown export. The HTML version surfaces nofollow and rel; the regex version catches links inside code blocks the HTML version skips.
  • Use Empty anchor text only as a quick accessibility audit. Any link without an accessible name fails WCAG 2.2. A 5%+ empty-anchor rate signals a defect.
  • After extracting external links, paste them into a bulk status checker to catch 404s. Aim for under 1% broken external links across top pages.
  • Compare ratios across competitors. Pages ranking on page one for commercial keywords average 12-18% external links and 82-88% internal. Over 30% external usually leaks authority.

Once you have a clean link audit, verify the linked pages send consistent signals. Run each unique destination through the canonical-checker to confirm self-canonicalization, and the google-crawler-simulator to see how Googlebot renders them. For bulk URL inventory pulled from text dumps, the url-extractor handles paste input the link extractor doesn't accept.


Frequently Asked Questions

What is a link extractor?

A link extractor fetches a web page's HTML and returns a structured list of every <a href> element on it. The output includes destination URL, anchor text, rel attribute, target, and whether the link is internal or external. SEOs use it to audit internal link structure, find missing nofollow tags on sponsored content, and check that anchor text is descriptive rather than generic. A typical content page exposes 30-150 links. Without a tool, auditing them by hand means right-clicking each one. With this extractor, you paste a URL and get a sortable table in seconds. Filter the result by Internal, External, Nofollow, or Empty anchor to focus on one slice. Use the url-extractor when you only need URLs without the HTML attribute context.

How do I extract all links from a website?

To extract every link from a single page, paste the URL and click Extract links. The result shows every <a href> with anchor text, rel, and internal vs external classification, ready to copy or download as CSV. To extract links from an entire website, you either run the tool against each page individually, use a crawler like Screaming Frog, or pull URLs from the site's XML sitemap first and process each one. For a 500-page site, sitemap extraction plus per-page link audits on the top 20 templates surfaces 95% of structural link issues. Start with the homepage, blog hub, and top-traffic landing pages. The url-extractor accepts bulk text paste if you already have a list of pages to process. For automated audits, schedule the same 20 templates monthly and diff the link counts to catch silent navigation changes.
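Both the sitemap route and the monthly diff fit in a few lines of Python. This is a minimal sketch under stated assumptions. It parses sitemap XML you have already downloaded and reads only a standard urlset file, without expanding sitemap index files. The {page: link_count} snapshots are a hypothetical format you would build from your own exports.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> List[str]:
    """Return every <loc> inside <url> entries of a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def diff_link_counts(previous: Dict[str, int], current: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Pages whose link count changed between two audit snapshots, as (before, after)."""
    changed = {}
    for page in sorted(previous.keys() | current.keys()):
        before, after = previous.get(page, 0), current.get(page, 0)
        if before != after:
            changed[page] = (before, after)
    return changed


print(diff_link_counts({"/": 120, "/blog/": 64}, {"/": 120, "/blog/": 58}))
```

A sudden drop on a template page, such as /blog/ losing six links, is the silent navigation change the monthly diff is meant to catch.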

What is the difference between a link extractor and a URL extractor?

A link extractor parses HTML and walks <a href> elements, returning anchor text, rel attributes, target, and link type (internal, external, anchor, mailto, tel). A URL extractor runs a regex over text or markdown and pulls every URL string it finds, regardless of whether that URL is a clickable link, a code reference, or a comment. The link extractor is built for SEO audits where rel and anchor matter. The URL extractor is built for bulk URL inventory, like cleaning a markdown export or pulling links from a Slack archive. Use this tool when you need DOM-level context for an audit. Use the url-extractor when you have a text blob and just need a deduplicated URL list. Both can run on the same source: extract links here, then paste the destinations into the URL extractor for further dedupe and normalization across multiple pages.
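The difference is easiest to see in code. In its simplest form, a URL extractor is a regex pass plus dedupe. The Python sketch below uses an assumed, deliberately simple pattern, not the url-extractor's actual one. It picks up a URL inside a code span just as readily as a real link, and it has no notion of anchor text or rel.

```python
import re
from typing import List

# Deliberately simple pattern: http(s) URLs up to whitespace or a common delimiter.
# It truncates URLs that legitimately contain parentheses or brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"'`()\[\]]+")


def extract_urls(text: str) -> List[str]:
    """Find URL strings in free text, strip trailing punctuation, dedupe in order."""
    urls = (match.rstrip(".,;:!?") for match in URL_PATTERN.findall(text))
    return list(dict.fromkeys(urls))  # dict preserves first-seen order


notes = ("See [docs](https://example.com/docs). Run `curl https://api.example.com/v1` "
         "or visit https://example.com/docs.")
print(extract_urls(notes))
```

The API URL inside the backticks is returned alongside the markdown link, which is exactly the context a DOM-based link extractor would never report.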

How does anchor text affect SEO rankings?

Anchor text tells search engines what the linked page is about. Descriptive anchors ("free CTR calculator") pass topical relevance. Generic anchors ("click here," "read more") pass almost none. Google's link evaluation systems weight descriptive anchors 3-5x more heavily than generic ones for internal link signals. Pages with 70%+ descriptive internal anchors rank an average of 6 positions higher than pages with mostly generic anchors, per a 2024 Ahrefs analysis of 1.2 million SERPs. The fix: audit your top 20 internal landing pages, list the inbound anchors, and rewrite generic ones to include the target page's primary keyword in natural form. Avoid exact-match stuffing on every link. Aim for variation: keyword, partial match, branded, descriptive phrase. A clean distribution beats a single optimized anchor repeated 50 times across the site.

What does rel="nofollow" do on a link?

The rel="nofollow" attribute tells search engines not to pass ranking signals through that link. Google introduced it in 2005 to combat comment spam. Today Google treats nofollow as a hint rather than a strict directive, but most other search engines still respect it as a hard rule. Use nofollow on user-generated content (forum posts, blog comments), untrusted external links, and login or admin pages. Don't use it on internal navigation, footer links, or your own contextual outbound links to authoritative sources. A typical 100-link page should have under 5% nofollow internal links. This extractor's Nofollow only filter surfaces every nofollow link in seconds so you can spot misuses without scanning the full table. For sponsored or paid links, use rel="sponsored". For user comments, use rel="ugc". Mixing the three correctly keeps you compliant with Google's 2019 link attribute policy.

Can a link extractor pull links from JavaScript-rendered pages?

This tool fetches raw HTML, which means it sees only the links present in the initial server response. JavaScript-rendered pages (React, Vue, Angular SPAs without server-side rendering) often inject links into the DOM after page load. Those links won't appear unless the framework also pre-renders them server-side. To audit JS-rendered link structures, you need a tool that runs a headless browser and waits for the page to hydrate. The google-crawler-simulator renders pages the way Googlebot does, including JavaScript execution, and surfaces post-render link inventory. About 35% of modern sites mix server-rendered and client-rendered links, so an HTML-only extractor can miss 10-40% of links on those pages. If your pages are server-rendered, as with a standard WordPress theme, Astro, or Next.js with SSR, the link extractor catches every link in the HTML. With a headless CMS such as Strapi, coverage depends on how your frontend renders the content.

How do I extract anchor text from a website?

Paste the page URL into the link extractor and click Extract links. The Anchor column shows the visible text inside each <a> tag, including text inside nested <span> or <strong> elements. For image links (an <img> wrapped in an <a>), the extractor returns the image's alt text as the anchor when present, or flags the row as Empty anchor text when alt is missing. To audit anchor text patterns at scale, export the result as CSV and run a frequency count in Excel or Google Sheets. A healthy anchor distribution shows variation: 40-60% descriptive, 15-25% branded, 10-20% partial keyword match, under 10% generic. Anything above 30% generic anchor text signals an internal linking gap. Repeat the audit on your top 10 traffic pages every quarter to catch drift as new content adds links.
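You can run the same frequency count in a few lines of Python instead of a spreadsheet. This sketch assumes you have the Anchor column as a list of strings. The GENERIC_ANCHORS set is a hypothetical starter list. The sketch only measures the generic share. Splitting the rest into descriptive, branded, and partial-match anchors would need your own keyword and brand lists.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical starter list of generic anchors; extend it with phrases your site uses.
GENERIC_ANCHORS = {"click here", "here", "read more", "learn more", "more", "this"}


def anchor_report(anchors: Iterable[str]) -> Tuple[Counter, float]:
    """Count normalized anchors and return the generic share of non-empty ones."""
    normalized = [" ".join(a.lower().split()) for a in anchors]  # case and whitespace folded
    counts = Counter(normalized)
    non_empty = [a for a in normalized if a]
    generic = sum(1 for a in non_empty if a in GENERIC_ANCHORS)
    share = generic / len(non_empty) if non_empty else 0.0
    return counts, share


counts, generic_share = anchor_report(
    ["Free CTR calculator", "Read more", "read  more", "click here", "Pricing", ""]
)
print(counts.most_common(3), f"generic: {generic_share:.0%}")
if generic_share > 0.30:
    print("Generic anchors above 30%: internal linking gap")
```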

What is an internal link checker?

An internal link checker audits the links pointing from one page to other pages on the same domain. It surfaces orphan pages, broken internal links, missing contextual links, and over-optimized anchor text. This link extractor doubles as an internal link checker when you switch the Show filter to Internal only. Each row shows destination URL, anchor, and rel. To find orphan pages (pages with zero internal inbound links), cross-reference the extractor output across your top 20 templates against your sitemap. Sitemap pages that never appear as a link destination are orphan candidates. Confirm them with a full-site crawl, because a page you didn't extract may still link to them. A typical site has a 5-15% orphan rate after one year of organic content growth. Run the extractor on your homepage, blog hub, and top category pages monthly to catch orphans before they lose rankings.
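The sitemap cross-reference is a set difference. This Python sketch assumes you have the sitemap URLs and the destination URLs from the Internal only rows of every page you extracted. Both are assumed to be absolute URLs in the same scheme and host form. It treats /a, /a/ and /a#top as the same page and leaves query strings untouched.

```python
from typing import Iterable, List


def _normalize(url: str) -> str:
    """Drop the fragment and a trailing slash so /a, /a/ and /a#top compare equal."""
    return url.split("#", 1)[0].rstrip("/")


def orphan_candidates(sitemap_urls: Iterable[str], internal_targets: Iterable[str]) -> List[str]:
    """Sitemap URLs that no extracted page links to (candidates, not proof)."""
    linked = {_normalize(u) for u in internal_targets}
    return sorted(u for u in sitemap_urls if _normalize(u) not in linked)


print(orphan_candidates(
    ["https://example.com/", "https://example.com/a/", "https://example.com/b"],
    ["https://example.com", "https://example.com/a#top"],
))
```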

How do I find external links on a page?

Paste the page URL and switch the Show filter to External only. The output collapses to every link pointing off-domain. The Type column confirms each row as external. The Rel column shows whether the link carries nofollow, sponsored, or ugc tags. The Target column shows whether the link opens in a new tab. A typical content article carries 3-8 external links. Fewer than 3 looks under-cited (Google's helpful content systems weight external citations as a quality signal). More than 15 starts to look like a link farm unless you're a curated resource page. After extracting external links, paste them into a bulk status checker to catch 404s. Outbound 404s on top pages erode user trust and waste crawl budget. Aim for under 1% broken external links across your top traffic pages.

Why is my link extractor returning fewer links than I see on the page?

Three common causes. First, JavaScript rendering: if the page injects links client-side, the HTML-only fetch misses them. Switch to the google-crawler-simulator for full render. Second, the tool excludes non-<a href> elements. Buttons styled as links and JavaScript handlers don't count as crawlable links and won't appear. Search engines also ignore them, so the gap is correct from an SEO standpoint. Third, anchor jumps and mailto/tel links are categorized as their own types. If your filter is set to Internal only, mailto links won't appear. Switch to All links to see the full inventory. About 95% of "missing link" reports trace to one of these three causes.

What is a good ratio of internal to external links?

For SEO-focused content pages, target 80-90% internal links and 10-20% external links. Pages ranking on page one for commercial keywords average 12-18% external links and 82-88% internal links, per analyses of 50,000 SERPs by Ahrefs in 2024. Too few external citations (under 5%) signal thin or self-referential content. Too many (over 30%) leak authority and look spammy. The mix matters. External links should point to authoritative sources (peer-reviewed studies, government data, top-tier publications). Internal links should connect contextually related pages, not just nav items. Run this extractor on your top 10 organic landing pages and calculate the ratio. If your internal share falls outside the 80-90% band, rewrite the link mix on the underperforming pages first.
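Here is a Python sketch of the ratio check, assuming you pass in the Type column of one page's rows. Anchor, mailto, and tel rows are excluded from the denominator here, which is an assumption. Counting them would lower both percentages. The default band is the 80-90% internal target stated above.

```python
from typing import Iterable, Optional, Tuple


def link_mix(types: Iterable[str], band: Tuple[float, float] = (80.0, 90.0)) -> Optional[Tuple[float, float, bool]]:
    """Return (internal %, external %, within band), counting only internal and external rows."""
    types = list(types)
    internal, external = types.count("internal"), types.count("external")
    total = internal + external
    if total == 0:
        return None  # no internal or external links to compare
    internal_pct = 100 * internal / total
    return internal_pct, 100 - internal_pct, band[0] <= internal_pct <= band[1]


print(link_mix(["internal"] * 42 + ["external"] * 8 + ["anchor"] * 3))
```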

Is this link extractor free?

Yes. The link extractor is free, with no signup and no usage cap; any rate limiting is loose enough not to matter for normal audit use. Paste a URL, get a structured table of every link, copy or download as CSV. The tool fetches HTML server-side and runs the parsing logic on our infrastructure, so you don't need to install browser extensions or run scraping scripts locally. It works on any publicly reachable page. Pages behind authentication, paywalls, or aggressive bot protection return a fetch error. Most pages return results in 2-4 seconds, regardless of link count. The output includes anchor text, rel attributes, target, internal vs external classification, and link type. For bulk URL inventory work where you have a list of URLs to dedupe, use the url-extractor instead.

How do I check for broken nofollow tags?

Paste the URL and switch the Show filter to Nofollow only. The result shows every link with rel="nofollow", rel="sponsored", or rel="ugc". Check for two failure modes. First, internal navigation or footer links carrying nofollow: these should almost never be nofollowed because they waste internal link equity on pages you control. Fix by removing the rel attribute. Second, sponsored or affiliate links with no qualifying rel value. These never appear in the Nofollow only view, so switch to External only and confirm every paid destination carries rel="sponsored" (Google also accepts nofollow) to comply with Google's link spam policy. Reserve rel="ugc" for user-generated links such as comments. About 20-30% of sites we sample have at least one misused nofollow. Catching them early prevents accidental manual actions. Run this audit on every page that carries paid placements, affiliate links, or user-generated comments.
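Both failure modes can be checked in one pass over the full table. This is a minimal Python sketch, assuming rows are dicts with url, type, and rel keys. The HTML alone can't tell you which links are paid, so paid_hosts is a hypothetical input listing the affiliate or sponsor hostnames you already know about.

```python
from typing import Dict, Iterable, List, Set, Tuple
from urllib.parse import urlparse


def audit_rel(rows: Iterable[Dict[str, str]], paid_hosts: Set[str]) -> List[Tuple[str, str]]:
    """Flag nofollowed internal links and paid external links with no qualifying rel value."""
    issues = []
    for row in rows:
        tokens = {t.lower() for t in row.get("rel", "").split()}
        if row.get("type") == "internal" and "nofollow" in tokens:
            issues.append((row["url"], "internal link carries nofollow"))
        elif (row.get("type") == "external"
              and (urlparse(row["url"]).hostname or "") in paid_hosts
              and not tokens & {"sponsored", "nofollow"}):
            issues.append((row["url"], "paid link missing rel=sponsored"))
    return issues
```

Running it on the All links view, not the Nofollow only view, is what lets it see untagged paid links.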

What does the Empty anchor text only filter show?

The Empty anchor text only filter surfaces every link on the page where the visible text is missing or contains only whitespace. The most common cases are logo wrappers (an <a> around an <img> with no alt text), icon-only buttons (social share icons, search icons), and decorative links wrapping background images. Empty anchors hurt SEO because crawlers can't infer link context, and they fail accessibility audits because screen readers have nothing meaningful to announce. Under WCAG 2.2, every link needs an accessible name (Success Criterion 4.1.2, Name, Role, Value) that conveys its purpose (Success Criterion 2.4.4, Link Purpose (In Context)). Fix it by adding alt text to the wrapped image, adding aria-label to the link, or replacing the icon-only link with a labeled component. A clean page should have zero empty-anchor rows. A 5%+ empty-anchor rate signals an accessibility audit is overdue.

What is the best link extractor for SEO audits?

The best link extractor for SEO audits returns DOM-aware output: anchor text, rel attributes, target, and internal vs external classification per row. Regex-based extractors miss this context. Browser extensions work for one-off checks but don't scale to multi-page audits. Server-side HTML parsers (like this tool) hit the sweet spot for fast per-page audits. For full-site crawls of 1,000+ pages, dedicated crawlers like Screaming Frog or Sitebulb make sense. For per-page audits during content production or template QA, this tool returns results in 2-4 seconds with zero setup. The Show filter lets you collapse to Internal, External, Nofollow, or Empty anchor in one click, which makes specific audits faster than spreadsheet manipulation. Pair it with the canonical-checker when you also need to verify destination canonical signals.
