What this link extractor returns
The output is one row per <a href> on the page. Each row carries the destination URL, the visible anchor text, the rel attribute (with nofollow, sponsored, ugc, and noopener flagged), the target value, and a type column: internal, external, anchor, mailto, or tel. A typical 300-link blog homepage breaks down roughly 70% internal, 25% external, and 5% anchor or mailto. The Show dropdown collapses the view to one slice. That structure is what sets it apart from a regex-over-text URL puller: you get the DOM context for each link, not just the URL string.
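To make the row structure concrete, here is a minimal sketch of how such rows could be built with Python's standard-library HTML parser. It is not the tool's implementation. The names `LinkCollector`, `classify`, and `extract_links` are hypothetical. The sketch assumes "internal" means an exact hostname match with the page, so www.example.com and example.com count as different sites.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Builds one row per <a href>: href, visible text, rel, target."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current = None  # row under construction while inside an <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self._current = {
                "href": attrs["href"] or "",
                "text": "",
                "rel": attrs.get("rel") or "",
                "target": attrs.get("target") or "",
            }

    def handle_data(self, data):
        # Text inside nested tags (e.g. <span>) also counts as anchor text.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Collapse whitespace so "About\n  us" becomes "About us".
            self._current["text"] = " ".join(self._current["text"].split())
            self.rows.append(self._current)
            self._current = None


def classify(href, page_url):
    """Map an href to internal, external, anchor, mailto, or tel (hypothetical rules)."""
    h = href.strip()
    lower = h.lower()
    if lower.startswith("mailto:"):
        return "mailto"
    if lower.startswith("tel:"):
        return "tel"
    if h.startswith("#"):
        return "anchor"
    # Resolve relative hrefs against the page, then compare hostnames exactly.
    host = urlparse(urljoin(page_url, h)).netloc.lower()
    return "internal" if host == urlparse(page_url).netloc.lower() else "external"


def extract_links(html, page_url):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    for row in parser.rows:
        row["type"] = classify(row["href"], page_url)
    return parser.rows
```

Feeding it raw HTML plus the page URL returns a list of dicts, one per link, in document order.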
How to use this link extractor
- Enter Page URL. Paste the full URL, including https://. The tool fetches the page server-side, so the URL must be publicly reachable. Pages behind a login or strict bot blocking return a fetch error.
- Select Show. Pick one of five filters: All links, Internal only, External only, Nofollow only, or Empty anchor text only. The default is All links. Switch to Internal only for site structure audits, External only for outbound reviews, and Nofollow only when checking sponsored tagging.
- Click Extract links. The tool returns a table, typically within 2-4 seconds. Copy it to the clipboard or download it as CSV.
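If you want to reproduce the Show filters and CSV export on rows you already have, a sketch like the following works. The filter labels mirror the dropdown. The column names and the `to_csv` helper are hypothetical. It assumes the "Empty anchor text only" filter means the visible text is blank, so the tool's exact rules may differ.

```python
import csv
import io

# One predicate per Show option; rows are dicts with href, text, rel, target, type.
FILTERS = {
    "All links": lambda r: True,
    "Internal only": lambda r: r["type"] == "internal",
    "External only": lambda r: r["type"] == "external",
    # rel is a space-separated token list, so match whole tokens.
    "Nofollow only": lambda r: "nofollow" in r["rel"].lower().split(),
    "Empty anchor text only": lambda r: not r["text"].strip(),
}


def to_csv(rows, show="All links"):
    """Return the filtered rows as CSV text with a header line."""
    keep = FILTERS[show]
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["href", "text", "rel", "target", "type"], extrasaction="ignore"
    )
    writer.writeheader()
    for row in rows:
        if keep(row):
            writer.writerow(row)
    return buf.getvalue()
```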
Try this with a blog homepage. Enter the URL and leave Show on All links. You might see something like 124 rows: 87 internal, 31 external, and 6 anchor jumps. Switch to Empty anchor text only and, in this example, 4 rows surface, all logo wrappers and icon links. Fix those first, because crawlers and screen readers both flag empty anchors. Use the url-extractor instead when you only need raw URLs from text or markdown without HTML context.
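Before fixing a row flagged as empty, it helps to check whether the link actually has an accessible name from somewhere other than visible text. The sketch below counts a link as named if it has non-blank text, an `aria-label`, or a child `<img>` with a non-empty `alt`. That is a deliberately simplified subset of the accessible-name rules: it ignores `aria-labelledby`, `title`, and SVG `<title>`. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class AccessibleNameCheck(HTMLParser):
    """Collects hrefs of <a href> elements with no text, aria-label, or img alt."""

    def __init__(self):
        super().__init__()
        self.unnamed = []
        self._href = None    # href of the link we are inside, or None
        self._named = False  # whether that link has found a name source yet

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self._href = attrs["href"] or ""
            self._named = bool((attrs.get("aria-label") or "").strip())
        elif tag == "img" and self._href is not None:
            if (attrs.get("alt") or "").strip():
                self._named = True

    def handle_data(self, data):
        if self._href is not None and data.strip():
            self._named = True

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if not self._named:
                self.unnamed.append(self._href)
            self._href = None


def unnamed_links(html):
    checker = AccessibleNameCheck()
    checker.feed(html)
    checker.close()
    return checker.unnamed
```

A logo wrapper whose image has no alt text shows up here. An icon link carrying an `aria-label` does not.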
Why anchor text and rel attributes matter for SEO
Anchor text tells search engines what the linked page is about. A link anchored "free SEO audit tool" passes more topical relevance than one anchored "click here." Pages with 80% generic anchors ("read more," "here," "this") rank 5-8 positions lower on average than pages with descriptive internal anchors, per Ahrefs studies of 1.2 million SERPs.
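To estimate where a page sits against that generic-anchor share, you can compute it from the exported anchor text column. The phrase list below is an assumption: it takes the examples above plus a couple of common variants, so adjust it to your own site. Empty anchors are excluded because they are a separate problem.

```python
# Hypothetical list of generic anchor phrases; extend it for your site.
GENERIC_ANCHORS = {"click here", "here", "read more", "this", "learn more", "more"}


def generic_anchor_share(anchor_texts):
    """Fraction of non-empty anchors whose text is a generic phrase."""
    texts = [" ".join(t.lower().split()) for t in anchor_texts if t.strip()]
    if not texts:
        return 0.0
    # Trailing punctuation like "Here." should not hide a generic anchor.
    generic = sum(1 for t in texts if t.strip(".!?:") in GENERIC_ANCHORS)
    return generic / len(texts)
```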
Rel attributes change how link equity flows. rel="nofollow" asks Google not to pass ranking credit through the link. Since 2019, Google treats nofollow, sponsored, and ugc as hints rather than strict directives. rel="sponsored" flags paid placements, and rel="ugc" flags user-generated content such as comment links. Misusing these hurts in two ways. Nofollowing internal links withholds equity from your own pages, and leaving paid links unmarked risks a manual action. This extractor surfaces every rel value, so you can spot a nofollow on a navigation link in seconds. Pair it with the canonical-checker to verify that the linked pages send the right canonical signal.
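The rel attribute is a space-separated, case-insensitive list of tokens, so `rel="NoFollow Sponsored"` carries both values. The sketch below parses it that way and flags same-host links that carry nofollow, which is the navigation mistake described above. The function names are hypothetical. It assumes rows shaped like the extractor's output, with `href` and `rel` keys.

```python
from urllib.parse import urljoin, urlparse

FLAGGED = ("nofollow", "sponsored", "ugc", "noopener")


def rel_flags(rel):
    """Split a rel value into tokens and report which flagged ones are present."""
    tokens = set(rel.lower().split())
    return {flag: flag in tokens for flag in FLAGGED}


def internal_nofollows(rows, page_url):
    """Return hrefs that resolve to the page's own host and carry rel=nofollow."""
    site_host = urlparse(page_url).netloc.lower()
    hits = []
    for row in rows:
        host = urlparse(urljoin(page_url, row["href"])).netloc.lower()
        if host == site_host and rel_flags(row.get("rel", ""))["nofollow"]:
            hits.append(row["href"])
    return hits
```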
Common mistakes
- Treating it as a JavaScript scraper. The tool fetches raw HTML. If a page renders links via client-side React or Vue, those links won't appear unless they exist in the initial server response. Use the google-crawler-simulator for JS-rendered pages.
- Ignoring empty anchor rows. An empty anchor usually means an icon-only link with no aria-label or alt fallback. Crawlers get no context, and screen readers have nothing meaningful to announce.
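- Confusing nofollow with noindex. Nofollow is a link attribute that controls equity flow through a single link. Noindex is a page-level directive, set in a robots meta tag or an X-Robots-Tag header, that keeps the page carrying it out of the index.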
- Auditing only one page. A homepage might show 100 links, but the real link graph only emerges across 50-100 pages. Run the extractor on your top templates (homepage, blog hub, category, product).
- Skipping the outbound rel check on guest posts. If you accept sponsored content, run External only on each guest post and check the rel column to verify that your sponsored tagging is consistent.
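Once you have extracted several templates, you can merge the results into a rough link graph. The sketch below counts how many audited pages link to each internal URL. It assumes you have collected the hrefs from each run into a dict keyed by page URL; that input shape and the function name are hypothetical. Fragments are stripped, self-links and external links are skipped, and each target is counted at most once per source page. That last rule is a design choice, so repeated links in one template do not inflate the count.

```python
from collections import Counter
from urllib.parse import urldefrag, urljoin, urlparse


def inbound_internal_counts(pages):
    """pages maps each audited page URL to the list of hrefs found on it."""
    counts = Counter()
    for page_url, hrefs in pages.items():
        host = urlparse(page_url).netloc.lower()
        self_url = urldefrag(page_url)[0]
        seen = set()  # targets already counted for this source page
        for href in hrefs:
            target = urldefrag(urljoin(page_url, href))[0]
            if urlparse(target).netloc.lower() != host or target == self_url:
                continue  # external, mailto/tel, or a link back to the same page
            if target not in seen:
                seen.add(target)
                counts[target] += 1
    return counts
```

Internal URLs with low counts are candidates for more contextual links.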
Advanced tips
- For internal link audits, run the extractor on your top 20 organic landing pages and check whether each has 3-8 contextual internal links to revenue pages. Pages with under 3 internal links get crawled less often and lose 15-25% of potential link equity.
- Cross-reference output with the url-extractor when you have a markdown export. The HTML version surfaces rel and target values; the regex version catches links inside code blocks that the HTML version skips.
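- Use Empty anchor text only as a quick accessibility audit. WCAG 2.2 fails any link without an accessible name, so check each row for an aria-label or image alt before counting it. An empty-anchor rate of 5% or more usually signals a template-level defect rather than a one-off slip.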
- After extracting external links, paste them into a bulk status checker to catch 404s. Aim for under 1% broken external links across top pages.
- Compare ratios across competitors. Pages ranking on page one for commercial keywords average 12-18% external links and 82-88% internal. Over 30% external usually leaks authority.
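The ratio and broken-link thresholds above reduce to simple arithmetic over the extracted rows. This sketch assumes the internal/external percentages are computed only over internal plus external links, ignoring anchor, mailto, and tel rows. That denominator is an assumption; the text does not specify one. It also assumes the status checker gives you a URL-to-status-code mapping. The function names are hypothetical.

```python
def link_mix(rows):
    """(internal share, external share) among internal + external rows only."""
    internal = sum(1 for r in rows if r["type"] == "internal")
    external = sum(1 for r in rows if r["type"] == "external")
    total = internal + external
    if total == 0:
        return 0.0, 0.0
    return internal / total, external / total


def broken_rate(statuses):
    """statuses maps external URL -> HTTP status; codes >= 400 count as broken."""
    if not statuses:
        return 0.0
    return sum(1 for code in statuses.values() if code >= 400) / len(statuses)


def audit_thresholds(rows, statuses):
    """Apply the rule-of-thumb thresholds: >30% external, >=1% broken."""
    _, external_share = link_mix(rows)
    return {
        "external_over_30pct": external_share > 0.30,
        "broken_1pct_or_more": broken_rate(statuses) >= 0.01,
    }
```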
Once you have a clean link audit, verify the linked pages send consistent signals. Run each unique destination through the canonical-checker to confirm self-canonicalization, and the google-crawler-simulator to see how Googlebot renders them. For bulk URL inventory pulled from text dumps, the url-extractor handles paste input the link extractor doesn't accept.