Live check · fetches your URL server-side

Google Crawler Simulator

See raw HTML vs. what Googlebot indexes after JS execution — side by side.

Googlebot does not see your page the same way a browser does. It fetches HTML, executes JavaScript, waits for network requests to complete, then indexes the final rendered output, but only if rendering finishes within the crawl budget. This Google crawler simulator fetches any URL as Googlebot Desktop, Googlebot Mobile, Googlebot-Image, Bingbot, or GPTBot, shows the raw HTML and the JavaScript-rendered HTML side by side, lists blocked resources, reports render-time warnings, and displays the visible text that actually gets indexed.

Generate the whole content, not just check it.

BlazeHive writes SEO articles end to end from a single keyword. Outline, draft, meta, schema, internal links. Free trial, no card.

Start with BlazeHive Free trial

What a Google crawler simulator actually does

A crawler simulator sends an HTTP request to your URL with a user-agent string matching the bot you select, fetches the response, and records the status code, headers, and raw HTML. Then it loads the page in a headless browser (Chrome with JavaScript enabled), waits for the DOM to stabilize, captures the final rendered HTML, and diffs it against the initial HTML to show what JavaScript changed.
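If you want to reproduce the first step yourself, here is a minimal Python sketch of a raw fetch under a bot user-agent. It is not how our simulator is built: the helper names `build_bot_request` and `fetch_raw` are hypothetical, the user-agent is the classic Googlebot desktop string (Google also sends a Chrome-based variant), and the headless-browser rendering step is left out because it needs a real browser.

```python
import urllib.error
import urllib.request

# Classic Googlebot desktop user-agent. Google also uses a Chrome-based
# variant; check Google's crawler documentation for the current list.
GOOGLEBOT_DESKTOP_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def build_bot_request(url: str, user_agent: str = GOOGLEBOT_DESKTOP_UA) -> urllib.request.Request:
    """Build a GET request that identifies itself with the given bot user-agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent}, method="GET")


def fetch_raw(url: str, user_agent: str = GOOGLEBOT_DESKTOP_UA, timeout: float = 10.0) -> tuple[int, dict, str]:
    """Fetch the raw (pre-JavaScript) response: status code, headers, and HTML body.

    urlopen follows redirects, so the status is that of the final response.
    """
    req = build_bot_request(url, user_agent)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, dict(resp.headers), resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; record them instead of crashing.
        return err.code, dict(err.headers), err.read().decode("utf-8", "replace")
```

Calling `fetch_raw("https://example.com/")` returns the status code, headers, and pre-JavaScript HTML. Changing the user-agent does not change your IP address, so a server or CDN that treats Google's IP ranges differently may still answer real Googlebot differently.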

It extracts the visible text (what a bot sees after stripping HTML tags, CSS, and scripts) because that is the content Google indexes. It flags resources that failed to load: images, fonts, CSS files, or JavaScript bundles blocked by CORS, 404s, or server errors. It checks for robots meta tags, X-Robots-Tag headers, and canonical tags that might prevent indexing even if the page loaded successfully.
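The text-extraction step can be approximated with Python's built-in `html.parser`. This is a simplified sketch under our own assumptions: it drops the contents of `script`, `style`, `noscript`, `template`, and `title`, and it cannot evaluate CSS, so text hidden with `display: none` still counts here (a browser-based check needs computed styles).

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping elements a browser does not display as page text."""

    SKIPPED = {"script", "style", "noscript", "template", "title"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html: str) -> str:
    """Return the page's text content with tags, scripts, and styles removed."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Run it once on the raw HTML and once on the rendered HTML to get the two texts you would compare.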

Three categories of problem come up again and again in crawl tests. The first is content missing from the raw HTML that only appears after JavaScript runs. If your hero headline or product description is client-rendered, Googlebot does not see it in the initial fetch and has to wait for rendering. The second is render timeout. If JavaScript takes too long to finish (this tool warns above five seconds; Google does not publish a fixed limit), Googlebot may index an incomplete page. The third is blocked resources. If your CSS or critical JavaScript files return 403 or 404, the page renders broken, and Googlebot sees a broken layout.

How to use this Google crawler simulator

  1. Paste the page URL into Page URL. Use the canonical version-https, www if applicable, no UTM parameters unless you are testing how parameters affect rendering.
  2. Pick a User-agent from the dropdown. Googlebot Desktop is the default. Googlebot Mobile simulates mobile-first indexing with a mobile viewport. Googlebot-Image tests image-specific crawling. Bingbot tests Bing's crawler. GPTBot simulates OpenAI's training crawler.
  3. Hit Simulate crawler. You get four sections: raw HTML, rendered HTML, visible text, and a resource log showing which files loaded or failed.
  4. Compare the Raw HTML and Rendered HTML tabs. If the rendered version has content missing from the raw, that content is JavaScript-injected. If render time exceeds five seconds, we show a warning.
  5. Check the Blocked resources list. Any resource that returned a non-200 status is flagged. If critical CSS or JavaScript is blocked, the page likely renders broken for Googlebot.
  6. Scroll to Visible text. This is what Google indexes. If your target keyword appears here, Google can rank the page for it. If it does not, the keyword is invisible.

Try simulating a single-page app built with React or Vue. The raw HTML often contains an empty <div id="root"></div> and a script tag. The rendered HTML shows the full page after JavaScript runs. If the render takes eight seconds because of slow API calls, we warn that Googlebot might time out and index the empty shell.
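A quick way to spot the empty-shell pattern in raw HTML is a heuristic like the sketch below. The mount IDs (`root`, `app`), the 200-character threshold, and the names `ShellDetector` and `looks_like_empty_shell` are assumptions chosen for illustration, not values our simulator or Google uses; adjust them to your framework.

```python
from html.parser import HTMLParser

# Hypothetical defaults: common SPA mount-point IDs and the amount of visible
# text below which an empty mount point is treated as a client-rendered shell.
MOUNT_IDS = {"root", "app"}
MIN_TEXT_CHARS = 200


class ShellDetector(HTMLParser):
    """Scan raw HTML for an empty SPA mount point and count visible characters."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0          # > 0 while inside <script> or <style>
        self._awaiting_mount = False  # True right after the mount element opens
        self.mount_empty = False
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if self._awaiting_mount:
            self._awaiting_mount = False  # the mount element has a child element
        if tag in ("script", "style"):
            self._skip_depth += 1
        if dict(attrs).get("id") in MOUNT_IDS:
            self._awaiting_mount = True

    def handle_endtag(self, tag):
        if self._awaiting_mount:
            self.mount_empty = True  # closed before any child element or text
            self._awaiting_mount = False
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth or not data.strip():
            return
        self._awaiting_mount = False  # the mount element contains text
        self.text_chars += len(data.strip())


def looks_like_empty_shell(raw_html: str) -> bool:
    detector = ShellDetector()
    detector.feed(raw_html)
    detector.close()
    return detector.mount_empty and detector.text_chars < MIN_TEXT_CHARS
```

Requiring both an empty mount point and little text keeps short but fully server-rendered pages from being flagged.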

Why raw HTML vs rendered HTML matters

Google's indexing pipeline has two phases. The first is the initial fetch, where Googlebot downloads the raw HTML. The second is rendering, where Googlebot executes JavaScript in a headless Chrome instance and captures the final DOM. Rendering is queued separately from the fetch: Google says a page may wait in the render queue for a few seconds, but it can take longer. Pages whose raw HTML carries a noindex directive are not rendered at all.

Three practical consequences.

Content in raw HTML indexes faster. If your H1, meta description, and first paragraph are in the initial HTML, Googlebot can process them from the first fetch. If they appear only after JavaScript runs, indexing waits for the render queue, and on large sites that extra delay can add up.

JavaScript errors block indexing. If your page throws a console error during render, Googlebot might see a blank page. Our simulator executes the page and captures console logs. An error like "Uncaught TypeError: Cannot read property 'map' of undefined" can prevent the entire page from rendering.

Render budget is finite. Google allocates crawl budget per site, and rendering JavaScript is expensive for Google as well (Google does not publish a separate render budget, but SEOs use the term for this constraint). If rendering your homepage takes 10 seconds, Google might render it less often than competitors whose pages render in two seconds. We report render time so you can spot slow pages.

Blocked resources and indexing impact

A blocked resource is any file (CSS, JavaScript, image, font) that the page tried to load but that returned a 4xx or 5xx status code, or that was blocked by robots.txt or CORS policy. Googlebot ignores the file and continues rendering, but the missing file can break layout or functionality.

Critical CSS files control layout. If styles.css is blocked by robots.txt, Googlebot renders the page with no styles, meaning content might be hidden by default CSS states: accordions collapsed, tabs hidden, modals off-screen. The content exists in the DOM but is not visible, so Google might not index it.

Critical JavaScript files control interactivity and data fetching. If app.js is blocked, client-side routing breaks, and links inside the app do not work. If api-client.js is blocked, your product page cannot fetch product data, so Googlebot sees a loading spinner instead of product details.

Images and fonts are less critical. A missing image does not break indexing, but it might hurt user experience signals if the page layout shifts or placeholders appear. A missing font falls back to system fonts, which is usually fine for indexing.

Our simulator lists every resource, its URL, status code, and type. If a resource failed, we show the error. If it was blocked by robots.txt, we flag it. Use this list to fix blocks at the server level or in your robots.txt file.
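The flagging logic described above can be sketched as a small filter over the resource log. The `Resource` fields, the use of status `0` for requests that never completed (for example CORS or network failures), and the rule that CSS and JavaScript are critical while images and fonts are minor are assumptions for this sketch, not our simulator's internal format.

```python
from dataclasses import dataclass

CRITICAL_KINDS = {"css", "script"}  # assumption: layout and data-fetching files


@dataclass
class Resource:
    url: str
    status: int  # final HTTP status; 0 if the request never completed (CORS, DNS, timeout)
    kind: str  # "css", "script", "image", "font", ...
    robots_blocked: bool = False


def flag_blocked(log: list[Resource]) -> list[tuple[str, int, str]]:
    """Return (url, status, severity) for every resource that failed or was disallowed."""
    flagged = []
    for res in log:
        failed = res.robots_blocked or res.status == 0 or res.status >= 400
        if failed:
            severity = "critical" if res.kind in CRITICAL_KINDS else "minor"
            flagged.append((res.url, res.status, severity))
    return flagged
```

Critical entries are the ones worth fixing first, because they change what Googlebot renders rather than just how the page looks.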

Mobile-first indexing and viewport

In 2026, Google uses mobile-first indexing for all sites. That means Googlebot Mobile is the primary crawler, and the mobile version of your page determines rankings even for desktop searches. If your mobile page loads content only after a user taps a "Read more" toggle, or drops sidebar widgets entirely, Googlebot does not see that content, and it does not count toward rankings.

Testing with Googlebot Mobile as the user-agent shows what the mobile crawler sees. We render the page with a 375px viewport (iPhone SE width) so you see the mobile layout. If your CSS hides elements at mobile widths, they drop out of the visible text even though they may still exist in the rendered HTML. If your JavaScript lazy-loads images or text as the user scrolls, and Googlebot does not scroll, that content is invisible.

Two fixes are common. The first is server-side rendering or static site generation, where the full content is in the raw HTML regardless of viewport. The second is ensuring that mobile CSS does not set display: none on important content. Alternatives such as opacity: 0 or position: absolute; left: -9999px keep content available to screen readers, but they still hide it visually and can hurt indexing if overused.

Robots meta tags and X-Robots-Tag headers

Even if a page loads successfully, a <meta name="robots" content="noindex"> tag or an X-Robots-Tag: noindex HTTP header tells Googlebot not to index it. Our simulator checks for both and reports them in the summary.

Common values are noindex (do not add to search results), nofollow (do not follow links on this page), noarchive (do not cache), nosnippet (do not show a snippet in results), and none (equivalent to noindex, nofollow). If your staging site accidentally goes live with noindex tags still present, you lose all search traffic. Checking before launch catches this.

If the meta tag and the HTTP header are both present and disagree, Google follows the more restrictive directive. A page with <meta name="robots" content="index"> but X-Robots-Tag: noindex will not be indexed. Our simulator shows both so you can spot conflicts.
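Here is a sketch of how the two sources combine, assuming comma-separated directives and applying the most-restrictive-wins rule. The function names are hypothetical, and for brevity the sketch ignores crawler-scoped values such as `googlebot: noindex` and date-based ones like `unavailable_after`.

```python
from typing import Optional


def parse_directives(value: str) -> set[str]:
    """Split a robots value such as 'noindex, nofollow' into lowercase tokens."""
    tokens = {token.strip().lower() for token in value.split(",") if token.strip()}
    if "none" in tokens:  # 'none' is shorthand for 'noindex, nofollow'
        tokens |= {"noindex", "nofollow"}
    return tokens


def is_indexable(meta_robots: Optional[str], x_robots_tag: Optional[str]) -> bool:
    """Merge the meta tag and the HTTP header; a noindex from either source wins."""
    directives: set[str] = set()
    for source in (meta_robots, x_robots_tag):
        if source:
            directives |= parse_directives(source)
    return "noindex" not in directives
```

`is_indexable("index", "noindex")` returns `False`, matching the conflict example above.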

Common mistakes

  • Rendering the entire page client-side. If the raw HTML is empty and everything appears after JavaScript, indexing is slow and fragile. Move critical content into the initial HTML via server-side rendering or prerendering.
  • Blocking JavaScript or CSS in robots.txt. Google needs these files to render the page. Rules such as Disallow: /*.js or Disallow: /*.css break rendering. Only block these if you have a strong reason, and re-check with the simulator afterward.
  • Ignoring console errors. A single uncaught exception can halt rendering. Check the console log in the simulator output and fix errors before deploying.
  • Testing only with a browser, not with Googlebot. Browsers are more forgiving than Googlebot. A page that works in Chrome might fail in headless Chrome due to missing polyfills or user-agent checks. Simulate Googlebot to see the real experience.
  • Assuming Googlebot scrolls. It does not. Lazy-loaded content triggered by scroll events is invisible unless you implement Intersection Observer or load everything on initial render.
  • Not testing after framework updates. A Next.js or Gatsby version bump can change how static generation works. Re-check rendering after updates to confirm content is still in the raw HTML.
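To see why `Disallow: /*.js` is so broad, here is a sketch of Google's documented robots.txt path wildcards: `*` matches any run of characters and a trailing `$` anchors the end of the URL; otherwise a rule matches as a prefix. The function name is hypothetical, and it checks one pattern only, not the longest-match precedence between competing Allow and Disallow rules.

```python
import re


def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path (plus query string).

    '*' matches any sequence of characters; a trailing '$' means the URL must end
    there. Without '$', the pattern only has to match a prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each '*' back into '.*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    matcher = re.fullmatch if anchored else re.match
    return matcher(regex, path) is not None
```

Note that `/*.js` without `$` also matches `/data/feed.json`, so a rule meant for scripts can catch JSON endpoints your pages fetch during rendering.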

Advanced tips

  • Test the same URL with Googlebot Desktop and Googlebot Mobile. If content differs, mobile-first indexing may rank the page differently than you expect.
  • Compare render time across pages. If your homepage renders in 2 seconds but product pages take 8 seconds, identify the slow API call or heavy script and optimize it.
  • Check the Visible text section for keyword presence. If your target keyword is in the HTML source but not in the visible text, it might be hidden by CSS or JavaScript, meaning it does not count toward rankings.
  • Use the simulator after deploying a new feature. A checkout flow, live chat widget, or analytics script can break rendering if it throws errors. Catching it post-deploy prevents indexing drops.
  • If blocked resources are found, cross-check with the robots.txt checker to confirm whether robots.txt is the cause. If not, check server logs for 403 or CORS errors.
  • Combine this tool with the website metadata checker to confirm that title, meta, and schema are present in the rendered HTML, not just the raw source.
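The keyword check from the third tip can be automated once you have the simulator output. In this sketch, `raw_text` and `rendered_text` stand for the visible text of the raw and rendered pages and `rendered_html` for the rendered markup; the function name and the case-insensitive substring match are simplifying assumptions (no stemming or phrase variants).

```python
def keyword_status(keyword: str, raw_text: str, rendered_text: str, rendered_html: str) -> str:
    """Classify where a target keyword shows up, from best case to worst."""
    kw = keyword.lower()
    if kw in raw_text.lower():
        return "visible in raw HTML"       # available from the first fetch
    if kw in rendered_text.lower():
        return "visible after JavaScript"  # depends on rendering succeeding
    if kw in rendered_html.lower():
        return "in markup, not visible"    # e.g. attributes or CSS-hidden elements
    return "missing"
```

Anything other than "visible in raw HTML" is worth a closer look for a page's primary keyword.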

After simulating, if you find that JavaScript is required for critical content, consider moving to server-side rendering or static generation. If blocked resources are the issue, update your robots.txt with the robots.txt file generator. If you want to see how all on-page SEO factors (rendering, metadata, canonicals, internal links) stack up, use the SEO checklist for a 20-point audit.

Frequently Asked Questions

What is Googlebot?

Googlebot is the web crawler Google uses to discover, fetch, render, and index pages across the internet. It comes in two main types: Googlebot Desktop (simulates a desktop browser) and Googlebot Smartphone (simulates a mobile browser, which Google uses for mobile-first indexing). When Googlebot crawls your site, it follows links, reads your robots.txt file to see what is allowed, fetches the HTML, executes JavaScript if needed, and extracts text and metadata. Googlebot does not see your site the way a human does. It cannot interact with forms, click buttons that require user input, or bypass paywalls. It works within a crawl budget (how many URLs Google can and wants to crawl on your site, based on how much load your server handles and how much demand there is for your content), so large sites may not get every page crawled. Googlebot identifies itself with a user-agent string that includes "Googlebot", but any client can fake that string. To verify that a request really came from Googlebot, run a reverse DNS lookup on the requesting IP, then a forward DNS lookup on the returned hostname. Our simulator lets you see what Googlebot sees, including visible text, blocked resources, and render-time warnings.
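The verification Google documents is a two-step DNS check: reverse-resolve the requesting IP, confirm the hostname ends in `googlebot.com` or `google.com`, then forward-resolve that hostname and confirm it maps back to the same IP. A minimal Python sketch follows; the function name is hypothetical, the lookups are injectable so it can be tested offline, and `gethostbyname_ex` only returns IPv4 addresses, so IPv6 crawler traffic would need `getaddrinfo` instead.

```python
import socket

# Hostname suffixes Google documents for Googlebot's reverse DNS records.
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(
    ip: str,
    reverse_lookup=socket.gethostbyaddr,
    forward_lookup=socket.gethostbyname_ex,
) -> bool:
    """Verify a claimed Googlebot IP with a reverse lookup followed by a forward lookup."""
    try:
        hostname = reverse_lookup(ip)[0]  # e.g. crawl-66-249-66-1.googlebot.com
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not hostname.endswith(GOOGLEBOT_SUFFIXES):
        return False
    try:
        addresses = forward_lookup(hostname)[2]  # (name, aliases, IPv4 addresses)
    except OSError:
        return False
    return ip in addresses
```

The forward lookup matters: anyone can publish a PTR record claiming a `googlebot.com` name for their own IP, but only Google controls what that name resolves to.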

How do I simulate a Google crawl?

Paste your Page URL into our tool, select Googlebot Desktop or Googlebot Mobile from the User-agent dropdown, and hit Simulate crawler. We fetch the page using the same user-agent string Googlebot uses, execute JavaScript to render the page, and extract the visible text, metadata, resources loaded, blocked resources, and robots directives. The output shows exactly what Googlebot sees: the rendered HTML after JavaScript execution, the text content Google indexes, any resources blocked by robots.txt, and warnings if render time exceeds five seconds. You also see canonical tags, meta robots, and hreflang declarations. This is critical for JavaScript-heavy sites (React, Next.js, Vue) where the initial HTML is a shell and the real content renders client-side. Compare the raw HTML view (what your server sends) to the rendered view (what Googlebot sees after executing JavaScript) to spot rendering issues. If content is missing from the rendered view, Google cannot index it. Use this tool before launching new pages or after JavaScript changes.

What is a web crawler?

A web crawler (also called a spider or bot) is a program that systematically browses the web by following links, fetching pages, and extracting data. Search engines use crawlers to discover and index content: Googlebot for Google, Bingbot for Bing, YandexBot for Yandex. Crawlers start with a seed list of URLs (from sitemaps or previously crawled links), fetch each page, parse the HTML to extract links, add new links to the crawl queue, and repeat. Well-behaved crawlers respect robots.txt (a file that declares which paths are disallowed), follow canonical tags, and obey crawl rate limits. Not all crawlers are search engines. Some are data scrapers, research bots, or monitoring tools. Some crawlers are malicious (harvesting email addresses, scraping content without permission). You can identify crawlers by their user-agent string in server logs and block unwanted ones via robots.txt, though only compliant crawlers honor it. For SEO, the most important crawlers are Googlebot, Googlebot-Image, Bingbot, and emerging AI crawlers like GPTBot, ClaudeBot, and PerplexityBot. Our tool simulates Googlebot and other major crawlers.
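The fetch-parse-queue loop described above fits in a few lines. This breadth-first sketch assumes two hypothetical callables you would supply: `fetch_links(url)` returns the raw `href` values found on a page, and `allowed(url)` answers the robots.txt question (for example, a wrapper around `urllib.robotparser.RobotFileParser.can_fetch`). Politeness delays, retries, and error handling are omitted.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin


def crawl(
    seeds: Iterable[str],
    fetch_links: Callable[[str], Iterable[str]],
    allowed: Callable[[str], bool],
    max_pages: int = 100,
) -> list[str]:
    """Breadth-first crawl: visit each URL once and queue every new link it contains."""
    queue = deque(seeds)
    seen = set(queue)
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if not allowed(url):
            continue  # robots.txt says no; a polite crawler skips the URL
        visited.append(url)
        for href in fetch_links(url):
            # Resolve relative links against the page URL and drop #fragments.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

The `seen` set is what keeps the crawler from looping forever on pages that link to each other.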

Why is Googlebot not crawling my site?

Five common causes: robots.txt is blocking Googlebot, your site has no internal or external links pointing to it, your sitemap is missing or broken, your pages return server errors, or you accidentally set a meta robots noindex tag. First, check robots.txt for a Disallow: / rule under User-agent: * or User-agent: Googlebot. Under User-agent: *, that rule blocks every compliant crawler. Second, confirm you submitted a sitemap to Google Search Console. If your sitemap is missing, Googlebot relies on link discovery, which can take weeks. Third, check server logs or Search Console's Crawl Stats report to see if Googlebot is receiving errors or timeouts. If your server is unstable, Googlebot reduces crawl frequency. Fourth, inspect your page source for a meta robots tag with noindex. This tells Googlebot to skip indexing. Fifth, confirm your site has internal links from the homepage. Orphan pages rely entirely on sitemaps. Use our tool to simulate a Googlebot crawl and confirm the page is accessible, renders correctly, and has no blocks.
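For the third check, a short script can count which status codes your server returns to requests claiming to be Googlebot. The sketch assumes the Apache/Nginx combined log format and uses a hypothetical function name; because user-agents can be spoofed, pair it with a reverse-DNS check before drawing conclusions.

```python
import re
from collections import Counter
from typing import Iterable

# Combined log format: ip ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_LOG = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_status_counts(lines: Iterable[str]) -> Counter:
    """Count the HTTP status codes served to requests whose user-agent says Googlebot."""
    counts: Counter = Counter()
    for line in lines:
        match = COMBINED_LOG.match(line)
        if match and "Googlebot" in match.group("agent"):
            counts[int(match.group("status"))] += 1
    return counts
```

A rising share of 5xx codes in this count is the server-side instability that makes Googlebot slow down.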

How do I check if Googlebot can crawl my page?

Paste your Page URL into our tool, select Googlebot Desktop or Googlebot Mobile, and run the simulation. We fetch the page using Googlebot's user-agent, execute JavaScript to render it, and show exactly what Googlebot sees: visible text, metadata, blocked resources, canonical tags, and render-time warnings. If the page loads and renders successfully, Googlebot can crawl it. If we hit a 404, 403, 500, or timeout, Googlebot would most likely hit the same error. If CSS or JavaScript files are blocked by robots.txt, we flag them. If the page takes longer than five seconds to render, we warn that this may hurt crawl budget. You can also use Google Search Console's URL Inspection tool: paste your URL, and Google fetches it live, renders it, and shows the indexed version. The advantage of our tool is speed (no login required, instant results) and comparison mode. Use this before launching new pages, after JavaScript changes, or when diagnosing indexing issues.

What is the difference between raw HTML and rendered HTML?

Raw HTML is what your server sends when a browser or crawler first requests a page, before any JavaScript executes. Rendered HTML is what the page looks like after JavaScript runs and modifies the DOM. For static sites or server-rendered sites, raw and rendered HTML are nearly identical. For client-rendered sites (React, Vue, Angular), the raw HTML is often a minimal shell and all content renders client-side after JavaScript executes. Googlebot fetches the raw HTML first, then waits for JavaScript to execute and renders the page in a headless Chrome browser. If your content only exists in the rendered HTML, it takes longer for Google to index because rendering is a second-pass operation. Our tool shows both views side by side: raw HTML (what your server sends) and rendered HTML (what Googlebot sees after executing JavaScript). If critical content is missing from the raw HTML and only appears in the rendered view, consider server-side rendering to improve indexing speed.
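To list exactly what JavaScript added, you can diff the visible text of the two views line by line. This sketch uses Python's `difflib.ndiff`; the function name is hypothetical, and `raw_text` and `rendered_text` are assumed to be already-extracted visible text with one block of text per line.

```python
import difflib


def js_added_lines(raw_text: str, rendered_text: str) -> list[str]:
    """Return visible-text lines present after rendering but absent from the raw HTML."""
    raw_lines = [line.strip() for line in raw_text.splitlines() if line.strip()]
    rendered_lines = [line.strip() for line in rendered_text.splitlines() if line.strip()]
    # ndiff prefixes lines that exist only in the second sequence with "+ ".
    return [line[2:] for line in difflib.ndiff(raw_lines, rendered_lines) if line.startswith("+ ")]
```

If the returned list contains your headline, prices, or product copy, that content depends on rendering and is a candidate for server-side rendering.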

How do I trigger a Google crawl?

You cannot force Google to crawl on demand, but you can request indexing and make your site more crawl-friendly so Google prioritizes it. First, submit your sitemap to Google Search Console. This tells Google where all your pages are and when they were last updated. Second, use the URL Inspection tool in Search Console, paste your URL, and click Request Indexing. This asks Google to recrawl the URL, though Google does not guarantee when. Third, add internal links to the new page from high-authority pages on your site because Googlebot follows links and prioritizes well-connected pages. Fourth, update the lastmod date in your sitemap.xml whenever you publish or meaningfully change a page; Google uses lastmod when it is consistently accurate. Fifth, avoid crawl budget waste by blocking low-value pages in robots.txt and using canonical tags. Do not rely on the old google.com/ping?sitemap= endpoint: Google deprecated sitemap pings in 2023. If your page is still not crawled after 48 hours, use our tool to simulate a Googlebot fetch.
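If your CMS does not maintain `lastmod` for you, a sitemap is simple to generate. This sketch uses Python's `xml.etree.ElementTree` with the standard sitemap namespace; `build_sitemap` and the example URL are hypothetical. Only bump `lastmod` when content really changes, since Google relies on the value only when it stays accurate.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Iterable, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: Iterable[Tuple[str, date]]) -> str:
    """Build sitemap XML from (absolute URL, last-modified date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & and < automatically
        # W3C Datetime date form, e.g. 2025-01-15.
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(urlset, encoding="unicode")
```

Write the result to sitemap.xml at your site root and submit that URL in Search Console.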

Can Googlebot render JavaScript?

Yes, Googlebot can render JavaScript using a headless Chrome browser, but it happens in a second pass after the initial HTML fetch, which introduces a delay. Googlebot first fetches the raw HTML and scans for links, canonical tags, and meta robots directives. If the page is allowed, Google adds it to the rendering queue. Rendering may follow within seconds or take considerably longer; when it runs, Googlebot executes JavaScript, waits for the DOM to stabilize (Google publishes no fixed timeout; this tool warns above five seconds), and indexes the rendered output. This two-pass system means JavaScript-heavy sites can be slower to index than server-rendered sites, and pages that rely entirely on client-side rendering may take noticeably longer to index fully. If your JavaScript fails to execute, Google indexes the empty shell and misses all your content. Our tool simulates this process by fetching raw HTML, executing JavaScript, and showing the rendered output. If critical content only appears after JavaScript execution, consider switching to server-side rendering or static generation to improve indexing speed.

What user-agents should I test?

Test Googlebot Desktop and Googlebot Mobile at minimum, because Google uses mobile-first indexing. If your site has different layouts or content for mobile versus desktop, test both to confirm parity. If you serve different content to mobile users, Google may index the mobile version and ignore desktop-only content. Also test Googlebot-Image if images are critical to your content (e-commerce, portfolios, galleries). Test Bingbot if Bing traffic matters to your business (Bing is the second-largest search engine in the US). Test GPTBot if you want to control whether OpenAI crawls your content to train its AI models. You can block it via robots.txt if you do not want your content used. Test other AI crawlers (ClaudeBot, PerplexityBot, CCBot) if you care about AI training data or answer engines. Our tool supports all major crawlers, so you can test each one and confirm your robots.txt blocks are working. For most sites, Googlebot Mobile and Googlebot Desktop are sufficient.
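To confirm your robots.txt treats each crawler the way you intend, Python's `urllib.robotparser` can evaluate simple prefix rules offline. The robots.txt below is an example, not a recommendation, and `audit_robots` is a hypothetical helper; this parser does not implement Google's `*` and `$` path wildcards, so rules that rely on them need a different checker.

```python
from urllib.robotparser import RobotFileParser

# Example policy: keep GPTBot out entirely, keep every crawler out of /admin/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def audit_robots(robots_txt: str, bots: list[str], url: str) -> dict[str, bool]:
    """Report, per crawler token, whether robots.txt allows fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}
```

With this policy, `audit_robots(ROBOTS_TXT, ["Googlebot", "GPTBot"], "https://example.com/blog/post")` allows Googlebot and blocks GPTBot.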

What is render-time budget?

Render-time budget is the amount of time Googlebot allocates to execute JavaScript and render your page before indexing whatever it has. Industry testing suggests Googlebot waits up to five seconds for JavaScript to finish executing and for the DOM to stabilize. If your page takes longer because of slow third-party scripts or heavy rendering, Googlebot may index an incomplete version of the page or skip rendering entirely. This is especially problematic for single-page apps where the raw HTML is an empty shell. To stay within budget, reduce JavaScript bundle size (code-split, tree-shake unused code), defer or lazy-load non-critical scripts, server-render or statically generate key content, and avoid blocking the main thread with long-running scripts. Our tool measures render time and flags pages that take longer than five seconds. If your page renders in under two seconds, you are safely within budget. If it exceeds five seconds, critical content may not be indexed.
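The thresholds this page uses can be expressed as a small classifier. This is a sketch: the two- and five-second cut-offs are our tool's heuristics (Google publishes no official render timeout), the function names are hypothetical, and `render` stands for any callable that loads the page and returns the final HTML, for example a headless-browser call you provide.

```python
import time
from typing import Callable, Tuple

# Heuristic thresholds; Google publishes no official render timeout.
SAFE_SECONDS = 2.0
WARN_SECONDS = 5.0


def classify_render_time(seconds: float) -> str:
    """Map a render duration to a coarse label."""
    if seconds < SAFE_SECONDS:
        return "ok"
    if seconds <= WARN_SECONDS:
        return "borderline"
    return "warning"


def timed_render(render: Callable[[], str]) -> Tuple[str, float, str]:
    """Run a render callable, measure wall-clock time, and classify the duration."""
    start = time.perf_counter()
    html = render()
    elapsed = time.perf_counter() - start
    return html, elapsed, classify_render_time(elapsed)
```

Timing the same page several times and looking at the slowest run gives a more honest picture than a single measurement.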
