LovableHTML never sets noindex or X-Robots-Tag: noindex headers
We never touch your robots.txt or your headers. We are an SEO tool, after all; making your site un-indexable would work against the very reason you use our service.
LovableHTML does not set your pages to noindex and does not add an X-Robots-Tag: noindex header to your origin pages. If you’re seeing noindex or “blocked by robots.txt”, it’s coming from your origin site, your CDN (Cloudflare, etc.), or your website builder/host configuration.
Step 0: Identify what the crawler is complaining about
These two problems are often confused, but the fixes are different:
- “Blocked by robots.txt”: The crawler is allowed to see the page, but it is choosing not to crawl it because your `robots.txt` rules disallow it.
- “Excluded by ‘noindex’” (or “noindex detected”): The crawler can crawl the page, but it is instructed not to index it (via `<meta name="robots" ...>` or an `X-Robots-Tag` header).
If you’re not sure which one you’re dealing with, use our free tools, or run a quick triage from a terminal.
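For example, a rough first pass (a sketch assuming bash and curl; `https://example.com/some-page` is a placeholder for the URL the crawler flagged):

```bash
URL="https://example.com/some-page"   # placeholder: the flagged URL
HOST=$(echo "$URL" | sed -E 's#^https?://([^/]+).*#\1#')

echo "--- robots.txt ---"
curl -s "https://$HOST/robots.txt"

echo "--- robots-related headers ---"
curl -sIL "$URL" | grep -i 'x-robots-tag' || echo "(none)"

echo "--- meta robots in HTML ---"
curl -sL "$URL" | grep -io '<meta[^>]*robots[^>]*>' || echo "(none)"
```

If the first block shows a matching `Disallow`, you’re in Branch A; if either of the last two shows `noindex`, you’re in Branch B.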
Branch A — “Blocked by robots.txt”
A1) Fetch the exact robots.txt Google/Bing will fetch
robots.txt is per hostname and per protocol. This matters:
- `https://example.com/robots.txt` can differ from `https://www.example.com/robots.txt`
- `http://` and `https://` can differ if you’re redirecting in odd ways
In a terminal, run:
curl -i https://example.com/robots.txt
curl -i https://www.example.com/robots.txt
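To cover all four hostname/protocol variants in one pass, a small loop also works (a sketch; substitute your own domain for `example.com`):

```bash
# Fetch robots.txt for every protocol/hostname combination
for host in example.com www.example.com; do
  for proto in https http; do
    echo "== $proto://$host/robots.txt =="
    curl -si "$proto://$host/robots.txt" | head -n 20
  done
done
```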
Things to look for:
- HTTP status: `200` is expected. A `4xx`/`5xx` can cause crawlers to temporarily reduce or stop crawling.
- Accidental global block: `User-agent: *` followed by `Disallow: /`
- Bot-specific blocks: `User-agent: Googlebot` / `User-agent: Bingbot` followed by a `Disallow` that matches your important pages
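To skim just the directives rather than the whole file, filter the live robots.txt (a minimal sketch, again with `example.com` as a placeholder):

```bash
# Print only the lines that affect crawling
curl -s https://example.com/robots.txt | grep -iE '^(user-agent|disallow|allow|sitemap):'
```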
A2) Confirm you’re not blocking the bots you care about
At minimum, ensure you are not unintentionally blocking:
- Google: `Googlebot`
- Bing: `Bingbot`
Gotcha: A rule like `Disallow: /` under `User-agent: *` blocks Googlebot and Bingbot unless you have a more specific group that allows them.
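For illustration, here is how that looks inside robots.txt (a sketch; keep the broad block only if you actually want it):

```
# Blocks every crawler, including Googlebot and Bingbot:
User-agent: *
Disallow: /

# A bot that matches a more specific group ignores the * group
# entirely, so these groups re-allow Google and Bing:
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /
```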
A3) If your robots.txt contains Cloudflare-managed content
If your robots.txt has Cloudflare “managed” content or Cloudflare documentation language in it, you may have enabled Cloudflare settings that alter or prepend directives.
Read and verify these:
- https://developers.cloudflare.com/bots/additional-configurations/block-ai-bots/
- https://developers.cloudflare.com/bots/additional-configurations/managed-robots-txt/#implementation
Common failure mode: enabling a managed robots setting and assuming it “only affects AI bots”, but ending up with an overly broad `User-agent: *` rule or accidentally serving the wrong file on the hostname Google actually crawls.
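One way to confirm what the edge is actually serving is to diff the live file against the one you deploy (a sketch; `origin-robots.txt` is a hypothetical local copy of the file you control):

```bash
# Fetch the robots.txt the public internet sees
curl -s https://example.com/robots.txt -o live-robots.txt

# Any output from diff is content added or altered at the edge
diff origin-robots.txt live-robots.txt
```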
A4) Check redirect chains (the final hostname is what matters)
Sometimes robots.txt is fine on example.com, but Google is actually crawling www.example.com (or the reverse) due to redirects/canonicals.
curl -IL https://example.com/
If your homepage redirects, repeat the robots.txt fetch for the final hostname in that redirect chain.
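You can automate this by letting curl report the final URL, then fetching robots.txt from that hostname (a sketch assuming bash and curl):

```bash
# %{url_effective} is the URL left after all redirects are followed
final=$(curl -sIL -o /dev/null -w '%{url_effective}' https://example.com/)
host=$(echo "$final" | sed -E 's#^https?://([^/]+).*#\1#')

echo "Final hostname: $host"
curl -i "https://$host/robots.txt"
```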
Branch B — “Excluded by ‘noindex’” / “noindex detected”
B1) Check for X-Robots-Tag headers
Run:
curl -I https://example.com/some-page
If you see `X-Robots-Tag: noindex` (or similar), it’s typically set by:
- your host/CDN configuration
- a framework default for non-production environments
- a security plugin / SEO plugin
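To isolate just that header, including on intermediate hops of a redirect chain, filter curl’s output (a sketch with a placeholder URL):

```bash
# -L prints the headers of every response in the redirect chain
curl -sIL https://example.com/some-page | grep -i 'x-robots-tag' \
  || echo "No X-Robots-Tag header found"
```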
B2) Check for a meta robots tag in the HTML
Fetch the page HTML and look for a robots meta tag:
curl -sL https://example.com/some-page | head -n 60
Then look for something like:
<meta name="robots" content="noindex">
or:
<meta name="robots" content="noindex, nofollow">
Gotcha: Many website builders add noindex automatically for staging,
password-protected, “unpublished”, or “trial” sites.
B3) Check that “noindex” isn’t coming from a different page than the one you think
This is very common when there are redirects:
- `https://example.com/page` redirects to `https://www.example.com/page/`
- the final page has `noindex` (or different HTML/head content)
Use:
curl -IL https://example.com/page
Then re-run the `curl -I` and HTML checks against the final URL.
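Both checks can be pointed at the final URL automatically (a sketch combining the commands above):

```bash
# Resolve the final URL, then check headers and HTML in one place
final=$(curl -sL -o /dev/null -w '%{url_effective}' https://example.com/page)

echo "Final URL: $final"
curl -sI "$final" | grep -i 'x-robots-tag' || echo "(no X-Robots-Tag)"
curl -s "$final" | grep -io '<meta[^>]*robots[^>]*>' || echo "(no meta robots tag)"
```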
Website builder / hosting gotchas (very common)
Some builders/hosts manage robots.txt for you and/or expose UI toggles like “discourage search engines”, “hide from indexing”, or “site is in staging”.
If you’re using a website builder, check their docs for:
- robots.txt management (where the file comes from, and whether you can override it)
- environment/staging flags that add `noindex`
- password protection / “coming soon” mode (often triggers `noindex`)
Extra checks that save time
- Cache effects: CDNs can cache `robots.txt`. If you just changed it, purge the cache for `/robots.txt` (and for both `www`/apex hostnames); the header check after this list shows what’s being served.
- Wrong hostname: Google might crawl `www` while you’re checking the apex domain (or vice versa). Always check both.
- Different protocols: If anything still serves `http://`, confirm it redirects cleanly to `https://` and that bots aren’t temporarily seeing a different `robots.txt`.
- Verify with a bot-like fetch: Use the crawler simulator to see the final HTML/head the crawler sees, including meta robots and key headers.
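To see whether a CDN cache is still serving an old robots.txt, inspect the caching headers (a sketch; header names vary by CDN, and `cf-cache-status` is Cloudflare-specific):

```bash
# Age / Cache-Control are standard; cf-cache-status is Cloudflare's
curl -sI https://example.com/robots.txt | grep -iE '^(age|cache-control|cf-cache-status):'
```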