Benchmarking
What can prevent a site from being successfully checked?
Crawling a site can fail for multiple reasons.
- The site cannot be found. It may have been moved to a new URL, or it may be temporarily or permanently unavailable.
- The site does not respond, responds with an HTTP 403 error, or times out. The web server may be blocking the crawler, or there may be routing issues. Please make sure to whitelist the following IP addresses: 176.58.123.93, 212.71.251.78, and 151.236.218.93.
- The site is a placeholder, or it does not have more than one page.
- The site may require JavaScript to function correctly. This includes single-page applications and sites behind some forms of DDoS protection.
- The site may require cookies to function correctly.
- The URLs collected from the site are invalid.
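Before submitting a site, you may want to check its start page for some of these problems yourself. The following is a minimal Python sketch, assuming you have already fetched the start page (its HTTP status code and HTML) with a tool of your choice. The function name `diagnose_start_page`, the problem labels, and the 200-character text threshold are hypothetical heuristics chosen for illustration. They are not the crawler's actual logic.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _PageScanner(HTMLParser):
    """Collects <a href> values and counts text outside <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.text_chars = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_chars += len(data.strip())


def diagnose_start_page(base_url, status, html, min_text_chars=200):
    """Return likely crawl problems for an already-fetched start page.

    status is the HTTP status code, or None if the request timed out or
    got no response. All labels and thresholds are illustrative guesses.
    """
    if status is None:
        return ["no response or timeout"]
    if status in (404, 410):
        return ["not found"]
    if status == 403:
        return ["blocked (HTTP 403)"]

    scanner = _PageScanner()
    scanner.feed(html)
    scanner.close()  # flush any buffered trailing text

    host = urlparse(base_url).netloc
    start = urldefrag(base_url).url
    internal = set()
    for href in scanner.hrefs:
        # Resolve relative links and drop #fragments.
        absolute = urldefrag(urljoin(base_url, href)).url
        parsed = urlparse(absolute)
        # Ignore mailto:, javascript:, tel: and links to other hosts.
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            internal.add(absolute)
    internal.discard(start)  # a link back to the start page is not another page

    problems = []
    if not internal:
        problems.append("no links to other pages")
    if scanner.text_chars < min_text_chars:
        problems.append("little static text")
    return problems
```

For example, a page that ships only an empty container plus a script is flagged for both missing links and little static text, which often points to a placeholder or a JavaScript-rendered site. A check like this cannot tell whether a site requires cookies, and a DDoS-protection challenge page with ordinary text may pass it.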
Checking the pages on a site may also fail for multiple reasons.
- Too many pages cause errors in the checker (at least 90% of the URLs must be checkable).
- Documents are served with the wrong MIME type.
- Broken links and invalid URLs.
- Connection issues.
- Pages that cause memory issues or otherwise crash the checker.
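The 90% requirement is a simple ratio of checkable URLs to all URLs found: 18 checkable URLs out of 20 gives 90% and passes, while 17 out of 20 gives 85% and fails. The Python sketch below illustrates that arithmetic. It also shows one way to spot a wrong MIME type, by comparing the served `Content-Type` with the type implied by the file extension. The outcome labels and the helper names `site_is_checkable` and `mime_mismatch` are hypothetical. This is not the checker's implementation.

```python
import mimetypes
from collections import Counter

# Threshold from the text above: at least 90% of URLs must be checkable.
REQUIRED_PERCENT = 90


def mime_mismatch(url, content_type):
    """Return True if the served Content-Type disagrees with the type
    guessed from the URL's file extension (hypothetical helper).

    URLs without a recognizable extension are never flagged.
    """
    expected, _encoding = mimetypes.guess_type(url)
    if expected is None:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    served = content_type.split(";")[0].strip().lower()
    return served != expected


def site_is_checkable(outcomes):
    """outcomes maps each URL to a label; "ok" means the page was checked.

    Returns (passed, share_checked, failure_counts).
    """
    total = len(outcomes)
    if total == 0:
        return False, 0.0, Counter()
    failures = Counter(label for label in outcomes.values() if label != "ok")
    checked = total - sum(failures.values())
    # Integer comparison avoids floating-point edge cases at exactly 90%.
    passed = checked * 100 >= REQUIRED_PERCENT * total
    return passed, checked / total, failures


if __name__ == "__main__":
    # Hypothetical example site: 18 pages check fine, 2 fail.
    outcomes = {f"https://example.org/page{i}": "ok" for i in range(18)}
    if mime_mismatch("https://example.org/report.pdf", "text/html; charset=utf-8"):
        outcomes["https://example.org/report.pdf"] = "wrong mimetype"
    outcomes["https://example.org/missing"] = "broken link"
    passed, share, failures = site_is_checkable(outcomes)
    print(passed, f"{share:.0%}", dict(failures))
```

Counting failures by label makes it easier to see which of the causes listed above is pushing a site below the threshold.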