You can also set a lower limit for exploratory purposes.
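If you want to reproduce this in code, here is a minimal sketch assuming the advertools Python library (whose crawler runs on Scrapy); `CLOSESPIDER_PAGECOUNT` is a standard Scrapy setting that stops the crawl after a given number of pages. The URL and the limit of 100 are hypothetical:

```python
import advertools as adv

# Exploratory crawl capped at 100 pages via a standard Scrapy setting.
adv.crawl(
    "https://example.com",                            # hypothetical site
    "exploratory_crawl.jl",                           # output file (JSON lines)
    follow_links=True,
    custom_settings={"CLOSESPIDER_PAGECOUNT": 100},   # the lower limit
)
```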
For custom extraction, you need to enter two things in the tables above:
- Column Name: This should be a descriptive name for the page elements you want to extract. Examples could be "product_price", "blog_author", etc.
- XPath/CSS Selector: This is the selector pattern for the element(s) you want to extract, as shown in the sketch after this list.
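The same column-name-to-selector mapping can be expressed in code. A minimal sketch assuming advertools, whose `crawl` function accepts `xpath_selectors` and `css_selectors` dictionaries; the keys become column names in the output, and the example names and selectors are hypothetical:

```python
import advertools as adv

adv.crawl(
    "https://example.com",
    "custom_extraction.jl",
    # Extract with XPath: column "product_price" from a hypothetical price span.
    xpath_selectors={"product_price": '//span[@class="price"]/text()'},
    # Extract with CSS: column "blog_author" from a hypothetical author element.
    css_selectors={"blog_author": ".author-name::text"},
)
```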
The user-agents are listed under human-readable device names like "iPhone 13", "Samsung Galaxy S22", etc.
You can also paste any user-agent into the input box, so you are not limited to the ones provided.
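Programmatically, the same effect can be achieved with Scrapy's standard `USER_AGENT` setting. A sketch assuming advertools, with a sample iPhone user-agent string:

```python
import advertools as adv

adv.crawl(
    "https://example.com",
    "ua_crawl.jl",
    custom_settings={
        # Any user-agent string works here, not just the presets.
        "USER_AGENT": (
            "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X) "
            "AppleWebKit/605.1.15 (KHTML, like Gecko) "
            "Version/15.0 Mobile/15E148 Safari/604.1"
        )
    },
)
```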
To activate spider mode, just select the "Follow links" checkbox. Add more than one URL to run in list mode. You can of course combine both.
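Combining both modes in code, a sketch assuming advertools: multiple start URLs give you list mode, and `follow_links=True` gives you spider mode. The URLs are hypothetical:

```python
import advertools as adv

adv.crawl(
    # List mode: several start URLs.
    ["https://example.com", "https://example.com/blog/"],
    "combined_crawl.jl",
    # Spider mode: discover and follow new links from each crawled page.
    follow_links=True,
)
```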
When the crawler encounters new links, should it follow them if they contain (or don't contain) the URL parameters that you chose?
Similar to the above, but using a regular expression to match links.
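These follow rules also exist as options in code. A sketch assuming a recent version of advertools, whose `crawl` function exposes parameter-based and regex-based filters; the parameter names and patterns below are hypothetical examples:

```python
import advertools as adv

adv.crawl(
    "https://example.com",
    "filtered_crawl.jl",
    follow_links=True,
    # Don't follow links whose URLs carry these query parameters.
    exclude_url_params=["utm_source", "sessionid"],
    # Only follow links whose URLs match this regex.
    include_url_regex="/blog/|/sports/",
)
```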
Once you hit the "Start crawling" button, you will be given a URL where you can start auditing and analyzing the crawled website. The following are some of the available features, together with a link to explore a live dashboard.
Using an interactive treemap chart, you can see how the website's content is split across sections like /blog/, /sports/, etc.
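One way to build a similar treemap yourself, sketched with advertools and Plotly; "crawl.jl" is a hypothetical crawl output file, and `url_to_df` splits each URL into its directory components:

```python
import advertools as adv
import pandas as pd
import plotly.express as px

crawl_df = pd.read_json("crawl.jl", lines=True)
# Split URLs into dir_1, dir_2, ... columns.
url_df = adv.url_to_df(crawl_df["url"])
# Root URLs have no first directory; label them so the treemap renders.
url_df["dir_1"] = url_df["dir_1"].fillna("(root)")

fig = px.treemap(url_df, path=["dir_1"], title="Content split by top URL directory")
fig.show()
```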
For each of the above structured data types, you can see the count and percentage of URLs that contain each tag of that type. For example, for the @context JSON-LD tag, you can see how many URLs contain it and their percentage of the crawled URLs.
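Computing such a count and percentage by hand, a sketch with pandas; it assumes an advertools crawl file where JSON-LD tags appear as "jsonld_"-prefixed columns (the exact column name depends on the crawled site):

```python
import pandas as pd

crawl_df = pd.read_json("crawl.jl", lines=True)

# URLs containing the @context JSON-LD tag, as a count and a percentage.
count = crawl_df["jsonld_@context"].notna().sum()
pct = count / len(crawl_df) * 100
print(f"{count} URLs ({pct:.1f}%) contain the @context JSON-LD tag")
```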
Once you see an interesting insight, you can export a subset of URLs and columns. For example, get the URL, title, and status of URLs whose status code is not 200, or get the URL, h1, and size of pages whose size is larger than 300KB, and so on.
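The same exports, sketched with pandas on a hypothetical crawl file; the column names follow advertools conventions, where "size" is the response size in bytes:

```python
import pandas as pd

crawl_df = pd.read_json("crawl.jl", lines=True)

# URL, title, and status of pages whose status code is not 200.
crawl_df[crawl_df["status"] != 200][["url", "title", "status"]].to_csv(
    "non_200.csv", index=False
)

# URL, h1, and size of pages larger than 300 KB.
crawl_df[crawl_df["size"] > 300_000][["url", "h1", "size"]].to_csv(
    "large_pages.csv", index=False
)
```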
For a selected page element, get the counts of each element on the website. How many times was each h2, meta_desc, etc. element duplicated across the website?
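Counting duplicated elements site-wide, a sketch with pandas; it assumes the advertools convention of joining multiple values per page with "@@":

```python
import pandas as pd

crawl_df = pd.read_json("crawl.jl", lines=True)

# One page can have several h2 tags, joined with "@@"; split, flatten, count.
h2_counts = (
    crawl_df["h2"]
    .str.split("@@")
    .explode()
    .value_counts()
)
print(h2_counts.head(10))  # the most repeated h2 values across the website
```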
External links: See which domains the crawled website links to the most.
Internal links: Get a score for each internal URL as a node in the network (the website).
Note that it is up to you to define what "internal" means using a regular expression.
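A sketch of classifying link targets as internal or external with a user-supplied regex, using pandas; the "links_url" column and "@@" separator follow advertools conventions, and the regex is a hypothetical example:

```python
import pandas as pd
from urllib.parse import urlsplit

crawl_df = pd.read_json("crawl.jl", lines=True)

# All outgoing links found on the crawled pages, one per row.
links = crawl_df["links_url"].str.split("@@").explode().dropna()

# You define what "internal" means with a regular expression.
internal_regex = r"https?://(?:www\.)?example\.com"
is_internal = links.str.match(internal_regex)

# Which external domains does the site link to the most?
external_domains = links[~is_internal].apply(lambda u: urlsplit(u).netloc)
print(external_domains.value_counts().head(10))
```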
For an example of those features, you can explore this crawl audit and analytics dashboard.