Start here

How to navigate and get started using advertools.

To help you make good use of advertools without getting lost in the many tutorials and articles, I think it helps to keep in mind a few principles and ideas for navigating this library.

Independent but designed to work together

Although you might see a lot of modules and functions, keep in mind that they are designed to work together in a suggested, flexible sequence, and also to work independently.

For example, you might want to go through this sequence:

  1. Get an XML sitemap
  2. Crawl the website
  3. Compare sitemap and crawled URLs
  4. Analyze the crawl file

There is a dedicated function (or module) for each of these tasks, so you can go through the full sequence. Yet you might just want to run a crawl, or just fetch the XML sitemap, so you can simply pick whichever function(s) you want to use.
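As a rough illustration of step 3 in the sequence above, here is a minimal sketch that compares sitemap URLs with crawled URLs using only pandas. The helper name `compare_urls` and the sample data are hypothetical. The sketch assumes the sitemap DataFrame has a `loc` column and the crawl DataFrame has a `url` column; in practice those frames might come from `advertools.sitemap_to_df` and from a crawl output file read with `pandas.read_json(..., lines=True)`.

```python
import pandas as pd


def compare_urls(sitemap_df, crawl_df):
    """Label each URL as found in the sitemap only, the crawl only, or both.

    Hypothetical helper: assumes `sitemap_df` has a "loc" column and
    `crawl_df` has a "url" column.
    """
    sitemap_urls = (
        sitemap_df[["loc"]].drop_duplicates().rename(columns={"loc": "url"})
    )
    crawled_urls = crawl_df[["url"]].drop_duplicates()
    # An outer merge keeps URLs from both sides; `indicator=True` adds a
    # `_merge` column recording which side each URL came from.
    merged = sitemap_urls.merge(crawled_urls, on="url", how="outer", indicator=True)
    labels = {"left_only": "sitemap_only", "right_only": "crawl_only", "both": "both"}
    merged["found_in"] = merged["_merge"].astype(str).map(labels)
    return merged.drop(columns="_merge")


# Made-up sample data standing in for a real sitemap and a real crawl file.
sitemap_df = pd.DataFrame(
    {"loc": ["https://example.com/", "https://example.com/a", "https://example.com/b"]}
)
crawl_df = pd.DataFrame(
    {
        "url": ["https://example.com/", "https://example.com/a", "https://example.com/c"],
        "status": [200, 200, 404],
    }
)

comparison = compare_urls(sitemap_df, crawl_df)
print(comparison)
```

Because the helper only takes and returns DataFrames, it works the same whether the inputs come from advertools or from any other tool, which is the point of keeping each step independent.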

Unix philosophy

This is a set of guidelines that encourage building tools that:

  1. Do one thing, and do it well
  2. Are designed to play well with other tools (of the same library or other libraries)
  3. Handle text streams, because text is a universal interface

There is really nothing big to “learn” about advertools, or how it “works”. Essentially, it is a bunch of functions/tools that each perform a certain task. Having said that, there are a few things to understand about specific tools and processes when you want to go deeper. For example, it would help a lot to understand the structure of the advertools crawl file, so you can better analyze it. But if you don’t want to do crawling, you don’t need to know anything about this.

In other situations, for example if you were learning a web framework, you would need to know how the framework is structured and how its interface is designed before you could get started.

Text streams and DataFrames

The Unix philosophy talks about text streams as a universal format. This makes it easy to create tools that would play well with other tools. It also makes things easier for users when they know that any new tool would consume and produce text streams, so they can continue using all their previous tools with minimal overhead in learning the new tool.

The data science equivalent of the text stream is essentially the “tidy” DataFrame. I highly recommend going through this important document to get a good idea of how DataFrames might go wrong, what the ideal structure might look like, and how it helps in various data tasks, like data cleaning, visualization, and machine learning. The name “tidy” has been changed to “long format” in some cases, as the word “tidy” implies that other formats are messy, which is not really the case. It is simply a format that works well for many data analysis tasks.
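To make the idea of a long format more concrete, here is a small sketch using pandas. The table, its column names, and the numbers are made up for illustration; the only library call that matters is `DataFrame.melt`, which reshapes a wide table into a long one.

```python
import pandas as pd

# Hypothetical "wide" table: one row per page, one column per month.
wide = pd.DataFrame(
    {
        "url": ["/home", "/blog"],
        "2024-01": [120, 45],
        "2024-02": [150, 60],
    }
)

# Long ("tidy") format: one row per observation, one column per variable.
long = wide.melt(id_vars="url", var_name="month", value_name="visits")
print(long)

# With one column per variable, grouping works without knowing how many
# month columns the original table had.
total_per_url = long.groupby("url")["visits"].sum()
print(total_per_url)
```

In the wide table the month is hidden in the column names, so adding a month changes the table's shape. In the long table a new month is just more rows, and the same grouping, filtering, and plotting code keeps working.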

Installation

If you would like to get started immediately, you can check out the simple installation instructions and see what you would like to accomplish with advertools.