Setting up Python

Python
Beginner
A beginner’s tutorial on setting up Python: installing it, creating and activating a virtual environment, installing a few essential packages, and doing some actual work.

In this tutorial, we will get started from scratch to get fully set up and start doing actual tasks with Python.

Installing Python

This is a straightforward step, very similar to installing any software application. You simply need to go to the Python home page python.org, and select the version that is appropriate for your system under “Downloads”.

Then run the installer and go through the wizard to finish the setup.

As a result of this installation, you now have a new command available on the command line, called python. Just like other commands you may already use, such as host, you have now obtained the python command.

For historical reasons the command is actually python3 (no need to get into the details here). We’ll use it once to create our environment, and once the environment is activated, we’ll continue normally with python (without the “3”).
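To confirm the installation worked, you can ask for the version from the command line (the exact number will depend on which release you installed):

```shell
# Check that the python3 command is available and see its version
python3 --version
```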

Creating a virtual environment

When you want to install a mobile app, you generally have constraints, like the device and operating system. You might see that an app needs an iPhone 14+ and iOS 16.1+ to work properly. This is the “environment” in which that software (the app) can run.

With Python, a virtual environment is essentially a folder on your machine. It contains two things:

  • A Python installation. Eventually you’ll probably have several environments, for different projects and/or different versions of Python.
  • A set of third-party packages (libraries)

Let’s now start the terminal application, and go to the command line to create our virtual environment.

On Windows:

python3 -m venv C:\path\to\new\virtual\environment

On Linux/MacOS:

python3 -m venv /path/to/new/virtual/environment
  • python3: The Python command
  • -m: “use the module”
  • venv: the name of the virtual environment module
  • /path/to/new/virtual/environment: where you want the environment to be created (on Windows, a path like C:\path\to\new\virtual\environment).

You can simply run this:

python3 -m venv venv

This will create the virtual environment in the current working directory, under a new directory called venv.
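As a quick sanity check, you can list the new folder to see what venv created. The folder names shown below are for Linux/macOS; on Windows you’ll see a Scripts folder instead of bin:

```shell
# Create the environment, then look inside it
python3 -m venv venv
ls venv    # typically bin/ include/ lib/ and pyvenv.cfg on Linux/macOS
```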

Activating the virtual environment

We created it, now we want to activate it. When an environment is active, the python command runs the interpreter from the particular environment we just created:

On Windows

  • On cmd.exe:
C:\> <venv>\Scripts\activate.bat
  • On PowerShell:
PS C:\> <venv>\Scripts\Activate.ps1

Replace <venv> with the actual path of your environment.

On Linux/MacOS:

source <venv>/bin/activate

Once it is activated, the prompt should be updated to show the environment name at the beginning, something like this:

(venv312)

You can also check which Python you are now using by running the which command (on Windows cmd.exe, the equivalent is where python):

which python
/Users/elias/venv312/bin/python

In this case it shows where my environment lives. (I called it venv312 so I know it has Python version 3.12.)
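Another way to confirm which interpreter is running, which also works on Windows, is to ask Python itself. This is a small sketch you can run in any Python session:

```python
import sys

# sys.executable is the full path of the interpreter currently running;
# with the environment active, it should point inside your venv folder
print(sys.executable)

# Inside a virtual environment, sys.prefix differs from sys.base_prefix
print(sys.prefix != sys.base_prefix)
```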

Installing a few Python packages (libraries)

Just like your mobile phone: even though it comes with powerful and useful native apps, it lacks a lot of potential without an app store. The same applies to Python, and to programming languages in general.

We now want to install a few packages that will help us in our digital marketing work, especially if you do SEO/SEM.

With the environment activated, run the following:

pip install jupyterlab advertools pandas plotly

Command breakdown:

  • pip: The command we use to install packages
  • install: The install sub-command (there are a few others, but we won’t cover them here.)
  • jupyterlab: The first package we want to install. This is the web app that allows us to interactively run Python commands, and get rich outputs like interactive HTML/JS charts and apps.
  • advertools: The main library for SEO/SEM
  • pandas: The main library for data processing, manipulation, sorting, reshaping, etc.
  • plotly: The library for interactive data visualization

You can install as many libraries as you want in one go, you just have to supply their names, separated by spaces, as we did above.
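Once installed, you can check what’s in the environment at any time. pip list shows everything; pip freeze writes pinned versions to a file you can use to recreate the environment later:

```shell
# Show every package installed in the active environment
pip list

# Save exact versions to a file, for reproducibility
pip freeze > requirements.txt
```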

Starting jupyter lab

With the environment activated, and after installing the packages above, simply run:

jupyter lab

This should open JupyterLab in your browser, where you should see the main interface.

Note that the panel on the left is a regular file browser that you can use like any file browser, and it shows the files on your computer. You can open many formats inside Jupyter in case you want to preview them, like PDF, CSV, images, and of course, most importantly Jupyter notebooks (with the extension .ipynb.)

Now, to create a new Jupyter notebook, click on the icon under “Notebook”.

You are now ready to start writing code and doing some work with Python.

Crawling a website

In order to use the libraries we installed, we need to load them into our session, which in Python terms means importing them.

In the first code cell, run the following by clicking the play button at the top of your notebook (or by pressing SHIFT+ENTER after clicking inside the code cell):

import advertools as adv
import pandas as pd

Now we have imported two libraries under the aliases adv and pd. These are simply shortcuts that make them faster to type.

Each one of those libraries has a bunch of functions, classes, and various objects. We can start using them with the dot notation.

We access them just like we use the right click on graphical interfaces. The right click functionality basically displays a contextual menu based on the type of object that you right-clicked. If it’s an image for example, you get “Save image as”, “Copy image address” and so on. If you right-click a string of characters you get a different set of options that are particular to that type of object.

With libraries, we can access the available functions and methods using dot notation; in Jupyter, type a dot after the object’s name and hit the TAB key to see the options.
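Outside Jupyter, Python’s built-in dir function gives you a similar listing of the names you can reach with the dot. Here is a sketch using pandas, since the idea is the same for any library:

```python
import pandas as pd

# dir() lists everything reachable with dot notation,
# much like TAB completion does in Jupyter
public_names = [name for name in dir(pd) if not name.startswith("_")]

print(len(public_names))            # pandas exposes many public names
print("read_json" in public_names)  # True: the function we use later
```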

You can now select any function you want; we’ll use the crawl function.

adv.crawl(url_list="https://example.com", output_file="output.jsonl")

This function is a full-fledged crawler with many options that you can explore later. For now, we run it minimally, specifying a URL to crawl and the file where we want to save the crawl results, output.jsonl.

Now you should have a new file with the name you chose, and we are going to use pandas to read it into a table or DataFrame.

crawldf = pd.read_json("output.jsonl", lines=True)
crawldf
The result is a single-row DataFrame with one column per extracted element, shown here transposed for readability:

url                                https://example.com
title                              Example Domain
viewport                           width=device-width, initial-scale=1
charset                            utf-8
h1                                 Example Domain
body_text                          Example Domain…
size                               1256
download_timeout                   180
download_slot                      example.com
download_latency                   0.103387
depth                              0
status                             200
links_url                          https://www.iana.org/domains/example
links_text                         More information…
links_nofollow                     False
ip_address                         93.184.215.14
crawl_time                         2024-12-28 01:52:49
resp_headers_Content-Length        648
resp_headers_Age                   344584
resp_headers_Cache-Control         max-age=604800
resp_headers_Content-Type          text/html; charset=UTF-8
resp_headers_Date                  Sat, 28 Dec 2024 01:52:49 GMT
resp_headers_Etag                  “3147526947+gzip”
resp_headers_Expires               Sat, 04 Jan 2025 01:52:49 GMT
resp_headers_Last-Modified         Thu, 17 Oct 2019 07:18:26 GMT
resp_headers_Server                ECAcc (bsb/27D1)
resp_headers_Vary                  Accept-Encoding
resp_headers_X-Cache               HIT
request_headers_Accept             text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
request_headers_Accept-Language    en
request_headers_User-Agent         advertools/0.16.3
request_headers_Accept-Encoding    gzip, deflate
resp_headers_Accept-Ranges         nan

Now you can explore other functions and build your own workflows. Check out the documentation of the libraries you are interested in, and try playing around with the available options.

If you have gone through the whole process, you have achieved a great deal, and you are now at a level where you can confidently experiment with code you find online.