| | og:image | og:title | og:description | twitter:card | og:url | og:site_name | og:type | twitter:title | twitter:url | twitter:image | og:image:secure_url | twitter:image:alt | jsonld_@context | jsonld_@type | jsonld_mainEntityOfPage | jsonld_headline | jsonld_text | jsonld_url | jsonld_datePublished | jsonld_comment | jsonld_author.@type | jsonld_author.name | jsonld_author.url | jsonld_interactionStatistic.@type | jsonld_interactionStatistic.interactionType | jsonld_interactionStatistic.userInteractionCount | jsonld_mainEntity.@type | jsonld_mainEntity.name | jsonld_mainEntity.text | jsonld_mainEntity.dateCreated | jsonld_mainEntity.upvoteCount | jsonld_mainEntity.author.@type | jsonld_mainEntity.author.name | jsonld_mainEntity.answerCount | jsonld_mainEntity.acceptedAnswer | jsonld_mainEntity.suggestedAnswer | twitter:description | twitter:site | twitter:creator | og:image:width | og:image:height | jsonld_image | jsonld_mainEntity.acceptedAnswer.@type | jsonld_mainEntity.acceptedAnswer.text | jsonld_mainEntity.acceptedAnswer.dateCreated | jsonld_mainEntity.acceptedAnswer.upvoteCount | jsonld_mainEntity.acceptedAnswer.author.@type | jsonld_mainEntity.acceptedAnswer.author.name | jsonld_mainEntity.acceptedAnswer.url | jsonld_name | jsonld_startDate | jsonld_endDate | jsonld_eventAttendanceMode | jsonld_description | jsonld_location.@type | jsonld_location.url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://supermetrics.com/images/supermetrics.png | Supermetrics: Turn your marketing data into opportunity - Supermetrics | Focus on growth, not data silos. Streamline your marketing data so you can take control of what matters. Start your ... | summary_large_image | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 
| 1 | https://supermetrics.com/images/supermetrics.png | Become a Supermetrics Affiliate - Supermetrics | Refer Supermetrics to others and get 20% recurring commissions from each sale. Join now! | summary_large_image | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 
| 2 | https://supermetrics.com/images/supermetrics.png | About Supermetrics - Supermetrics | Whether you’re a small business getting started on your data journey or a global enterprise working with business c... | summary_large_image | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 
How to audit structured data on a website with Python
Python, advertools, Structured Data, Crawling, Auditing, Analysis, Intermediate
A Python script that takes a site crawled with advertools and helps you understand its structured data: which types are used (JSON-LD, Twitter, and OpenGraph), which tags are present, and what they contain.
  
Read the crawl file
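The code for this step is not shown here. As a minimal sketch, assuming the crawl was saved as a JSON lines file (one JSON object per crawled page), it can be loaded with pandas. The file name and the two sample rows are hypothetical and only stand in for a real crawl file.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical two-page crawl, written as JSON lines to stand in for a real crawl file
sample_rows = [
    {"url": "https://example.com/", "og:title": "Home", "twitter:card": "summary"},
    {"url": "https://example.com/about", "og:title": "About"},
]
crawl_path = Path(tempfile.mkdtemp()) / "sample_crawl.jl"
crawl_path.write_text("\n".join(json.dumps(row) for row in sample_rows))

# Each line is a separate JSON object, so lines=True is required
crawldf = pd.read_json(crawl_path, lines=True)
print(crawldf.shape)
```

Pages that lack a given tag get a missing value in that column, which is what the counting steps later rely on.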
Filter the columns that contain structured data
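One way to do this, sketched under the assumption that structured data columns are the ones whose names start with "jsonld_", "twitter:", or "og:", is a single regex passed to `DataFrame.filter`. The toy DataFrame is hypothetical and stands in for the crawl DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for the crawl DataFrame
crawldf = pd.DataFrame({
    "url": ["https://example.com/", "https://example.com/about"],
    "title": ["Home", "About"],
    "status": [200, 200],
    "og:title": ["Home", "About"],
    "twitter:card": ["summary", None],
    "jsonld_@type": [None, "WebPage"],
})

# Keep only the columns whose names start with one of the three prefixes
structured_df = crawldf.filter(regex=r"^(jsonld_|twitter:|og:)")
print(structured_df.columns.tolist())
```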
Define a function to get counts or averages of a set of columns
This function takes the following parameters:
df: A crawl DataFrame
regex: The pattern to find; in this case we will use “jsonld_”, “twitter:”, and “og:”
func: Either “count” to get the counts or “mean” to get the percentage for each tag
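To make the difference between the two `func` options concrete, here is a small sketch on a hypothetical four-page DataFrame. It assumes that a tag counts as "used" on a page whenever its column is not missing; the sum of those flags is the count and their mean is the share of pages.

```python
import pandas as pd

# Hypothetical crawl of four pages; None means the tag is absent on that page
toy = pd.DataFrame({
    "og:title": ["Home", "About", "Pricing", None],
    "og:image": ["a.png", None, None, None],
})

# True where a page has a value for the tag
present = toy.filter(regex="og:").notna()

counts = present.sum()   # number of pages using each tag
shares = present.mean()  # fraction of pages using each tag

print(counts)
print(shares)
```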
def column_values(df, regex, func):
    # Number format and column label for each supported aggregation
    func_dict = {
        "mean": {"fmt": "{:.1%}", "colname": "% usage"},
        "count": {"fmt": "{:,}", "colname": "count"},
    }
    return (df
            .filter(regex=regex)  # keep only the columns matching the pattern
            .notna()              # True where a page has a value for the tag
            .apply("sum" if func == "count" else func)  # sum of True values is the count
            .sort_values(ascending=False)
            .to_frame()
            .rename(columns={0: func_dict[func]["colname"]})
            .style
            .format(func_dict[func]["fmt"])
            .bar(color='lightgray'))

Display counts and percentages of all three structured data types on the website
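from itertools import product

from IPython.display import display  # already available in Jupyter notebooks

# Show the percentage table and the count table for each structured data type
for regex, func in product(["og:", "twitter:", "jsonld_"], ["mean", "count"]):
    display(column_values(crawldf, regex, func))
    print('')

| | % usage |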
|---|---|
| og:title | 99.3% | 
| og:image | 98.6% | 
| og:description | 94.0% | 
| og:type | 66.3% | 
| og:url | 65.6% | 
| og:site_name | 58.8% | 
| og:image:secure_url | 6.9% | 
| og:image:width | 0.0% | 
| og:image:height | 0.0% | 

| | count |
|---|---|
| og:title | 2,603 | 
| og:image | 2,586 | 
| og:description | 2,464 | 
| og:type | 1,738 | 
| og:url | 1,721 | 
| og:site_name | 1,541 | 
| og:image:secure_url | 182 | 
| og:image:width | 1 | 
| og:image:height | 1 | 

| | % usage |
|---|---|
| twitter:card | 91.2% | 
| twitter:title | 43.1% | 
| twitter:url | 43.1% | 
| twitter:image | 43.1% | 
| twitter:description | 37.9% | 
| twitter:image:alt | 15.0% | 
| twitter:site | 0.1% | 
| twitter:creator | 0.1% | 

| | count |
|---|---|
| twitter:card | 2,391 | 
| twitter:title | 1,131 | 
| twitter:url | 1,131 | 
| twitter:image | 1,131 | 
| twitter:description | 994 | 
| twitter:image:alt | 393 | 
| twitter:site | 2 | 
| twitter:creator | 2 | 

| | % usage |
|---|---|
| jsonld_@context | 5.6% | 
| jsonld_@type | 5.6% | 
| jsonld_headline | 4.3% | 
| jsonld_datePublished | 4.3% | 
| jsonld_author.url | 4.3% | 
| jsonld_author.name | 4.3% | 
| jsonld_author.@type | 4.3% | 
| jsonld_url | 3.4% | 
| jsonld_text | 3.4% | 
| jsonld_mainEntityOfPage | 3.4% | 
| jsonld_comment | 3.4% | 
| jsonld_interactionStatistic.@type | 3.4% | 
| jsonld_interactionStatistic.interactionType | 3.4% | 
| jsonld_interactionStatistic.userInteractionCount | 3.4% | 
| jsonld_mainEntity.@type | 1.2% | 
| jsonld_mainEntity.name | 1.2% | 
| jsonld_mainEntity.text | 1.2% | 
| jsonld_mainEntity.dateCreated | 1.2% | 
| jsonld_mainEntity.upvoteCount | 1.2% | 
| jsonld_mainEntity.author.@type | 1.2% | 
| jsonld_mainEntity.author.name | 1.2% | 
| jsonld_mainEntity.answerCount | 1.2% | 
| jsonld_mainEntity.suggestedAnswer | 1.2% | 
| jsonld_image | 1.0% | 
| jsonld_mainEntity.acceptedAnswer.url | 0.2% | 
| jsonld_mainEntity.acceptedAnswer.@type | 0.2% | 
| jsonld_mainEntity.acceptedAnswer.text | 0.2% | 
| jsonld_mainEntity.acceptedAnswer.dateCreated | 0.2% | 
| jsonld_mainEntity.acceptedAnswer.upvoteCount | 0.2% | 
| jsonld_mainEntity.acceptedAnswer.author.@type | 0.2% | 
| jsonld_mainEntity.acceptedAnswer.author.name | 0.2% | 
| jsonld_eventAttendanceMode | 0.0% | 
| jsonld_name | 0.0% | 
| jsonld_startDate | 0.0% | 
| jsonld_endDate | 0.0% | 
| jsonld_location.@type | 0.0% | 
| jsonld_description | 0.0% | 
| jsonld_location.url | 0.0% | 
| jsonld_mainEntity.acceptedAnswer | 0.0% | 

| | count |
|---|---|
| jsonld_@context | 147 | 
| jsonld_@type | 147 | 
| jsonld_headline | 114 | 
| jsonld_datePublished | 114 | 
| jsonld_author.url | 114 | 
| jsonld_author.name | 114 | 
| jsonld_author.@type | 114 | 
| jsonld_url | 90 | 
| jsonld_text | 90 | 
| jsonld_mainEntityOfPage | 90 | 
| jsonld_comment | 90 | 
| jsonld_interactionStatistic.@type | 90 | 
| jsonld_interactionStatistic.interactionType | 90 | 
| jsonld_interactionStatistic.userInteractionCount | 90 | 
| jsonld_mainEntity.@type | 32 | 
| jsonld_mainEntity.name | 32 | 
| jsonld_mainEntity.text | 32 | 
| jsonld_mainEntity.dateCreated | 32 | 
| jsonld_mainEntity.upvoteCount | 32 | 
| jsonld_mainEntity.author.@type | 32 | 
| jsonld_mainEntity.author.name | 32 | 
| jsonld_mainEntity.answerCount | 32 | 
| jsonld_mainEntity.suggestedAnswer | 32 | 
| jsonld_image | 25 | 
| jsonld_mainEntity.acceptedAnswer.url | 6 | 
| jsonld_mainEntity.acceptedAnswer.@type | 6 | 
| jsonld_mainEntity.acceptedAnswer.text | 6 | 
| jsonld_mainEntity.acceptedAnswer.dateCreated | 6 | 
| jsonld_mainEntity.acceptedAnswer.upvoteCount | 6 | 
| jsonld_mainEntity.acceptedAnswer.author.@type | 6 | 
| jsonld_mainEntity.acceptedAnswer.author.name | 6 | 
| jsonld_eventAttendanceMode | 1 | 
| jsonld_name | 1 | 
| jsonld_startDate | 1 | 
| jsonld_endDate | 1 | 
| jsonld_location.@type | 1 | 
| jsonld_description | 1 | 
| jsonld_location.url | 1 | 
| jsonld_mainEntity.acceptedAnswer | 0 | 
Count actual values of the selected structured data column
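import adviz

# Count the values of the og:title column and plot them
fig = adviz.value_counts(crawldf["og:title"], width=None)
fig.data[1].hoverinfo = 'text'
# Trim the left and right margins
fig.layout.margin.l = 10
fig.layout.margin.r = 0
fig

Counting ngrams of the desired columns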
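The code that produced the bigram counts is not shown here. A minimal stand-in, assuming simple lowercase whitespace tokenization, could look like the sketch below. The helper name `bigram_counts` and the sample titles are hypothetical, and this is not necessarily the method the original script used.

```python
from collections import Counter

import pandas as pd


def bigram_counts(text_series, top_n=15):
    """Count two-word phrases across a Series of strings, ignoring missing values."""
    counts = Counter()
    for text in text_series.dropna():
        tokens = str(text).lower().split()
        # Pair each token with the one that follows it
        counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return pd.DataFrame(counts.most_common(top_n), columns=["word", "abs_freq"])


# Hypothetical sample standing in for crawldf["og:title"]
titles = pd.Series([
    "About Supermetrics - Supermetrics",
    "Become a Supermetrics Affiliate - Supermetrics",
    None,
])
result = bigram_counts(titles)
print(result)
```

Because punctuation is not stripped, separators such as "-" and "|" show up as tokens, which is useful for spotting title templates.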
| | word | abs_freq |
|---|---|---|
| 0 | - supermetrics | 866 |
| 1 | how to | 581 |
| 2 | \| supermetrics | 574 |
| 3 | supermetrics documentation | 393 |
| 4 | connection guide | 190 |
| 5 | supermetrics community | 185 |
| 6 | supermetrics connection | 182 |
| 7 | data warehouse | 171 |
| 8 | metrics and | 149 |
| 9 | and dimensions | 148 |
| 10 | dimensions \| | 146 |
| 11 | looker studio | 135 |
| 12 | standard data | 129 |
| 13 | warehouse schema | 129 |
| 14 | schema \| | 129 |