Analysis of the markup data of an average web page


We web developers try to improve our HTML code, making it clear, beautiful, and readable. We do this in pursuit of better semantics and better accessibility, so that everyone can use what we build. This is our top priority, and it always leaves us with questions:

What is the best way to structure the markup?
How do others do it?

These kinds of questions are always interesting to us. We want to know exactly how other people write markup these days, as new web technologies keep emerging. So web developers from many countries came together to carry out a detailed study, based on a dataset of over 8 million pages drawn from the top twenty Google results.

Studies that have been conducted previously

Back in 2005, Ian Hickson, editor of the HTML5 specification, analyzed a sample of about a billion documents to see what the web was made of. A billion is a huge number, but nothing is impossible for Google. From such a huge number of documents, he extracted valuable information about popular class names, elements, attributes, and related metadata. The remarkable results were later published as Web Authoring Statistics, which is still the most powerful study of web authoring ever done.

More recently, in 2008, the Opera Metadata Analysis and Mining Application (MAMA) scanner analyzed about 3.5 million URLs. Brian Wilson, the author of this impressive work, expanded on the research by publishing the results with a detailed account of the structure of the pages, including HTML, CSS, and JavaScript.

One of the analyses from Web Authoring Statistics that later proved important to the development of HTML5 was a list of the most popular class names in these HTML documents. The Opera MAMA crawler also searched for the most common class names and, going beyond what Google had published, reported results on popular ID attribute values as well.
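To make the idea of a "most popular class names" list concrete, here is a minimal Python sketch of how such a tally might be computed over a set of pages. It is not the method Google or Opera used; it assumes the pages are already downloaded as HTML strings, and `tally_classes_and_ids` is a hypothetical helper name.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassIdCounter(HTMLParser):
    """Tallies class names and id values found in the start tags of one document."""

    def __init__(self):
        super().__init__()
        self.classes = Counter()
        self.ids = Counter()

    # Self-closing tags such as <img class="x"/> reach this method too,
    # because HTMLParser.handle_startendtag calls handle_starttag by default.
    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue  # attribute written without a value, e.g. <div class>
            if name == "class":
                # One class attribute may hold several space-separated names.
                self.classes.update(value.split())
            elif name == "id" and value.strip():
                self.ids[value.strip()] += 1


def tally_classes_and_ids(documents):
    """Hypothetical helper: sum class and id counts over many HTML strings."""
    classes, ids = Counter(), Counter()
    for html in documents:
        parser = ClassIdCounter()  # fresh parser per page, so no state leaks between pages
        parser.feed(html)
        parser.close()
        classes += parser.classes
        ids += parser.ids
    return classes, ids


pages = ['<div class="header main" id="nav"></div>', '<p class="main">Hi</p>']
classes, ids = tally_classes_and_ids(pages)
print(classes.most_common(2))  # [('main', 2), ('header', 1)]
print(ids.most_common(1))      # [('nav', 1)]
```

Splitting the `class` value on whitespace matters: `class="header main"` contributes two separate class names, which is how such lists are usually counted.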

What does this study add to the general understanding?

The data for this study comes from 8,021,323 indexed pages collected from the top twenty Google results for about 30 million keywords, selected for their meaning and search volume. We ran a Google search for each keyword, took the URLs of the top 20 results, added them to the list, and removed duplicates.
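The collection step described above (top 20 results per keyword, merged into one list, duplicates removed) can be sketched in a few lines of Python. This is an illustration, not the study's code: `build_url_list` is a hypothetical name, search results are assumed to arrive as a dict mapping each keyword to its ranked URLs, and duplicates are detected by exact string match, since the study does not say whether URLs were normalized first.

```python
def build_url_list(results_by_keyword, top_n=20):
    """Hypothetical helper: merge the top_n URLs of every keyword, dropping duplicates.

    results_by_keyword is assumed to map each keyword to its ranked result URLs.
    Duplicates are found by exact string match (no URL normalization).
    """
    seen = set()
    urls = []
    for ranked_urls in results_by_keyword.values():
        for url in ranked_urls[:top_n]:
            if url not in seen:  # keep only the first occurrence
                seen.add(url)
                urls.append(url)
    return urls


results = {
    "html5 doctype": ["https://a.example/", "https://b.example/", "https://c.example/"],
    "semantic markup": ["https://b.example/", "https://d.example/"],
}
print(build_url_list(results, top_n=2))
# ['https://a.example/', 'https://b.example/', 'https://d.example/']
```

Using a set for membership checks keeps deduplication fast even for tens of millions of URLs, while the separate list preserves the order in which pages were first found.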

We can only assume that these web pages are highly relevant to the general web population. The assumption rests on the likelihood that these sites are popular and receive high traffic, which is reflected in their search engine rankings.

How fresh is this data?

The latest data set is from May 20, 2016.

This new study is not meant to surpass Google’s 2005 study, nor Opera’s big study. It is about finding new and relevant insights into the actual markup used by the most popular and successful web pages on the Internet.

So, what does the average HTML page look like today? Take a look at the screenshots below and check out the study for the full statistics.

Statistics

The study found that the average indexed web page uses twenty-six different types of elements.

Most websites use 26 different HTML elements, give or take a few, with an interesting peak of 9 elements.

Twenty-six elements are used on most pages, ordered by frequency
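A rough way to reproduce the "distinct element types per page" figure is to collect the set of tag names in each document and then histogram the set sizes across pages; that histogram is where a cluster around 26 and the peak at 9 would show up. The sketch below uses Python's standard `html.parser`; the helper names are hypothetical, and it only counts tags written in the source, so unlike a browser it will not add implied `html`, `head`, or `body` elements.

```python
from collections import Counter
from html.parser import HTMLParser


class ElementTypeCollector(HTMLParser):
    """Records which tag names appear in one document."""

    def __init__(self):
        super().__init__()
        self.types = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <DIV> and <div> count as one type.
        self.types.add(tag)


def distinct_element_types(html):
    parser = ElementTypeCollector()
    parser.feed(html)
    parser.close()
    return parser.types


def element_type_histogram(pages):
    """Map 'number of distinct element types' -> 'number of pages with that many'."""
    return Counter(len(distinct_element_types(page)) for page in pages)


page = "<html><body><DIV><p>Hi</p><p>there</p></div></body></html>"
print(sorted(distinct_element_types(page)))  # ['body', 'div', 'html', 'p']
```

Note that the two `p` elements count once: the statistic is about how many *types* of element a page uses, not how many elements it contains.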

Among document type declarations, the HTML5 doctype is clearly the leader. The doctype is the very first line of an HTML document and specifies which version of (X)HTML the page uses:

Nearly two-thirds of all websites declare themselves to be HTML5. The html element, known as the root element, wraps the content of a web page and comes immediately after the doctype declaration. Metadata elements carry information about the page, such as styles, scripts, and data, that helps search engines and browsers use and display it.
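Classifying pages by doctype mostly comes down to reading the `<!DOCTYPE ...>` declaration at the top of each page. The HTML5 doctype is simply `<!DOCTYPE html>` (case-insensitive), while older doctypes name their version in a public identifier. The following Python sketch is one assumed way to do this with the standard `html.parser` module; the category labels and substring matching are simplifications chosen for illustration, not the study's classification rules.

```python
from html.parser import HTMLParser


class DoctypeSniffer(HTMLParser):
    """Keeps the first <!DOCTYPE ...> declaration seen in a document."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", e.g. "DOCTYPE html".
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = decl


def classify_doctype(html):
    sniffer = DoctypeSniffer()
    sniffer.feed(html)
    sniffer.close()
    if sniffer.doctype is None:
        return "none"
    # Collapse case and whitespace so "<!doctype   HTML>" matches as well.
    normalized = " ".join(sniffer.doctype.split()).lower()
    if normalized == "doctype html" or "about:legacy-compat" in normalized:
        return "HTML5"
    # Older doctypes name their version in the public identifier.
    for label in ("XHTML 1.0", "XHTML 1.1", "HTML 4.01"):
        if label.lower() in normalized:
            return label
    return "other"


print(classify_doctype("<!DOCTYPE html>\n<html></html>"))  # HTML5
```

Counting the returned labels over all pages (for example with `collections.Counter`) gives a doctype share like the one reported here.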

If we look at all the metadata elements, which tell browsers and search engines how to style and display a site, we find approximately 175 million elements, distributed in percentage terms as follows:

Sectioning content

Content sectioning elements allow you to organize your content into logical parts. The breakdown of 105 million content sectioning elements looks like this:

H3 is the most popular heading element and also the most common content sectioning element overall.
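A percentage breakdown like the one for sectioning content can be produced by counting every occurrence of each sectioning element across all pages and dividing by the total. In this Python sketch the `SECTIONING` set is an assumption modelled on MDN's "content sectioning" category, since the study's exact element list is not given here, and `sectioning_breakdown` is a hypothetical helper.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumed element list, modelled on MDN's "content sectioning" category.
SECTIONING = {
    "address", "article", "aside", "footer", "header", "hgroup",
    "main", "nav", "section", "h1", "h2", "h3", "h4", "h5", "h6",
}


class SectioningCounter(HTMLParser):
    """Counts every occurrence of a sectioning element in one document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in SECTIONING:
            self.counts[tag] += 1


def sectioning_breakdown(pages):
    """Hypothetical helper: share of each sectioning element, in percent, most common first."""
    total = Counter()
    for page in pages:
        parser = SectioningCounter()
        parser.feed(page)
        parser.close()
        total += parser.counts
    grand_total = sum(total.values())
    if grand_total == 0:
        return {}
    return {tag: 100 * n / grand_total for tag, n in total.most_common()}


pages = [
    "<header><h1>Title</h1></header><h3>A</h3><h3>B</h3>",
    "<section><h3>C</h3></section>",
]
print(sectioning_breakdown(pages)["h3"])  # 50.0
```

Unlike the distinct-types count, this tally counts every occurrence, which is why a heading level used many times per page can dominate the breakdown.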

Text content elements are useful for organizing content within the body, and they help define the purpose of that content. Among the text content elements, which number about a billion, DIV is the clear leader.
