Is it just me, or is text disappearing from the web?
In the summer of 2014, I became interested in studying whether it was more than my mere impression that websites were beginning to present less text to end-users. Websites such as Buzzfeed.com were gaining enormous popularity using a communicative style that had more in common with children’s books (large graphics and short segments of text) than with the traditional newspaper column. I wondered whether I could measure this change in any systematic way. In this blog post, I will outline a research study I recently published in the open-access journal Information Research that asks the following questions: is the use of text on the World Wide Web declining? If so, when did it start declining, and by how much?
A movement away from text?
I was primarily interested in what seemed to me a departure from text because of what it might imply about literacy and what we ought to teach students, and more broadly about what this change meant for how humans communicate and share information, knowledge and culture. Individuals from a variety of sectors have signaled a changing relationship between literacy and text. In the education sector, a variety of concepts seek to capture this change and help educators work within this transformed landscape, including ideas such as multiple literacies, new literacies, multimodal literacy, digital literacies, media literacy and visual literacy. Although each of these concepts is different, they all emphasize the need to understand and operate in a communicative landscape that includes more than print literacy.
While educators explore ways to communicate multimodally – both to enhance their own abilities and to teach youth – Web designers and those offering expertise on Web design recommend shrinking the amount of text on Webpages. Nielsen Norman Group, a company that provides advice and consulting on Web user experience, consistently recommends that Web developers reduce the amount of text online. Examples include articles such as ‘How little do users read?’, which finds that Web users read only about 20% of the words on a Webpage (Nielsen, 2008). Other articles recommend brevity: ‘Don't require users to read long continuous blocks of text’, and ‘modern life is hectic and people simply don't have time to work too hard for their information’ (Nielsen, 1997a; Nielsen, 1997b). Other Web design experts offer similar advice. Krug (2006) – well known for his dictum ‘don’t make me think’ – suggests that Web designers remove ‘half of the words’ on a Web page because ‘most of the words I see are just taking up space, because no one is ever going to read them’, which makes ‘pages seem more daunting than they actually are’ (p. 45). Thus, those designing texts for the Web are encouraged to shrink their texts or risk having them skipped or ignored.
Using Web Archives as the evidence-base
Given this interest in exploring whether text on the web was declining, it was immediately clear to me that to study this issue I would be relying on web archives. Primarily, I would rely on the Internet Archive’s Wayback Machine because it has collected such a wide scope of web pages since the 1990s.
The method devised was to select 100 popular and prominent homepages in the United States from a variety of sectors that were present in the late 1990s and are still used today. I also decided to select homepages every three years beginning in 1999, resulting in 6 captures per homepage, or 600 total homepages. The reason for this decision is that by 1999 the Internet Archive’s web archiving efforts were fully underway, and a three-year interval would be enough to show changes without producing a hugely repetitive dataset. URLs for webpages in the Internet Archive were selected using the Memento web service. Full webpages were saved as static PNG files.
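To give a sense of how snapshots like these can be located programmatically, here is a minimal sketch using the Internet Archive’s public availability API, which returns the archived capture closest to a requested date. This is an illustration only, not the study’s actual tooling, and the example homepage and query dates are assumptions on my part:

```python
from urllib.parse import urlencode

WAYBACK_API = "https://archive.org/wayback/available"

def availability_query(page_url, year):
    """Build a Wayback Machine availability-API query asking for the
    snapshot closest to 1 January of the given year."""
    return WAYBACK_API + "?" + urlencode({"url": page_url,
                                          "timestamp": f"{year}0101"})

def closest_snapshot(api_response):
    """Pull the closest archived snapshot's URL and timestamp out of the
    JSON the availability API returns (None if nothing was archived)."""
    snap = api_response.get("archived_snapshots", {}).get("closest")
    if snap and snap.get("available"):
        return snap["url"], snap["timestamp"]
    return None

# One query per homepage per capture year (1999, 2002, ..., 2014).
queries = [availability_query("nytimes.com", y) for y in range(1999, 2015, 3)]
```

Fetching each query URL and passing the decoded JSON to `closest_snapshot` would yield one archived capture per three-year interval, mirroring the sampling design described above.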
Entire websites were not used – only homepages – because entire websites are not archived by any web archive. Rather, the Internet Archive captures only a few levels below the top homepage, thus offering an accurate glimpse into the past across the entire web but not a total catalog of the past web.
Use of Computer Vision techniques
To distinguish text blocks from non-text blocks, I modified a Firefox extension called Project Naptha. This extension detects text using a computer-vision algorithm called the Stroke Width Transform. The percentage of text per webpage was calculated and stored in a database. A sample of detected text is shown in the figure below, which is 46.10% text.
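Once a detector like the Stroke Width Transform has returned text regions, turning them into a percentage is straightforward. The sketch below is my own illustration, not the study’s code: it assumes detections arrive as pixel bounding boxes and paints them onto a boolean mask so overlapping detections are not double-counted.

```python
import numpy as np

def percent_text(page_w, page_h, text_boxes):
    """Share of a page screenshot covered by detected text regions.
    Each box is (x, y, width, height) in pixels; a boolean mask ensures
    overlapping boxes count each pixel only once."""
    mask = np.zeros((page_h, page_w), dtype=bool)
    for x, y, w, h in text_boxes:
        mask[y:y + h, x:x + w] = True
    return 100.0 * mask.sum() / mask.size

# Two detected text regions on a 1000x800 screenshot (made-up numbers).
boxes = [(100, 50, 400, 30), (100, 100, 300, 20)]
print(round(percent_text(1000, 800, boxes), 2))  # 2.25
```

Storing one such percentage per homepage capture yields the per-year measurements analyzed in the next step.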
Once the percentage of text for each webpage and year was computed, I used a statistical technique called a one-way ANOVA to determine whether the percentage of text on a Webpage varied merely by chance or depended on the year the Website was produced. I found that these percentages were not random occurrences but depended on the year the webpage was produced (what we would call statistically significant).
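For readers unfamiliar with the test, a one-way ANOVA compares the variance between groups (here, capture years) to the variance within them. The following is a minimal sketch of the F-statistic with made-up toy numbers, not the study’s data; in practice one would use a library routine such as scipy.stats.f_oneway, which also supplies the p-value.

```python
def one_way_anova_f(groups):
    """F-statistic for a one-way ANOVA: ratio of between-group variance
    to within-group variance. groups is a list of lists of measurements
    (here, percent-text values grouped by capture year)."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: variation of year means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: variation inside each year's group.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Toy example: three "years" of percent-text measurements (made-up numbers).
f = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f)  # 13.0 — compared against the F distribution with (2, 6) df for a p-value
```

A large F relative to the F distribution with (k−1, n−k) degrees of freedom is what licenses the conclusion that year of production matters.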
And the results reveal…
The major finding is that the amount of text rose each year from 1999 to 2005, where it peaked, and it has been declining ever since. Website homepages in 2014 had 5.5% less text than they did in 2005. This is consistent with other research using web archives that indicates a decrease of text on the web. This pattern is illustrated below.
So what does it all mean?
This study naturally raises the question: what has caused this decrease in the percentage of text on the Web? Although it is difficult to draw definitive conclusions, one suggestion is that the first Web boom of the late 1990s and early 2000s brought about significant enhancements to internet infrastructure, allowing non-textual media such as video to be streamed more easily to end-users. Interestingly, 2005 was also the year YouTube launched. This is not to suggest that text was replaced with YouTube videos, but rather that as other modes of communication, such as video and audio, became easier to deliver, text was gradually unseated from its primacy on the World Wide Web.
I think the study raises a number of interesting issues. If the World Wide Web is presenting less text to users relative to other elements, does this mean that the World Wide Web is becoming a place where deep reading is less likely to occur? Is deep reading simply happening in other places, such as e-readers or printed books? Some research indicates this might be the case. If the early web was the great delivery mechanism for text, will the web evolve primarily into a platform for delivering audio and video?
If you are interested in this study, you can read it in full in the open-access journal Information Research.
Krug, S. (2006). Don't make me think: a common sense approach to web usability. (2nd Edition). Berkeley, CA: New Riders Publishing.
Nielsen, J. (2008). How little do users read? Fremont, CA: Nielsen Norman Group. Retrieved from http://www.nngroup.com/articles/how-little-do-users-read/
Nielsen, J. (1997a). Be succinct! (Writing for the Web). Fremont, CA: Nielsen Norman Group. Retrieved from http://www.nngroup.com/articles/be-succinct-writing-for-the-web/
Share your thoughts, feedback and questions in the comments below