**1 Introduction**

*"There are three kinds of lies: lies, damned lies, and statistics,"* the British Prime Minister Benjamin Disraeli reportedly proclaimed, before the sentence was popularised by Mark Twain. Indeed, statistics has frequently been discredited through misunderstanding or mistrust. A lack of statistical literacy can easily lead to “misunderstandings, misperceptions, mistrusts and misgivings about the value of statistics for guidance in public and private choices” (Wallman, 1993). In today’s complex information society, an understanding of statistical information and techniques has become essential both for everyday life and for effective participation in the workplace, leading to calls for increased attention to statistics and statistical literacy (see Shaughnessy and Pfannkuch, 2004; Shaughnessy, 2007; Makar and Rubin, 2009). The quality of available statistics can vary considerably, so an understanding of sampling techniques and sources of bias helps first to assess what has been done and second to adopt a critical stance on statistics. Increasing public awareness of the quality of the information consumed from television or newspapers is crucial given the “overwhelming amount of unregulated, unrestricted information being thrust upon a public that is generally ill equipped to consume the information” (Rumsey, 2002, p. 33). The phenomenon is two-sided: progress in the use of statistics goes hand in hand with an increase in misuses and statistical fallacies (Hooke, 1983). A large body of literature, built by teachers, education researchers, statisticians and professional organisations[1], thus calls for improving and measuring statistical literacy, with a special focus on the student population. Begg et al. (2004), for example, underlined the societal motive behind the call for a greater emphasis on statistical literacy in school curricula: that students become active and critical citizens. Callingham (2007) stressed the importance for students of adopting a critical stance towards data, referred to as applying statistical literacy.

The call for statistical literacy has recently been echoed by the international community. In their “A World That Counts” report, the United Nations Secretary-General’s Independent Expert Advisory Group (IEAG) on the 'Data Revolution for Sustainable Development' recommended that more be done to increase global data literacy. Specifically, the group called for “A proposal for a special investment to increase global data literacy. To close the gap between people able to benefit from data and those who cannot, in 2015 the UN should work with other organisations to develop an education program and promote new learning approaches to improve peoples’, infomediaries’ and public servants’ data literacy. Special efforts should be made to reach people living in poverty through dedicated programmes.” The Synthesis Report of the UN Secretary-General on the Post-2015 Agenda, “The Road to Dignity by 2030”, itself called for a transformative agenda where we “base our analysis in credible data and evidence, enhancing data capacity, availability, disaggregation, literacy and sharing”. It stressed that “the world must acquire a new ‘data literacy’ in order to be equipped with the tools, methodologies, capacities, and information necessary to shine a light on the challenges of responding to the new agenda”.

To inform this debate, the PARIS21 Secretariat established a task team to develop and report on a global indicator measuring the current state of, and future progress in, global statistical literacy. This paper presents the outcome of that consultative process and proposes a novel measure of statistical literacy based on the use of and critical engagement with statistics in national newspapers. The use of text mining techniques bridges current data gaps in this area and allows the statistical literacy of the adult population to be assessed on a day-to-day basis in more than one hundred developing and developed countries.

The paper is structured as follows. Section 2 reviews the literature. Section 3 presents the text mining methodology developed to measure statistical literacy and provides a brief overview of the keyword analysis. Section 4 describes the data and presents the results. Section 5 concludes and outlines the robustness checks currently under way.

**2 Literature Review**

*Conceptualisation of the notion of statistical literacy*

The present paper contributes to a body of literature that addresses the need for a concrete measure of statistical literacy. Despite an international consensus on the value of understanding data and improving global statistical literacy, there is no general agreement on its conceptualisation. While the need for a common definition of statistical literacy has been recognised in the literature (see Ben-Zvi and Garfield, 2004), Batanero (2002, p.37) summarises that “we have not reached a general consensus about what are the basic building blocks that constitute statistical literacy or about how we can help citizens construct and acquire the abilities”. These definitional issues have led to an expanding conception of statistical literacy, from a purely conceptual to a more applied notion.

Early work tried to provide a comprehensive definition of statistical literacy. Wallman (1993, p.1), for example, defines statistical literacy as “the ability to understand and critically evaluate statistical results that permeate our daily lives—coupled with the ability to appreciate the contributions that statistical thinking can make in public and private, professional and personal decisions” (see also Trewin (2005) for similarly broad, generic definitions). The concept directly introduces both a personal and a societal need to develop statistical literacy skills. Callingham (2007) endorsed this definition, underlining that it also requires an appreciation of the social context. These studies, however, suffer from the lack of methodological tools that would help to quantify levels of statistical literacy or to identify useful skills and competencies to develop. In response, Ben-Zvi and Garfield (2004) add more active components to the definition of statistical literacy. They define statistical literacy as a set of skills that students may actively use in understanding statistical information -- among them, organising data, constructing tables and working with different varieties of data representation. Most of these definitions are strongly linked to the field of education, identifying statistical literacy as a primary goal of statistics instruction, because “most students are more likely to be consumers of data than researchers” (Garfield and Gal, 1999, p.4). In that regard, Gal (2004, p.1) sees statistical literacy as a prominent prerequisite for participation in society, the “key ability expected of citizens in information-laden societies”. His statistical literacy concept involves both cognitive and dispositional components, where some components are common with numeracy and literacy while others are unique to statistical literacy.
This definition encompasses both critical evaluation of statistics and the ability to express one’s opinions or data-related arguments about it. Likewise, Schield (2004) and Watson (2006) see the ability to question claims in social contexts as a fundamental element to statistical literacy.

Amid this nebula of definitions, the concepts of data literacy and statistical literacy are often used interchangeably. Our paper adopts the division used by the Oxford Dictionary of Statistical Terms (Dodge, 2003)[2], treating data literacy as a sub-component of statistical literacy. Data literacy, as called for in the Synthesis Report, can therefore be seen as a component of statistical literacy, which Dodge (2003) defines as “the ability to critically evaluate statistical material and to appreciate the relevance of statistically-based approaches to all aspects of life in general”. Statistical literacy can indeed be seen as an encompassing concept, implying comfort and competence with a large variety of forms of representation. Statistics is about data analysis processes, but also number sense, understanding variables and symbols, interpreting tables and graphics, notions of sampling, data collection methods and questionnaire design, probabilities and inferential reasoning (Scheaffer, Watkins and Landwehr, 1998). In particular, the Australian Bureau of Statistics’ Education Services considers four criteria essential for statistical literacy: data awareness; the ability to understand statistical concepts; the ability to analyse, interpret and evaluate statistical information; and the ability to communicate statistical information and understandings. This emphasis on the ability to understand and communicate about statistics recurs in the recent literature: “Statistics requires the basic understanding of statistical concepts…whereas literacy requires the ability to express that understanding in words, not in mathematical formulas” (Watson and Kelly, 2003). Milo Schield (2004, p.9), for instance, argues that statistical literacy is “typically more about words than number, more about evidence than about formulas”.[3]

The complexity of the statistical literacy construct, with its emphasis on critical thinking, contextual understanding and students’ dispositions, poses a real challenge for assessment. Despite these terminological challenges, several frameworks have attempted to model the features of statistical literacy, focusing mainly on student populations. Our indicator builds on these models to provide a reliable and more widely applicable measure of statistical literacy.

*Empirical frameworks*

Gal (2004) developed one of the first models evaluating the understanding of statistics by adults. In this model, cognitive and dispositional components interact. In particular, statistical literacy presupposes the use of five interrelated cognitive elements: mathematical knowledge, statistical knowledge, literacy skills, knowledge of the context and critical questions. An important claim in Gal’s model is that the components that together lead to statistically literate behaviour constitute a dynamic set of knowledge and dispositions, strongly context-dependent and interrelated. Gal particularly examines how a person’s dispositions or attitudes toward data and statistics interact with these knowledge bases to motivate critical thinking about statistics. Once a certain level of statistical literacy is reached, individuals can automatically transfer their skills to evaluating the everyday statistical information they encounter. Gal’s model carries the important implication that anyone lacking these skills is functionally illiterate as a responsible, informed and productive citizen and worker. As suggested by Batanero (2002), Gal’s model is useful for understanding what statistical literacy involves and for helping policy makers take decisions at a macro level of analysis. Its strength is that Gal offers a full definition along with the components necessary to achieve statistical literacy. Analysing the statistical concepts related to this notion, however, requires more specific micro-level models (and a somewhat less exacting definition).

A second model, the Statistical Literacy Construct of Watson and Callingham (2003), builds on the Structure of Observed Learning Outcomes (SOLO) taxonomy developed by Biggs and Collis (1982) to organise statistical thinking into six hierarchical stages of skill, which can be viewed as a progression of levels of statistical understanding. As suggested by Callingham (2007), the boundaries between levels are not rigid. The strength of this model is that its statistical literacy scale has been widely validated by researchers, based on responses from a large number of students in Australia. At the top two levels of the Watson and Callingham (2003) construct, students display skills matching the critical-thinking skills of the third tier of the Statistical Literacy Hierarchy. This model of measuring statistical literacy was developed to address the lack of research proposing methods to measure students’ progress, despite statistical literacy being part of the school curriculum.

The differences between the Gal (2004) and Watson and Callingham (2003) models can partly be explained by the fact that Gal’s construct was developed for an adult population while that of Watson and Callingham was developed for students. These two frameworks are by no means the only ones. Wild and Pfannkuch (1999) proposed a model of statistical thinking in empirical enquiry, built upon the statistics education literature and interviews with statisticians and undergraduate students. Reading (2002) relies on the SOLO taxonomy across five areas of statistics[4] to build a “profile for statistical understanding”. This methodology, like the one developed by Jones et al. (2000)[5], is very similar to the hierarchical model of Watson and Callingham (2003).

As a framework, we adapt the taxonomy developed in Watson and Callingham (2003). The main methodological contribution of our paper is that it develops text mining methods to measure literacy and critical thinking based on articles from the RSS feeds of national newspapers. By relying on text mining techniques, our methodology mainly captures journalists and newspaper readers; it excludes the illiterate population and those without access to print or online media. Moreover, this paper contributes to the existing literature by targeting a population that is not limited to students. The original statistical literacy construct of Watson and Callingham (2003) involves six stages: Idiosyncratic, Informal, Inconsistent, Consistent Non-critical, Critical and Critical Mathematical. This taxonomy was developed for students in grades 5 to 10. Our interest is in an adult population and attention therefore falls on the top three levels of the taxonomy, as characterised in Table 1.

Table 1. Statistical literacy construct, adapted from Watson and Callingham (2003)

| Level | Brief characterization of levels |
|---|---|
| 1: Consistent Non-critical | Appropriate but non-critical engagement with context, multiple aspects of terminology usage. |
| 2: Critical | Critical, questioning engagement in contexts that do not involve proportional reasoning, but which do involve appropriate use of terminology. |
| 3: Critical Mathematical | Critical, questioning engagement with context, using proportional reasoning particularly in chance contexts, showing appreciation of the need for uncertainty in making predictions, and interpreting subtle aspects of language. |

*Existing data sources*

There are several existing data sources and indicators that were initially considered for the purpose of this study but not implemented. The main reasons were that they are reported infrequently and/or are not comparable across countries. These indicators and their limitations are discussed below.

A first indicator considered would have built on an occupational classification related to statistics, measuring the change in the *percentage of the population working as statisticians and in related occupations* that require similar skills. The data source would have been the Demographic and Health Surveys (DHS), which offer the advantage of being publicly available, dating back to the 1990s and being conducted in approximately 90% of International Development Association (IDA) borrowing countries every five years. However, mathematicians and statisticians often account for fewer than 10 respondents per survey, so changes in this figure are more likely driven by sampling variation than by a genuine shift in the number of mathematicians. It is therefore useful to widen the focus from statisticians to related occupations. This can be achieved with the O*NET database[6], which contains detailed information on career changers for 858 job categories, based on surveys of large representative samples of workers. For every job category, the database gives the top 10 related categories. To construct an indicator, one could have used the top 10 categories related to "Statisticians" together with the top 10 categories related to each of those, resulting in a list of 53 job categories. This approach, however, would have suffered from three limitations. First, although the indicator could have been calculated annually on a rolling basis, DHS are conducted only every five years, and only 20 countries use the International Standard Classification of Occupations (ISCO, 1998), making occupations very difficult to compare across countries. Second, there were general issues with using the indicator for advocacy purposes: it is not clear that a society benefits from more mathematicians at the expense of, say, teachers or doctors. Finally, there was a conceptual issue in that one would need to assume that having more mathematicians also makes the teacher and the doctor more numerate.

A second indicator considered would have relied on a global dataset on education achievement, measuring improvement in *primary and secondary school mathematics test scores* based on the Global Achievement Data (Angrist et al., 2013). This data source is a panel of 128 countries in 5-year intervals. It links the international achievement tests PISA and TIMSS with regional ones such as SACMEQ, PASEC and LLECE to make student achievement globally comparable. A clear limitation of such an indicator is that the dataset covers only 22 of the 77 IDA/Blend countries. In addition, for 2010 there is only one data point available: all IDA/Blend countries except the Kyrgyz Republic are missing. The 5-year intervals, the lack of IDA country coverage and the missing data for the 2010 round would have made this indicator insufficient for use as a logical framework indicator. Furthermore, the dataset was assembled as a one-off exercise and it is not clear whether data for 2015 will be available from the same source.

**3 Methodology**

A PARIS21-led Task Team has developed a first composite indicator to measure global statistical literacy. The indicator provides an indirect measure of the use of and critical engagement with statistics in the media using articles from daily RSS feeds of the top five national newspapers. The indicator has been updated annually since 2016.

*Data sources*

To measure statistical literacy empirically, we turn to references to statistics and statistical fallacies in national newspaper articles that are accessible online, adopting the scaling developed by Watson and Callingham (2003). We do so for three main reasons.

- First, and foremost, although there is some gap between journalists’ perception of statistics, reflected in the statistics reported in news articles, and the audience’s demand for statistics, journalists’ writing can serve as a proxy for a nation's demand for statistical facts as well as for the depth of critical analysis. In most parts of the world, it largely reflects the nation’s consumption of statistical facts and the level of critical analysis of statistics offered to a country’s population.
- Second, newspaper articles are generally available, most of them online, which makes them broadly representative of a country's literate population and easily accessible for text analysis.
- Lastly, alternative data sources are either not representative (e.g. Google Trends searches related to statistics; downloads of statistical software packages) or are reported infrequently and/or not comparable across countries (e.g. job categories related to statistics; regional numeracy assessments).

The indicator is a three-dimensional composite of the equally weighted percentages of national newspaper articles that contain references to statistics at statistical literacy levels 1, 2 or 3, respectively, following the scale defined in Table 1. The three levels are not mutually exclusive. For each of the three levels, we obtain, country by country, the share of documents that match the classification. An overall measure of statistical literacy is then obtained as the sum over the three shares. Specifically, the methodology classifies the keywords used in each article into literacy levels 1 to 3 based on three corresponding keyword lists, so that each level has a different denominator of newspaper articles (see below for a precise description of the keyword analysis). Each keyword list contains different terms referring to statistics and statistical fallacies, and the use of one category of keywords in an article determines the corresponding level of statistical literacy.
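The classification and scoring logic described above can be sketched in a few lines. The keyword lists and function names below are illustrative stand-ins, not the Task Team's actual implementation:

```python
# Illustrative sketch of the level-share computation; the keyword lists are
# toy examples, not the actual lists used for the indicator.
LEVEL_KEYWORDS = {
    1: {"population census", "household survey", "gdp"},
    2: {"reliable", "accurate", "valid"},
    3: {"sampling bias", "margin of error", "confidence interval"},
}

def level_matches(article_text, keywords):
    """Return True if the article contains any keyword for this level."""
    text = article_text.lower()
    return any(kw in text for kw in keywords)

def country_scores(articles_by_level):
    """articles_by_level maps level -> list of article texts; the pools may
    differ per level, as each level has its own denominator (see text)."""
    scores = {}
    for level, articles in articles_by_level.items():
        n = len(articles)
        hits = sum(level_matches(a, LEVEL_KEYWORDS[level]) for a in articles)
        scores[level] = 100.0 * hits / n if n else 0.0
    # Overall measure: sum of the three equally weighted percentages (0-300).
    scores["total"] = scores.get(1, 0) + scores.get(2, 0) + scores.get(3, 0)
    return scores
```

For instance, a country where half the general-news pool matches level 1 and all research-related articles match level 2 would score 50 + 100 for those two dimensions.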

The empirical instrument is still under construction and the preliminary results described here are helpful to improve the quality of measurement. To establish the validity of the measure, the classification of articles will be further validated by analysts at National Statistical Offices (NSOs).

*Text mining techniques*

This subsection summarises the keywords used in the analysis and the sources used to define them, and provides examples of keywords defined for each level of statistical literacy. Keywords are derived from major statistical data sources and refer to broad categories of indicators, based on standards internationally adopted by NSOs and international organisations, as well as books, articles and glossaries specialised in statistics and statistical fallacies (examples are the OECD Glossary of Statistical Terms or the Glossary of Statistical Terms by the University of California, Berkeley, for English keywords; or the Glossário Inglês-Português de Estatística for Portuguese keywords). The detailed list of keywords used in the analysis, the data sources and preliminary results are available in Appendix A.

The study further used the World Bank's World Development Indicators (WDI) database to extend the initial keyword list, and added a blacklist of keywords to disentangle ambiguous meanings of acronyms (such as IPC, which stands for both 'indice des prix à la consommation' and 'International Paralympic Committee'). The reliability and validity of the keyword lists will be further tested during the implementation of validity checks (see below).
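A blacklist of this kind can be implemented by discarding an acronym match whenever a blacklisted expansion co-occurs in the same article. A minimal sketch, with illustrative phrases only:

```python
# Illustrative acronym disambiguation: a match on 'IPC' is discarded when a
# blacklisted (non-statistical) expansion appears in the same article.
BLACKLIST = {
    "ipc": [
        "international paralympic committee",
        "comité international paralympique",
    ],
}

def is_statistical_match(acronym, article_text):
    """True if the acronym occurs and no blacklisted expansion co-occurs."""
    text = article_text.lower()
    if acronym.lower() not in text:
        return False
    return not any(phrase in text for phrase in BLACKLIST.get(acronym.lower(), []))
```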

**Note:** Keywords have been translated into all four languages used for the indicator. Text mining techniques, such as word stemming, were applied to all keyword lists and news articles before the analysis. For articles, stop words were removed and characters converted to lower case.
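The preprocessing in the note can be sketched as follows. The stop-word list and the suffix-stripping rule are deliberately crude stand-ins for a full per-language stemmer such as Snowball, used here only to illustrate the pipeline:

```python
import re

# Toy English stop-word list; the actual pipeline would use full per-language
# lists for English, French, Spanish and Portuguese.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "was"}

def naive_stem(token):
    """Crude suffix stripping as a stand-in for a real stemmer."""
    for suffix in ("ations", "ation", "ies", "ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Lower-case, tokenise, drop stop words, stem the remaining tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]
```

Applying the same preprocessing to keyword lists and articles ensures that, for example, 'survey' matches 'surveys' in either direction.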

Level 1: Consistent, Non-Critical Use of Statistics

**Data source:** Daily, top 5 news articles for publishers who

- have registered their RSS feeds with this service,
- publish in either English, French, Spanish or Portuguese, and
- use the country's top-level domain, e.g. '.sn' for Senegal, for their website.
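The top-level-domain criterion can be checked mechanically from a feed's URL. A small sketch; the ccTLD mapping below covers only two example countries and is not the full list used for the indicator:

```python
from urllib.parse import urlparse

# Illustrative ccTLD-to-country mapping; the real indicator covers all
# target countries.
CCTLD_COUNTRY = {"sn": "Senegal", "ke": "Kenya"}

def feed_country(feed_url):
    """Return the country implied by a feed URL's top-level domain, or None."""
    host = urlparse(feed_url).hostname or ""
    tld = host.rsplit(".", 1)[-1]
    return CCTLD_COUNTRY.get(tld)
```

Feeds whose domain maps to no target country (e.g. generic '.com' sites) are simply not attributed to any country.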

**Keywords:** articles are considered a good fit for this category if they contain words from one of the following lists:

__Keywords indicating data sources:__
- word sequences of length two, derived from the list of all NSO names worldwide
- main statistical data sources, such as 'population census', 'household survey', 'geospatial data', etc. (cf. Espey et al., 2015)

__Keywords indicating a statistical indicator:__
- GDP, CPI, etc., based on the World Development Indicator database’s ‘Economy and Growth’ category. This list is currently being extended using additional keywords from other categories.

__Keyword list from statistical capacity building projects__

**Example:** Level 1, consistent non-critical

__Sentence:__ “The report indicates tobacco use has increased since the Kenya *Demographic Health Survey* conducted in 2008-09, which found 19 per cent of men and 1.8 per cent of women use tobacco.”

__Source:__ The Star, Kenya

Levels 2 and 3: Critical engagement with Statistics

**Data source:** Daily, top 5 news articles from a Google News search for any of: 'statistics', 'data', 'study', 'research', 'report'. For publishers who

- have registered their RSS feeds with this service,
- publish in either English, French, Spanish or Portuguese, and
- use the country's top-level domain, e.g. '.sn' for Senegal, for their website.

**Keywords:** articles are considered a good fit for this category if they contain words from one of the following lists:

__Critical mathematical engagement:__ **List of statistical fallacies**, based on books, articles and websites that discuss statistical biases and fallacies.

__Critical non-mathematical engagement:__ **List of adjectives to assess the quality of research studies**, based on synonyms and antonyms for 'accuracy', 'reliability' and 'validity' (cf. Pierce, 2008).

**Examples:**

__Level 2, critical__

**Sentence:** “Dr Barres admits a definitive *scientific* conclusion for how these epigenetic changes affect the gene is not yet *scientifically* known.”

**Source:** Citizen Digital, Kenya

__Level 3, critical mathematical__

**Sentences:** “Without going to the details of the *statistics*, the final results found […]. *Sample sizes* were calculated at regional level in order to estimate global acute malnutrition with a desired precision of between 2-4 percent with a design effect of 1.5.”

**Source:** Daily News, Tanzania

*Limitations*

The data source has several limitations that are worth noting. First, and foremost, although our hierarchy of statistical thinking into three stages of skill (a progression of non-rigid levels of statistical understanding based on the SOLO taxonomy) builds on a scale that has been widely validated empirically as a measure of statistical literacy, the indicator measures a count of terms specifically referring to each level of literacy, whereas literacy should also be tested against the “appropriateness” of the terms used in context. The measure is therefore conditional on the assumption that statistical terms are appropriate for the context in which they are used. This assumption is essential for a fully automated process allowing a daily collection and analysis of newspaper articles.

Second, the current implementation is limited to the four most widely spoken languages globally (English, French, Spanish and Portuguese) and thereby ignores local languages. Extending the analysis would require software that supports word stemming and stop-word removal in these local languages. An initial analysis of newspaper coverage nevertheless reveals that the vast majority of countries have national newspapers available through RSS feeds and written in one or several of these four languages.

Third, newspapers and blogs are only a subset of national media. Radio and TV, however, cannot easily be captured in machine-readable format. Promising new tools, such as the radio analysis tool[7] developed by Pulse Lab Kampala and the United Nations in Uganda, may fill this gap in the coming years. Radio data could, for instance, serve as a robustness check on how the use of statistics in urban areas, which have access to (online) newspapers, differs from that in rural areas and among illiterate populations. Moreover, automated text analysis does not cover visualised data, such as graphics and tables, an important way of presenting statistics in news media.

Finally, while based on high-level glossaries and internationally acknowledged statistical data sources, the keyword lists used for the analysis are subjective.

**4 Results**

*Scope of the indicator*

The purpose of the indicator is to set and monitor targets and report on them annually. Target countries comprise all International Development Association (IDA) borrower countries, of which 65 countries were analysed this year.

A total of 125 000 articles published between 1 January 2018 and 31 December 2019 were analysed for the use of statistics, an average of 900 articles per country over the period.

The aggregate score for each country is the sum over the three dimensions and ranges from 0 to 300: it is a three-dimensional composite of the equally weighted percentages of national newspaper articles that contain references to statistics at statistical literacy levels 1, 2 or 3, respectively, following the scale defined in Table 1. For each of the three levels, the resulting score gives the percentage of articles containing at least one search term from the keyword lists defined previously; the score for each level thus ranges between 0 and 100 and the maximum total score over all three levels is 300. The results in Figure 1 are presented by language group to allow a direct comparison between countries to which the same keyword list was applied.

There are 892 general news articles (5.32 percent of all articles) that cite statistics (Level 1) and 2 492 research-related articles (12.89 percent of all articles) that demonstrate a critical engagement with statistics (Levels 2 and 3).

Results also suggest a high level of “basic” use of data in newspapers (Figure 2), with important differences in performance across regions. The statistical literacy scores also show a positive correlation with TIMSS international scores, suggesting a link between statistical literacy in schools and in the media.
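Such a correlation can be computed with a standard Pearson coefficient once the two series are aligned by country. A plain-Python sketch; the five country values below are invented for illustration and are not actual indicator or TIMSS results:

```python
import math

def pearson(xs, ys):
    """Plain-Python Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical literacy scores and TIMSS averages for five countries.
literacy = [12.0, 18.5, 25.0, 30.2, 41.0]
timss = [380, 410, 452, 470, 515]
```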

**5 Conclusion**

The results presented in this paper are preliminary. The following steps are currently being undertaken to improve the robustness and external validity of the indicator.

- Additional validity checks will be implemented to test the representativeness of the newspaper articles obtained through RSS feeds, country by country, using figures on average circulation (copies per day). Circulation figures will be applied to weight the contribution of each newspaper to the country-level score.

[1] A critical perspective towards statistics is promoted by numerous professional organisations, among them the National Council of Teachers of Mathematics (NCTM), and by national curriculum policies such as The New Zealand Curriculum (Ministry of Education of New Zealand, 2007). In particular, the International Statistical Institute has launched the *International Statistical Literacy Project*.

[2] According to the Oxford Dictionary of Statistical Terms, statistics is “the study of the collection, analysis, interpretation, presentation, and organization of data” (Dodge, 2003).

[3] This approach is also supported by Biggeri and Zuliani (1999) and Watson and Kelly (2003): “Statistics requires the basic understanding of statistical concepts…whereas literacy requires the ability to express that understanding in words, not in mathematical formulas.”

[4] Namely data collection, data tabulation and representation, data reduction, probability and interpretation and inference.

[5] Proposing four levels of thinking across four key constructs for young children’s thinking.

[6] O*NET is a labor market information tool intended to facilitate matches between jobseekers and employers in the United States.

[7] The United Nations initiative Pulse Lab Kampala is developing a tool to analyse radio content, currently being tested in Uganda. The tool involves the development of speech technology for three African languages (Ugandan English, Luganda and Acholi). For more information on this project, see http://radio.unglobalpulse.net/uganda/