What do the Bollywood film Swades, the Oscar winning Spotlight and the Pakistani hit Waar have in common? They all have a rating of 8.0 on the Internet Movie Database (IMDb), a website that aggregates user reviews and ratings to determine the quality of a film. If Swades, Spotlight and Waar have the same IMDb rating, does it mean they are equally good? Any person will tell you this is subjective and dependent on personal tastes and preferences; once each individual factors in their own inclinations, they can rank the three movies in any way they like.
Moreover, why is it generally considered that a film with an IMDb rating greater than or equal to 7.0 is good? Or, to use another analogy, why is an average of 50 runs per innings considered to be exceptional for a batsman? The answer is that we subconsciously study similar data points to help us establish benchmark values for poor, average or good. Once we establish these references, we then apply our own experience and preferences to contextualise statistical metrics – and in the case of films, we either rely on word of mouth or watch them.
These are basic examples of how we knowingly and unknowingly leverage statistics to derive meaningful conclusions. The ability to derive actionable insights from a set of numbers is quickly becoming a critical skill in the fourth industrial revolution, and big data, advanced analytics and AI have become buzzwords. Given that upcoming systems will be built on these elements, the nature of work today requires continuous learning and the ability to respond appropriately to new information. Today, we need to speak the language of ‘data’ fluently.
Yet, despite this need, the industry is struggling to become ‘data literate’. Gartner, a leading research and advisory firm, surveyed over 250 top-level data and analytics individuals from across the world and concluded that poor data literacy was the biggest obstacle in creating a data-driven culture. Despite these concerns, creating a data literacy programme was only the 12th most critical success factor for these top-level analysts. To be considered data literate, one has to understand, firstly, how data is sourced; secondly, what analysis was run on it; and thirdly, how to contextualise data to create value.
Data fluency requires an exact understanding of what is measured. In the case of IMDb, we understand the overall score is an average of the amalgamated individual ratings. You may feel that Bollywood films have ‘inflated’ ratings compared to their English counterparts and there is a certain truth to this, as South Asian countries tend to use the extreme ends of scales for their ratings (think of how we give a ‘full’ 5 stars for a ‘normal’ ride on Careem). The fact that we either overlook or misunderstand the definition of a metric is the biggest reason why employees struggle to contextualise data which they do not come across on a daily basis.
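To make the point about averages concrete, here is a minimal Python sketch. The two lists of ratings are invented for illustration and are not real IMDb data, and the score is assumed to be a simple mean of individual ratings. Both hypothetical films end up with the same average, yet one gets there through broad agreement and the other through a pile of extreme scores.

```python
from statistics import mean, pstdev

# Hypothetical ratings on a 1-10 scale (illustrative only, not real IMDb data).
consensus_film = [8, 8, 9, 7, 8, 8, 9, 7, 8, 8]        # most viewers roughly agree
polarised_film = [10, 10, 10, 10, 10, 10, 10, 7, 2, 1]  # mostly 'full marks', a few very low


def extreme_share(ratings, low=2, high=9):
    """Share of ratings at the extreme ends of the scale (<= low or >= high)."""
    extremes = [r for r in ratings if r <= low or r >= high]
    return len(extremes) / len(ratings)


for name, ratings in [("consensus", consensus_film), ("polarised", polarised_film)]:
    print(
        name,
        "mean:", mean(ratings),
        "spread:", round(pstdev(ratings), 2),
        "extreme share:", extreme_share(ratings),
    )
```

The headline score is identical for both films; only the spread and the share of extreme ratings reveal how differently the audiences behaved, which is why it helps to know what sits behind the metric.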
A good example would be to define something more complex and subjective – for example, literacy. According to the latest Pakistan Economic Survey, the national literacy rate has increased from 44% in 1998 to 62% in 2018. Literacy is defined as the “ability to read and understand simple text in any language from a newspaper or magazine, write a simple letter and perform basic mathematical calculations (counting and addition/subtraction).” However, the Annual Status of Education Report 2018 has pointed out that only 65% of Grade 5 students in urban areas and 53% in rural areas are able to do two-digit divisions.
I wonder if the readers of this magazine would set the bar for their children being literate at understanding simple text and being able to count? The former head of the Higher Education Commission, Dr Atta-ur-Rehman and physicist Dr Pervez Hoodbhoy agree on the need for a more comprehensive definition. While the literacy rate is important, it is equally important to measure the quality of the education, based on cognitive skills such as creativity and reasoning. The purpose of this example is not to point out whether the government is right or wrong, rather to illustrate that statistical metrics have limitations due to their definitions.
It is important to remember that although numbers never lie, they can mislead. Statistics can be exploited to validate erroneous claims. Boris Johnson’s famous pro-Brexit claim about the UK paying the EU “£350 million a week” was deemed a “clear misuse of official statistics” by the UK Statistics Authority, because it failed to take into account the rebate the UK receives on this amount and the impact of the EU on Britain’s economy. It shows how easy it is to create a sensationalist headline, in this case by stating that the £350 million could be redirected to the UK’s health service instead.
Likewise, organisations should be wary of the statistical metrics they set as targets. According to the State Bank of Pakistan, of the 39.7 million registered branchless bank accounts, only 55% are active (i.e. at least one transaction was conducted in six months, although you may argue this should be made more stringent). A big reason for this inactivity was the fact that agents earned a commission for each registration, which encouraged them to open accounts simply to maximise their earnings. This is an example of Goodhart’s Law, which says: “When a measure becomes a target, it ceases to be a good measure.” People tend to exploit any loopholes to meet their targets, sometimes to the detriment of an organisation.
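The arithmetic behind the branchless banking figures, and the effect of tightening the definition of ‘active’, can be sketched in a few lines of Python. The 39.7 million and 55% figures come from the paragraph above; the sample of accounts and the 30-day window are hypothetical and chosen only to show how the same data produces a different ‘active’ rate once the definition changes.

```python
# Figures quoted in the article.
REGISTERED_MILLIONS = 39.7
ACTIVE_SHARE = 0.55

active_millions = REGISTERED_MILLIONS * ACTIVE_SHARE
inactive_millions = REGISTERED_MILLIONS - active_millions
print(f"active: {active_millions:.1f}m, inactive: {inactive_millions:.1f}m")

# Hypothetical sample: days since each account's last transaction
# (invented for illustration, not State Bank of Pakistan data).
days_since_last_txn = [3, 12, 25, 40, 75, 110, 150, 170, 200, 400]


def active_rate(days, window_days):
    """Share of accounts with at least one transaction in the last window_days."""
    return sum(d <= window_days for d in days) / len(days)


# Same accounts, two definitions of 'active'.
six_month_rate = active_rate(days_since_last_txn, window_days=180)
one_month_rate = active_rate(days_since_last_txn, window_days=30)
print(f"six-month definition: {six_month_rate:.0%}")
print(f"one-month definition: {one_month_rate:.0%}")
```

Nothing about the accounts changes between the two calls; only the definition does. A target set on registrations, or on a loosely defined ‘active’ rate, can therefore look healthy while actual usage stays low.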
Freakonomics is a podcast focusing on socioeconomic issues hosted by Stephen Dubner, co-author of the bestselling book of the same name. In a recent episode, a survey revealed that only two to four percent of the podcast’s audience used trigonometry, geometry or calculus on a daily basis while the corresponding figure for using MS Excel or Google Spreadsheets was approximately 70%. The host summed up the results by saying “it’s embarrassing that we teach a math curriculum that nobody pretty much is using.” I feel this applies to Pakistan, especially in the professional setup, where the proportion of regular geometry or calculus use compared to Excel would be similar to the results of the podcast survey.
The level of data fluency required varies by function. For example, marketing teams may use data to gauge the effectiveness of their communication while HR will focus more on employee engagement metrics. Gartner expects that by 2020, 50% of organisations will lack sufficient AI and data literacy skills to achieve business value, despite the fact that 80% of these organisations would have initiated competency development in data literacy in order to overcome extreme deficiencies. Considering how organisations are harnessing the power of analytics, poor data literacy will soon become an inhibitor to growth.
Sixty-five percent of Freakonomics listeners said they wished they had learned more about how to analyse and interpret data in order to discover hidden insights. Innocuous statistical concepts like a proportion or average form the foundation for an overwhelming majority of our everyday analysis. The first step is to begin working towards an understanding of what exactly is being measured by a metric.
Ans Khurram is an insights professional working in the telecommunication industry in Pakistan. firstname.lastname@example.org