Published in May-Jun 2021

Connecting Conflicting Data Points

Ans Khurram on rationalising conflicting sources of data.

Hootsuite, a global social media management platform used by 80% of Fortune 100 companies, and We Are Social, a leading creative agency, jointly release an annual digital report at the start of each year that is widely read in the industry. The report draws on multiple sources, including state regulatory authorities, social media platforms and global agencies (such as the UN), to present worldwide findings on social media and digital trends. It includes metrics such as the percentage of the population that uses mobile phones or the internet, as well as estimates of how many people use specific social media platforms.

The Digital in Pakistan: 2019 report stated that only 22% of Pakistanis used the internet, a figure lower than the 31% reported by the Pakistan Telecommunication Authority (PTA) for the same period. PTA disputed the report in an official statement: “Contrary to facts, the report depicts much lower figures of internet users and penetration (mobile and fixed) in Pakistan”. Following this statement, the Digital in Pakistan: 2020 report published numbers that were closer to the PTA data. However, the 2021 report lowered the number of internet users yet again, citing changes in its data sources and cautioning that its figures are not comparable with those of the previous two reports. Digital in Pakistan: 2021 estimates that 27.5% of the population uses the internet, while PTA puts the figure at 46%.

Why is there such a sizeable gap between these two sources? Which figure should a casual observer trust, and how should we interpret such seemingly disparate numbers? As the world moves towards data-driven decision-making, individuals and organisations face this dilemma daily. Is there any source one could consider a ‘single source of truth’? Below are a few tips on how to reconcile multiple sources of information, especially when the data conflicts.

Focus on the Trends and Establish Benchmarks

When figures differ, the true value is likely to lie somewhere between them. Digital in Pakistan 2021 focuses on individuals; it lists its sources and explains which metrics could be counting people twice. PTA, however, does not define who the ‘subscribers’ in its data are (they are in fact connections, not people). Given that an average Pakistani user has between 1.4 and 1.9 connections, PTA data may be counting the same person more than once. Personally, I always focus on the trends within different sources: if the trends correspond, they reinforce the evidence for a particular behaviour. Both Digital in Pakistan 2021 and the PTA highlight the growing use of the internet in Pakistan; you cannot increase users without also increasing internet connections. It is important to note that neither data source is incorrect; they offer two different approaches to estimating internet penetration in Pakistan. It is easier to measure the number of connections than to arrive at the number of ‘unique users’, as the latter requires market context and nuance.
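To see how a connection count relates to unique users, the arithmetic can be sketched in a few lines of Python. This is a minimal sketch under two assumptions of ours: that PTA’s 46% can be read as connections per 100 people, and that dividing by the 1.4 to 1.9 connections per user cited above removes the double counting. The function name `unique_user_range` is hypothetical.

```python
# Minimal sketch: converting a connection-based penetration figure into a
# rough range for unique-user penetration.
# ASSUMPTION: the connection figure is connections per 100 people, and
# dividing by connections per user removes double counting.

def unique_user_range(connection_penetration, min_conns_per_user, max_conns_per_user):
    """Return (low, high) unique-user penetration estimates, in percent."""
    # More connections per user means fewer distinct people behind the same count,
    # so the highest connections-per-user value gives the lowest estimate.
    low = connection_penetration / max_conns_per_user
    high = connection_penetration / min_conns_per_user
    return round(low, 1), round(high, 1)

# PTA's 46%, with 1.4 to 1.9 connections per user as cited in the article.
low, high = unique_user_range(46.0, 1.4, 1.9)
print(f"Implied unique users: {low}% to {high}%")
```

Under these assumptions the implied range is roughly 24% to 33%, and the 27.5% in Digital in Pakistan 2021 falls inside it. That is consistent with the two sources measuring different things rather than one of them being wrong, though it does not show which estimate is closer to reality.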

Are the Metrics Offering an Apples-to-Apples Comparison?

Always make sure you read the definitions of the metrics in question. A digital product manager once raised this query: “Our internal numbers show that users spend an average of 10 minutes on our app per month, but our third-party data platform reports the corresponding metric to be about five minutes.” It turned out the product manager was quoting the platform’s average time per session, not per month, and the platform reported the number of sessions per month as a separate KPI. Lo and behold, when the two metrics were multiplied, the external platform also showed average usage of about 10 minutes per month.

The above example may sound obvious, but in other cases a mismatch can be far harder to spot. A big challenge for organisations is comparing internal sales data with retail measurement studies conducted by external market research partners. Geographic boundaries usually differ, as third-party reports follow government-defined boundaries while organisations use internally defined geographies based on business needs. Another issue is that internal sales cover both physical and electronic channels, whereas retail measurement studies in Pakistan (as the name suggests) often focus solely on physical channels.
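The product manager’s mix-up comes down to comparing figures with different units, which a few lines of Python make explicit. In this sketch the two sessions per month is an assumption implied by the 10 and five minute figures in the example, not a number the platform is quoted as reporting; `minutes_per_month` is a hypothetical helper.

```python
# Minimal sketch of the per-session vs per-month mismatch.

def minutes_per_month(minutes_per_session, sessions_per_month):
    # Monthly usage = average time per session x sessions in the month.
    return minutes_per_session * sessions_per_month

internal_minutes_per_month = 10.0   # internal figure from the example
platform_minutes_per_session = 5.0  # the platform's figure, which is per session
platform_sessions_per_month = 2.0   # ASSUMPTION: implied by 10 / 5, not stated in the example

platform_monthly = minutes_per_month(platform_minutes_per_session, platform_sessions_per_month)
print(platform_monthly)
print(platform_monthly == internal_minutes_per_month)
```

Keeping the unit in each variable name (per session, per month) is a cheap safeguard against comparing two differently defined metrics directly.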

Different Sources Cover Each Other’s Blind Spots and Help Build a User Journey

Every data source has blind spots due to differing methodologies and inputs; for example, internal sales numbers do not reveal competitors’ performance, while market research has limitations because it is based on a sample. These gaps are no reason to disregard any one source; rather, efforts should be made to rationalise them. Given that the majority of sales are made via physical channels, retail measurement studies can provide important insights. Additionally, brand shares derived from surveys of online shoppers can serve as a proxy for electronic channels.

I like combining internal KPIs with external data sources to create a ‘user journey’. You can identify an addressable market by consulting government databases or reports from trade associations. Market research can then help you understand levels of awareness and consideration for your product and identify opportunity areas. Meanwhile, you can leverage your internal data to increase customer stickiness. Although the telecom industry’s standard definition of an active customer is one with at least one activity in 90 days, cellular companies grow by challenging this definition: they aim to engage customers by converting them into monthly, weekly and even daily active users.
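How much the choice of window changes the ‘active customer’ count can be illustrated with a small Python sketch. The activity log and customer IDs are hypothetical, and “activity” is simply taken to mean any dated event on the account; operators’ real definitions of qualifying activity vary. The 90-day window mirrors the industry standard mentioned above, while the 30, 7 and 1 day windows stand in for monthly, weekly and daily activity.

```python
from datetime import date, timedelta

# HYPOTHETICAL activity log: customer ID -> dates with any recorded activity.
activity = {
    "A": [date(2021, 6, 30)],
    "B": [date(2021, 6, 25)],
    "C": [date(2021, 6, 10)],
    "D": [date(2021, 4, 15)],
    "E": [date(2021, 1, 5)],
}
as_of = date(2021, 6, 30)

def active_customers(log, as_of, window_days):
    """Customers with at least one activity in the window_days days ending on as_of."""
    start = as_of - timedelta(days=window_days - 1)  # window includes as_of itself
    return {cid for cid, days in log.items() if any(start <= d <= as_of for d in days)}

windows = [(90, "90-day (industry standard)"), (30, "monthly"), (7, "weekly"), (1, "daily")]
for days, label in windows:
    print(f"{label}: {len(active_customers(activity, as_of, days))} active")
```

On this toy log the same customer base shrinks from four active customers at 90 days to one at a daily window; that gap is what an operator tries to close by moving customers towards more frequent use.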

Allow for Multiple Sources of Truth

As a rule of thumb, it is advisable to use at least two or three indicators to measure a goal. The Human Development Index (HDI), developed by Pakistani economist Mahbub-ul-Haq, uses metrics from health, education and the economy to measure a country’s development. One can argue that ‘development’ is subjective and that the index leaves out many other issues; however, it offers a more rounded perspective than a single metric such as GDP. Combining different sources provides a more holistic (though never perfect) picture, which is why the UN Development Programme uses the HDI as a measure.
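For readers curious how the HDI folds several indicators into one number: in the method the UNDP has used since 2010, each dimension is first rescaled to a 0 to 1 index against fixed minimum and maximum goalposts, and the three indices are then combined with a geometric mean. The formulas below are a simplified summary; the published technical notes add details such as the logarithm applied to income and the two sub-indicators used for education.

```latex
I_d = \frac{x_d - \min_d}{\max_d - \min_d}, \qquad d \in \{\text{health}, \text{education}, \text{income}\}

\mathrm{HDI} = \left( I_{\text{health}} \cdot I_{\text{education}} \cdot I_{\text{income}} \right)^{1/3}
```

Because it is a geometric mean, a weak score in one dimension cannot be fully offset by a strong score in another.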

Data is usually siloed by different teams and tools, and unifying it is a tricky process. A consumer goods company may learn from its internal sales reports that sales are dropping, but to find out why, it will need feedback from its customers. Only by connecting data from two different sources (internal sales and customer sentiment) will it get the complete picture. Behaviours and trends can only be understood by combining disparate data points; the big picture is always the result of ‘multiple sources of truth’.
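At its simplest, connecting internal sales with customer sentiment is a join on a shared key such as the month. The sketch below uses plain Python dictionaries; the figures, the month keys and the sentiment measure (share of positive feedback) are all hypothetical, and a real pipeline would first need to align definitions such as geography and channel before joining.

```python
# HYPOTHETICAL monthly figures from two silos; numbers are illustrative only.
sales_by_month = {"2021-01": 1200, "2021-02": 1150, "2021-03": 980}
sentiment_by_month = {"2021-01": 0.62, "2021-02": 0.58, "2021-03": 0.41}  # share of positive feedback

def join_sources(sales, sentiment):
    """Inner join on month: keep only months present in both sources, in order."""
    months = sorted(set(sales) & set(sentiment))
    return [(m, sales[m], sentiment[m]) for m in months]

def flag_joint_declines(rows):
    """Months where both sales and sentiment fell versus the previous month."""
    flagged = []
    for (_, prev_sales, prev_sent), (month, cur_sales, cur_sent) in zip(rows, rows[1:]):
        if cur_sales < prev_sales and cur_sent < prev_sent:
            flagged.append(month)
    return flagged

combined = join_sources(sales_by_month, sentiment_by_month)
for month, sales, sentiment in combined:
    print(month, sales, sentiment)
print("Sales and sentiment both down in:", flag_joint_declines(combined))
```

Months where both series fall together are a prompt for further investigation rather than proof that sentiment caused the drop in sales.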

Ans Khurram is an insights professional working in the telecommunications industry in Pakistan.