*The many ways that people manipulate data, graphs, and other quantified information are astounding. To understand the statistics of market inclusion, it is important to consider them in the context of the underlying distribution. Do we want to shift the distribution to the right such that everyone moves one step over although the distance between the lowest and highest doesn’t change? Or do we want to reduce the gap between lowest and highest? Mathematically these changes arise from very different kinds of processes and therefore the outcome we desire will call for a very different strategy.*

If you haven’t heard the saying “lies, damned lies, and statistics,” then you haven’t been listening. The many ways that people manipulate data, graphs, and other quantified information are astounding. Different presenters can have an audience believing entirely different narratives, all backed up by the very same data. What magic is this? Nothing more than choosing the scale, the variable, or the pieces of the data that best illustrate a predetermined message. Data, it turns out, really can lie.

**The disappearing normal distribution**

This is especially true if the distribution of data is not a normal distribution or bell curve. A bell curve is what most people think about when they imagine how data might be distributed. Most of the data points fall near the middle, with only modest deviation on either side of this midpoint. Height is a good example of a bell-curve distribution. Though there is some variation, the average height is a pretty good representation of how tall people really are. We all fit in the same seats, sleep on the same-size beds, and fit through the same doorways.

Not all data follows this model. Imagine instead that height followed a distribution in which frequency keeps decreasing as values grow, known as a heavy-tailed or long-tailed distribution, like the one below. The average could still be somewhere between 5 and 6 feet, but what would it mean?

**The failure of the average**

In this particular example, most people would be less than 1 foot tall. Then, among the 1-footers, there would walk a few giants – people as tall as 50 feet – and a decreasing proportion of people that fit in between these heights. The average would still be about the same, but what it means is entirely different.

In this scenario there is no one size fits all. It would be impossible for a randomly selected group to be comfortable sitting at the same sized table. The enormous giants the height of multi-storey apartment buildings would have to live in a different sort of world than their underfoot counterparts.

If you were to report only the average for this kind of distribution, it would suggest something very different from the reality described above. If you assumed the average was ‘representative’ and built furniture for this ‘average’ size, for example, you would find that only a small fraction of people would actually find it the right size. For most people the furniture would be too high to climb into, and for others it would be too small to use. Your ‘average’-designed product would fail to attract sufficient customers.

Computing the standard deviation would provide some hints: it would be large, suggesting a wide spread. But to really figure out how to serve such a population, you would have to analyze the data in other ways rather than reducing it to the standard statistical measures designed for the bell curve.
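To make the thought experiment concrete, here is a minimal Python sketch. Both populations are invented for illustration, as is the 20% "fits the furniture" tolerance; neither comes from real data. It compares the mean, the median, the standard deviation, and the share of people who would fit furniture built for the average height.

```python
from statistics import mean, median, pstdev

def expand(counts):
    """Turn {height_in_feet: number_of_people} into a flat list of heights."""
    return [height for height, n in counts.items() for _ in range(n)]

def summarize(heights, tolerance=0.2):
    """Summary statistics plus the share of people within `tolerance` of the mean."""
    avg = mean(heights)
    fits = sum(1 for h in heights if abs(h - avg) <= tolerance * avg)
    return {
        "mean": avg,
        "median": median(heights),
        "stdev": pstdev(heights),
        "share_fitting": fits / len(heights),
    }

# Hypothetical populations of 1,000 people each (made-up numbers).
bell_curve = expand({5.0: 50, 5.3: 200, 5.6: 500, 5.9: 200, 6.2: 50})
long_tail = expand({0.5: 650, 2.0: 120, 5.0: 40, 10.0: 100, 30.0: 60, 50.0: 30})

for name, heights in [("bell curve", bell_curve), ("long tail", long_tail)]:
    print(name, summarize(heights))
```

In this made-up long-tail population the mean is a little over 5 feet, yet the median is half a foot and only 4% of people are within 20% of the average. In the bell-curve population everyone is. The standard deviation of the long tail is about twice its mean, which is the hint that the average is not telling the whole story.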

**Markets follow heavy tails**

Why should we care about this type of distribution and modify our use of statistics?

Because unlike height, which follows a bell curve, many aspects of human society are distributed as long tails virtually all over the world: incomes, wealth, product demand, employee productivity, usage of product features, and various others.

Why is this so? Because long tails tend to emerge naturally for features embedded in interconnected systems. For a dataset to follow a normal distribution, each element must be independent of the others. In the case of income, for example, each person’s income would have to be independent of everyone else’s. This means that it could not depend in any way on the market, which represents our collective income. In our interconnected monetary system this can never be the case.

Similarly, product demand will distribute as a long tail, and probably your company’s branch performance as well. If you are a bank or microfinance institution (MFI), or have some other kind of distributed operation, plot the performance across your locations and you will most likely find not a normal bell curve but a long tail.
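If you want to try this on your own locations, a rough text histogram is enough to see the shape. The sketch below uses only the Python standard library. The `branch_loans` figures are hypothetical, and `text_histogram` and `top_share` are illustrative helper names, not part of any library.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Return one line per bin: the bin's lower edge and a bar of '#' marks."""
    bins = Counter(int(v // bin_width) * bin_width for v in values)
    lines = []
    for edge in range(0, max(bins) + bin_width, bin_width):
        lines.append(f"{edge:>4} | {'#' * bins[edge]}")
    return lines

def top_share(values, fraction=0.1):
    """Share of the total contributed by the top `fraction` of locations."""
    k = max(1, round(len(values) * fraction))
    return sum(sorted(values, reverse=True)[:k]) / sum(values)

# Hypothetical monthly loan counts for 20 branches (made-up numbers).
branch_loans = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35,
                38, 41, 48, 55, 63, 80, 95, 140, 210, 390]

print("\n".join(text_histogram(branch_loans, bin_width=50)))
print(f"Top 10% of branches: {top_share(branch_loans):.0%} of all loans")
```

With these invented numbers, 13 of the 20 branches land in the lowest bin while the top two branches account for roughly 43% of all loans. That is the long-tail signature to look for in real data.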

**The agenda of inclusion**

When we consider the agenda of inclusion, we have to consider it in the context of this distribution. A rise in the average income, for instance, could simply mean that the tail is getting longer and a few more billionaires have emerged while the rest of the population remains where it was. The same goes for metrics such as average expenditure on certain products or services. Whatever the situation, it is important to know: Do we want to shift the distribution to the right such that everyone moves one step over although the distance between the lowest and highest doesn’t change? Or do we want to reduce the gap between lowest and highest? Mathematically these changes arise from very different kinds of processes and therefore the outcome we desire will call for a very different strategy.
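A small sketch can show how differently these outcomes appear in the numbers. The ten incomes below are hypothetical, and the three transformations are just one simple way to model "everyone moves over," "the tail gets longer," and "the gap narrows." They are not a claim about how real economies change.

```python
from statistics import mean, median

def describe(incomes):
    """Mean, median, and the gap between the highest and lowest income."""
    return {
        "mean": mean(incomes),
        "median": median(incomes),
        "gap": max(incomes) - min(incomes),
    }

# Hypothetical incomes in arbitrary units, sorted from lowest to highest.
incomes = [1, 1, 2, 2, 3, 4, 6, 10, 20, 51]

# Everyone moves one step over: add the same amount to every income.
shifted = [x + 2 for x in incomes]

# The tail gets longer: only the top earner gains.
longer_tail = incomes[:-1] + [incomes[-1] + 20]

# The gap narrows: every income moves halfway toward the current mean.
avg = mean(incomes)
narrowed = [avg + 0.5 * (x - avg) for x in incomes]

for name, data in [("original", incomes), ("shifted", shifted),
                   ("longer tail", longer_tail), ("narrowed", narrowed)]:
    print(f"{name:>12}: {describe(data)}")
```

In this toy example the shifted and longer-tail scenarios both raise the mean from 10 to 12. Only the median and the gap tell them apart: the shift lifts the median from 3.5 to 5.5 and leaves the gap at 50, while the longer tail leaves the median at 3.5 and widens the gap to 70. Narrowing leaves the mean at 10 but halves the gap to 25.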

**Next time you see an average number in your company, ask to see the distribution of the data!**