What’s Bias Got To Do With It? Part I

James Bailey
Published in DSAi
5 min read · May 17, 2021


Image courtesy of Gerd Altmann, Pixabay

Performing Without Bias in Data Analytics.

As we all know, cognitive bias exists in almost every social interaction, including data science projects. We are in the midst of a public health emergency and a slow-moving economic disaster; combine this with a dash of social upheaval and cognitive bias becomes rife in areas where objective scientific results usually prevail. We are inundated with ‘fake news’ that is increasingly indiscernible from the real thing, real statistics delivered out of context, and social prejudice against those who try to see the wood for the trees. So, in the office, we must be conscious of this environment and shape our messaging around the emotional insight we have into our target audience. This is not to say that we must manipulate the message to push a particular agenda; instead, we should steer the audience away from their preconceptions and present balanced evidence for an educated decision. Doing the former subscribes us to the chaos which threatens the legitimacy of scientific findings, whereas the latter strengthens the current knowledge base and builds on it to progress our organisations. It also prevents our own bias from clouding our communications and helps us identify our target audience’s cognitive biases.

Below I will outline some of the types of bias I find most interesting, and ways to combat them if you see them in your projects. We will start with my personal favourite: anchoring.

Framing bias or anchoring

As defined by the Corporate Finance Institute, framing bias occurs when people pass judgement based on the way information is presented, rather than on the facts themselves. This type of bias can be seen not only in data science projects, but in almost every sales environment. Consider earnings results: it is much easier to digest an improvement quarter-on-quarter than a shortfall against expectations! Your judgement of the earnings is framed by the previous quarter, or by the expectation, rather than by the figures in isolation.

Back in our data science team, we may see this phenomenon play out in the form of larger versus smaller risks. When escalating issues, either with existing models or through new insights, it is important that we apply the right level of urgency and importance to each finding. This can be challenging when one issue clearly overshadows another: for example, a customer data integrity issue versus a regulatory reporting data integrity issue that may lead to regulatory breaches. The relative scale of the two often leads managers to triage by likelihood and scale of immediate impact. Servicing the regulatory data integrity issue (low likelihood, high impact) will avoid immediate punishment, but if the customer data integrity issue (low likelihood, low impact) is not addressed in time, the situation could deteriorate quickly. Poor customer data may lead to lower sales, poor customer experience, or even financial remediation for malpractice. This may eventually amount to a fall in profits greater than the immediate pecuniary fine that would be received for a regulatory breach. The solution is to voice the concern and illustrate the impact over time. While it is easy to imagine immediate impact, it is difficult to play the scenario out and imagine a ‘net present value’ of an investment in labour. If the final decision is made to concentrate resources on only one of these issues while neglecting the other, then at least the data scientist has made themselves clear using the evidence they are more connected to than anyone else in the business: data. This of course screams for continuous participation in, and review of, stress testing programs, but I will save that for another conversation.
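To make the ‘impact over time’ argument concrete, here is a minimal sketch in Python (using NumPy and entirely hypothetical figures, since the article quotes no numbers) comparing the cumulative, discounted cost of the two neglected issues. The point where the slow-burning customer data issue overtakes the one-off regulatory fine is the point worth showing your manager:

```python
import numpy as np

# Hypothetical figures purely for illustration; none of these numbers come from the article.
regulatory_fine = 500_000        # one-off fine paid immediately if the regulatory issue is ignored
monthly_customer_loss = 40_000   # recurring monthly loss if the customer data issue is ignored
annual_discount_rate = 0.05      # used to express future losses as a rough 'net present value'

months = np.arange(25)                                    # 0..24, a two-year horizon
discount = (1 + annual_discount_rate) ** (months / 12.0)  # discount factor for each month

# Cumulative, discounted cost of each neglected issue over time.
regulatory_cost = np.full(len(months), float(regulatory_fine))  # fine hits in full at month 0
customer_cost = np.cumsum(monthly_customer_loss / discount)     # small losses that keep stacking up

for m in (6, 12, 24):
    print(f"Month {m:2d}: regulatory issue ~{regulatory_cost[m]:>9,.0f} "
          f"| customer issue ~{customer_cost[m]:>9,.0f}")
```

Tabulated or plotted over a longer horizon, this kind of simple projection is often enough to reframe a ‘low impact’ issue for a time-poor manager.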

Normalcy bias

Normalcy bias makes issues seem insignificant by framing them within the history of the process in which they exist. One of my pet peeves when working on a data project is a subject matter expert telling me ‘that has always been there’ or ‘we have always done it this way’. This is the corporate equivalent of the famous high school excuse ‘it was like that when I got here’. I believe that if someone is affected by an issue with a dataset or a model, and is not directly working on fixing it, then they are as accountable as the people who created the issue in the first place. The ‘new normal’ does not extend to some immutable blip in a critical data set or the mistreatment of a feature in a model’s reporting output. The reason I am so passionate about this is that a lack of documentation, or a reluctance to document issues, is affecting my role more and more. It rests on a fundamental logical fallacy: everyone affected by the issue believes it to be temporary, and so does nothing about it, to preserve energy better spent by a dedicated taskforce. However, a lack of documentation leads to the proliferation of poor reporting by those who are unaware of the nuance in the data or model, and things are only temporary if a project is initiated to fix them. The result, of course, is that the issue often remains ‘temporary’ for years until a brave analyst decides to raise and remediate it once and for all.

In summary, I have made it my mission not to excuse any issue, no matter how small, as business as usual. Enabling excuses is akin to accepting mediocrity and is a hindrance to the growth of your organisation through data-driven decision making. It is the opposite of promoting accountability and instead promotes poor quality insights caused by fundamental bugs that could be an easy fix. I’m sure we can all agree that when a dashboard is presented with lots of exciting visualisations, it’s tempting to peel back the covers and marvel at the data scientist’s work. If that work contains a script that is longer in its treatment of data integrity issues than in the development of the dashboard’s original purpose, then it is disappointing at best and disheartening at worst. Each of us strives to leave our data environment in a better state than we found it, so the next generation of data scientists can innovate and grow the firm. However, teaching others that data or model integrity issues are a way of life is not conducive to an encouraging environment.

References:

“5 tips for identifying — and avoiding — cognitive bias during a crisis”, Julie Wright, Ragan’s PR Daily, 2020, <https://www.prdaily.com/5-tips-for-identifying-and-avoiding-cognitive-bias-during-a-crisis/>

“A. N. C. — It Matters Now More than Ever”, Grant Wright, Wright On Communications, 2020, <https://wrightoncomm.com/aviate-navigate-communicate-matters-now-more-than-ever/>

“What is Framing Bias?”, Corporate Finance Institute, 2021, <https://corporatefinanceinstitute.com/resources/knowledge/trading-investing/framing-bias/>

