Data Myths: 4 – A data scientist can tell you how much it will cost

Oh how I wish this one were actually true!

Asking data scientists to estimate the cost of a project is like asking a bricklayer to price a house from his bricks-per-hour rate, when in reality you’ve got a plot of jungle, a sketch on a napkin and all the bricks are hidden.

Here’s the way this myth works its way into our thinking:  Ola has identified an opportunity to improve company efficiency by using data better.  She needs to present a project to her line management in order to get budget.  The budget holder wants to know how much the project will cost and what the business value will be.  So Ola contacts DataCo and asks them how much the project will cost.   

DataCo say they need more detail about what “the Project” is.  Ola describes what she can, but she can’t answer their technical questions about the systems, databases or infrastructure.  Really, there needs to be an initial exploratory project in which a specialist gets into the data and tries it out to flush out the real issues.  However, until she can present a number to the budget holder she can’t release any funds for fact-finding work.  Eventually she persuades them to give her some indicative costs from previous projects and she puts together a business case.  So a completely fictitious number gets submitted to the budget holder, and from that point on everyone is going to be disappointed!

This is because a data scientist cannot tell you how much it will cost.

Surely any professional should be able to estimate their costs?  Absolutely not, I’m afraid.  Estimating cost is an art as much as a science.  Sure, a bricklayer can tell you how many bricks he lays per hour, but that only applies if he has an apprentice mixing the cement and carrying the bricks, and you’ve got drawings for the house.

In my experience there are four big elephants that have to be squeezed into the same room: 

  1. The unknown unknowns: If this is the first time you are ever doing a particular thing (and for most data integration projects it is), then it’s basically research.  There will be loads of things that need to be sorted out.  These take time, and time means money.
  2. The politics: This distorts our ability to be objective about point 1.  Without facts, estimating the cost is largely guesswork.  Most people prefer to assume that their data will be in great condition rather than price in the cost of fixing it if it’s not.  Someone may have committed to a budget a while ago and is under pressure to deliver something.  A small company may be very keen to win new work.
  3. Human tendencies: In my experience, very good technical people do not like talking about costs.  When they do give costs, they tend to underestimate them and leave out the time spent in meetings and the other disciplines required to support them.
  4. Defining “done”: The cost of getting something to a prototype is 10 times lower than the cost of putting it into production.  Validation, training and transitioning something into support take a lot of time and effort.
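
To see how the last two points compound, here is a rough sketch in Python. The only figure taken from the article is the 10x gap between prototype and production; the function name `production_estimate`, the 25% overhead allowance and the 20,000 prototype cost are hypothetical values chosen purely for illustration.

```python
def production_estimate(prototype_cost, production_multiplier=10.0, overhead_fraction=0.0):
    """Scale a prototype cost up to a rough production cost.

    production_multiplier: the article's rule of thumb that a prototype
        costs about 10 times less than a production system.
    overhead_fraction: hypothetical allowance for meetings and supporting
        disciplines, as a fraction of the technical cost (an assumption).
    """
    if prototype_cost < 0 or production_multiplier < 1 or overhead_fraction < 0:
        raise ValueError("costs and factors must be non-negative, multiplier >= 1")
    technical_cost = prototype_cost * production_multiplier
    return technical_cost * (1 + overhead_fraction)


# Hypothetical numbers: a prototype quoted at 20,000 with 25% overhead.
prototype = 20_000
print(production_estimate(prototype))                          # technical cost only
print(production_estimate(prototype, overhead_fraction=0.25))  # with overhead added
```

The point is not the numbers themselves but how quickly a quote that sounds modest at the prototype stage grows once “done” means “in production and supported”.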

written by

Murray Callander

posted on

May 21, 2020
