Enterprise Data Science. It’s complicated.

Binghao Ng

2021-07-20 00:21

In 2017 I wrote a blog post on how data science efforts should be about building representations and nothing else. Now in 2021, four years older, I have different thoughts. I have come to realize that my thoughts four years ago were too myopic, too technically oriented. As it turns out, enterprise data science, like many things in life, is complicated.

The challenge, as it seems to me now, is not technical at all. It's not about building representations or not. It's not about how you deploy your models or whether your applications are containerised. It's not about whether you use big data frameworks or not.

To me, the greatest challenge lies in the way of thinking. The question is, are the business users or even the organisation ready to accept a quantitative way of thinking? Because without that, machine learning projects will remain PoCs, experiments or PowerPoint slides.

I say this because once a quantitative approach to problem solving is adopted, it is often very human-lite. Yet everywhere I look, I see systems designed for intimate human involvement. Systems are designed so that humans can look at the data. What for? If we design systems this way, then we are not systemising human knowledge. We are still depending on the human to be there every step of the way. It will not scale.

Not that I feel that systems should be fully autonomous. Note that I didn't say human-less. Humans can be there to impart domain expertise and design strategies. And once that is done, the machine should take over the execution. To do so, we need to approach problems in a systematic and quantitative manner.

This is why I no longer believe that machine learning projects should be a certain way. I believe that all efforts to harness the power of machine learning should be directed towards changing mindsets and ways of thinking. Importantly, companies need time to discover the problem statements that are important to their business. Only then can humans and machines work together, each doing what they do best.

There is more to be said about whether humans need be in the loop at all or whether a quantitative manner of solving problems is the right or only way to go, but that's another story for another time. For now, let's just say that the company has decided it wants to use machine learning to help its business, and the only question is "how".

In my opinion, "machine learning" projects could be small, simple or even involve no statistical learning. The key is to nudge people towards a more quantitative or data-driven way of thinking and to help them understand how machines can let them work at an unprecedented scale.

Some might argue that this would waste a lot of resources. I would say that this is a necessary cost of an organisational mindset change.
