Data Science in Practice: Real-world Applications

Charlie Parker said, "You've got to learn your instrument. Then, you practice, practice, practice. And then, when you finally get up there on the bandstand, forget all that and just wail." Data science works the same way. Spend a year learning the math, learn the tooling cold, then walk into a room with a business problem and forget all of it, because the only thing that matters now is the question in front of you.

Most of what gets called data science is actually data plumbing with a thin layer of statistics on top. I don't mean that as an insult. The plumbing is what determines whether anything you do later matters, and most of the people I've seen fail at this work failed because they wanted to skip straight to the model.

I want to talk about three projects where the data science was less interesting than the work that surrounded it, because I think that's the honest version of the field nobody tells students.

The first was at HP Tech Ventures, evaluating startups for investment. The framing was "use data to identify which startups have the highest potential." That sounds like a modeling problem. It is not a modeling problem. It is a feature engineering problem dressed as a modeling problem, because the actual question, "is this team going to ship a thing people want," has no clean numerical proxy. What I ended up building was a multi-dimensional framework that scored startups across industry-specific metrics, growth trajectories, team composition, and market timing. The model itself was almost beside the point. The work that mattered was figuring out which signals were even worth measuring for a seed-stage health-tech company versus a Series A enterprise SaaS one. Different industries, different lifecycles, different versions of "doing well." A model can't fix the wrong feature set.

The second was at Beats by Dre, doing NLP on customer reviews. The brief was "tell us what customers are saying." The temptation, when you have 2,000 reviews and a fine-tuned BERT model, is to produce a sentiment dashboard with a positive/negative/neutral breakdown and call it a day. I built that, briefly, and the dashboard was useless. It told the product team that customers were 73% positive about the product. The product team already knew that. They had a sales chart.

What actually moved the needle was aspect-based analysis. Instead of "are customers happy," the question became "what are customers happy or unhappy about, specifically." Battery life. Ear comfort during workouts. Bluetooth pairing on the third device. Once the framing shifted that way, the analysis stopped being a vibe check and started being a backlog. The product team had something to do with the output. That's the test. If the output of your analysis doesn't change anyone's behavior, you didn't do data science. You did a presentation.
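To make the shift from overall sentiment to aspect-level sentiment concrete, here is a minimal sketch. It is not the fine-tuned BERT pipeline described above. It swaps in plain keyword matching so the idea fits in a few lines, and the aspect keywords, sentiment word lists, and sample reviews are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical aspect keywords and sentiment word lists (assumptions, not
# taken from the real project). A real system would learn these or use a model.
ASPECTS = {
    "battery": {"battery", "charge"},
    "comfort": {"comfort", "ear", "fit"},
    "bluetooth": {"bluetooth", "pairing", "connect"},
}
POSITIVE = {"great", "love", "good", "excellent"}
NEGATIVE = {"bad", "poor", "hurt", "hurts", "fails", "terrible"}


def aspect_sentiment(reviews):
    """Tally positive/negative/neutral mentions per aspect, one sentence at a time."""
    tally = {aspect: Counter() for aspect in ASPECTS}
    for review in reviews:
        for sentence in re.split(r"[.!?]+", review):
            tokens = set(re.findall(r"[a-z']+", sentence.lower()))
            score = len(tokens & POSITIVE) - len(tokens & NEGATIVE)
            if score > 0:
                label = "positive"
            elif score < 0:
                label = "negative"
            else:
                label = "neutral"
            # Credit the sentence's sentiment to every aspect it mentions.
            for aspect, keywords in ASPECTS.items():
                if tokens & keywords:
                    tally[aspect][label] += 1
    return tally


# Made-up reviews, for illustration only.
sample_reviews = [
    "Battery life is great. Bluetooth pairing fails on my third device.",
    "The ear cups hurt during workouts. Love the battery.",
    "Pairing is terrible.",
]
result = aspect_sentiment(sample_reviews)
for aspect, counts in result.items():
    print(aspect, dict(counts))
```

The output is a count per aspect rather than one headline percentage, which is what makes it read like a backlog: each row names something a product team can own.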

The third was at Prayas Entertainment, doing customer segmentation and demand forecasting for a live-events business. This is where I learned that the value of a model is sometimes inversely proportional to its sophistication. I tried fancy clustering at first. The clusters were technically valid and operationally useless because they grouped customers by patterns that nobody in the business could act on. I threw them out and built a much simpler segmentation based on three things the team could actually do something with: how recently someone bought a ticket, how often, and at what price point. RFM. Forty years old. Worked.
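For anyone who hasn't built one, an RFM-style segmentation along those three dimensions can be sketched in a few lines. This is an illustration under assumptions, not the code from that project: the purchase records, the thresholds (90 days, 2 purchases, a 100.0 average price), and the segment names are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical ticket purchases: (customer_id, purchase_date, ticket_price).
purchases = [
    ("a", date(2024, 5, 1), 120.0),
    ("a", date(2024, 6, 10), 60.0),
    ("b", date(2023, 11, 2), 40.0),
    ("c", date(2024, 6, 20), 200.0),
    ("c", date(2024, 4, 15), 180.0),
    ("c", date(2024, 1, 5), 220.0),
]


def rfm_table(purchases, today):
    """Compute recency (days), frequency (count), and average price per customer."""
    by_customer = defaultdict(list)
    for customer_id, purchased_on, price in purchases:
        by_customer[customer_id].append((purchased_on, price))
    table = {}
    for customer_id, rows in by_customer.items():
        last_purchase = max(d for d, _ in rows)
        table[customer_id] = {
            "recency_days": (today - last_purchase).days,
            "frequency": len(rows),
            "avg_price": sum(p for _, p in rows) / len(rows),
        }
    return table


def segment(row, recent_days=90, frequent=2, premium_price=100.0):
    """Assign a label using simple thresholds (assumed values, tune per business)."""
    if row["recency_days"] > recent_days:
        return "lapsed"
    if row["frequency"] >= frequent:
        if row["avg_price"] >= premium_price:
            return "premium regular"
        return "regular"
    return "new or occasional"


table = rfm_table(purchases, today=date(2024, 7, 1))
segments = {cid: segment(row) for cid, row in table.items()}
for cid in sorted(segments):
    print(cid, table[cid], segments[cid])
```

The point of the rule-based labels is that each one maps to something the team can do: win back the lapsed, upsell the regulars, look after the premium regulars.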

The lesson I keep coming back to from all three is that data science is a translation job, not a modeling job. The skill is converting a vague business problem into a question that has a measurable answer, then converting the measurable answer back into something a non-technical person can act on. The model in the middle is usually the easy part.

A few smaller things I've come to believe:

The data is always worse than you think. There is no exception to this rule. If someone tells you the data is clean, they have not looked at the data.

Domain knowledge beats model sophistication, almost always. The person who has done the work for ten years knows things your training set doesn't. Talk to them first.

Most analyses should end with a single sentence that someone can put in a meeting. If you can't write that sentence, you don't have an analysis. You have a Jupyter notebook.

The tools change every two years. The judgment doesn't. Invest accordingly.

The reason I still find this work interesting after a decade of doing it in different forms is that the translation problem never gets solved. Every business is its own dialect. Every dataset is its own kind of broken. You learn to listen for the pattern, but the pattern is always slightly different.

If you're early in this field and trying to figure out what to focus on, my honest advice is this: the modeling will take care of itself. The hard part is learning to see what the actual question is, and that you only learn by getting it wrong a few times in front of people who care about the answer.
