Rise of Data Science

Radically accelerated by the advent of cloud computing and devices, a role has begun to develop that will flourish in the coming years, and I am convinced that it will have a major impact on our lives.

New technologies often usher in new disciplines; they typically begin as a chaotic area of focus, with all sorts of people falling into them from different backgrounds.  Over time, they take on structure, books are written, educational and training programs develop, and they turn into a mature discipline.  That’s what happened when the Web was created – building a web site requires a mix of skills that draw from what had been quite separate worlds of activity: art and visual design, image processing, and programming (among others).

The same arc happened a few decades earlier when programming was invented – it drew from fields like mathematics, engineering, and linguistics.  It attracted people from those fields and many others (including more than a few high school students who were supposed to be doing something else!).

This new field hasn’t been officially named yet, but one of the terms that people are using for it is “data science”.  I’ve been diving into it pretty deeply for our startup, and some remarkably interesting work has been happening over the past several years.

What Does a Data Scientist Do, Anyway?

As you would expect from an emerging discipline, people don’t agree yet on exactly what it is all about.  But the fundamental idea is that enormous bodies of data are being gathered through software, and somebody has to make sense of them.  The analysis can influence decisions that people make (“hey, this version of our web service gets 15% more people to sign up for an account than the other one”) and decisions that software makes (while browsing items on Amazon, the web site will tell you that people who bought this product also were interested in …).
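
To make the second kind of decision a little more concrete, here is a minimal sketch of one simple way a “people who bought this also bought” list could be built: count how often pairs of items show up in the same order. This is only an illustration, not how Amazon actually does it; the function name `build_also_bought` and the sample orders are made up for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_also_bought(orders):
    """Count, for every item, how often each other item shares an order with it."""
    also_bought = defaultdict(Counter)
    for basket in orders:
        # set() so an item repeated within one order is only counted once
        for item, other in permutations(set(basket), 2):
            also_bought[item][other] += 1
    return also_bought


# Hypothetical order history, invented for this sketch
orders = [
    ["tent", "sleeping bag", "stove"],
    ["tent", "sleeping bag"],
    ["tent", "lantern"],
    ["stove", "fuel"],
]

also_bought = build_also_bought(orders)
# The item most often bought together with a tent
best_match, times_seen = also_bought["tent"].most_common(1)[0]
print(best_match, times_seen)
```

Real recommendation systems go far beyond raw counts (a popular item co-occurs with everything, for instance), but the basic idea of mining purchase logs for patterns is the same.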

A data scientist is somebody who figures out what data to gather, how to analyze it, and what to do with the results of that analysis.  The discipline combines ideas from areas like statistics, machine learning, mathematics, databases, and psychology.

What’s it good for?  Well, here are just a few ways it is being used today:

  • The magical ability of Google search to find what you need from a couple of words and no other hints.  Compare that experience to what you typically get from software – you usually have to tell applications in painfully explicit detail exactly what you want, in very tightly scripted sequences of commands, and it can be extremely frustrating if the programmers haven’t anticipated what you want.  With Google, you type just about anything into the search box, and with incredibly high probability, it will give you a useful set of answers.
  • The ability to recommend things that are likely to interest you.  Amazon is very, very good at helping you find a book you want on any subject under the sun, through a combination of search and recommendations.  Netflix has gotten to the point where 75% of the shows that people watch on their streaming service come from a recommendation.
  • Web sites present users with multiple versions of their product simultaneously, watch how users react, and pick the best one.  Large web companies are running dozens or hundreds of these A/B tests simultaneously and are updating their product daily based on the result.  I used to ship large packaged software products to enterprises, and we would conduct a manual poll of our users years after we shipped to try to figure out whether they used the product and what they did with it – the results were very spotty, very late, and highly inaccurate.  It’s like trying to drive by covering the windows of your car with black paint and having somebody write you an occasional letter about where you are and the condition of the road.
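
For a feel of what “pick the best one” involves in an A/B test, here is a rough sketch of one common way to check whether a difference in sign-up rates is more than noise: a two-proportion z-test. The visitor and sign-up numbers are invented, the function name `ab_test` is hypothetical, and real testing systems typically add more safeguards than this.

```python
import math


def ab_test(signups_a, visitors_a, signups_b, visitors_b):
    """Compare the sign-up rates of two variants with a two-proportion z-test."""
    rate_a = signups_a / visitors_a
    rate_b = signups_b / visitors_b
    # Pooled rate under the assumption that both variants convert equally well
    pooled = (signups_a + signups_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    relative_lift = (rate_b - rate_a) / rate_a
    return relative_lift, z, p_value


# Made-up numbers: version B signs up about 15% more visitors than version A
lift, z, p = ab_test(signups_a=1000, visitors_a=10000,
                     signups_b=1150, visitors_b=10000)
print(f"lift={lift:.1%}  z={z:.2f}  p={p:.4f}")
```

A small p-value suggests the difference is unlikely to be chance alone; with far fewer visitors, the same 15% lift could easily be noise.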

Those are just a few examples – almost every Web-based company depends on data science as its lifeblood to make its product come alive for users and to run its business internally.

The Future

What’s being done today with data science, while impressive, just scratches the surface.  The current economic models have only begun to evolve.  And many parts of our lives remain deeply inefficient and filled with friction:

  • Transit is very wasteful – guessing about traffic patterns, individual drivers maneuvering 3,000-pound chunks of metal with dubious competence.
  • Integration of medical care, diagnosis, and monitoring of our bodies remains technologically primitive.
  • Energy use is highly inefficient, partly because we have little idea how to optimize it or what the implications of our decisions are.
  • Education hasn’t improved much since the 1800s, when President Garfield said that the ideal college was a famous teacher (Mark Hopkins) at one end of a log and a student at the other.  Education is arguably the most important competency of a successful nation in the modern age, and our system (in the United States at least) is hardly flourishing.

Along with much of the economy, these areas are ripe to be transformed, and I am convinced that data scientists will be at the heart of that transformation (for good and for ill!).

It’s a discipline that I think anyone involved in technology should understand at some basic level.  Pretty much whatever you do these days, there are probably large quantities of data being generated around it that can be mined for insight.  You want to leverage this power, to make your own decisions and to create a great experience for people using your software.  It’s going to continue to transform the world over the coming years ... and maybe you can become a real-life Hari Seldon.
