The Physics of Software

Physics – the natural laws defining the behavior of matter and energy – determines what is possible in the universe.  I have found that engineering projects have a kind of physics as well.  But as technology undergoes fundamental transformations, that physics changes.

Enterprise Servers

Let’s look at large-scale commercial enterprise servers composed of millions of lines of code, which in my experience follow a pretty well defined model.  It’s dictated by the needs of the customer and by the constraints of the problem.

In a typical release that makes real but evolutionary steps forward, you take two years to do a version.  During those two years, you get three coding milestones, each of which is eight coding weeks.  Each developer on the project might do four coding days per work week, three if they are a lead.  Roughly 40% of the team is developers (the rest are testers, project managers, designers, etc.).  That means that for every person-year spent by the engineering team, you are getting an average of about 18 days of actual production coding on the product.
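The arithmetic behind that figure can be sketched as follows.  The release length, milestone counts, coding days and developer share come from the paragraph above; the share of developers who are leads is not stated, so the one-in-four value here is purely an assumption chosen for illustration.

```python
# Figures taken from the paragraph above.
RELEASE_YEARS = 2
MILESTONES = 3
CODING_WEEKS_PER_MILESTONE = 8
DAYS_PER_WEEK_DEV = 4
DAYS_PER_WEEK_LEAD = 3
DEVELOPER_SHARE = 0.40  # fraction of the whole team that writes code

# Assumption (not in the text): one developer in four is a lead.
LEAD_SHARE = 0.25


def coding_days_per_person_year(lead_share=LEAD_SHARE):
    """Average production coding days per person-year across the whole team."""
    coding_weeks = MILESTONES * CODING_WEEKS_PER_MILESTONE
    # Blend the lead and non-lead coding days per week.
    avg_days_per_week = (
        (1 - lead_share) * DAYS_PER_WEEK_DEV + lead_share * DAYS_PER_WEEK_LEAD
    )
    days_per_developer = coding_weeks * avg_days_per_week
    # Spread developer coding days over everyone on the team.
    days_per_team_member = days_per_developer * DEVELOPER_SHARE
    return days_per_team_member / RELEASE_YEARS


print(round(coding_days_per_person_year(), 1))
```

With no leads at all the same calculation gives a little over 19 days, so the result is not very sensitive to the assumed lead share.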

Ouch – no wonder those products take large teams to build and evolve so slowly.  What’s the deal?  The absolute killer, the single thing that takes away the most productivity from the team, is the need to test and stabilize the product for the vast array of configurations that customers will use it in.  We called it The Matrix – the multi-dimensional matrix of every possible operating system (at different patch levels), every possible storage system, every possible database that customers might be using.  The combinatorics are brutal.
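To get a feel for how quickly The Matrix grows, here is a minimal sketch.  The dimension names follow the paragraph above, but every specific value (the operating systems, patch levels, storage systems and databases) is a made-up placeholder, not a list of what any real product supported.

```python
from itertools import product

# Hypothetical dimension values, for illustration only.
operating_systems = ["os_a", "os_b", "os_c", "os_d"]
patch_levels = ["base", "sp1", "sp2"]
storage_systems = ["local", "san_x", "san_y", "nas_x", "nas_y"]
databases = ["db_a", "db_b", "db_c", "db_d"]

# Every combination is a configuration that, in principle, needs testing.
matrix = list(product(operating_systems, patch_levels, storage_systems, databases))

print(len(matrix))  # 4 * 3 * 5 * 4 = 240 configurations
```

Even these small lists yield 240 combinations, and each added dimension multiplies the total rather than adding to it.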

You have to test and retest and retest your product: not just that it works, but that it is stable under load, that it performs as well as it needs to, that it fails in the same predictable ways, that it recovers from failure, that it can be managed, and backed up, and diagnosed when it goes wrong, that it is secure, that you can smoothly migrate to it from older versions, that it integrates with all kinds of other systems that customers want to use it with, and on and on.

We called these “the basics”, and they soaked up far more time than actually writing the code that makes the product function.  Even if you are a very wealthy company, your test lab can’t possibly have every configuration that customers actually use (which is in fact unknowable, since everybody does weird things to customize their datacenter), so you put out a beta and work with early adopters to deploy and test the code in even more configurations.

The sad thing is that even if you could go faster, customers often don’t want you to – from their point of view, you are going too fast.  Once you finish your shiny new version, you have to get them to deploy the darned thing.  They frequently don’t want to, since the old one is working and they are risk-averse and have other and better things to do.  So you work on them, and gradually persuade some of them (often it takes years, and plenty of them will just skip a version).  Then they do a test deployment.  Then they come up with a migration plan.  Then they gradually spin up the new system and move over to it.  Multiple years after a new release, it is quite likely that the great majority of your customer base is running the older ones.  So you have to support every version until the support window runs out – for Microsoft that is ten years, or longer in some cases where the customers were very insistent.  If you are shipping every two years, that might mean supporting five different versions.
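The support burden at the end of that paragraph is simple division, sketched below.  The ten-year window and two-year cadence come from the text; the function name is hypothetical, and the model is a simplification that assumes evenly spaced releases and identical support windows.

```python
import math


def versions_in_support(support_window_years, release_interval_years):
    """Rough count of versions still inside their support window at once.

    Simplifying assumption: releases are evenly spaced and every version
    gets the same support window.
    """
    return math.ceil(support_window_years / release_interval_years)


print(versions_in_support(10, 2))  # 5
```

Shipping faster under the same window makes this worse: a yearly cadence would mean ten versions in support at once.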

And worst of all, once you finally have the customer running live on your system, you basically have no idea what they are doing with it.   Are they using those features that you sweated blood to build?  Is the product working well and delivering the results they need?  You have virtually no data and hence there is a lot of guesswork in figuring out what to do next.  You prioritize things that users complain about or that the engineering team is excited about, never really sure if your priorities match the real needs of the customer.

Cloud Services Change the Physics

Contrast that with running a service.  When you host the service yourself, you have one configuration to support.  You have one team operating it that all works for one company and that works closely together.  You have perfect knowledge of, and control over, the environment where the system is going to be run.  When you deploy a new version, everyone is instantly running on it.

The result is a dramatic change in the size of the team that you need and the efficiency with which you can deliver to customers.  And keeping team sizes down pays off in many ways that are hard to quantify but nevertheless crucial – less communication overhead, a stronger sense of unity, and the ability to make changes in plan more quickly.

Even more important, when customers use your service, you know everything.  You know what features they use, what kind of performance you are delivering, whether searches are yielding results that users are interested in, whether one version or the other of your software gets better results.  You can spot and fix problems in real time without desperately trying to diagnose a critical live system in a customer’s facility based on the few clues that they are able and willing to share with you.

What Does It Mean?

In the history of computer technology, there have been several times when the physics changed profoundly.  Moving from batch to interactive systems was one.  The advent of the PC, where computing became ubiquitous and under your own control, was another.  So was the standardization of key building blocks like operating systems and databases, so applications could focus on the unique logic of the problems they were targeting.

Every time the physics changed in the past, it enabled a massive acceleration of innovation and disruption.  New markets opened up, software was able to do things it had never been able to do before, and established companies faced tremendous pressure.  The traditional way of doing things doesn’t always go away – mainframes running COBOL process enormous numbers of transactions every day.  People still buy PC software, and servers will be deployed into data centers for decades.  But companies and communities that miss out on one of these fundamental transformations usually have a difficult time.  They often end up living in a perpetual twilight world of stasis and conservatism.  The frontiers of technical development, the best developers, and the biggest new opportunities leave them behind.

I believe that this transition, from servers to services, is one of those abiding changes in the physics of software.  The benefits of services are so compelling in terms of efficiency, of deeply understanding your customers and what they need, and of adjusting and adapting software in real time, that I think they will be the place where the white heat of innovation happens.  If you are on the wrong side of that divide and your competitor is not, it will be very difficult to compete toe to toe.
