Whenever you’re trying to deliver a valuable system, you realize just how nice it is to have margin. Civil engineers design projects to account for the 100-year flood. But what happens when that 100-year flood occurs two years in a row? That’s when everyone becomes a civil engineer without the training! KenChristensen.com is hosted with a small start-up hosting company called Harvest Clouds. If you look at the site right now, you’ll see we still need to work on our own website. While we designed our systems to withstand the information equivalent of the 100-year flood, we were hit by the perfect storm. One thing led to another, and our systems did not react gracefully, or as intended.
You might forgive us for thinking that servers operating below 1% utilization would be able to handle more load. Much to our surprise, that 99% margin evaporated, and our operating systems made choices about how to deal with the problem that created still more problems.
My Bachelor of Science in Information Systems and ongoing years of experience make this current challenge more than a little annoying. I’ve seen this problem described as the “noisy neighbor” problem: one noisy neighbor keeps the whole neighborhood up at night. One of our clients was a noisy neighbor, but so was I. That’s downright embarrassing.
We didn’t ask for this challenge, but we are up to it. We want our clients to see the value we bring to the table. And not that it is any consolation, but during this challenge, all of our own Internet real estate was affected too. We have a vested interest in resolving this issue, and we’ve said that from the beginning. We were pleased when we corrected the Heartbleed problem within hours of being alerted to the potential vulnerability. That was a great moment for us. We want to turn this into a great moment as well, and we think the end result is that this event will push us to actually fix the problem, not just throw more resources at it to mask it.
Well, we found a solution to this problem, but it required moving everything on our servers to another data center. Ordinarily we do these things over a weekend, and you barely notice that we’ve made a change. Instead, this turned into our second storm. While we made plans to be as minimally disruptive as we could, we had no way of knowing that we would be making this transition at the same time as data centers around the world were battling yet another hacking outbreak.
As of this moment, everything has been successfully migrated, and we are mostly back up and running. Our website needs some work, and our automated systems need to be reconfigured, but we’re able to serve our clients in the manner to which they’ve become accustomed.
Since everything is fixed, why bother with this post? I’m glad you asked! I believe that transparency is more important than creating the illusion that everything is perfect in paradise. Reality is often quite different from what we had planned. There is “Trouble in Paradise.” Furthermore, if I can’t be honest and explain some technical challenges we’ve had with our servers, what do you think I’ll do when things don’t go as planned with an investment or anything else? If you thought, “make everything look perfect in paradise,” you would be right on.
That’s why I’m taking the time to explain the problem and identify what we’ve done about it. The irony is that the system we’ve implemented uses something called a “jail” to deal with the noisy neighbor! Wouldn’t we all like that option when we’d like a peaceful night’s sleep and our neighbor is making too much noise? We can decide exactly what resources we make available to our jailed neighbors. When they start behaving, we can give them free rein on the rest of our system once again, but until they do, they’re relegated to jail, where they’ll remain with all of the other “law-breaking” neighbors. Instead of bothering the whole neighborhood, they’ll only bother their friends in jail.
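For the technically curious, here’s a rough sketch of what those per-jail resource limits can look like. This assumes a FreeBSD-style setup using rctl(8) rules loaded from /etc/rctl.conf; the jail name noisy-client and the specific numbers are purely illustrative, not our actual configuration:

```
# /etc/rctl.conf — hypothetical limits for one jailed tenant.
# Cap the jail at 50% of one CPU core (throttles rather than kills).
jail:noisy-client:pcpu:deny=50
# Hard-cap resident memory at 1 GB.
jail:noisy-client:memoryuse:deny=1g
# Limit the jail to 500 processes so a runaway can't starve the host.
jail:noisy-client:maxproc:deny=500
```

Rules like these can also be applied on the fly with `rctl -a` and removed with `rctl -r`, which is handy when a neighbor starts behaving again and earns back its freedom.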
This is in sharp contrast to what effectively happened to our systems. They essentially became one giant jail. Our software’s response was to shut down more and more services to bring down the load, and as I mentioned earlier, that just made the problem bigger. At one point I watched the system utilization climb at a 45° angle through 100% all the way up to 200%. I remember thinking, “This can’t be good!” We would isolate one problem, and another would surface.
Here’s how this relates back to the blog. I had intended to have several more posts in the bag and to work towards a soft launch in January with enough content to really keep the momentum up. I had intended to shift focus completely to small business investing, yet I found that was too restrictive. So instead, I’m working to create major themes, and I’ll concentrate on those major themes as I write. Those major themes will likely be refined with time as well. And this blog needs to become self-supporting, which is something I cannot do by myself!
At the end of the day, we want to win this game. Winning this game means that we make the investments in time, talent and treasure to get the job done. It means our Harvest Clouds clients get better service as a result. And finally it means that our members are part of an organization that is constantly striving to improve, and demonstrates time and time again that it is up to the challenge.