April 8, 2014
Lessons Learned from Our Recent Server Migration
What We Accomplished, What Went Wrong, and How We Learned from the Experience
Hello, my name is John-Paul Narowski, and I am the founder of karmaCRM. First and foremost, I want to thank you for using our product and for continually helping us make it better.
As most of you know, our company recently migrated its servers to a platform that will provide you with increased security, improved scalability, and reduced downtime during updates, among other benefits. For the past few months, we had been eager to make the transition to our new servers, and as the date and time approached, the excitement was truly palpable.
But now that we’re on the other side of the migration, with service and data fully restored after problems with both, we owe you a sincere, official apology, along with a clear explanation of what went wrong.
The bottom line is that we messed up. We delivered a less-than-optimal experience to you, our customers, and that simply isn’t acceptable. The server migration this past weekend was far from smooth, and it caused varying degrees of inconvenience to some of you. I consider this downtime a personal failure, and as the owner of this company, I accept full responsibility for it. Although I sincerely wish we could, we cannot give you your time back; we can, however, commit to learning everything we can from this experience. We have already begun in earnest.
The downtime you experienced goes against several of our company’s core values, most importantly delivering bar-raising customer service. Your time is valuable, and it is not something we ever, ever take lightly. As a team, we’ll be discussing in depth how to better manage future migrations in order to minimize, if not completely eliminate, disruptions to your business.
What Went Wrong?
Index Refreshing Downtime
karmaCRM relies on a search and indexing system that allows your data to load quickly and be searchable. During the migration to the new servers, re-indexing all of the data took far longer than we had anticipated and planned for, so your data appeared to be “missing” while the import scripts ran.
On top of that, some overlooked bugs in our import script prevented the indexes from importing all of the data they needed. These bugs did not show up with the accounts we used in testing, so they were not on our radar and took us by surprise. In the future, we’ll test with a much wider range of accounts, and we’ll create a concrete, comprehensive QA checklist for everyone on our team to work through.
Keeping You in the Loop
For long stretches, our whole team was heads-down working to solve these problems. Because of this, we failed to give you frequent updates on where things stood. I understand how unsettling that can be when the data you rely on appears to be missing.
How Can We Fix It?
Now that we’ve moved to much more powerful servers, we have a LOT of room to grow. Our new servers will allow us to grow comfortably for years to come, and we don’t expect a migration as major as this one to happen for a long time. That being said, we’re making a number of changes to ensure that planned downtime goes smoothly. And if we ever have unplanned downtime, you’ll know exactly what’s happening, while it’s happening, and when it’s resolved. Building timely customer communication into our planning and testing process will be paramount.
Staging Environment Upgrade
We are creating an identical, one-to-one clone of our production environment. Until now, our staging (testing) app differed somewhat from our production app, which led to inconsistencies in behavior between the two. In plain English, an identical environment means we can test system upgrades far more effectively on our staging system before applying them to production.
Launching a System Status Site
We’ve also just set up a status page where you can check on our systems at any time. It will be the primary way we share information about outages, planned downtime, and similar events. Check it out at status.karmacrm.com.
While we’re very happy to be on the other side of the migration, and while we truly appreciate everyone being so understanding throughout the process, it was nonetheless a rocky and extremely uncomfortable path for all of us, company and customers alike. For that, my company and I are truly sorry. We’re now moving full speed ahead on our new infrastructure, and we hope you enjoy this faster, more stable, and more productive karmaCRM experience for years to come.