
December 4, 2016

Day 4 - Change Management: Keep it Simple, Stupid

Written By: Chris McDermott
Edited By: Christopher Webber (@cwebber)

I love change management. I love the confidence it gives me. I love the traceability–how it’s effectively a changelog for my environment. I love the discipline it instills in my team. If you do change management right, it allows you to move faster. But your mileage may vary.

Not everyone has had a good experience with change management. In caricature, this manifests as the Official Change Board that meets bi-monthly and requires all participants to be present for the full meeting as every proposed plan is read aloud from the long and complicated triplicate copy of the required form. Questions are asked and answered; final judgements eventually rendered. Getting anything done takes weeks or months. People have left organizations because of change management gone wrong.

I suppose we really should start at the beginning, and ask “Why do we need change management at all?” Many teams don’t do much in the way of formal change process. I’ve made plenty of my own production changes without any kind of change management. I’ve also made the occasional human error along the way, with varying degrees of embarrassment.

I challenge you to try a simple exercise. Start writing down your plan before you execute a change that might impact your production environment. It doesn’t have to be fancy – use notepad, or vim, or a pad of paper, or whatever is easiest. Don’t worry about approval or anything. Just jot down three things: step-by-step what you’re planning to do, what you’ll test when you’re done, and what you would do if something went wrong. This is all stuff you already know, presumably. So it should be easy and fast to write it down somewhere.
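If it helps to see it, here's one way you could scaffold that note with a few lines of Python. This is just an illustrative helper sketched for this post - the filename convention and section headings are made up, so adapt freely:

```python
#!/usr/bin/env python3
"""Scaffold a minimal change-plan note before touching production.

Hypothetical helper: the filename convention and section headings
below are illustrative, not a prescribed format.
"""
from datetime import date
from pathlib import Path
import sys

TEMPLATE = """\
# Change plan: {title}
Date: {today}

## Steps (what exactly will I do?)
1.

## Test plan (how will I know it worked?)
-

## Roll-back plan (what if it goes wrong?)
-
"""

def new_plan(title: str) -> Path:
    """Write an empty plan file named after today's date and return its path."""
    path = Path(f"{date.today()}-{title.replace(' ', '-')}.md")
    path.write_text(TEMPLATE.format(title=title, today=date.today()))
    return path

if __name__ == "__main__":
    print(new_plan(" ".join(sys.argv[1:]) or "untitled-change"))
```

Notepad or paper works just as well; the tool doesn't matter, the writing-it-down does.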

When I go through this exercise, I find that I routinely make small mistakes, or forget steps, or realize that I don’t know where the backups are. Most mistakes are harmless, or they’re things that I would have caught myself as soon as I tried to perform the change. But you don’t always know, and some mistakes can be devastating.

The process of writing down my change plan, test plan, and roll-back plan forces me to think through what I’m planning carefully, and in many cases I have to check a man page or a hostname, or figure out where a backup file is located. And it turns out that doing all that thinking and checking catches a lot of errors. If I talk through my change plan with someone else, well that catches a whole bunch more. It’s amazing how much smarter two brains are, compared to just one. Sometimes, for big scary changes, I want to run the damn thing past every brain I can find. Heh, in fact, sometimes I show my plan to people I’m secretly hoping can think of a better way to do it. Having another human being review the plan and give feedback helps tremendously.

For me, those are the really critical bits. Write down the complete, detailed plan, and then make sure at least one other person reviews it. There’s other valuable stuff you can do like listing affected systems and stakeholders, and making notification and communication part of the planning process. But it’s critical to keep the process as simple, lightweight, and easy as possible. Use a tool that everyone is already using – your existing ticketing software, or a wiki, or any tool that will work. Figure out what makes sense for your environment, and your organization.

When you can figure out a process that works well, you gain some amazing benefits. There’s a record of everything that was done, and when, and by whom. If a problem manifests 6 or 12 or 72 hours after a change was made, you have the context of why the change was made, and the detailed test plan and roll-back plan right there at your fingertips. Requiring some level of review means that multiple people should always be aware of what’s happening and can help prevent knowledge silos. Calling out stakeholders and communication makes it more likely that people across your organization will be aware of relevant changes being made, and unintended consequences can be minimized. And of course you also reduce mistakes, which is benefit enough all by itself. All of these things combined allow high-functioning teams to move faster and act with more confidence.

I can give you an idea of what this might look like in practice. Here at SendGrid, we have a Kanban board in Jira (a tool that all our engineering teams were already using when we rolled out our change management process). If an engineer is planning a change that has the potential to impact production availability or customer data, they create a new issue on the Change Management Board (CMB). The template has the following fields:

  • Summary
  • Description
  • Affected hosts
  • Stakeholders
  • Change plan
  • Test plan
  • Roll-back plan
  • Roll-back verification plan
  • Risks

All the fields are optional except the Summary, and several of them have example text giving people a sample of what’s expected. When the engineer is happy with the plan, they get at least one qualified person to review it. That might be someone on their team, or it might be a couple of people on different teams. Engineers are encouraged to use their best judgement when selecting reviewers. Once a CMB has been approved (the reviewer literally just needs to add a “LGTM” comment on the Jira issue), it is dragged to the “Approved” column, and then the engineer can move it across the board until they’re done with the change. Each time the CMB’s status in Jira changes, it automatically notifies a HipChat channel where we announce things like deploys. For simple changes, this whole process can happen in the space of 10 or 15 minutes. More complicated ones can take a day or two, or in a few cases weeks (usually indicative of complex inter-team dependencies). The upper bound on how long it has taken is harder to calculate. We’ve had change plans that were written and sent to other teams for review, which then spawned discussions that spawned projects that grew into features or fixes and the original change plan withered and died. Sometimes that’s the better choice.
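To make the workflow a little more concrete, here's a rough sketch of what filing a CMB programmatically could look like against Jira's REST API. This is not our actual tooling: the base URL, credentials, and project key are made up, and since the custom field IDs for the template fields vary per Jira instance, this sketch folds the plan sections into the description instead.

```python
"""Sketch: filing a change-management issue via Jira's REST API.

Assumptions: a Jira project keyed "CMB" exists, and you have an account
or API token with permission to create issues in it.
"""
import requests

JIRA = "https://yourcompany.atlassian.net"    # hypothetical base URL
AUTH = ("engineer@example.com", "api-token")  # hypothetical credentials

def file_cmb(summary: str, change_plan: str, test_plan: str,
             rollback_plan: str) -> str:
    """Create a CMB issue and return its key (e.g. 'CMB-123')."""
    # Fold the plan sections into the description, since custom field
    # IDs differ between Jira instances.
    description = (
        f"h3. Change plan\n{change_plan}\n\n"
        f"h3. Test plan\n{test_plan}\n\n"
        f"h3. Roll-back plan\n{rollback_plan}"
    )
    resp = requests.post(
        f"{JIRA}/rest/api/2/issue",
        auth=AUTH,
        json={"fields": {
            "project": {"key": "CMB"},
            "issuetype": {"name": "Task"},
            "summary": summary,
            "description": description,
        }},
    )
    resp.raise_for_status()
    return resp.json()["key"]
```

The review itself stays human: a teammate still has to read the plan and leave the "LGTM" before the issue moves to Approved.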

I don’t think we have it perfect yet; we’ll probably continue to tune it to our needs. Ours is just one possible solution among many. We’ve tried to craft a process that works for us. I encourage you to do the same.

December 4, 2009

Day 4 - Communication and Organization of Changes

Change happens. Maintenance, outages (planned or unplanned), upgrades, etc. A change can be something small, such as tweaking a configuration value, or something large, such as replacing power units on hundreds of racks across multiple datacenters. Changes come in all shapes, sizes, scopes, and risks, and require varying degrees of planning, scheduling, and cooperation.

Communication is an important piece of the change process, and good communication allows you to inform users and groups about pending changes and change impact. Good communication requires using the right tools, so it's worth reviewing the tools that are available to you:

  • Email. Email is easy to send, and maintaining mailing lists for users of any given component is pretty easy. Email seems cool by itself until you learn that nobody actually reads email (kind of like that intranet wiki...). I'll cover how to work around this shortly.
  • Calendar. Everyone understands calendars. They grew up living with them, the presentation is familiar, and it is based on a concept everyone understands: time.
  • Bug/Issue/Ticket systems. These systems are good for tracking work units, such as changes. They tend to have status-setting features including words like "need feedback," "in progress," and "resolved." Seems like a good fit for tracking the progress of a change.
  • Phones. Phones are good when you need synchronous communication, such as when coordinating a change across geographies, or calling customers to get acknowledgement of a proposed change.
  • Meetings. Meetings are good places to announce changes, scheduling, and impact. Attendees can nod quietly or object to the change or scheduling. This helps you review a change and fit its schedule to minimize risk and impact.

Not every change will need to involve every tool listed above. Further, the tools you use to communicate, plan, and log changes will depend greatly on the culture and size of your company and on the impact and risks of each change. Use the tools that fit best.

For email, have an 'it-changes' or 'ops-changes' or 'yourteamname-changes' mailing list that has your team (which can just be you) and anyone else who is interested in the changes you are making. Additionally, create a mailing list for any component that might need maintenance, such as a datacenter location, a service like Active Directory, or the network filer; document these and encourage people who depend on a particular component to subscribe to those mailing lists. Use your judgement here. If your company is small, you can probably create fewer mailing lists.

Like engineers who need to consider unreliable networks in their design, you must consider unreliable readers in your change announcements. Folks don't read email; they skim and read things they think are important, which means they may skip your change announcement. You may have to resort to trickery in order to get people to read important change announcements; try prefixing your subject with "CAKE AND PIE" - it might work? Further, there is often no feedback from email - you won't get any acknowledgement of who has read an announcement, or more importantly, that they have understood it.

The best way to ensure your announcements are read is by targeting only the people who need the information and by repeating your message. Depending on the size and impact of the change, you may need to send your change announcement up to three times. First, to announce the scheduled change. Second, a day (or hour) before the change starts. Third, when the change starts. A final, "all clear," message should be sent when the maintenance is complete.
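Here's a sketch of that announce-repeatedly pattern as a tiny helper, assuming a local SMTP relay and a hypothetical ops-changes@example.com list (both made up for illustration):

```python
"""Sketch: send the scheduled / starting / all-clear announcements for a
change. Addresses and the relay on localhost are assumptions."""
import smtplib
from email.message import EmailMessage

def announce(phase: str, change: str, body: str) -> None:
    """phase: e.g. 'scheduled', 'starting soon', 'in progress', 'all clear'."""
    msg = EmailMessage()
    msg["From"] = "ops@example.com"
    msg["To"] = "ops-changes@example.com"
    msg["Subject"] = f"[{phase.upper()}] {change}"
    msg.set_content(body)
    with smtplib.SMTP("localhost") as smtp:  # assumes a relay on localhost
        smtp.send_message(msg)

# announce("scheduled", "filer01 disk replacement",
#          "Window: Sat 02:00-04:00 UTC. Details: <link to change ticket>")
```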

Email has some failings, like the lack of acknowledgements. Calendars can fill that gap, and more. Calendars are great visual tools for communicating schedules. Online calendars (in Exchange, Google Calendar, whatever) are great for several reasons. First, you can invite people to the event, which gives them a visual reminder in their calendar. Further, you get reminders for free: just before the event starts, your invitees will likely get a popup reminding them of a change. Additionally, calendars are shareable and publishable. Calendar data exchange is pretty standardized - iCalendar format sends well over email. Invitees can acknowledge receipt, helping you figure out who hasn't acknowledged and might need a phone call. Finally, calendars can often be downloaded to smartphones and other devices. All of these are excellent features of modern online calendars which will help you communicate changes more effectively.

You should schedule maintenance and outages in a calendar. Create a calendar (or multiple, using the same principles from the mailing list creation above) to track these events. Invite people and groups who need to know about the event. The data in each event should include two things: a short description (think email subject) and a link to wherever the detailed plan/discussion for the change lives, which is likely an issue/bug system. For an unplanned outage, create an event that represents the actual time and duration of the outage.
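For illustration, here's roughly what a minimal maintenance-window event looks like in iCalendar (RFC 5545) form, built with nothing but Python's standard library. The summary, UID domain, and ticket URL are all made up:

```python
"""Sketch: emit the text of a one-event .ics file for a maintenance
window; send it as an email attachment or import it into a calendar."""
from datetime import datetime, timezone

def maintenance_ics(summary: str, start: datetime, end: datetime, url: str) -> str:
    fmt = "%Y%m%dT%H%M%SZ"
    stamp = datetime.now(timezone.utc).strftime(fmt)
    return "\r\n".join([
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//ops//change-calendar//EN",
        "BEGIN:VEVENT",
        f"UID:{stamp}-change@example.com",   # hypothetical domain
        f"DTSTAMP:{stamp}",
        f"DTSTART:{start.strftime(fmt)}",
        f"DTEND:{end.strftime(fmt)}",
        f"SUMMARY:{summary}",
        f"DESCRIPTION:Details: {url}",       # link to the change ticket
        "END:VEVENT",
        "END:VCALENDAR",
    ])

# Example: a two-hour window, linking back to a hypothetical ticket.
print(maintenance_ics(
    "filer01 disk replacement",
    datetime(2009, 12, 10, 2, 0, tzinfo=timezone.utc),
    datetime(2009, 12, 10, 4, 0, tzinfo=timezone.utc),
    "https://tickets.example.com/CHANGE-42",
))
```

Note the description carries the link back to the ticket, which is exactly the cross-linking the next paragraph argues for.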

Ticket (aka bug or issue) systems should be used to track individual changes. As mentioned above, you get the state-tracking benefits (open, pending, in-progress, fixed, etc) and a reasonable place to record planning notes and actions taken during a change. Your emails and calendar entries should include links to the change ticket if it is relevant. The ticket should also, if possible, include a link to the calendar event for easy import into other calendars.

Phones and meetings are of similar use as they both grant you synchronous communication. Phones and in-person meetings are good for the planning stage. They are also both good for reviewing pending changes or for confirming that all involved acknowledge the change. Your use depends on your needs. For example, a previous job had weekly meetings to discuss impact and scheduling of planned changes. Phone is also a useful tool for calling in more experienced teammates when there's a problem with a change.

Lastly, you are a customer, too. Others will make changes that affect you. Datacenter facilities, ISPs, and other service providers make changes just like you do. I would love it if my service providers sent me change notifications with calendar invites, but nobody does. Do your vendors send you change notifications? Do they include calendar data you can import into your own calendars? Do they follow the advice above? If you said no to any of these, it's worth having a chat with your vendors and providers to help work towards this. I'm trying to work with my providers to get them to do this, but it's a slow battle due to the current systems and practices, so be prepared and patient.

Remember, we are often guilty of undercommunicating and even communicating poorly. Focusing on effective communication of changes will help ensure your customers (coworkers, users, clients, etc) are well-informed of the changes that affect them. Informed customers are happy customers. As mentioned in the previous paragraph, you are a customer, too. Making yourself a happy customer may require working with your service providers to sell them on the advice here.

December 23, 2008

Day 23 - Change Management

This post was contributed by Matt Simmons. Thanks, Matt! :)

It's been said that change is the only thing that ever stays the same, and whoever said that probably worked in IT. Transitions are a part of life, but we administrators are burdened by what I would judge to be more than our fair share.

Too frequently, we find ourselves picking up the pieces from the last major system change we made, while at the same time designing the next iteration of the infrastructure that we'll be putting in place. How many times have you chosen an implementation that wasn't ideal now, because a bigger change was just around the corner, and you wanted to "future proof" your design? Bonus points for having to make that decision due to a previous change that was still being implemented. No matter how precisely you've planned a major upgrade, snags and snafus can be expected to rear their ugly heads.

Is this something that we just have to deal with? Are we at the mercy of Murphy, or are there ways we can induce these issues to work to our benefit? Sure, it would be easy if we had a crystal ball, but too often we don't even have a rough guess as to where our plans will encounter problems.

Change itself isn't the enemy. Change promotes progress, and from the 10,000ft view, our long-term goals should work towards this progress. Dealing with change is a natural and positive endeavor.

Instead of being thrown about by the winds of chance, let's put some sails on our boat, and see if we can make headway by trying to manage the change on our terms. If we know that problems are going to be encountered, and we face those facts before we edit the first configuration, then we've taken the first step towards real change management.

The enemies of successful change (and the resulting progress) are imprecise requirements and lack of project leadership. Unless you plan around these pitfalls, your project may very well go into ventricular fibrillation, flip-flopping back and forth, unable to decide between two unforeseen evils midway through the workflow. While it's possible to recover from this with an injection of leadership, it's much easier to inoculate against the problem in the beginning.

If you're going to be planning a big project, you will probably want to follow a methodology. There are just about as many methods of managing a change as there are people who want you to pay them to do it, but with IT projects, I've found what I consider to be the most efficient for me. Your mileage may vary, of course.

  1. Team and goal formation

    Assuming your change is moderate to large scale, you've (hopefully) got a team of people involved, and one of them has been appointed leader. This is the point where you want to decide on your goals. Determine what success will be defined as at the end of the project, and how best to get there.

    Many times we don't yet know what or how success will be defined, or even what the target should be. Because of this, it's natural to perform step 2 before your goals have been decided upon. In fact, I'd recommend it.

  2. Analysis (Research) & Information Organization

    Too often (or not often enough, depending on your view point) we're asked to do too much with too little. Frequently, we don't even know how to do it. This Analysis step is here to allow you to make informed decisions, and to acquire the skills and resources necessary to succeed in your task. Sometimes the resources are people, in the form of new employees or contractors, or both.

  3. Design

    By this time, you know what the task entails, but you don't have a road map of how you're going to get there. This step makes you the cartographer, planning the route from where you are to the implementation of your project and beyond. Some details of the design may change during development, but it's important to have the major framework laid out in this step as you proceed.

  4. Development

    In a perfect world, you would take the design produced in step three and translate it straight into something usable. We all know that this rarely, if ever, happens. Instead, you encounter the first set of really difficult problems in this stage. Issues spring up with the technology that you're using, or with kinks in the design that you thought were smoothed over, but weren't. Development appears to follow Hofstadter's Law: 'It always takes longer than you expect, even when you take into account Hofstadter's Law'. Thorough testing at the end of the development stage will prevent misery in the next step.

  5. Implementation

    Here we find the second repository of unforeseen bugs and strange glitches that counteract your carefully planned designs. The good thing about issues at this point is that, provided you've tested thoroughly enough in development, you won't find many show stoppers. On the other hand, sometimes these bugs can appear as niggling details and intermittent issues, hard to reproduce.

  6. Support

    If you're designing, developing, and implementing a product, support is just another part of the game. This is where you pay for how carefully you performed the preceding steps. Garbage In, Garbage Out, they say, but because you've designed and built a solid system, your support tasks will be light, possibly just educating the users and performing routine maintenance.

  7. Evaluation

    Remember that part in step 1, where you decided what success would be defined as? Dust it off and evaluate your project according to those requirements. Discuss with your team what you could have improved on, and don't forget to give credit where it is due. Hard work deserves appreciation.

This method is really a modified ADDIE design, so named because it consists of Analysis, Design, Development, Implementation, and Evaluation. We've added a couple of steps to help it flow better in the IT world we live in. There are certainly other methods to look at; Instructional Systems Design (ISD) is another well-known one.

However you decide to manage change, it's important to stay with your plan and follow through. Remember to work and communicate with your teammates, and don't stress because the project is too big. Just take it one step at a time, follow your plan, and you'll get the job done.