Zero Defects - What is it?
Somewhere along the line I picked up the idea that any problem (a ticket in my world) represents a failure somewhere in the overall process. This isn’t intended to assign blame, or to make us feel like we’re failing horrifically when we gaze upon our vast collection of tickets, but rather to help us frame every one as an opportunity to improve. This failure could be in training employees, training managers, a technical glitch, or how an interaction with HR is handled. Regardless of WHY it occurred, it is still a defect that needs to be examined, understood, and never repeated.
What is a defect?
In general, a defect is any adverse or unintended behavior in a system or process. This may be further refined based on who you’re talking to and/or what industry you’re in. For example, it could be a flaw the QA team finds before the production run is completed, or a piece of code that generates an incorrect result. “Zero defect” also implies that the defect is reported: someone out there in the “wild” (not your team or testing) has found the flaw and is bringing it to your attention. This could be represented by a help desk ticket (common), an email (also common), a sticky note on your desk (less common) or a brick through your window with a note tied to it (hopefully least common). Regardless of how the message is sent, you are now in possession of a bright, shiny new defect.
In my world a defect is anything that results in the customer (fellow employees) having to reach out to a support team for help (this includes the HR, tech and IT teams). This outreach could be a complaint about someone’s behavior, confusion over a policy, or help finding a payslip. Note that this does not include times when someone figures something out on their own (e.g. restarting their computer fixed the problem) or is able to look it up somewhere (e.g. HR policy information). The focus is on the group of individuals who have some problem and then have to reach out to one of our support teams for help. This makes things a bit challenging to fully control, since it’s entirely possible an employee reaches out to a co-worker, manager, etc. and gets an answer without needing to reach out to support. Personally I’m not sure if this is truly a defect, since they were able to figure it out without contacting support, but it’s a bit of a grey area for me. Regardless, there’s no easy way to measure those, so I stick with whatever happens to get into our ticketing system.
Rating Defects
Ideally 100% of defects are examined, understood, and never happen again. Unfortunately we don’t live in this version of reality, so some yardstick is needed to understand and rank them. It’s important for the teams that look at defects to sit down and determine what dimensions they want to use to sort them (such as impact to the bottom line, type of system impacted, etc.) and then to share those definitions with stakeholders to help improve trust and transparency. When in doubt, I tend to start with two simple dimensions, complexity and urgency, since they are relatively easy to understand and allow for quick sorting of defects.
Defects range from the simple (where do I find my payslip) to the insanely complex or in-depth (employees in a specific area cannot log in to one part of a certain system). An important note: “simple” does not mean it isn’t important to whoever raises it. “Simple” instead refers to the likely response/solution (e.g. “Click this link to access our payroll system”). Defects are never “simple” from the “Well, your thing doesn’t really matter” perspective (especially when related to HR issues), and always have a real-world consequence to someone. This is part of an overall tech mindset that’s a bit beyond this post, but definitely something I’ll dig into later.
In addition to falling somewhere on a complexity scale, defects also fall somewhere on an urgency scale. For HR matters, this scale can range from “I need my medical insurance confirmed so I can get my cancer drugs” to “Just wondering what our policy on XYZ is”; for tech matters, it can range from “Our payment processing system is offline and we’re losing millions every minute” to “the icon on some little-used internal page is broken”. Just as “simple” doesn’t mean it’s not important to the reporter, the seemingly non-urgent items are still important. The icon on that site is Priority Zero for someone and should not be dismissed simply because it’s “not urgent to me”.
Having these dimensions on defects helps make it easier to prioritize both how they are resolved and how they are investigated. An ultra-critical, ultra-complex item will likely be addressed immediately, but an in-depth investigation will take time, while a low-criticality, highly complex item will likely wait to be addressed. Exactly which dimensions are used to sort tickets can be up to any given team, but SOME yardstick should be used to guide the follow-up work of determining how to avoid the defect in the future. All that said, regardless of the metrics being used, it is still a defect, which represents a failure.
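For anyone who wants to see the two-dimension sort in concrete terms, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the Defect class, the 1 to 5 scales, the scores given to each example ticket, and the rule of working the most urgent items first and breaking ties in favor of the simpler ones. It isn’t how any particular ticketing system works, and a team could just as easily pick different dimensions or a different ordering.

```python
from dataclasses import dataclass


@dataclass
class Defect:
    """A reported defect scored on two hypothetical 1-5 scales."""
    summary: str
    urgency: int     # 1 = can wait, 5 = needs attention now (assumed scale)
    complexity: int  # 1 = simple answer, 5 = in-depth work (assumed scale)


def triage_order(defects):
    """Return defects with the most urgent first.

    Among equally urgent items, the simpler ones come first, since
    they can usually be resolved quickly. This tie-break rule is an
    assumption, not something every team would choose.
    """
    return sorted(defects, key=lambda d: (-d.urgency, d.complexity))


# Example tickets taken from the post; the scores are made up.
defects = [
    Defect("Where do I find my payslip?", urgency=2, complexity=1),
    Defect("Payment processing system is offline", urgency=5, complexity=5),
    Defect("Broken icon on a little-used internal page", urgency=1, complexity=1),
    Defect("Employees in one area cannot log in to part of a system", urgency=4, complexity=5),
]

for d in triage_order(defects):
    print(f"urgency={d.urgency} complexity={d.complexity}  {d.summary}")
```

Note that the sort only decides the order in which things get looked at. Nothing is dropped from the list, so the broken icon still gets its turn.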
Next up are responses to defects, something I’ll take a look at a bit down the line.