Hean Tech


Best intentions pave the road to massive headaches for other people

There’s almost nothing more dangerous than a well-intentioned individual who knows a little bit about how things work.  This can stem from a combination of many factors, such as having seen a particular solution somewhere else in the past, thinking they know what they’re doing / thinking it “can’t be that hard”, rushing, and just good old-fashioned dumb luck.  Unless other evidence exists, I always treat these as honest mistakes.  That said, I’ve seen these crusaders end up:

  • Accidentally emailing a group of 1,200+ people - Instead of just collecting emails, it also forwarded them to EVERYONE on the alias…

  • Telling a VP a 60+ hour job should only take 2 hours - Nothing like being told to deploy something in 2 hours that you know from experience is over a week of work.

  • Knocking a corporate network offline - Backing up a 60+ GB hard disk over a cable modem is bad enough, but when it crashes an entire corporate network you know you’ve done REAL well.

(Fun fact - I was responsible for one of those…)

Personally I always encourage folks to learn more about tech and systems.  This, in general, makes for more informed users and can make things easier overall.  I’m constantly looking for ways to help people better engage with their tech, and to help foster that curiosity.  I’ve also seen this approach ignite interest in folks who want to learn more about tech and systems, to the point of them changing their career and interests. 


That said, a little bit of knowledge goes a long way towards wreaking havoc.  Once folks get just a little wind in their sails, they tend to forget they don’t know everything about a setup.  Many training videos, sessions and tutorials are tailored for a specific audience and use case… one your coworker(s) may be exceeding.  The confidence felt when making or requesting a change masks a lack of knowledge of what is really going on… which can result in massive headaches for tech teams when they have to clean things up.


I’m “Helping”

To help curb this “helping,” I’ve adjusted my approach to working with non-tech folks (while tech teams certainly commit these errors, I’ve recently found them to be much more common in tech-adjacent groups).  In addition to giving them the basic knowledge to do their job, I now also take the following steps:

  • Point out hidden dangers - Thinking through what MIGHT go wrong is a great way to uncover potential risks.  I go through this exercise before I set up training, and then use the output to better inform users.  For example, if there’s a field I think they might want to use but isn’t included in the training, I’ll specifically call it out and explain what it’s for.  This helps avoid situations where they see it later and think “hey, that looks useful, let’s use that!”.

  • High Level Documentation - While I cannot expect my users to understand ALL the ins-and-outs of a system, I can expect them to know the high-level basics.  Knowing that Workday sends a termination file to XYZ teams is important info… and they should know that.  Knowing reports are accurate as of last night at midnight is important info… and they should know that.  I both call this out in training and ensure there is documentation to back it up.  Training also covers where the documentation is and how to find it… while this doesn’t guarantee we’ll avoid those challenges, at least I know I’ve given them the information.

  • Encourage them to come up with ideas… then talk to you - I have always encouraged my users to think up better ways to do stuff… now I’ve added a second step - talk to me (or someone!) about that idea BEFORE doing it.  I frame it along the lines of “You’re a super smart person, so help me find better ways to do things.  We need to keep in mind that there’s a LOT of other moving parts, so when you’ve come up with a new idea, let’s brainstorm on how to make it happen”.  This both stops them from acting blindly, and also helps build better partnerships and trust.


Whoopsie

Throughout all of this, it is important to remember that everyone makes mistakes.  I’ve watched experienced, trained engineers forget to deploy a change to production because of a poorly chosen file name.  I’ve personally deleted over 5,000 trouble tickets by changing a configuration file (thankfully we could roll it back!).  The important thing is to learn from it and not do it again.  This requires a great deal of support from your team, as well as a company culture of safety and acceptance.  This story of a man who lost $2 million is the perfect example - his company treats these as development experiences, opportunities to get better.

It can be incredibly challenging not to react poorly when uncovering these errors (Why the *()@#$*&^ did you do that??!??!), and responding appropriately is certainly a learned skill.  I’ve started taking specific steps to keep myself in check while also limiting damage and maximizing learning:

  1. Breathe - The damage has likely already been done, so taking a moment to breathe and assess won’t hurt much.

  2. Quick Assessment - Determine a preliminary root cause and prevent further damage.  This might mean rolling back some code, (un)plugging some hardware or making a quick phone call.

  3. Alert Partners - Generally happens at the same time as #2, but let any downstream partner teams know about the error.  This will allow them to help with damage control and minimize surprises.

  4. Deep Assessment - Dig into what happened and why.  Understand as deeply as possible how the change negatively impacted things and what should have been done differently to prevent that from happening.

  5. Deploy Adjusted Fix - Based on the assessment, deploy the updated changes.

  6. Share - Clearly document and share out a detailed assessment of what happened and how it will be prevented next time.


The individual who caused the ruckus should be involved in every step of undoing it.  This helps build their confidence by improving their skill set, and gives others more confidence in their abilities going forward.  The final step, “Share,” is also critically important (and frequently missed) to ensure this doesn’t happen again.  Saving this knowledge also helps ensure that other team members do not make the same error (something that is sadly all too common).