Data Analysts - Know Thy System

“Know Thyself” is certainly an important concept for being human.  “Know Thy System” is an equally important concept for being a data analyst.

In addition to knowing data analytics concepts (not making misleading graphics, for example), a good data analyst also understands how data moves through the system(s) they analyze.  This knowledge goes beyond the basics and extends into how specific metrics are collected, how users behave and how data is stored.  This background knowledge not only makes it easier for them to find the numbers they need, it also allows them to draw better conclusions from that data.

Some examples I’ve run into include:

  • Knowing when tickets are assigned - Many reports on ticketing systems relate to who’s been working on tickets.  Knowing when/how users are assigned tickets will impact the report.  For example, if someone triages a ticket THEN assigns it, we can expect tickets to remain unassigned throughout the triage process.

  • When data is added to a record - Some data fields are populated when a record is created (Employee ID, for example).  Other fields, however, may keep a default value or even stay null until later.  These default values will appear in reporting, and may throw off results.

  • Where data comes from - End-user reports sometimes contain data from multiple sources/systems/etc.  While this makes for easier storytelling, it can make troubleshooting or updating things more complicated.  Being aware of how your reporting system pulls information reduces the time it takes to update things and makes it easier to explain how numbers were determined.
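
To make the default-value point concrete, here is a minimal Python sketch.  The ticket records, the field names (status, resolution_hours) and the default of 0 are all made up for illustration; your system will have its own.  It only shows how counting defaulted values drags an average down.

```python
from statistics import mean

# Hypothetical ticket records. In this made-up system, resolution_hours
# defaults to 0.0 until a ticket is actually resolved.
tickets = [
    {"id": 1, "status": "resolved", "resolution_hours": 4.0},
    {"id": 2, "status": "resolved", "resolution_hours": 6.0},
    {"id": 3, "status": "triage", "resolution_hours": 0.0},  # default value
    {"id": 4, "status": "triage", "resolution_hours": 0.0},  # default value
]


def naive_average(records):
    """Average every record, including ones still holding the default."""
    return mean(r["resolution_hours"] for r in records)


def resolved_average(records):
    """Average only the records where the field has really been populated."""
    return mean(
        r["resolution_hours"] for r in records if r["status"] == "resolved"
    )


print(naive_average(tickets))     # 2.5
print(resolved_average(tickets))  # 5.0
```

Same data, two very different answers.  Which one is "right" depends on the question being asked, which is exactly why you need to know when the field gets populated.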


The problem with NOT knowing thy system

Understanding the UI and the user’s perspective isn’t enough to really know how the system operates. Many systems, for example, combine multiple metrics when displaying values such as Service Level Agreements (SLA), total compensation and others. Looking at the back-end database tables can be incredibly confusing if you don’t realize that “SLA” really feeds 5 different dimensions.

As data analysts, it is our job to know about these concepts. Failing to understand them will, at best, result in a LOT of wasted time as we dig through tables or configuration settings, and at worst, result in bad decisions being made about the data. Something as simple as not realizing your underlying data doesn’t include anything from January can be catastrophic; your customers will make business decisions based on a lack of January results… a lack that isn’t real.
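
A quick sanity check can catch the missing-January problem before a report goes out.  The sketch below is one way to do it in Python, assuming you can pull the record dates into a list; the function name missing_months and the sample dates are hypothetical.

```python
from datetime import date


def missing_months(dates, start, end):
    """Return the (year, month) pairs from start to end that have no records."""
    seen = {(d.year, d.month) for d in dates}
    missing = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        if (year, month) not in seen:
            missing.append((year, month))
        # Step to the next calendar month, rolling over the year if needed.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return missing


# Hypothetical record dates with nothing from January.
record_dates = [date(2024, 2, 3), date(2024, 2, 17), date(2024, 3, 9)]

print(missing_months(record_dates, date(2024, 1, 1), date(2024, 3, 31)))
# [(2024, 1)]
```

An empty month isn’t always an error, but it is always worth explaining before someone reads it as "nothing happened in January."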


Helping others to Know Thy System

I’ve run into MANY situations where someone doesn’t have any background on the underlying systems and tries to draw conclusions from the data anyway.  At best, this results in confusion as folks come to you, the Data Analyst, with more questions.  At worst, this results in conclusions being drawn from information that folks don’t truly understand.

In many cases we cannot expect our customers (folks reading our reports) to fully understand all the underlying concepts.  This isn’t to say there shouldn’t be SOME expectation that they understand things, just that we cannot expect them to know anything close to what we do.  To help bridge that gap, I’ve been following a few guidelines:

  • Always put comments in reports - Many reporting systems have a comment or text field.  I’ve started always putting notes in there that relate to that specific report.  These include a brief description of what the report is intended to show, and callouts for any specifics that may not be 100% obvious (e.g. a specific field showing a null/none until a specific time period).

  • Proper Naming - Definitely another easy win - just name things more accurately.  I try to keep a specific convention that briefly describes the report.  This both makes it easier for me to organize things and helps my customers find the right report more quickly.

  • Basic Documentation - In addition to cranking out reports and dashboards, I’ve found providing basic documentation on where data comes from is incredibly helpful.  This doesn’t have to be (nor should it be!) a full data dictionary.  That would only drown your customer in too much info.  Rather, it’s a quick guide on what common metrics mean and how they’re pulled.  Think of this as proactive defense against common questions.
