The End of the Snapshot for People Analytics: Workday Edition

Written by Chris Butler | Mar 29, 2017 5:00:00 AM

The vast majority of organizations and People Analytics vendors use snapshots for extracting data from Workday because this is really the only choice they've been given to access the data.

But snapshots are problematic for several reasons:

They are inaccurate - You will typically miss out on the changes occurring between snapshots. This makes it impossible to track data/attribute changes in between, to pro-rate, and create analysis any deeper than the snapshot's time context.
They are inflexible - An object or time context has already been applied to the data, which you can't change without replacing the entire data set with a new context.
They don't allow for changes - If data is corrected or changed in history, you need to replace the entire data set, urggh.
External data is difficult to connect - Without effective dating, joining in any external data means you have to assume a connection point and apply that time context's values to the external data set. This compounds the inaccuracy problem if you end up having to snapshot the external data as well.
A pain in the #$% - To pull snapshots from Workday now, you need to create the report for each snapshot period that you need to provide. Three years of data with a month-end snapshot means 36 reports to build and maintain.

With our background in working with raw data directly from HR systems, this approach wasn't going to cut it—it couldn't deliver the accuracy that should be the basis of an HR data strategy.

The answer isn’t just buying another big data tool, because this same challenges will still exist. Instead, you need to enhance your existing structure and fundamentally rebuild your data architecture to truly solve these issues. We do just that, we extract all employee and object data, analyze the data as it flows and generate additional requests to the Workday API that work through the history of each object. Data is materialized into a schema close to the original but has additional effective-dated transactional records that you just wouldn't see in a snapshot-based schema. This becomes our raw data input into One Model, delivered to your own warehouses to be used any way you wish.

The resulting dataset is perfect for delivering accurate, flexible reporting and analytics. The final structure is actually closer to what you would see with a traditional relational schema used by the HRIS sold by SAP, Oracle, PeopleSoft etc. Say what you will about the interfaces of these systems, but for the most part, the way they manage data is better suited for reporting and analytics.

Now don't get me wrong, this should be a low priority decision point for Workday when selecting an HRIS. Don't compromise the value of a good transactional fit of an HRIS for your business in an attempt to solve for the reporting and analytics capability because ultimately you will be disappointed. Choose the HRIS system that fits how your business operates, solving for the reporting and analytics needs in another solution, as needed.

Time to get a little more technical. What I'm going to discuss below is the original availability format of data in comparison to the approach we take at One Model.

Object-Oriented - The Why of the Snapshot

Okay, so we all know that Workday employs an Object-Oriented approach to storing data, which is impressively effective for its transactional use case. It's also quite good at being able to store the historical states of the object. You can see what I mean by taking a look at the API references below:

The above means the history itself is there, but the native format for access is a snapshot at a specific point in time. We need to find a way of accessing this history and making the data useful for more advanced reporting and analytics.

Time Context

In providing a point in time, we are applying a time context to the data at the point of extraction. This context is then static and will never change unless you replace the data set with a different time context. Snapshot extractions are simply a collection of records with a time context applied. Often when extracting for analytics, companies will take a snapshot at the end of each month for each person or object. We get a result set similar to the below:

The above is a simple approach but will miss out on the changes that occur between snapshot, because they're effectively hidden and ignored. When connecting external data sets that are properly effective- dated, you will need to make a decision on which snapshot is accurate to report against, but you simply don't have enough information available to make this connection correct. This snapshot is an inaccurate representation of what is really occurring in the data set, and it's terrible for pro-rating calculations to departments or cost centers and even something as basic as an average headcount is severely limited. Close enough is not good enough.

If you are not starting out with a basis of accuracy, then everything you do downstream has the potential to be compromised.

Remove the Context of Time

There's a better way to represent data for reporting and analytics.

Connect transactional events into a timeline
Extract the details associated with the events
Collapse the record set to provide an effective-dated set of records.

The above distills down the number of records to only that which is needed and matches transactional and other object changes which means you can join to the data set at the correct point in time rather than approximating.

Time Becomes a Flexible Concept

This change requires that you apply a time context at query time, providing infinite flexibility for aligning data with different time constructs like the below

Calendar
Fiscal
Pay periods
Weeks
Any time construct you can think of

It's a simple enough join to create the linkage

left outer join timeperiods tp 
on tp.date between employee.effective_date and employee.end_date

We are joining at the day level here, which gives us the most flexibility and accuracy but will absolutely explode the number of records used in calculations into the millions and potentially billions of intersections. For us at One Model, accuracy is a worthwhile trade-off and the volume of data can be dealt with using clever query construction and, of course, some heavy compute power. We recently moved to a Graphics Processing Unit (GPU)-powered database because, really, why would you have dozens of compute cores when you can have thousands? (And, as a side note, it also allows us to run R and Python directly in the warehouse #realtimedatascience).

More on this in a future post but for a quick comparison, take a look at the Mythbusters demonstration

What About Other Objects?

We also apply the same approach to the related objects within Workday so that we're building a historical effective-dated representation over time. Not all objects support this, so there are some alternative methods for building history.

Retroactive Changes?

Data changes and corrections occur all the time, as we regularly see volumes of changes being most active in the last six months and can occur several years in the past. Snapshots often ignore these changes unless you replace the complete data set for each load. The smarter way is to identify changes and replace only the data that is affected (i.e., replace all historical data for a person who has had a retroactive change). This approach facilitates a changes-only feed and can get you close to a near-real-time data set. I say "close to near-real time" because the Workday API is quite slow, so speed will differ depending on the number of changes occurring.

Okay, So How Do You Accomplish This Magic?

We have built our own integration software specifically for Workday that accomplishes all of the above. It follows this sequence:

Extracts all object data and for each of them it...
Evaluates the data flow and identifies where additional requests are needed to extract historical data at a different time context, then...
Merges these records, collapses them, and effective-dates each record.
We now have an effective-dated historical extract of each object sourced from the Workday API. This is considered the raw input source into One Model, and it is highly normalized and enormous in its scope, as most customers have 300+ tables extracted. The pattern in the below image is a representation of each object coming through; you can individually select the object slice itself
The One Model modeling and calculation engines take over to make sense of the highly normalized schema, connect in any other data sources available, and deliver a cohesive data warehouse built specifically for HR data.

6. Data is available in our toolsets or you have the option to plug in your own software like Tableau, PowerBI, Qlik, SAS, etc.

7. One Model is up and running in a few days. To accomplish all of the above, all we need is a set of authorized API credentials with access provided to the objects you'd like us to access.

8. With the data model constructed, the storyboards, dashboards, and querying capabilities are immediately available. Examples:

Flexibility - The Biggest Advantage You Now Have

We now have virtually all data extracted from Workday in a historically accurate transaction-based format that is perfect for integrating additional data sources or generating an output with any desired time context (even convert back to snapshots, if required). Successful reporting and analytics with Workday starts with having a data strategy for overcoming the inherent limitations of the native architecture that is just not built for this purpose.

We're HR data and People Analytics experts and we do this all day long. If you would like to take a look, please feel free to contact us or book some time to talk directly below.

Learn more about One Model's Workday Integration

Book a Demo

View full post