I recently made a simple post on LinkedIn which received a crazy amount of views and overwhelmed us with requests to take a look at what we had built. The simple release was that we had managed to take Workday's point in time (snapshot) based reporting and rebuild a data schema that is effective dated, and transactional in nature. The vast majority of organizations and people analytics vendors use snapshots for extracting data from workday because this is really the only choice they've been given to access the data.
We don't like snapshots for several reasons
With our background in working with raw data directly from HR systems this approach wasn't going to cut the mustard and couldn't deliver the accuracy that should be the basis of an HR data strategy.
The solution is not to buy Workday's big data tools because you're going to be living with many of the same challenges. You need to take the existing structure, enhance, and fundamentally reconstruct a data architecture that solves these problems. We do just that, we extract all employee and object data, analyse the data as it flows and generate additional requests to the Workday API that work through the history of each object. Data is materialized into a schema close to the original but has additional effective dated transactional records that you just wouldn't see in a snapshot based schema. This becomes our raw data input into One Model, delivered to your own warehouses to be used any way you wish.
The resulting dataset is perfect for delivering accurate, flexible reporting and analytics. The final structure is actually closer to what you would see with a traditional relational schema used by the HRIS sold by SAP, Oracle, PeopleSoft etc. Say what you will about the interfaces of these systems but for the most part the way that they manage data is better suited for reporting and analytics.
Now don't get me wrong, this is one area most people know Workday lags in, and in my opinion it should be a low priority decision point when selecting an HRIS. Don't compromise the value of a good transactional fit of an HRIS for your business in an attempt to solve for the reporting and analytics capability because ultimately you will be disappointed. Choose the HRIS system that fits how your business operates, solving for the reporting and analytics needs in another solution as needed.
Time to get a little more technical. What I'm going to discuss below is the original availability format of data in comparison to the approach we take at One Model.
Object Oriented - the why of the snapshot
Okay, so we all know that Workday employs an Object Oriented approach to storing data which is impressively effective for it's transactional use case. It's also quite good at being able to store the historical states of the object. You can see what i mean by taking a look at the API references as below:
The above means the history itself is there but the native format for access is a snapshot at a specific point in time. We need to find a way of accessing this history and making the data useful for more advanced reporting and analytics.
Time Context
In providing a point in time we are applying a time context to the data at the point of extraction. This context is then static and will never change unless you replace the data set with a different time context. Snapshot extractions are simply a collection of records with a time context applied. Often when extracting for analytics, companies will take a snapshot at the end of each month for each person or object. We get a result set similar to the below:
The above is a simple approach but will miss out on the changes that occur between snapshot because they're effectively hidden and ignored. When connecting external data sets that are properly effective dated you will need to make a decision on which snapshot is accurate to report against in the above but you simply don't have enough information available to make this connection correct. This is just simply an inaccurate representation of what is really occurring in the data set, and it's terrible for pro-rating calculations to departments or cost centers and even something as basic as an average headcount is severely limited. Close enough is not good enough.
If you are not starting out with a basis of accuracy then everything you do downstream has the potential to be compromised.
Remove the context of time
There's a better way to represent data for reporting and analytics.
The above distills down the number of records to only that which is needed and matches transactional and other object changes which means you can join to the data set at the correct point in time rather than approximating.
Time becomes a flexible concept
This change requires that you apply a time context at query time providing infinite flexibility for aligning data with different time constructs like the below
It's a simple enough join to create the linkage
left outer join timeperiods tp on tp.date between employee.effective_date and employee.end_date
We are joining at the day level here which gives us the most flexibility and accuracy but will absolutely explode the number of records used in calculations into the millions and potentially billions of intersections. For us at One Model accuracy is a worthwhile trade-off and the volume of data can be dealt with using clever query construction and of course some heavy compute power. We recently moved to a Graphics Processing Unit (GPU) powered database because really why would you have dozens of compute cores when you can have thousands? (And, as a side note, it also allows us to run R and Python directly in the warehouse #realtimedatascience).
More on this in a future post but for a quick comparison take a look at the Mythbusters demonstration
What about other objects?
We also apply the same approach to the related objects within Workday so that we're building a historical effective dated representation over time. Not all objects support this so there are some alternative methods for building history.
Retroactive changes?
Data changes and corrections occur all the time as we know we regularly see volumes of changes being most active in the last six months and can occur several years in the past. Snapshots often ignore these changes unless you replace the complete data set for each load. The smarter way is to identify changes and replace only the data that is affected (i.e, replace all historical data for a person who has had a retroactive change). This approach facilitates a changes-only feed and can get you close to a near-real-time data set. I say "close to near-real time" because the Workday API is quite slow so speed will differ depending on the number of changes occurring.
Okay, so how do you accomplish this magic?
We have built our own integration software specifically for Workday that accomplishes all of the above. It follows this sequence:
Flexibility - the biggest advantage you now have
We now have virtually all data extracted from workday in a historically accurate transaction-based format that is perfect for integrating additional data sources or generating an output with any desired time context (convert back to snapshots if required). Successful reporting and analytics with Workday starts with having a data strategy for overcoming the inherent limitations of the native architecture that is just not built for this purpose.
We're HR data and people analytics experts and we do this all day long. If you would like to take a look please feel free to contact us or book some time to talk directly below.
Learn more about One Model's Workday Integration