The vast majority of organizations and People Analytics vendors use snapshots for extracting data from Workday because this is really the only choice they've been given to access the data.
But snapshots are problematic for several reasons:
With our background in working with raw data directly from HR systems, this approach wasn't going to cut it—it couldn't deliver the accuracy that should be the basis of an HR data strategy.
The answer isn’t just buying another big data tool, because this same challenges will still exist. Instead, you need to enhance your existing structure and fundamentally rebuild your data architecture to truly solve these issues. We do just that, we extract all employee and object data, analyze the data as it flows and generate additional requests to the Workday API that work through the history of each object. Data is materialized into a schema close to the original but has additional effective-dated transactional records that you just wouldn't see in a snapshot-based schema. This becomes our raw data input into One Model, delivered to your own warehouses to be used any way you wish.
The resulting dataset is perfect for delivering accurate, flexible reporting and analytics. The final structure is actually closer to what you would see with a traditional relational schema used by the HRIS sold by SAP, Oracle, PeopleSoft etc. Say what you will about the interfaces of these systems, but for the most part, the way they manage data is better suited for reporting and analytics.
Now don't get me wrong, this should be a low priority decision point for Workday when selecting an HRIS. Don't compromise the value of a good transactional fit of an HRIS for your business in an attempt to solve for the reporting and analytics capability because ultimately you will be disappointed. Choose the HRIS system that fits how your business operates, solving for the reporting and analytics needs in another solution, as needed.
Time to get a little more technical. What I'm going to discuss below is the original availability format of data in comparison to the approach we take at One Model.
Object-Oriented - The Why of the Snapshot
Okay, so we all know that Workday employs an Object-Oriented approach to storing data, which is impressively effective for its transactional use case. It's also quite good at being able to store the historical states of the object. You can see what I mean by taking a look at the API references below:
The above means the history itself is there, but the native format for access is a snapshot at a specific point in time. We need to find a way of accessing this history and making the data useful for more advanced reporting and analytics.
Time Context
In providing a point in time, we are applying a time context to the data at the point of extraction. This context is then static and will never change unless you replace the data set with a different time context. Snapshot extractions are simply a collection of records with a time context applied. Often when extracting for analytics, companies will take a snapshot at the end of each month for each person or object. We get a result set similar to the below:
The above is a simple approach but will miss out on the changes that occur between snapshot, because they're effectively hidden and ignored. When connecting external data sets that are properly effective- dated, you will need to make a decision on which snapshot is accurate to report against, but you simply don't have enough information available to make this connection correct. This snapshot is an inaccurate representation of what is really occurring in the data set, and it's terrible for pro-rating calculations to departments or cost centers and even something as basic as an average headcount is severely limited. Close enough is not good enough.
If you are not starting out with a basis of accuracy, then everything you do downstream has the potential to be compromised.
Remove the Context of Time
There's a better way to represent data for reporting and analytics.
The above distills down the number of records to only that which is needed and matches transactional and other object changes which means you can join to the data set at the correct point in time rather than approximating.
Time Becomes a Flexible Concept
This change requires that you apply a time context at query time, providing infinite flexibility for aligning data with different time constructs like the below
It's a simple enough join to create the linkage
left outer join timeperiods tp on tp.date between employee.effective_date and employee.end_date
We are joining at the day level here, which gives us the most flexibility and accuracy but will absolutely explode the number of records used in calculations into the millions and potentially billions of intersections. For us at One Model, accuracy is a worthwhile trade-off and the volume of data can be dealt with using clever query construction and, of course, some heavy compute power. We recently moved to a Graphics Processing Unit (GPU)-powered database because, really, why would you have dozens of compute cores when you can have thousands? (And, as a side note, it also allows us to run R and Python directly in the warehouse #realtimedatascience).
More on this in a future post but for a quick comparison, take a look at the Mythbusters demonstration
What About Other Objects?
We also apply the same approach to the related objects within Workday so that we're building a historical effective-dated representation over time. Not all objects support this, so there are some alternative methods for building history.
Retroactive Changes?
Data changes and corrections occur all the time, as we regularly see volumes of changes being most active in the last six months and can occur several years in the past. Snapshots often ignore these changes unless you replace the complete data set for each load. The smarter way is to identify changes and replace only the data that is affected (i.e., replace all historical data for a person who has had a retroactive change). This approach facilitates a changes-only feed and can get you close to a near-real-time data set. I say "close to near-real time" because the Workday API is quite slow, so speed will differ depending on the number of changes occurring.
Okay, So How Do You Accomplish This Magic?
We have built our own integration software specifically for Workday that accomplishes all of the above. It follows this sequence:
6. Data is available in our toolsets or you have the option to plug in your own software like Tableau, PowerBI, Qlik, SAS, etc.
7. One Model is up and running in a few days. To accomplish all of the above, all we need is a set of authorized API credentials with access provided to the objects you'd like us to access.
8. With the data model constructed, the storyboards, dashboards, and querying capabilities are immediately available. Examples:
Flexibility - The Biggest Advantage You Now Have
We now have virtually all data extracted from Workday in a historically accurate transaction-based format that is perfect for integrating additional data sources or generating an output with any desired time context (even convert back to snapshots, if required). Successful reporting and analytics with Workday starts with having a data strategy for overcoming the inherent limitations of the native architecture that is just not built for this purpose.
We're HR data and People Analytics experts and we do this all day long. If you would like to take a look, please feel free to contact us or book some time to talk directly below.
Learn more about One Model's Workday Integration