Engagement and its Drivers
Retrospective look at a yearlong consulting assignment
Note: This is a retrospective on a yearlong consulting assignment that brought me back to my industrial-organizational psychology roots. For confidentiality, names are omitted.
Initial Client Engagement
At the start of the year, I got an opportunity to help an organization address some of its challenges by conducting an organization-wide engagement survey. Sensing a chance to reconnect with my applied psychology roots, refine my data analysis skills, and learn R in the process, I agreed.
I met with the leadership team to get a sense of the issues they were struggling with. They were experiencing a wide range of challenges common to a rapidly expanding global organization (e.g., staff development, talent retention, and turnover).
The leadership team had a working hypothesis about their issues but wanted concrete data to guide their decision-making, so we agreed that an organization-wide engagement survey would be a helpful way to take the pulse of the organization and provide direction for addressing their challenges.
Note: Applied psychologists don’t generally think about things like governance, but it’s worth noting that I was contracted by the organization’s board. The ‘client’ can change from project to project, and I found it immensely helpful to be authorized by an independent board. Boundaries and authority dynamics matter.
Survey Construction
This would be the first systematic, organization-wide data collection effort in the organization’s history and, more importantly, an opportunity to set a foundation for being evidence-based and data-driven for years to come.
The goal was to create an engagement survey that was as comprehensive as possible. I scoured the academic literature, industry benchmarks, best practices, and other practitioner sources. I consulted sources like Qualtrics, Culture Amp, Gallup, and others for practitioner input. I also surveyed how researchers conceptualize employee engagement, including Macey & Schneider (2008), Shuck (2011), and Schaufeli, Bakker & Salanova (2006).
It was important to balance being as comprehensive as possible against respondent fatigue.
I also found it essential to have a general working model of organizational performance in mind. To ensure that the survey was comprehensive in its coverage of the factors that could impact staff engagement, I went back to my training.
Nevertheless, the most important part of survey construction is accommodating the client’s desire to have things customized. The theory, the framework, the stats, the model — none of that matters if the client isn’t buying in. Therefore, we spent a significant amount of time working with the client to co-create this survey.
The theory, the framework, the stats, the model — none of that matters if the client isn’t buying in.
I customized as much as I could, without compromising existing survey scales that already had tons of research backing.
When co-creating a survey with the client, I found two things important to keep in mind:
- Be prepared to run analyses for scale internal consistency. This is best practice when developing a new scale, but as I learned, you also need to do it whenever the client wants anything customized.
- Be honest with the client. Take time to explain that some survey items have already been validated and that changing the wording could lower their reliability.
Qualitative Data
In addition to quantitative survey items, my client was adamant that staffers would have a lot to share and so they wanted an open-ended qualitative response for every section.
Although this adds an additional layer of complexity to the data analysis, it is worth it.
I think it’s extremely important to have qualitative data complement quantitative data. Yes, your dataset is more robust, and the two types of data get at things that, when combined, are quite powerful. However,
The most important thing that qualitative data adds is your ability to provide a narrative for the client.
As important as it is to be data-driven, not everyone speaks quantitative.
Data Analysis
One of the reasons I took on this assignment was because I was looking for an opportunity to learn R. It was great. Looking back, I wish my graduate training had empowered me further to learn R, instead of SPSS. There’s no reason not to learn a powerful, open-source, F-R-E-E tool like R, instead of paying a yearly fee to IBM just to use SPSS.
As fun as it was learning R, when it came to presenting impactful data for the client, I found a mixture of R and Excel was the way to go.
Arguably, if you’re in an organization that does not have a track record with analytics, using Excel simply to get the ball rolling is fine.
Step 1: Cleaning and Exploring the data
[Note: These steps are recommended before creating frequency tables and before loading the data into R. Google Forms allows you to download the dataset in CSV format, which can then be easily loaded into R. Most of the preliminary work will be done in Excel. After downloading the dataset from Google Forms, save it as ‘raw’ and create a copy.]
First, Google Forms allows a diverse range of response options, from traditional Likert ratings (which make up the majority of the items) to other response types. The first task is to recode non-numerical response types to make them amenable to data analysis.
These responses are technically already in numerical form, but the text had to be removed (e.g., ‘1 = Almost never (a few times a year or less)’ needed to be changed to ‘1’):
0 = Never
1 = Almost never (a few times a year or less)
2 = Rarely (once a month or less)
3 = Sometimes (a few times a month)
and so on…
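Whether you do this in Excel or in R, the logic is the same. Here is a minimal sketch of the recoding in R, where the file name, the data frame `survey`, and the column `q_freq_1` are all hypothetical stand-ins:

```r
# Read the Google Forms export (hypothetical file name)
survey <- read.csv("engagement_survey_raw.csv", stringsAsFactors = FALSE)

# Map the full response labels to their numeric codes
# (0-3 shown here; extend with the remaining anchors as needed)
freq_labels <- c(
  "Never"                                     = 0,
  "Almost never (a few times a year or less)" = 1,
  "Rarely (once a month or less)"             = 2,
  "Sometimes (a few times a month)"           = 3
)

# 'q_freq_1' stands in for one of the frequency-scale items
survey$q_freq_1 <- unname(freq_labels[as.character(survey$q_freq_1)])
```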
Demographic variables also had to be coded. This document on coding conventions for demographic variables was helpful. It’s fairly comprehensive, covering age, gender, ethnicity, education, marital status, and household income, through to coding conventions for religious identification (Roman Catholic, Protestant, Adventist, Episcopalian, Baptist, and so on). Having such a document allows for uniform coding and reference. [Note: this should be planned out during Survey Construction.]
Most of the items used a standard 5-point Likert scale (i.e., 1 = Strongly Disagree; 2 = Disagree; 3 = Neither Agree nor Disagree; 4 = Agree; 5 = Strongly Agree), which did not need recoding.
It’s essential to get intimate with the data. R can be used to get a sense of the data and its shape (number of columns, rows, column names) and to manipulate the data to ready it for analysis (e.g., recoding certain demographic variables to numbers). Finally, you need to handle missing data and reverse-score certain items; all of this can be accomplished in R.
Scripts for familiarizing yourself with the data in R.
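The scripts themselves were shared separately; as a rough sketch, this first pass might look something like the following (the file and column names are hypothetical):

```r
# Load the cleaned CSV exported from Google Forms (hypothetical file name)
survey <- read.csv("engagement_survey_clean.csv", stringsAsFactors = FALSE)

# Get a feel for the shape of the data
dim(survey)        # number of rows and columns
colnames(survey)   # variable names
str(survey)        # variable types
summary(survey)    # quick distribution check; flags odd values

# How much missing data is there, per column?
colSums(is.na(survey))

# Reverse-score a negatively worded 5-point item (hypothetical column name)
survey$q_workload_r <- 6 - survey$q_workload
```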
It’s true that cleaning the data is 50% of the analysis. If this is done well, everything else is gravy.
Step 2: Scale Internal Consistency
Internal consistency analysis is generally recommended when creating new scales, but you’ll still want to run it when using existing scales. As mentioned above, if the client wants “customized” scales (and they surely do), you’ll definitely need to run inter-item correlation and reliability analyses; it helps to have confidence in the instrument.
Scripts for conducting scale internal consistency analysis.
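Again, the scripts were shared separately; here is a minimal sketch of the reliability check using the ‘psy’ package mentioned in the sources below (the item names are hypothetical):

```r
library(psy)

# Items belonging to one (hypothetical) scale, e.g. 'recognition'
recognition_items <- survey[, c("q_recog_1", "q_recog_2", "q_recog_3", "q_recog_4")]

# Inter-item correlations: items on the same scale should correlate positively
round(cor(recognition_items, use = "pairwise.complete.obs"), 2)

# Cronbach's alpha for the scale
cronbach(recognition_items)
```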
Sources consulted for this section of the analysis include:
- Falissard, Bruno. Analysis of Questionnaire Data with R (book; reviewed by Ronald D. Fricker in the Journal of Statistical Software 46, 2012, https://doi.org/10.18637/jss.v046.b01).
- de Jonge, Edwin, and Mark van der Loo. “An Introduction to Data Cleaning with R,” n.d.
- This post provides a nice discussion on different ways to calculate internal consistency of survey items.
- In addition, I loaded the ‘psy’ package (i.e., library(psy)) to run these analyses.
Step 3: Frequencies, Central Tendency and Variability
Before going into R, I’d recommend creating frequency tables for all survey items. As you’ll see, this step forms the bedrock of your analysis and interpretation of the data. Although I took on this assignment to learn R, this particular step of creating frequency tables in Excel is arguably the most impactful part of the analysis. I’d say that 80% of the time, frequency tables with measures of central tendency and variability are enough to add value when analyzing survey data. Moreover, these descriptive statistics provide an anchor for making sense of the more complex calculations down the line.
The two sources that helped me most for this section are:
- Ann K Emery’s video tutorial on analyzing satisfaction survey data with COUNTIF. This video will show you how to create simple, clear frequency tables. (She helps communicate technical data through data visualizations, reports, slideshows, and dashboards; highly recommended.)
- Dane Bertram’s “Likert Scales…are the meaning of life”. This article discusses the best way to present Likert scale data. In addition to frequencies, you’ll want measures of central tendency (i.e., median and mode for Likert data) and variability (i.e., range). You can download the file here.
With these two sources, I settled on a simple way to communicate scale data:
The table should include (a quick R sketch for computing these statistics follows the list):
- Frequency statistics (total count, percentages). Note that missing data for any specific item will simply show up as a smaller total count.
- Measures of central tendency (median and mode, instead of the mean): whether respondents tended to score toward the ‘agree’ or ‘disagree’ end of the scale.
- Measures of variability (range, interquartile range): the level of agreement on an item.
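I built these tables in Excel, but the same statistics are easy to pull in R. A minimal sketch for a single item (the column name is hypothetical):

```r
item <- survey$q_recog_1   # a hypothetical 5-point Likert item

# Frequency table: counts and percentages (missing responses simply
# shrink the total count)
counts <- table(item)
counts
round(100 * prop.table(counts), 1)

# Central tendency: median and mode are more defensible than the mean
# for ordinal Likert data
median(item, na.rm = TRUE)
as.numeric(names(which.max(counts)))   # mode (most frequent response)

# Variability: range and interquartile range indicate the level of agreement
range(item, na.rm = TRUE)
IQR(item, na.rm = TRUE)
```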
Step 4: Data Analysis: Correlation and Regression
At the end of the day, clients have to make decisions in an environment with scarce resources. You want to be able to say, “out of all the important things you could address, here’s where you start…these are the most important things…”.
Because we were aiming for comprehensive coverage of employee engagement drivers, we ended up with 22 drivers. However, 22 variables can be overwhelming and could lead to paralysis. Thus, the goal was to focus on the key issues that had the most effect on engagement outcomes.
This allowed us to home in and guide management decisions about:
- where to start
- what to prioritize
- what could potentially yield the highest return on investment
First, I ran a correlation among all outcome variables, expecting to see fairly high correlations (0.70+).
Then, I ran correlations between all 22 drivers and the 4 outcome variables. The objective was to filter the correlations and focus on those of at least 0.60 (upper medium to high); this narrowed the list of drivers from 22 down to eleven.
Note: Eleven variables were filtered out because their correlations with the outcome variables were all below 0.60. This does not diminish the importance of those variables; it simply means that, in this particular dataset, some variables have a stronger relationship with the outcomes than others.
With eleven drivers fairly highly correlated (0.60+) with the outcome variables, the next step was to narrow the set down further and identify which of the eleven provided the most unique explanatory power for each outcome.
To do this, we ran multiple regression analyses. Variables that explain unique variance are considered “starting points” for management to act on.
Scripts for running Correlations in R.
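The scripts themselves aren’t reproduced here, but a minimal sketch of this filtering step might look like the following (driver and outcome column names are hypothetical):

```r
# Hypothetical driver and outcome columns (scale scores)
drivers  <- survey[, c("recognition", "development", "workload", "autonomy")]
outcomes <- survey[, c("engagement", "satisfaction", "intent_to_stay", "advocacy")]

# Correlations among the outcome variables (expected to be high, 0.70+)
round(cor(outcomes, use = "pairwise.complete.obs"), 2)

# Correlations of each driver with each outcome
driver_outcome <- round(cor(drivers, outcomes, use = "pairwise.complete.obs"), 2)
driver_outcome

# Keep drivers whose correlation with at least one outcome is 0.60 or higher
rownames(driver_outcome)[apply(abs(driver_outcome) >= 0.60, 1, any)]
```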
Scripts for running Regression in R.
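And a matching sketch for the regression step, predicting one outcome from the shortlisted drivers (again, the column names are hypothetical):

```r
# Multiple regression: which drivers explain unique variance in an outcome?
fit <- lm(engagement ~ recognition + development + workload + autonomy,
          data = survey)
summary(fit)   # coefficients, p-values, R-squared

# Standardizing puts all predictors on the same scale, which makes the
# coefficients easier to compare when ranking drivers
fit_std <- lm(scale(engagement) ~ scale(recognition) + scale(development) +
                scale(workload) + scale(autonomy), data = survey)
summary(fit_std)
```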
Main & Interaction Effects
Across the four outcome variables, six drivers had significant main effects.
Furthermore, there was a need to understand whether the main effects differed by certain demographic characteristics (i.e., position, team, age, tenure, race/ethnicity).
This required testing for moderated regression (interaction effects).
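A minimal sketch of one such moderation test in R, using tenure as a hypothetical demographic moderator:

```r
# Does the effect of a driver on engagement differ by tenure group?
# The '*' expands to both main effects plus their interaction term
fit_mod <- lm(engagement ~ recognition * tenure_group, data = survey)
summary(fit_mod)

# A significant recognition:tenure_group coefficient suggests the driver's
# effect is moderated by tenure
```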
At this point, I had enough data to direct management’s attention to the following:
- Where to start
- Which variables provide the highest ‘return on investment’
- Whether effects differed based on certain demographics
Next, we wanted to make use of the qualitative data, especially to help corroborate or temper specific quantitative results.
Step 5: Qualitative Data Themes
Qualitative analysis is a huge topic, deserving of its own post.
I found this video most helpful for doing straightforward content analysis in Excel: a simple way of counting and making sense of the themes that emerge from qualitative, open-ended responses. Remember, qualitative data is just as important as quantitative.
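The video does this in Excel; if you prefer to stay in R, a rough sketch of the same keyword counting might look like this (the theme keywords and column name are purely illustrative):

```r
responses <- survey$open_feedback   # a hypothetical open-ended column

# Simple content analysis: count how many responses mention each theme
themes <- list(
  workload    = c("workload", "overworked", "hours"),
  recognition = c("recognition", "appreciated", "thank"),
  development = c("training", "development", "growth")
)

sapply(themes, function(words) {
  pattern <- paste(words, collapse = "|")
  sum(grepl(pattern, responses, ignore.case = TRUE), na.rm = TRUE)
})
```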
It’s easy for applied psychologists to “take refuge” in the quantitative numbers, but the reality is, clients don’t want to hear numbers; they want a narrative.
Communication and Storytelling
I had to learn the hard way that statistics is not a language most people appreciate. I was feeling good about the data and the analysis, and I was excited to share what they revealed, but my clients informed me they wanted a narrative.
Narrative in this sense requires answering ‘why’, as in: Why does this driver matter more than that one? Why does this hold more weight? Why should we start here? Why is this driver emphasized more than that?
And the trick is to communicate all that without using phrases like “explains more variance” or “more significant”. I found it helpful to explain how regression works.
[Note: For future reference, give the client the narrative and put all the numbers and data in a separate appendix.]
Commitment to Action
To cap off the project, I ran a workshop with the senior leadership team to guide them through brainstorming and committing to a course of action, based on what the data is telling them.
As an external consultant, the last thing you want is to have your “recommendations” gather dust in an unopened file.
Going the distance by hosting a workshop that guides your clients toward committing to action is really the only thing that gives you some comfort that your consultation will not have gone to waste.
Reflection
When I took on this assignment, the opportunity to get back in touch with quantitative survey data excited me. And it was definitely fun to learn how to run analyses in R.
Yet, it’s the qualitative aspects that made the most impact. As an external consultant, you bump up against client organization dynamics. This is very natural and normal.
The work is political. People will reveal their agendas to you. I found it helpful to consider all “types” of data.
In a world driven by A.I., big data and data science, I wonder if we’ve overlooked the qualitative.