Featuring Psychology Today: Traditional reference checks are plagued with bias - Should they be left in the past?

By Emelie Dahl


Refapp's researcher and business analyst Carl-Johan Holmberg and Melissa Wheeler, Ph.D., shed light on bias in reference checking in an article for Psychology Today.

Key Points:

  • Although popular, traditional reference checks are plagued with biases and validity issues.
  • Referees tend to use more standout adjectives when describing male candidates compared to female candidates.
  • Biases like these can exacerbate the underrepresentation of women in organizations.
  • Reference checking can be a useful personnel selection procedure if it is conducted in a structured way.
Written by Carl-Johan Holmberg and Melissa Wheeler for Psychology Today

Reference checking is a personnel selection procedure that is widely used across the world. In reference checking, potential employers request information from informants who have previously worked with a job applicant, so that employers can get an idea of what the applicant will be like to work with. Although popular, traditional reference checks are plagued with biases and validity issues.

Implicit Gender Bias

Implicit biases are unconscious processes that lead to a negative evaluation of a person based on irrelevant characteristics, such as gender, age, or ethnicity. Several studies have shown that there is a gender bias against women in employment references. For example, Madera and colleagues (2019) found that referees tend to express more doubt when evaluating female candidates than when evaluating male candidates.

An example of a doubt-raising comment is: "She is unlikely to become a superstar, but she is very solid." In another study, Schmader and colleagues (2007) found that referees tend to use more standout adjectives (such as exceptional, unique, and outstanding) when describing male candidates than when describing female candidates. Biases like these can exacerbate the underrepresentation of women in organizations. They may also distort hiring decisions so that suitable candidates are screened out in favor of less suitable ones.

Combating Gender Bias

New research shows that gender bias can be mitigated by adding structure to the reference check. Fisher and colleagues (2022) analyzed data from over one million highly structured digital reference checks, in which referees rated candidates on numerical scales instead of giving free-text responses. The results showed that in these highly structured reference checks, the candidates' gender did not affect the referee ratings. This was true even for candidates in occupations that are typically gender stereotyped, such as truck drivers and nurses.

Thus, gender bias in reference checks appears to be reduced by removing free-text evaluations (where subjective judgments and different adjectives are applied to men and women) and instead requiring referees to rate candidates on objective, consistently applied numerical scales.

The Leniency Bias

Imagine you have applied for a job and have been asked to supply a few referees who can be contacted to vouch for you. Who are you likely to ask: a previous supervisor who was tough on you, or your informal mentor and friend? Job applicants who self-select their references are apt to choose raters who are likely to give them the highest scores. So, if everyone is reaching out to referees they know (or believe) will speak highly of them, it raises the question: What is the point of continuing with this approach?

With the digitalization of human resource management, new techniques for reference checking have recently been introduced. For example, digital reference checking software makes it possible to verify the identity of the referees and collect data from a large number of referees without adding to the workload of the recruiter.

Since the reliability of a reference check increases with the number of referees, using digital data collection and identity verification may be a way of mitigating leniency bias and improving the quality of the data collected. Referees also seem to accept providing information about candidates digitally. According to statistics from Refapp (2023), the response rate to digital reference checks is 86 percent, and referees respond within 24 hours on average.
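The link between the number of referees and reliability is commonly formalized in psychometrics by the Spearman-Brown prophecy formula, ρ_k = kρ / (1 + (k - 1)ρ), where ρ is the reliability of a single referee's rating and ρ_k is the reliability of the average rating across k referees. The sketch below is a minimal illustration that assumes referees are interchangeable, equally reliable raters; the function name and the single-referee reliability of 0.40 are hypothetical values chosen for illustration, not figures from the article. Note that the formula describes how random rater error averages out; a leniency bias shared by every self-selected referee would not shrink this way.

```python
def spearman_brown(single_rater_reliability: float, n_raters: int) -> float:
    """Predicted reliability of the mean rating across n_raters.

    Assumes parallel raters: each referee is equally reliable and
    rates the same underlying trait (an idealization).
    """
    r = single_rater_reliability
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if n_raters < 1:
        raise ValueError("need at least one rater")
    return n_raters * r / (1 + (n_raters - 1) * r)


# Hypothetical single-referee reliability of 0.40 (illustrative only).
for k in (1, 2, 4, 8):
    print(k, round(spearman_brown(0.40, k), 2))
# prints, one pair per line: 1 0.4 / 2 0.57 / 4 0.73 / 8 0.84
```

The gains flatten as referees are added, which is one reason collecting responses from several referees at low cost matters more than collecting from dozens.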

Employers can also specify who qualifies as an appropriate referee and may choose to limit these to only previous line managers of the applicant.

The Problem of Low Validity

The purpose of personnel selection procedures, such as reference checks, is usually to predict how job applicants will behave in a specific job position. Based on these predictions, employers hope to hire the most suitable candidate. To evaluate how well a selection method can predict work-related behavior, researchers often examine its predictive validity. Predictive validity refers to how well the score from one measurement (for example, an intelligence test) predicts scores on another variable (for example, work performance).
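Predictive validity is typically reported as a correlation coefficient between scores on the selection procedure and a later measure of work performance. As a minimal sketch, assuming a plain Pearson correlation with no statistical corrections (published meta-analytic estimates often also apply corrections, for example for range restriction), the helper below computes a validity coefficient for a handful of candidates. The function name and all scores are hypothetical, invented purely for illustration.

```python
from math import sqrt


def validity_coefficient(predictor, criterion):
    """Pearson correlation between selection scores and later performance."""
    if len(predictor) != len(criterion) or len(predictor) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(predictor)
    mean_x = sum(predictor) / n
    mean_y = sum(criterion) / n
    # Co-variation of the two measures, and the spread of each around its mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, criterion))
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    syy = sum((y - mean_y) ** 2 for y in criterion)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: structured reference-check scores (1-5 scale) for five hires
# and their supervisor-rated performance one year later (also 1-5).
reference_scores = [3.2, 4.1, 2.8, 4.6, 3.9]
performance = [3.0, 3.8, 3.1, 4.4, 3.5]
print(round(validity_coefficient(reference_scores, performance), 2))
```

A coefficient near 0 means the selection scores carry little information about later performance, while values closer to 1 indicate stronger prediction. Real validity studies use far larger samples than this toy example.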

In a frequently cited article, Schmidt and Hunter (1998) summarized meta-analytic findings on how well different personnel selection procedures predict work performance. Compared with other personnel selection procedures, the authors found that reference checks have relatively low predictive validity for work performance. This evidence argues against continued reliance on traditional reference checks.

However, some studies have shed new light on the validity of reference checks. As Zimmerman and colleagues (2010) pointed out, the studies included in Schmidt and Hunter's (1998) review used unstructured reference checks. During the 21st century, studies have instead investigated the validity of structured reference checks, and in these studies the scores from structured reference checks have shown strong relationships with work performance. Thus, evidence suggests that reference checking can be a useful personnel selection procedure if it is conducted in a structured way.

How to Increase Validity

As mentioned above, scientific evidence suggests that structured reference checks outperform unstructured reference checks in predicting work performance. Adding structure to reference checks can also mitigate implicit bias. In personnel selection procedures, such as reference checks and job interviews, structure usually means that the following criteria are met:

  • The questions are based on a job analysis
  • The questions are related to areas recommended by researchers, such as job-related behavioral questions
  • The rating formats are standardized
  • The questions are consistent across candidates

There are both economic and ethical reasons why bias and low validity do not belong in a personnel selection procedure. If organizations add structure, which mitigates bias and increases validity, they may have good reasons to keep reference checks in their arsenal.

Are you curious to know more about digital reference checking with Refapp and how the tool could create value in your business? Please contact us!