Key types of discrimination to be aware of in AI, and why it happened even to Amazon

Amit Cohen
5 min read · Jun 6, 2023


In the 1930s, a widespread practice known as redlining significantly impacted towns and cities throughout the US and Canada. Redlining involved drawing lines on maps to categorize areas as desirable or undesirable. Unfortunately, these maps were then used as a basis for denying crucial resources such as loans, insurance, and public services to the residents of the marked areas.

Bias is often associated with prejudice, where judgments are formed without understanding all the facts. While bias is far from ideal, it is crucial to acknowledge its existence. Some psychologists argue that bias is a cognitive shortcut our brains use for making quick decisions. For instance, if confronted by a snarling tiger, our bias would be to instinctively run away to ensure our safety.

In AI, the word “bias” also names a component of neural networks: a learned offset added to each neuron’s weighted inputs that helps guide the model toward more accurate results. However, it is crucial to recognize that redlining was also a form of bias. Although it was intended to facilitate risk assessment for policymakers and business owners in the 1930s, its consequences were undeniably detrimental.
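For readers unfamiliar with that technical sense of the word, here is a minimal sketch of a single artificial neuron in Python. The bias term is simply a learned offset added to the weighted sum of the inputs; it shares only its name with the social kind of bias discussed in this article.

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum of inputs plus a bias term,
    passed through a sigmoid activation."""
    # The bias shifts the weighted sum, letting the neuron's output move
    # away from its default baseline even when all inputs are zero.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-z))

# With zero bias, an all-zero input always yields 0.5; the bias shifts that baseline.
print(neuron([0.0, 0.0], [0.4, -0.2], bias=0.0))   # 0.5
print(neuron([0.0, 0.0], [0.4, -0.2], bias=-2.0))  # ~0.12
```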

Recently, companies like Amazon have employed AI models for automated resume screening. However, during one particular experiment, it was discovered that the model consistently assigned lower scores to female applicants who were equally qualified compared to their male counterparts. The root cause of this bias can be traced back to the training data, which predominantly consisted of male resumes. Terms associated with women or women’s colleges received less weight in the model’s evaluation, while terms found more often on male resumes were given more importance.

The Amazon AI experiment that revealed bias against women is a notable example of how biases can manifest in AI systems. In this experiment, Amazon implemented an automated resume screening system built on a trained artificial intelligence model. The intention was to streamline the hiring process and identify qualified candidates objectively. However, the system demonstrated a consistent bias against female applicants.

During the training phase, the model was provided with a dataset of resumes from the previous ten years. Unfortunately, the dataset was predominantly composed of resumes from male applicants. As a result, the model learned to associate specific terms more commonly found in male resumes with positive attributes while devaluing words more prevalent in resumes from women or women’s colleges.

Consequently, when new resumes were fed into the system, female applicants received lower scores than equally qualified male applicants. The bias arose from the skewed training data, which failed to represent female candidates’ diversity and qualifications sufficiently.
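Amazon never published the model or its data, so the sketch below is a deliberately tiny, invented reconstruction of the mechanism only. When most historically “hired” resumes are male-coded, even a naive per-term score (here, the smoothed log-odds of being hired given a term) ends up penalizing terms like “women’s” purely because of the skew. All resumes, labels, and scores here are made up for illustration.

```python
from collections import Counter
import math

# Invented toy data: (resume text, hired?) pairs reflecting a skewed history
# in which most hired candidates were men. Not Amazon's real data.
history = [
    ("software engineer captain chess club", 1),
    ("software engineer rugby team lead", 1),
    ("software engineer executed projects", 1),
    ("software engineer data pipelines", 1),
    ("software engineer women's chess club captain", 0),
    ("software engineer women's college graduate", 0),
]

hired_terms, rejected_terms = Counter(), Counter()
for text, hired in history:
    (hired_terms if hired else rejected_terms).update(text.split())

def term_score(term, smoothing=1.0):
    """Smoothed log-odds of 'hired' given the term: positive means the term
    co-occurred with hiring in the (skewed) history, negative the opposite."""
    hired_count = hired_terms[term] + smoothing
    rejected_count = rejected_terms[term] + smoothing
    return math.log(hired_count / rejected_count)

for term in ["engineer", "executed", "women's", "captain"]:
    print(f"{term:>10s} {term_score(term):+.2f}")
# "women's" scores negative only because the skewed history contains no
# hired resumes with that term -- the data, not the word, is the problem.
```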

This discovery prompted Amazon to terminate the experiment in 2017. They acknowledged that the AI model’s recommendations should not be the sole determinant in the hiring process. The incident highlighted the importance of addressing bias in training data and ensuring that AI models are not perpetuating discriminatory practices or reinforcing existing gender disparities.

The Amazon case is a cautionary tale, illustrating the significance of considering biases throughout the entire AI development pipeline. It highlights the need for diverse and balanced training data, rigorous evaluation of model outputs, and ongoing monitoring to detect and rectify biases in AI systems. We can strive to create more fair and equitable AI technologies by learning from such incidents.
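One way to make “rigorous evaluation of model outputs” concrete is to compare selection rates across groups. The sketch below is a self-contained illustration, not a standard API: the function names, the hypothetical decisions, and the often-quoted 0.8 rule-of-thumb threshold are all illustrative choices. It computes the per-group rate of positive decisions and the ratio between the lowest and highest rates.

```python
from collections import defaultdict

def selection_rates(predictions, groups):
    """Fraction of positive (e.g. 'advance to interview') outcomes per group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred)
    return {g: positives[g] / totals[g] for g in totals}

def disparate_impact_ratio(rates):
    """Ratio of the lowest to the highest group selection rate.
    Values well below ~0.8 are a common red flag worth investigating."""
    return min(rates.values()) / max(rates.values())

# Hypothetical screening decisions (1 = shortlisted) and applicant groups.
preds  = [1, 1, 0, 1, 1, 1, 0, 0, 1, 0]
groups = ["M", "M", "M", "M", "M", "F", "F", "F", "F", "F"]

rates = selection_rates(preds, groups)
print(rates)                          # {'M': 0.8, 'F': 0.4}
print(disparate_impact_ratio(rates))  # 0.5 -> worth a closer look
```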

Bias in AI models often arises unintentionally due to innocent mistakes or a lack of awareness. These biases can lead to incorrect judgments or assumptions based on incomplete or skewed data. It is essential to be aware of common types of discrimination, such as faulty assumptions, automation bias, experimenter bias, and group attribution bias.

Additionally, sampling and reporting methods can introduce bias. Coverage bias occurs when the training data fails to represent the intended population accurately. Sampling bias arises when the sample collected for a study does not provide a representative cross-section of the target group. Participation bias occurs when survey respondents do not adequately represent the entire population, potentially skewing the results. Convenience bias, on the other hand, occurs when easily accessible data is used instead of more relevant or recent information.

While there are numerous types of bias, it is vital to identify and mitigate those relevant to a particular problem. It may be impossible to eliminate every bias, but in machine learning projects it is our responsibility to diligently address and minimize any bias that may influence outcomes. Failing to do so risks unfair and harmful consequences.

Bias in AI can manifest in various ways. Here are the key types of discrimination to be aware of in AI:

  1. Training Data Bias: Bias can arise from the training data used to train AI models. If the data is incomplete, unrepresentative, or unbalanced, it can result in biased outcomes. For example, if a dataset predominantly includes data from a specific demographic group, the model may generalize poorly to other groups.
  2. Automation Bias: Automation bias occurs when humans unquestioningly trust or defer to the decisions made by AI systems without critically evaluating them. This can lead to biased outcomes if the AI system itself is discriminatory.
  3. Experimenter Bias: Experimenter bias refers to the unintentional bias introduced by researchers or developers during the design, implementation, or evaluation of AI systems. Their preconceived notions or preferences can influence the outcomes and interpretations of experiments or studies.
  4. Group Attribution Bias: Group attribution bias occurs when individuals are attributed specific characteristics or behaviors based solely on their group membership, leading to unfair generalizations. AI systems can inadvertently perpetuate these biases if trained on biased data or rely on flawed assumptions.
  5. Coverage Bias: Coverage bias arises when the training data fails to represent the intended population fully or includes disproportionate representations of certain groups. This can result in AI models that are more accurate for some groups but perform poorly for others (a simple check for this kind of skew is sketched after this list).
  6. Sampling Bias: Sampling bias occurs when the sample collected for a study or analysis does not provide a representative cross-section of the target group or population. This can introduce distortions and inaccuracies in the results obtained from the AI system.
  7. Participation Bias: Participation bias arises when the individuals who participate in surveys or provide data for AI systems do not adequately represent the entire population. This can lead to skewed or incomplete data, producing biased outcomes.
  8. Convenience Bias: Convenience bias happens when easily accessible data is used instead of more relevant or recent information. This bias can occur if AI systems rely on readily available data sources that may not accurately reflect the current reality or diversity of the analyzed subject.
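To make the coverage and sampling checks above concrete, here is a minimal sketch that compares each group’s share of a training set with its share of an assumed target population. The group labels, population shares, and tolerance are invented for illustration and would need to be replaced with figures appropriate to a real project.

```python
from collections import Counter

def coverage_report(train_groups, population_shares, tolerance=0.10):
    """Compare each group's share of the training data with its assumed
    share of the target population and flag large gaps."""
    counts = Counter(train_groups)
    total = sum(counts.values())
    report = {}
    for group, expected in population_shares.items():
        observed = counts.get(group, 0) / total
        report[group] = {
            "observed": round(observed, 2),
            "expected": expected,
            "flag": abs(observed - expected) > tolerance,
        }
    return report

# Hypothetical group labels for a training set vs. an assumed applicant pool.
train_groups = ["M"] * 85 + ["F"] * 15
population_shares = {"M": 0.55, "F": 0.45}

for group, row in coverage_report(train_groups, population_shares).items():
    print(group, row)
# F is 15% of the training data but an assumed 45% of the applicant pool --
# exactly the kind of coverage/sampling skew behind the resume-screening case.
```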

Identifying and addressing these biases in AI systems is crucial to ensure fair and equitable outcomes. By mitigating biases, AI can better serve diverse populations and avoid perpetuating discriminatory practices.
