Friday, October 24, 2025

Analyzing Gender Differences in Work-Related Accidents Through Natural Language Processing


In this section, we discuss the nuances and challenges we encountered during the project, particularly in evaluating mechanism detection in accident reports.

Human Evaluation of Mechanism Detection

It’s no secret that models make errors. To assess the performance of our mechanism detection model, we selected a subset of 1,103 documents to be labeled by human annotators. This selection wasn’t a simple random draw. It was deliberately designed to support a comprehensive and fair evaluation of the model’s capabilities. Our sampling strategy used two main criteria:

  • Temporal Stratification: To avoid seasonal bias—for instance, the higher incidence of certain accidents during specific months—we balanced our sample across each month of the study period.

  • Complexity Stratification: To keep overly simple cases from inflating the F1-score, we stratified the sample by the length of the free-text descriptions, mixing short, medium, and long texts to test the model’s robustness.
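
The two stratification criteria can be combined into a single stratified draw. The following is a minimal sketch and not the authors’ actual procedure. It assumes each report is a dict with hypothetical `month` and `text` keys. The character-count cut-offs for short, medium, and long descriptions (`short_max`, `medium_max`) are placeholders, because the source does not state them.

```python
import random
from collections import defaultdict


def length_bin(text, short_max=100, medium_max=300):
    """Assign a description to a 'short', 'medium', or 'long' bin by character count.

    The cut-offs are placeholders, not values from the study.
    """
    n = len(text)
    if n <= short_max:
        return "short"
    if n <= medium_max:
        return "medium"
    return "long"


def stratified_sample(records, per_stratum, seed=0):
    """Draw up to `per_stratum` records from every (month, length bin) stratum.

    Each record is assumed to be a dict with 'month' and 'text' keys.
    """
    strata = defaultdict(list)
    for rec in records:
        strata[(rec["month"], length_bin(rec["text"]))].append(rec)

    rng = random.Random(seed)  # fixed seed for a reproducible sample
    sample = []
    for key in sorted(strata):
        group = strata[key]
        # Small strata are taken whole instead of raising an error.
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample


# Usage: 12 months x 3 length bins x 10 reports each, 3 drawn per stratum.
reports = [
    {"month": m, "text": "x" * n}
    for m in range(1, 13)
    for n in (50, 200, 400)
    for _ in range(10)
]
subset = stratified_sample(reports, per_stratum=3)
print(len(subset))  # 108
```

Drawing a fixed number per stratum balances the months as the text describes. A proportional allocation would be the alternative if the goal were to reproduce the population’s mix of months and lengths.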

The mechanism classes correspond to Level 1 in Table 2. The evaluation required writing annotation guidelines and running a rigorous annotation process for accident mechanisms. In total, one annotator labeled 612 documents and another labeled 365. A further 126 documents were labeled by both annotators and then curated (612 + 365 + 126 = 1,103). A confusion matrix comparing the model’s predictions against the human labels (the gold standard) is available in the Supplementary Material.

Our model achieved a weighted F1-score of 0.62 for the Level 2 classes of the International Labour Organization (ILO) ontology. This score was computed against the ground truth established by specially trained human annotators. Our intent was to evaluate the model on a dataset that mirrors the real-world distribution of accident reports, including its inherent class imbalances. A sample artificially balanced between rare and common accident types could produce misleading results, reflecting a constructed environment rather than a genuine operational context. The reported F1-score of 0.62 therefore offers a more realistic appraisal of the model’s effectiveness and of the challenges posed by the actual data.
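
For reference, a weighted F1-score averages the per-class F1 scores and weights each class by its support, meaning its number of gold-standard examples. Frequent mechanisms therefore count for more than rare ones, which fits the stated aim of evaluating on the real, imbalanced distribution. The sketch below assumes the standard definition, the same one used by scikit-learn’s `f1_score(..., average="weighted")`. The label strings in the usage are made up.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores.

    Classes that appear only in y_pred have zero support and contribute nothing.
    """
    support = Counter(y_true)
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))
    score = 0.0
    for label in set(y_true) | set(y_pred):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * support[label] / total  # weight by gold-standard support
    return score


# Usage with invented labels: F1 is 0.8 for "falls" (weight 3/4) and 2/3 for "cuts" (weight 1/4).
gold = ["falls", "falls", "falls", "cuts"]
pred = ["falls", "falls", "cuts", "cuts"]
print(round(weighted_f1(gold, pred), 4))  # 0.7667
```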

Mechanisms of Accidents

Many disagreements between the model’s outputs and the human labels can be traced to semantic overlap between classes. For example, the model classified 22 incidents that annotators had labeled “excessive force” as “overexertion.” The distinction is subtle. Both categories involve exerting force, but “excessive force” refers to chronic high exertion, whereas “overexertion” refers to an acute misapplication of force. Similarly, the model classified 25 documents labeled “explosions or fires” as “contact with extreme heat/cold.” This is an understandable error, given the inherent link between fires and exposure to extreme heat.
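
Confusions like these can be surfaced by tallying the off-diagonal cells of the confusion matrix, that is, the (human label, model label) pairs that disagree. The following is a minimal sketch with invented labels. The function name `top_confusions` is hypothetical and is not taken from the study’s code.

```python
from collections import Counter


def top_confusions(gold, predicted, n=3):
    """Return the n most frequent (gold label, predicted label) pairs that disagree."""
    errors = Counter((g, p) for g, p in zip(gold, predicted) if g != p)
    return errors.most_common(n)


# Usage with invented labels.
gold = ["excessive force"] * 3 + ["explosions or fires"] * 2 + ["falls"]
pred = ["overexertion"] * 3 + ["contact with extreme heat/cold"] * 2 + ["falls"]
for (human, model), count in top_confusions(gold, pred):
    print(f"{human} -> {model}: {count}")
```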

However, some misclassifications are harder to rationalize. For instance, “steps” were occasionally confused with “falls,” and “animal attacks” were misidentified as other accident types. Instances of “projection of particles” were also confused with “contact with sharp objects” and “contact with substances.” These examples suggest that refining the class descriptions in our prompts could help the model separate these classes more reliably and thus improve accuracy. Overall, GPT-3.5-turbo shows substantial promise in categorizing accident mechanisms, but periodic human evaluation and assessment of newer models will be essential.

Beyond model performance, our findings reveal notable patterns in accident mechanisms. Same-level falls were disproportionately frequent among women, who accounted for 59% of all fall-related accidents even though men were involved in more accidents overall. This observation aligns with broader studies indicating that men sustain more accidents in general, while women are more exposed to specific accident types such as falls, often because of the nature of their occupations.

Cleaning, education, manual packing, and manufacturing featured prominently among the occupations involved in fall-related accidents. A statistical analysis of individual accident records revealed a significant association between sex and accident mechanism (χ² = 17,116.52, p < 0.001), with a moderate effect size (Cramér’s V = 0.224). There was also a significant association between occupation and the occurrence of falls (χ² = 58,475.75, p < 0.001), indicating that certain occupational groups are at greater risk. The significantly higher concentration of falls among women (z = -103.26, p < 0.001) underscores the need for a gender-sensitive approach to occupational safety strategies.
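
The chi-square test of independence and Cramér’s V can both be computed directly from a contingency table, for example sex by accident mechanism. The sketch below applies the standard Pearson formulas to toy tables, not to the study’s data. In practice, `scipy.stats.chi2_contingency` returns the statistic, p-value, degrees of freedom, and expected counts. Note that it applies Yates’ continuity correction by default when there is one degree of freedom. Because sex is recorded here as binary, min(r, c) - 1 = 1, so Cramér’s V reduces to sqrt(χ²/n).

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts.

    Assumes every row and column total is positive (no all-zero rows or columns).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (n * (min(r, c) - 1))), ranging from 0 to 1."""
    chi2, _ = chi_square_independence(table)
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


# Usage: toy 2 x 3 table (rows: two sex categories, columns: three mechanisms).
toy = [[30, 10, 60], [50, 40, 10]]
chi2, dof = chi_square_independence(toy)
print(dof, round(cramers_v(toy), 3))  # 2 0.542
```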

Similar Occupations

Free-text recording of crucial information, particularly occupation, posed a particular challenge.

  • Operario vs. operador: The distinction between operario and operador (both typically translated as “operator”) was often blurred. Because the records did not make competency levels clear, we grouped these categories. Future work will involve collecting primary data from the people involved in the accident reporting process to clarify such distinctions.

  • Docente vs. profesor: Likewise, the terms docente and profesor, both translated as “teacher,” need scrutiny. Teaching roles vary by academic level, as English distinctions such as tutor, lecturer, and professor reflect. Our discussions with ACHS indicated that this variability most likely arises from free-text entry rather than from differences in competencies. Understanding how occupation descriptors are entered during the reporting process is therefore crucial for improving future analyses.
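
One simple way to implement this kind of grouping is to normalize each free-text entry (lowercase it, strip accents, collapse whitespace) and then look it up in a synonym map. This is a hypothetical sketch, not the study’s pipeline. The `SYNONYMS` table contains only the two pairs discussed above. Exact-match lookup will also miss misspellings and longer multi-word job descriptions.

```python
import unicodedata

# Hypothetical synonym map covering only the pairs discussed in the text.
SYNONYMS = {
    "operario": "operator",
    "operador": "operator",
    "docente": "teacher",
    "profesor": "teacher",
}


def normalize(text):
    """Lowercase, strip accents, and collapse whitespace in a free-text occupation."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.split())


def group_occupation(text):
    """Return the grouped label for a free-text occupation, or its normalized form if unmapped."""
    key = normalize(text)
    return SYNONYMS.get(key, key)


for raw in ["  Operario ", "OPERADOR", "Profesor", "Técnico  eléctrico"]:
    print(repr(raw), "->", group_occupation(raw))
```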

Using Word Embeddings to Map Occupations to an Ontology

To classify more than 42,000 free-text occupations, we initially used word embeddings, which assign a vector representation to each word in the vocabulary. We then used cosine similarity to measure how close these vectors were to those of the standardized occupational classifications. However, when compared against a human-created gold standard, this method performed worse than the approach based on GPT-4o-mini.
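
The embedding baseline amounts to a nearest-neighbour search by cosine similarity. Each free-text occupation is embedded, each standardized category is embedded, and the category with the highest similarity is chosen. In the sketch below, the three-dimensional vectors and category names are invented for illustration. The source does not say which embedding model or category set was used. We assume that a real system would obtain the vectors from a trained embedding model, for example by averaging word vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_category(query_vec, category_vecs):
    """Return (category, similarity) for the category vector most similar to query_vec."""
    scores = {name: cosine(query_vec, vec) for name, vec in category_vecs.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Invented 3-d vectors standing in for real embeddings.
categories = {
    "teaching professionals": [0.9, 0.1, 0.0],
    "cleaners and helpers": [0.0, 0.2, 0.9],
}
label, score = nearest_category([0.8, 0.2, 0.1], categories)
print(label, round(score, 2))  # teaching professionals 0.98
```

Because this approach always returns some category, a similarity threshold would be needed to flag entries that match nothing well. That gap is one plausible reason such a baseline can trail an LLM-based classifier against a gold standard.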

This discussion emphasizes the transformative impact of generative large language models (LLMs) on natural language processing (NLP). While the capabilities they offer are groundbreaking, we reiterate the necessity of ongoing human evaluation and critical assessment as new models emerge. Additionally, the rise of open-source models such as Llama-3 or Mistral presents promising avenues for future research.

Sex vs. Gender

Our dataset included a binary “sex” variable, typically assigned based on physical appearance rather than self-identification. Going forward, we advocate also recording gender identity, including options for non-binary individuals. ACHS has recently begun capturing gender data. However, reporting to the Superintendency of Social Security (SUSESO) must be updated to standardize this approach, which will require coordinated efforts among occupational health providers.

A critical limitation of this dataset is its binary sex classification and the lack of any information beyond this dichotomy. Recording systems should allow self-reporting of both sex and gender, rather than imposing assumptions based on appearance, to mitigate bias and respect individual identities.

In Chile, the National Statistics Institute (INE) has led progress toward distinguishing sex at birth from gender in national statistics. This distinction improves the measurement of social, economic, and cultural phenomena that binary classifications capture poorly. This is particularly true in the labor market, where participation gaps become evident, especially for women in unpaid domestic work.

Mostly Female and Male Occupations

Women’s participation in various economic roles has evolved considerably over the past five years, yet persistent gender roles continue to shape the labor market. Our findings show a marked concentration of women in care-oriented jobs, including domestic work, cleaning, education, and childcare. Notably, 11% of recorded accidents occurred in care occupations, and women accounted for 84% of these. This underscores the importance of analyzing care-related roles to better understand the implications for women, many of whom may not be reflected in paid employment statistics.

Conversely, male-dominated occupations, including vehicle operation and manual labor, show higher rates of occupational accidents. Men accounted for 55.9% of the accidents recorded in our dataset, which underscores the need for sex-disaggregated data to target prevention more effectively.

Such disaggregation enables a nuanced understanding of occupational health risks, promoting the design of preventive strategies that cater to the specific needs of both men and women in the workplace.
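
Sex-disaggregated figures such as the 84% reported for care occupations are within-group shares. For each occupation group, the count of records for each sex is divided by that group’s total. The following is a minimal sketch over hypothetical (occupation group, sex) pairs. The "F"/"M" coding and the counts in the usage are invented.

```python
from collections import Counter, defaultdict


def share_by_sex(records):
    """Map each occupation group to the fraction of its accident records per sex."""
    counts = defaultdict(Counter)
    for occupation, sex in records:
        counts[occupation][sex] += 1
    shares = {}
    for occupation, by_sex in counts.items():
        total = sum(by_sex.values())
        shares[occupation] = {sex: n / total for sex, n in by_sex.items()}
    return shares


# Invented records: (occupation group, sex) pairs.
records = [("care", "F")] * 84 + [("care", "M")] * 16 + [("driving", "M")] * 9 + [("driving", "F")]
shares = share_by_sex(records)
print(shares["care"]["F"], shares["driving"]["M"])  # 0.84 0.9
```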

Workers Without Contracts

As noted previously, ACHS primarily covers workers with employment contracts. Independent workers make up less than 2% of its affiliates. Precarious employment disproportionately affects women in Chile and often means working without a formal contract. As a result, these workers are not protected by labor law and face lower wages, job insecurity, and insufficient social benefits such as pensions, sick leave, or health insurance.

National statistics show that three out of ten employed women work informally, a gap of 1.9 percentage points relative to men. The absence of these workers from our database underscores the need to address the occupational hazards faced by informal workers, particularly those that disproportionately affect women.
