Male Characters are Four Times More Prevalent in Pre-Modern Literature than Female Characters: Study

Apr 29, 2022 by News Staff

A duo of scientists from the Information Sciences Institute at the University of Southern California has analyzed the publicly available English-language books in the Project Gutenberg corpus. The books spanned genres from adventure and science fiction to mystery and romance, and forms including novels, short stories, and poetry.

Nagaraj & Kejriwal defined and measured the differences between male character prevalence and female character prevalence using three robust measures of prevalence, on a corpus of copyright-expired literary texts from the Project Gutenberg English-language corpus. Image credit: Sci-News.com.

Nagaraj & Kejriwal defined and measured the differences between male character prevalence and female character prevalence using three robust measures of prevalence, on a corpus of copyright-expired literary texts from the Project Gutenberg English-language corpus. Image credit: Sci-News.com.

“Gender bias is very real, and when we see females four times less in literature, it has a subliminal impact on people consuming the culture,” said co-author Dr. Mayank Kejriwal.

“We quantitatively revealed in an indirect way in which bias persists in culture.”

“Books are a window to the past, and the writing of these authors gives us a glimpse into how people perceive the world, and how it has changed,” added first author Akarsh Nagaraj.

The researchers used Named Entity Recognition (NER), a prominent natural language processing (NLP) method, to extract gender-specific characters.

“One of the ways we define this is through looking at how many female pronouns are in a book compared to male pronouns,” Dr. Kejriwal said.

“The other technique is to quantify how many female characters are the main characters in it.”

This allowed the team to determine whether the male characters were central to the story.
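The two measures described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the pronoun lists, the regex tokenizer, the `top_k` cutoff, and the function names `pronoun_counts` and `female_share_of_main_characters` are all hypothetical. The sketch also skips the NER step the study used, assuming instead that per-character mention counts and gender labels are already available.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical pronoun lists; the study's actual lexicon may differ.
MALE_PRONOUNS = {"he", "him", "his", "himself"}
FEMALE_PRONOUNS = {"she", "her", "hers", "herself"}


def pronoun_counts(text: str) -> tuple[int, int]:
    """Return (male, female) pronoun counts, case-insensitive."""
    # Crude tokenization: lowercase, keep runs of ASCII letters only.
    tokens = re.findall(r"[a-z]+", text.lower())
    male = sum(1 for t in tokens if t in MALE_PRONOUNS)
    female = sum(1 for t in tokens if t in FEMALE_PRONOUNS)
    return male, female


def female_share_of_main_characters(mentions: dict[str, int],
                                    genders: dict[str, str],
                                    top_k: int = 5) -> float:
    """Fraction of the top_k most-mentioned characters labeled "female".

    `mentions` maps character name -> mention count (assumed to come from
    an earlier NER pass); `genders` maps character name -> "male" or
    "female". Both inputs are hypothetical.
    """
    # Treat the most frequently mentioned characters as the main characters.
    main = [name for name, _ in Counter(mentions).most_common(top_k)]
    if not main:
        return 0.0
    return sum(1 for name in main if genders.get(name) == "female") / len(main)


if __name__ == "__main__":
    print(pronoun_counts("She told him that his horse was hers. He laughed."))
    # Invented toy counts, not data from the study.
    mentions = {"Anna": 120, "Ben": 95, "Clara": 60,
                "David": 55, "Edward": 30, "Fiona": 25}
    genders = {"Anna": "female", "Ben": "male", "Clara": "female",
               "David": "male", "Edward": "male", "Fiona": "female"}
    print(female_share_of_main_characters(mentions, genders, top_k=5))
```

On the sample sentence, `pronoun_counts` finds three male pronouns ("him", "his", "He") and two female ones ("She", "hers"). The pronoun measure needs no character detection at all, whereas the main-character measure is only as reliable as the upstream NER and gender labeling, which is one reason to combine several measures rather than rely on one.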

The findings also showed that the gap between male and female character prevalence is smaller in books written by women.

“It clearly showed us that women in those times would represent themselves much more than a male writer would,” Nagaraj said.

The team’s varied methods for measuring female representation in literature are not without limitations, however, when it comes to authors who are neither male nor female.

“When we published the dataset paper, reviewers had this criticism that we were ignoring non-dichotomous genders,” Dr. Kejriwal said.

“But we agreed with them, in a way. We think it’s completely suppressed, and we won’t be able to find many transgender individuals or non-dichotomous individuals.”

“Our study shows us that the real world is complex but there are benefits to all different groups in our society participating in the cultural discourse,” he added.

“When we do that, there tends to be a more realistic view of society.”

_____

Akarsh Nagaraj & Mayank Kejriwal. Robust Quantification of Gender Disparity in Pre-Modern English Literature using Natural Language Processing. arXiv.org, arXiv: 2204.05872
