数据洋

women-in-headlines-bias

Social Issues and Ad Human Rights

￥30

28.71MB

Dataset ID: D17168905291034321

Published: 2024/05/28

Women in Headlines: Bias

Investigating Gendered Language, Temporal Trends, and Themes

By Amber Thomas [source]

About this dataset

This dataset contains all of the data used in the Pudding essay When Women Make Headlines, published in January 2022. It was created to analyze gendered language, bias, and language themes in news headlines from around the world. It contains headlines from the top 50 news publications and news agencies in four major countries (USA, UK, India, and South Africa), as listed by SimilarWeb (as of 2021-06-06).

To collect this data, we used RapidAPI's Google News API to query headlines containing one or more keywords, selected based on existing research by Huimin Xu & team and The Swaddle team. We analyzed the words used in headlines by manually curating two dictionaries: gendered words about women (words that are explicitly gendered) and words that denote societal/behavioral stereotypes about women. To calculate bias scores, we used technology developed through Yasmeen Hitti & team's research on gender bias text analysis. To categorize the words used into themes (violence/crime, empowerment, race/ethnicity/identity, etc.), we manually curated four dictionaries, using Natural Language Processing packages for Python such as spacy and nltk for our analysis. In addition, inverting polarity scores with the vaderSentiment algorithm helped us shed light on differences in polarity between women-centered and non-women-centered headlines, as well as differences between the global polarity baselines of each country's most visited publications and news agencies according to SimilarWeb 2020 statistics.
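The dictionary-based theme categorization described above can be illustrated with a minimal sketch. The word lists below are hypothetical placeholders, not the authors' curated dictionaries, and the simple regex tokenizer stands in for the spacy/nltk processing the authors mention.

```python
import re

# Hypothetical mini-dictionaries for illustration only; the authors'
# curated dictionaries are not reproduced here.
THEME_WORDS = {
    "violence/crime": {"murder", "assault", "arrested"},
    "empowerment": {"leader", "wins", "founder"},
}

def tag_themes(headline, theme_words=THEME_WORDS):
    """Return the sorted themes whose word list shares a token with the headline."""
    # Lowercase the headline and split it into simple word tokens.
    tokens = set(re.findall(r"[a-z']+", headline.lower()))
    return sorted(theme for theme, words in theme_words.items() if tokens & words)

print(tag_themes("Founder arrested after assault claim"))
```

A headline can match several themes at once, which is why the function returns a list rather than a single label.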

This dataset gives journalists, researchers, and educators studying gender equity in media outlets around the world further insight into potential disparities with just a few lines of code. Any discoveries made using this data should provide valuable support for evidence-based argumentation. Let us advocate for greater awareness of female representation and better-quality coverage!

How to use the dataset

This dataset provides a comprehensive look at the portrayal of women in headlines from 2010 to 2020. Using it, researchers and data scientists can explore a range of topics, including the language used to describe women, the bias associated with different topics or publications, and temporal patterns in headlines about women.

To use this dataset effectively, it helps to understand the structure of the data. The columns are headline_no_site (the text of the headline without any information about which publication it is from), time (the date and time the article was published), country (the country where it was published), bias (the bias score, calculated using Gender Bias Taxonomy V1.0), and year (the year the article was published).
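As a rough sketch of reading this structure, the snippet below parses rows with Python's standard csv module. The sample rows are made up for illustration, and the time column is left as a string because its exact timestamp format is not documented on this page.

```python
import csv
import io

def load_headlines(source):
    """Read rows from an open CSV file, converting bias to float and year to int."""
    rows = []
    for row in csv.DictReader(source):
        rows.append({
            "headline_no_site": row["headline_no_site"],
            "time": row["time"],  # kept as a string; timestamp format is an unknown here
            "country": row["country"],
            "bias": float(row["bias"]),
            "year": int(row["year"]),
        })
    return rows

# Made-up sample rows standing in for headlines_reduced_temporal.csv.
SAMPLE = io.StringIO(
    "headline_no_site,time,country,bias,year\n"
    "Example headline one,2015-03-01,USA,0.25,2015\n"
    "Example headline two,2016-07-12,India,0.5,2016\n"
)
rows = load_headlines(SAMPLE)
print(len(rows), rows[0]["country"], rows[1]["bias"])
```

With the real file, the same function could be called on `open("headlines_reduced_temporal.csv", newline="", encoding="utf-8")`. The UTF-8 encoding is an assumption, and rows with missing bias or year values would need extra handling.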

By exploring these columns individually or combining them into groups, such as by publication or by topic, there are many ways to make meaningful discoveries with this dataset. For example, one could explore whether certain news outlets use more gender-biased language than others when writing about female subjects, or investigate whether female-centric stories have higher or lower bias scores than average for a particular topic across multiple countries over time. This type of analysis helps researchers gain insight into how our culture's dialogue about women in media coverage has evolved worldwide over recent years.
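One way to combine columns into groups is to average the bias score per country and year. The sketch below uses only the standard library; the helper name `mean_bias_by` and the sample values are illustrative assumptions, not part of the dataset or the authors' analysis.

```python
from collections import defaultdict
from statistics import mean

def mean_bias_by(rows, *keys):
    """Average the bias score for each distinct combination of the given columns."""
    groups = defaultdict(list)
    for row in rows:
        # The group key is a tuple of the row's values for the chosen columns.
        groups[tuple(row[k] for k in keys)].append(row["bias"])
    return {group: mean(scores) for group, scores in groups.items()}

# Made-up rows using the documented column names.
rows = [
    {"country": "USA", "year": 2015, "bias": 0.25},
    {"country": "USA", "year": 2015, "bias": 0.75},
    {"country": "India", "year": 2015, "bias": 0.5},
]
result = mean_bias_by(rows, "country", "year")
print(result)
```

Passing a single key, for example `mean_bias_by(rows, "country")`, gives a per-country baseline that yearly values can be compared against.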

Research Ideas

A comparative, cross-country study of the usage of gendered language and the prevalence of gender bias in headlines to better understand regional differences.

Creating an interactive visualization showing the evolution of headline bias scores over time with respect to a certain topic or population group (such as women).

Analyzing how different themes, such as crime or violence versus empowerment or race and ethnicity, are covered in headlines featuring women compared to those without, to see whether the media portrays them differently.

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

See the dataset description for more information.

Columns

File: headlines_reduced_temporal.csv

Column name	Description
headline_no_site	The headline of the article, with the publication's site name removed. (String)
time	The time the article was published. (DateTime)
country	The country the article was published in. (String)
bias	The bias score of the article. (Float)
year	The year the article was published. (Integer)

Acknowledgements

If you use this dataset in your research, please credit Amber Thomas.


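Validation Report

No matching data quality validation procedure is currently available for this file. Validation support will be provided in a later release.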
