
卖家暂未授权典枢平台对该文件进行数据验证,您可以向卖家
数据描述
Women in Headlines: Bias
Investigating Gendered Language, Temporal Trends, and Themes
By Amber Thomas [source]
About this dataset
This dataset contains all of the data used in the Pudding essay When Women Make Headlines published in January 2022. This dataset was created to analyze gendered language, bias and language themes in news headlines from across the world. It contains headlines from top50 news publications and news agencies from four major countries - USA, UK, India and South Africa - as published by SimilarWeb (as of 2021-06-06).
To collect this data we used RapidAPI's google news API to query headlines containing one or more of keywords selected based on existing research done by Huimin Xu & team and The Swaddle team. We analyzed words used in headlines manually curating two dictionaries — gendered words about women (words that are explicitly gendered) and words that denote societal/behavioral stereotypes about women. To calculate bias scores, we utilized technology developed through Yasmeen Hitti & team’s research on gender bias text analysis. To categorize words used into themes (violence/crime, empowerment, race/ethnicity/identity etc), we manually curated four dictionaries utilizing Natural Language Processing packages for Python like spacy & nltk for our analysis. Plus, inverting polarity scores with vaderSentiment algorithm helped us shed light on differences between women-centered/non-women centered polarity levels as well as differences between global polarity baselines of each country's most visited publications & news agencies according to SimilarWeb 2020 statistics..
This dataset enables journalists, researchers and educators researching issues related to gender equity within media outlets around the world further insights into potential disparities with just a few lines of code! Any discoveries made by using this data should provide valuable support for evidence-based argumentation . Let us advocate for greater awareness towards female representation better quality coverage!
More Datasets
For more datasets, click here.
Featured Notebooks
- 🚨 Your notebook can be here! 🚨!
How to use the dataset
This dataset provides a comprehensive look at the portrayal of women in headlines from 2010-2020. Using this dataset, researchers and data scientists can explore a range of topics including language used to describe women, bias associated with different topics or publications, and temporal patterns in headlines about women over time.
To use this dataset effectively, it is helpful to understand the structure of the data. The columns include headline_no_site (the text of the headline without any information about which publication it is from), time (the date and time that the article was published), country (the country where it was published), bias score (calculated using Gender Bias Taxonomy V1.0) and year (the year that the article was published).
By exploring these columns individually or combining them into groups such as by publication or by topic, there are many ways to make meaningful discoveries using this data set. For example, one could explore if certain news outlets employ more gender-biased language when writing about female subjects than other outlets or investigate whether female-centric stories have higher/lower bias scores than average for a particular topic across multiple countries over time. This type of analysis helps researchers to gain insight into how our culture's dialogue has evolved over recent years as relates to women in media coverage worldwide
Research Ideas
- A comparative, cross-country study of the usage of gendered language and the prevalence of gender bias in headlines to better understand regional differences.
- Creating an interactive visualization showing the evolution of headline bias scores over time with respect to a certain topic or population group (such as women).
- Analyzing how different themes are covered in headlines featuring women compared to those without, such as crime or violence versus empowerment or race and ethnicity, to see if there’s any difference in how they are portrayed by the media
Acknowledgements
If you use this dataset in your research, please credit the original authors.
Data Source
License
See the dataset description for more information.
Columns
File: headlines_reduced_temporal.csv
Column name | Description |
---|---|
headline_no_site | The headline of the article without the publication site name mentioned above it. (String) |
time | The time the article was published. (DateTime) |
country | The country the article was published in. (String) |
bias | The bias score of the article. (Float) |
year | The year the article was published. (Integer) |
Acknowledgements
If you use this dataset in your research, please credit the original authors.
If you use this dataset in your research, please credit Amber Thomas.
