晓彤

Consumer Complaints Dataset for NLP

finance, beginner, exploratory data analysis, nlp, multiclass classification

6

Sold: 0
94.86MB

Data ID: D17220366803239339

Published: 2024/07/27

The following is the data verification report that the seller chose to provide:

Data description

Context

The Consumer Financial Protection Bureau (CFPB) is a U.S. federal agency that acts as a mediator when disputes arise between financial institutions and consumers. Via a web form, consumers can send the agency a narrative of their dispute. An NLP model would classify complaints and route them to the appropriate teams more efficiently than tagging them manually.

Content

A data file was downloaded directly from the CFPB website for training and testing the model. It included one year's worth of data (March 2020 to March 2021). Later in the project, I used an API to download up-to-the-minute data to verify the model's performance.

Each submission was tagged with one of nine financial product classes. Because of similarities between certain classes, as well as some class imbalances, I consolidated them into five classes:

  • credit reporting
  • debt collection
  • mortgages and loans (includes car loans, payday loans, student loans, etc.)
  • credit cards
  • retail banking (includes checking/savings accounts, as well as money transfers, Venmo, etc.)
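
The consolidation above can be sketched as a simple lookup table. This is a minimal illustration, not the author's actual code: the nine raw product strings are assumed to match the CFPB "Product" labels in use around 2020-2021 (the description does not list them), and the short class names and the `consolidate` helper are hypothetical.

```python
from collections import Counter

# ASSUMPTION: raw CFPB "Product" labels; the dataset description does not
# name the nine original classes, so verify these against the actual file.
PRODUCT_TO_CLASS = {
    "Credit reporting, credit repair services, or other personal consumer reports": "credit_reporting",
    "Debt collection": "debt_collection",
    "Mortgage": "mortgages_and_loans",
    "Vehicle loan or lease": "mortgages_and_loans",
    "Payday loan, title loan, or personal loan": "mortgages_and_loans",
    "Student loan": "mortgages_and_loans",
    "Credit card or prepaid card": "credit_card",
    "Checking or savings account": "retail_banking",
    "Money transfer, virtual currency, or money service": "retail_banking",
}


def consolidate(product: str) -> str:
    """Map a raw product label to one of the five consolidated classes."""
    try:
        return PRODUCT_TO_CLASS[product.strip()]
    except KeyError:
        # Fail loudly so unexpected labels are not silently dropped.
        raise ValueError(f"unmapped product label: {product!r}") from None


# Toy input, for illustration only.
raw = ["Mortgage", "Student loan", "Debt collection", " Credit card or prepaid card "]
labels = [consolidate(p) for p in raw]
print(Counter(labels))
```

Raising on unmapped labels, rather than assigning an "other" class, makes it obvious when newer CFPB data introduces a product name the table does not cover.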

After data cleaning, the dataset consisted of around 162,400 consumer submissions containing narratives. The dataset was still imbalanced: 56% of submissions fell in the credit reporting class, and the rest were spread roughly evenly (between 8% and 14% each) across the other four classes.
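
To make the imbalance concrete, the sketch below computes class shares and inverse-frequency weights of the form n_samples / (n_classes * class_count), which is one common way to compensate when training a classifier. The description does not say how, or whether, the imbalance was handled, so this is an assumption; the toy label counts are invented to roughly mirror the reported shares, and `class_shares` and `balanced_weights` are hypothetical helper names.

```python
from collections import Counter


def class_shares(labels):
    """Return the fraction of samples that fall in each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def balanced_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    total = sum(counts.values())
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}


# ASSUMPTION: toy labels; only the 56% figure and the 8-14% range come from
# the description, the exact per-class split is made up for illustration.
toy = (
    ["credit_reporting"] * 56
    + ["debt_collection"] * 14
    + ["mortgages_and_loans"] * 12
    + ["credit_card"] * 10
    + ["retail_banking"] * 8
)

shares = class_shares(toy)
weights = balanced_weights(toy)
for cls in shares:
    print(f"{cls:22s} share={shares[cls]:.2f} weight={weights[cls]:.2f}")
```

With weights like these, errors on the smaller classes count for more during training, so the majority credit reporting class does not dominate the loss.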

Acknowledgements
