The following is the data validation report that the seller chose to provide:
Data description
This dataset contains 200 Acquired Podcast transcripts that we collected from the official website (https://www.acquired.fm/), with metadata specified in acquired_metadata.csv.
We also developed a QA dataset for RAG evaluation, acquired-qa-evaluation.csv, which contains the following columns:
- question: The question posed for evaluation.
- human_answer: The answer provided by a human.
- ai_answer_without_the_transcript: The answer provided by an AI model without access to the transcript.
- ai_answer_without_the_transcript_correctness: The factual accuracy of the AI answer without the transcript verified by a human.
- ai_answer_with_the_transcript: The answer provided by an AI model with access to the transcript.
- ai_answer_with_the_transcript_correctness: The factual accuracy of the AI answer with the transcript verified by a human.
- quality_rating_for_answer_with_transcript: The quality of the AI answer rated by a human.
- post_url: The URL of the podcast episode related to the question.
- file_name: The name of the transcript file associated with the episode.
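The columns above allow a simple comparison of how often the AI answer was judged correct with and without the transcript. The following is a minimal sketch, not the authors' evaluation code. It assumes the two correctness columns hold labels such as TRUE/FALSE; the actual encoding in acquired-qa-evaluation.csv is not documented here, so the correct_labels parameter, the helper names, and the in-memory sample rows are all hypothetical and should be adjusted after inspecting the file.

```python
import csv
import io

# Column names taken from the dataset description.
WITHOUT_COL = "ai_answer_without_the_transcript_correctness"
WITH_COL = "ai_answer_with_the_transcript_correctness"


def load_rows(handle):
    """Read CSV rows from an open text handle into a list of dicts."""
    return list(csv.DictReader(handle))


def correctness_rate(rows, column, correct_labels=("true", "correct", "yes", "1")):
    """Fraction of labelled rows whose `column` value counts as correct.

    ASSUMPTION: the label vocabulary in `correct_labels` is a guess; check
    the real file and pass the labels it actually uses. Rows with an empty
    value in `column` are skipped. Returns None if no row is labelled.
    """
    labelled = [
        r[column].strip().lower()
        for r in rows
        if (r.get(column) or "").strip()
    ]
    if not labelled:
        return None
    hits = sum(1 for value in labelled if value in correct_labels)
    return hits / len(labelled)


# Hypothetical sample standing in for acquired-qa-evaluation.csv.
SAMPLE = """question,ai_answer_without_the_transcript_correctness,ai_answer_with_the_transcript_correctness
q1,FALSE,TRUE
q2,TRUE,TRUE
q3,FALSE,
q4,FALSE,FALSE
"""

rows = load_rows(io.StringIO(SAMPLE))
without_rate = correctness_rate(rows, WITHOUT_COL)
with_rate = correctness_rate(rows, WITH_COL)
print(f"correct without transcript: {without_rate:.2f}")
print(f"correct with transcript:    {with_rate:.2f}")
```

To run this against the real file, replace the io.StringIO(SAMPLE) handle with open("acquired-qa-evaluation.csv", newline="", encoding="utf-8"); the encoding is also an assumption.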
The project was created and designed by me with the help of the following people:
- Rain Jiang: crawler development and data collection
- Yihong Chen: data parsing, cleaning, and analysis
The following are students in my Introduction to Generative AI course (Spring 2024), who created the QA dataset:
- Priya Amara
- Saviour Adelwin Anyagri
- Ezgi Basaranlar
- Sara Baskaran
- Nimet Batan Altiyaprak
- Reed Bidgood
- Daniel Coleman
- James Dalton
- Chaitanya Dhullipala
- Yin Ding
- Aksel Dirkzwager
- Malek Elsayyid
- John Fabricatore
- J'Quoi George
- Ed Gorman
- Amanda Grosz
- Donald Harris
- Bryan Horsey
- David Kam
- Daria Klimkovskaia
- Mathieu Lippens
- Ruth McDuffie
- Ashish Mishra
- Achal Modi
- Jayaprakash Moses
- Naomi Nyarinda Okemwa
- Silvia Atelo Okwach
- Kardam Patel
- Pramila Paudyal
- Chris Pic
- Rajesh Rao
- Ronald Russian
- Summer Shaheed
- Rohan Swain
- Shriya Tandon
- Aniket Turaskar
- Upendar Vanavasam
- Andrea Young

Acquired Podcast Transcripts and RAG Evaluation
13.34MB