老下头

Movie Scripts Corpus

Tags: movies and tv shows, nlp, text, transformers


Sold: 0
Size: 657.42 MB

Dataset ID: D17175023701505482

Published: 2024/06/04

The following data verification report was provided by the seller:

Data description

Movie Scripts Corpus

This corpus was collected for screenplay analysis with machine learning methods. It includes movie scripts crawled from various sources, annotations of those scripts by structural element, and movie metadata.

Corpus description

Screenplay data consists of:

  • Movie scripts as TXT documents with the raw full text (2,858 docs)
  • Movie scripts as TXT documents with the full text lemmatized (2,858 docs)
  • Manual annotations as TXT documents for a subset of the scripts (33 docs, more than 6,000 annotated rows)
  • Script annotations as TXT documents produced with BERT
  • Script annotations as JSON documents produced with the rule-based annotator ScreenPy
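
Because the raw-text and lemmatized collections cover the same 2,858 scripts, a common first step is to pair the two versions of each script. The sketch below assumes, hypothetically, that each collection sits in its own directory and that both use the same file name for the same movie; the listing does not describe the actual archive layout, and `pair_script_files` is an illustrative helper, not part of the corpus.

```python
from pathlib import Path


def pair_script_files(raw_dir: str, lemma_dir: str) -> dict[str, tuple[Path, Path]]:
    """Pair raw-text and lemmatized TXT files that share a filename stem.

    Assumption (hypothetical layout): the two document sets live in separate
    directories and name the files for the same movie identically.
    """
    raw = {p.stem: p for p in Path(raw_dir).glob("*.txt")}
    lemmas = {p.stem: p for p in Path(lemma_dir).glob("*.txt")}
    # Keep only movies present in both collections, in a stable order.
    common = sorted(raw.keys() & lemmas.keys())
    return {stem: (raw[stem], lemmas[stem]) for stem in common}
```

Scripts that appear in only one directory are silently dropped, so comparing `len(pairs)` with the stated document count is a quick way to spot naming mismatches.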

Movie metadata consists of:

  • Truncated movie reviews and scores from Metacritic:
    • Number of reviews: 21,025
    • Number of movies with reviews: 2,038
  • Metadata for movies, including:
    • title, alternative titles (akas), release year, Metacritic score, IMDb user rating and number of votes from imdb.com, movie awards, opening weekend, producers, budget, script department, production companies, writers, directors, cast info, countries involved in production, age restriction, plot (with outline), keywords, genres, taglines, critics' synopsis
  • Screenplay awards information:
    • Academy Award for Best Adapted Screenplay, Academy Award for Best Original Screenplay, BAFTA, Golden Globe Award for Best Screenplay, and Writers Guild Awards: winners and nominees for 2013-2020, with nomination information for 462 movies in total.
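
The review counts above work out to 21,025 / 2,038 ≈ 10.3 reviews per movie on average, so per-movie aggregation is a natural way to turn the reviews into a single target per film. The following is a minimal sketch that assumes, hypothetically, the reviews have already been loaded as `(movie_title, score)` pairs; the real file format is not given in this listing, and `mean_score_per_movie` is an illustrative name.

```python
from collections import defaultdict
from statistics import mean


def mean_score_per_movie(reviews: list[tuple[str, float]]) -> dict[str, float]:
    """Average the review scores of each movie.

    Assumption (hypothetical format): each review is a (movie_title, score)
    pair; the corpus's actual review files may be structured differently.
    """
    by_movie: dict[str, list[float]] = defaultdict(list)
    for title, score in reviews:
        by_movie[title].append(score)
    return {title: mean(scores) for title, scores in by_movie.items()}
```

Grouping by title assumes titles are unique; if the metadata exposes a stable movie identifier, keying on that instead avoids merging remakes that share a name.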

Movie characters data consists of:

  • Script text fragments with dialogue and scene descriptions for each character, extracted with the annotators:
    • 2,153 movies, with text fragments for 32,114 characters in total
  • Gender labels for 4792 characters
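
Only 4,792 of the 32,114 characters (about 14.9%) carry a gender label, so a supervised setup typically keeps just the labeled subset. This sketch assumes, hypothetically, that fragments are available as a mapping from a character ID to a list of text fragments, and labels as a mapping from the same ID to a gender string; neither structure nor the ID scheme is documented here, and `labeled_characters` is an illustrative helper.

```python
def labeled_characters(
    fragments: dict[str, list[str]], genders: dict[str, str]
) -> list[tuple[str, str]]:
    """Join each labeled character's text fragments with its gender label.

    Assumption (hypothetical structures): both dicts are keyed by the same
    character ID. Characters without a label are skipped.
    """
    return [
        (" ".join(fragments[cid]), genders[cid])
        for cid in sorted(fragments)
        if cid in genders
    ]
```

Concatenating all fragments per character keeps the example simple; a classifier with a limited input length might instead sample or truncate fragments.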