
Data Description
About Dataset
Context
While inspecting the great Wikipedia Movie Plots dataset by JustinR (https://www.kaggle.com/jrobischon/wikipedia-movie-plots), I figured that summarized plots would be of great use, since current state-of-the-art NLP models limit the number of input tokens they accept.
I wrote a Medium article based on this dataset.
Content
Everything is the same as in https://www.kaggle.com/jrobischon/wikipedia-movie-plots, except that I added a new column containing a summary of every plot, at most 128 tokens long, generated with the DistilBART-CNN-12-6 model (https://huggingface.co/sshleifer/distilbart-cnn-12-6). Code here.
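For readers who want a rough idea of how such a column could be produced, here is a minimal sketch. It is not the code linked above. It assumes the Hugging Face `transformers` package in a version that provides the `summarization` pipeline, and the names `Plot`, `PlotSummary`, `build_summarizer`, and `add_summary_column` are illustrative assumptions rather than names confirmed by this description.

```python
from typing import Callable, Dict, Iterable, List

MODEL_NAME = "sshleifer/distilbart-cnn-12-6"  # model named in the description
MAX_SUMMARY_TOKENS = 128  # upper bound on summary length stated in the description


def build_summarizer(model_name: str = MODEL_NAME) -> Callable[[str], str]:
    """Return a function that maps a plot text to a short summary.

    Assumes `transformers` is installed; the model is downloaded on first use.
    """
    from transformers import pipeline  # imported lazily, heavy optional dependency

    summarizer = pipeline("summarization", model=model_name)

    def summarize(text: str) -> str:
        # truncation=True cuts plots that exceed the model's input limit
        result = summarizer(text, max_length=MAX_SUMMARY_TOKENS, truncation=True)
        return result[0]["summary_text"]

    return summarize


def add_summary_column(
    rows: Iterable[Dict[str, str]],
    summarize: Callable[[str], str],
    plot_key: str = "Plot",            # assumed name of the plot column
    summary_key: str = "PlotSummary",  # hypothetical name for the new column
) -> List[Dict[str, str]]:
    """Return copies of the rows with one extra key holding the summary."""
    return [{**row, summary_key: summarize(row[plot_key])} for row in rows]
```

With the model available, `add_summary_column(rows, build_summarizer())` would summarize every row, where `rows` could come from `csv.DictReader`. Passing the summarizer in as a function keeps the column logic testable without downloading the model.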
Acknowledgements
Please go upvote the https://www.kaggle.com/jrobischon/wikipedia-movie-plots dataset, since this one is 100% based on it.
