Data Description
ClidSum Cross-Lingual Dialogue Summarization Benchmark Dataset
Author: Wang Jiaan (王佳安)
Dataset Introduction
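The ClidSum benchmark dataset consists of two parts, XSAMSum and XMediaSum, which were built by adding annotations to the SAMSum and MediaSum dialogue summarization datasets, respectively. ClidSum contains more than 56,000 English dialogue documents, each annotated with a corresponding Chinese summary and a German summary.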
Data Preview
[{"dialogue":"Hannah: Hey, do you have Betty's number? Amanda: Lemme check Hannah: <file_gif> Amanda: Sorry, can't find it. Amanda: Ask Larry Amanda: He called her last time we were at the park together Hannah: I don't know him well Hannah: <file_gif> Amanda: Don't be shy, he's very nice Hannah: If you say so.. Hannah: I'd rather you texted him Amanda: Just text him Hannah: Urgh.. Alright Hannah: Bye Amanda: Bye bye","summary":"Hannah needs Betty's number but Amanda doesn't have it. She needs to contact Larry.","summary_de":"hannah braucht bettys nummer, aber amanda hat sie nicht. sie muss larry kontaktieren.","summary_zh":"汉娜需要贝蒂的电话号码,但阿曼达没有。她得联系拉里。"},{"dialogue":"Eric: MACHINE! Rob: That's so gr8! Eric: I know! And shows how Americans see Russian ;) Rob: And it's really funny! Eric: I know! I especially like the train part! Rob: Hahaha! No one talks to the machine like that! Eric: Is this his only stand-up? Rob: Idk. I'll check. Eric: Sure. Rob: Turns out no! There are some of his stand-ups on youtube. Eric: Gr8! I'll watch them now! Rob: Me too! Eric: MACHINE! Rob: MACHINE! Eric: TTYL? Rob: Sure :)","summary":"Eric and Rob are going to watch a stand-up on youtube.","summary_de":"eric und rob werden sich ein stand-up auf youtube ansehen.","summary_zh":"埃里克和罗伯要在youtube上看一场单口相声。"}]
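The preview above suggests each record is a JSON object with four string fields: `dialogue`, `summary`, `summary_de`, and `summary_zh`. The following is a minimal sketch, under that assumption, of how such records might be loaded and turned into (dialogue, target-language summary) pairs. The helper names `load_records` and `to_pairs` are hypothetical, the sample dialogue is shortened for brevity, and the layout of the full 140.86 MB download may differ from the preview.

```python
import json

# A shortened record in the same shape as the preview (dialogue text abbreviated).
SAMPLE = '''[{"dialogue": "Eric: MACHINE! Rob: That's so gr8!",
"summary": "Eric and Rob are going to watch a stand-up on youtube.",
"summary_de": "eric und rob werden sich ein stand-up auf youtube ansehen.",
"summary_zh": "埃里克和罗伯要在youtube上看一场单口相声。"}]'''

# Field names taken from the preview; assumed to hold for every record.
REQUIRED_FIELDS = ("dialogue", "summary", "summary_de", "summary_zh")


def load_records(text):
    """Parse a JSON array of records and check that each has the expected fields."""
    records = json.loads(text)
    for i, rec in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if f not in rec]
        if missing:
            raise ValueError(f"record {i} is missing fields: {missing}")
    return records


def to_pairs(records, target_lang):
    """Return (dialogue, summary) pairs for target_lang in {'en', 'de', 'zh'}."""
    key = {"en": "summary", "de": "summary_de", "zh": "summary_zh"}[target_lang]
    return [(rec["dialogue"], rec[key]) for rec in records]


records = load_records(SAMPLE)
zh_pairs = to_pairs(records, "zh")  # English dialogue -> Chinese summary
de_pairs = to_pairs(records, "de")  # English dialogue -> German summary
```

To use this with a downloaded file, read its contents with `open(path, encoding="utf-8").read()` and pass the text to `load_records`; the path is whatever the download provides.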
Validation Report
The following is the data validation report that the seller has chosen to provide:

ClidSum Cross-Lingual Dialogue Summarization Benchmark Dataset
140.86 MB