asset
ASSET is a dataset for evaluating sentence simplification systems with multiple rewriting transformations, as described in "ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations." The corpus consists of 2,000 validation and 359 test original sentences, each simplified 10 times by different annotators. It also contains human judgments of meaning preservation, fluency, and simplicity for the outputs of several automatic text simplification systems.
@inproceedings{alva-manchego-etal-2020-asset,
title = "{ASSET}: {A} Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations",
author = "Alva-Manchego, Fernando and
Martin, Louis and
Bordes, Antoine and
Scarton, Carolina and
Sagot, Benoit and
Specia, Lucia",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.424",
pages = "4668--4679",
}
asset/simplification (default config)
| Split | Examples |
|---|---|
| `'test'` | 359 |
| `'validation'` | 2,000 |
FeaturesDict({
'original': Text(shape=(), dtype=string),
'simplifications': Sequence(Text(shape=(), dtype=string)),
})
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| original | Text | | string | |
| simplifications | Sequence(Text) | (None,) | string | |
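
The feature structure above pairs each original sentence with a variable-length sequence of reference simplifications. A minimal sketch of turning one record into an `(original, references)` pair follows. The helper names `decode`, `to_reference_pair`, and `iter_asset` are hypothetical, and the sample record is invented for illustration rather than taken from ASSET. The sketch assumes records are read through `tfds.as_numpy`, where text values arrive as `bytes`.

```python
from typing import Iterator, List, Mapping, Tuple


def decode(value) -> str:
    """Return text as str; Text features arrive as bytes under tfds.as_numpy."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def to_reference_pair(example: Mapping) -> Tuple[str, List[str]]:
    """Turn one record into (original sentence, list of simplifications)."""
    original = decode(example["original"])
    references = [decode(s) for s in example["simplifications"]]
    return original, references


def iter_asset(split: str = "test") -> Iterator[Tuple[str, List[str]]]:
    """Yield reference pairs from TFDS (downloads the data on first use)."""
    import tensorflow_datasets as tfds  # lazy import: only needed for real data

    dataset = tfds.load("asset/simplification", split=split)
    for example in tfds.as_numpy(dataset):
        yield to_reference_pair(example)


# Hypothetical record with the same shape as a real one (not from ASSET).
sample = {
    "original": b"The committee postponed the vote.",
    "simplifications": [
        b"The committee delayed the vote.",
        b"The vote was delayed.",
    ],
}
original, references = to_reference_pair(sample)
print(original, len(references))
```

Keeping the TFDS import inside `iter_asset` lets the conversion helper be exercised on plain dictionaries without installing or downloading anything.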
asset/ratings
| Split | Examples |
|---|---|
| `'full'` | 4,500 |
FeaturesDict({
'aspect': ClassLabel(shape=(), dtype=int64, num_classes=3),
'original': Text(shape=(), dtype=string),
'original_sentence_id': int32,
'rating': int32,
'simplification': Text(shape=(), dtype=string),
'worker_id': int32,
})
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| aspect | ClassLabel | | int64 | |
| original | Text | | string | |
| original_sentence_id | Tensor | | int32 | |
| rating | Tensor | | int32 | |
| simplification | Text | | string | |
| worker_id | Tensor | | int32 | |
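
Each ratings record holds one worker's score for one aspect of one simplification, so a common first step is to average over workers. The sketch below does this on plain dictionaries that mirror the feature structure above. The function name `mean_ratings` is hypothetical, the sample values are invented, and grouping by `(original_sentence_id, aspect)` assumes one rated simplification per sentence id. If that does not hold, the `simplification` text can be added to the key. The `aspect` field is treated as an opaque class index because the page does not list the label names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def mean_ratings(records: Iterable[Mapping]) -> Dict[Tuple[int, int], float]:
    """Average `rating` over workers per (original_sentence_id, aspect)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of ratings, count]
    for record in records:
        key = (int(record["original_sentence_id"]), int(record["aspect"]))
        totals[key][0] += int(record["rating"])
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical records shaped like asset/ratings examples (values invented).
sample_records = [
    {"original_sentence_id": 7, "aspect": 0, "rating": 80, "worker_id": 1},
    {"original_sentence_id": 7, "aspect": 0, "rating": 90, "worker_id": 2},
    {"original_sentence_id": 7, "aspect": 1, "rating": 60, "worker_id": 1},
]
means = mean_ratings(sample_records)
print(means)
```

With real data, the same function could be fed the records produced by `tfds.as_numpy(tfds.load("asset/ratings", split="full"))`.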
Last updated 2022-12-06 UTC.