asset
ASSET is a dataset for evaluating Sentence Simplification systems with multiple
rewriting transformations, as described in "ASSET: A Dataset for Tuning and
Evaluation of Sentence Simplification Models with Multiple Rewriting
Transformations." The corpus is composed of 2000 validation and 359 test
original sentences that were each simplified 10 times by different annotators.
The corpus also contains human judgments of meaning preservation, fluency and
simplicity for the outputs of several automatic text simplification systems.
@inproceedings{alva-manchego-etal-2020-asset,
    title = "{ASSET}: {A} Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations",
    author = "Alva-Manchego, Fernando and
      Martin, Louis and
      Bordes, Antoine and
      Scarton, Carolina and
      Sagot, Benoit and
      Specia, Lucia",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.424",
    pages = "4668--4679",
}
asset/simplification (default config)
| Split          | Examples |
|----------------|----------|
| 'test'         | 359      |
| 'validation'   | 2,000    |
FeaturesDict({
'original': Text(shape=(), dtype=string),
'simplifications': Sequence(Text(shape=(), dtype=string)),
})
| Feature         | Class          | Shape   | Dtype  | Description |
|-----------------|----------------|---------|--------|-------------|
|                 | FeaturesDict   |         |        |             |
| original        | Text           |         | string |             |
| simplifications | Sequence(Text) | (None,) | string |             |
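The feature structure above pairs one `original` string with a variable-length sequence of `simplifications`. The following is a minimal sketch, not the builder's own code, of how such an example might be flattened into (original, reference) pairs. The helper names `to_str`, `reference_pairs`, and `load_examples` are hypothetical, the sample sentences are made up for illustration, and the sketch assumes that text features arrive as UTF-8 bytes when read through `tfds.as_numpy`.

```python
from typing import Iterator, Mapping, Tuple, Union

TextLike = Union[bytes, str]


def to_str(value: TextLike) -> str:
    """Decode a text feature that may arrive as UTF-8 bytes (assumed for as_numpy)."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def reference_pairs(example: Mapping[str, object]) -> Iterator[Tuple[str, str]]:
    """Yield one (original, simplification) pair per reference in an example."""
    original = to_str(example["original"])
    for simplification in example["simplifications"]:
        yield original, to_str(simplification)


def load_examples(split: str = "test"):
    """Hypothetical loader; downloads the dataset on first use."""
    # Imported lazily so the helpers above work without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.as_numpy(tfds.load("asset/simplification", split=split))


# Stand-in record with the same keys as the FeaturesDict (made-up sentences).
sample = {
    "original": b"The feline reposed upon the rug.",
    "simplifications": [b"The cat lay on the rug.", b"The cat rested on the rug."],
}
pairs = list(reference_pairs(sample))
```

With the real data, `for example in load_examples("validation"): ...` would feed `reference_pairs` in the same way; keeping the references grouped per original is usually preferable when a metric expects multiple references at once.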
asset/ratings
| Split    | Examples |
|----------|----------|
| 'full'   | 4,500    |
FeaturesDict({
'aspect': ClassLabel(shape=(), dtype=int64, num_classes=3),
'original': Text(shape=(), dtype=string),
'original_sentence_id': int32,
'rating': int32,
'simplification': Text(shape=(), dtype=string),
'worker_id': int32,
})
| Feature              | Class        | Shape | Dtype  | Description |
|----------------------|--------------|-------|--------|-------------|
|                      | FeaturesDict |       |        |             |
| aspect               | ClassLabel   |       | int64  |             |
| original             | Text         |       | string |             |
| original_sentence_id | Tensor       |       | int32  |             |
| rating               | Tensor       |       | int32  |             |
| simplification       | Text         |       | string |             |
| worker_id            | Tensor       |       | int32  |             |
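Each row of `asset/ratings` is a single worker's judgment of one simplification on one aspect, so a common first step is to average over workers. The sketch below groups rows by `(original_sentence_id, simplification, aspect)` and takes the mean `rating`. The function name `mean_rating_by_aspect` is hypothetical and the sample rows are made up. This page does not state the rating scale or the order of the three `aspect` classes; when loading with `with_info=True`, the label names should be available from `info.features["aspect"].names`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple

Key = Tuple[int, str, int]


def mean_rating_by_aspect(rows: Iterable[Mapping[str, object]]) -> Dict[Key, float]:
    """Average `rating` over workers per (sentence id, simplification, aspect)."""
    buckets: Dict[Key, List[int]] = defaultdict(list)
    for row in rows:
        text = row["simplification"]
        if isinstance(text, bytes):  # text may arrive as UTF-8 bytes
            text = text.decode("utf-8")
        key = (int(row["original_sentence_id"]), str(text), int(row["aspect"]))
        buckets[key].append(int(row["rating"]))
    return {key: sum(values) / len(values) for key, values in buckets.items()}


# Made-up rows using the field names from the FeaturesDict (values are illustrative).
rows = [
    {"aspect": 0, "original_sentence_id": 7, "rating": 60,
     "simplification": b"A short sentence.", "worker_id": 1},
    {"aspect": 0, "original_sentence_id": 7, "rating": 80,
     "simplification": b"A short sentence.", "worker_id": 2},
    {"aspect": 2, "original_sentence_id": 7, "rating": 50,
     "simplification": b"A short sentence.", "worker_id": 1},
]
averages = mean_rating_by_aspect(rows)
```

Including the simplification text in the key keeps outputs from different systems for the same original sentence separate; dropping it would average across systems instead.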
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2022-12-06 UTC.
- **Additional documentation**: [Explore on Papers With Code](https://paperswithcode.com/dataset/asset)
- **Homepage**: <https://github.com/facebookresearch/asset>
- **Source code**: [`tfds.datasets.asset.Builder`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/asset/asset_dataset_builder.py)
- **Versions**:
  - **`1.0.0`** (default): Initial release.
- **Download size**: `3.47 MiB`
- **Auto-cached** ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)): Yes
- **Supervised keys** (see [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)): `None`
- **Figure** ([tfds.show_examples](https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples)): Not supported.
- **asset/simplification config description**: A set of original sentences aligned with 10 possible simplifications for each.
- **asset/simplification dataset size**: `2.64 MiB`
- **asset/ratings config description**: Human ratings of automatically produced text simplification.
- **asset/ratings dataset size**: `1.44 MiB`