Wikipedia

Text simplification

Text simplification is an operation used in natural language processing to modify, enhance, classify or otherwise process an existing corpus of human-readable text in such a way that the grammar and structure of the prose is greatly simplified, while the underlying meaning and information remains the same. Text simplification is an important area of research, because natural human languages ordinarily contain large vocabularies and complex compound constructions that are not easily processed through automation. In terms of reducing language diversity, semantic compression can be employed to limit and simplify a set of words used in given texts.

Example

Text Simplification is illustrated with an example from Siddharthan (2006).[1] The first sentence contains two relative clauses and one conjoined verb phrase. A text simplification system aims to simplify the first sentence to the second sentence.

  • Also contributing to the firmness in copper, the analyst noted, was a report by Chicago purchasing agents, which precedes the full purchasing agents report that is due out today and gives an indication of what the full report might hold.
  • Also contributing to the firmness in copper, the analyst noted, was a report by Chicago purchasing agents. The Chicago report precedes the full purchasing agents report. The Chicago report gives an indication of what the full report might hold. The full report is due out today.

One approach to text simplification is lexical simplification via lexical substitution, a two-step process consisting of identifying complex words and replacing them with simpler synonyms. A key challenge here is identifying complex words, which is performed by a machine learning classifier trained on labelled data. An improvement over classical methods of applying binary labels to words as simple or complex is to ask labellers to sort words in order of complexity; this results in higher consistency of resultant labels.[2]

See also

References

  1. ^ Siddharthan, Advaith (28 March 2006). "Syntactic Simplification and Text Cohesion". Research on Language and Computation. 4 (1): 77–109. doi:10.1007/s11168-006-9011-1. S2CID 14619244.
  2. ^ Gooding, Sian; Kochmar, Ekaterina; Sarkar, Advait; Blackwell, Alan (August 2019). "Comparative judgments are more consistent than binary classification for labelling word complexity". Proceedings of the 13th Linguistic Annotation Workshop: 208–214. doi:10.18653/v1/W19-4024. Retrieved 22 November 2019.
  • Wei Xu, Chris Callison-Burch and Courtney Napoles. "Problems in Current Text Simplification Research". In Transactions of the Association for Computational Linguistics (TACL), Volume 3, 2015, Pages 283–297.
  • Advaith Siddharthan. "Syntactic Simplification and Text Cohesion". In Research on Language and Computation, Volume 4, Issue 1, Jun 2006, Pages 77–109, Springer Science, the Netherlands.
  • Siddhartha Jonnalagadda, Luis Tari, Joerg Hakenberg, Chitta Baral and Graciela Gonzalez. Towards Effective Sentence Simplification for Automatic Processing of Biomedical Text. In Proc. of the NAACL-HLT 2009, Boulder, USA, June. [1]

External links

This article is copied from an article on Wikipedia® - the free encyclopedia created and edited by its online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of Wikipedia® encyclopedia articles provide accurate and timely information, please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.

Copyright © 2003-2025 Farlex, Inc Disclaimer
All content on this website, including dictionary, thesaurus, literature, geography, and other reference data is for informational purposes only. This information should not be considered complete, up to date, and is not intended to be used in place of a visit, consultation, or advice of a legal, medical, or any other professional.