An open, extendible, and fast Turkish morphological analyzer
dc.authorid | 0000-0001-5838-4615 | |
dc.authorid | 0000-0003-2843-2334 | |
dc.authorid | 0000-0002-2782-8217 | |
dc.contributor.author | Yıldız, Olcay Taner | en_US |
dc.contributor.author | Avar, Begüm | en_US |
dc.contributor.author | Ercan, Gökhan | en_US |
dc.date.accessioned | 2020-04-14T06:59:53Z | |
dc.date.available | 2020-04-14T06:59:53Z | |
dc.date.issued | 2019-09 | |
dc.department | Işık Üniversitesi, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü | en_US |
dc.department | Işık University, Faculty of Engineering, Department of Computer Engineering | en_US |
dc.description.abstract | In this paper, we present a two-level morphological analyzer for Turkish that consists of five main components: a finite state transducer, a rule engine for suffixation, a lexicon, a trie data structure, and an LRU cache. We implement the finite state machine logic and the rule engine in Java and describe the finite state transducer rules of Turkish in XML, which makes the morphological analyzer both easily extendible and easily applicable to other languages. Backed by a comprehensive lexicon of 54,000 bare forms, including 19,000 proper nouns, our morphological analyzer is among the most reliable analyzers produced so far. We compare the analyzer with Turkish morphological analyzers in the literature. Using an LRU cache and a trie data structure, the system can analyze 100,000 words per second, which enables users to analyze huge corpora in a few hours. | en_US |
dc.description.version | Publisher's Version | en_US |
dc.identifier.citation | Yıldız, O. T., Avar, B. & Ercan, G. (2019). An open, extendible, and fast Turkish morphological analyzer. Paper presented at the International Conference Recent Advances in Natural Language Processing, RANLP, 1364-1372. doi:10.26615/978-954-452-056-4_156 | en_US |
dc.identifier.doi | 10.26615/978-954-452-056-4_156 | |
dc.identifier.endpage | 1372 | |
dc.identifier.isbn | 9789544520557 | |
dc.identifier.issn | 1313-8502 | |
dc.identifier.scopus | 2-s2.0-85076499372 | |
dc.identifier.scopusquality | N/A | |
dc.identifier.startpage | 1364 | |
dc.identifier.uri | https://hdl.handle.net/11729/2300 | |
dc.identifier.uri | http://dx.doi.org/10.26615/978-954-452-056-4_156 | |
dc.identifier.volume | 2019 | |
dc.indekslendigikaynak | Scopus | en_US |
dc.institutionauthor | Yıldız, Olcay Taner | en_US |
dc.institutionauthor | Ercan, Gökhan | en_US |
dc.institutionauthorid | 0000-0001-5838-4615 | |
dc.institutionauthorid | 0000-0002-2782-8217 | |
dc.language.iso | en | en_US |
dc.peerreviewed | Yes | en_US |
dc.publicationstatus | Published | en_US |
dc.publisher | Incoma Ltd | en_US |
dc.relation.ispartof | International Conference Recent Advances in Natural Language Processing, RANLP | en_US |
dc.relation.publicationcategory | Conference Item - International - Institutional Faculty Member | en_US |
dc.rights | info:eu-repo/semantics/openAccess | en_US |
dc.subject | Computational linguistics | en_US |
dc.subject | Data structures | en_US |
dc.subject | Deep learning | en_US |
dc.subject | Engines | en_US |
dc.subject | Finite state transducers | en_US |
dc.subject | Java language | en_US |
dc.subject | Morphological analyzer | en_US |
dc.subject | Natural language processing systems | en_US |
dc.subject | Proper nouns | en_US |
dc.subject | Rule engine | en_US |
dc.subject | Semantics | en_US |
dc.subject | Speech recognition | en_US |
dc.subject | Text processing | en_US |
dc.subject | Transducers | en_US |
dc.subject | Trie data structures | en_US |
dc.subject | Turkish language | en_US |
dc.subject | XML languages | en_US |
dc.title | An open, extendible, and fast Turkish morphological analyzer | en_US |
dc.type | Conference Object | en_US |
dspace.entity.type | Publication |
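
The abstract attributes the analyzer's speed to a trie over the lexicon combined with an LRU cache. The following is a minimal sketch of how those two pieces could fit together, not the authors' implementation (which the abstract says is written in Java). All names here (`Trie`, `LRUCache`, `CandidateRootFinder`, `trie_lookups`) are hypothetical, the three-word lexicon is a toy example, and the sketch ignores the finite state transducer, the suffixation rules, and any phonological alternation of root forms.

```python
from collections import OrderedDict


class Trie:
    """Prefix tree over lexicon root forms (hypothetical, illustrative)."""

    def __init__(self):
        self.children = {}
        self.is_root_form = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_root_form = True

    def prefixes_of(self, surface):
        """Return every lexicon entry that is a prefix of `surface`."""
        found = []
        node = self
        for i, ch in enumerate(surface):
            node = node.children.get(ch)
            if node is None:
                break
            if node.is_root_form:
                found.append(surface[: i + 1])
        return found


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the oldest entry


class CandidateRootFinder:
    """Looks up candidate roots in the trie, consulting the cache first."""

    def __init__(self, lexicon, cache_size=1024):
        self.trie = Trie()
        for word in lexicon:
            self.trie.insert(word)
        self.cache = LRUCache(cache_size)
        self.trie_lookups = 0  # counts cache misses, for illustration only

    def candidate_roots(self, surface):
        cached = self.cache.get(surface)
        if cached is not None:
            return cached
        self.trie_lookups += 1
        roots = self.trie.prefixes_of(surface)
        self.cache.put(surface, roots)
        return roots


finder = CandidateRootFinder(["ev", "evli", "el"], cache_size=2)
print(finder.candidate_roots("evlilik"))
```

In this sketch, a repeated surface form is answered from the cache without walking the trie again, which is the kind of saving that matters on corpora where a small set of frequent words accounts for most tokens. A real analyzer would cache full morphological parses rather than bare prefix matches.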