An open, extendible, and fast Turkish morphological analyzer

Yıldız, Olcay Taner; Avar, Begüm; Ercan, Gökhan

dc.authorid	0000-0001-5838-4615
dc.authorid	0000-0003-2843-2334
dc.authorid	0000-0002-2782-8217
dc.contributor.author	Yıldız, Olcay Taner	en_US
dc.contributor.author	Avar, Begüm	en_US
dc.contributor.author	Ercan, Gökhan	en_US
dc.date.accessioned	2020-04-14T06:59:53Z
dc.date.available	2020-04-14T06:59:53Z
dc.date.issued	2019-09
dc.department	Işık Üniversitesi, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü	en_US
dc.department	Işık University, Faculty of Engineering, Department of Computer Engineering	en_US
dc.description.abstract	In this paper, we present a two-level morphological analyzer for Turkish that consists of five main components: a finite state transducer, a rule engine for suffixation, a lexicon, a trie data structure, and an LRU cache. We use Java to implement the finite state machine logic and the rule engine, and XML to describe the finite state transducer rules of Turkish, which makes the morphological analyzer both easily extendible and easily applicable to other languages. Equipped with a comprehensive lexicon of 54,000 bare forms, including 19,000 proper nouns, our morphological analyzer is among the most reliable analyzers produced so far. We compare the analyzer with other Turkish morphological analyzers in the literature. By using an LRU cache and a trie data structure, the system can analyze 100,000 words per second, which enables users to analyze huge corpora in a few hours.	en_US
dc.description.version	Publisher's Version	en_US
dc.identifier.citation	Yıldız, O. T., Avar, B. & Ercan, G. (2019). An open, extendible, and fast Turkish morphological analyzer. Paper presented at the International Conference Recent Advances in Natural Language Processing, RANLP, 1364-1372. doi:10.26615/978-954-452-056-4_156	en_US
dc.identifier.doi	10.26615/978-954-452-056-4_156
dc.identifier.endpage	1372
dc.identifier.isbn	9789544520557
dc.identifier.issn	1313-8502
dc.identifier.scopus	2-s2.0-85076499372
dc.identifier.scopusquality	N/A
dc.identifier.startpage	1364
dc.identifier.uri	https://hdl.handle.net/11729/2300
dc.identifier.uri	http://dx.doi.org/10.26615/978-954-452-056-4_156
dc.identifier.volume	2019
dc.indekslendigikaynak	Scopus	en_US
dc.institutionauthor	Yıldız, Olcay Taner	en_US
dc.institutionauthor	Ercan, Gökhan	en_US
dc.institutionauthorid	0000-0001-5838-4615
dc.institutionauthorid	0000-0002-2782-8217
dc.language.iso	en	en_US
dc.peerreviewed	Yes	en_US
dc.publicationstatus	Published	en_US
dc.publisher	Incoma Ltd	en_US
dc.relation.ispartof	International Conference Recent Advances in Natural Language Processing, RANLP	en_US
dc.relation.publicationcategory	Conference Item - International - Institutional Faculty Member	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Computational linguistics	en_US
dc.subject	Data structures	en_US
dc.subject	Deep learning	en_US
dc.subject	Engines	en_US
dc.subject	Finite state transducers	en_US
dc.subject	Java language	en_US
dc.subject	Morphological analyzer	en_US
dc.subject	Natural language processing systems	en_US
dc.subject	Proper nouns	en_US
dc.subject	Rule engine	en_US
dc.subject	Semantics	en_US
dc.subject	Speech recognition	en_US
dc.subject	Text processing	en_US
dc.subject	Transducers	en_US
dc.subject	Trie data structures	en_US
dc.subject	Turkish language	en_US
dc.subject	XML languages	en_US
dc.title	An open, extendible, and fast Turkish morphological analyzer	en_US
dc.type	Conference Object	en_US
dspace.entity.type	Publication
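
The abstract credits the analyzer's throughput to an LRU cache and a trie. The following is a minimal sketch of those two data structures in Java, not the authors' implementation: the class and method names (`AnalyzerCacheSketch`, `LruCache`, `Trie`, `prefixesOf`) are hypothetical, and the use of the trie to collect lexicon entries that are prefixes of a surface form is an assumption about how root candidates might be looked up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AnalyzerCacheSketch {

    /** LRU cache built on LinkedHashMap in access order (hypothetical helper). */
    static class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // true = iterate in access order, oldest first
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            // Evict the least recently used entry once the capacity is exceeded.
            return size() > capacity;
        }
    }

    /** Character trie over lexicon entries (hypothetical helper). */
    static class Trie {
        private static class Node {
            final Map<Character, Node> children = new HashMap<>();
            boolean isWord;
        }

        private final Node root = new Node();

        void insert(String word) {
            Node node = root;
            for (char c : word.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.isWord = true;
        }

        /** Returns every lexicon entry that is a prefix of the given word, shortest first. */
        List<String> prefixesOf(String word) {
            List<String> result = new ArrayList<>();
            Node node = root;
            for (int i = 0; i < word.length(); i++) {
                node = node.children.get(word.charAt(i));
                if (node == null) {
                    break; // no lexicon entry continues along this path
                }
                if (node.isWord) {
                    result.add(word.substring(0, i + 1));
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Trie lexicon = new Trie();
        lexicon.insert("ev");
        lexicon.insert("evli");

        // Cache candidate lists per surface form so repeated words skip the trie walk.
        LruCache<String, List<String>> cache = new LruCache<>(1000);
        String word = "evlilik";
        List<String> candidates = cache.computeIfAbsent(word, lexicon::prefixesOf);
        System.out.println(word + " -> " + candidates);
    }
}
```

In this sketch a single trie walk costs time proportional to the word length, and the cache avoids repeating that walk for frequent words, which dominate running text.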

Files

Original bundle

Name: 2300.pdf
Size: 272.14 KB
Format: Adobe Portable Document Format
Description: Publisher's Version

License bundle

Name: license.txt
Size: 1.44 KB
Format: Item-specific license agreed upon to submission
Description:

Collections

Proceedings Collection | Department of Computer Engineering
Scopus Indexed Publications Collection