Improving the accuracy using pre-trained word embeddings on deep neural networks for Turkish text classification

dc.authorid: Karci, Ali/0000-0002-8489-8617
dc.authorid: AYDOGAN, Murat/0000-0002-6876-6454
dc.authorwosid: Karci, Ali/AAG-5337-2019
dc.authorwosid: AYDOGAN, Murat/ABF-7251-2020
dc.contributor.author: Aydogan, Murat
dc.contributor.author: Karci, Ali
dc.date.accessioned: 2024-08-04T20:46:53Z
dc.date.available: 2024-08-04T20:46:53Z
dc.date.issued: 2020
dc.department: İnönü Üniversitesi [en_US]
dc.description.abstract: Today, extremely large amounts of data are produced, commonly referred to as Big Data. A significant portion of big data is textual, and text processing has correspondingly grown in importance, as is especially evident in the development of word embedding and other groundbreaking advances in the field. However, when studies on text processing and word embedding are examined, it can be seen that while many studies target widely spoken world languages, especially English, there has been insufficient research specific to the Turkish language. As a result, Turkish was chosen as the target language for the current study. Two Turkish datasets were created for this study. Word vectors were trained using the Word2Vec method on an unlabeled large corpus of approximately 11 billion words. Using these word vectors, text classification was applied with deep neural networks on a second dataset of 1.5 million examples and 10 classes. The current study employed the Convolutional Neural Network (CNN), the Recurrent Neural Network (RNN), and its variants, the Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks, as deep neural network architectures. The performance of the word embedding methods used in this study, their effects on classification accuracy, and the success of the deep neural network architectures were then analyzed in detail. The experimental results showed that the GRU and LSTM methods were more successful than the other deep neural network models used in this study. The results also showed that the pre-trained word vectors (PWVs) improved accuracy on the deep neural networks by approximately 5% and 7%. The datasets and word vectors of the current study will be shared in order to contribute to the Turkish language literature in this field. (C) 2019 Elsevier B.V. All rights reserved. [en_US]
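For illustration only (not the authors' released code): a minimal sketch of the general approach the abstract describes, i.e. pre-training Word2Vec vectors on an unlabeled corpus and using them to initialize the embedding layer of a GRU text classifier. The corpus, texts, labels, and all hyperparameters below are hypothetical placeholders, and gensim plus TensorFlow/Keras are assumed as tooling.

```python
# Sketch: pre-trained Word2Vec embeddings feeding a GRU classifier (assumed setup).
import numpy as np
from gensim.models import Word2Vec
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.initializers import Constant
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense

EMBED_DIM, MAX_LEN, NUM_CLASSES = 300, 100, 10

# 1) Pre-train word vectors on an unlabeled corpus (list of token lists) -- placeholder data.
corpus_sentences = [["örnek", "türkçe", "cümle"], ["ikinci", "cümle"]]
w2v = Word2Vec(sentences=corpus_sentences, vector_size=EMBED_DIM,
               window=5, min_count=1, sg=1, workers=4)

# 2) Convert the labeled classification texts to padded index sequences -- placeholder data.
texts = ["örnek türkçe cümle", "ikinci cümle"]
labels = np.array([0, 1])
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
sequences = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=MAX_LEN)

# 3) Build an embedding matrix from the pre-trained vectors (PWVs);
#    words missing from the Word2Vec vocabulary remain zero vectors.
vocab_size = len(tokenizer.word_index) + 1
embedding_matrix = np.zeros((vocab_size, EMBED_DIM))
for word, idx in tokenizer.word_index.items():
    if word in w2v.wv:
        embedding_matrix[idx] = w2v.wv[word]

# 4) GRU classifier whose embedding layer is initialized with the pre-trained vectors.
model = Sequential([
    Embedding(vocab_size, EMBED_DIM,
              embeddings_initializer=Constant(embedding_matrix),
              trainable=False),
    GRU(128),
    Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(sequences, labels, epochs=2, batch_size=2)
```

The same skeleton accommodates the other architectures compared in the article by swapping the GRU layer for an LSTM, SimpleRNN, or Conv1D-plus-pooling stack.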
dc.identifier.doi: 10.1016/j.physa.2019.123288
dc.identifier.issn: 0378-4371
dc.identifier.issn: 1873-2119
dc.identifier.scopus: 2-s2.0-85074525713 [en_US]
dc.identifier.scopusquality: Q2 [en_US]
dc.identifier.uri: https://doi.org/10.1016/j.physa.2019.123288
dc.identifier.uri: https://hdl.handle.net/11616/99013
dc.identifier.volume: 541 [en_US]
dc.identifier.wos: WOS:000514758600038 [en_US]
dc.identifier.wosquality: Q2 [en_US]
dc.indekslendigikaynak: Web of Science [en_US]
dc.indekslendigikaynak: Scopus [en_US]
dc.language.iso: en [en_US]
dc.publisher: Elsevier [en_US]
dc.relation.ispartof: Physica A-Statistical Mechanics and Its Applications [en_US]
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: Deep learning [en_US]
dc.subject: Word embedding [en_US]
dc.subject: Turkish text classification [en_US]
dc.subject: Text processing [en_US]
dc.title: Improving the accuracy using pre-trained word embeddings on deep neural networks for Turkish text classification [en_US]
dc.type: Article [en_US]

Files