Listing by author "Aydogan, Murat"
Showing 1 - 3 of 3
Improving the accuracy using pre-trained word embeddings on deep neural networks for Turkish text classification (Elsevier, 2020)
Aydogan, Murat; Karci, Ali
Today, enormous amounts of data are produced, and this is commonly referred to as Big Data. A significant portion of big data is textual, so text processing has grown correspondingly in importance. This is especially true of the development of word embeddings and other groundbreaking advances in this field. However, a review of studies on text processing and word embeddings shows that, while many have focused on major world languages, especially English, too few have been undertaken specifically for Turkish. Turkish was therefore chosen as the target language for the current study. Two Turkish datasets were created for this study. Word vectors were trained with the Word2Vec method on a large unlabeled corpus of approximately 11 billion words. Using these word vectors, text classification was performed with deep neural networks on a second dataset of 1.5 million examples in 10 classes. The deep neural network architectures employed were the Convolutional Neural Network (CNN), the Recurrent Neural Network (RNN), its variants Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), and variations of these. The performance of the word embedding methods used, their effect on accuracy, and the success of the deep neural network architectures were then analyzed in detail. The experimental results showed that GRU and LSTM were more successful than the other deep neural network models used in this study. The results also showed that pre-trained word vectors (PWVs) improved accuracy on the deep neural networks by approximately 5% and 7%. The datasets and word vectors of the current study will be shared in order to contribute to the Turkish-language literature in this field. (C) 2019 Elsevier B.V. All rights reserved.

Spam Mail Detection using Naive Bayes method with Apache Spark (IEEE, 2018)
Aydogan, Murat; Karci, Ali
Internet technologies have advanced significantly alongside great progress in information infrastructure, and in parallel the amount of data produced has reached enormous dimensions. Today, storing and processing this data is the most important big data problem. In recent years, new technologies have been developed in this area, and the Apache Spark project is considered one of the most important of them. In this study, a classification application was developed on Apache Spark using the Naive Bayes method from Apache Spark's machine learning libraries. A dataset of emails labeled as Spam and Not Spam was analyzed with Apache Spark, and the classification application achieved a high accuracy rate. Apache Spark's performance differs considerably from that of the other platforms most commonly used in data analysis.

Turkish Text Classification with Machine Learning and Transfer Learning (IEEE, 2019)
Aydogan, Murat; Karci, Ali
Text classification is one of the most fundamental topics of study in natural language processing, but a review of the literature shows that there are too few studies on Turkish text classification. Two different Turkish datasets were created for this purpose. Word vectors were trained on the first dataset, which consists of unlabeled texts. Using transfer learning, these word vectors were transferred to the second dataset, which was built from data collected from various news sites. Text classification was then performed on this dataset with machine learning algorithms. The effects of transfer learning and of transferring word vectors on accuracy, as well as the performance of the machine learning methods, were analyzed in detail. The experimental results showed that the Support Vector Machine (SVM) model performed better than the other models and that transfer learning improved the accuracy rate.
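The spam-detection item in this listing classifies emails with the Naive Bayes method from Apache Spark's machine learning libraries. As a rough illustration of that technique only, here is a single-machine sketch in plain Python, assuming the multinomial variant with additive (Laplace) smoothing; the abstract does not state which variant or settings were used. This is not the authors' Spark code, and the function names `train_naive_bayes` and `predict` as well as the four training emails are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with additive (Laplace) smoothing.

    Returns the class list, the vocabulary, log priors, and per-class
    log likelihoods for every vocabulary word.
    """
    classes = sorted(set(labels))
    vocab = {w for d in docs for w in tokenize(d)}
    log_prior = {}
    log_likelihood = {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in tokenize(d))
        denom = sum(counts.values()) + alpha * len(vocab)
        log_likelihood[c] = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
    return classes, vocab, log_prior, log_likelihood


def predict(model, text):
    """Return the class with the highest (unnormalized) log posterior."""
    classes, vocab, log_prior, log_likelihood = model
    scores = {}
    for c in classes:
        score = log_prior[c]
        for w in tokenize(text):
            if w in vocab:  # words never seen in training are ignored
                score += log_likelihood[c][w]
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy data; the study's real email dataset is not reproduced here.
train_docs = [
    "win a free prize now",
    "free money click now",
    "meeting agenda for monday",
    "please review the attached report",
]
train_labels = ["Spam", "Spam", "Not Spam", "Not Spam"]

model = train_naive_bayes(train_docs, train_labels)
print(predict(model, "claim your free prize"))  # Spam
print(predict(model, "monday report review"))   # Not Spam
```

On a Spark cluster, the corresponding estimator is `NaiveBayes` in `pyspark.ml.classification`, whose `smoothing` parameter plays the role of `alpha` here; the sketch only mirrors the underlying arithmetic on one machine.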











