Feature Selection for Text Classification Using Mutual Information

dc.authorid: Sel, Ilhami/0000-0003-0222-7017
dc.authorid: Karci, Ali/0000-0002-8489-8617
dc.authorid: Hanbay, Davut/0000-0003-2271-7865
dc.authorwosid: Sel, Ilhami/ABD-7350-2020
dc.authorwosid: Karci, Ali/AAG-5337-2019
dc.authorwosid: Hanbay, Davut/AAG-8511-2019
dc.contributor.author: Sel, Ilhami
dc.contributor.author: Karci, Ali
dc.contributor.author: Hanbay, Davut
dc.date.accessioned: 2024-08-04T20:58:51Z
dc.date.available: 2024-08-04T20:58:51Z
dc.date.issued: 2019
dc.department: İnönü Üniversitesi [en_US]
dc.description: International Conference on Artificial Intelligence and Data Processing (IDAP) -- SEP 21-22, 2019 -- Inonu Univ, Malatya, TURKEY [en_US]
dc.description.abstract: Feature selection can be defined as choosing the subset of features that best represents a data set, that is, removing unnecessary data that does not affect the result. In classification applications, reducing the dimensionality through feature selection can increase the efficiency and accuracy of the system. In this study, text classification was performed on the 20 news group data published by the Reuters news agency. The pre-processed news data were converted into vectors using the Doc2Vec method to create a data set. This data set was classified with the Maximum Entropy classification method. Afterwards, subsets of the data set were created by using the Mutual Information method for feature selection. Classification was repeated with the resulting data sets, and the results were compared according to their performance rates. The success rate of the system with 600 features, before feature selection, was 0.9285; after feature selection, the performance rates of the models with 200, 100, 50, and 20 features were 0.9454, 0.9426, 0.9407, and 0.9123, respectively. When the results were examined, the 50-feature model achieved a higher success rate than the initial 600-feature model. [en_US]
dc.description.sponsorship: IEEE Turkey Sect,Anatolian Sci,Inonu Univ, Comp Sci Dept,Inonu Univ, Muhendisli Fakultesi [en_US]
dc.identifier.doi: 10.1109/idap.2019.8875927
dc.identifier.uri: https://doi.org/10.1109/idap.2019.8875927
dc.identifier.uri: https://hdl.handle.net/11616/103211
dc.identifier.wos: WOS:000591781100056 [en_US]
dc.identifier.wosquality: N/A [en_US]
dc.indekslendigikaynak: Web of Science [en_US]
dc.language.iso: tr [en_US]
dc.publisher: IEEE [en_US]
dc.relation.ispartof: 2019 International Conference on Artificial Intelligence and Data Processing (IDAP 2019) [en_US]
dc.relation.publicationcategory: Conference Item - International - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: Natural Language Processing [en_US]
dc.subject: Doc2Vec [en_US]
dc.subject: Mutual Information [en_US]
dc.subject: Maximum Entropy [en_US]
dc.title: Feature Selection for Text Classification Using Mutual Information [en_US]
dc.type: Conference Object [en_US]
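
The feature-selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say how mutual information was estimated, so the equal-width binning used here is an assumption, and the Doc2Vec vectors and Maximum Entropy classifier are not reproduced. The function names (`mutual_information`, `select_top_k`, `make_toy_data`) and the synthetic data are hypothetical and exist only for illustration.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys, n_bins=8):
    """Estimate I(X; Y) in nats between a continuous feature and class labels.

    Assumption of this sketch: the feature is discretized into equal-width
    bins, and MI is computed from the empirical joint distribution.
    """
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
    bins = [min(int((x - lo) / width), n_bins - 1) for x in xs]

    n = len(xs)
    joint = Counter(zip(bins, ys))
    bin_counts = Counter(bins)
    label_counts = Counter(ys)

    mi = 0.0
    for (b, y), count in joint.items():
        # p(b, y) * log( p(b, y) / (p(b) * p(y)) ), written with raw counts
        mi += (count / n) * math.log(count * n / (bin_counts[b] * label_counts[y]))
    return mi


def select_top_k(rows, labels, k, n_bins=8):
    """Return the sorted indices of the k highest-scoring features and all scores."""
    n_features = len(rows[0])
    scores = [
        mutual_information([row[j] for row in rows], labels, n_bins)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


def make_toy_data(n_samples=400, n_features=6, seed=0):
    """Hypothetical stand-in for document vectors.

    Features 0 and 1 are shifted by the class label; the rest are pure noise.
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n_samples):
        y = rng.randint(0, 1)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 3.0 * y
        row[1] -= 3.0 * y
        rows.append(row)
        labels.append(y)
    return rows, labels


if __name__ == "__main__":
    rows, labels = make_toy_data()
    selected, scores = select_top_k(rows, labels, k=2)
    print("selected feature indices:", selected)
    # Keep only the selected columns; a classifier would be retrained on these.
    reduced = [[row[j] for j in selected] for row in rows]
    print("reduced shape:", len(reduced), "x", len(reduced[0]))
```

In the study's setting, `rows` would correspond to the 600-dimensional document vectors and `k` to the reduced sizes reported in the abstract (200, 100, 50, 20); the classifier would then be retrained on the reduced vectors and its performance compared with the full model.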
