ISJ Theoretical & Applied Science

 

 

www.T-Science.org
p-ISSN 2308-4944 (print), e-ISSN 2409-0085 (online)
SOI: 1.1/TAS, DOI: 10.15863/TAS

Journal Archive

ISJ Theoretical & Applied Science 04(84) 2020

Philadelphia, USA

* Scientific Article * Impact Factor 6.630


Kozhevnikov, V. A., & Pankratova, E. S.

Research of text pre-processing methods for preparing data in Russian for machine learning.


Scientific Object Identifier: http://s-o-i.org/1.1/TAS-04-84-55

DOI: https://dx.doi.org/10.15863/TAS.2020.04.84.55

Language: English

Citation: Kozhevnikov, V. A., & Pankratova, E. S. (2020). Research of text pre-processing methods for preparing data in Russian for machine learning. ISJ Theoretical & Applied Science, 04(84), 313-320. SOI: http://s-o-i.org/1.1/TAS-04-84-55 DOI: https://dx.doi.org/10.15863/TAS.2020.04.84.55

Pages: 313-320

Published: 30.04.2020

Abstract: The article surveys pre-processing methods for preparing Russian-language text data for machine learning. It covers tokenization, normalization, named entity recognition, stemming, lemmatization, and stop-word removal. It also shows some approaches that use morphological analyzers and libraries for NLP tasks.

Key words: pymorphy2, gensim, mystem, spacy, stemming, lemmatization, ner, deeppavlov
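
Three of the steps named in the abstract (normalization, tokenization, and stop-word removal) can be illustrated with a minimal sketch that uses only the Python standard library. It is not the authors' implementation. The function names, the token pattern, the folding of "ё" into "е", and the tiny stop-word list are all assumptions made for illustration. Stemming, lemmatization, and named entity recognition are left out because they depend on the external libraries listed in the key words.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list (an assumption for this
# sketch, not taken from the article).
STOP_WORDS = {"и", "в", "на", "не", "с", "что", "это", "для"}

# Assumed token pattern: runs of Cyrillic/Latin letters or digits,
# optionally joined by single hyphens (e.g. "что-то").
TOKEN_RE = re.compile(r"[а-яёa-z0-9]+(?:-[а-яёa-z0-9]+)*")


def normalize(text: str) -> str:
    """Lowercase the text and fold 'ё' into 'е' (one common convention)."""
    return text.lower().replace("ё", "е")


def tokenize(text: str) -> List[str]:
    """Split already-normalized text into word tokens, dropping punctuation."""
    return TOKEN_RE.findall(text)


def preprocess(text: str) -> List[str]:
    """Normalize, tokenize, then drop tokens found in STOP_WORDS."""
    tokens = tokenize(normalize(text))
    return [t for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("Ёжик и лиса гуляли в лесу, что-то искали."))
```

In practice the stop-word list and the token pattern would be chosen to suit the corpus and the downstream model.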


 

 

 

 

 

 

E-mail: T-Science@mail.ru

© «Theoretical & Applied Science», 2013