«Theoretical & Applied Science»

www.T-Science.org
p-ISSN 2308-4944 (print), e-ISSN 2409-0085 (online)
SOI: 1.1/TAS
DOI: 10.15863/TAS
ISJ Theoretical & Applied Science 04(84) 2020 Philadelphia, USA
* Scientific Article * Impact Factor 6.630
Kozhevnikov, V. A., & Pankratova, E. S. Research of text pre-processing methods for preparing data in Russian for machine learning.

Scientific Object Identifier: http://s-o-i.org/1.1/TAS-04-84-55
DOI: https://dx.doi.org/10.15863/TAS.2020.04.84.55
Language: English
Citation: Kozhevnikov, V. A., & Pankratova, E. S. (2020). Research of text pre-processing methods for preparing data in Russian for machine learning. ISJ Theoretical & Applied Science, 04 (84), 313-320.
Pages: 313-320
Published: 30.04.2020
Abstract: The article surveys pre-processing methods for preparing Russian-language text data for machine learning. It covers tokenization, normalization, named entity recognition, stemming, lemmatization, and stop-word removal. It also presents approaches that use morphological analyzers and libraries for NLP tasks.
Key words: pymorphy2, gensim, mystem, spacy, stemming, lemmatization, ner, deeppavlov
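
To make the techniques named in the abstract more concrete, the following is a minimal sketch of three of them (normalization, tokenization, and stop-word removal) using only the Python standard library. It is not the authors' implementation: the function names, the regular expression, and the tiny stop-word list are assumptions chosen for illustration. Stemming, lemmatization, and named entity recognition are omitted because they would normally rely on the tools listed in the key words (for example pymorphy2 or mystem).

```python
import re

# Illustrative stop-word list (an assumption); real lists are far longer.
STOP_WORDS = {"и", "в", "на", "не", "что", "это", "с", "по"}

# A token is a run of Cyrillic letters, optionally joined by hyphens
# (so that words such as "кто-то" stay in one piece).
TOKEN_RE = re.compile(r"[а-яё]+(?:-[а-яё]+)*")


def normalize(text):
    """Lowercase the text and fold the letter 'ё' into 'е'."""
    return text.lower().replace("ё", "е")


def tokenize(text):
    """Split already-normalized text into word tokens, dropping punctuation."""
    return TOKEN_RE.findall(text)


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [token for token in tokens if token not in STOP_WORDS]


def preprocess(text):
    """Run normalization, tokenization, and stop-word removal in order."""
    return remove_stop_words(tokenize(normalize(text)))


if __name__ == "__main__":
    print(preprocess("Мама мыла раму, и ёжик спал в саду."))
```

In this sketch normalization runs before tokenization so that the token pattern only has to match lowercase letters. A lemmatizer would typically be applied to the tokens before the stop-word filter, so that inflected forms of stop words are also caught.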

E-mail: T-Science@mail.ru

© «Theoretical & Applied Science» 2013