Showing posts from June, 2019

Document Vector Estimation using Partition Word Vector Averaging

Averaging vs Partition Averaging

TL;DR: This post discusses my NLP research on representing documents by weighted, partitioned averaging of word vectors. (15-minute read)

tags: document representation, partition word vectors averaging

Prologue

Let's consider a corpus (C) of N documents, with a vocabulary (V) of its most frequent words. Figure 1 shows the word-vector space of V, where words with similar meanings lie close to each other (a partition of the vocabulary based on word-vector similarity). A few words are polysemous and belong to multiple topics in some proportion. In Figure 1, the topic number of each word appears as a subscript and the corresponding proportion in braces.

Figure 1: Vocabulary of corpus C, partitioned using word2vec fuzzy clustering

Let's consider a text document (d): "Data journalists deliver data science news to the general public. They often take part in interpreting the data models. Also, they cre