Data Representation in NLP

Sparse Vector Representations

BoW Explanation
Tf-Idf numerical example
  • “it was”
  • “was the”
  • “the best”
  • “best of”
  • “of times”
  • “it was the”
  • “was the best”
  • “the best of”
  • “best of times”
  • The vocabulary consists of all unique words present in the given data. If the data is large and contains many unique words, the vocabulary grows with it, and each document vector becomes longer and sparser.
  • A hashing vectorizer uses very little memory and scales to large datasets, as there is no need to store a vocabulary dictionary in memory.
  • It is fast to pickle and un-pickle, as it holds no state besides the constructor parameters.
  • It can be used in a streaming (partial fit) or parallel pipeline, as no state is computed during fit.
  • There is no way to compute the inverse transform (from feature indices to string feature names), which can be a problem when trying to introspect which features are most important to a model.
  • There can be collisions: distinct tokens can be mapped to the same feature index. In practice this is rarely an issue if n_features is large enough (e.g. 2 ** 18 for text classification problems).
  • There is no IDF weighting, as this would render the transformer stateful.
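
The n-gram list and the hashing trade-offs above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the function names `ngrams` and `hashed_counts` are hypothetical, and MD5 is used only because it ships with the standard library (scikit-learn's HashingVectorizer, for instance, uses a different hash function).

```python
import hashlib
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def hashed_counts(features, n_features=2 ** 18):
    """Count features by hashed column index; no vocabulary is ever stored."""
    counts = Counter()
    for feature in features:
        digest = hashlib.md5(feature.encode("utf-8")).hexdigest()
        counts[int(digest, 16) % n_features] += 1
    return counts


tokens = "it was the best of times".split()
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(bigrams)
print(trigrams)

# With a large table, collisions are unlikely; the row stays very sparse.
sparse_row = hashed_counts(bigrams + trigrams)
print(len(sparse_row), "non-zero columns out of", 2 ** 18)

# With a tiny table, distinct n-grams are forced to share columns (collisions).
tiny_row = hashed_counts(bigrams + trigrams, n_features=4)
print(len(tiny_row), "columns hold", sum(tiny_row.values()), "n-grams")
```

Because the column index comes straight from the hash, nothing has to be fitted or kept in memory, but the original n-gram cannot be recovered from its index.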

Dense Vector Representations

Where is Word Embedding used?

  • Compute similar words: Word embeddings can suggest words similar to a given query word. They can also surface dissimilar words and the most common words.
  • Create a group of related words: Embeddings support semantic grouping, which places items with similar characteristics close together and dissimilar ones far apart.
  • Feature for text classification: Text is mapped to arrays of vectors that are fed to the model for training and prediction. A text classifier cannot be trained on raw strings, so embeddings convert the text into a numeric form a model can learn from. The semantic information they carry further helps text classification.
  • Document clustering: This is another application where word embeddings are widely used.
  • Natural language processing: Word embeddings are useful in many applications and often win over hand-crafted feature extraction in tasks such as part-of-speech tagging, sentiment analysis, and syntactic analysis.
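
As a rough illustration of the "similar words" use case, similarity between embeddings is commonly measured with cosine similarity. The sketch below uses hand-picked 3-dimensional vectors that are purely hypothetical (real embeddings are learned and have far more dimensions), and `most_similar` is an illustrative helper, not a library call.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, chosen by hand for illustration only.
toy_embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "apple": [0.1, 0.2, 0.95],
    "banana": [0.05, 0.1, 0.9],
}


def most_similar(word, embeddings, topn=2):
    """Rank every other word by cosine similarity to the query word."""
    query = embeddings[word]
    scores = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


print(most_similar("king", toy_embeddings))
print(most_similar("apple", toy_embeddings))
```

The same ranking, read from the bottom, gives the most dissimilar words, and clustering the vectors gives the semantic grouping described above.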

Why use this approach:

  • A bag-of-words representation ignores word order; for example, ‘this is bad’ = ‘bad is this’.
  • It ignores the context of words. Suppose I write “He loved books. Education is best found in books.” It would create two vectors, one for “He loved books” and another for “Education is best found in books.” The two vectors overlap only where the exact same token appears (here, “books”), and every other pair of words is treated as orthogonal and therefore independent. In reality, the sentences are related in meaning.
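
Both limitations can be checked with plain token counts. The following is a minimal sketch with a deliberately naive tokenizer (lowercasing and stripping full stops only); `bag_of_words` and `dot_product` are illustrative names, and the third sentence pair is an added example, not taken from the text above.

```python
from collections import Counter


def bag_of_words(text):
    """Lowercase, drop full stops, and count tokens; word order is discarded."""
    cleaned = text.lower().replace(".", " ")
    return Counter(cleaned.split())


def dot_product(bag_a, bag_b):
    """Dot product of two count vectors; only shared tokens contribute."""
    return sum(count * bag_b[token] for token, count in bag_a.items())


# Reordering the words produces exactly the same representation.
same_bag = bag_of_words("this is bad") == bag_of_words("bad is this")
print(same_bag)

# The only link between these two sentences is the literal token "books".
first = bag_of_words("He loved books.")
second = bag_of_words("Education is best found in books.")
print(dot_product(first, second))

# Related meaning but no shared tokens: the vectors are exactly orthogonal.
third = bag_of_words("She enjoys reading novels.")
print(dot_product(first, third))
```

A dense embedding addresses the second problem by placing words such as "books" and "novels" near each other, so related sentences no longer look independent.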

Word2Vec Representations

What is word2vec?

Shallow Neural Network

What does word2vec do?

How does Tf-Idf work?
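
As a small numerical illustration, the sketch below computes Tf-Idf weights for a three-sentence toy corpus. It assumes the plain variant, term frequency (count divided by document length) times log(N / document frequency) with the natural logarithm; libraries often apply smoothing or normalization on top of this, so their numbers will differ. The corpus and the function name `tf_idf` are illustrative.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


corpus = [
    "it was the best of times",
    "it was the worst of times",
    "it was the age of wisdom",
]
scores = tf_idf(corpus)
for doc, doc_scores in zip(corpus, scores):
    # Show only terms with non-zero weight.
    print(doc, {t: round(w, 3) for t, w in doc_scores.items() if w > 0})
```

Words that occur in every document ("it", "was", "the", "of") get a weight of zero, "times" (two documents) gets a small weight, and words unique to one document ("best", "worst", "age", "wisdom") get the largest weight.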

Word2vec Architecture

  1. Continuous Bag of Words (CBOW)
  2. Skip-gram

Advantages of CBOW:

  1. Being probabilistic in nature, it is generally expected to perform better than deterministic methods.
  2. It is low on memory. It does not have the huge RAM requirements of a co-occurrence matrix approach, which needs to store three huge matrices.

Disadvantages of CBOW:

  1. CBOW averages the contexts of a word (its hidden activation is the average of the context word vectors). For example, Apple can be both a fruit and a company, but CBOW averages both contexts and places the word between the cluster for fruits and the cluster for companies.
  2. Training CBOW from scratch can take a very long time if it is not properly optimized.
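
To make the two architectures and the averaging issue concrete, here is a minimal sketch in plain Python. It only builds the training examples and the averaged context vector; it does not train a network. The helper names, the window size, and the 2-dimensional "fruit" and "company" vectors are all hypothetical.

```python
def training_pairs(tokens, window=2):
    """Build CBOW (context -> target) and skip-gram (target -> context word) examples."""
    cbow, skip_gram = [], []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        cbow.append((context, target))                          # predict target from context
        skip_gram.extend((target, word) for word in context)    # predict each context word
    return cbow, skip_gram


def average_vectors(vectors):
    """Element-wise mean, which is how CBOW forms its hidden activation."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


tokens = "it was the best of times".split()
cbow_examples, skip_gram_examples = training_pairs(tokens, window=1)
print(cbow_examples[2])          # the context words surrounding "the"
print(skip_gram_examples[:3])    # the first few (target, context word) pairs

# Hypothetical context vectors for "apple" in a fruit sentence and a company sentence.
fruit_context = [1.0, 0.0]
company_context = [0.0, 1.0]
print(average_vectors([fruit_context, company_context]))
```

The averaged vector sits halfway between the two contexts, which is the behaviour described in the first disadvantage above.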

Implementation via Gensim

Pre-trained Word Embeddings

Custom Word Embeddings




PhD Researcher/Candidate at IIITD

Shivangi Singhal
