Text Matching with Deep Learning

In our daily life, we often want to know whether two things are similar. A typical example is Face ID: Apple launched a face recognition system for unlocking the iPhone X. You first take several photos as reference images; when you want to unlock the phone, it computes whether the current photo matches the pre-registered ones.

In a previous blog post, I shared how to use word existence measurement and Word Mover's Distance (WMD) to compute the difference between two sentences. Unlike those methods, here we apply a neural network to tackle the same problem.

After reading this article, you will understand:

  • Reasons for computing sentence similarity
  • Manhattan LSTM
  • Manhattan LSTM Variant

Reasons for computing sentence similarity


Besides the image domain, can we apply similarity checking in NLP? As the owner of a forum such as Stack Overflow, you do not want lots of duplicate questions, as they harm the user experience. And when searching with a search engine, you expect results that include things similar to your query, not just exact matches.

In one of my projects, I leveraged this approach to compare customer names. Because the input was noisy, the model had to find the most similar customer name for the application.

Manhattan LSTM


Mueller et al. proposed the Manhattan LSTM architecture for learning sentence similarity in 2016. Its goal is to compare two sentences and decide whether they mean the same thing. The architecture consists of two identical sub-networks: both inputs go through the same neural network with shared weights.

First, both sentences are converted to vector representations (i.e. word embeddings) and passed to the network: the two sequences go through the two sub-networks (with shared weights). Unlike other RNN language-modelling architectures, it does not predict the next word; instead it computes the similarity between the two sentences as exp(−‖h_left − h_right‖₁), where h_left and h_right are the final hidden states of the two LSTMs (hence "Manhattan", for the L1 distance).
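The similarity function itself is tiny. A minimal NumPy sketch, where the hidden vectors are stand-ins for real LSTM outputs:

```python
import numpy as np

def manhattan_similarity(h_left: np.ndarray, h_right: np.ndarray) -> float:
    """exp(-||h_left - h_right||_1); returns a score in (0, 1]."""
    return float(np.exp(-np.sum(np.abs(h_left - h_right))))

# Identical hidden states give a perfect score of 1.0
h = np.array([0.2, -0.5, 0.1])
print(manhattan_similarity(h, h))          # 1.0

# The further apart the states, the closer the score gets to 0
print(manhattan_similarity(h, h + 1.0))    # exp(-3) ≈ 0.0498
```

Because the score is bounded in (0, 1], it can be trained directly against binary same/not-same labels with an MSE loss, as the paper does.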

In their experiments, Mueller et al. used:

  • Word vectors: word2vec
  • Word vector dimension: 300
  • Loss function: mean squared error (MSE)
  • Optimizer: Adadelta
  • Number of LSTM units: 50

Manhattan LSTM Variant

The concept is that you can build any simple or complex neural network, as long as it accepts two inputs. From my experience, you can try more complex variants of the Manhattan LSTM: I have also included additional word features and used other RNN architectures such as GRUs, or an attention mechanism.
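The shared-weight ("Siamese") idea is independent of which encoder you pick. As a sketch of that point only, the snippet below swaps the LSTM for a trivial mean-of-embeddings encoder; the shared matrix W and the toy dimensions are made up for illustration, not part of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
# One shared projection (300-dim embeddings -> 50-dim state), used for BOTH inputs
W = rng.normal(size=(300, 50))

def encode(word_vectors: np.ndarray) -> np.ndarray:
    """Toy encoder: average the word vectors, then project with the shared W."""
    return np.tanh(word_vectors.mean(axis=0) @ W)

def similarity(sent_a: np.ndarray, sent_b: np.ndarray) -> float:
    """Manhattan similarity between the two shared-weight encodings."""
    h_a, h_b = encode(sent_a), encode(sent_b)
    return float(np.exp(-np.sum(np.abs(h_a - h_b))))

# Two toy "sentences": 4 and 6 words of 300-dim embeddings
a = rng.normal(size=(4, 300))
b = rng.normal(size=(6, 300))
print(similarity(a, a))  # identical inputs score exactly 1.0
print(similarity(a, b))  # different inputs score below 1.0
```

Replacing `encode` with an LSTM, a GRU, or an attention-based encoder changes nothing else: because both inputs go through the same weights, the similarity stays symmetric and sentences of different lengths compare cleanly.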

Take Away

  • Preparing a large amount of labelled data is important.
  • Total computing time may be long. In my case, I had to compare against all customer names (> 5M) at prediction time, so I had to find other ways to reduce the number of records so the model could serve online prediction requirements.

About Me

I am a Data Scientist in the Bay Area, focusing on the state of the art in Data Science and Artificial Intelligence, especially NLP and platform-related topics. You can reach me on my Medium Blog, LinkedIn or GitHub.


Keras Implementation: https://github.com/likejazz/Siamese-LSTM

Mueller J., Thyagarajan A. "Siamese Recurrent Architectures for Learning Sentence Similarity". AAAI 2016. http://www.mit.edu/~jonasm/info/MuellerThyagarajan_AAAI16.pdf

Koch G., Zemel R., Salakhutdinov R.. “Siamese Neural Networks for One-shot Image Recognition”. 2015. http://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf

Read the original article at https://towardsdatascience.com/text-matching-with-deep-learning-e6aa05333399