In our daily life, we often want to know whether two things are similar. A typical example is Face ID, the face recognition system Apple launched for unlocking the iPhone X. You first take several photos as reference ("golden") images; when you want to unlock your iPhone, it checks whether the current photo matches the pre-registered ones.
After reading this article, you will understand:
- Why we compute sentence similarity
- Manhattan LSTM
- Manhattan LSTM Variant
Why we compute sentence similarity
Beyond the image domain, can we apply similarity checking in NLP? As the owner of a forum such as Stack Overflow, you do not want lots of duplicate questions, as they harm the user experience. And when you search with a search engine, you expect results that include things similar to what you typed, not only exact matches.
From my project experience, I leveraged this approach to compare customer names. Because the input is noisy for various reasons, the model has to find the most similar stored customer name for the application.
Manhattan LSTM
Mueller and Thyagarajan proposed the Manhattan LSTM (MaLSTM) architecture for learning sentence similarity in 2016. Its goal is to compare two sentences and decide whether they express the same meaning. The architecture consists of two identical sub-networks: both inputs go through the same neural network with shared weights, and the "Manhattan" in the name refers to the Manhattan (L1) distance used to compare the two outputs.
First, both sentences are converted to vector representations (i.e., word embeddings) and then fed to the network: the two sequences of vectors pass through the two sub-networks (shared weights). Unlike other RNN language-modelling architectures, it does not predict the next word; it computes the similarity between the two sentences.
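The comparison itself is simple: the paper scores a pair by the exponent of the negative Manhattan (L1) distance between the two final hidden states, which yields a similarity in (0, 1]. A minimal sketch in NumPy (the vectors below are made-up illustrations, not real LSTM states):

```python
import numpy as np

def manhattan_similarity(h_left, h_right):
    """exp(-L1 distance) between two hidden-state vectors: 1.0 for
    identical states, approaching 0 as the states move apart."""
    return float(np.exp(-np.sum(np.abs(h_left - h_right))))

a = np.array([0.2, -0.5, 0.1])
b = np.array([1.0, 0.5, -0.9])
print(manhattan_similarity(a, a))  # 1.0
print(manhattan_similarity(a, b))  # exp(-2.8), roughly 0.06
```

Because the output is bounded between 0 and 1, it can be trained directly against 0/1 (or normalized) similarity labels with MSE.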
In their experiments, Mueller et al. use:
- Vector: word2vec
- Word Vector Dimension: 300
- Loss function: mean-squared-error (MSE)
- Optimizer: Adadelta
- Number of LSTM units: 50
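Putting the pieces together, a minimal Keras sketch of the architecture with the hyperparameters above (this is my own illustration, not the authors' code; `max_len` is an assumed padding length, and embeddings are assumed to be precomputed 300-d word2vec vectors):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

max_len, embed_dim, lstm_units = 20, 300, 50  # max_len is illustrative

left_in = layers.Input(shape=(max_len, embed_dim))
right_in = layers.Input(shape=(max_len, embed_dim))

# One LSTM instance applied to both inputs => shared weights.
shared_lstm = layers.LSTM(lstm_units)
h_left = shared_lstm(left_in)
h_right = shared_lstm(right_in)

# Similarity = exp(-L1 distance) between the final hidden states.
similarity = layers.Lambda(
    lambda t: tf.exp(-tf.reduce_sum(tf.abs(t[0] - t[1]),
                                    axis=1, keepdims=True))
)([h_left, h_right])

model = Model([left_in, right_in], similarity)
model.compile(loss="mean_squared_error", optimizer="adadelta")
```

Feeding the same sequence into both branches should produce a similarity of exactly 1, which is a quick sanity check that the weights really are shared.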
Manhattan LSTM Variant
The concept is that you can build any simple or complex neural network as long as it accepts two inputs. From my experience, you can try more complex variants of the Manhattan LSTM: I also included additional word features and tried other architectures such as GRUs or an attention mechanism.
- Preparing a large amount of labelled data is important.
- Total computing time may be long. In my case, I had to compare against all customer names (> 5M) at prediction time, so I had to find other ways to reduce the number of records so the model could meet online-prediction latency requirements.
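One common way to cut down the candidate count is a cheap blocking step before the expensive pairwise model: group names by an inexpensive key and only score pairs within the same block. A hypothetical sketch (the blocking key here is my own illustration, not what my project actually used):

```python
def block_key(name):
    """Cheap blocking key: lowercase first token of the name."""
    return name.lower().split()[0]

def candidates(query, names):
    """Keep only names sharing the query's blocking key; only these
    pairs would then be scored by the Siamese model."""
    key = block_key(query)
    return [n for n in names if block_key(n) == key]

names = ["Acme Corp", "Acme Corporation", "Beta LLC", "acme inc"]
print(candidates("ACME Co", names))
# -> ['Acme Corp', 'Acme Corporation', 'acme inc']
```

A crude key like this trades recall for speed; in practice you would tune the key (or use several) so that true matches rarely fall outside their block.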
Keras Implementation: https://github.com/likejazz/Siamese-LSTM
Mueller J., Thyagarajan A. "Siamese Recurrent Architectures for Learning Sentence Similarity". 2016. http://www.mit.edu/~jonasm/info/MuellerThyagarajan_AAAI16.pdf
Koch G., Zemel R., Salakhutdinov R. "Siamese Neural Networks for One-shot Image Recognition". 2015. http://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf