Mining patterns using a trained RNN model  


05/04/2020  

I trained an LSTM model that, given a history of N characters, predicts the next character. In other words, it is a character-level text generator.

Since this is a character-level model, I wonder whether it could be used to recover the vocabulary of the language it has learned. Can I find the words that exist in the corpus, as in pattern mining?

I thought about letting the model generate words by giving it an initial state and some random input, then continuing until it predicts a space or another terminating character, but this doesn't perform very well, so I'm looking for a better way. Moreover, I don't want to rely on criteria like terminating characters, because I want the word-mining task to be fully unsupervised.
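To make the sampling idea concrete, here is a minimal sketch of it. Everything in it is an assumption for illustration: `fit_toy_model` and `next_char_probs` are hypothetical stand-ins for the trained LSTM (a simple n-character frequency table, so the snippet runs on its own), and `TERMINATORS`, the toy corpus and the seed are made up.

```python
import random
from collections import Counter, defaultdict

# Assumed set of terminating characters (this is the supervised criterion in question).
TERMINATORS = {" ", "\n", ".", ","}

def fit_toy_model(corpus, n=3):
    """Count next-character frequencies per n-character history (stand-in for the LSTM)."""
    counts = defaultdict(Counter)
    for i in range(len(corpus) - n):
        counts[corpus[i:i + n]][corpus[i + n]] += 1
    return counts

def next_char_probs(counts, history):
    """Return {char: probability} for a history; hypothetical model interface."""
    seen = counts.get(history)
    if not seen:
        return {" ": 1.0}  # unseen history: fall back to a terminator
    total = sum(seen.values())
    return {ch: k / total for ch, k in seen.items()}

def sample_word(counts, seed, rng, max_len=20):
    """Sample characters after `seed` until a terminator or max_len is reached."""
    n = len(seed)
    history, word = seed, []
    for _ in range(max_len):
        probs = next_char_probs(counts, history)
        chars, weights = zip(*probs.items())
        ch = rng.choices(chars, weights=weights, k=1)[0]
        if ch in TERMINATORS:
            break
        word.append(ch)
        history = (history + ch)[-n:]  # slide the N-character window
    return "".join(word)

corpus = "the cat sat on the mat. the cat ate the rat. "
counts = fit_toy_model(corpus, n=3)
rng = random.Random(0)
# Sample many words from the same seed and count how often each one appears.
words = Counter(sample_word(counts, "he ", rng) for _ in range(200))
print(words.most_common(5))
```

With a real LSTM, `next_char_probs` would be replaced by a forward pass that returns the softmax over the character set. The sketch also shows the dependence on `TERMINATORS`: sampling only stops where that hand-picked set says a word ends.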


