Mining patterns using a trained RNN model
I trained an LSTM model that, given a history of N characters, predicts the next character. In other words, it is a character-level text generator.
Since this is a character-level model, I wonder whether it could be used to recover the vocabulary of the learned language. Can I find the words that exist in the corpus, as in pattern mining?
I thought about letting the model generate words by giving it an initial state and some random input, then continuing until it predicts a space or another terminating character. That doesn't perform very well, so I'm looking for a better way. Moreover, I don't want to rely on criteria such as terminating characters, because I want the word-mining task to be fully unsupervised.
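For concreteness, the generate-until-terminator idea described above looks roughly like the sketch below. It is only an illustration under stated assumptions: `predict_next` is a hypothetical callable standing in for the trained LSTM (it maps a history string to a `{char: probability}` dict), and `TERMINATORS`, `generate_word`, and `toy_predict_next` are names made up for this example.

```python
import random

# Assumed set of terminating characters; this is exactly the hand-picked
# criterion that a fully unsupervised approach would want to avoid.
TERMINATORS = {" ", "\n", ".", ",", "!", "?"}


def generate_word(predict_next, seed="", max_len=30, terminators=TERMINATORS):
    """Sample characters until a terminator is predicted or max_len is reached.

    predict_next is a hypothetical stand-in for the trained model: it takes
    the history string and returns a dict {char: probability}.
    """
    history = seed
    word = []
    for _ in range(max_len):
        probs = predict_next(history)
        chars = list(probs.keys())
        weights = list(probs.values())
        # Sample the next character from the predicted distribution.
        ch = random.choices(chars, weights=weights, k=1)[0]
        if ch in terminators:
            break
        word.append(ch)
        history += ch
    return "".join(word)


def toy_predict_next(history):
    """Toy deterministic 'model' that spells c -> a -> t -> space."""
    transitions = {"": "c", "c": "a", "a": "t", "t": " "}
    last = history[-1:]  # empty string when there is no history yet
    return {transitions.get(last, " "): 1.0}


if __name__ == "__main__":
    print(generate_word(toy_predict_next))
```

With a real model, `predict_next` would wrap the network's softmax output over the last N characters. The sketch makes the weakness visible: the word boundary is decided entirely by the hard-coded terminator set.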