Relative regularization coefficients: using ARTM & TopicNet
This tutorial shows how to use the TopicNet and ARTM libraries to build topic models with regularizers.
Why do we need regularizers?
Building a topic model for a collection of documents helps us discover the topics of the documents and the words that describe these topics. If we want the model to have certain additional properties (for example, well-separated, decorrelated topics), regularizers are the best tool for the job.
What about relative regularization coefficients?
The first step in building a regularized topic model is to choose a hyperparameter: the regularization coefficient τ, which controls how strongly the regularizer affects the model. Optimizing this coefficient is a challenging task, because its value depends on many factors, including the size of the dataset. Just imagine: first you tune τ for one collection, then you gather a bigger dataset and have to tune τ all over again. That doesn't sound very productive, does it? This is where relative regularization coefficients come to the rescue.
How do we use them?
The relative regularization coefficient shows how many times more strongly the regularizer affects the model than the collection of documents itself. The relative coefficient λ is converted into the absolute coefficient τ by a formula that scales λ with the size of the collection.
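As a rough sketch of this conversion (the exact normalizer is regularizer-specific and depends on the library version, so norm(R) below is only a placeholder, not a quantity from the original formula):

τ = λ · n / norm(R)

Here n is the total number of token occurrences in the collection and norm(R) is a normalizing constant for the particular regularizer R. The intuition: the data (log-likelihood) term grows with n, so τ has to grow with n as well if the regularizer is to keep the same relative strength λ.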
If we use λ when declaring a regularizer, the value we pass as τ is interpreted as a relative degree of regularization rather than an absolute one.
Using ARTM
artm.DecorrelatorPhiRegularizer(name='DecorrelatorPhi', gamma=0, tau=2, …)
This line sets a decorrelation regularizer that affects the model twice as strongly as the text itself, with equal strength for all topics.
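A minimal sketch of how such a regularizer might be attached to a BigARTM model (the batch path, number of topics and number of passes are made-up placeholders, not values from the tutorial):

import artm

# Assumed placeholder data: a collection already converted to BigARTM batches
batch_vectorizer = artm.BatchVectorizer(data_path='my_batches', data_format='batches')

model = artm.ARTM(num_topics=20, dictionary=batch_vectorizer.dictionary)

# Per the tutorial, tau=2 here acts as a relative coefficient: the regularizer
# is twice as strong as the text, and gamma=0 spreads it evenly over all topics
model.regularizers.add(
    artm.DecorrelatorPhiRegularizer(name='DecorrelatorPhi', gamma=0, tau=2)
)

model.fit_offline(batch_vectorizer=batch_vectorizer, num_collection_passes=10)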
Using TopicNet
rel_toolbox_lite.handle_regularizer
This function takes your regularizer, your model and the collection statistics, interprets the regularization coefficient you passed as a relative one, and converts it into the absolute value using the formula above before attaching the regularizer to the model.
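A sketch of how this could look, assuming the call signature used in the TopicNet example notebooks (the batch path and the '@lemmatized' modality name are assumptions, and the exact arguments may differ between TopicNet versions):

import artm
from topicnet.cooking_machine.rel_toolbox_lite import count_vocab_size, handle_regularizer

batch_vectorizer = artm.BatchVectorizer(data_path='my_batches', data_format='batches')
model = artm.ARTM(num_topics=20, dictionary=batch_vectorizer.dictionary)

# Collection statistics used for the relative-to-absolute conversion;
# the modality name and its weight are placeholders for this sketch
data_stats = count_vocab_size(batch_vectorizer.dictionary, {'@lemmatized': 1})

# Treats tau=2 as a relative coefficient, converts it to an absolute one
# for this collection and registers the regularizer on the model
handle_regularizer(
    use_relative_coefficients=True,
    model=model,
    regularizer=artm.DecorrelatorPhiRegularizer(name='DecorrelatorPhi', tau=2),
    data_stats=data_stats,
)

model.fit_offline(batch_vectorizer=batch_vectorizer, num_collection_passes=10)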
Congratulations, you are now the proud owner of a piece of secret knowledge: relative regularization coefficients in topic models!