Models, bugs & how we can discover them
Sometimes, while working with ML models, funny things happen, ranging from an odd output to amusing errors in a model's performance. As we collect such cases, a picture starts to form: patterns emerge, and we begin to understand how we can not only laugh at these cases but also learn from them and take advantage of them.
In this tutorial, we are going to show you how to quickly detect bugs in a model's performance and fight them.
1. Playing "may haves"
The most reliable way to detect bugs in a model's performance is to go through the results that are furthest from the annotation or have the lowest quality score. Study these mistakes until you can "guess", before looking at the next prediction, which type of error the model is about to make, whether it is an incorrectly recognized text box, an inability to recognize more than n objects in an image, or a generated phrase resembling word salad.
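As a minimal sketch of this review loop, the snippet below ranks predictions by their distance from the annotation and returns the worst ones for manual inspection. The helpers `worst_predictions` and `char_mismatch_rate` are hypothetical names invented for this illustration, and the character-level error is only a toy stand-in for whatever quality metric your task actually uses.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def worst_predictions(
    predictions: Sequence[T],
    annotations: Sequence[T],
    error_fn: Callable[[T, T], float],
    k: int = 20,
) -> List[Tuple[int, float]]:
    """Return (sample index, error) pairs for the k samples with the largest error."""
    if len(predictions) != len(annotations):
        raise ValueError("predictions and annotations must have the same length")
    errors = [
        (index, error_fn(predicted, expected))
        for index, (predicted, expected) in enumerate(zip(predictions, annotations))
    ]
    # Largest error first, so the review starts with the most suspicious samples.
    errors.sort(key=lambda pair: pair[1], reverse=True)
    return errors[:k]


def char_mismatch_rate(predicted: str, expected: str) -> float:
    """Toy error (an assumption for this sketch): share of positions where two strings differ."""
    longest = max(len(predicted), len(expected))
    if longest == 0:
        return 0.0
    same = sum(p == e for p, e in zip(predicted, expected))
    return 1.0 - same / longest


if __name__ == "__main__":
    predicted_texts = ["cat", "dog", "brid"]
    annotated_texts = ["cat", "dog", "bird"]
    for index, error in worst_predictions(
        predicted_texts, annotated_texts, char_mismatch_rate, k=2
    ):
        print(index, predicted_texts[index], annotated_texts[index], round(error, 2))
```

Keeping the sample index in the result makes it easy to pull up the original input next to the prediction, which is where the error patterns usually become visible.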
The job is done: now you know the types of data on which the model makes the most mistakes, so you know how to make it better!
2. In the image and likeness
Another good way to detect bugs is to write down a list of criteria that answer the question "What does the right model look like?". This way we stop taking the "obvious" knowledge about the problem statement and, therefore, about the model for granted, and check it instead. For example, if you build text embeddings, it is quite clear that similar texts should be close under the metric and, accordingly, that embedding the same text a second time should give the same result as the first. This implies a condition we can test: in a collection of unique texts, each embedding should match exactly one embedding completely (its own) and never match any of the others. In practice, it turned out that identical data was present in the dataset under different text identifiers, which we managed to catch only by abandoning the "obvious" considerations.
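The embedding criterion above can be turned into an automated check. The sketch below is only an illustration under stated assumptions: `find_embedding_collisions` and `is_deterministic` are hypothetical helpers, embeddings are assumed to be plain lists of floats keyed by text identifier, and a cosine similarity threshold of 0.9999 is an arbitrary choice for "complete match".

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_embedding_collisions(
    embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.9999,  # assumed cut-off for a "complete match"
) -> List[Tuple[str, str]]:
    """Return pairs of distinct text ids whose embeddings are (near-)identical.

    For a collection of truly unique texts this list should be empty.
    """
    collisions = []
    # Pairwise comparison is O(n^2); fine for a sanity check on a sample.
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        if cosine_similarity(vec_a, vec_b) >= threshold:
            collisions.append((id_a, id_b))
    return collisions


def is_deterministic(
    embed: Callable[[str], Sequence[float]], text: str, tolerance: float = 1e-6
) -> bool:
    """Check that embedding the same text twice gives the same vector."""
    first, second = embed(text), embed(text)
    return len(first) == len(second) and all(
        abs(x - y) <= tolerance for x, y in zip(first, second)
    )


if __name__ == "__main__":
    # Toy vectors: "t1" and "t3" are different ids carrying identical data.
    toy_embeddings = {"t1": [1.0, 0.0], "t2": [0.0, 1.0], "t3": [1.0, 0.0]}
    print(find_embedding_collisions(toy_embeddings))
```

Any pair this check reports is worth opening by hand: it is either a duplicate hiding under two identifiers or a sign that the model collapses different texts into the same point.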
Voilà, yet another bug is on our hook!
Which insights do you use for bug detection? Clap your hands if this tutorial was useful to you and share your own life hacks in the comments.