Engineering Reports (Mar 2024)
An interpretable ensemble method for deep representation learning
Abstract
In the representation learning domain, mainstream model ensemble methods include "implicit" ensemble approaches, such as dropout, and "explicit" ensemble methods, such as voting or weighted averaging over multiple model outputs. Compared to implicit techniques, explicit ensemble methods offer more flexibility in combining models with different structures to obtain different perspectives on the representations. However, the representations produced by different models are not guaranteed to be linearly related, and simply combining multiple model outputs linearly may degrade performance. Meanwhile, non-linear fusion mechanisms such as distillation and meta-learning can be uninterpretable and time-consuming. To this end, we propose a hypothesis of linear fusion over the output representations of deep learning models, and design an interpretable linear fusion method based on it. The method applies a transform layer that maps the output representations of different models to the same classification center. Experimental results demonstrate that our method outperforms directly averaging the representations. It also retains the convenience of direct averaging while offering better time and computational efficiency than non-linear fusion. Furthermore, we test the applicability of our method on both computer vision and natural language processing representation tasks, using supervised and semi-supervised approaches.
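To make the described architecture concrete, the following minimal PyTorch sketch shows linear fusion with a per-model transform layer that maps each model's representation into a shared space before simple averaging. All names (LinearFusion, dims, fused_dim) are illustrative assumptions, and the training objective that aligns the transformed representations to a shared classification center is omitted here; the paper's actual formulation may differ.

```python
import torch
import torch.nn as nn

class LinearFusion(nn.Module):
    """Illustrative sketch (not the authors' released code): one linear
    transform per base model maps its output representation into a shared
    space, after which the aligned representations are averaged."""

    def __init__(self, dims, fused_dim):
        # dims: list of per-model representation sizes (assumed);
        # fused_dim: size of the shared fused space (assumed).
        super().__init__()
        self.transforms = nn.ModuleList(
            [nn.Linear(d, fused_dim) for d in dims]
        )

    def forward(self, reps):
        # reps: list of per-model representations; reps[i] has
        # shape (batch, dims[i]).
        aligned = [t(r) for t, r in zip(self.transforms, reps)]
        # After the transform layers, a plain average is a linear fusion.
        return torch.stack(aligned).mean(dim=0)

# Usage with two hypothetical backbones of different widths:
fusion = LinearFusion(dims=[512, 768], fused_dim=256)
reps = [torch.randn(32, 512), torch.randn(32, 768)]
fused = fusion(reps)  # shape: (32, 256)
```

In this sketch the fusion itself stays linear and therefore inspectable; the transform layers would be trained (e.g. against a shared classifier) so that all models' representations land on the same classification center before averaging.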
Keywords