Array (Dec 2021)

Trainless model performance estimation based on random weights initialisations for neural architecture search

  • Ekaterina Gracheva

Journal volume & issue
Vol. 12
p. 100082

Abstract


Neural architecture search has become an indispensable part of the deep learning field. Modern methods can find one of the best-performing architectures, or build one from scratch, but they typically make decisions based on trained accuracy. In the present article we explore instead how the architectural component of a neural network affects its prediction power. We focus on the relationship between the trained accuracy of an architecture and its accuracy prior to training, by considering statistics over multiple initialisations. We observe that minimising the coefficient of variation of the untrained accuracy, CV_U, consistently leads to better-performing architectures. We test CV_U as a neural architecture search scoring metric using the NAS-Bench-201 database of trained neural architectures. The architectures with the lowest CV_U value have an average accuracy of 91.90 ± 2.27, 64.08 ± 5.63 and 38.76 ± 6.62 for CIFAR-10, CIFAR-100 and a downscaled version of ImageNet, respectively. Since these values are statistically above the random baseline, we conclude that a good architecture should be stable against weight initialisations. Processing 100 architectures, on a batch of 256 images with 100 initialisations, takes about 190 s for CIFAR-10 and CIFAR-100 and 133.9 s for ImageNet16-120.
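The scoring rule described in the abstract can be sketched as follows: for each candidate architecture, evaluate its *untrained* accuracy under several random weight initialisations, then rank architectures by the coefficient of variation (standard deviation divided by mean) of those accuracies, preferring the lowest. The snippet below is a minimal illustration of that statistic on toy numbers; the function name and the example accuracy values are hypothetical, not taken from the paper.

```python
import numpy as np

def cv_untrained(untrained_accuracies):
    """Coefficient of variation of untrained accuracy across
    random weight initialisations: std / mean (hypothetical helper)."""
    a = np.asarray(untrained_accuracies, dtype=float)
    return a.std() / a.mean()

# Toy example: two hypothetical architectures, each evaluated without
# training under several random initialisations (accuracies in %).
stable   = [11.2, 10.8, 11.0, 10.9, 11.1]  # low spread  -> low CV_U
unstable = [9.0, 14.5, 10.2, 13.8, 8.5]    # high spread -> high CV_U

# Under the paper's criterion, the architecture with the lower CV_U
# would be preferred.
print(cv_untrained(stable) < cv_untrained(unstable))  # True
```

In a real search loop, `untrained_accuracies` would come from forward passes of a freshly initialised network over a batch of images (the paper uses a batch of 256 and 100 initialisations per architecture).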

Keywords