Reinforcement Learning in the Problem of Synthesis of Majority Schemes

Sergey Gurov; Dmitry Zolotarev; Alexander Samburskiy

doi:10.25559/SITITO.17.202102.295-307

Современные информационные технологии и IT-образование (Jun 2021)

Affiliations

Sergey Gurov: Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Moscow, Russia
Dmitry Zolotarev: Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Moscow, Russia
Alexander Samburskiy: Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Moscow, Russia

DOI: https://doi.org/10.25559/SITITO.17.202102.295-307
Journal volume & issue: Vol. 17, no. 2
pp. 295 – 307

Abstract

The article presents an approach to the synthesis of combinational logic circuits using artificial neural networks (ANNs). The method targets a promising basis built on the majority function (a Boolean function of three arguments that takes the value "true" if at least two of its inputs are true). This choice is motivated by emerging nanotechnologies, in which the majority element is the easiest to implement. The class of ANNs applied is deep reinforcement learning networks. Such networks have been actively studied and applied in recent years, and there are examples of their effective use for the automatic optimization of logic circuits. The article proposes an original synthesis method based on simplifying circuits that implement the Shannon expansion, in all variables, of the corresponding function of the logic algebra (FLA). For large circuits, it becomes essential to use simple but effective techniques for training deep reinforcement learning agents. These techniques make it possible to split the computation into several independent subtasks, each explored separately, which makes the agents' work faster and easier. Two reinforcement learning algorithms for simplifying circuits are described. They address the exploration-exploitation trade-off: the tension between exploring the environment to find the optimal episode and exploiting the episode currently considered optimal. The article presents how the parameters of the synthesized circuits depend on the number of FLA variables, n = 3, ..., 10, and on the number of network training episodes.
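
The two building blocks named in the abstract, the majority function and the Shannon expansion in all variables, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `maj`, `and_gate`, `or_gate`, `shannon_eval` and `naive_gate_count` are hypothetical, and the sketch assumes that AND and OR are obtained from a majority gate by fixing one input to a constant and that inverted inputs are available for free.

```python
from itertools import product


def maj(a, b, c):
    """Majority of three Boolean inputs: 1 if at least two inputs are 1."""
    return int(a + b + c >= 2)


def and_gate(a, b):
    """AND as a majority gate with one input tied to 0 (assumed construction)."""
    return maj(a, b, 0)


def or_gate(a, b):
    """OR as a majority gate with one input tied to 1 (assumed construction)."""
    return maj(a, b, 1)


def shannon_eval(truth_table, inputs):
    """Evaluate f(inputs) through a full Shannon expansion.

    truth_table holds 2**n output bits; the first variable is the most
    significant bit of the index. Every expansion step computes
    f = (NOT x AND f0) OR (x AND f1) using majority gates only.
    """
    if not inputs:
        return truth_table[0]
    x, rest = inputs[0], inputs[1:]
    half = len(truth_table) // 2
    f0 = shannon_eval(truth_table[:half], rest)  # cofactor with x = 0
    f1 = shannon_eval(truth_table[half:], rest)  # cofactor with x = 1
    return or_gate(and_gate(1 - x, f0), and_gate(x, f1))


def naive_gate_count(n):
    """Majority gates in the unsimplified expansion of an n-variable function.

    The expansion tree has 2**n - 1 internal nodes and each node uses
    three majority gates (two ANDs and one OR) in this construction.
    """
    return 3 * (2 ** n - 1)


# Example: three-input parity, checked against its truth table.
parity3 = [0, 1, 1, 0, 1, 0, 0, 1]
for index, bits in enumerate(product((0, 1), repeat=3)):
    assert shannon_eval(parity3, bits) == parity3[index]
```

Under these assumptions the unsimplified circuit grows exponentially with n (for example, `naive_gate_count(3)` is 21), which is why a subsequent simplification step matters.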

Published in Современные информационные технологии и IT-образование

ISSN: 2411-1473 (Print)
Publisher: The Fund for Promotion of Internet media, IT education, human development «League Internet Media»
Country of publisher: Russian Federation
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://sitito.cs.msu.ru
