DE19610849C1

DE19610849C1 - Iterative determination of optimised network architecture of neural network by computer

Info

Publication number: DE19610849C1
Application number: DE19610849A
Authority: DE
Inventors: Markus Dipl Ing Hoehfeld; Elvis Galic
Original assignee: Siemens AG
Current assignee: Siemens AG
Priority date: 1996-03-19
Filing date: 1996-03-19
Publication date: 1997-10-16
Anticipated expiration: 2016-03-20

Abstract

The method is carried out iteratively by a computer. In each iteration step, neural networks with different architectures are generated by an evolutionary method using a probability vector, each component of which specifies the probability with which a neuron and/or a connection between neurons is formed. For each neural network, a weight vector is optimised using a training data set, and a value of a fitness function is determined for the optimised weight vector. The neural network with the minimal fitness value provides the optimised network architecture if that value lies below a defined threshold. If it does not, a further probability vector, which takes the fitness values into account, is determined for generating neural networks in a further iteration step.

Description

The invention relates to a method for determining a neural network whose complexity is as close to optimal as possible while it generalizes as well as possible to new input data sets.

For a neural network with a given architecture, various methods exist for optimizing the weights of the network on the basis of a training data set. However, no automatic approach is known for determining an optimized architecture of a neural network for a specific application, that is, for the respective training data. In the following, the architecture of the neural network is understood to mean the combination of neurons and of connections between the individual neurons of the network.

To optimize the network architecture, a high-dimensional discrete space of variables must be considered. The optimization seeks the best possible compromise between the following two extremes. If the neural network is too small, it cannot reproduce a complex structure in the training data sets. If the neural network is too large, the "oversized" network tends to reproduce the "noise" of the training data sets. This situation is referred to as overlearning (overfitting), and it reduces the network's ability to generalize.

Methods for so-called pruning of the weights of a neural network are known, for example from document [1].

A considerable disadvantage of pruning methods is that the pruning itself is not carried out fully automatically but only interactively, together with a user. The user needs very specialized knowledge of pruning. Methods based on pruning therefore cannot be carried out by laypersons.

Furthermore, a method for iteratively determining an optimized network architecture of a neural network is known in which the network architecture is optimized with an evolutionary method and, for each individual, the weights are optimized by means of a gradient descent method [2].

This method uses special mutation operators, which makes it complicated and thus increases the computing time required when it is carried out by a computer.

A further disadvantage of that method is that it is used, and is suitable, only for classifying data sets after the training phase. It is not suitable for a fine approximation of a mapping function given by the training data sets.

Nor does it guarantee continuity in the development of new individuals, and hence a steady improvement of the individuals.

Furthermore, a so-called population-based incremental learning method (Population-Based Incremental Learning, PBIL) is known [3].

Various methods for optimizing the weights for a given network architecture are known, for example from [4] and [5].

From Zhang, Byoung-Tak; Mühlenbein, Heinz, Evolving Optimal Neural Networks Using Genetic Algorithms with Occam's Razor, in: Complex Systems 7, pp. 199-220, 1993, it is known to optimize neural networks for classifying input patterns using genetic algorithms, with the fitness function taking into account the training error of the training phase of the neural networks on the training data set.

Further methods for generating neural networks using genetic algorithms are known from Shamir, Nachum; Saad, David; Marom, Emanuel, Using the Functional Behaviour of Neurons for Genetic Recombination in Neural Nets Training, in: Complex Systems 7, pp. 445-467, 1993, and from US 51 40 530.

These methods have considerable disadvantages, in particular with regard to their complexity, which makes them suitable only for relatively small examples. For more complex applications they require a very large amount of computation, which makes carrying them out considerably expensive. Moreover, the methods are suitable only for classifying input signals and not for approximating nonlinear functions, because the neural networks they produce do not generalize well.

The invention is therefore based on the problem of specifying a method for determining an optimized network architecture of a neural network that is fast, simple to implement, and suitable for approximating nonlinear functions.

This problem is solved by the method according to claim 1.

In the method according to the invention, an optimized network architecture of a neural network is determined iteratively by means of an evolutionary method. In each iteration step, the individuals (neural networks) are generated, each neuron and/or connection of a neural network being generated, or not generated, with a probability specified in a probability vector. The network architecture is encoded in an architecture vector that contains only binary values. For each neural network characterized by its specific architecture vector, the weights are optimized with known methods. A value of a fitness function is then determined, which describes how suitable the network architecture of the neural network is for the application. The fitness function takes into account at least a validation error determined with a validation data set. The information about the fittest predecessor among the neural networks is used as the starting architecture for generating the next generation of individuals, whose neurons and/or connections are generated with a probability specified in a respective further probability vector.

A considerable advantage of the method according to the invention is its simplicity and hence the low computing time required when it is carried out on a computer. The simplicity is due above all to the binary encoding of the respective network architecture.

A further advantage is that the method can be carried out fully automatically, without interaction with a user who has specialized knowledge.

Furthermore, the method according to the invention makes it possible to approximate very finely a mapping function given by the training data sets.

Advantageous developments of the method according to the invention result from the dependent claims.

It is advantageous to form each further probability vector by a moving average over an arbitrary number of preceding probability vectors, which ensures fine continuity in the development of the individuals across generations.

It is also advantageous to apply a so-called mutation to the probability vectors, which prevents the values of the components of the probability vector from becoming fixed at the value 0 or the value 1.

It is also advantageous for the values of the weight vector at the start of the optimization of a new generation of neural networks to result from a moving average over an arbitrary number of weight vectors of preceding generations of neural networks. Here it is advantageous to use as the starting point the values of the weight vector of the best neural network of the preceding generation. This speeds up the method, because the optimization of the weight vectors is shortened.

It is further advantageous to take into account in the fitness function at least the complexity of the network architecture and/or the residual training error, since these optimization parameters enable a very fine and very good approximation.

Reliability is further improved if the training data set and the validation data set are disjoint sets of data.

Using a quasi-Newton method (QNM) considerably improves and accelerates the optimization of the respective weight vector.

If the training data set and the validation data set are determined anew in each iteration step by means of cross-validation, adaptation to the "noise" of the mapping function, which the data sets define only through finitely many data points, is reduced.

An exemplary embodiment of the method according to the invention is shown in the figures and is explained in more detail below.

The figures show:

Fig. 1 a flowchart showing the individual steps of the method according to the invention;

Fig. 2 a sketch showing a computer with which the method is necessarily carried out;

Fig. 3 a sketch showing an example of a neural network.

Fig. 1 shows the individual steps of the method according to the invention.

The method uses an evolutionary method to form successive generations of neural networks NE, each with a different network architecture. The method is carried out iteratively, once for each generation of neural networks NE.

In each iteration step t, in a first step 101, the various neural networks NE of the generation corresponding to iteration step t are formed according to the respective evolutionary method by a computer R, with which the method according to the invention is necessarily carried out.

The network architecture of each neural network NE is described by an architecture vector x whose components take only the value 0 or the value 1. Each component of the architecture vector x uniquely represents one neuron NEU of the neural network NE and/or one connection V_ij between the neurons NEU, identified by a first index i and a second index j. The first index i and the second index j are natural numbers (cf. Fig. 3).

The value 0 indicates that the corresponding neuron NEU or the corresponding connection V_ij does not exist. The value 1, by contrast, indicates that the corresponding neuron NEU or the corresponding connection V_ij exists.

To generate the individual neural networks NE, a probability vector p_t is used that has the same cardinality (number of components) as the architecture vector x. Each component of the probability vector p_t specifies the probability with which, in iteration step t, a neuron NEU and/or a connection V_ij between the neurons NEU is generated when the respective neural network NE is generated.
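The generation step can be illustrated with a short Python sketch. It assumes that every component of the architecture vector is drawn independently from the matching component of p_t; the function names, the generation size and the starting value 0.5 are illustrative choices that do not come from the patent text.

```python
import random


def sample_architecture(p, rng):
    """Draw one binary architecture vector x from the probability vector p.

    Component i becomes 1 (neuron or connection exists) with probability p[i].
    """
    return [1 if rng.random() < p_i else 0 for p_i in p]


def sample_generation(p, size, rng):
    """Draw `size` architecture vectors, i.e. one generation of networks."""
    return [sample_architecture(p, rng) for _ in range(size)]


rng = random.Random(0)          # fixed seed only to make the sketch repeatable
p_t = [0.5] * 11                # assumed start: every connection equally likely
generation = sample_generation(p_t, 4, rng)
```

A component with probability 0 is never generated and a component with probability 1 is always generated, which is why fixed values of 0 or 1 would freeze that part of the architecture.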

For the simple example shown in Fig. 3 of a neural network NE with 5 neurons NEU1, NEU2, NEU3, NEU4 and NEU5, distributed over three layers, an input layer IL, a hidden layer HL and an output layer OL, the following results for the simplified case in which only the connections V_ij are described in the architecture vector x:

$$x = (V_{11}, V_{12}, V_{13}, V_{14}, V_{15}, V_{23}, V_{24}, V_{25}, V_{34}, V_{35}, V_{45}) = (0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1) \qquad (1)$$

Furthermore, a weight w_ij is assigned to each connection V_ij of the neural network NE. Each input signal at the start of a connection V_ij, identified by the first index i, is multiplied by the corresponding weight w_ij and then fed to the neuron NEj at the other end of the connection V_ij, identified by the second index j. The individual weights w_ij are combined into a weight vector w.

For the simple example from Fig. 3, the weight vector w is in general:

$$w = (w_{11}, w_{12}, w_{13}, w_{14}, w_{15}, w_{23}, w_{24}, w_{25}, w_{34}, w_{35}, w_{45}) \qquad (2)$$

A mapping vector w_sp characterizing the mapping function of the respective neural network NE results from the architecture vector x and the weight vector w as:

$$w_{sp} = w^T \cdot x \qquad (3)$$

where w^T denotes the transposed weight vector w.

For the example, the mapping vector w_sp is:

$$w_{sp} = (0, 0, w_{13}, w_{14}, 0, w_{23}, w_{24}, 0, 0, w_{35}, w_{45})$$
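As a small illustration, the example result can be reproduced in Python by multiplying the weight vector and the architecture vector component by component, so that weights of non-existent connections become 0. Reading equation (3) as a component-wise product is an assumption based on the example result; the function name and the numeric weights are made up.

```python
def mapping_vector(w, x):
    """Mask the weight vector w with the binary architecture vector x."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same number of components")
    return [w_i * x_i for w_i, x_i in zip(w, x)]


# Architecture vector from equation (1).
x = [0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1]
# Hypothetical weights in the order of equation (2): w11, w12, ..., w45.
w = [0.3, 0.7, 0.5, 0.2, 0.9, 0.4, 0.6, 0.1, 0.8, 0.25, 0.75]

w_sp = mapping_vector(w, x)
```

Only the six positions where x is 1 keep their weight, which matches the pattern of zeros in the example vector.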

After the various neural networks NE of the generation corresponding to iteration step t have been formed by the computer R according to the respective evolutionary method (101), the following steps are carried out for each neural network NE of iteration step t (102):
The individual weights w_ij of the respective neural network NE are optimized on the basis of a predefined training data set TDS (103). This training phase can be carried out with a wide variety of training methods known to the person skilled in the art. Usually, so-called gradient descent methods are used to optimize an error function E, which indicates the deviation of an output value y_l of the neural network NE from a predefined target value t_l when a training datum z_l is applied. Each training datum is uniquely identified by an index l.

If the training data set TDS contains a number s of training data z_l with target values t_l assigned to the training data z_l, the error function E is, for example:

where
d is an arbitrary natural number.
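The formula for E itself is not reproduced in this text, so the following Python sketch rests on an assumption: E is taken to be the sum, over the s training data, of the absolute deviation between output y_l and target t_l raised to the power d. The absolute value and the absence of a normalizing factor are guesses consistent with the surrounding definitions, not the patent's confirmed formula.

```python
def error_function(outputs, targets, d=2):
    """Assumed error function: sum over l of |y_l - t_l| ** d."""
    if int(d) != d or d < 1:
        raise ValueError("d must be a natural number")
    if len(outputs) != len(targets):
        raise ValueError("one target value is needed per output value")
    return sum(abs(y_l - t_l) ** d for y_l, t_l in zip(outputs, targets))


# Two hypothetical training data: outputs y_l and target values t_l.
e_quadratic = error_function([1.0, 2.0], [0.0, 4.0], d=2)
e_absolute = error_function([1.0, 2.0], [0.0, 4.0], d=1)
```

With d = 2 this is the familiar sum of squared errors; larger values of d penalize large individual deviations more strongly.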

A non-exhaustive overview of various gradient descent methods can be found in [4] and [5]. The quasi-Newton method (QNM) described in document [5] has the advantage that it can be carried out very quickly while yielding good results. Further methods for optimizing the weights w_ij of the respective neural network NE are known to the person skilled in the art and can be used without restriction in the method according to the invention.

After the training phase (103) has ended, a value of a fitness function F is determined (104) from a residual error, which is the value of the error function E at the end of the training phase, and from a validation error E_V.

The validation error E_V is determined using a validation data set VDS. If the validation data set VDS contains a number r of validation data, the validation error E_V is:

where
k uniquely identifies a validation datum,
y_k denotes an output value of the neural network NE when the validation datum is applied,
t_k denotes a target value of the neural network NE when the validation datum is applied.

The fitness function F is formed in different ways for different variants of the method according to the invention. The residual error and the validation error E_V can enter into the fitness function F. For the special case in which only the validation error E_V enters into the fitness function F, the fitness function F is:

$$F = E_V \qquad (6)$$

However, one variant of the method according to the invention also provides for taking the complexity of the respective neural network NE into account in the fitness function F. In this case, the fitness function F is, for example:

where
δ is an influence factor, an arbitrary natural number, which specifies to what extent the complexity of the respective neural network NE is to be taken into account in the fitness function F.

Ferner ist in einer Variante des erfindungsgemäßen Verfahrens auch vorgesehen, sowohl den Restfehler und den Validierungs fehler E_V als auch die Komplexität des jeweiligen Neuronalen Netzes NE in der Fitnessfunktion F zu berücksichtigen. Die Fitnessfunktion F ergibt sich dann aus der Gleichung (6) und aus der Gleichung (7) zu:Furthermore, it is also provided in a variant of the method according to the invention to take into account both the residual error and the validation error E _V and the complexity of the respective neural network NE in the fitness function F. The fitness function F then results from equation (6) and from equation (7):

x_{ij} denotes the component of the architecture vector x that is uniquely identified by the indices i and j.
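As an illustration only, the variants of the fitness function could be sketched in Python as follows. The complexity term is not shown in this text, so its form here is an assumption: the complexity is taken to be the sum of the components x_{ij} of the architecture vector, weighted by δ. The function name `fitness` and its parameters are hypothetical.

```python
def fitness(validation_error, architecture, delta=0, residual_error=0.0):
    """Hypothetical fitness value; smaller values are better.

    validation_error -- E_V, measured on the validation data set
    architecture     -- architecture vector x, components 0 or 1
    delta            -- influence factor for the complexity term
    residual_error   -- residual training error (0.0 if not used)
    """
    # ASSUMPTION: complexity is measured as the number of active
    # components of the architecture vector. The source does not
    # show the actual complexity term.
    complexity = sum(architecture)
    return residual_error + validation_error + delta * complexity
```

With `delta=0` and `residual_error=0.0` the sketch reduces to equation (6), F = E_V.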

If the value of the fitness function F of a neural network NE is smaller than a predefinable threshold SCH (106), the corresponding neural network NE is the neural network with the optimal network architecture for the application, and the method is terminated (107).

If the values of the fitness function F of all neural networks NE of a generation t are greater than the predefinable threshold SCH, the optimal network architecture has not yet been determined, and at least one further iteration t+1 must be carried out.
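The termination test of steps 106 and 107 could be sketched as follows, assuming the fitness values of one generation are held in a list. The function name `select_if_converged` is hypothetical.

```python
def select_if_converged(fitness_values, threshold):
    """Return the index of the best network of a generation if its
    fitness lies below the threshold SCH, otherwise None."""
    # The best network is the one with the minimum fitness value.
    best = min(range(len(fitness_values)), key=fitness_values.__getitem__)
    return best if fitness_values[best] < threshold else None
```

A return value of `None` corresponds to the case in which a further iteration t+1 is required.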

A further probability vector p_{t+1} is determined for the further iteration step t+1 (105).

Different variants are provided for forming the further probability vector p_{t+1} from the probability vector p_t. For example, the individual components p_{i, t+1} of the further probability vector p_{t+1} can result from a moving average over the corresponding components p_{i, t} of at least one preceding probability vector p_t. This can advantageously be done by applying an arbitrary digital filter function. In determining the further probability vector p_{t+1}, it is advantageous to use in each case the probability vector p_t that was assigned to the neural network NE with the best value of the fitness function F.

It has proven very simple, and therefore advantageous, to form the individual components p_{i, t+1} of the further probability vector p_{t+1} according to the rule:

p_{i, t+1} = (1 - α) p_{i, t} + α x_{i, best, t}   (9)

where
p_{i, t+1} denotes a component of the further probability vector p_{t+1} for iteration step t+1,
t denotes the iteration index,
i denotes a component index that uniquely identifies each component of the probability vector p_t or of the further probability vector p_{t+1},
α denotes a decay constant,
p_{i, t} denotes a component of the probability vector p_t,
x_{i, best, t} denotes a component of the architecture vector of the best neural network NE of iteration step t.

The decay constant α is an arbitrary number.
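A minimal Python sketch of rule (9), assuming the probability vector and the architecture vector of the best network are plain lists of equal length and that the architecture components are 0 or 1. The name `update_probabilities` is hypothetical.

```python
def update_probabilities(p, x_best, alpha):
    """Apply rule (9) component by component.

    p      -- probability vector p_t (list of floats)
    x_best -- architecture vector of the best network of step t
    alpha  -- decay constant
    """
    return [(1 - alpha) * p_i + alpha * x_i for p_i, x_i in zip(p, x_best)]
```

With α = 0.1, a component at 0.5 moves to 0.55 when the best network contains the corresponding element and to 0.45 when it does not.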

In the next iteration step, the individual connections V_{ij} and/or the individual neurons NEU are each generated with the probability specified for the respective connection V_{ij} and/or the respective neuron NEU in the further probability vector p_{t+1}.
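Generating one architecture from the probability vector could look like the following sketch. It assumes that each component is drawn independently and that a value of 1 means the connection or neuron is present. The name `sample_architecture` and the use of Python's `random.Random` are illustrative choices, not part of the source.

```python
import random


def sample_architecture(p, rng):
    """Draw an architecture vector x from the probability vector p.

    Component i is set to 1 with probability p[i], otherwise to 0.
    rng is a random.Random instance, passed in for reproducibility.
    """
    # rng.random() returns a float in [0.0, 1.0), so p_i = 0.0 never
    # produces a 1 and p_i = 1.0 always does.
    return [1 if rng.random() < p_i else 0 for p_i in p]
```

Calling the function several times with the same `p` yields the population of differently structured networks for one iteration step.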

In a development of the method according to the invention, a so-called mutation step is carried out to modify the components p_{i, t+1} when forming the components p_{i, t+1} of the further probability vector p_{t+1}.

The mutation serves, for example, to prevent the values of the components p_{i, t+1} from becoming fixed at the value 0 or the value 1. The following rule for mutating the components p_{i, t+1} has proven advantageous:

p^{m+1}_{i, t} = (1 - β) p^{m}_{i, t} + β b   (10)

where
p^{m+1}_{i, t+1} denotes a component p_{i, t+1} of the further probability vector p_{t+1} for iteration step t+1 on which a mutation has been carried out,
t denotes the iteration index,
i denotes the component index that uniquely identifies each component p_{i, t+1} of the probability vector p_t or of the further probability vector p_{t+1},
β denotes a mutation constant,
p^{m}_{i, t} denotes a component p_{i, t} of the probability vector p_t for iteration step t on which a mutation is carried out,
b denotes a predefinable number in the interval [0, 1].

The mutation constant β is an arbitrary number.
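The effect of the mutation rule can be illustrated with a short sketch. It assumes the rule is applied to every component of the probability vector and that β lies strictly between 0 and 1, which the source does not require. The name `mutate` is hypothetical.

```python
def mutate(p, beta, b):
    """Shift every component of p toward the value b.

    p    -- probability vector (list of floats in [0, 1])
    beta -- mutation constant (assumed here: 0 < beta < 1)
    b    -- predefinable number in the interval [0, 1]
    """
    return [(1 - beta) * p_i + beta * b for p_i in p]
```

For β = 0.1 and b = 0.5, a component that has reached 1.0 is pulled back to 0.95 and a component at 0.0 is raised to 0.05, so neither stays fixed at the boundary.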

Furthermore, a development of the method provides for determining the components w_{ij, t+1} of a further weight vector w_{t+1} of each neural network NE, at the beginning of the optimization of the individual weights w_{ij, t+1}, by a moving average over the corresponding components w_{ij, t} of the weight vector w of the preceding neural network NE that was best with respect to the fitness function F. This can advantageously be done by applying an arbitrary digital filter function.

It has proven very simple, and therefore advantageous, to form the individual components w_{ij, t+1} of the further weight vector w_{t+1} according to the rule:

w_{ij, t+1} = (1 - η) w_{ij, t} + η w_{best, t}

where
w_{ij, t+1} denotes a component of the weight vector w at the beginning of the optimization of the weight vector w for iteration step t+1,
t denotes an iteration index,
i, j denote a first index and a second index that together uniquely identify each component of the weight vector w,
η denotes a weight decay constant,
w_{ij, t} denotes a component of the weight vector w at the beginning of the optimization of the weight vector w for iteration step t,
w_{best, t} denotes a component of the weight vector w of the best neural network NE of iteration step t.
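Unrolling the rule shows why it acts as a moving average in the sense of a first-order recursive digital filter. Assuming a constant η with 0 < η < 1 and an initial value w_{ij, 0} (both are assumptions; the source places no restriction on η), repeated substitution gives:

```latex
w_{ij,\,t+1} = (1-\eta)^{t+1}\, w_{ij,\,0}
             + \eta \sum_{k=0}^{t} (1-\eta)^{\,t-k}\, w_{\mathrm{best},\,k}
```

Under these assumptions the contribution of the best network of iteration k decays geometrically with the factor (1 - η) per iteration step.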

To further simplify the method according to the invention, a development of the method provides for using so-called Population-Based Incremental Learning (PBIL), described in document [3], to generate the neural networks NE in the individual iteration steps t.
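The overall iteration can be sketched in the style of PBIL as follows. This is a simplified illustration, not the method of document [3] in full: the callable `evaluate` stands in for weight optimization plus computation of the fitness function, and the start value 0.5, the population size, and the iteration limit are all assumptions. The name `pbil_search` is hypothetical.

```python
import random


def pbil_search(evaluate, n_components, population=10, alpha=0.1,
                threshold=0.5, max_iterations=100, seed=0):
    """Search for an architecture vector whose fitness is below threshold.

    evaluate -- callable mapping an architecture vector to a fitness value
                (stands in for training and validation of one network)
    Returns (architecture, fitness, iteration index).
    """
    rng = random.Random(seed)
    p = [0.5] * n_components          # assumed initial probability vector
    best_x, best_f = None, float("inf")
    for t in range(max_iterations):
        # Step 101: generate networks of different architectures from p_t.
        nets = [[1 if rng.random() < p_i else 0 for p_i in p]
                for _ in range(population)]
        # Steps 102 to 104: determine the fitness of every network.
        values = [evaluate(x) for x in nets]
        k = min(range(population), key=values.__getitem__)
        x, f = nets[k], values[k]
        if f < best_f:
            best_x, best_f = x, f
        # Steps 106 and 107: stop once the best fitness is below the threshold.
        if f < threshold:
            return x, f, t
        # Step 105: move p toward the best architecture of this generation.
        p = [(1 - alpha) * p_i + alpha * x_i for p_i, x_i in zip(p, x)]
    return best_x, best_f, max_iterations
```

As a toy check, `evaluate` can count the mismatches between the architecture vector and a fixed target vector; the search then ends as soon as a generation contains the target.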

Furthermore, in a development of the method it is advantageous if the training data set TDS and the validation data set VDS are disjoint. This further improves the result of the training phase and of the validation.

A further improvement of the results is achieved if the training data set TDS and the validation data set VDS are formed anew in each iteration step t. If all available data are already used in the training data set TDS or in the validation data set VDS, it is advantageous to carry out a cross validation of the two data sets in each case, in order to obtain a "new" training data set TDS and a "new" validation data set VDS in each iteration step t. This considerably reduces adaptation to the "noise" in the data.

Cross validation is known from document [6].
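One possible way to obtain a new, disjoint pair of data sets in each iteration step is to rotate the validation fold, as in the following sketch. The fold assignment by index and the name `fold_split` are assumptions for illustration; the source does not prescribe how the data sets are re-formed.

```python
def fold_split(data, k, t):
    """Split data into a training set and a validation set for step t.

    The data are divided into k folds by index; fold (t mod k) serves
    as the validation data set, all other folds as the training data set.
    """
    fold = t % k
    training = [d for i, d in enumerate(data) if i % k != fold]
    validation = [d for i, d in enumerate(data) if i % k == fold]
    return training, validation
```

Because every data point falls into exactly one fold, the two returned sets are disjoint and together cover all available data.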

Fig. 2 shows the computer R with which the method is necessarily carried out. The training data set TDS and the validation data set VDS are also shown symbolically. The optimized neural network NE is presented to the user, for example, on a screen BS.

The following publications are cited in this document:
[1] F. Hergert et al, A Comparison of Weight Elimination Methods for Reducing Complexity in Neural Networks, International Joint Conference on Neural Networks, Baltimore, pp. 1-8, 1992
[2] H. Braun and P. Zagorski, ENZO-M - A Hybrid Approach for Optimizing Neural Networks by Evolution and Learning, Parallel Problem Solving from Nature, Y. Davidor et al (eds.), International Conference on Evolutionary Computation, Proceedings of the Third Conference on Parallel Problem Solving from Nature, Jerusalem, Israel, October 9-14, ISBN 3-540-58484-6, Springer Verlag, pp. 440-451, 1994
[3] S. Baluja and R. Caruana, Removing the Genetics from the Standard Genetic Algorithm, Proceedings of the Twelfth International Conference on Machine Learning, Lake Tahoe, CA, pp. 1-11, July 1995
[4] A. Zell, Neural Network Simulation, Addison-Wesley, 1st edition, ISBN 3-89319-554-8, pp. 105-178, 1994
[5] R. Fletcher, Practical Methods of Optimization, John Wiley, 2nd edition, ISBN 0-471-91547-5, pp. 49-57, 1987
[6] L. Breiman, Classification and Regression Trees, The Wadsworth statistics / probability series, J. Kimmel (ed.), ISBN 0-534-98053-8, p. 11, 1984

Claims

1. Method for iteratively determining an optimized network architecture of a neural network (NE) using a computer (R) with the following steps:

a) in each iteration step (t), neural networks (NE) of different network architectures are generated (101) using a probability vector (p_t), the generation being carried out by means of an evolutionary method, and each component (p_{i, t}) of the probability vector (p_t) describing the probability of forming a neuron (NEU) and/or of forming a connection (V_{ij}) between neurons (NEU) of a neural network (NE),
b) for each neural network (NE) the following steps are carried out (102):
a weight vector (w) of the neural network (NE) is optimized on the basis of a training data set (TDS) (103),
a value of a fitness function (F) is determined (104) for the optimized weight vector, at least one validation error (E_V) determined with a validation data set (VDS) being taken into account in the fitness function (F),
c) the optimized network architecture of a neural network (NE) results from the neural network (NE) with a minimum value of the fitness function (F) if the value is below a predefinable threshold (SCH) (106, 107), and
d) a further probability vector (p_{t+1}) for generating neural networks in a further iteration step (t+1) is determined (105), taking into account the values of the fitness function (F), if the value of the fitness function (F) is not below the predefinable threshold (SCH) (106).

2. The method according to claim 1, in which components (p_{i, t+1}) of the further probability vector (p_{t+1}) are determined by a moving average over the corresponding components (p_{i, t}) of at least one preceding probability vector (p_t).

3. The method according to claim 2, in which the moving average is formed by an arbitrary digital filter function.

4. The method according to claim 2, in which the components (p_{i, t+1}) are each formed according to the rule:
p_{i, t+1} = (1 - α) p_{i, t} + α x_{best, t}
where
p_{i, t+1} denotes a component of the further probability vector for iteration step t+1,
t denotes an iteration index,
i denotes a component index that uniquely identifies each component of the probability vector,
α denotes a decay constant,
p_{i, t} denotes a component of the probability vector,
x_{best, t} denotes a component of the architecture vector of the best neural network of an iteration step.

5. The method according to any one of claims 1 to 4, in which a mutation of the components (p_{i, t+1}) is carried out when forming the further probability vector (p_{t+1}).

6. The method according to claim 5, in which the mutation is carried out according to the following rule:
p^{m+1}_{i, t} = (1 - β) p^{m}_{i, t} + β b
where
p^{m+1}_{i, t+1} denotes a component of the probability vector for iteration step t+1 on which the mutation has been carried out,
t denotes an iteration index,
i denotes a component index that uniquely identifies each component of the probability vector,
β denotes a mutation constant,
p^{m}_{i, t} denotes a component of the probability vector for iteration step t on which the mutation is carried out,
b denotes a predefinable number from the interval [0, 1].

7. The method according to any one of claims 1 to 6, in which components (w_{ij, t+1}) of the weight vector (w_{t+1}) are determined, at the beginning of the optimization of the weight vector (w_{t+1}), by a moving average over the corresponding components (w_{ij, t}) of at least one weight vector (w_t) of a preceding neural network (NE).

8. The method according to claim 7, in which the moving average is formed by an arbitrary digital filter function.

9. The method according to claim 7, in which the components (w_{ij, t+1}) of the weight vector (w_{t+1}) at the beginning of the optimization of the weight vector (w_{t+1}) are each formed according to the rule:
w_{ij, t+1} = (1 - η) w_{ij, t} + η w_{best, t}
where
w_{ij, t+1} denotes a component of the weight vector at the beginning of the optimization of the weight vector for iteration step t+1,
t denotes an iteration index,
i, j denote a first index and a second index that together uniquely identify each component of the weight vector,
η denotes a weight decay constant,
w_{ij, t} denotes a component of the weight vector at the beginning of the optimization of the weight vector for iteration step t,
w_{best, t} denotes a component of the weight vector of the best neural network of an iteration step.

10. The method according to any one of claims 1 to 9, in which a residual training error of the respective neural network (NE) is also taken into account in the fitness function (F).

11. The method according to any one of claims 1 to 10, in which the complexity of the respective current neural network (NE) is taken into account in the fitness function (F).

12. The method according to any one of claims 1 to 11, in which so-called Population-Based Incremental Learning (PBIL) is used to generate the neural networks (NE).

13. The method according to any one of claims 1 to 12, in which the validation data set (VDS) and the training data set (TDS) are disjoint.

14. The method according to any one of claims 1 to 13, in which the optimization of the weight vectors (w) is carried out with a gradient descent method.

15. The method according to any one of claims 1 to 14, in which the optimization of the weight vectors (w) is carried out with the quasi-Newton method (QNM).

16. The method according to any one of claims 1 to 15, in which a cross validation is carried out for the validation data set (VDS) and the training data set (TDS) in each iteration step (t).