WO2017016538A1

WO2017016538A1 - Method for finding ideas

Info

Publication number: WO2017016538A1
Application number: PCT/DE2016/100302
Authority: WO
Inventors: Thomas Hartmann
Original assignee: Harting Ag & Co. Kg
Priority date: 2015-07-24
Filing date: 2016-07-07
Publication date: 2017-02-02
Also published as: DE102015112101A1

Abstract

In order to develop and improve products, in particular in the field of engineering design, a two-stage method for determining relevant patent documents is proposed, in which natural persons are prompted to find ideas by combining the associated patent graphics (23). For the automatic selection of the patent documents, a text classification in a first stage and an image classification in a second stage are applied to a previously compiled base set (1) of patent documents, wherein the associated text classifier (3) and image classifier (7) can be trained beforehand. In particular, the text classifier (3) can focus on content, whereas the image classifier (7) can have a strongly formal character and relate in particular to the complexity of the patent graphics (23) to be determined.

Description

Method for brainstorming

description

The invention relates to a method for brainstorming according to the preamble of independent claim 1.

Such methods can be used especially in the industrial environment to gain ideas for the development and improvement of products. In particular, the invention relates to the generation of innovations in the field of engineering design.

State of the art

For example, it is known from the document US 5,774,833 A to examine texts and images from patent literature on a computer using semantic methods.

The document WO 2008/156507 A1 describes a method for automatic patent evaluation by a computer program.

In order to increase competitiveness, internal processes for developing innovative products are increasingly being established in many companies.

For example, document US 2012/0054281 A1 describes a method for improving the performance of collaborative group innovation by team building in a virtual environment.

The document US 5,774,833 A describes a method for the syntactic and semantic analysis of patent texts and drawings. The use of patent databases for obtaining information is known from the publication FINZEN, Jan; KASPER, Harriet; KINTZ, Maximilien: INNOVATION MINING. Effective research of information relevant to corporate strategy on the Internet. ISBN 978-3-8396-0139-6, 2010, in particular pages 47 to 54.

Document US 2015/0121185 A1 discloses a computer-implemented method for displaying a graphical representation of a classification of a set of patent applications into several categories according to an attribute.

It is known from the document "A methodical way to innovative technologies" (author Spies, K., publishing house of the Augustinus bookshop, Aachen, 1996) to present designers with selected pictures from patents for technical products, in order to enable them to combine the technical principles shown there into new ideas by restructuring them and/or transferring them to other applications. This approach has proved particularly advantageous in mining, mechanical engineering and the construction industry, i.e. in areas of engineering design. In particular, the document discloses that the patent drawings presented should not come only from the respective area of responsibility, e.g. the IPC (International Patent Classification) section/class/subclass etc., but should instead cover the widest possible thematic field.

The publication ESSER, Daniel: Self-learning type classification of documents for information retrieval in document management. Student thesis ("Großer Beleg"), Dresden University of Technology, submitted on 10.09.2010, pp. 1-131, describes a system for classifying document types by means of semantic analysis.

A disadvantage of this prior art is that the processes for innovation generation, especially in small and medium-sized companies, are often too expensive and, given ever-shorter product life cycles, still take an undesirably long time.

Object

The object of the invention is therefore to develop a cost-effective and time-saving method for innovation generation, in particular using a topic-based selection of patents from different technological fields of application.

In particular, for a set of patent documents to be viewed, the ratio of the number of useful patent documents to the total number of patent documents should be optimized with the least possible expenditure of time and computation.

The object is achieved by a method of the type mentioned at the outset having the features of the characterizing part of independent claim 1.

The brainstorming method comprises the following steps:
a.) automatic analysis of patent documents, including patent texts and patent graphics, by a computer;
b.) automatic selection of relevant patent documents by the computer;
c.) representation of the patent graphics of the selected patent documents;
d.) viewing and combining of features contained in the displayed patent graphics by natural persons in order to generate new ideas;
wherein, for the selection of the patent documents in step b.), a first stage comprising a text classification with a text classifier having a text classification function and a second stage comprising an image classification with an image classifier having an image classification function are run through.

Advantageously, to carry out the method, in particular steps a.) and b.), the computer executes a program that is stored in a corresponding memory and runs on one or more processors likewise belonging to the computer, in particular one or more microprocessors.

Advantageous embodiments of the invention are specified in the subclaims.

The method allows product ideas to be developed quickly and inexpensively, particularly in the field of engineering design. Furthermore, the method is optimized in particular for improving existing products.

The advantage of the method is that even small and medium-sized enterprises are supported in their brainstorming phase, because only a base set of patent documents that is available free of charge, or at least very cheaply, is needed, and because the further steps involve only a small financial and time expenditure and comparatively little computational effort.

The method thus has the advantage that such a brainstorming process can be carried out within a few days and thus meets the demand for short development periods. In particular, the time required for a.) the automatic analysis of the patent documents by the computer and b.) the automatic selection of the patent documents is greatly shortened, in particular by a suitable combination of the text classification with the image classification.

In an advantageous embodiment, the results of the two stages are linked with the Boolean operator "AND". This can save time if at least parts of the method are performed repeatedly, for example for test purposes, or if documents already read into the computer are used repeatedly in different analyses. The result of a classification, for example the image classification, can then be stored for the respective document set, or the documents can, for example, be provided with a corresponding attribute.
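The AND linkage, with each stage result stored as an attribute of the document so that repeated runs can reuse it, might look like the following minimal Python sketch. The `Doc` record, its `text_relevant`/`image_relevant` attributes and the two classifier callables are hypothetical placeholders; how the classifiers themselves work is left open.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Doc:
    doc_id: str
    text: str
    graphic: object  # placeholder for the image data of the patent graphic
    # Cached stage results; None means "not yet classified" (hypothetical attributes).
    text_relevant: Optional[bool] = None
    image_relevant: Optional[bool] = None


def select_and(docs: List[Doc],
               text_clf: Callable[[str], bool],
               image_clf: Callable[[object], bool]) -> List[str]:
    """Classify every document in both stages, cache each result on the
    document, and link the two results with a Boolean AND."""
    selected = []
    for d in docs:
        if d.text_relevant is None:       # run stage 1 only if not cached
            d.text_relevant = text_clf(d.text)
        if d.image_relevant is None:      # run stage 2 only if not cached
            d.image_relevant = image_clf(d.graphic)
        if d.text_relevant and d.image_relevant:
            selected.append(d.doc_id)
    return selected
```

A second call on the same documents, e.g. in a test run, reuses the stored attributes instead of invoking the classifiers again.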

If the method is expected to be carried out only once, it is particularly advantageous, according to a further embodiment, to first run the patent documents to be examined through one of the two stages and then to run only the documents selected there through the other stage. In this way, not all patent documents need to pass through both stages, which saves time and computing capacity. Since the second stage concerns the evaluation of the graphics and is likely to be more time-consuming than the first stage, it is particularly advantageous if the first stage, which concerns the text classification, is carried out first.

It is therefore particularly advantageous if patent documents are first analyzed on the basis of their patent texts by means of the text classification and subjected to a selection, and then only the patent graphics of the selected patent documents are analyzed by the image classification and subjected to a further selection.

The second stage is thus traversed only by the patent graphics of those patent documents that were selected as relevant in the first stage on the basis of their patent texts, which saves considerable computing capacity and time. In this way, only a fraction of the patent documents analyzed in the first stage has to pass through the particularly compute-intensive second stage. For example, a base set may include 10,000 to 20,000 patent documents to be examined. In practice, the first stage may select, for example, about 5% to 10% of the patent documents, i.e. for example 1,000 documents, so that in the particularly compute-intensive second stage the patent graphics of only these 1,000 patent documents have to be examined by means of the image classification. In this way, e.g. 90% to 95% of the computational effort can be saved in the second stage.
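The sequential cascade and the resulting saving can be illustrated with a short Python sketch. The classifiers are passed in as arbitrary callables (hypothetical stand-ins for the trained text and image classifiers), and the number of image-classifier calls is used as a simple proxy for the cost of the second stage.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

D = TypeVar("D")


def two_stage_select(docs: Iterable[D],
                     text_clf: Callable[[D], bool],
                     image_clf: Callable[[D], bool]) -> Tuple[List[D], int]:
    """Stage 1 (text) runs on every document; stage 2 (image) runs only on
    the stage-1 hits. Returns the selected documents and the number of
    image-classifier calls, i.e. the cost driver of the second stage."""
    first_subset = [d for d in docs if text_clf(d)]              # subset 4
    image_calls = len(first_subset)
    second_subset = [d for d in first_subset if image_clf(d)]    # subset 8
    return second_subset, image_calls
```

With 10,000 documents and a text stage that keeps 10% of them, only 1,000 image classifications are needed instead of 10,000, which corresponds to the 90% saving mentioned above.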

For an appropriate selection, the classifiers, i.e. the text classifier and the image classifier, are formed as follows:

Preference is given to a so-called "training" of a classifier, in particular the text classifier, with manually selected training documents, by which the corresponding classification function, in particular the text classification function, is calculated.

In a preferred embodiment, therefore, thematically relevant patent documents are determined as training documents, e.g. from a public database, in order to train at least the text classifier for the current brainstorming project. A relevant search criterion can be used for this purpose. Empirical experiments have shown that searching for a technical function, e.g. "sawing", "screwing", "welding", achieves a much better hit rate than searching for the corresponding devices, such as "saw", "screw", "welding device".

From the set of hits, suitable patent documents can then be selected as training documents in the form of a manual final selection with little manual effort.

In a preferred embodiment, therefore, particularly relevant patent documents can be marked as relevant and used as training documents for training the text classifier.

Furthermore, non-relevant patent documents can be marked as not relevant and likewise used as training documents for training the text classifier.

In this case, for example, the so-called χ² method ("chi-square method"), known to the person skilled in the art and described in more detail below, can be used for feature extraction.

The χ² method generally determines the independence of two variables; in the present case, it determines the dependence of a feature on a category (here: relevant / not relevant). The χ² method is therefore particularly well suited to finding the most meaningful features. χ² can be determined as follows:

$$\chi^2(m, c) = \frac{N \cdot (A \cdot D - C \cdot B)^2}{(A + C) \cdot (B + D) \cdot (A + B) \cdot (C + D)}$$

In this formula:

A: number of documents from a category c that contain a specific feature m;

B: number of documents that are not in category c and that contain the specific feature m;

C: number of documents from category c that do not contain the feature m;

D: number of documents that are not in category c and that do not contain m;

N: total number of documents in the corresponding training set.

In general, the dependence of every feature used can be calculated for each category used and then averaged. The features can then be ranked for each category: the more strongly a feature depends on the category, the higher it ranks. A further advantage of the χ² method is that its values can be normalized to the interval [0,1] by dividing by N, which makes the results comparable.
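The ranking described above can be sketched in Python with the standard 2×2 chi-square statistic N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D)), using the counts A, B, C, D and N defined above. This is a minimal sketch for a single category; scoring a feature as 0 when a marginal count is zero is an assumption made here to avoid division by zero.

```python
from typing import Dict, List, Tuple


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square dependence of a feature m on a category c.

    a: docs in c containing m       b: docs not in c containing m
    c: docs in c lacking m          d: docs not in c lacking m
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:  # e.g. m occurs in every document or in none
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def rank_features(counts: Dict[str, Tuple[int, int, int, int]]) -> List[Tuple[str, float]]:
    """Rank features by chi-square divided by N, which lies in [0, 1]."""
    scored = []
    for feature, (a, b, c, d) in counts.items():
        n = a + b + c + d
        scored.append((feature, chi_square(a, b, c, d) / n if n else 0.0))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For a hypothetical training set of 10 relevant and 10 non-relevant documents, a feature such as "saw" with counts (9, 1, 1, 9) ranks first with a normalized score of about 0.64, while a feature occurring equally often in both categories scores 0.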

In a particularly preferred embodiment, in the second stage, the image classification may use a formal feature, in particular the complexity of the patent drawings, as a selection criterion. This is particularly advantageous for several reasons:

On the one hand, once created, a corresponding image classification function can be used for all such methods and can thus be regarded as generally fixed for such methods, independently of the respective topic of the idea generation project.

In this case, the image classifier only needs to be trained once. For this purpose, a set of training documents with manually selected complex and less complex graphics is assembled once, the individual graphics being marked as complex or less complex for training the image classifier. The image classifier can then be calculated once, e.g. with the χ² method mentioned above.

On the other hand, this complexity criterion is also particularly advantageous for interpretation by the natural persons, for example developers, designers, potential inventors, interested amateurs, etc., since selecting less complex graphics significantly improves manual recognition and thus significantly reduces the intellectual effort involved in combining the elements contained therein. The complexity of a graphic can be determined, for example, by the ratio of its lines to its total area. Advantageously, it can be taken into account that patent drawings are basically line drawings and that the line thickness is comparable across a large number of suitable patent documents. For example, the length and/or area of the lines may be set in proportion to the size, i.e. the diagonal or area, of the overall graphic. In particular, the number of corresponding pixels can be used.
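A minimal sketch of the complexity measure as the share of line pixels in the total pixel area. It assumes the graphic has already been loaded as rows of 8-bit gray values (0 = black, 255 = white) and treats every pixel darker than a hypothetical threshold of 128 as part of a line; loading the image file itself is left out.

```python
from typing import List


def line_ratio(pixels: List[List[int]], dark_threshold: int = 128) -> float:
    """Share of line pixels in a grayscale line drawing.

    A pixel counts as part of a line if its gray value is below
    dark_threshold (an assumed binarization threshold)."""
    total = sum(len(row) for row in pixels)
    if total == 0:
        return 0.0
    dark = sum(1 for row in pixels for value in row if value < dark_threshold)
    return dark / total
```

For a 4 × 4 image whose top row is black and the rest white, the ratio is 4 / 16 = 0.25.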

The proportion of lines can thus be recognized and evaluated easily and with low computational effort by current analysis tools, e.g. image analysis software.

Furthermore, patent graphics in which the proportion of lines on the overall graphic is below a certain predetermined value can be considered to be less complex. Patent graphics in which the proportion of lines in the overall graphic is greater than the predetermined value can be considered complex.

In an advantageous embodiment, the predetermined value can also be determined on the basis of the entire image material to be examined, e.g. as the mean value. The graphics that are less complex relative to the whole are then selected automatically.

The value can also be fixed by the method independently of the image material to be examined. For example, this value, based on the ratio of the areas, may be relatively small, e.g. 0.01%, 0.05% or 0.1%, but it may also be 0.25%, 0.5%, 1% or 2.5%, or even 5%, 7.5%, 10% or more, e.g. 15% or 20%, or even higher, e.g. 25%, 30%, 35% or even 40%. This value can thus be used to set, in absolute terms, how complex the image material may be in order to be included in the selection.

The image classification can thus use the complexity of the patent graphics as a selection criterion by selecting those patent documents whose patent graphics have a lower ratio of lines to the total area of the graphic than the patent graphics of the patent documents that are not selected.
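The selection rule, with the threshold either taken as the mean over all graphics examined or fixed in advance, can be sketched as follows. The line ratios are assumed to be given as fractions (0.05 corresponds to 5%), keyed by a hypothetical document identifier.

```python
from statistics import mean
from typing import Dict, List, Optional


def select_less_complex(ratios: Dict[str, float],
                        fixed_threshold: Optional[float] = None) -> List[str]:
    """Select documents whose graphic has a line ratio below the threshold.

    Without a fixed threshold, the mean ratio of all graphics examined is
    used, so the graphics that are less complex relative to the whole are
    selected."""
    if not ratios:
        return []
    if fixed_threshold is not None:
        threshold = fixed_threshold
    else:
        threshold = mean(list(ratios.values()))
    return sorted(doc for doc, r in ratios.items() if r < threshold)
```

With ratios of 1%, 4% and 10%, the mean (5%) keeps the first two graphics, while a fixed threshold of 2% keeps only the first.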

In this way, the less complex patent graphics can be automatically selected according to a predetermined criterion with little computational effort.

In a further advantageous embodiment, the image classifier, analogously to the text classifier, can be generated from a selection of training documents that are relevant or not relevant in terms of content and marked accordingly. This may be useful if the images of relevant documents have significant similarities, which may relate, e.g., to the topical focus of the brainstorming project according to the IPC and/or CPC classification or comparable classifications.

In a particularly advantageous embodiment, the image classifier can thus also have content-related components, i.e. search for specific geometric shapes, e.g. special so-called "mating faces", locking devices, circuit arrangements and associated symbols, etc., and select corresponding patent documents. For this purpose, in an advantageous embodiment, the image analysis can include, for example, pattern recognition and/or pattern analysis methods.

There are several ways to train and apply the text classifier(s):

A first approach is to use patent documents from different patent classification sections, in particular IPC or CPC sections, separately as training documents in the training phase, so as to obtain a separate text classifier for each section. In the subsequent selection phase, patent documents of the various sections can then be selected with the respective section-specific text classifier.

Although this means considerably higher configuration and computational effort, it initially seems plausible that, for example, the so-called "precision" of each classifier, i.e. the ratio of the number of relevant documents found to the number of documents found, or in simple terms its "accuracy", would improve significantly, at least because of the more uniform meanings of words within a section.
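The precision used above as a quality measure can be written down directly. This short sketch compares the set of documents a classifier returned with a hypothetical set of documents known to be relevant, e.g. from a manually assessed test series.

```python
from typing import Set


def precision(found: Set[str], relevant: Set[str]) -> float:
    """Number of relevant documents found divided by number of documents found."""
    if not found:
        return 0.0
    return len(found & relevant) / len(found)
```

If a classifier returns four documents of which two are actually relevant, its precision is 2 / 4 = 0.5; comparing this value for section-specific and shared classifiers is one way to run the kind of test series described next.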

Surprisingly, this hypothesis was not confirmed, at least by the test series carried out: in those investigations, no appreciable effect of using IPC-section-specific text classifiers could be found.

To reduce the manual effort and/or the overall considerable computational effort, it has therefore proved particularly advantageous to pursue an alternative approach, which consists in selecting, in the first stage, i.e. in the text classification, patent documents from different IPC/CPC sections, or possibly also from sections of other patent classifications, using the same text classification function. In other words, a single classifier can be used for patent documents from different sections, e.g. IPC/CPC sections, without significantly degrading the result. As a result, both the manual effort and the computational effort of the computer can be significantly reduced, both when configuring the system and during the actual regular selection of patent documents.

For the selection phase, when the base set of patent documents is assembled from a database, e.g. a public database, patent documents can be compiled, for example using the term "locking", not only from IPC section H ("Electricity") and section F ("Mechanical engineering"), but also from section B ("Performing operations; transporting") and section A ("Human necessities"). The text classifier can thus naturally have a stronger content-related component, while the image classifier can preferably evaluate formal criteria, such as the complexity mentioned above.

Furthermore, the representation of the patent graphics for viewing and combining by natural persons, such as developers, designers, professionals, interested lay people, etc., is by no means limited to conventional exhibitions in the form of galleries in which prints of the graphics are put up. In a further preferred embodiment, these graphics can be distributed, for example via a network, to computers of the participants, i.e. the said natural persons, and displayed there by a program, e.g. as a slide show. The order of the individual graphics can vary for different participants, e.g. depending on a random number generator. The same graphics may also be presented to the same participants several times, but in a different order, in order to trigger different associations and create combinations. Furthermore, configurations are conceivable in which the participants can select graphics running on the computer monitor, or parts of them, by mouse click, and combine and save them.
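The varying presentation order can be sketched with Python's standard pseudo-random number generator. The function name, the participant identifiers and the fixed seed (used here only so that a session can be reproduced) are illustrative assumptions; distribution over the network and the slide-show program itself are not shown.

```python
import random
from typing import Dict, List


def slideshow_orders(graphics: List[str], participants: List[str],
                     rounds: int = 1, seed: int = 0) -> Dict[str, List[List[str]]]:
    """Give each participant `rounds` independently shuffled orders of the
    same graphics, so repeated showings use a different order."""
    rng = random.Random(seed)  # seeded so that a session can be reproduced
    orders: Dict[str, List[List[str]]] = {}
    for p in participants:
        orders[p] = []
        for _ in range(rounds):
            order = graphics[:]   # copy; the input list stays unchanged
            rng.shuffle(order)
            orders[p].append(order)
    return orders
```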

Embodiment

An embodiment of the invention is illustrated in the drawings and will be explained in more detail below. In the drawings:

Fig. 1 shows a simplified sequence of a two-stage selection phase;

Fig. 2 shows a more detailed sequence of the two-stage selection phase;

Fig. 3 shows a process diagram of an associated training phase;

Fig. 4 shows the sequence of an entire idea generation process;

Fig. 5a shows a classification of patent documents of different sections by means of a plurality of section-specific classifiers;

Fig. 5b shows a classification of patent documents of different sections by means of a common classifier.

The figures contain partly simplified, schematic representations. In some cases, identical reference numerals are used for similar but possibly not identical elements. Different views of the same elements may be scaled differently.

Fig. 1 shows a basic, greatly simplified sequence of a two-stage patent classification in a so-called "selection phase". The term "selection phase" serves to distinguish it from a so-called "training phase", whose procedure is otherwise similar, and means that the relevant patent graphics determined in the selection phase represent a regular result that can later be output for brainstorming by natural persons, such as developers, designers, professionals, interested lay people, etc.

In this two-stage patent classification, the patent documents of a base set 1 first pass through a first stage, comprising a text classifier 3, for analysis of the associated patent texts 2; then only the patent documents selected in the first stage pass through a second stage, comprising an image classifier 7, for analysis of the associated patent graphics 6.

The base set 1 could theoretically consist of hundreds of thousands or even millions of patent documents. However, since only about 100 relevant patent graphics 23 are required for the brainstorming method, a significantly smaller base set 1 of, for example, 5,000 to 10,000 documents may be sufficient.

When the base set 1 is obtained from a public database, a thematic preselection, for example by means of a keyword search and/or by a rough restriction to IPC/CPC sections or associated patent classes, may already have taken place. This is particularly advantageous because excluding completely irrelevant subject areas at an early stage saves a great deal of computing power, which makes the method much more efficient. In the first stage, the patent documents of the base set 1 are classified on the basis of their patent texts 2 by a text classifier 3 having a text classification function γ. As a result, a first subset 4 of textually relevant patent documents is selected from the base set 1.

At the same time, a residual set 5 of textually irrelevant patent documents is automatically formed, which is no longer considered in the further process.

In the second stage, the patent graphics 6 belonging to the patent documents of the first subset 4 are then compiled. The patent documents of the first subset 4 are divided by a second classification, namely an image classification with an image classifier 7 having an image classification function ε, into two further so-called "categories", namely a second subset 8 and a further residual set 9. The second subset 8 comprises those patent documents which, both by their textual content and by their patent graphics, are suitable for manual analysis, combination and/or brainstorming. In this context, these patent documents are also referred to as relevant documents, and their graphics as relevant patent graphics or relevant patent images 23. The residual set 9, formed from the patent documents of the first subset 4 that are not relevant according to their graphics, is no longer considered in the present process.

FIG. 2 shows a somewhat more detailed process diagram of the selection phase in the two-stage process.

In the first step 11, the patent texts 2 of the base set 1 are read in, for example in the usual XML format. In one possible embodiment, this takes the form of full texts comprising the so-called "abstract", the so-called "state of the art", the so-called "patent description", the so-called "exemplary embodiment" and the so-called "claims". However, the text section relating to the prior art may preferably be omitted. Furthermore, reading in the claims can advantageously also be dispensed with, because they increasingly contain legally shaped formulations and terms which, in experience, are less well suited for generating ideas. It is therefore particularly advantageous if the text format enables this distinction, i.e. identifies the different text sections accordingly.
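Reading in only selected text sections can be sketched with Python's standard `xml.etree.ElementTree` module. The element names used here (`abstract`, `prior-art`, `description`, `claims`) are hypothetical; real patent XML schemas use their own tags, so the set of skipped sections would have to be adapted.

```python
import xml.etree.ElementTree as ET
from typing import FrozenSet

# Hypothetical section tags to skip: prior art and claims.
SKIPPED_SECTIONS: FrozenSet[str] = frozenset({"prior-art", "claims"})


def read_patent_text(xml_string: str,
                     skipped: FrozenSet[str] = SKIPPED_SECTIONS) -> str:
    """Concatenate the text of all top-level sections except the skipped ones."""
    root = ET.fromstring(xml_string)
    parts = []
    for section in root:
        if section.tag in skipped:
            continue
        text = "".join(section.itertext()).strip()
        if text:
            parts.append(text)
    return "\n".join(parts)
```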

In the second step 12, features are extracted. This can be done by determining complete terms or partial terms, i.e. so-called "n-grams". For example, the following trigrams (n = 3) are formed from the German sentence "Sie laufen nach Hause" ("They walk home"): "Sie", "lau", "auf", "ufe", "fen", "nac", "ach", "Hau", "aus", "use". Other forms of features are also possible in this context, e.g. features consisting of several terms with a defined word spacing, up to grammatically defined sentence constructions with subject and predicate. In connection with this, or separately from it, it is also conceivable to merge synonyms in a meaningful way. Depending on the task, it may be useful to apply a so-called "stop-word filter", which removes meaningless terms such as articles, conjunctions, prepositions, etc. It is also possible and useful to apply a thesaurus to correct spelling errors, such as those that can arise from scanning documents and subsequent optical character recognition (OCR). The χ² method mentioned above can also be used.
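A minimal Python sketch of character n-gram extraction combined with a stop-word filter. It assumes word-internal n-grams (no n-gram spans a word boundary) and simply skips words shorter than n; the stop-word list passed in is a hypothetical example.

```python
from typing import FrozenSet, List


def char_ngrams(sentence: str, n: int = 3,
                stop_words: FrozenSet[str] = frozenset()) -> List[str]:
    """Word-internal character n-grams, skipping stop words.

    Words shorter than n yield no n-grams."""
    grams: List[str] = []
    for word in sentence.split():
        if word.lower() in stop_words:
            continue
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams
```

Applied to the German sentence "Sie laufen nach Hause", this yields "Sie", "lau", "auf", "ufe", "fen", "nac", "ach", "Hau", "aus", "use"; with "nach" in the stop-word list, the two trigrams of that word are dropped.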

In the feature weighting in the third step 13, the frequency of the respective features can be used in absolute or normalized form and, in particular, can also be compared with their total frequency in the base set 1.
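One possible way to compare a feature's frequency in a document with its total frequency in the base set is to divide the two relative frequencies. The following sketch shows that assumed weighting scheme; it is one illustrative choice, not the only way to perform the comparison.

```python
from collections import Counter
from typing import Dict, List


def weight_features(doc_features: List[str], base_counts: Counter) -> Dict[str, float]:
    """Weight each feature of one document by its relative frequency in the
    document divided by its relative frequency in the whole base set.
    Values above 1 mean the feature is over-represented in this document."""
    doc_counts = Counter(doc_features)
    doc_total = sum(doc_counts.values())
    base_total = sum(base_counts.values())
    weights = {}
    for feature, count in doc_counts.items():
        doc_share = count / doc_total
        base_share = base_counts[feature] / base_total if base_total else 0.0
        # Features unknown in the base set get weight 0 (an arbitrary choice here).
        weights[feature] = doc_share / base_share if base_share else 0.0
    return weights
```

A feature such as "bolt" that makes up half of a document's features but only 10% of the base set receives a weight of about 5, whereas a common word like "the" stays below 1.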

According to a so-called "retrieval model", these weighted features can be analyzed and evaluated in the fourth step 14 by applying the text classification function γ.

The retrieval models commonly used in practice are the so-called "Boolean model", the so-called "vector space model" and the so-called "probabilistic model":

These are characterized as follows:

The Boolean model, also known as "keyword search", is based on searching text documents for the presence or absence of keywords. A search for a single word yields the set of documents containing this keyword. The search terms can be combined by the logical operators "AND", "OR" and "NOT". A ranking of the result set is not possible.
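The Boolean model can be sketched in a few lines of Python. Documents are reduced to sets of lower-case words (a simplifying assumption), and the result is an unordered set, which reflects that the model provides no ranking.

```python
from typing import Dict, Iterable, Set


def boolean_search(docs: Dict[str, str],
                   all_of: Iterable[str] = (),
                   any_of: Iterable[str] = (),
                   none_of: Iterable[str] = ()) -> Set[str]:
    """Unranked Boolean retrieval: AND over all_of, OR over any_of,
    NOT over none_of."""
    all_of, any_of, none_of = list(all_of), list(any_of), list(none_of)
    hits = set()
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        if not all(term in words for term in all_of):
            continue
        if any_of and not any(term in words for term in any_of):
            continue
        if any(term in words for term in none_of):
            continue
        hits.add(doc_id)
    return hits
```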

The vector space model is based on mapping both the search query and the documents of the result set as vectors in a high-dimensional space. The vector of the query is compared with the vector of each result. The more similar these vectors are, the more relevant the respective document of the result set is estimated to be for answering the query. This results in a ranking of the documents in the result set. To use the vector space model for classifying documents, a so-called "support vector machine" can be used, for example.
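A minimal sketch of the vector space model, assuming plain term-frequency vectors and cosine similarity as the similarity measure (weighting schemes and the support vector machine mentioned above are not shown). Unlike the Boolean model, the result is a ranked list.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def rank(query: str, docs: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank documents by similarity of their term vectors to the query vector."""
    q = Counter(query.lower().split())
    scored = [(doc_id, cosine(q, Counter(text.lower().split())))
              for doc_id, text in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```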

The probabilistic model is based on probability values. After all, the biggest challenge in text retrieval is the vagueness of language: there is no absolute certainty that a document is relevant to a query, so probabilities of relevance are calculated for the documents. Relevance is expressed as a similarity value, which depends on the frequency of the search terms in the document. The higher the calculated probability, the more relevant the document is estimated to be for the query. As a probability-based classification method, the so-called "naive Bayes classification method", for example, can be used.
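The naive Bayes method can be sketched as a multinomial classifier with add-one (Laplace) smoothing. The smoothing and the log-space computation are common choices assumed here; tokens that never occurred in training are ignored.

```python
import math
from collections import Counter
from typing import List


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""

    def fit(self, docs: List[List[str]], labels: List[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        # Log prior: share of training documents per category.
        self.priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens: List[str]) -> str:
        def log_prob(c: str) -> float:
            denom = self.totals[c] + len(self.vocab)
            return self.priors[c] + sum(
                math.log((self.counts[c][t] + 1) / denom)
                for t in tokens if t in self.vocab)
        # Category with the highest posterior probability.
        return max(self.classes, key=log_prob)
```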

According to their weighting, the patent documents can be assigned to a category in the fifth step 15 by applying the text classification function γ to their patent texts 2 in the fourth step 14. This category is in particular the first subset 4 described above and shown in Fig. 1.

For the relevant patent documents found in this way in the sixth step 16, the associated patent graphics 6 are compiled in the seventh step 17. In the eighth step 18, these patent graphics 6 are read in, for example as PNG files or in any other graphic format. The graphics of the textually irrelevant patent documents of the residual set 5 are not read in. This avoids importing graphics that are irrelevant in terms of content and thus leads to a lean, resource-saving process.

In the ninth step 19, features are extracted from the patent graphics by so-called "image mining".

Analogously to text analysis, image mining allows digital images, e.g. the patent graphics 6 of the patent documents of the first subset 4, to be searched in a targeted manner according to predetermined criteria. This is based on so-called "content-based image retrieval". A software algorithm analyzes the content of an image, such as colors, outlines and textures, from which features of the image can be extracted. As soon as the image information is available as feature vectors, the methods for feature selection and, in the tenth step 20, for feature weighting can be applied analogously to the text analysis described above.

By applying the image classification function ε in the eleventh step 21, the patent graphics can thus be assigned to different categories 8, 9 in the twelfth step 22. In particular, a binary classification takes place, i.e. the graphics are divided by a yes/no decision between two different categories, namely the second subset 8 and the associated further residual set 9. The second subset 8 then comprises the relevant patent documents, including the relevant patent images 23, which are suitable for manual combination, analysis and brainstorming by natural persons.

In order to be able to perform such selection methods, however, the text classification function γ and the image classification function ε must first be calculated.

Fig. 3 shows the procedure for training the text classifier 3 and the image classifier 7, i.e. for calculating the respective classification functions γ, ε.

To train the classifier, in this case the text classifier 3, documents that are relevant in terms of content are first selected manually; these are referred to below as training documents. These training documents should not be part of the base set 1, so as not to falsify the result by so-called "overtraining". For example, these training documents can first be searched for with keywords in a public patent database and then selected manually. In this way, for example, about 250-500 documents are used as relevant training documents. Furthermore, non-relevant training documents are selected in about the same number, i.e. likewise at least 100, preferably 250-500 documents. These manually selected relevant and non-relevant documents are used as training documents in the following.

In the first step 11' of the training of the text classifier 3, the texts belonging to these training documents are read into the computer and marked as relevant or not relevant. In the second step 12', features are extracted from these training documents analogously to the second step 12 of the selection phase described above. Here, the χ² method, known to the person skilled in the art and already described in detail above, can be used as a simplified approach to the probabilistic model, based on frequency values instead of probabilities. This also saves considerable computing power, for the following reason:

Because a large number of features would greatly slow down the process in both the training phase and the selection phase, it is advantageous to use only the most meaningful features, i.e. those features that most clearly distinguish the relevant from the irrelevant documents. In general, the χ² method determines the independence of two variables; in the present case, it determines the dependence of a feature on a category (here: relevant / not relevant). The χ² method is therefore particularly well suited to finding these most meaningful features.
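Reducing the feature set to the k most meaningful features can be sketched as follows. Each training document is assumed to be a set of already extracted features with a relevant/not-relevant mark; the counts A, B, C, D are derived from these marks and scored with the 2×2 chi-square statistic N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D)). The cut-off k and the alphabetical tie-break are illustrative choices.

```python
from typing import List, Set


def select_top_features(docs: List[Set[str]], relevant: List[bool], k: int) -> List[str]:
    """Keep the k features whose presence depends most strongly on the
    relevant / not-relevant marking (chi-square score)."""
    n = len(docs)
    n_rel = sum(relevant)
    scores = {}
    for m in set().union(*docs):
        a = sum(1 for doc, r in zip(docs, relevant) if r and m in doc)      # relevant, contains m
        b = sum(1 for doc, r in zip(docs, relevant) if not r and m in doc)  # not relevant, contains m
        c = n_rel - a                                                       # relevant, lacks m
        d = (n - n_rel) - b                                                 # not relevant, lacks m
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[m] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]
```

Note that χ² also rewards features that indicate non-relevance (such as "gear" in the test data below), since both kinds help to separate the two categories.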

The χ² method can be used for the feature extraction 12, 12′, 19, 19′ in both the training and the selection phase, for both text and image analysis. Furthermore, the χ² method can be used for the calculation 30, 31 of the classification functions ε, γ.

In the third step 13′ of the training phase, the feature weighting 13′ takes place analogously to the feature weighting 13 of the selection phase. In particular, the frequency of occurrence of particular identifiers can be evaluated and also set in relation to the total frequency of these identifiers. In the fourth step 14′, the text classification function γ is calculated. Here the relevance of the individual weighted features can be assessed, for example with the χ² method described above, in order to select the features most relevant to the content sought.
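The text does not fix the form of the text classification function γ. The following sketch makes two assumptions, purely for illustration. First, a feature's weight is its count in a document divided by its total count across the training documents, which is one reading of the weighting described above. Second, γ is a simple linear function derived from the centroids of the relevant and non-relevant training vectors, and a document counts as relevant when γ is positive. The centroid approach and all names are assumptions, not the classifier of the described method.

```python
from collections import Counter


def weight_features(tokens, features, total_freq):
    """Weight of each feature: occurrences in this document / total occurrences."""
    counts = Counter(tokens)
    return [counts[f] / total_freq[f] if total_freq[f] else 0.0 for f in features]


def train_gamma(token_lists, labels, features):
    """Return a linear scoring function gamma; gamma(tokens) > 0 means relevant."""
    total_freq = Counter()
    for tokens in token_lists:
        total_freq.update(tokens)
    vectors = [weight_features(t, features, total_freq) for t in token_lists]

    def centroid(flag):
        rows = [v for v, rel in zip(vectors, labels) if rel == flag]
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos, neg = centroid(True), centroid(False)
    w = [p - q for p, q in zip(pos, neg)]
    # Bias puts the decision boundary midway between the two centroids
    b = -sum(wi * (p + q) / 2 for wi, p, q in zip(w, pos, neg))

    def gamma(tokens):
        x = weight_features(tokens, features, total_freq)
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    return gamma
```

Restricting `features` to the terms kept by the χ² selection keeps the weight vector short, which is the saving in computing power the text refers to.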

The image classifier 7 is trained in essentially the same way as the text classifier 3 described above. A suitable number of more than 100, e.g. between 250 and 500, relevant graphic documents is selected manually as training documents according to their characteristics. These documents should preferably not belong to the patent documents of the basic set 1, and in particular not to the first subset 4, so that the overtraining mentioned above does not falsify the result.

In the first step 18′ of training the image classifier 7, these training documents are labelled as relevant or not relevant and read into the computer.

In the second step 19′, the features are extracted by means of the "image minimizing" mentioned above. In the third step 20′ they are weighted, e.g. using the χ² method, so that the image classification function ε can be calculated in the fourth step 21′.

As an alternative to training, the image classifier 7 can also be fixed and form part of the method. This makes sense in particular if the image classifier 7 refers exclusively to formal features, such as the complexity of the graphics. The classifier can then be learned once by training with training documents selected according to these formal criteria. Alternatively, a fixed value can be defined as the criterion, for example the ratio of the lines recognized in the graphic (e.g. their total length, or their area, e.g. number of pixels) to the overall dimension of the graphic (e.g. its diagonal, or its area, e.g. number of pixels). The complexity of the relevant patent graphics sought can then also be adjusted manually.
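The fixed complexity criterion can be sketched for a binarised graphic in which line pixels are 1 and background pixels are 0. Complexity is then the number of line pixels divided by the total number of pixels. A fixed ε accepts graphics whose complexity lies within a manually adjustable interval. The pixel-grid representation, the interval bounds and the function names are assumptions made for this example. The text equally allows other measures, such as total line length relative to the diagonal.

```python
def complexity(image):
    """Ratio of line pixels (truthy values) to all pixels of a binary image."""
    total = sum(len(row) for row in image)
    ink = sum(1 for row in image for px in row if px)
    return ink / total if total else 0.0


def make_fixed_epsilon(lo, hi):
    """Hypothetical fixed image classifier: relevant if lo <= complexity <= hi."""
    def epsilon(image):
        return lo <= complexity(image) <= hi
    return epsilon
```

Because `lo` and `hi` are plain parameters, the participants can tighten or widen the accepted complexity range without retraining anything.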

FIG. 4 exemplifies a possible overall sequence of a method for brainstorming.

In point I.), the basic set 1 of at least 500 patent documents with associated patent texts 2 in a specific language is compiled.

In point II.), training documents are selected manually, independently of this, according to the topic of the technical problem. The training documents should not be included in the basic set 1.

In point III.), a text classifier 3, in particular a binary one, is trained with the training documents by means of automatic feature extraction from the training documents.

In point IV.), the text classifier 3 determined in this way is applied to the patent texts 2 of the basic set 1. This generates the first subset 4.

In point V.), the patent graphics 6 belonging to the patent documents of the first subset 4 are compiled.

In point VI.), suitable patent graphics are selected manually as training documents for the image classifier 7. The associated patent documents preferably do not belong to the basic set 1, and in particular not to the first subset 4. The patent graphics can be relevant or irrelevant and are labelled accordingly for training the image classifier 7.

In point VII.), an image classifier 7 is trained from these training documents by means of the automatic feature extraction 19′. Alternatively, a predefined image classifier 7 can be used. Such a classifier uses a formal and therefore cross-subject criterion as its selection criterion, for example the complexity of the images.

In point VIII.), the image classifier 7 is applied to the patent graphics 6 of the first subset 4 so as to produce a second subset 8.

In point IX.), the patent graphics of the patent documents of the second subset 8 are output.

In the optional point X.), technical solutions contained in the patent graphics of the second subset 8 can be combined with one another. This point is not necessary for carrying out the method.

In point XI.), the patent graphics, or the combinations of solution modules from the optional point X.), are output, for example as images. This can be done by hanging up printouts visibly in a gallery, or by graphical display using a computer program, for example via a network or the like.

In point XII.), the combinations are evaluated manually by the natural persons, i.e. the participants in the brainstorming process.
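Points IV.) to VIII.), together with the optional pairing of point X.), form a cascade: only documents passed by the text stage reach the image stage. The sketch below makes several assumptions. γ and ε are given as Boolean predicates. Patent texts and graphics are keyed by a document ID, and graphics are represented by opaque identifiers. Pairing only graphics from different documents in point X.) is an illustrative choice, not something the text prescribes.

```python
from itertools import combinations


def two_stage_selection(texts, graphics, gamma, epsilon):
    """texts: doc_id -> patent text; graphics: doc_id -> list of graphics.
    Returns (first subset 4, second subset 8)."""
    # Stage 1 (point IV): text classification on the basic set -> first subset 4
    first_subset = [doc for doc, text in texts.items() if gamma(text)]
    # Point V: compile the patent graphics of the first subset only
    candidates = [(doc, g) for doc in first_subset for g in graphics.get(doc, [])]
    # Stage 2 (point VIII): image classification -> second subset 8
    second_subset = [(doc, g) for doc, g in candidates if epsilon(g)]
    return first_subset, second_subset


def pairwise_combinations(second_subset):
    """Optional point X: pair graphics that come from different documents."""
    return [(x, y) for x, y in combinations(second_subset, 2) if x[0] != y[0]]
```

Running the image stage only on the first subset 4 is what makes the cascade cheaper than classifying every graphic of the basic set. The final set is the same as linking both stages with a Boolean AND.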

Fig. 5 illustrates the difference between selection by means of the section-specific text classification functions γ₁, γ₂, γ₃ shown in Fig. 5a and selection by means of the non-section-specific text classification function γ shown in Fig. 5b.

In Fig. 5a, the patent texts of the individual IPC sections S1, S2, S3 are each selected via an associated text classifier 3′, 3″, 3‴ with a corresponding text classification function γ₁, γ₂, γ₃.

Subsets 4′, 4″, 4‴ are then generated from the relevant patent documents determined in each case. These subsets 4′, 4″, 4‴ can then be combined again into the first subset 4. The non-relevant patent documents are assigned to the associated residual sets 5′, 5″, 5‴, which are no longer considered in the method.

FIG. 5b shows how the patent documents of the three different IPC sections S1, S2, S3 are selected via a single common text classifier 3 for generating the first subset 4.

According to the test results available so far, selection via a single common text classifier 3 works faster in practice than the much more cumbersome section-specific selection by multiple text classifiers 3′, 3″, 3‴, without significant loss of quality.
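The two variants of Fig. 5 differ only in which classification function is applied to each section. In the sketch below, the common classifier of Fig. 5b is expressed as the special case in which every section uses the same γ. The section names, the dict-based data layout and the pooling of the residual sets 5′, 5″, 5‴ into one set are assumptions for illustration.

```python
def select_section_specific(texts_by_section, classifiers):
    """Fig. 5a: one predicate per section. Returns (first subset, residual set),
    where the first subset is the union of the per-section subsets."""
    first_subset, residual = set(), set()
    for section, texts in texts_by_section.items():
        gamma_s = classifiers[section]
        for doc, text in texts.items():
            (first_subset if gamma_s(text) else residual).add(doc)
    return first_subset, residual


def select_common(texts_by_section, gamma):
    """Fig. 5b: the same text classification function for every section."""
    return select_section_specific(texts_by_section, {s: gamma for s in texts_by_section})
```

In the common variant only one classifier has to be trained and maintained, which is consistent with the speed advantage reported above.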

LIST OF REFERENCE NUMBERS

1 basic set of patent documents

2 Patent texts of the patent documents of the basic set

3, 3′, 3″, 3‴ text classifier, section-specific text classifiers

4 first subset

4′, 4″, 4‴ subsets of the first subset

5, 5′, 5″, 5‴ residual set(s)

6 Patent graphics of the patent documents to be analyzed

7 image classifier

8 second subset

9 further residual set

11 Reading the patent texts of the basic set into the computer

11′ Reading the training documents into the computer

12, 12′ Feature extraction (text)

13, 13′ Feature weighting (text)

14 Apply the text classification function

14 ^' Calculation of the text classification function

15 Allocation of relevant patent documents to the first subset

16 relevant patent texts

17 Compilation of relevant patent graphics

18 Reading in the patent graphics to be examined

18′ Reading in the patent graphics of the training documents

19, 19′ Feature extraction (graphic)

20, 20′ Feature weighting (graphic)

21 Apply the image classification function

21′ Calculation of the image classification function

22 Assignment of relevant patent graphics to the second subset

23 relevant patent graphics

γ, γ₁, γ₂, γ₃ text classification function, section-specific text classification functions

ε image classification function

S1, S2, S3 Patent texts of individual patent classification sections (e.g. IPC sections, CPC sections)

I.) - XII.) Process steps of a possible overall sequence of a method for brainstorming

Claims

1. Method for brainstorming, in particular for developing and improving products in the field of technical construction, comprising the following steps:

a.) automatic analysis of patent documents comprising patent texts (2) and patent graphics (6) by a computer;

b.) automatic selection of relevant patent documents by the computer;

c.) display of the patent graphics (23) of the selected patent documents;

d.) consideration and combination, by natural persons, of features contained in the displayed patent graphics in order to generate new ideas;

characterized in that

for selecting the patent documents in method step b), a text classification with a text classifier (3) comprising a text classification function (γ) is carried out in a first stage, and an image classification with an image classifier (7) comprising an image classification function (ε) is carried out in a second stage.

2. A method according to claim 1, characterized in that the results of the two stages are linked with the Boolean operator "AND".

3. A method according to claim 1, characterized in that one of the two stages is run through first, and only the patent documents selected thereby then pass through the other of the two stages.

4. A method according to claim 3, characterized in that patent documents belonging to a basic set (1) first pass through the first stage, a part of the patent documents being selected as belonging to a first subset (4), and only the patent documents of the first subset (4) then pass through the second stage.

5. Method according to one of the preceding claims, characterized in that the text classification function (γ) is determined by training the text classifier (3) with manually selected training documents.

6. Method according to claim 5, characterized in that the training documents are selected manually beforehand on the basis of a search criterion relevant to the respective brainstorming session, in particular a technical function, and are each labelled as relevant or not relevant.

7. Method according to one of the preceding claims, characterized in that the image classification uses the complexity of the patent graphics (6) as a selection criterion.

8. Method according to claim 7, characterized in that the image classification function (ε) is predefined.

9. Method according to one of the preceding claims, characterized in that the image classification function (ε) is determined by training the image classifier (7) with manually selected training documents.

10. The method according to any one of the preceding claims, characterized in that, in the text classification, patent documents from different sections (S1, S2, S3) are selected with the same text classification function (γ).