WO1982000442A1

WO1982000442A1 - Ideographic word selection system

Info

Publication number: WO1982000442A1
Application number: PCT/US1981/001017
Authority: WO
Inventors: R Johnson
Original assignee: R Johnson
Priority date: 1980-08-01
Filing date: 1981-07-30
Publication date: 1982-02-18
Also published as: JPS57501254A

Abstract

An ideographic word selection system distinguishes between the homonyms of a language as the operator inputs (10) the phonetic spelling of the desired character or word along with one or more related words for that character as necessary for unique selections from among homonyms. An electronically retrievable dictionary (14) includes each ideographic character (16, 17) to be used in the system along with several related words, thus providing for system flexibility for operators of different backgrounds and mnemonic preferences. A comparison of the operator input (10) with the dictionary (14) provides a unique word selection.

Description

IDEOGRAPHIC WORD SELECTION SYSTEM

The present invention is directed to an ideographic word selection system, and specifically to a character processor which can rapidly enter Chinese, Japanese or Korean characters into a computer system, for example for printing purposes, the foregoing being done from a keyboard having a limited number of keys.

Word or character processing for Oriental languages such as Japanese, Chinese, Korean, etc., has been difficult because of the structure of the written language; that is, there is no limited alphabet, rather thousands of different ideographic words and characters. Other languages, such as Arabic or Farsi, have a written alphabet but also have numerous different ways of writing each letter; the resulting written language is difficult to process using a keyboard for entry because of the number of different characters which could be used. Moreover, if the pronunciation of a character or a word is used to access that character, a large set of homonyms will be produced because of the similar pronunciations of other characters or words. If these are, for example, displayed on a cathode ray tube, it would still be very slow and cumbersome to make specific selections from those visible. Thus, all Chinese and Japanese and other similar character systems using a phonetic character system must have a two-stage process: the entering of the symbols, whether Roman or not, for the pronunciation of the character, and the further selection from among homonyms. The second stage selection is the homonym problem. All devices up to the present have proposed solutions of the homonym problem which are clumsy and slow.

For example, U.S. patent 4,096,934 granted January 27, 1978 to Kirmser et al. discloses a method and apparatus for reproducing desired ideographs, where a phonetic spelling of the desired ideograph, along with a characteristic identification of this desired ideograph, is used by a computer to identify the desired ideograph. Such characteristic identification, as illustrated by several pages of tables, is based on an operator making a judgment about the geometric shape of the character. This is a slow procedure and one which is liable to errors when operators try to increase their speed. A suggested alternative of Kirmser, which is just mentioned briefly as opposed to the geometry method described at length, is to use the suggested meaning of the character described. Besides being slow and inefficient, as illustrated by the large table of Kirmser, if the operator does not key in exactly the right geometric description, the method will fail. In other words, it is inflexible and there is no allowance for the necessarily varying backgrounds of different operators.

Thus, it is an object of the present invention to provide an improved apparatus and method of ideographic word selection.

In accordance with the above object, there is provided an ideographic word selection system for selecting a desired word of a language having a relatively unlimited number of ideographic words by use of a keyboard having a limited number of keys. This system comprises a method for electronically storing information representing the ideographic words of a selected number of words of a language, where each ideographic word has associated with it its phonetic spelling, as well as the phonetic spelling of several words related in one or more ways to the ideographic word and the computerized graphic representation of the word. Other information, such as English equivalent words, may also be stored with the word in electronic storage. The method includes the use of the above-mentioned keyboard for inputting information, such as the phonetic spelling of a desired ideographic word, and also inputting the phonetic spelling of at least one of the words related to the ideographic word. The stored and inputted information are then compared to select the desired ideographic word. Alternatively, the selection of one word from among homonyms can be used to preselect the same word on subsequent entry of the pronunciation alone. Thus repeated access to the same word can be secured without the need for repeated entry of a related word following entry of the pronunciation. This optional method is especially suited to the data entry of specific text, in which repeated words are very likely and for which this method of entry is especially advantageous. Use of a word or character having the same pronunciation as a previous different word or character would then require only that the related word be entered after the pronunciation.
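The optional preselection behaviour described above can be sketched as follows. This is a minimal illustration in Python and not an implementation disclosed in the application; the class name `Preselector`, the identifiers `char_A` and `char_B`, and the sample pronunciations and related words are all assumptions made for the example.

```python
# Hypothetical stored data: (pronunciation, related word) -> character id.
DICTIONARY = {
    ("ka", "ya"): "char_A",
    ("ka", "hana"): "char_B",
}


class Preselector:
    """Remembers the last character chosen for each pronunciation (assumed design)."""

    def __init__(self, dictionary):
        self.dictionary = dictionary
        self.last_choice = {}  # pronunciation -> most recently selected character

    def select(self, pronunciation, related=None):
        if related is None:
            # Pronunciation alone: reuse the previous selection, if there is one.
            return self.last_choice.get(pronunciation)
        character = self.dictionary.get((pronunciation, related))
        if character is not None:
            # A successful full entry becomes the new preselected character.
            self.last_choice[pronunciation] = character
        return character
```

In this sketch, entering "ka ya" once lets a later entry of "ka" alone return the same character, while entering "ka hana" switches the preselection to the other homonym.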

From a method standpoint, there is provided a method of selecting a desired word by use of a keyboard, from an electronically stored list of a selected number of ideographic words of a language. This is done by the following steps: 1) storing in association with each ideographic word the phonetic spelling of the ideographic word, along with the phonetic spelling of several words related to the ideographic word; 2) inputting, by the use of the keyboard, the phonetic spelling of a desired ideographic word and also the phonetic spelling of at least one word related to the desired ideographic word; and 3) comparing the stored and inputted information to uniquely select the desired ideographic word.
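The three steps of the method (store, input, compare) can be pictured with the short Python sketch below. It is an illustration only: the list `STORED`, the function `compare`, and the placeholder entries `W1` and `W2` are assumptions and are not taken from the figures.

```python
# Step 1: store each word with its phonetic spelling and related words.
# The entries below are illustrative placeholders, not data from the figures.
STORED = [
    {"word": "W1", "phonetic": "ka", "related": {"ya", "ju"}},
    {"word": "W2", "phonetic": "ka", "related": {"hana"}},
]


def compare(stored, phonetic, related_word):
    """Step 3: return every stored word matching both parts of the input."""
    return [
        entry["word"]
        for entry in stored
        if entry["phonetic"] == phonetic and related_word in entry["related"]
    ]


# Step 2: the operator keys in the pronunciation and one related word.
matches = compare(STORED, "ka", "ya")
print(matches)
```

A result list of length one corresponds to the unique selection of step 3; an empty or longer list corresponds to the cases handled by further entry or by display.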

From another apparatus standpoint, there is provided a word selection system which includes a table of electronically retrievable data. This table includes data representing the ideographic words of a large number of desired words from a language, data representing the phonetic spelling of each ideographic word, and data representing the phonetic spelling of several words related to each ideographic word.

Figure 1 is a block diagram embodying the system of the present invention;

Figure 2 illustrates a table of electronically retrievable data which is used in the memory of the present invention;

Figure 3 is a drawing in tabular form, illustrating an example of how characters would be selected in a Japanese character processor; and

Figure 4 is a drawing in tabular form, illustrating how ideographic characters would be selected in Chinese.

The present invention is built around a linguistic and mnemonic feature of the relations between a spoken and a written language; there is often more than one way to pronounce a character, and it is easy for anyone who knows a language to think of synonyms or other related words when presented with a character. In the present application, "word" and "character" are used interchangeably.

Very briefly, in the present invention, an operator first enters on a keyboard the phonetic representation (Roman or kana) of a character, and then the phonetic representation of at least one other word - either a second pronunciation of the character, or a related word of some kind. The phrase "related word," which is used for the second entry, is defined as a word which an ordinary person skilled in the use of a language would think of when encountering the ideographic character. Although most people would think of a variety of different responses, the total number of different responses to a request to think of a related word would be at least finite, and probably small.

The present invention makes use of the mnemonic device of asking the operator to enter a second word related to the first, or an alternative pronunciation, which will vary according to the operator and may vary on different occasions with the same operator. The system then responds by accessing the desired character, distinguishing from among homonyms by having already provided a data base containing all of the common related words from which an operator might be expected to choose, or which might occur to an operator to enter.

The crucial point of linguistics making this device possible is that people who share a language also share associations with the words used in that language. Thus, Japanese are likely to be able to think of no more than three to six related words for a given character when presented with that character. Thus, a data base in which a given character was arrayed with the six most common related words makes it possible to sort out from among the homonyms the desired character when the second word is entered. Again, the entry of a phonetic representation of a character (the first word) does not uniquely select from among homonyms. But the array which is constructed in electronic memory (which will be described below) makes a selection from among homonyms quickly and easily. Moreover, if there is more than one of the homonyms having a given related word, then the desired word will be accessed uniquely by either: 1) entering a second related word, or 2) picking visually from a cathode ray tube display which of the homonyms displayed is the desired character to be entered. This case of more than one of the homonyms having the same related word is anticipated to be relatively rare. It would not materially lower the average speed at which data are entered.
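The two ways of resolving a shared related word (a second related word, or a visual pick from a numbered display) can be sketched in Python as below. This is a hedged illustration: the function `resolve`, the list `ENTRIES`, and the single-letter placeholder words and related words are assumptions, and the numbered menu merely stands in for the cathode ray tube display.

```python
# Placeholder homonyms of "bi"; words A and B share the related word "x".
ENTRIES = [
    {"word": "A", "phonetic": "bi", "related": {"x", "y"}},
    {"word": "B", "phonetic": "bi", "related": {"x", "z"}},
    {"word": "C", "phonetic": "bi", "related": {"w"}},
]


def resolve(entries, pronunciation, related_words):
    """Narrow the homonyms with each related word in turn.

    Returns (word, None) on a unique match, or (None, menu) where menu is a
    numbered list of the remaining candidates for the operator to pick from.
    """
    candidates = [e for e in entries if e["phonetic"] == pronunciation]
    for related in related_words:
        candidates = [e for e in candidates if related in e["related"]]
    if len(candidates) == 1:
        return candidates[0]["word"], None
    menu = list(enumerate((e["word"] for e in candidates), start=1))
    return None, menu
```

With these placeholder entries, `resolve(ENTRIES, "bi", ["x"])` leaves two candidates and returns a numbered menu, while adding a second related word, as in `resolve(ENTRIES, "bi", ["x", "z"])`, yields a unique selection.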

The following example in relation to English illustrates the way in which homonyms might cause difficulty in the trio of words To, Too, and Two. A phonetic rendering of each would be identical to the other; one could use (TU) for any of the three. Regardless of the context of the word in text, the desired specific word can be selected by any one of several compound words. The use of (TU/NUMBER), (TU/COUNT), (TU/DOUBLE), (TU/TWICE), (TU/TWIN) would produce from the "dictionary" the same desired word: two. It does not matter which of these the operator enters as the compound word. The "dictionary" (which would be electronic in this situation) contains all of the related words to the word "two". Moreover, none of them are associated with the word "to" or "too". In other words, in the electronically stored dictionary, in conjunction with the phonetic spelling of each of the ideographic words, there are several related words.
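The English example can be worked through with a few lines of Python. The sketch is illustrative only: the names `TU_HOMONYMS` and `select_tu` are assumptions, and "to" and "too" are given empty sets simply because the text lists no related words for them.

```python
# Related words for "two" are those named in the text. "to" and "too" get
# empty sets here because the text does not list related words for them.
TU_HOMONYMS = {
    "two": {"NUMBER", "COUNT", "DOUBLE", "TWICE", "TWIN"},
    "to": set(),
    "too": set(),
}


def select_tu(related):
    """Return the homonyms of (TU) that carry the given related word."""
    return [word for word, words in TU_HOMONYMS.items() if related in words]


for related in ("NUMBER", "COUNT", "DOUBLE", "TWICE", "TWIN"):
    print("TU/" + related, "->", select_tu(related))
```

Each of the five compound entries selects only "two", which is the flexibility the example is meant to show.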

This obviously implies that the operator has much flexibility in that there is not a single unique solution as in the above Kirmser patent. Moreover, it allows for varying backgrounds of operators. If some rare example occurred where a desired word/character had the same pronunciation for both parts of its compound word as another character, two means of resolution are possible: (1) Yet another compound word can be entered. The possibility of a triple homonym (that is, two characters with identically pronounced synonyms having another set of identically pronounced synonyms) is very remote. (2) All possible characters to which the duplicate compound words refer will be displayed on the screen of the cathode ray tube, and numbered, and the operator is enabled to select one of them by a number.

Thus, as used above, a "compound word" is composed of (a) the phonetic spelling in Roman or other phonetic symbols, such as kana, of the desired character, and (b) the similar phonetic spelling of a synonym or related word.

Figure 1 illustrates the block diagram of the system and embodies the present invention as it might be applied to a Chinese or Japanese language character processor. A keyboard 10 contains a limited alphabet or a limited number of keys of either Roman or katakana. That is, in the case of Japanese, it would be katakana, and in the case of Chinese, Pin-yin Roman. Associated with the keyboard is a cathode ray tube (CRT) display screen 11, and a hard copy printer 12. A computer and storage device 13 interrelates and controls all of the units of the system, which has as a last unit a key element, which is a dictionary which is related to the language being processed. Details of the dictionary are shown in Figure 2 and further explained in Figures 3 and 4.

In general operation, an operator who is skilled in the particular language which is being used types in a phonetic spelling in Roman or katakana of a desired ideographic word. The operator next inputs the phonetic spelling of at least one word related to the desired ideographic word. Thereupon, this compound word input is compared to dictionary words, and printer 12 will type that particular word. Alternatively, it is stored for future use. The CRT display screen 11 may be used where there is not a unique solution, and where perhaps an additional related word must be entered, or for further instructions to the operator.

The control details of such system, except for the unique dictionary, are of course illustrated in the Kirmser patent. In addition, there are commercially available on the market CRT display units associated with printers for at least printing and displaying Japanese ideographic characters known as Kanji characters.

As explained above, the construction of the compound word/character dictionary is a unique feature of the present invention. Such dictionary is shown in Figure 2. As illustrated in Figure 2 in column numbered 16 of the dictionary, each ideographic character stored in the dictionary is given a sequential character number. These are associated with a character dot matrix graphics data set 17 containing sufficient binary control words to print or display the ideographic character. Next, there are two columns, 18 and 19, which contain two different possible phonetic spellings of the associated characters. One is designated "On-Yomi" and the other "Kun-Yomi". This is because the Japanese language features a double system of usual pronunciation: the character may be pronounced according to a Chinese fashion (the On-Yomi) or the Japanese manner (Kun-Yomi). The character which is pronounced in the Chinese manner will have homonyms very different from those it would have in the Japanese style of pronunciation. The present invention thus allows either pronunciation to be used for the first word entered, and the remaining pronunciation may be used as though it were a related word. Or, the operator may choose one of the pronunciations and then use a related word. Finally, in columns 20A through 20F, there are entered phonetic spellings of up to six related words.
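One way to picture a row of the Figure 2 table is as a small record type. The Python sketch below is an assumption-laden illustration, not the stored format of the application: the class name `DictionaryEntry`, its field names, the `matches` method, and the assignment of On-Yomi and Kun-Yomi to columns 18 and 19 in that order are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryEntry:
    """One row of the Figure 2 table (field names are assumptions)."""

    number: int    # column 16: sequential character number
    graphics: bytes  # column 17: dot matrix data used to print or display
    on_yomi: str   # columns 18/19: the two pronunciations; which column
    kun_yomi: str  # holds which is assumed here for illustration
    related: list = field(default_factory=list)  # columns 20A through 20F

    def __post_init__(self):
        if len(self.related) > 6:
            raise ValueError("columns 20A-20F hold at most six related words")

    def matches(self, first, second):
        """True if `first` is one pronunciation and `second` is either the
        other pronunciation or one of the related words."""
        readings = {self.on_yomi, self.kun_yomi}
        if first not in readings:
            return False
        return second in (readings - {first}) or second in self.related
```

The `matches` method reflects the rule that either pronunciation may be entered first and the remaining pronunciation may then be treated as though it were a related word.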

The system is structured so that the keyboard entry of the first part of the compound word/character accesses identical phonetic spellings, in either the Chinese or the Japanese style of pronunciation, of the desired Kanji character (in the case of a Japanese processor). If, for example, the first half of the compound word/character is an On-Yomi, a unique selection can be accomplished by entering the alternate spelling in Kun-Yomi or one of the associated related words. As discussed above, in either case, a unique solution is provided, and a word will be selected.

When a character has been selected through identity between the keyboard entry and the two elements in the dictionary through a matrix of On-Yomi, Kun-Yomi and related words/synonyms expressed in phonetic characters, the number of this character from column 16 is transferred to memory or to the printer directly. Thereafter, these character numbers can access the printer repertoire for printing purposes or, by use of column 17, the character can be recalled to the CRT 11. Thus, after the ideographic character is selected, it may easily be manipulated by the use of the character number column 16.

In general, the dictionary is constructed so that additional entries of characters, their numbers, graphics, On-Yomi and Kun-Yomi, and related words and synonyms can be made, limited only by the extent of available memory.

The foregoing is generally applicable to the Japanese language version of this character processor. The Chinese version, and also the Korean, depend on a Roman character keyboard and Roman phonetic system for operation. The Chinese dialects most in use are Mandarin and Cantonese. Thus, separate versions of the dictionary would have to be prepared for each dialect. Also, there are at least two competing systems of Romanization in Chinese. These are the Wade-Giles and the Pin-yin; separate versions of the dictionary of Figure 2 would have to be prepared for each of these. The Chinese systems then would be expressed as Mandarin-Wade-Giles, Cantonese-Wade-Giles, Mandarin-Pin-yin and Cantonese-Pin-yin. For the Japanese version, only one standard pronunciation exists as a dominant dialect, and the keyboard can either be Kana or Romaji. In the Japanese version, the operator must be able to translate a character into its pronunciation. Most Japanese would read a character in either its On-Yomi or Kun-Yomi pronunciation; a word would be entered with its customary pronunciation. A related word would come quickly to mind in either case and could be entered through the keyboard as its Kana or Romaji pronunciation.

Thus the advantage of the present system is that a context is provided for a character independent of the text in which the character is used. The context (the second half of the compound word/character - the related word) enables machine selection of a unique character in almost every instance. And, when it does not, either the character is not in the repertoire of the dictionary and must be added, or all possible characters from the dictionary are displayed on the CRT for further selection by the operator. This last will be very rare.

The table of Figure 3 illustrates a dictionary in the form of Figure 2 for a Japanese character processor. Twenty characters are listed. The English translation is listed for each. As discussed above, in Japanese a character may be pronounced according to the Chinese fashion (the On-Yomi) or the Japanese manner (the Kun-Yomi). Thus, a character which is pronounced in the Chinese manner will have homonyms very different from those it would have in the Japanese style of pronunciation. The present invention allows either pronunciation to be used for the first word entered, and the remaining pronunciation may be used as though it were a related word. Or the operator may choose one of the related pronunciations and then use a related word. An example of the many different paths used to access the same character is given for characters numbered 5 and 6, namely "KA" and "KE".

These are identical characters having two different Chinese and Japanese style pronunciations. The following key-stroke entry sequences each uniquely access the desired character (either character 5 or 6 from Figure 3, since they are identical):

1. ka ya

2. ke ya

3. ka ke

4. ke ka

5. ka ju

6. ke ju

7. ka kyo

8. ke kyo

9. ka ken

10. ke ken

In the actual invention, the possibilities would be much greater, since up to six related words would be used. Thus an operator is given great flexibility, and may adapt individual mnemonic preferences to the selection process quickly and easily regardless of which related word comes to mind.
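The ten entry sequences listed above follow from two pronunciations and four related words, and the count can be checked with a short Python sketch. The variable names are assumptions, and the related words are simply read off the listed sequences rather than taken from Figure 3.

```python
from itertools import permutations

# Pronunciations and related words read off the ten sequences listed above.
pronunciations = ["ka", "ke"]
related_words = ["ya", "ju", "kyo", "ken"]

sequences = []
# The other pronunciation can serve as the second entry...
sequences += list(permutations(pronunciations, 2))
# ...or any related word can follow either pronunciation.
sequences += [(p, r) for p in pronunciations for r in related_words]

for first, second in sequences:
    print(first, second)
print(len(sequences), "sequences")
```

By the same construction, six related words would give 2 + 2 * 6 = 14 sequences for a character with two pronunciations.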

If a double homonym occurs, this is handled as discussed above, by either display of the few remaining homonyms on the CRT or entry of yet another related word. The visual check here helps to insure against error, as it would in the entry of uniquely selected characters.

Referring now to the table of Figure 4, this illustrates a Chinese character processor, where the initial ideographic shape of the character is illustrated in 1 - 20. A new Latin Romanized phonetic spelling (Pin-Yin) in the Mandarin dialect is illustrated, along with three related words, and an English translation. The characters themselves are stored as numbers as discussed above, specifically 1-20 in this simplified format, which can be routed to a printer and to the associated dot matrix graphics, for display on the CRT. For example, the specific selection of a Chinese character by the present invention is as follows:

1. The operator enters the character "bi" from a standard keyboard having Roman letters. (The keyboard is standard and represents the state of the art in computer peripheral manufacture. A Japanese user would use either Roman or katakana keyboards.)

At this point, the operator has only accessed the set of homonyms which are all pronounced "bi". No unique character selection has taken place. This particular table shows four homonyms (9, 10, 11, and 12) having "bi" for a pronunciation. Actually, there are many more homonyms in the "bi" group.

2. The operator enters a space from the space bar.

3. The operator enters "maobi" and uniquely selects the character #9 from the table used here as illustration.

The operator has performed a total of eight keystrokes, rapidly entering the pronunciations of both the desired character/word and a related word. Note that the related word from the table is a compound word containing "bi." Most of the related words in the table will contain elements similar to the word/character to be selected, and greater speed in typing can be secured by using a memory key to contain the pronunciation of the word/character being accessed and then releasing it in a single keystroke.
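The keystroke count, and the saving offered by a memory key, can be checked with the Python sketch below. The function name `keystrokes` and its `memory_key` parameter are assumptions, as is the simplification that the memory key replaces one occurrence of the pronunciation inside the related word with a single key press.

```python
def keystrokes(pronunciation, related, memory_key=False):
    """Count key presses for 'pronunciation SPACE related'.

    With memory_key=True, one occurrence of the pronunciation inside the
    related word is assumed to be produced by a single memory-key press.
    """
    count = len(pronunciation) + 1  # pronunciation plus the space bar
    if memory_key and pronunciation in related:
        # Remaining letters of the related word, plus one memory-key press.
        count += len(related) - len(pronunciation) + 1
    else:
        count += len(related)
    return count


print(keystrokes("bi", "maobi"))                   # two letters, space, five letters
print(keystrokes("bi", "maobi", memory_key=True))  # memory key replaces "bi"
```

Under these assumptions the entry "bi", space, "maobi" takes eight key presses without the memory key and seven with it.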

Another example is:

1. The operator enters "feng" and enters a space.

2. The operator enters "genghuang" and has accessed the Chinese character whose English translation is "Phoenix."

All of the entered words are from among the approximately 480 unique pronunciations in Chinese. Thus, since entry is limited to these words, operators will become very proficient in a short time.

In the case of Japanese, Chinese or in fact other ideographic character memories, provision can be made for an operator to add a related word if it is not already in the array, once the desired character has been located among its homonyms. The operator could also, by specifying the character through pronunciation and related words, use the graphics capability of an associated computer terminal to draw the character and add it to the repertoire of the system. Thus, the present invention supplies an improved ideographic word selection system.

Claims

WHAT IS CLAIMED IS:

1. An ideographic word selection system for selecting a desired word of a language having a relatively unlimited number of ideographic words by use of a keyboard having a limited number of keys comprising:

means for electronically storing information representing the ideographic words of a selected number of said words of said language, each of said ideographic words having associated with it the phonetic spelling of the ideographic word along with the phonetic spelling of a plurality of words related to the ideographic word;

means, including said keyboard, for inputting information into such system which is the phonetic spelling of a desired ideographic word and also inputting information which is the phonetic spelling of at least one of said words related to the desired ideographic word, as necessary for unique selection;

and means for comparing said stored and inputted information to select said desired ideographic word.

2. A method of selecting a desired word, by use of a keyboard, from an electronically stored list of a selected number of ideographic words of a language comprising the following steps:

storing in association with each of said ideographic words the phonetic spelling of the ideographic word along with the phonetic spelling of a plurality of words related to said ideographic word;

inputting, by use of said keyboard, information which is the phonetic spelling of a desired ideographic word and additional information which is the phonetic spelling of at least one related word to the desired ideographic word;

and comparing said stored and inputted information to uniquely select said desired ideographic word.

3. Apparatus for use with an ideographic word selection system comprising: a table of electronically retrievable data including;

data representing the ideographic words of a large number of desired words from a language and including;

data representing the phonetic spelling of each ideographic word and including;

data representing the spelling of a plurality of words related to each ideographic word.

4. A system as in Claim 1 where said stored information includes alternate phonetic spellings for certain ideographic words and said inputted information may include said alternate spellings and said comparing means may utilize only this information to select the desired ideographic word.

5. A system as in claim 4 where said language is Japanese and said alternate phonetic spellings are On-yomi and Kun-yomi, where comparing is not confined only to such pronunciations.

6. A system as in claim 1 when said language is Japanese, said ideographic words are Kanji, and on said keyboard is the Katakana alphabet and other kana alphabets.

7. A system as in claim 1 where said language is Chinese and on said keyboard are Roman characters suitable for Pin-yin.

8. A system as in claim 1 including a cathode ray tube (CRT) for displaying selected words where said comparison does not produce a unique solution.

9. A system as in claim 1 where said means for inputting information allows another word related to a desired ideographic word to be inputted if said comparing means does not produce a unique solution.

10. A system as in Claim 1 including means for further saving keystrokes by allowing a word/character to be accessed by pronunciation alone if the immediate previous selection accomplished by that pronunciation or pronunciation/related word produced the word/character now desired again.