US20060217956A1 - Translation processing method, document translation device, and programs - Google Patents
- Publication number
- US20060217956A1 (US 11/197,508)
- Authority
- US
- United States
- Prior art keywords
- annotation
- translation
- document
- type
- dictionary
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/169—Annotation, e.g. comment data or footnotes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
Abstract
A translation processing method is provided, comprising: registering a type of annotation with a corresponding translation rule in a table; identifying a text to be processed; extracting a type of annotation and character information from the text identified in the identifying step; identifying a text element to which the annotation extracted in the extracting step is to be added; determining a translation rule corresponding to the type of annotation by referring to the table; and translating the text element identified in the annotation identifying step by applying the translation rule determined in the translation rule determining step.
Description
- 1. Field of the Invention
- The present invention relates to a method for improving the quality of a machine translation.
- 2. Description of the Related Art
- As a result of significant advances in global electronic communication, demand for machine translation from one language to another is increasing. A machine translation is performed by using a computer to replace characters (words) with other characters (words), by analyzing the characters and applying dictionary data or a predetermined algorithm to thereby translate from a specific language to a different language. If a text is not stored in a computer-readable format, in other words, if character information is not included in the text, it is necessary, prior to the translation process, to perform an OCR process in which a printed text is read by a scanner device, a character recognition process is performed, and character information is extracted.
- One advantage of machine translation is that it is possible to translate a large volume of documents in a short time; a disadvantage is that the quality of the translated document is usually of a relatively low standard. One reason for this disadvantage is that the machine translation process uses rules such as dictionary data or algorithms, and these rules are not flexibly adaptable to the type of document to be translated, for example, a business document or a technical document. As a result, some of the translated words do not convey the original meaning. Therefore, to improve the quality of a machine-translated text, it is necessary for a person to check the translated text and replace each unsuitable translated word with a suitable word. There exist several techniques for assisting a person in correcting a machine-translated text. It is known to provide a technique wherein translations of specific words in an original text are displayed between the lines of the original text. It is also known to provide a technique wherein specific words in an original text and their translations are listed.
- According to the techniques described above, it is possible to display on a screen an original text in contrast with machine-translated text, thereby making it easier for a person to rewrite a machine-translated text. However, a problem exists in that it is necessary for a person to manually input a suitable translation for every unsuitable translation. This problem reduces any advantage of performing a machine translation.
- The present invention has been made in view of the above circumstances.
- To address the problems described above, the present invention provides a translation processing method including: registering a type of annotation with a corresponding translation rule in a table; identifying a document to be processed; extracting an annotation added to a text element from the identified document; identifying a type of the extracted annotation added to the text element; and translating the text element according to the registered translation rule corresponding to the identified type of the extracted annotation.
- Embodiments of the present invention will be described in detail based on the figures, wherein:
- FIG. 1 is a diagram showing the configuration of document translation device 1 according to one embodiment of the present invention.
- FIG. 2 is a diagram explaining a flow of the processes executed in document translation device 1.
- FIG. 3A is a diagram showing one example of an original text which is a translation object.
- FIG. 3B is a diagram showing one example of a text during translation processing.
- FIG. 3C is a diagram showing one example of a text in the process of being edited.
- FIG. 3D is a diagram showing one example of a text that has been edited.
- FIG. 4 is a diagram showing correspondences between types of annotations and editing styles.
- FIG. 5 is a diagram showing a table of correspondences between a designated word, a dictionary to be used, and a priority order of dictionaries to be used.
- Referring next to the drawings, preferred embodiments of the present invention will be explained.
- FIG. 1 is a diagram showing the functional configuration of document translation device 1 according to one embodiment of the present invention. As shown in the figure, document translation device 1 has: a control unit 10; a memory 11; an input unit 12; an operation unit 13; a display unit 14; and an output unit 15. Control unit 10 has a processor, such as a CPU, for controlling each unit in document translation device 1. Furthermore, control unit 10 has a document structure analysis unit 101, an annotation recognition unit 102, a character recognition unit 103, and a translation processing unit 104. Document structure analysis unit 101, using a predetermined algorithm, performs a layout analysis of a document scanned by input unit 12 and determines the layout structure of the document as image data. More specifically, document structure analysis unit 101 determines whether both words and symbols (additional information such as an illustration, a ruled line, or a memo, hereafter referred to as annotation) are included in the document. If an annotation is included in the document, the area including character portions and the area including annotation portions are separated.
- Annotation recognition unit 102 performs a predetermined analysis process on the image data of the area that remains after the characters have been separated and extracted, to determine the type of each annotation and the portion to which the annotation is added (namely, an element that forms a text, such as a word or a term). The types of annotation that are extracted include a sticky tag, a moving border, an underline, a highlight, a leader line, and a note (words inserted between lines of an original text). Information relating to the type of annotation and the portion to which the annotation is added is stored in memory 11. Character recognition unit 103 performs a character recognition process on the area separated and extracted by document structure analysis unit 101, extracts character information (lexical tokens), and stores it in memory 11. Translation processing unit 104 uses dictionary data stored in memory 11 and a predetermined algorithm to substitute the character information extracted by character recognition unit 103, thereby performing a translation process in which the language of the document is translated into a language specified by a user. The text data being translated and the relations between the words in the original text and the words in the translation are stored in memory 11.
- For image data of a document to which an annotation is added, document structure analysis unit 101, annotation recognition unit 102, character recognition unit 103, and translation processing unit 104 are used to perform a translation process for the annotated portions and the character portions; in doing so, a function is realized for extracting, for each annotation, information relating to the type of the annotation, the words in the original text to which the annotation is added, and the corresponding translated words. Details of the process performed in control unit 10 will be given below. The function of each unit in control unit 10 may be realized by an individual processor, or by one processor running a plurality of software applications.
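The information that annotation recognition unit 102 and translation processing unit 104 are said to store in memory 11 can be pictured as a small data model. The following Python sketch is illustrative only: the class and function names (`AnnotationType`, `Annotation`, `AlignedPair`, `source_for`) are hypothetical and do not appear in the specification, and the word pairs are taken from the worked example described later.

```python
from dataclasses import dataclass
from enum import Enum


class AnnotationType(Enum):
    # The six annotation types listed in the description.
    STICKY_TAG = "sticky tag"
    MOVING_BORDER = "moving border"
    UNDERLINE = "underline"
    HIGHLIGHT = "highlight"
    LEADER_LINE = "leader line"
    NOTE = "note"


@dataclass(frozen=True)
class Annotation:
    kind: AnnotationType
    target: str          # text element the annotation is attached to
    note_text: str = ""  # recognized characters of a note, if any


@dataclass(frozen=True)
class AlignedPair:
    source: str       # word in the original text
    translation: str  # corresponding word in the translated text


def source_for(annotation, pairs):
    """Return the original word behind an annotated translated word, or None."""
    for pair in pairs:
        if pair.translation == annotation.target:
            return pair.source
    return None


pairs = [
    AlignedPair("BMP", "osteogenesis protein"),
    AlignedPair("CGM", "heroic story medal"),
]
underline = Annotation(AnnotationType.UNDERLINE, "osteogenesis protein")
print(source_for(underline, pairs))  # prints "BMP"
```

Keeping the source/translation pairs alongside the annotations is what would let a later editing step map an annotated translated word back to the original word.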
- Memory 11 is a storage device such as a RAM, a ROM, or a hard disk; the memory stores dictionary database DB and other reference data used when control unit 10 performs the above processes. As shown in FIG. 1, database DB stores various dictionary data 111 to 115 which may be used in a translation process. Database DB further stores translation rule table Tr (described in detail later), which stores each type of annotation in correspondence with an editing style. Database DB further stores dictionary table Tp (described in detail later), which stores the correspondence between a specific word and the priority order in which dictionaries are to be used in translating the word.
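As a rough illustration, the two tables could be held as plain mappings. This is a sketch under stated assumptions: the contents of FIG. 4 and FIG. 5 are not reproduced in the text, so the rows below are inferred from the worked example in the description, and the names `TRANSLATION_RULE_TABLE`, `DICTIONARY_TABLE`, `editing_style`, and `dictionaries_for_note` are hypothetical.

```python
# Translation rule table Tr: annotation type -> editing style.
# Rows are inferred from the worked example, not copied from FIG. 4.
TRANSLATION_RULE_TABLE = {
    "moving border": "keep the original word untranslated",
    "underline": "apply the original text directly",
    "highlight": "use the next-priority candidate in the same dictionary",
    "leader line + note": "use the dictionary designated by the note",
}

# Dictionary table Tp: specified word -> dictionaries in priority order.
# Only the "image" row is given in the description.
DICTIONARY_TABLE = {
    "image": [
        "English-Japanese dictionary 111",
        "Japanese-English dictionary 112",
        "image processing term dictionary 113",
    ],
}


def editing_style(annotation_type):
    """Look up the editing style registered for an annotation type (None if unregistered)."""
    return TRANSLATION_RULE_TABLE.get(annotation_type)


def dictionaries_for_note(note_text):
    """Return the priority list of the first specified word contained in the note."""
    for specified_word, dictionaries in DICTIONARY_TABLE.items():
        if specified_word in note_text:
            return list(dictionaries)
    return []


print(editing_style("underline"))
print(dictionaries_for_note("image processing"))
```

Matching the specified word by substring mirrors the statement that the note "image processing" includes the registered word "image"; a real device might match differently.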
- Input unit 12 refers to, for example, a scanner device which scans documents printed on paper as digital image data and provides the data to both control unit 10 and memory 11. Operation unit 13 refers to an input device such as a keyboard or a mouse; the operation unit is used when a user of document translation device 1 specifies a document to be translated, writes information in dictionary table Tp and translation rule table Tr, specifies a portion to be edited, or inputs any other necessary information. The input instruction or information is provided to control unit 10. Display unit 14 has a processor for drawing (not shown) and a display device such as a liquid crystal display (not shown); the display unit, when given an instruction from control unit 10, displays on a screen an original text, a document undergoing translation, or various types of messages for a user. A user refers to the display screen of display unit 14 and inputs instructions through input unit 12 so as to have document translation device 1 execute various processes. Output unit 15 is a printer for printing the edited text on paper, a communication interface for providing to a printing device the text data acquired after the additional information editing processes have been performed, or a storage device for storing text data in a storage medium such as a flash memory or a CD-ROM.
- Referring next to FIGS. 2 to 5, one operational example of document translation device 1 will be explained. It is to be noted that necessary information is pre-stored in translation rule table Tr shown in FIG. 4 and dictionary table Tp shown in FIG. 5.
- FIG. 2 is a diagram showing the flow of the processes executed in document translation device 1. As shown in the figure, a user inputs a predetermined instruction to specify both the original language and the language into which the document is to be translated, sets the document which the user wants to translate (hereinafter referred to as the translation object document) on a scanner device, and scans the document to acquire image data (step S10). In the description below, an example is given of a case wherein English text is translated into Japanese. FIG. 3A is a diagram showing one example of an original text which constitutes a translation object. Referring again to FIG. 2, the area including characters is identified by analyzing the document structure of the acquired image data (step S11), and character information is extracted by a character recognition process (step S12). Then, a translation process is performed on the extracted character information (step S13) and the translation result is output to display unit 14 (step S14). It is to be noted that the dictionary data used in the translation process is set in advance. Specifically, English-Japanese dictionary 111, which is a standard dictionary, is selected. One example of the translated text is shown in FIG. 3B. Control unit 10 displays on a display screen of display unit 14 a message such as “Translation completed. If there is any editing object portion, please designate it”, thereby urging the user to confirm such a portion.
- Referring again to FIG. 2, the user refers to the display screen to check whether there are any mistranslations or any portions on which an unsuitable translation process has been performed. When identifying a mistranslation, the user adds an annotation, corresponding to the editing style the user desires, to the mistranslated portion (step S15). The process will be shown in detail with reference to FIG. 3C. In the figure, an example is shown wherein the user identifies an unsuitable translation at five parts in total: “big-endian (no translation)”, “little-endian (no translation)”, “osteogenesis protein”, “heroic story medal”, and “interpreter”. “Big-endian” and “little-endian” are technical computer terms; therefore, no suitable translation is included in English-Japanese dictionary 111 used in the translation process. For this reason, a term “no suitable word exists” is added to the text. “Osteogenesis protein”, “heroic story medal”, and “interpreter” are incorrect translations of “BMP”, “CGM”, and “interpretation”, respectively. When identifying a mistranslation as an editing object portion, the user adds a predetermined annotation to the translation by use of a mouse or a keyboard.
- More specifically, as shown in FIG. 4, an annotation corresponding to the editing style that the user desires is added. For example, when the user wishes to keep “big-endian” and “little-endian” as they are, because they are technical computer terms and are usually used in their original language (namely, the user wishes to edit “big-endian (no translation)” as “big-endian” and “little-endian (no translation)” as “little-endian”), moving borders are added to the words as an annotation. In the original text, “osteogenesis protein” corresponds to “BMP”; therefore, if the user considers that direct application of the original text is best (namely, editing “osteogenesis protein” as “BMP”), an annotation such as an underline is added to “osteogenesis protein”. As for “interpreter”, if the user desires to apply a definition given a subsequent priority among the alternative words included in English-Japanese dictionary 111, a highlight is applied to the translated “interpreter”. As for “heroic story medal”, when the user selects a dictionary suited to the field of the document and wishes to apply a translation registered in that dictionary (such as “CGM (Computer Graphic Metafile)”), a leader line and a word designating the field of the document (in the present case, “image processing”) are added as an annotation. The annotation may also be displayed around the translated text as shown in the display screen of FIG. 3C, so that the user is able to keep in mind the corresponding section of the application. By checking the correspondence shown in FIG. 4, the user is able to identify the type of annotation corresponding to the desired editing style.
- Referring again to FIG. 2, when the user inputs a predetermined instruction to confirm the editing object portions and their annotations, completing the process of adding the desired annotations to the desired editing object portions, image data corresponding to the annotated text shown in FIG. 3C is generated and an editing process for the image data (retranslation process) is initiated (step S20). Then, document structure analysis of the image data is performed by document structure analysis unit 101, and character information and annotations are separated and extracted (step S21). Following step S21, annotation recognition unit 102 determines, for each annotation, the translated portion to which the annotation is added and the type of the annotation (step S22). It is to be noted that, when a note is added as an annotation (“image processing” in the example of FIG. 3C), a character recognition process is performed to identify its characters.
- The process then proceeds to step S23, wherein translation rule table Tr is referred to and the editing style corresponding to the identified annotation type is determined. In this step, when a note is identified as an annotation, the document structure analysis unit refers to dictionary table Tp to determine the dictionaries corresponding to the characters included in the note and the priority order for using each dictionary.
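The lookup of an editing style from the annotation type, and the edits described for the moving border, underline, and highlight annotations, might be sketched as follows. This is only an illustration: the style labels, the `STYLE_BY_ANNOTATION` mapping, and the function `apply_editing_style` are hypothetical names, and the candidate lists are placeholders rather than real dictionary entries. The leader line with a note is omitted here because it depends on dictionary table Tp.

```python
# Hypothetical mapping from annotation type to an editing-style label
# (stands in for translation rule table Tr).
STYLE_BY_ANNOTATION = {
    "moving border": "keep_original",
    "underline": "apply_original",
    "highlight": "next_candidate",
}


def apply_editing_style(annotation_type, source_word, current, candidates):
    """Return the replacement text for one annotated element.

    annotation_type -- annotation type recognized in step S22
    source_word     -- the word in the original text
    current         -- the translation produced in the first pass
    candidates      -- dictionary candidates for source_word, in priority order
    """
    style = STYLE_BY_ANNOTATION.get(annotation_type)  # step S23: look up the style
    if style in ("keep_original", "apply_original"):
        return source_word
    if style == "next_candidate" and current in candidates:
        position = candidates.index(current)
        if position + 1 < len(candidates):
            return candidates[position + 1]
    return current  # unknown annotation or no further candidate: leave unchanged


print(apply_editing_style("moving border", "big-endian", "big-endian (no translation)", []))
print(apply_editing_style("underline", "BMP", "osteogenesis protein", ["osteogenesis protein"]))
print(apply_editing_style("highlight", "source word", "sense 1", ["sense 1", "sense 2"]))
```

Falling back to the current translation when no rule or further candidate exists is a design choice of this sketch; the specification does not say what the device does in that case.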
- FIG. 5 illustrates the storage contents of dictionary table Tp. As shown in the figure, dictionary table Tp registers usable dictionaries and their priority order in correspondence with a specified word. For example, if a note of “image processing” is added, the note includes the specified word “image”, which is registered in dictionary table Tp; therefore, dictionaries are used in the order of English-Japanese dictionary 111, Japanese-English dictionary 112, and image processing term dictionary 113. In other words, for translating the word which is the object of the note (“heroic story medal” in the example of FIG. 3C, which corresponds to “CGM” in the original text), the previously used English-Japanese dictionary 111 is excluded as a candidate. Japanese-English dictionary 112, which is next in order of priority, is also excluded, because that dictionary is used only for Japanese-English translation. Consequently, it is determined that the translation process is performed by applying image processing term dictionary 113, which is third in order of priority, to the editing object word (CGM). As a result, “CGM (Computer Graphic Metafile)”, which is registered in image processing term dictionary 113 as a translation for “CGM”, is selected.
- Referring again to FIG. 2, when the editing style is determined, an editing process (translation process) in accordance with the editing style is performed (step S24). FIG. 3D shows a text wherein the above-described editing object portions (five in total) are each edited in accordance with the corresponding editing style. Control unit 10 then displays on the display screen of display unit 14 a message such as “Editing (retranslation) process is completed. To add any editing object portions, please specify them again”, thereby encouraging the user to check the editing result. If the user determines that the editing was not satisfactory, or finds another mistranslation in another part of the text, the user inputs a predetermined instruction. In response to the instruction, the process returns to step S15 of FIG. 2 so as to again accept the designation of an editing object portion. When satisfied with the edited contents, the user inputs a predetermined instruction to terminate the translation process. The accepted translation is output in a predetermined manner (step S25).
- As described above, by using document translation device 1, a user confirms the translated document and corrects the mistranslated parts by specifying, with an annotation, both the portion that is to be edited and the editing style. Thus, it is possible to acquire a high-quality translation in a short time, without placing an excessive burden on the user.
- <Modifications>
- The present invention is not limited to the embodiments described above, and may be modified in various ways. Modifications are shown below. In the embodiments described above, a standard dictionary (English-Japanese dictionary 111) is used by document translation device 1 to perform a translation process (provisional translation process) and a user specifies an editing object portion after checking the translation result; in another embodiment, an annotation may instead be added to the original text and the translation process may be performed on the basis of the annotation. Namely, the original text with an attached annotation is read by a scanner, and the type of the annotation and the portion to which the annotation is added are identified so that the translation style is determined (whether the original text is preferable, which dictionary is to be used, and in what priority order) by referring to both translation rule table Tr and dictionary table Tp. In this embodiment, one translation pass is omitted; therefore, this embodiment is more effective in a case where a user is able to predict, after checking the original text, the parts where a mistranslation is likely to occur.
- When adding an annotation to a provisionally translated text, a document including the text may also be printed on a medium such as paper so that a user is able to write the annotation on the paper. In such a case, it is necessary to rescan the annotated document so that image data of the document is acquired.
- Furthermore, in the embodiments described above, an editing (retranslation) process is performed after specifying every editing object portion; however, an editing process may also be performed each time an annotation is added to an editing object portion.
- Needless to say, the contents of a document, the type of annotation, the specific wording of a note, and the dictionary used are not limited as in the case described above.
- To address the problems described above, the present invention provides a translation processing method including: registering a type of annotation with a corresponding translation rule; identifying a document to be processed; extracting an annotation added to a text element from the identified document; identifying a type of the extracted annotation added to the text element; and translating the text element according to the registered translation rule corresponding to the identified type of the extracted annotation. According to an embodiment of the invention, a user specifies a part that is to be an editing object so that a desired translation rule is applied to the part at the time of translation, thereby improving the quality of translation.
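The method steps listed above can be illustrated with a very small sketch. It assumes that the document has already been split into text elements and that the annotations have already been extracted and typed; the function `translate_with_annotations`, the toy dictionary, and the bracketed placeholder translations are all hypothetical and stand in for the real recognition and translation processes.

```python
def translate_with_annotations(elements, annotations, rules, default_rule):
    """Translate each text element with the rule registered for its annotation type.

    elements     -- text elements of the identified document, in order
    annotations  -- mapping: element index -> identified annotation type
    rules        -- registered mapping: annotation type -> rule (callable on a text element)
    default_rule -- rule used for elements without a registered annotation type
    """
    translated = []
    for index, text in enumerate(elements):
        annotation_type = annotations.get(index)         # None if not annotated
        rule = rules.get(annotation_type, default_rule)  # registered rule, else default
        translated.append(rule(text))
    return translated


# Placeholder dictionary: the bracketed strings stand in for real translations.
toy_dictionary = {"byte": "[ja] byte", "order": "[ja] order"}


def standard_translation(word):
    return toy_dictionary.get(word, word)


rules = {"moving border": lambda word: word}  # keep the original word as it is
elements = ["big-endian", "byte", "order"]
annotations = {0: "moving border"}
print(translate_with_annotations(elements, annotations, rules, standard_translation))
```

Passing the rules as callables keeps the registration step separate from the translation step, which matches the order of the steps in the method but is otherwise a choice made for this sketch.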
- In another embodiment, in the translation processing method of the present invention, the type of annotation is registered with a corresponding translation rule in a table.
- In an embodiment, the translation rule includes designation of a dictionary used in a translation process, or the dictionary is used according to a priority of the dictionary.
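Using dictionaries according to their priority can be sketched with the “CGM” example from the description: dictionaries are tried in the registered order, skipping any dictionary that was already used and any dictionary for the wrong translation direction. The `Dictionary` class, the function `select_translation`, and the language codes are hypothetical, and the assumption that image processing term dictionary 113 is an English-to-Japanese dictionary is made only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Dictionary:
    name: str
    source_lang: str
    target_lang: str
    entries: dict


def select_translation(word, priority, source_lang, target_lang, already_used):
    """Return (dictionary name, translation) from the first eligible dictionary, or None."""
    for dictionary in priority:
        if dictionary.name in already_used:
            continue  # the dictionary used in the first pass is excluded
        if (dictionary.source_lang, dictionary.target_lang) != (source_lang, target_lang):
            continue  # wrong translation direction
        if word in dictionary.entries:
            return dictionary.name, dictionary.entries[word]
    return None


d111 = Dictionary("English-Japanese dictionary 111", "en", "ja",
                  {"CGM": "heroic story medal"})
d112 = Dictionary("Japanese-English dictionary 112", "ja", "en", {})
d113 = Dictionary("image processing term dictionary 113", "en", "ja",
                  {"CGM": "CGM (Computer Graphic Metafile)"})

result = select_translation("CGM", [d111, d112, d113], "en", "ja",
                            already_used={d111.name})
print(result)
```

With the first-pass dictionary excluded and the Japanese-English dictionary skipped, the selection falls through to the third dictionary in the priority order, as in the worked example.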
- In an embodiment, the present invention provides a document translation device comprising: a memory that stores a type of annotation with a corresponding translation rule in a table; an identifying part that identifies a document to be processed; an extracting part that extracts a type of annotation and character information from the document identified by the identifying part; an annotation identifying part that identifies a text element to which the annotation extracted by the extracting part is added; a translation rule determining part that determines a translation rule corresponding to the type of annotation by referring to the table; and a translation performing part that translates the text element identified by the annotation identifying part by applying the translation rule determined by the translation rule determining part.
- In an embodiment, the present invention provides a computer readable program that enables a computer to act as: a memory that stores a type of annotation with a corresponding translation rule; an identifying part that identifies a document to be processed; an extracting part that extracts an annotation added to a text element from the document identified by the identifying part; an annotation identifying part that identifies a type of the annotation added to the text element extracted by the extracting part; and a translation performing part that translates the text element according to the translation rule corresponding to the type of the annotation identified by the annotation identifying part.
- The foregoing description of the embodiments of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations will be apparent to practitioners skilled in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, thereby enabling others skilled in the art to understand the invention for various embodiments, and with the various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.
- The entire disclosure of Japanese Patent Application No. 2005-90203 filed on Mar. 25, 2005 including specification, claims, drawings and abstract is incorporated herein by reference in its entirety.
Claims (13)
1. A translation processing method comprising:
registering a type of annotation with a corresponding translation rule;
identifying a document to be processed;
extracting an annotation added to a text element from the identified document;
identifying a type of the extracted annotation added to the text element; and
translating the text element according to the registered translation rule corresponding to the identified type of the extracted annotation.
2. The translation processing method according to claim 1 , wherein the type of annotation is registered with a corresponding translation rule in a table.
3. The translation processing method of claim 1 , wherein the translation rule includes designation of a dictionary used in a translation process.
4. The translation processing method of claim 3 , wherein the dictionary is used according to a priority of the dictionary.
5. A document translation device comprising:
a memory that stores a type of annotation with a corresponding translation rule;
an identifying part that identifies a document to be processed;
an extracting part that extracts an annotation added to a text element from the document identified by the identifying part;
an annotation identifying part that identifies a type of the annotation added to the text element extracted by the extracting part; and
a translation performing part that translates the text element according to the translation rule corresponding to the type of the annotation identified by the annotation identifying part.
6. The document translation device according to claim 5 , wherein the type of annotation is registered with a corresponding translation rule in a table.
7. The document translation device according to claim 5 , wherein the translation rule includes designation of a dictionary used in a translation process.
8. The document translation device according to claim 7 , wherein the dictionary is used according to a priority of the dictionary.
9. A computer readable program that enables a computer to act as:
a memory that stores a type of annotation with a corresponding translation rule;
an identifying part that identifies a document to be processed;
an extracting part that extracts an annotation added to a text element from the document identified by the identifying part;
an annotation identifying part that identifies a type of the annotation added to the text element extracted by the extracting part; and
a translation performing part that translates the text element according to the translation rule corresponding to the type of the annotation identified by the annotation identifying part.
10. The computer readable program according to claim 9 , wherein the type of annotation is registered with a corresponding translation rule in a table.
11. The computer readable program according to claim 9 , wherein the translation rule includes designation of a dictionary used in a translation process.
12. The computer readable program according to claim 11 , wherein the dictionary is used according to a priority of the dictionary.
13. A translation processing method comprising:
registering a type of annotation with a corresponding translation rule in a table;
identifying a document to be processed;
extracting a type of annotation and character information from the document identified at the identifying step;
identifying a text element to which the annotation extracted at the extracting step is to be added;
determining a translation rule corresponding to the type of annotation by referring to the table; and
translating the text element identified in the annotation identifying step, by applying the translation rule determined at the translation rule determining step.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2005090203A JP2006276915A (en) | 2005-03-25 | 2005-03-25 | Translating processing method, document translating device and program |
JP2005-090203 | 2005-03-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
US20060217956A1 (en) | 2006-09-28 |
Family
ID=37015511
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/197,508 Abandoned US20060217956A1 (en) | 2005-03-25 | 2005-08-05 | Translation processing method, document translation device, and programs |
Country Status (3)
Country | Link |
---|---|
US (1) | US20060217956A1 (en) |
JP (1) | JP2006276915A (en) |
CN (1) | CN1838113A (en) |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060218484A1 (en) * | 2005-03-25 | 2006-09-28 | Fuji Xerox Co., Ltd. | Document editing method, document editing device, and storage medium |
US20070172130A1 (en) * | 2006-01-25 | 2007-07-26 | Konstantin Zuev | Structural description of a document, a method of describing the structure of graphical objects and methods of object recognition. |
US20090125542A1 (en) * | 2007-11-14 | 2009-05-14 | Sap Ag | Systems and Methods for Modular Information Extraction |
US20090132477A1 (en) * | 2006-01-25 | 2009-05-21 | Konstantin Zuev | Methods of object search and recognition. |
US20090158137A1 (en) * | 2007-12-14 | 2009-06-18 | Ittycheriah Abraham P | Prioritized Incremental Asynchronous Machine Translation of Structured Documents |
US20100057439A1 (en) * | 2008-08-27 | 2010-03-04 | Fujitsu Limited | Portable storage medium storing translation support program, translation support system and translation support method |
US20110013806A1 (en) * | 2006-01-25 | 2011-01-20 | Abbyy Software Ltd | Methods of object search and recognition |
US20130304452A1 (en) * | 2012-05-14 | 2013-11-14 | International Business Machines Corporation | Management of language usage to facilitate effective communication |
CN103500158A (en) * | 2013-10-08 | 2014-01-08 | 北京百度网讯科技有限公司 | Method and device for annotating electronic document |
US20140250219A1 (en) * | 2012-05-30 | 2014-09-04 | Douglas Hwang | Synchronizing translated digital content |
CN104125548A (en) * | 2013-04-27 | 2014-10-29 | 中国移动通信集团公司 | Method of translating conversation language, device and system |
US8908969B2 (en) | 2006-08-01 | 2014-12-09 | Abbyy Development Llc | Creating flexible structure descriptions |
US9015573B2 (en) | 2003-03-28 | 2015-04-21 | Abbyy Development Llc | Object recognition and describing structure of graphical objects |
US9224040B2 (en) | 2003-03-28 | 2015-12-29 | Abbyy Development Llc | Method for object recognition and describing structure of graphical objects |
JP2016062452A (en) * | 2014-09-19 | 2016-04-25 | 富士ゼロックス株式会社 | Information processing apparatus and program |
US20160147745A1 (en) * | 2014-11-26 | 2016-05-26 | Naver Corporation | Content participation translation apparatus and method |
US9881003B2 (en) * | 2015-09-23 | 2018-01-30 | Google Llc | Automatic translation of digital graphic novels |
US10262117B2 (en) * | 2014-10-29 | 2019-04-16 | Ricoh Company, Limited | Information processing system, information processing apparatus, and information processing method |
US10691326B2 (en) | 2013-03-15 | 2020-06-23 | Google Llc | Document scale and position optimization |
US20200210530A1 (en) * | 2018-12-28 | 2020-07-02 | Anshuman Mishra | Systems, methods, and storage media for automatically translating content using a hybrid language |
US11074400B2 (en) * | 2019-09-30 | 2021-07-27 | Dropbox, Inc. | Collaborative in-line content item annotations |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101620680B (en) * | 2008-07-03 | 2014-06-25 | 三星电子株式会社 | Recognition and translation method of character image and device |
CN102227723B (en) | 2008-11-27 | 2013-10-09 | 国际商业机器公司 | Device and method for supporting detection of mistranslation |
CN102495835A (en) * | 2011-10-21 | 2012-06-13 | 传神联合(北京)信息技术有限公司 | Tag protection method |
Citations (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4623985A (en) * | 1980-04-15 | 1986-11-18 | Sharp Kabushiki Kaisha | Language translator with circuitry for detecting and holding words not stored in dictionary ROM |
US4791587A (en) * | 1984-12-25 | 1988-12-13 | Kabushiki Kaisha Toshiba | System for translation of sentences from one language to another |
US4954984A (en) * | 1985-02-12 | 1990-09-04 | Hitachi, Ltd. | Method and apparatus for supplementing translation information in machine translation |
US5111398A (en) * | 1988-11-21 | 1992-05-05 | Xerox Corporation | Processing natural language text using autonomous punctuational structure |
US5214583A (en) * | 1988-11-22 | 1993-05-25 | Kabushiki Kaisha Toshiba | Machine language translation system which produces consistent translated words |
US5222160A (en) * | 1989-12-28 | 1993-06-22 | Fujitsu Limited | Document revising system for use with document reading and translating system |
US5303151A (en) * | 1993-02-26 | 1994-04-12 | Microsoft Corporation | Method and system for translating documents using translation handles |
US5349368A (en) * | 1986-10-24 | 1994-09-20 | Kabushiki Kaisha Toshiba | Machine translation method and apparatus |
US5361205A (en) * | 1991-08-01 | 1994-11-01 | Fujitsu Limited | Apparatus for translating lingual morphemes as well as the typographical morphemes attached thereto |
US5528491A (en) * | 1992-08-31 | 1996-06-18 | Language Engineering Corporation | Apparatus and method for automated natural language translation |
US5541837A (en) * | 1990-11-15 | 1996-07-30 | Canon Kabushiki Kaisha | Method and apparatus for further translating result of translation |
USRE35464E (en) * | 1986-11-28 | 1997-02-25 | Sharp Kabushiki Kaisha | Apparatus and method for translating sentences containing punctuation marks |
US5687383A (en) * | 1994-09-30 | 1997-11-11 | Kabushiki Kaisha Toshiba | Translation rule learning scheme for machine translation |
US5692073A (en) * | 1996-05-03 | 1997-11-25 | Xerox Corporation | Formless forms and paper web using a reference-based mark extraction technique |
US5970455A (en) * | 1997-03-20 | 1999-10-19 | Xerox Corporation | System for capturing and retrieving audio data and corresponding hand-written notes |
US5974371A (en) * | 1996-03-21 | 1999-10-26 | Sharp Kabushiki Kaisha | Data processor for selectively translating only newly received text data |
US6163785A (en) * | 1992-09-04 | 2000-12-19 | Caterpillar Inc. | Integrated authoring and translation system |
US6167366A (en) * | 1996-12-10 | 2000-12-26 | Johnson; William J. | System and method for enhancing human communications |
US6182027B1 (en) * | 1997-12-24 | 2001-01-30 | International Business Machines Corporation | Translation method and system |
US6208956B1 (en) * | 1996-05-28 | 2001-03-27 | Ricoh Company, Ltd. | Method and system for translating documents using different translation resources for different portions of the documents |
US6278967B1 (en) * | 1992-08-31 | 2001-08-21 | Logovista Corporation | Automated system for generating natural language translations that are domain-specific, grammar rule-based, and/or based on part-of-speech analysis |
US20010029455A1 (en) * | 2000-03-31 | 2001-10-11 | Chin Jeffrey J. | Method and apparatus for providing multilingual translation over a network |
US6470306B1 (en) * | 1996-04-23 | 2002-10-22 | Logovista Corporation | Automated translation of annotated text based on the determination of locations for inserting annotation tokens and linked ending, end-of-sentence or language tokens |
US20020169592A1 (en) * | 2001-05-11 | 2002-11-14 | Aityan Sergey Khachatur | Open environment for real-time multilingual communication |
US6900819B2 (en) * | 2001-09-14 | 2005-05-31 | Fuji Xerox Co., Ltd. | Systems and methods for automatic emphasis of freeform annotations |
US20050149316A1 (en) * | 2003-03-14 | 2005-07-07 | Fujitsu Limited | Translation support device |
US6996520B2 (en) * | 2002-11-22 | 2006-02-07 | Transclick, Inc. | Language translation system and method using specialized dictionaries |
US20060100849A1 (en) * | 2002-09-30 | 2006-05-11 | Ning-Ping Chan | Pointer initiated instant bilingual annotation on textual information in an electronic document |
US20060167992A1 (en) * | 2005-01-07 | 2006-07-27 | At&T Corp. | System and method for text translations and annotation in an instant messaging session |
US20060277332A1 (en) * | 2002-12-18 | 2006-12-07 | Yukihisa Yamashina | Translation support system and program thereof |
US7369986B2 (en) * | 2003-08-21 | 2008-05-06 | International Business Machines Corporation | Method, apparatus, and program for transliteration of documents in various Indian languages |
- 2005
- 2005-03-25 JP JP2005090203A patent/JP2006276915A/en active Pending
- 2005-08-05 US US11/197,508 patent/US20060217956A1/en not_active Abandoned
- 2005-09-06 CN CNA2005101026029A patent/CN1838113A/en active Pending
Patent Citations (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4623985A (en) * | 1980-04-15 | 1986-11-18 | Sharp Kabushiki Kaisha | Language translator with circuitry for detecting and holding words not stored in dictionary ROM |
US4791587A (en) * | 1984-12-25 | 1988-12-13 | Kabushiki Kaisha Toshiba | System for translation of sentences from one language to another |
US4954984A (en) * | 1985-02-12 | 1990-09-04 | Hitachi, Ltd. | Method and apparatus for supplementing translation information in machine translation |
US5349368A (en) * | 1986-10-24 | 1994-09-20 | Kabushiki Kaisha Toshiba | Machine translation method and apparatus |
USRE35464E (en) * | 1986-11-28 | 1997-02-25 | Sharp Kabushiki Kaisha | Apparatus and method for translating sentences containing punctuation marks |
US5111398A (en) * | 1988-11-21 | 1992-05-05 | Xerox Corporation | Processing natural language text using autonomous punctuational structure |
US5214583A (en) * | 1988-11-22 | 1993-05-25 | Kabushiki Kaisha Toshiba | Machine language translation system which produces consistent translated words |
US5222160A (en) * | 1989-12-28 | 1993-06-22 | Fujitsu Limited | Document revising system for use with document reading and translating system |
US5541837A (en) * | 1990-11-15 | 1996-07-30 | Canon Kabushiki Kaisha | Method and apparatus for further translating result of translation |
US5361205A (en) * | 1991-08-01 | 1994-11-01 | Fujitsu Limited | Apparatus for translating lingual morphemes as well as the typographical morphemes attached thereto |
US6278967B1 (en) * | 1992-08-31 | 2001-08-21 | Logovista Corporation | Automated system for generating natural language translations that are domain-specific, grammar rule-based, and/or based on part-of-speech analysis |
US5528491A (en) * | 1992-08-31 | 1996-06-18 | Language Engineering Corporation | Apparatus and method for automated natural language translation |
US6163785A (en) * | 1992-09-04 | 2000-12-19 | Caterpillar Inc. | Integrated authoring and translation system |
US6658627B1 (en) * | 1992-09-04 | 2003-12-02 | Caterpillar Inc | Integrated and authoring and translation system |
US5303151A (en) * | 1993-02-26 | 1994-04-12 | Microsoft Corporation | Method and system for translating documents using translation handles |
US5687383A (en) * | 1994-09-30 | 1997-11-11 | Kabushiki Kaisha Toshiba | Translation rule learning scheme for machine translation |
US5974371A (en) * | 1996-03-21 | 1999-10-26 | Sharp Kabushiki Kaisha | Data processor for selectively translating only newly received text data |
US6470306B1 (en) * | 1996-04-23 | 2002-10-22 | Logovista Corporation | Automated translation of annotated text based on the determination of locations for inserting annotation tokens and linked ending, end-of-sentence or language tokens |
US5692073A (en) * | 1996-05-03 | 1997-11-25 | Xerox Corporation | Formless forms and paper web using a reference-based mark extraction technique |
US6208956B1 (en) * | 1996-05-28 | 2001-03-27 | Ricoh Company, Ltd. | Method and system for translating documents using different translation resources for different portions of the documents |
US6167366A (en) * | 1996-12-10 | 2000-12-26 | Johnson; William J. | System and method for enhancing human communications |
US5970455A (en) * | 1997-03-20 | 1999-10-19 | Xerox Corporation | System for capturing and retrieving audio data and corresponding hand-written notes |
US6182027B1 (en) * | 1997-12-24 | 2001-01-30 | International Business Machines Corporation | Translation method and system |
US20010029455A1 (en) * | 2000-03-31 | 2001-10-11 | Chin Jeffrey J. | Method and apparatus for providing multilingual translation over a network |
US20020169592A1 (en) * | 2001-05-11 | 2002-11-14 | Aityan Sergey Khachatur | Open environment for real-time multilingual communication |
US6900819B2 (en) * | 2001-09-14 | 2005-05-31 | Fuji Xerox Co., Ltd. | Systems and methods for automatic emphasis of freeform annotations |
US20060100849A1 (en) * | 2002-09-30 | 2006-05-11 | Ning-Ping Chan | Pointer initiated instant bilingual annotation on textual information in an electronic document |
US6996520B2 (en) * | 2002-11-22 | 2006-02-07 | Transclick, Inc. | Language translation system and method using specialized dictionaries |
US20060277332A1 (en) * | 2002-12-18 | 2006-12-07 | Yukihisa Yamashina | Translation support system and program thereof |
US20050149316A1 (en) * | 2003-03-14 | 2005-07-07 | Fujitsu Limited | Translation support device |
US7369986B2 (en) * | 2003-08-21 | 2008-05-06 | International Business Machines Corporation | Method, apparatus, and program for transliteration of documents in various Indian languages |
US20060167992A1 (en) * | 2005-01-07 | 2006-07-27 | At&T Corp. | System and method for text translations and annotation in an instant messaging session |
Cited By (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9224040B2 (en) | 2003-03-28 | 2015-12-29 | Abbyy Development Llc | Method for object recognition and describing structure of graphical objects |
US9015573B2 (en) | 2003-03-28 | 2015-04-21 | Abbyy Development Llc | Object recognition and describing structure of graphical objects |
US7844893B2 (en) * | 2005-03-25 | 2010-11-30 | Fuji Xerox Co., Ltd. | Document editing method, document editing device, and storage medium |
US20060218484A1 (en) * | 2005-03-25 | 2006-09-28 | Fuji Xerox Co., Ltd. | Document editing method, document editing device, and storage medium |
US8750571B2 (en) | 2006-01-25 | 2014-06-10 | Abbyy Development Llc | Methods of object search and recognition |
US20070172130A1 (en) * | 2006-01-25 | 2007-07-26 | Konstantin Zuev | Structural description of a document, a method of describing the structure of graphical objects and methods of object recognition. |
US20090132477A1 (en) * | 2006-01-25 | 2009-05-21 | Konstantin Zuev | Methods of object search and recognition. |
US20110013806A1 (en) * | 2006-01-25 | 2011-01-20 | Abbyy Software Ltd | Methods of object search and recognition |
US8571262B2 (en) | 2006-01-25 | 2013-10-29 | Abbyy Development Llc | Methods of object search and recognition |
US8908969B2 (en) | 2006-08-01 | 2014-12-09 | Abbyy Development Llc | Creating flexible structure descriptions |
US7987416B2 (en) * | 2007-11-14 | 2011-07-26 | Sap Ag | Systems and methods for modular information extraction |
US20090125542A1 (en) * | 2007-11-14 | 2009-05-14 | Sap Ag | Systems and Methods for Modular Information Extraction |
US9418061B2 (en) * | 2007-12-14 | 2016-08-16 | International Business Machines Corporation | Prioritized incremental asynchronous machine translation of structured documents |
US20090158137A1 (en) * | 2007-12-14 | 2009-06-18 | Ittycheriah Abraham P | Prioritized Incremental Asynchronous Machine Translation of Structured Documents |
US20100057439A1 (en) * | 2008-08-27 | 2010-03-04 | Fujitsu Limited | Portable storage medium storing translation support program, translation support system and translation support method |
US9460082B2 (en) * | 2012-05-14 | 2016-10-04 | International Business Machines Corporation | Management of language usage to facilitate effective communication |
US20130304452A1 (en) * | 2012-05-14 | 2013-11-14 | International Business Machines Corporation | Management of language usage to facilitate effective communication |
US9442916B2 (en) * | 2012-05-14 | 2016-09-13 | International Business Machines Corporation | Management of language usage to facilitate effective communication |
US9317500B2 (en) * | 2012-05-30 | 2016-04-19 | Audible, Inc. | Synchronizing translated digital content |
US20140250219A1 (en) * | 2012-05-30 | 2014-09-04 | Douglas Hwang | Synchronizing translated digital content |
US10691326B2 (en) | 2013-03-15 | 2020-06-23 | Google Llc | Document scale and position optimization |
CN104125548A (en) * | 2013-04-27 | 2014-10-29 | 中国移动通信集团公司 | Method of translating conversation language, device and system |
CN103500158A (en) * | 2013-10-08 | 2014-01-08 | 北京百度网讯科技有限公司 | Method and device for annotating electronic document |
JP2016062452A (en) * | 2014-09-19 | 2016-04-25 | 富士ゼロックス株式会社 | Information processing apparatus and program |
US10262117B2 (en) * | 2014-10-29 | 2019-04-16 | Ricoh Company, Limited | Information processing system, information processing apparatus, and information processing method |
US10713444B2 (en) | 2014-11-26 | 2020-07-14 | Naver Webtoon Corporation | Apparatus and method for providing translations editor |
US9881008B2 (en) * | 2014-11-26 | 2018-01-30 | Naver Corporation | Content participation translation apparatus and method |
US20160147746A1 (en) * | 2014-11-26 | 2016-05-26 | Naver Corporation | Content participation translation apparatus and method |
US10496757B2 (en) | 2014-11-26 | 2019-12-03 | Naver Webtoon Corporation | Apparatus and method for providing translations editor |
US20160147745A1 (en) * | 2014-11-26 | 2016-05-26 | Naver Corporation | Content participation translation apparatus and method |
US10733388B2 (en) * | 2014-11-26 | 2020-08-04 | Naver Webtoon Corporation | Content participation translation apparatus and method |
US9881003B2 (en) * | 2015-09-23 | 2018-01-30 | Google Llc | Automatic translation of digital graphic novels |
US20200210530A1 (en) * | 2018-12-28 | 2020-07-02 | Anshuman Mishra | Systems, methods, and storage media for automatically translating content using a hybrid language |
US11074400B2 (en) * | 2019-09-30 | 2021-07-27 | Dropbox, Inc. | Collaborative in-line content item annotations |
US20210326516A1 (en) * | 2019-09-30 | 2021-10-21 | Dropbox, Inc. | Collaborative in-line content item annotations |
US11537784B2 (en) * | 2019-09-30 | 2022-12-27 | Dropbox, Inc. | Collaborative in-line content item annotations |
US20230111739A1 (en) * | 2019-09-30 | 2023-04-13 | Dropbox, Inc. | Collaborative in-line content item annotations |
US11768999B2 (en) * | 2019-09-30 | 2023-09-26 | Dropbox, Inc. | Collaborative in-line content item annotations |
Also Published As
Publication number | Publication date |
---|---|
JP2006276915A (en) | 2006-10-12 |
CN1838113A (en) | 2006-09-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20060217956A1 (en) | Translation processing method, document translation device, and programs | |
US7783472B2 (en) | Document translation method and document translation device | |
US7844893B2 (en) | Document editing method, document editing device, and storage medium | |
US7712028B2 (en) | Using annotations for summarizing a document image and itemizing the summary based on similar annotations | |
US20060217958A1 (en) | Electronic device and recording medium | |
US20060285746A1 (en) | Computer assisted document analysis | |
US10884771B2 (en) | Method and device for displaying multi-language typesetting, browser, terminal and computer readable storage medium | |
US20060217959A1 (en) | Translation processing method, document processing device and storage medium storing program | |
US20040202352A1 (en) | Enhanced readability with flowed bitmaps | |
JP4999938B2 (en) | Document image generation apparatus, document image generation method, and computer program | |
US20160124813A1 (en) | Restoration of modified document to original state | |
JP5528420B2 (en) | Translation apparatus, translation method, and computer program | |
JP2006268372A (en) | Translation device, image processor, image forming device, translation method and program | |
Elanwar et al. | Extracting text from scanned Arabic books: a large-scale benchmark dataset and a fine-tuned Faster-R-CNN model | |
JP2008282094A (en) | Character recognition processing apparatus | |
JP5483526B2 (en) | Machine translation system and machine translation method | |
JP5604276B2 (en) | Document image generation apparatus and document image generation method | |
CN113177421A (en) | Method, device, equipment and storage medium for quality inspection of translation document | |
CN112364640A (en) | Entity noun linking method, device, computer equipment and storage medium | |
JP4350566B2 (en) | Machine translation system | |
CN117391045B (en) | Method for outputting file with portable file format capable of copying Mongolian | |
JP2013182459A (en) | Information processing apparatus, information processing method, and program | |
JP2006277108A (en) | Information providing method, document editing device and program | |
JP2005208687A (en) | Multi-lingual document processor and program | |
JP2021163159A (en) | Sentence extraction apparatus and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: FUJI XEROX CO., LTD., JAPAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:NAGAO, TAKASHI;TATENO, MASAKAZU;TANAKA, KEI;AND OTHERS;REEL/FRAME:016865/0057;SIGNING DATES FROM 20050707 TO 20050719 |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |