WO1999028831A1 - Preformatted allographic arabic text for html documents - Google Patents
- Publication number
- WO1999028831A1 (application PCT/US1998/025201)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- false
- text
- text data
- call
- allographic
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
- G06F40/129—Handling non-Latin characters, e.g. kana-to-kanji conversion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/53—Processing of non-Latin text
Definitions
- the present invention relates to a method and apparatus for rendering allographic Arabic text on internet or HTML (hyper-text markup language) documents such as home pages and web sites.
- Typical home pages and web sites which are currently in use are designed for Roman or Latin script languages such as English.
- Arabic script is very different from Roman script in a number of ways.
- Arabic is written from right to left instead of from left to right, which causes, of course, significant differences with respect to justification and line wrapping.
- the form of Arabic letters depends upon the position of the letter within the word. The same Arabic letter may have a different form depending upon whether the letter or character is a first character, a middle character, or an end character, or if the character stands alone.
- the forms are typically referred to as initial, medial, final, and stand-alone. These multiple forms are referred to as allographs.
- the rules for mapping a letter or grapheme into its allographs are called graphotactics.
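The relationship between a grapheme and its allographs can be illustrated with a short Python sketch. It borrows the positional forms from Unicode's Arabic Presentation Forms-B block purely as a convenient, publicly documented example; these code points are not the code table of the invention, and the names `ALLOGRAPHS` and `describe` are hypothetical.

```python
import unicodedata

# Positional allographs of two Arabic letters, taken from Unicode's
# Arabic Presentation Forms-B block (an illustration only, not the
# patent's own 8-bit code table).
ALLOGRAPHS = {
    "\u063A": {  # ARABIC LETTER GHAIN
        "isolated": "\uFECD",
        "final": "\uFECE",
        "initial": "\uFECF",
        "medial": "\uFED0",
    },
    "\u0628": {  # ARABIC LETTER BEH
        "isolated": "\uFE8F",
        "final": "\uFE90",
        "initial": "\uFE91",
        "medial": "\uFE92",
    },
}


def describe(grapheme):
    """Return the Unicode names of every allograph of one grapheme."""
    return [unicodedata.name(ch) for ch in ALLOGRAPHS[grapheme].values()]


for name in describe("\u063A"):
    print(name)
```

Each grapheme maps to four distinct characters, one per position, which is the sense in which the forms are treated as separate allographs.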
- Conventional solutions for representing Arabic characters include treating them either as graphics or as text. The most common current solution is to treat Arabic text documents as graphical images. Such a solution is platform independent, because pixel handling is platform independent.
- graphical representation of characters, however, requires a significant amount of information and calculation: a significant amount of memory is necessary, and a significant amount of time is needed to render the characters.
- Every letter is represented as one character regardless of its position, and a specialized operating system or program is capable of dealing with appropriate right-left directional representation and graphotactics.
- the platform-dependent nature of this solution creates significant limitations on its viability. While home pages or documents can be created using, for example, Arabic Windows (tm), such documents can only be viewed by computers having Arabic Windows (tm), or having a localized version of a browser capable of dealing with the directional representation and graphotactics.
- the present invention is a method of treating Arabic text as text, in a manner which is platform independent, efficient from a memory and CPU speed perspective, and compatible with any other Arabic representation system, and which enables benefits to be realized from advances in Roman script web typography.
- Arabic script is considered and handled in an allographic manner; every form of a particular Arabic letter is considered to be a different character to solve the graphotactics problem, and the text is preformatted in order to solve the problems typically associated with right-to-left direction.
- the present invention, therefore, allows Arabic to be treated by browsers as if it were English.
- the invention encodes Arabic text in such a way that viewing would require a font only.
- the invention has been implemented in a special text editor which essentially converts a conventional computer into an Arabic text editor.
- HTML pages which are created using the invention are viewable by any computer that has access to the necessary font.
- the font can either be downloaded to the machine or embedded in the HTML document as an object, otherwise known as a dynamic font.
- the present invention, therefore, includes a novel configuration for encoding Arabic text.
- the invention is embodied, therefore, as a method of displaying text from a computer readable medium, and an apparatus for displaying text from a computer readable medium, with the text being displayed onto a computer display.
- the method comprises the steps of storing text data in a graphemic format, and converting the graphemic text data comprising individual characters to allographic text data comprising individual characters using at least 8 bit encoding.
- the allographic text data is stored on a computer readable medium, and the allographic text data is then rendered on a display using a font.
- the text data is stored as Arabic script characters.
- the rendering step can comprise one-to-one mapping of the allographic text data to glyphs.
- An apparatus includes a storing device for storing text data in a graphemic format, converting means for converting the graphemic text data to allographic text data using at least 8 bit encoding, and a second storing device for storing the allographic text data.
- Rendering means are provided for rendering the allographic text data as text on a display using a font.
- Figure 2 illustrates a text rendering process according to the present invention
- Figure 3 illustrates Arabic text as rendered according to the present invention
- Figure 4 illustrates the text of Figure 3, but without having the necessary font applied
- Figure 5 is a conventional ASCII table; and Figure 6 is a table showing 8 bit representations of Arabic characters according to the invention.
- Arabic script, typically used for writing the Arabic language, can also be used for writing other languages such as Persian, Urdu, Pashto, Sindhi, and Kurdish. Arabic script appears as cursive writing, whether handwritten or printed. The resulting handwritten traditions are such that the same letter may be written in different forms, depending upon how the character joins with neighboring characters. Typically, the encoding is such that a total of 54-58 graphemes are encoded as approximately 109 allographs according to the present invention. These graphemes include 28 basic characters, 8 to 12 additional characters, 10 numerals, and 8 vowel characters. According to prior art such as Unicode, each letter receives only one Unicode character value, independent of the form.
- the Unicode standard encodes characters, and the characters are resident as memory strings in internal memory or on disk storage.
- When a character is rendered on a screen, it is called a glyph.
- a character can render as one glyph, or part of one glyph.
- the specific glyph corresponding to the appropriate letter form is determined by the character context.
- a repertoire of glyphs in Unicode comprises a font. Referring to Figure 1, the Unicode standard, as an example, contains information wherein characters are encoded, and stored on a disk as characters in a graphemic format.
- the characters are taken from memory and subjected to a text conversion/rendering process wherein a set of conversion rules and a font are applied to the characters on the disk, to render a document on the screen.
- the document on the screen is essentially displayed in real time as a series of glyphs.
- the present invention treats documents in a significantly different manner than the prior art text handling system such as that which is discussed above.
- a document which is stored on disk or in memory as a plurality of characters in a graphemic format is subjected to a conversion process wherein the graphemes are converted by a series of conversion rules into allographs, according to predetermined graphotactics.
- the resulting document is then stored on the disk as a series of allographs.
- a text rendering process is performed, wherein graphotactics are not used. Only one-to-one mapping is performed on the document to the screen, using a font.
- the conversion occurs only once per document, while the rendering occurs every time a user accesses the document.
- in the prior art, the conversion/rendering process wherein a grapheme is converted to a glyph occurs every time the document is accessed. This results in slower system operation and rendering time.
- the present invention solves this problem by using the first as well as the second halves of the extended ASCII table, which consists of 8 bits, thereby using space typically reserved for the ASCII character set.
- This provides the invention with the number of locations needed to represent at least the 109 allographs needed for good quality Arabic text publishing. This representation results in 8 bit encoding of the allographs.
- a character table according to the present invention is shown in Figure 6.
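A minimal Python sketch of the idea of an 8-bit allographic code table with one-to-one rendering follows. The byte values, the glyph names, and the names `CODE_TABLE`, `FONT`, `encode`, and `render` are all invented for illustration; the actual assignments are those of Figure 6, which is not reproduced here.

```python
# Hypothetical 8-bit code table: each allograph gets exactly one byte value.
# These values are made up for the sketch and do not come from Figure 6.
CODE_TABLE = {
    ("gayn", "isolated"): 0xC0,
    ("gayn", "initial"): 0xC1,
    ("gayn", "medial"): 0xC2,
    ("gayn", "final"): 0xC3,
}

# In this sketch a "font" is simply a byte -> glyph-name lookup.
FONT = {code: letter + "." + form for (letter, form), code in CODE_TABLE.items()}


def encode(allographs):
    """Pack (letter, form) pairs into one byte each."""
    return bytes(CODE_TABLE[a] for a in allographs)


def render(data):
    """Map stored bytes to glyph names one-to-one; no context rules are applied."""
    return [FONT[b] for b in data]


stored = encode([("gayn", "initial"), ("gayn", "medial"), ("gayn", "final")])
print(len(stored))     # one byte per allograph
print(render(stored))
```

Because the stored bytes already identify the positional form, the rendering step reduces to a table lookup.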
- the invention is directed to a method of and apparatus for displaying text from a computer memory onto a computer display, with the method comprising the steps of storing the text data in a graphemic format, then converting the graphemic text data, which is made up of individual characters, to allographic text data made up of individual characters, using at least 8-bit encoding.
- the allographic text data is then stored in a suitable memory.
- the final step is rendering the text on a display using a font. It should be noted that the present invention utilizes a rendering step wherein the allographic text data is mapped to glyphs in a one-to-one mapping configuration.
- the step of converting from graphemes to allographs includes a step of appropriately preformatting the text for right-to-left display.
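One plausible reading of the preformatting step is sketched below in Python: the text is wrapped into lines in logical order, and each line is then stored reversed and right-aligned so that a renderer that only lays text out left to right still shows it right to left. The source does not spell out the procedure, so the function name `preformat_rtl`, the fixed `width` parameter, and the wrapping strategy are assumptions; Latin placeholders stand in for allograph codes.

```python
def preformat_rtl(words, width):
    """Wrap logical-order words into lines, then reverse and right-align each line.

    Assumes every character is a single allograph code, so reversing the
    string character by character is safe in this sketch.
    """
    lines, current = [], ""
    for word in words:
        candidate = word if not current else current + " " + word
        if len(candidate) <= width or not current:
            current = candidate          # word still fits on the current line
        else:
            lines.append(current)        # line is full; start a new one
            current = word
    if current:
        lines.append(current)
    # Store each line in visual order, flush against the right margin.
    return [line[::-1].rjust(width) for line in lines]


for line in preformat_rtl(["abc", "de", "fgh"], 6):
    print(repr(line))
```

Line breaks are fixed at conversion time in this sketch, which is the trade-off of preformatting: the viewer needs no bidirectional logic, but the text does not reflow.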
- Documents which are prepared in this fashion can be made available for accessing by any computer on the internet through normal web server technology.
- a client or computer wishing to view the Arabic document would access the particular page by entering in the address or "URL" for the page.
- the specific Arabic character information or font information can be loaded as a dynamic font with the page information, so that the client computer receives the necessary encoding information to properly view the Arabic characters.
- the specific Arabic character information can be downloaded separately.
- Figure 3 is an illustration of Arabic text rendered as glyphs on a computer screen through an internet browser, with the Arabic text having been encoded according to the present invention.
- the text is, as noted previously, treated in a textual format, and is capable of being searched, edited, and linked.
- Figure 4 illustrates the text as stored on disk as allographs. When subjected to the text rendering process and font information illustrated in Figure 2, the glyphs of Figure 3 are rendered. If the text rendering process were not performed using the appropriate font, the allographs would appear on the screen as illustrated in Figure 4.
- the downloading of the font or the supply of the dynamic font to the computer upon which the Arabic text is rendered ensures appropriate viewing.
- An example of the conversion rules for converting the graphemes to allographs is shown in the attached code page.
- the code page, written in Visual Basic (tm), includes data regarding features of the characters that are used by the conversion rules to determine the allographs.
- the letter "gayn" has the features of being connectable to other letters, not acting as a space, not being a vowel, and having a different form, and therefore a different code, depending on whether the letter appears initially, medially, finally, or alone.
- complex determining software such as Arabic Windows (tm), which attempts to determine an appropriate form based upon a character's position, is not necessary.
- the invention is directed to a method and a computer system which displays text from a computer readable medium onto a computer display.
- the invention provides a new and unique way of preparing and viewing HTML documents which have non-Roman characters, such as Arabic.
- Call defineletter(comma, False, False, False, 53, S3, 63, 53)
- Call defineletter(semicolon, False, False, False, 52, S3, 53, 53)
- Call defineletter(period, False, False, False, 63, 63, 63)
- Call defineletter(questmark, False, False, False, 63, 63, 63, 63)
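The way a per-letter feature table could drive the grapheme-to-allograph conversion can be sketched in Python, loosely modeled on the code page's letter-definition calls. The argument order, the byte codes, and the names `define_letter` and `to_allographs` are assumptions made for illustration. The single `connects` flag is also a simplification: it treats a letter as joining on both sides or not at all.

```python
from collections import namedtuple

Letter = namedtuple(
    "Letter", "connects is_space is_vowel initial medial final alone"
)
LETTERS = {}


def define_letter(name, connects, is_space, is_vowel, initial, medial, final, alone):
    """Register one grapheme with its features and its four allograph codes."""
    LETTERS[name] = Letter(connects, is_space, is_vowel, initial, medial, final, alone)


# Hypothetical codes; the real values belong to the patent's code page.
define_letter("gayn", True, False, False, 0xC1, 0xC2, 0xC3, 0xC0)
define_letter("ba", True, False, False, 0xA1, 0xA2, 0xA3, 0xA0)
define_letter("space", False, True, False, 0x20, 0x20, 0x20, 0x20)


def to_allographs(names):
    """Pick the positional code of each letter from its neighbors' features.

    The is_space and is_vowel flags are stored but not consulted here.
    """
    out = []
    for i, name in enumerate(names):
        cur = LETTERS[name]
        prev_joins = i > 0 and cur.connects and LETTERS[names[i - 1]].connects
        next_joins = (
            i + 1 < len(names) and cur.connects and LETTERS[names[i + 1]].connects
        )
        if prev_joins and next_joins:
            out.append(cur.medial)
        elif prev_joins:
            out.append(cur.final)
        elif next_joins:
            out.append(cur.initial)
        else:
            out.append(cur.alone)
    return bytes(out)


print(to_allographs(["ba", "gayn", "ba", "space", "gayn"]).hex())
```

In this sketch the conversion runs once and its output is stored as bytes, so later viewing needs only the font lookup.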
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU17036/99A AU1703699A (en) | 1997-12-03 | 1998-12-03 | Preformatted allographic arabic text for html documents |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US6744297P | 1997-12-03 | 1997-12-03 | |
US60/067,442 | 1997-12-03 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO1999028831A1 (en) | 1999-06-10 |
WO1999028831A9 (en) | 1999-09-16 |
Family
ID=22076010
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1998/025201 WO1999028831A1 (en) | 1997-12-03 | 1998-12-03 | Preformatted allographic arabic text for html documents |
Country Status (2)
Country | Link |
---|---|
AU (1) | AU1703699A (en) |
WO (1) | WO1999028831A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1298629A2 (en) * | 2001-09-26 | 2003-04-02 | MAN Nutzfahrzeuge Aktiengesellschaft | Method of displaying arabic characters in a display of a vehicle |
DE10147541A1 (en) * | 2001-09-26 | 2003-04-10 | Man Nutzfahrzeuge Ag | Representation of Greek and Hebrew letters in a display of a motor vehicle |
WO2006021973A2 (en) * | 2004-08-23 | 2006-03-02 | Geneva Software Technologies Limited | A system and a method for a sim card based multi-lingual messaging application |
WO2008042845A1 (en) * | 2006-10-02 | 2008-04-10 | Google Inc. | Displaying original text in a user interface with translated text |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4176974A (en) * | 1978-03-13 | 1979-12-04 | Middle East Software Corporation | Interactive video display and editing of text in the Arabic script |
US5412771A (en) * | 1992-02-07 | 1995-05-02 | Signature Software, Inc. | Generation of interdependent font characters based on ligature and glyph categorizations |
US5556282A (en) * | 1994-01-18 | 1996-09-17 | Middlebrook; R. David | Method for the geographical processing of graphic language texts |
-
1998
- 1998-12-03 WO PCT/US1998/025201 patent/WO1999028831A1/en active Application Filing
- 1998-12-03 AU AU17036/99A patent/AU1703699A/en not_active Abandoned
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1298629A2 (en) * | 2001-09-26 | 2003-04-02 | MAN Nutzfahrzeuge Aktiengesellschaft | Method of displaying arabic characters in a display of a vehicle |
DE10147541A1 (en) * | 2001-09-26 | 2003-04-10 | Man Nutzfahrzeuge Ag | Representation of Greek and Hebrew letters in a display of a motor vehicle |
DE10147540A1 (en) * | 2001-09-26 | 2003-04-10 | Man Nutzfahrzeuge Ag | Representation of Arabic script in a display of a motor vehicle |
EP1298629A3 (en) * | 2001-09-26 | 2005-06-29 | MAN Nutzfahrzeuge Aktiengesellschaft | Method of displaying arabic characters in a display of a vehicle |
WO2006021973A2 (en) * | 2004-08-23 | 2006-03-02 | Geneva Software Technologies Limited | A system and a method for a sim card based multi-lingual messaging application |
WO2006021973A3 (en) * | 2004-08-23 | 2006-10-05 | Geneva Software Technologies L | A system and a method for a sim card based multi-lingual messaging application |
WO2008042845A1 (en) * | 2006-10-02 | 2008-04-10 | Google Inc. | Displaying original text in a user interface with translated text |
US7801721B2 (en) | 2006-10-02 | 2010-09-21 | Google Inc. | Displaying original text in a user interface with translated text |
US8095355B2 (en) | 2006-10-02 | 2012-01-10 | Google Inc. | Displaying original text in a user interface with translated text |
US8577668B2 (en) | 2006-10-02 | 2013-11-05 | Google Inc. | Displaying original text in a user interface with translated text |
US9547643B2 (en) | 2006-10-02 | 2017-01-17 | Google Inc. | Displaying original text in a user interface with translated text |
US10114820B2 (en) | 2006-10-02 | 2018-10-30 | Google Llc | Displaying original text in a user interface with translated text |
Also Published As
Publication number | Publication date |
---|---|
WO1999028831A9 (en) | 1999-09-16 |
AU1703699A (en) | 1999-06-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CA2421478C (en) | Method for selecting a font | |
US8707164B2 (en) | Integrated document viewer | |
US8201088B2 (en) | Method and apparatus for associating with an electronic document a font subset containing select character forms which are different depending on location | |
US5781714A (en) | Apparatus and methods for creating and using portable fonts | |
US6966029B1 (en) | Script embedded in electronic documents as invisible encoding | |
US6199080B1 (en) | Method and apparatus for displaying information on a computer controlled display device | |
US6565609B1 (en) | Translating data into HTML while retaining formatting and functionality for returning the translated data to a parent application | |
US8209600B1 (en) | Method and apparatus for generating layout-preserved text | |
US20140108897A1 (en) | Method and apparatus for document conversion | |
US20040160443A1 (en) | Method and apparatus for typographic glyph construction including a glyph server | |
CN105005472B (en) | The method and device of Uyghur Character is shown on a kind of WEB | |
JP2006114012A (en) | Optimized access to electronic document | |
US20040202352A1 (en) | Enhanced readability with flowed bitmaps | |
CN109933751B (en) | Image-text drawing method and device, computer-readable storage medium and computer equipment | |
WO1999028831A1 (en) | Preformatted allographic arabic text for html documents | |
CN102169478B (en) | For presenting the apparatus and method of multi-language text | |
CN111143749A (en) | Webpage display method, device, equipment and storage medium | |
Shirali-Shahreza et al. | Persian/arabic unicode text steganography | |
Mudur et al. | An architecture for the shaping of Indic texts | |
US20100017708A1 (en) | Information output apparatus, information output method, and recording medium | |
Shirali-Shahreza | Pseudo-space Persian/Arabic text steganography | |
Thomas et al. | Enhancing composite digital documents using xml-based standoff markup | |
Probets et al. | Substituting outline fonts for bitmap fonts in archived PDF files | |
Hardie | From legacy encodings to Unicode: the graphical and logical principles in the scripts of South Asia | |
JPS6385695A (en) | Serial character generation system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AK | Designated states | Kind code of ref document: A1. Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW |
| AL | Designated countries for regional patents | Kind code of ref document: A1. Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
| AK | Designated states | Kind code of ref document: C2. Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW |
| AL | Designated countries for regional patents | Kind code of ref document: C2. Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
| COP | Corrected version of pamphlet | Free format text: PAGES 9-11 AND 14-16, SEQUENCE LISTING, REPLACED BY NEW PAGES 9-11 AND 14-16; PAGES 4/6-6/6, DRAWINGS, REPLACED BY NEW PAGES 4/6-6/6; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE |
| REG | Reference to national code | Ref country code: DE. Ref legal event code: 8642 |
| NENP | Non-entry into the national phase | Ref country code: KR |
| 122 | Ep: pct application non-entry in european phase | |