WO1999028831A1 - Preformatted allographic arabic text for html documents - Google Patents

Publication number
WO1999028831A1
Authority
WO
WIPO (PCT)
Application number
PCT/US1998/025201
Inventor
Nizar Yahya Habash
Original Assignee
University Of Maryland
Application filed by University Of Maryland
Priority to AU17036/99A (AU1703699A)
Publication of WO1999028831A1
Publication of WO1999028831A9

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G06F40/129 Handling non-Latin characters, e.g. kana-to-kanji conversion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/53 Processing of non-Latin text

Abstract

A method and apparatus for displaying text from a computer readable medium onto a computer display stores text data in a graphemic format (21), then converts (23) the graphemic text data, being made up of individual characters, to allographic text data (24) which is also made up of individual characters, using at least 8 bit encoding. The allographic text data (24) is then stored. The allographic text data (24) is then rendered as text on a display using a font.

Description

TITLE OF THE INVENTION:
PREFORMATTED ALLOGRAPHIC ARABIC TEXT FOR HTML DOCUMENTS
BACKGROUND OF THE INVENTION:
Field of the Invention:
The present invention relates to a method and apparatus for rendering allographic Arabic text on internet or HTML (hyper-text markup language) documents such as home pages and web sites.
Description of the Related Art:
Typical home pages and web sites which are currently in use are designed for Roman or Latin script languages such as English. Arabic script, however, is very different from Roman script in a number of ways. In addition to the significant differences in the shape of the characters, Arabic is written from right to left instead of from left to right, which causes, of course, significant differences with respect to justification and line wrapping. Additionally, the form of Arabic letters depends upon the position of the letter within the word. The same Arabic letter may have a different form depending upon whether the letter or character is a first character, a middle character, or an end character, or if the character stands alone. The forms are typically referred to as initial, medial, final, and stand-alone. These multiple forms are referred to as allographs. The rules for mapping a letter or grapheme into its allographs are called graphotactics. Conventional solutions for representing Arabic characters include treating them either as graphics or as text. Commonly, the most current solution is to treat Arabic text documents as graphical images. Such a solution is platform independent, due to the fact that pixel-handling is a platform independent phenomenon. However, since such graphical representation of characters requires a significant amount of information and calculation, a significant amount of memory is necessary, and a significant amount of time is necessary in order to render the characters. Additionally, graphical representation of text makes it difficult or impossible for the text to be appropriately searched by a search engine, or linked to other sites. The treatment of Arabic characters as text would seem to be the most desirable solution.
A text based system is efficient with respect to memory space and rendering time, and makes it possible to search and link as with other text documents. Existing encodings of Arabic include ISO (International Standards Organization), ASMO (Arab Standards and Metrology Organization), CP-1256 (Microsoft (tm) encoding for Arabic Windows (tm)), and Unicode. However, the current state of the art is such that a specific platform is needed in order to either create or view documents in Arabic. Each of the known encodings discussed above requires a special hardware or software configuration which must be prepared before any Arabic reading or writing can take place. Every letter is represented as one character regardless of its position, and a specialized operating system or program is capable of dealing with appropriate right-left directional representation and graphotactics. The platform-dependent nature of this solution, however, creates significant limitations on its viability. While home pages or documents can be created using, for example, Arabic Windows (tm), such documents can only be viewed by computers having Arabic Windows (tm), or having a localized version of a browser capable of dealing with the directional representation and graphotactics.
SUMMARY OF THE INVENTION:
The present invention, therefore, is a method of treating Arabic text as text, in a manner which is platform independent, efficient from a memory and CPU speed perspective, is compatible with any other Arabic representation system, and enables benefits to be realized from advances in Roman script web typography. These advantages are created by a method wherein Arabic script is considered and handled in an allographic manner; every form of a particular Arabic letter is considered to be a different character to solve the graphotactics problem, and the text is preformatted in order to solve the problems which are typically associated with the right-left direction problem. The present invention, therefore, allows Arabic to be treated by browsers as if it were English. The invention encodes Arabic text in such a way that viewing would require a font only. The invention has been implemented in a special text editor which essentially converts a conventional computer into an Arabic text editor. HTML pages which are created using the invention are viewable by any computer that has access to the necessary font. The font can either be downloaded to the machine or embedded in the HTML document as an object, otherwise known as a dynamic font. The present invention, therefore, includes a novel configuration for encoding Arabic text.
The invention is embodied, therefore, as a method of displaying text from a computer readable medium, and an apparatus for displaying text from a computer readable medium, with the text being displayed onto a computer display. The method comprises the steps of storing text data in a graphemic format, and converting the graphemic text data comprising individual characters to allographic text data comprising individual characters using at least 8 bit encoding. The allographic text data is stored on a computer readable medium, and the allographic text data is then rendered on a display using a font. In a preferred embodiment, the text data is stored as Arabic script characters. Furthermore, the rendering step can comprise one-to-one mapping of the allographic text data to glyphs. An apparatus according to the present invention includes a storing device for storing text data in a graphemic format, converting means for converting the graphemic text data to allographic text data using at least 8 bit encoding, and a second storing device for storing the allographic text data. Rendering means are provided for rendering the allographic text data as text on a display using a font.
BRIEF DESCRIPTION OF THE DRAWINGS:
Figure 1 illustrates a text rendering process according to the prior art;
Figure 2 illustrates a text rendering process according to the present invention;
Figure 3 illustrates Arabic text as rendered according to the present invention;
Figure 4 illustrates the text of Figure 3, but without having the necessary font applied;
Figure 5 is a conventional ASCII table; and
Figure 6 is a table showing 8 bit representations of Arabic characters according to the invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS:
In order to more clearly understand the present invention, a brief discussion of textual representation and terminology is appropriate. Writing systems, in any language, are broken down into units; graphemes are known as the smallest units in any writing system which are capable of causing a contrast in meaning. In the English alphabet, therefore, switching from "hook" to "book" introduces a meaning change. Therefore, "h" and "b" are each graphemes. There is no prescribed form for a grapheme. The grapheme for a particular letter or sound may appear as a capital letter, a small letter, a parenthetical notation, or other form depending upon the particular handwriting style or type face which is chosen. Each of the possible forms for a grapheme is known as a graph. When graphs are considered to be variations of a particular grapheme, they are considered to be "allographs."
Arabic script, typically used for writing the Arabic language, can also be used for writing other languages such as Persian, Urdu, Pashto, Sindhi, and Kurdish. Arabic script appears as cursive writing, whether handwritten or printed. The resulting handwritten traditions are such that the same letter may be written in different forms, depending upon how the character joins with neighboring characters. Typically, the encoding is such that a total of 54-58 graphemes are encoded as approximately 109 allographs according to the present invention. These graphemes include 28 basic characters, 8 to 12 additional characters, 10 numerals, and 8 vowel characters. According to prior art such as Unicode, each letter receives only one Unicode character value, independent of the form. The Unicode standard encodes characters, and the characters are resident as memory strings in internal memory or on disk storage. When a character is rendered on a screen, it is called a glyph. A character can render as one glyph, or part of one glyph. The specific glyph corresponding to the appropriate letter form (initial, medial, final, etc.) is determined by the character context. A repertoire of glyphs in Unicode comprises a font. Referring to Figure 1, the Unicode standard, as an example, contains information wherein characters are encoded, and stored on a disk as characters in a graphemic format. When text is sought to be rendered on the screen, the characters are taken from memory and subjected to a text conversion/rendering process wherein a set of conversion rules and a font are applied to the characters on the disk, to render a document on the screen. The document on the screen is essentially displayed in real time as a series of glyphs.
Referring to Figure 2, however, it can be seen that the present invention treats documents in a significantly different manner than the prior art text handling system such as that which is discussed above. According to the invention, a document which is stored on disk or in memory as a plurality of characters in a graphemic format is subjected to a conversion process wherein the graphemes are converted by a series of conversion rules into allographs, according to predetermined graphotactics. The resulting document is then stored on the disk as a series of allographs. When a user accesses the document for the purposes of reading it, a text rendering process is performed, wherein graphotactics are not used. Only one-to-one mapping is performed on the document to the screen, using a font. According to this process, the conversion occurs only once per document, while the rendering occurs every time a user accesses the document. According to the prior art of Figure 1, however, the conversion/rendering process wherein a grapheme is converted to a glyph occurs every time the document is accessed. This results in slower system operation and rendering time.
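The split between a one-time conversion and a rule-free rendering step can be sketched as follows. This is a minimal Python illustration and not the code of the invention: the names `FONT`, `store` and `render` and the glyph names are assumptions introduced here, and only the four numeric codes are taken from the values the appended listing gives for the forms of "gayn".

```python
from typing import Dict, List

# Hypothetical font table (an assumption for illustration): exactly one glyph
# name per stored allograph code. The four codes are the ones the appended
# listing assigns to the initial, medial, final and alone forms of "gayn".
FONT: Dict[int, str] = {
    105: "gayn-initial",
    106: "gayn-medial",
    107: "gayn-final",
    108: "gayn-alone",
}


def store(allograph_codes: List[int]) -> bytes:
    """Pack already-converted allograph codes, one 8 bit value per allograph."""
    return bytes(allograph_codes)  # raises ValueError for codes outside 0..255


def render(stored: bytes) -> List[str]:
    """Render by plain table lookup; no context rules are consulted."""
    return [FONT[code] for code in stored]


document = store([105, 106, 107])  # conversion result, produced once
first_view = render(document)      # repeated on every access
second_view = render(document)
```

Under these assumptions every access repeats only the lookup in `render`, while the context-dependent choice of forms has already been made before `store` is called.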
Attempts have been made to utilize a process similar to the process of Figure 2, however using only 7 bits of a conventional 8 bit code page such as an extended ASCII table. A conventional extended ASCII table is shown in Figure 5. Specifically, the allographic characters in this previous attempt were assigned locations in the second half of the extended ASCII table, keeping the first half of the table identical to the ASCII character set. Since appropriate Arabic text requires a minimum of 109 allographs, however, this 7 bit attempt was too limited. It ignored specific characters such as vowel marks, and required combining of certain letter forms. This resulted in a system with limited Arabic text representation ability. For example, Koranic text requires more ability than this 7 bit system provided. The present invention solves this problem by using the first as well as the second halves of the extended ASCII table, which consists of 8 bits, thereby using space typically reserved for the ASCII character set. This provides the invention with the number of locations needed to represent at least the 109 allographs needed for good quality Arabic text publishing. This representation results in 8 bit encoding of the allographs. A character table according to the present invention, therefore, is shown in Figure 6.
The invention, therefore, is directed to a method of and apparatus for displaying text from a computer memory onto a computer display, with the method comprising the steps of storing the text data in a graphemic format, then converting the graphemic text data, which is made up of individual characters, to allographic text data made up of individual characters, using at least 8-bit encoding. The allographic text data is then stored in a suitable memory. The final step is rendering the text on a display using a font. It should be noted that the present invention utilizes a rendering step wherein the allographic text data is mapped to glyphs in a one-to-one mapping configuration.
Additionally, in the case of Arabic text, the step of converting from graphemes to allographs includes a step of appropriately preformatting the text for right-to-left display. Documents which are prepared in this fashion can be made available for accessing by any computer on the internet through normal web server technology. A client or computer wishing to view the Arabic document would access the particular page by entering in the address or "URL" for the page. When the page begins loading, the specific Arabic character information or font information can be loaded as a dynamic font with the page information, so that the client computer receives the necessary encoding information to properly view the Arabic characters. In the alternative, the specific Arabic character information can be downloaded separately. Any computer capable of viewing documents with dynamic fonts, therefore, will be capable of viewing the Arabic documents, and will not require a specific hardware/software platform or any plug-ins. Figure 3 is an illustration of Arabic text rendered as glyphs on a computer screen through an internet browser, with the Arabic text having been encoded according to the present invention. The text is, as noted previously, treated in a textual format, and is capable of being searched, edited, and linked. Figure 4 illustrates the text as stored on disk as allographs. When subjected to the text rendering process and font information illustrated in Figure 2, the glyphs of Figure 3 are rendered. If the text rendering process were not performed using the appropriate font, the allographs would appear on the screen as illustrated in Figure 4. The downloading of the font or the supply of the dynamic font to the computer upon which the Arabic text is rendered ensures appropriate viewing. An example of the conversion rules for converting the graphemes to allographs is shown in the attached code page.
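The description does not spell out how the preformatting for right-to-left display is carried out. One plausible reading, sketched here in Python purely as an assumption, is that the converted text is broken into lines and each line is stored in reverse (visual) order, so that a browser laying characters out from left to right shows them in right-to-left reading order. The function name `preformat_rtl`, the fixed line width measured in characters, and the use of code 32 as the space are all hypothetical.

```python
from typing import List

SPACE = 32  # assumed code for the space character


def preformat_rtl(words: List[List[int]], width: int) -> List[List[int]]:
    """Greedily wrap words (lists of allograph codes) into lines of at most
    `width` codes, then reverse each line into visual order.

    A word longer than `width` is placed on a line of its own.
    """
    lines: List[List[int]] = []
    current: List[int] = []
    for word in words:
        needed = len(word) + (1 if current else 0)
        if current and len(current) + needed > width:
            lines.append(current)  # line is full, start a new one
            current = []
        if current:
            current.append(SPACE)
        current.extend(word)
    if current:
        lines.append(current)
    # Reverse every line so that the first logical character ends up rightmost.
    return [list(reversed(line)) for line in lines]


wrapped = preformat_rtl([[105, 107], [101, 103], [90]], width=5)
```

In this sketch line breaking happens before reversal, so words keep their logical order across lines while each individual line is mirrored. Right alignment of the lines in the HTML page is left out.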
The code page, written in Visual Basic (tm), includes data regarding features of the characters that are used by the conversion rules to determine the allographs. For example, the letter "gayn" has the features of being connectable to other letters, does not act as a space, is not a vowel, and has a different form and therefore a different code for when the letter appears initially, medially, finally, or alone. Because the present invention is capable of treating the different forms of each letter as a different character, complex determining software, such as Arabic Windows (tm) which attempts to determine an appropriate form based upon a character's position, is not necessary.
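The way such per-letter features could drive the choice of allograph can be sketched as follows. This Python fragment is an illustration under stated assumptions, not a transcription of the Visual Basic listing: the `Letter` record mirrors the fields named in the listing, the codes for "ayn", "gayn" and "zaay" are the values printed there, and the selection rule (a letter joins its successor only if its own `connects` flag is set) is an assumed reading of that flag. Vowel marks and spaces inside a word are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Letter:
    connects: bool  # assumed meaning: joins to the letter that follows it
    isspace: bool
    isvowel: bool
    init: int
    mid: int
    final: int
    alone: int


# Three entries with the values printed in the appended listing.
LETTERS: Dict[str, Letter] = {
    "ayn": Letter(True, False, False, 101, 102, 103, 104),
    "gayn": Letter(True, False, False, 105, 106, 107, 108),
    "zaay": Letter(False, False, False, 90, 90, 90, 90),
}


def to_allographs(word: List[str]) -> List[int]:
    """Pick one allograph code per letter name from its neighbours in the word."""
    codes: List[int] = []
    prev_connects = False
    for index, name in enumerate(word):
        letter = LETTERS[name]
        joins_next = letter.connects and index + 1 < len(word)
        if prev_connects and joins_next:
            codes.append(letter.mid)
        elif prev_connects:
            codes.append(letter.final)
        elif joins_next:
            codes.append(letter.init)
        else:
            codes.append(letter.alone)
        prev_connects = letter.connects
    return codes


example = to_allographs(["gayn", "ayn", "zaay", "gayn"])
```

Here "zaay" closes the joined run because its `connects` flag is false, so the "gayn" that follows it takes its alone form.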
As noted previously, the invention is directed to a method and a computer system which displays text from a computer readable medium onto a computer display. In view of the recent popularity of internet publishing and HTML documents, the invention provides a new and unique way of preparing and viewing HTML documents which have non-Roman characters, such as Arabic. The above discussion of the embodiments of the invention is illustrative in nature, and it would be evident to a person of ordinary skill in the art that a number of modifications could be made to the invention while still remaining within the spirit and scope of the invention. For example, although Arabic text and Arabic characters are discussed, the invention could be used on other languages or other character sets which have similarities to Arabic text and Arabic characters.
For a more clear understanding of the metes and bounds of the present invention, reference should be made to the appended claims.
Global Const kaaf% = 223
Global Const laam% = 225
Global Const miim% = 227
Global Const nuun% = 228
Global Const haa% = 229
Global Const waw% = 230
Global Const alifmaqsura% = 236
Global Const yaa% = 237
Global Const fatha% = 243
Global Const kasra% = 246
Global Const damma% = 245
Global Const fathatanwiin% = 240
Global Const kasratanwiin% = 242
Global Const dammatanwiin% = 241
Global Const shadda% = 248
Global Const sukuun% = 250
Global Const comma% = 151
Global Const semicolon% = 186
Global Const period% = 220
Global Const questmark% = 191
Global Const exclamark% = 33
Global Const laam_alif% = 1
Global Const laam_alifmadda% = 2
Global Const laam_alifhamzaup% = 3
Global Const laam_alifhamzadn% = 4
Global Const qaaf_ acca% = 5
Global Const giin% = 5
Global Const vii% = 7
Global Const pii% = 8
Global Const oo% = 9
Global Const ee% = 11
Global Const alif_taa% = 12
Global Const alif_saad% = 14
Global Const ctaarαark% = 15
Global Const revshadda% = 16
'Copyright (c) 1997 Nizar Habash
'University of Maryland at College Park
'All Rights Reserved
Sub defineletter (x, xconnects, xisspace, xisvowel, xinit, xmid, xfinal, xalone)
  letter(x).connects = xconnects
  letter(x).isspace = xisspace
  letter(x).isvowel = xisvowel
  letter(x).init = xinit
  letter(x).mid = xmid
  letter(x).final = xfinal
  letter(x).alone = xalone
End Sub
Sub defineletters ()
'Copyright (c) 1997 Nizar Habash
'University of Maryland at College Park
'All Rights Reserved
' initialize
For x = 1 To 255
  Call defineletter(x, False, True, False, x, x, x, x)
Next
' defineletter (hamza, connects, isspace, isvowel, init, mid, final, alone)
Call defineistcer (13 , False, True, False, 1 , 13, 13, 13)
Call defiπeieczer (hamza, False, True, False, 136, 135, 1 45, 35)
Call definelecrer(ali±nadda. False, False, False, S3,' 69, 62, 68)
Call defineiecrer {aiif amzaup , False, False, False, 6ό, 67, 67, 66)
Call defir-e ecrer (wawπamza. False, False, False, 140, 14C,'l4θ' 140)
Call defir-elec-ar (alif-ι?τπracn. False, False, False, 70, 71, 7 ' 70)
Call definelac-er(yaa fiπιza. True, False, False, 137, 137, ~l3a7' 139)
Call defiaelec-e (aii , False, False, False, 54, 55, 65, 54)
Call de ir-elec ar (baa, True, False, False, 72, 72, 73, 73)
Call defir-eiecrer ( caamarbuca, False, False, False, 132, 133, 133, 132)
Call defir.ele.::.ar(caa. True, False, Falsa, 74, 74, 75, 75)
Call definele^-ar (thaa. True, False, False, 7S, 75, 77, 77)
Call defir.elecrer (jiis, True, False, False, 73, 73, 79, 80)
Call defi2.elet-.rer (cbaa, True, False, False, 31, 31, 32, 33)
Call defineleϊzer ( baa. True, False, False, 84, 84, 35, 36)
Call defirieieccar (daa . False, False, False, 37, 37, 37, 37)
Call defir-elaccar (ciaal , FaJ.se, False, Falsa, 38, 83, 33, 38)
Call defir-eiec ar (raa. False, False, False, 89, 89, 39, 89)
Call definele e (zaay, False, False, False, 90, 90, 90, 90 )
Call defir-elecze (siir.. True, False, False, 91, 91, 92, 92)
Call defizvβiec er (sbiiTi, True, False, False, 93, 93, 94, 94)
Call άefir-eleccar (saa , True, False, False, 95, 95, 95, 95) Call defzLnele zer (daac. True, False, False, 97, 97, 98, 98)
Call defi-π.elec~ar (c~a , True, False, False, 99, 99, 99, 99) Call defir-e az-ar (c a , True, False, False, 100, 100, 100, 100)
Call defir.elac er (ayr.. True, False, False, 101, 102, 103, 104) Call defi elerzer (gayn, True, False, False, 105, 106, 107, 108)
Call defire ec-er ( f a, True, False, False, 109, 110, 111, ill) Call defineleczer (ςaaf , True, False, False, 112, 113, 114, 114)
Call definelec ar (kaaf, True, False, False, 115, 115, 115, us)
Call define ec-er (laa . True, False, False, 117, 117, 118, 118)
Call defiteleczer (rniix, True, False, False, 119, 119, 120, 120)
Call da inelecrer (r-ur.. True, False, False, 121, 121, 122, 122)
Call defiiielaczarCnaa, True, False, False, 122, 124, 125, 126)
Call definelec ar (waw, False, False, False, 123, 123, 128, 123)
Call defineleczer (alifnaqs ra. False, False, False, 135, 134, 134, 1 )
Call defi-αelec ar (yaa, True, False, False, 129, 129, 130, 131)
Call deficeieczar (facia, False, False, True, 170, 170, 170, 170) Call defineleczer ( asra. False, False, True, 172, 172, 172, 172) Call definelecrer (Gamm , False, False, True, 171, 171, 17"! , ι - ι \
Call definelecrar (shadda, False, Falsa, False, 62, 63, 63, 63) Call defi elac er (sukur., False, False, False, 63, 63, 63, 63)
Call defir.eletrer (coπcia, False, False, False, 53, S3, 63, 535 Call defi-ele"a ( semicolon. False, False, False, 52, S3, 53, 53) Call defineleeeer(period. False, False, False, 63, 63, 63, 63) Call defineleceer(questSDark, False, F-alse, False, 63, 63, 63, 63)
Call defineleeeer(spaceb.ar, False, True, F-alse, 32, 32, 32, 32)
Call defineleeeer(laam_aiif, False, False, False, 141, 142, 142, 141) ■Call defineleeeer(laω_alif-nadda, F-alse, False, False, 145, 146, 146, 145) Call defineleeeer(laam_alifhamzaup. False, False, False, 143, 144, 144, 143) Call definelecc-3r(laam_εLlifhaιnza«dϊi, F.alse, False, False, 147, 143, 148, 147)
Call defineleceer(qaaf_h.amza. True, False, False, 149, 150, 151, 151) Call defineleeeer(gii . True, False, False, 152, 152, 153, 154) Call defineleeeer{vii, True, False, False, 155, 155, 156, 159) Call defineleeear.pii. True, False, False, 162, 152, 163, 163) Call defineleecer(oo, False, False, False, 164, 154, 154, 164) Call defineleeeer(ee. True, False, False, 165, 165, 165, 167) Call defineleceer(alif_eaa, False, True, False, 158, 168, 168, 168) Call defineleeeer(alif_saad. False, False, True, 169, 159, 159, 169)
Call defineieccertceaamark, False, False, True, 182, 182, 182, 182) Call defineieceertrevs adda, False, False, True, 131, 131, 181, 181)
End Sub
Function flip (x As String)
    For i = Len(x) To 1 Step -1
        y = y + Mid(x, i, 1)
    Next
    flip = y
End Function
Function flip2 (x As String)
    x = flip(x)
    done = False
    While Not done
        pos = InStr(1, x, Chr(13))
        If pos = 0 Then
            y = x + Chr(13) + Chr(10) + y
            done = True
        Else
            If pos = Len(x) Then
                y = Mid(x, 1, pos - 1) + Chr(13) + Chr(10) + y
                done = True
            Else
                y = Mid(x, 1, pos - 1) + Chr(13) + Chr(10) + y
                x = Mid(x, pos + 1, Len(x) - pos)
            End If
        End If
    Wend
    flip2 = y
End Function
Function substitute (a, x, b)
    If (a = -1) Then
        before_connect = False
    Else
        before_connect = letter(a).connects
    End If
    If (b = -1) Then
        after_space = True
    Else
        after_space = letter(b).isspace
    End If
    If (before_connect = False) Then
        If (after_space = False) Then
            substitute = letter(x).init
            Exit Function
        Else
            substitute = letter(x).alone
            Exit Function
        End If
    Else
        If (after_space = False) Then
            substitute = letter(x).mid
            Exit Function
        Else
            substitute = letter(x).final
            Exit Function
        End If
    End If
End Function
Function catchpattern (a, b)
    c = -1
    If a = 13 And b = 10 Then c = 13
    If a = laam Then
        Select Case b
            Case alif: c = laam_alif
            Case alifhamzaup: c = laam_alifhamzaup
            Case alifhamzadn: c = laam_alifhamzadn
            Case alifmadda: c = laam_alifmadda
        End Select
    End If
    If a = qaaf And b = exclamark Then c = qaaf_hamza
    If a = jiim And b = exclamark Then c = giim
    If a = faa And b = exclamark Then c = vii
    If a = baa And b = exclamark Then c = pii
    If a = waw And b = exclamark Then c = oo
    If a = yaa And b = exclamark Then c = ee
    If a = saad And b = exclamark Then c = alif_saad
    If a = taa And b = exclamark Then c = alif_taa
    If a = shiin And b = exclamark Then c = revshadda
    If a = ttaa And b = exclamark Then c = ttaamark
    catchpattern = c
End Function
Sub catchpatterns ()
    textone = ""
    a = Asc(Mid(arabic.Text, 1, 1))
    For i = 2 To Len(arabic.Text)
        found = False
        a = Asc(Mid(arabic.Text, i - 1, 1))
        b = Asc(Mid(arabic.Text, i, 1))
        c = -1
        c = catchpattern(a, b)
        If c > -1 Then found = True
        If found Then
            textone = textone + Chr(c)
            i = i + 1
            If i = Len(arabic.Text) Then
                textone = textone + Mid(arabic.Text, i, 1)
            End If
        Else
            textone = textone + Chr(a)
        End If
    Next
    If found = False Then
        textone = textone + Chr(b)
    End If
End Sub
Sub Command1_Click ()
    convert
    text1.Text = texttwo
    text2.Text = newtext
End Sub
Sub convert ()
    'textone = arabic.text
    texttwo = ""
    If Len(arabic.Text) > 0 Then
        Call catchpatterns
        a = -1
        For i = 1 To Len(textone)
            x = Asc(Mid(textone, i, 1))
            If i < Len(textone) Then
                b = Asc(Mid(textone, i + 1, 1))
                If (letter(b).isvowel And i + 2 <= Len(textone)) Then
                    b = Asc(Mid(textone, i + 2, 1))
                End If
            Else
                b = -1
            End If
            z = substitute(a, x, b)
            texttwo = texttwo + Chr(z)
            a = x
        Next
        newtext = flip2(texttwo)
        clipboard.SetText newtext
    End If
End Sub
Sub Form_Load ()
    Call defineletters
    TITLE.Visible = True
    TITLE.Enabled = True
End Sub
Function nums (x As String)
    z = ""
    For i = 1 To Len(x)
        z = z + Str(Asc(Mid(x, i, 1)))
Figure imgf000017_0001
End Function
Sub oldconvert ()
    'textone = arabic.text
    'latin.Text = ""
    texttwo = ""
    Call catchpatterns
    a = -1
    For i = 1 To Len(textone)
        x = Asc(Mid(textone, i, 1))
        If i < Len(textone) Then
            b = Asc(Mid(textone, i + 1, 1))
            If (letter(b).isvowel And i + 2 <= Len(textone)) Then
                b = Asc(Mid(textone, i + 2, 1))
            End If
        Else
            b = -1
        End If
        z = substitute(a, x, b)
        texttwo = texttwo + Chr(z)
    Next
    'latin.Text = flip(texttwo)
    'Copyright (c) 1997 Nizar Habash
    'University of Maryland at College Park
    'All Rights Reserved
End Sub
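
The contextual form selection performed by the listing's substitution routine can be illustrated with a minimal sketch. It is written in Python rather than the listing's Visual Basic, and the `Letter` record, the three-entry `LETTERS` table, the placeholder glyph names, and the `to_allographs` helper are all assumptions made for illustration; they are not the identifiers or font positions of the listing. The sketch also omits the listing's ligature pass, its look-ahead past vowel marks, and the final reversal into visual order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Letter:
    connects: bool  # True if the letter joins to the letter that follows it
    isspace: bool   # True if the letter acts as a word boundary
    init: str       # glyph used at the start of a connected run
    mid: str        # glyph used inside a connected run
    final: str      # glyph used at the end of a connected run
    alone: str      # glyph used when the letter stands by itself


# Hypothetical table: "b" is a connecting letter, "a" is non-connecting,
# and " " is a space. The strings stand in for font positions.
LETTERS = {
    "b": Letter(True, False, "b_init", "b_mid", "b_final", "b_alone"),
    "a": Letter(False, False, "a_alone", "a_final", "a_final", "a_alone"),
    " ": Letter(False, True, " ", " ", " ", " "),
}


def substitute(prev, cur, nxt):
    """Pick the allograph of cur from its neighbours (None means no neighbour)."""
    before_connect = prev is not None and LETTERS[prev].connects
    after_space = nxt is None or LETTERS[nxt].isspace
    entry = LETTERS[cur]
    if not before_connect:
        return entry.alone if after_space else entry.init
    return entry.final if after_space else entry.mid


def to_allographs(text):
    """Convert graphemic text into a list of allograph names, in logical order."""
    result = []
    for i, cur in enumerate(text):
        prev = text[i - 1] if i > 0 else None
        nxt = text[i + 1] if i + 1 < len(text) else None
        result.append(substitute(prev, cur, nxt))
    return result
```

Under these assumptions, `to_allographs("bab")` selects the initial form of the first letter, the joined form of the non-connecting letter after it, and the isolated form of the last letter, because the preceding letter does not connect forward.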

Claims

1. A method of displaying text from a computer readable medium onto a computer display, said method comprising the steps of: storing text data in a graphemic format; converting the graphemic text data comprising individual characters to allographic text data comprising individual characters using at least 8 bit encoding; storing the allographic text data on a computer readable medium; rendering the allographic text data on a display using a font.
2. A method as recited in claim 1, wherein said text data is stored as Arabic script characters.
3. A method as recited in claim 2, wherein the rendering step comprises one-to-one mapping of the allographic text data to glyphs.
4. An apparatus for displaying text from a computer readable medium onto a computer display, said apparatus comprising: a storing device for storing text data in a graphemic format; converting means for converting the graphemic text data comprising individual characters to allographic text data comprising individual characters, using at least 8 bit encoding; a second storing device for storing the allographic text data; rendering means for rendering the allographic text data as text on a display using a font.
5. An apparatus as recited in claim 4, wherein said first storing device stores the text data in a graphemic format as Arabic script characters.
6. An apparatus as recited in claim 5, wherein the rendering means comprises mapping means for one-to-one mapping of the allographic text data to glyphs.
PCT/US1998/025201 1997-12-03 1998-12-03 Preformatted allographic arabic text for html documents WO1999028831A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU17036/99A AU1703699A (en) 1997-12-03 1998-12-03 Preformatted allographic arabic text for html documents

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US6744297P 1997-12-03 1997-12-03
US60/067,442 1997-12-03

Publications (2)

Publication Number Publication Date
WO1999028831A1 WO1999028831A1 (en) 1999-06-10
WO1999028831A9 WO1999028831A9 (en) 1999-09-16

Family

ID=22076010

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1998/025201 WO1999028831A1 (en) 1997-12-03 1998-12-03 Preformatted allographic arabic text for html documents

Country Status (2)

Country Link
AU (1) AU1703699A (en)
WO (1) WO1999028831A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4176974A (en) * 1978-03-13 1979-12-04 Middle East Software Corporation Interactive video display and editing of text in the Arabic script
US5412771A (en) * 1992-02-07 1995-05-02 Signature Software, Inc. Generation of interdependent font characters based on ligature and glyph categorizations
US5556282A (en) * 1994-01-18 1996-09-17 Middlebrook; R. David Method for the geographical processsing of graphic language texts

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1298629A2 (en) * 2001-09-26 2003-04-02 MAN Nutzfahrzeuge Aktiengesellschaft Method of displaying arabic characters in a display of a vehicle
DE10147541A1 (en) * 2001-09-26 2003-04-10 Man Nutzfahrzeuge Ag Representation of Greek and Hebrew letters in a display of a motor vehicle
DE10147540A1 (en) * 2001-09-26 2003-04-10 Man Nutzfahrzeuge Ag Representation of Arabic script in a display of a motor vehicle
EP1298629A3 (en) * 2001-09-26 2005-06-29 MAN Nutzfahrzeuge Aktiengesellschaft Method of displaying arabic characters in a display of a vehicle
WO2006021973A2 (en) * 2004-08-23 2006-03-02 Geneva Software Technologies Limited A system and a method for a sim card based multi-lingual messaging application
WO2006021973A3 (en) * 2004-08-23 2006-10-05 Geneva Software Technologies L A system and a method for a sim card based multi-lingual messaging application
WO2008042845A1 (en) * 2006-10-02 2008-04-10 Google Inc. Displaying original text in a user interface with translated text
US7801721B2 (en) 2006-10-02 2010-09-21 Google Inc. Displaying original text in a user interface with translated text
US8095355B2 (en) 2006-10-02 2012-01-10 Google Inc. Displaying original text in a user interface with translated text
US8577668B2 (en) 2006-10-02 2013-11-05 Google Inc. Displaying original text in a user interface with translated text
US9547643B2 (en) 2006-10-02 2017-01-17 Google Inc. Displaying original text in a user interface with translated text
US10114820B2 (en) 2006-10-02 2018-10-30 Google Llc Displaying original text in a user interface with translated text

Also Published As

Publication number Publication date
WO1999028831A9 (en) 1999-09-16
AU1703699A (en) 1999-06-16

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
AK Designated states

Kind code of ref document: C2

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: C2

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

COP Corrected version of pamphlet

Free format text: PAGES 9-11 AND 14-16, SEQUENCE LISTING, REPLACED BY NEW PAGES 9-11 AND 14-16; PAGES 4/6-6/6, DRAWINGS, REPLACED BY NEW PAGES 4/6-6/6; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase

Ref country code: KR

122 Ep: pct application non-entry in european phase