WO1999028831A9 - Preformatted allographic arabic text for html documents - Google Patents

Preformatted allographic arabic text for html documents

Info

Publication number
WO1999028831A9
Authority
WO
WIPO (PCT)
Prior art date
Application number
PCT/US1998/025201
Other languages
French (fr)
Other versions
WO1999028831A1 (en)
Inventor
Nizar Yahya Habash
Original Assignee
Univ Maryland
Nizar Yahya Habash
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Maryland, Nizar Yahya Habash
Priority to AU17036/99A
Publication of WO1999028831A1
Publication of WO1999028831A9

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G06F40/129 Handling non-Latin characters, e.g. kana-to-kanji conversion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/53 Processing of non-Latin text

Abstract

A method and apparatus for displaying text from a computer readable medium onto a computer display stores text data in a graphemic format (21), then converts (23) the graphemic text data, being made up of individual characters, to allographic text data (24) which is also made up of individual characters, using at least 8 bit encoding. The allographic text data (24) is then stored. The allographic text data (24) is then rendered as text on a display using a font.

Description

TITLE OF THE INVENTION:
PREFORMATTED ALLOGRAPHIC ARABIC TEXT FOR HTML DOCUMENTS
BACKGROUND OF THE INVENTION:
Field of the Invention:
The present invention relates to a method and apparatus for rendering allographic Arabic text on internet or HTML (hyper-text markup language) documents such as home pages and web sites.
Description of the Related Art:
Typical home pages and web sites which are currently in use are designed for Roman or Latin script languages such as English. Arabic script, however, is very different from Roman script in a number of ways. In addition to the significant differences in the shape of the characters, Arabic is written from right to left instead of from left to right, which causes significant differences with respect to justification and line wrapping. Additionally, the form of Arabic letters depends upon the position of the letter within the word. The same Arabic letter may have a different form depending upon whether the letter or character is a first character, a middle character, or an end character, or if the character stands alone. The forms are typically referred to as initial, medial, final, and stand-alone. These multiple forms are referred to as allographs. The rules for mapping a letter or grapheme into its allographs are called graphotactics.
Conventional solutions for representing Arabic characters include treating them either as graphics or as text. The most common current solution is to treat Arabic text documents as graphical images. Such a solution is platform independent, because pixel handling is platform independent. However, since graphical representation of characters requires a significant amount of information and calculation, it demands a significant amount of memory and a significant amount of time to render the characters. Additionally, graphical representation of text makes it difficult or impossible for the text to be appropriately searched by a search engine, or linked to other sites. The treatment of Arabic characters as text would seem to be the most desirable solution. A text based system is efficient with respect to memory space and rendering time, and makes it possible to search and link as with other text documents. Existing encodings of Arabic include ISO (International Organization for Standardization), ASMO (Arab Standards and Metrology Organization), CP-1256 (Microsoft (tm) encoding for Arabic Windows (tm)), and Unicode. However, the current state of the art is such that a specific platform is needed in order to either create or view documents in Arabic. Each of the known encodings discussed above requires a special hardware or software configuration which must be prepared before any Arabic reading or writing can take place. Every letter is represented as one character regardless of its position, and a specialized operating system or program is capable of dealing with appropriate right-left directional representation and graphotactics. The platform-dependent nature of this solution, however, creates significant limitations on its viability. While home pages or documents can be created using, for example, Arabic Windows (tm), such documents can only be viewed by computers having Arabic Windows (tm), or having a localized version of a browser capable of dealing with the directional representation and graphotactics.
SUMMARY OF THE INVENTION:
The present invention, therefore, is a method of treating Arabic text as text, in a manner which is platform independent, efficient from a memory and CPU speed perspective, is compatible with any other Arabic representation system, and enables benefits to be realized from advances in Roman script web typography. These advantages are created by a method wherein Arabic script is considered and handled in an allographic manner; every form of a particular Arabic letter is considered to be a different character to solve the graphotactics problem, and the text is preformatted in order to solve the problems which are typically associated with the right-left direction problem. The present invention, therefore, allows Arabic to be treated by browsers as if it were English. The invention encodes Arabic text in such a way that viewing would require a font only. The invention has been implemented in a special text editor which essentially converts a conventional computer into an Arabic text editor. HTML pages which are created using the invention are viewable by any computer that has access to the necessary font. The font can either be downloaded to the machine or embedded in the HTML document as an object, otherwise known as a dynamic font. The present invention, therefore, includes a novel configuration for encoding Arabic text.
The invention is embodied, therefore, as a method of displaying text from a computer readable medium, and an apparatus for displaying text from a computer readable medium, with the text being displayed onto a computer display. The method comprises the steps of storing text data in a graphemic format, and converting the graphemic text data comprising individual characters to allographic text data comprising individual characters using at least 8 bit encoding. The allographic text data is stored on a computer readable medium, and the allographic text data is then rendered on a display using a font. In a preferred embodiment, the text data is stored as Arabic script characters. Furthermore, the rendering step can comprise one-to-one mapping of the allographic text data to glyphs. An apparatus according to the present invention includes a storing device for storing text data in a graphemic format, converting means for converting the graphemic text data to allographic text data using at least 8 bit encoding, and a second storing device for storing the allographic text data. Rendering means are provided for rendering the allographic text data as text on a display using a font.
BRIEF DESCRIPTION OF THE DRAWINGS:
Figure 1 illustrates a text rendering process according to the prior art;
Figure 2 illustrates a text rendering process according to the present invention;
Figure 3 illustrates Arabic text as rendered according to the present invention;
Figure 4 illustrates the text of Figure 3, but without having the necessary font applied;
Figure 5 is a conventional ASCII table; and
Figure 6 is a table showing 8 bit representations of Arabic characters according to the invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS:
In order to more clearly understand the present invention, a brief discussion of textual representation and terminology is appropriate. Writing systems, in any language, are broken down into units; graphemes are known as the smallest units in any writing system which are capable of causing a contrast in meaning. In the English alphabet, therefore, switching from "hook" to "book" introduces a meaning change. Therefore, "h" and "b" are each graphemes. There is no prescribed form for a grapheme. The grapheme for a particular letter or sound may appear as a capital letter, a small letter, a parenthetical notation, or other form depending upon the particular handwriting style or type face which is chosen. Each of the possible forms for a grapheme is known as a graph. When graphs are considered to be variations of a particular grapheme, they are considered to be "allographs."
Arabic script, typically used for writing the Arabic language, can also be used for writing other languages such as Persian, Urdu, Pashto, Sindhi, and Kurdish. Arabic script appears as cursive writing, whether handwritten or printed. The resulting handwritten traditions are such that the same letter may be written in different forms, depending upon how the character joins with neighboring characters. Typically, the encoding is such that a total of 54-58 graphemes are encoded as approximately 109 allographs according to the present invention. These graphemes include 28 basic characters, 8 to 12 additional characters, 10 numerals, and 8 vowel characters.
According to prior art such as Unicode, each letter receives only one Unicode character value, independent of the form. The Unicode standard encodes characters, and the characters are resident as memory strings in internal memory or on disk storage. When a character is rendered on a screen, it is called a glyph. A character can render as one glyph, or part of one glyph. The specific glyph corresponding to the appropriate letter form (initial, medial, final, etc.) is determined by the character context. A repertoire of glyphs in Unicode comprises a font. Referring to Figure 1, the Unicode standard, as an example, contains information wherein characters are encoded, and stored on a disk as characters in a graphemic format. When text is sought to be rendered on the screen, the characters are taken from memory and subjected to a text conversion/rendering process wherein a set of conversion rules and a font are applied to the characters on the disk, to render a document on the screen. The document on the screen is essentially displayed in real time as a series of glyphs.
Referring to Figure 2, however, it can be seen that the present invention treats documents in a significantly different manner than the prior art text handling system such as that which is discussed above. According to the invention, a document which is stored on disk or in memory as a plurality of characters in a graphemic format is subjected to a conversion process wherein the graphemes are converted by a series of conversion rules into allographs, according to predetermined graphotactics. The resulting document is then stored on the disk as a series of allographs. When a user accesses the document for the purposes of reading it, a text rendering process is performed, wherein graphotactics are not used. Only one-to-one mapping is performed on the document to the screen, using a font. According to this process, the conversion occurs only once per document, while the rendering occurs every time a user accesses the document. According to the prior art of Figure 1, however, the conversion/rendering process wherein a grapheme is converted to a glyph occurs every time the document is accessed. This results in slower system operation and rendering time.
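The convert-once, render-many flow described above can be sketched as follows. This is a minimal illustration in Python rather than the Visual Basic of the attached listing. The allograph codes 115, 117, and 120 are taken from the listing (kaaf, laam, miim), while the glyph names, the StoredDocument class, and the hard-wired demo_convert stand-in are hypothetical and are not part of the patent.

```python
from typing import Callable, Dict, List, Sequence

# Allograph codes from the listing; glyph names are hypothetical placeholders
# standing in for the outlines a real font would supply.
FONT: Dict[int, str] = {115: "kaaf-initial", 117: "laam-medial", 120: "miim-final"}


def render(stored: bytes) -> List[str]:
    """One-to-one mapping: each stored byte selects exactly one glyph."""
    return [FONT[code] for code in stored]


class StoredDocument:
    """Runs the grapheme-to-allograph conversion once, then only renders."""

    def __init__(self, graphemes: Sequence[str],
                 convert: Callable[[Sequence[str]], List[int]]) -> None:
        # Conversion happens here, at authoring time, and the result is stored.
        self.allographs = bytes(convert(graphemes))

    def view(self) -> List[str]:
        # Viewing needs no context rules, only the font lookup.
        return render(self.allographs)


conversions = 0


def demo_convert(graphemes: Sequence[str]) -> List[int]:
    """Stand-in for the conversion rules, hard-wired for one demo word."""
    global conversions
    conversions += 1
    table = {("kaaf", 0): 115, ("laam", 1): 117, ("miim", 2): 120}
    return [table[(name, index)] for index, name in enumerate(graphemes)]


doc = StoredDocument(["kaaf", "laam", "miim"], demo_convert)
first_view = doc.view()
second_view = doc.view()
```

However many times the document is viewed, the conversion counter stays at one, which is the saving the paragraph above attributes to the process of Figure 2.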
Attempts have been made to utilize a process similar to the process of Figure 2, however using only 7 bits of a conventional 8 bit code page such as an extended ASCII table. A conventional extended ASCII table is shown in Figure 5. Specifically, the allographic characters in this previous attempt were assigned locations in the second half of the extended ASCII table, keeping the first half of the table identical to the ASCII character set. Since appropriate Arabic text requires a minimum of 109 allographs, however, this 7 bit attempt was too limited. It ignored specific characters such as vowel marks, and required combining of certain letter forms. This resulted in a system with limited Arabic text representation ability. For example, Koranic text requires more ability than this 7 bit system provided. The present invention solves this problem by using the first as well as the second halves of the extended ASCII table, which consists of 8 bits, thereby using space typically reserved for the ASCII character set. This provides the invention with the number of locations needed to represent at least the 109 allographs needed for good quality Arabic text publishing. This representation results in 8 bit encoding of the allographs. A character table according to the present invention, therefore, is shown in Figure 6.
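A short check, sketched in Python, shows how allograph codes drawn from both halves of the table still fit in a single byte each. The sample codes are copied from defineletter calls in the attached listing; the helper name half and the choice of samples are assumptions made only for this illustration.

```python
from typing import Dict, Set, Tuple

# Allograph codes copied from a few defineletter calls in the listing.
SAMPLE_CODES: Dict[str, Tuple[int, ...]] = {
    "alif": (64, 65),
    "miim": (119, 120),
    "taamarbuta": (132, 133),
    "hamza": (136,),
    "wawhamza": (140,),
}


def half(code: int) -> str:
    """Say which half of an 8 bit table a code falls in."""
    if not 0 <= code <= 255:
        raise ValueError("code does not fit in 8 bits")
    return "lower" if code < 128 else "upper"


halves_used: Set[str] = {
    half(code) for codes in SAMPLE_CODES.values() for code in codes
}

# Three allographs occupy three bytes. Because these codes sit in the lower
# half, software without the Arabic font reads them as plain ASCII letters.
stored = bytes([115, 117, 120])
without_font = stored.decode("ascii")
```

Here halves_used contains both "lower" and "upper", and without_font is the Latin string "sux", since 115, 117, and 120 are the ASCII positions of those letters. This is the cost of reusing the space typically reserved for the ASCII character set.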
The invention, therefore, is directed to a method of and apparatus for displaying text from a computer memory onto a computer display, with the method comprising the steps of storing the text data in a graphemic format, then converting the graphemic text data, which is made up of individual characters, to allographic text data made up of individual characters, using at least 8-bit encoding. The allographic text data is then stored in a suitable memory. The final step is rendering the text on a display using a font. It should be noted that the present invention utilizes a rendering step wherein the allographic text data is mapped to glyphs in a one-to-one mapping configuration.
Additionally, in the case of Arabic text, the step of converting from graphemes to allographs includes a step of appropriately preformatting the text for right-to-left display. Documents which are prepared in this fashion can be made available for accessing by any computer on the internet through normal web server technology. A client or computer wishing to view the Arabic document would access the particular page by entering in the address or "URL" for the page. When the page begins loading, the specific Arabic character information or font information can be loaded as a dynamic font with the page information, so that the client computer receives the necessary encoding information to properly view the Arabic characters. In the alternative, the specific Arabic character information can be downloaded separately. Any computer capable of viewing documents with dynamic fonts, therefore, will be capable of viewing the Arabic documents, and will not require a specific hardware/software platform or any plug-ins. Figure 3 is an illustration of Arabic text rendered as glyphs on a computer screen through an internet browser, with the Arabic text having been encoded according to the present invention. The text is, as noted previously, treated in a textual format, and is capable of being searched, edited, and linked. Figure 4 illustrates the text as stored on disk as allographs. When subjected to the text rendering process and font information illustrated in Figure 2, the glyphs of Figure 3 are rendered. If the text rendering process were not performed using the appropriate font, the allographs would appear on the screen as illustrated in Figure 4. The downloading of the font or the supply of the dynamic font to the computer upon which the Arabic text is rendered ensures appropriate viewing. An example of the conversion rules for converting the graphemes to allographs is shown in the attached code page.
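The preformatting step is not spelled out in this passage, so the following Python sketch rests on an assumption: lines are wrapped while the text is still in logical (reading) order, and each finished line is then reversed so that a left-to-right renderer paints it right to left. The function name preformat_rtl, the width parameter, and the Latin placeholder words are all hypothetical.

```python
from typing import List


def preformat_rtl(words: List[str], width: int) -> List[str]:
    """Greedy-wrap words in logical order, then reverse each wrapped line."""
    lines: List[str] = []
    current = ""
    for word in words:
        candidate = word if not current else current + " " + word
        if current and len(candidate) > width:
            # The word does not fit, so close the line and start a new one.
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    # Reversal comes last so that line breaks fall where a reader expects them.
    return [line[::-1] for line in lines]


# Latin placeholders stand in for allograph characters.
wrapped = preformat_rtl(["abc", "de", "fgh"], width=6)
```

In this example wrapped is ["ed cba", "hgf"]. Wrapping before reversing matters: reversing the whole text first would place the end of the passage on the first line. In an HTML page each output line would presumably also need an explicit line break and right alignment, which this sketch leaves out.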
The code page, written in Visual Basic (tm), includes data regarding features of the characters that are used by the conversion rules to determine the allographs. For example, the letter "gayn" has the features of being connectable to other letters, does not act as a space, is not a vowel, and has a different form and therefore a different code for when the letter appears initially, medially, finally, or alone. Because the present invention is capable of treating the different forms of each letter as a different character, complex determining software, such as Arabic Windows (tm) which attempts to determine an appropriate form based upon a character's position, is not necessary.
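The way such a feature table can drive allograph selection is sketched below in Python. The table values are copied from defineletter calls in the attached listing, but the selection rule itself is an assumption, since the conversion rules are not reproduced in this passage: a letter takes its medial form when joined on both sides, its final form when joined only to the preceding letter, its initial form when joined only to the following letter, and its stand-alone form otherwise. The sketch handles one word at a time and ignores vowels, spaces, and the laam-alif ligatures.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Letter:
    connects: bool  # joins to the letter that follows it
    isspace: bool   # unused in this sketch
    isvowel: bool   # unused in this sketch
    init: int
    mid: int
    final: int
    alone: int


# Values copied from the defineletter calls in the listing.
LETTERS: Dict[str, Letter] = {
    "alif": Letter(False, False, False, 64, 65, 65, 64),
    "gayn": Letter(True, False, False, 105, 106, 107, 108),
    "kaaf": Letter(True, False, False, 115, 115, 116, 116),
    "laam": Letter(True, False, False, 117, 117, 118, 118),
    "miim": Letter(True, False, False, 119, 119, 120, 120),
}


def to_allographs(word: List[str]) -> List[int]:
    """Pick one allograph code per letter of a single word (assumed rule)."""
    codes: List[int] = []
    for index, name in enumerate(word):
        letter = LETTERS[name]
        joined_before = index > 0 and LETTERS[word[index - 1]].connects
        joined_after = letter.connects and index + 1 < len(word)
        if joined_before and joined_after:
            codes.append(letter.mid)
        elif joined_before:
            codes.append(letter.final)
        elif joined_after:
            codes.append(letter.init)
        else:
            codes.append(letter.alone)
    return codes


klm = to_allographs(["kaaf", "laam", "miim"])
gam = to_allographs(["gayn", "alif", "miim"])
```

With these tables klm is [115, 117, 120] and gam is [105, 65, 120]. In the second word the alif does not connect forward, so the miim after it falls back to its stand-alone code.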
As noted previously, the invention is directed to a method and a computer system which displays text from a computer readable medium onto a computer display. In view of the recent popularity of internet publishing and HTML documents, the invention provides a new and unique way of preparing and viewing HTML documents which have non-Roman characters, such as Arabic. The above discussion of the embodiments of the invention is illustrative in nature, and it would be evident to a person of ordinary skill in the art that a number of modifications could be made to the invention while still remaining within the spirit and scope of the invention. For example, although Arabic text and Arabic characters are discussed, the invention could be used on other languages or other character sets which have similarities to Arabic text and Arabic characters.
For a more clear understanding of the metes and bounds of the present invention, reference should be made to the appended claims.
Global Const kaaf% = 223
Global Const laam% = 225
Global Const miim% = 227
Global Const nuun% = 228
Global Const haa% = 229
Global Const waw% = 230
Global Const alifmaqsura% = 236
Global Const yaa% = 237
Global Const fatha% = 243
Global Const kasra% = 246
Global Const damma% = 245
Global Const fathatanwiin% = 240
Global Const kasratanwiin% = 242
Global Const dammatanwiin% = 241
Global Const shadda% = 248
Global Const sukuun% = 250
Global Const comma% = 161
Global Const semicolon% = 186
Global Const period% = 220
Global Const questmark% = 191
Global Const exclamark% = 33
Global Const laam_alif% = 1
Global Const laam_alifmadda% = 2
Global Const laam_alifhamzaup% = 3
Global Const laam_alifhamzadn% = 4
Global Const qaaf_hamza% = 5
Global Const giim% = 6
Global Const vii% = 7
Global Const pii% = 8
Global Const oo% = 9
Global Const ee% = 11
Global Const alif_taa% = 12
Global Const alif_saad% = 14
Global Const ctaamark% = 15
Global Const revshadda% = 16
'Copyright (c) 1997 Nizar Habash
'University of Maryland at College Park
'All Rights Reserved
' Record the joining features and the four positional codes for letter x.
Sub defineletter (x, xconnects, xisspace, xisvowel, xinit, xmid, xfinal, xalone)
    letter(x).connects = xconnects
    letter(x).isspace = xisspace
    letter(x).isvowel = xisvowel
    letter(x).init = xinit
    letter(x).mid = xmid
    letter(x).final = xfinal
    letter(x).alone = xalone
End Sub
Sub defineletters ()
    'Copyright (c) 1997 Nizar Habash
    'University of Maryland at College Park
    'All Rights Reserved

    ' initiate: by default every code maps to itself and acts as a space
    For x = 1 To 255
        Call defineletter(x, False, True, False, x, x, x, x)
    Next

    ' defineletter (hamza, connects, isspace, isvowel, init, mid, final, alone)
    Call defineletter(13, False, True, False, 13, 13, 13, 13)
    Call defineletter(hamza, False, True, False, 136, 136, 136, 136)
    Call defineletter(alifmadda, False, False, False, 68, 69, 69, 68)
    Call defineletter(alifhamzaup, False, False, False, 66, 67, 67, 66)
    Call defineletter(wawhamza, False, False, False, 140, 140, 140, 140)
    Call defineletter(alifhamzadn, False, False, False, 70, 71, 71, 70)
    Call defineletter(yaahamza, True, False, False, 137, 137, 138, 139)
    Call defineletter(alif, False, False, False, 64, 65, 65, 64)
    Call defineletter(baa, True, False, False, 72, 72, 73, 73)
    Call defineletter(taamarbuta, False, False, False, 132, 133, 133, 132)
    Call defineletter(taa, True, False, False, 74, 74, 75, 75)
    Call defineletter(thaa, True, False, False, 76, 76, 77, 77)
    Call defineletter(jii, True, False, False, 78, 78, 79, 80)
    Call defineletter(chaa, True, False, False, 81, 81, 82, 83)
    Call defineletter(khaa, True, False, False, 84, 84, 85, 86)
    Call defineletter(daal, False, False, False, 87, 87, 87, 87)
    Call defineletter(dhaal, False, False, False, 88, 88, 88, 88)
    Call defineletter(raa, False, False, False, 89, 89, 89, 89)
    Call defineletter(zaay, False, False, False, 90, 90, 90, 90)
    Call defineletter(siin, True, False, False, 91, 91, 92, 92)
    Call defineletter(shiin, True, False, False, 93, 93, 94, 94)
    Call defineletter(saad, True, False, False, 95, 95, 96, 96)
    Call defineletter(daad, True, False, False, 97, 97, 98, 98)
    Call defineletter(ctaa, True, False, False, 99, 99, 99, 99)
    Call defineletter(dhaa, True, False, False, 100, 100, 100, 100)
    Call defineletter(ayn, True, False, False, 101, 102, 103, 104)
    Call defineletter(gayn, True, False, False, 105, 106, 107, 108)
    Call defineletter(faa, True, False, False, 109, 110, 111, 111)
    Call defineletter(qaaf, True, False, False, 112, 113, 114, 114)
    Call defineletter(kaaf, True, False, False, 115, 115, 116, 116)
    Call defineletter(laam, True, False, False, 117, 117, 118, 118)
    Call defineletter(miim, True, False, False, 119, 119, 120, 120)
    Call defineletter(nuun, True, False, False, 121, 121, 122, 122)
    Call defineletter(haa, True, False, False, 123, 124, 125, 126)
    Call defineletter(waw, False, False, False, 128, 128, 128, 128)
    Call defineletter(alifmaqsura, False, False, False, 135, 134, 134, 135)
    Call defineletter(yaa, True, False, False, 129, 129, 130, 131)

    ' vowel marks and other diacritics
    Call defineletter(fatha, False, False, True, 170, 170, 170, 170)
    Call defineletter(kasra, False, False, True, 172, 172, 172, 172)
    Call defineletter(damma, False, False, True, 171, 171, 171, 171)
    Call defineletter(fathatanwiin, False, False, False, 63, 63, 63, 63)
    Call defineletter(kasratanwiin, False, False, False, 63, 63, 63, 63)
    Call defineletter(dammatanwiin, False, False, False, 63, 63, 63, 63)
    Call defineletter(shadda, False, False, False, 63, 63, 63, 63)
    Call defineletter(sukuun, False, False, False, 63, 63, 63, 63)

    ' punctuation and space
    Call defineletter(comma, False, False, False, 63, 63, 63, 63)
    Call defineletter(semicolon, False, False, False, 63, 63, 63, 63)
    Call defineletter(period, False, False, False, 63, 63, 63, 63)
    Call defineletter(questmark, False, False, False, 63, 63, 63, 63)
    Call defineletter(spacebar, False, True, False, 32, 32, 32, 32)

    ' ligatures and derived letters produced by catchpattern
    Call defineletter(laam_alif, False, False, False, 141, 142, 142, 141)
    Call defineletter(laam_alifmadda, False, False, False, 145, 146, 146, 145)
    Call defineletter(laam_alifhamzaup, False, False, False, 143, 144, 144, 143)
    Call defineletter(laam_alifhamzadn, False, False, False, 147, 148, 148, 147)
    Call defineletter(qaaf_hamza, True, False, False, 149, 150, 151, 151)
    Call defineletter(giim, True, False, False, 152, 152, 153, 154)
    Call defineletter(vii, True, False, False, 155, 155, 156, 159)
    Call defineletter(pii, True, False, False, 162, 162, 163, 163)
    Call defineletter(oo, False, False, False, 164, 164, 164, 164)
    Call defineletter(ee, True, False, False, 165, 165, 166, 167)
    Call defineletter(alif_taa, False, True, False, 168, 168, 168, 168)
    Call defineletter(alif_saad, False, False, True, 169, 169, 169, 169)
    Call defineletter(ctaamark, False, False, True, 182, 182, 182, 182)
    Call defineletter(revshadda, False, False, True, 181, 181, 181, 181)
End Sub
' Return x with its characters in reverse order.
Function flip (x As String)
    y = ""
    For i = Len(x) To 1 Step -1
        y = y + Mid(x, i, 1)
    Next
    flip = y
End Function
' Reverse the characters of x, then restore the original order of the lines,
' so that each line is reversed in place and ends with CR LF.
Function flip2 (x As String)
    x = flip(x)
    done = False
    While Not done
        pos = InStr(1, x, Chr(13))
        If pos = 0 Then
            y = x + Chr(13) + Chr(10) + y
            done = True
        Else
            If pos = Len(x) Then
                y = Mid(x, 1, pos - 1) + Chr(13) + Chr(10) + y
                done = True
            Else
                y = Mid(x, 1, pos - 1) + Chr(13) + Chr(10) + y
                x = Mid(x, pos + 1, Len(x) - pos)
            End If
        End If
    Wend
    flip2 = y
End Function

' Choose the contextual form of letter x from its neighbours:
' a is the preceding code (-1 if none), b is the following code (-1 if none).
Function substitute (a, x, b)
    If (a = -1) Then
        before_connect = False
    Else
        before_connect = letter(a).connects
    End If
    If (b = -1) Then
        after_space = True
    Else
        after_space = letter(b).isspace
    End If
    If (before_connect = False) Then
        If (after_space = False) Then
            substitute = letter(x).init
            Exit Function
        Else
            substitute = letter(x).alone
            Exit Function
        End If
    Else
        If (after_space = False) Then
            substitute = letter(x).mid
            Exit Function
        Else
            substitute = letter(x).final
            Exit Function
        End If
    End If
End Function
' Return the single code that replaces the pair (a, b), or -1 if the pair is not special.
Function catchpattern (a, b)
    c = -1
    If a = 13 And b = 10 Then c = 13
    If a = laam Then
        Select Case b
            Case alif: c = laam_alif
            Case alifhamzaup: c = laam_alifhamzaup
            Case alifhamzadn: c = laam_alifhamzadn
            Case alifmadda: c = laam_alifmadda
        End Select
    End If
    If a = qaaf And b = exclamark Then c = qaaf_hamza
    If a = jii And b = exclamark Then c = giim
    If a = faa And b = exclamark Then c = vii
    If a = baa And b = exclamark Then c = pii
    If a = waw And b = exclamark Then c = oo
    If a = yaa And b = exclamark Then c = ee
    If a = saad And b = exclamark Then c = alif_saad
    If a = taa And b = exclamark Then c = alif_taa
    If a = shiin And b = exclamark Then c = revshadda
    If a = ctaa And b = exclamark Then c = ctaamark
    catchpattern = c
End Function

' Scan arabic.Text pairwise and build textone with special pairs merged.
Sub catchpatterns ()
    textone = ""
    a = Asc(Mid(arabic.Text, 1, 1))
    For i = 2 To Len(arabic.Text)
        found = False
        a = Asc(Mid(arabic.Text, i - 1, 1))
        b = Asc(Mid(arabic.Text, i, 1))
        c = -1
        c = catchpattern(a, b)
        If c > -1 Then found = True
        If found Then
            textone = textone + Chr(c)
            i = i + 1
            If i = Len(arabic.Text) Then
                textone = textone + Mid(arabic.Text, i, 1)
            End If
        Else
            textone = textone + Chr(a)
        End If
    Next
    If found = False Then
        textone = textone + Chr(b)
    End If
End Sub
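
The pair-merging step carried out by catchpattern and catchpatterns can be illustrated with a short sketch. It is written in Python rather than the Visual Basic of the listing, purely for readability, and it is not a line-by-line port: the names merge_pairs and PAIRS are hypothetical, only a few of the listing's rules are reproduced (those whose constants appear in this section), and a while loop replaces the listing's For loop with a manually advanced index.

```python
# Codes copied from the listing's Global Const declarations (subset only).
WAW, YAA, EXCLAMARK = 230, 237, 33
OO, EE = 9, 11
CR, LF = 13, 10

# (first code, second code) -> single replacement code.
# Hypothetical table mirroring a few of the rules in catchpattern.
PAIRS = {
    (CR, LF): CR,           # CR LF collapses to CR
    (WAW, EXCLAMARK): OO,   # waw followed by "!" becomes oo
    (YAA, EXCLAMARK): EE,   # yaa followed by "!" becomes ee
}


def merge_pairs(codes):
    """Scan left to right, replacing each known pair with its single code."""
    out = []
    i = 0
    while i < len(codes):
        pair = tuple(codes[i:i + 2])
        if pair in PAIRS:
            out.append(PAIRS[pair])
            i += 2  # both members of the pair are consumed
        else:
            out.append(codes[i])
            i += 1
    return out
```

For instance, merge_pairs([WAW, EXCLAMARK, YAA]) returns [9, 237]: the first two codes merge into oo and the trailing yaa passes through unchanged.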
Sub Command1_Click ()
    convert
    text1.Text = texttwo
    text2.Text = newtext
End Sub

' Convert the graphemic text in arabic.Text to allographic codes (texttwo),
' then reverse each line for display (newtext) and copy it to the clipboard.
Sub convert ()
    ' textone = arabic.text
    texttwo = ""
    If Len(arabic.Text) > 0 Then
        Call catchpatterns
        a = -1
        For i = 1 To Len(textone)
            x = Asc(Mid(textone, i, 1))
            If i < Len(textone) Then
                b = Asc(Mid(textone, i + 1, 1))
                ' look past a vowel mark to the next letter
                If (letter(b).isvowel And i + 2 <= Len(textone)) Then
                    b = Asc(Mid(textone, i + 2, 1))
                End If
            Else
                b = -1
            End If
            z = substitute(a, x, b)
            texttwo = texttwo + Chr(z)
            a = x
        Next
        newtext = flip2(texttwo)
        clipboard.SetText newtext
    End If
End Sub

Sub Form_Load ()
    Call defineletters
    TITLE.Visible = True
    TITLE.Enabled = True
End Sub
' Return the character codes of x as a string of numbers.
Function nums (x As String)
    z = ""
    For i = 1 To Len(x)
        z = z + Str(Asc(Mid(x, i, 1)))
    Next
    nums = z
End Function

Sub oldconvert ()
    ' textone = arabic.text
    'latin.Text = ""
    texttwo = ""
    Call catchpatterns
    a = -1
    For i = 1 To Len(textone)
        x = Asc(Mid(textone, i, 1))
        If i < Len(textone) Then
            b = Asc(Mid(textone, i + 1, 1))
            If (letter(b).isvowel And i + 2 <= Len(textone)) Then
                b = Asc(Mid(textone, i + 2, 1))
            End If
        Else
            b = -1
        End If
        z = substitute(a, x, b)
        texttwo = texttwo + Chr(z)
        a = x
    Next
    'latin.Text = flip(texttwo)
    'Copyright (c) 1997 Nizar Habash
    'University of Maryland at College Park
    'All Rights Reserved
End Sub
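
As an illustration of how the listing selects an initial, medial, final, or isolated form for each letter, the following Python sketch restates the decision made by substitute. It is a simplified sketch under stated assumptions, not the listing itself: the names Letter, TABLE, lookup, and shape are hypothetical, only six table entries are copied from the defineletter calls, and the vowel look-ahead performed in convert is omitted.

```python
from typing import NamedTuple, Optional


class Letter(NamedTuple):
    connects: bool  # joins to the letter that follows it
    isspace: bool   # behaves as a word boundary
    init: int
    mid: int
    final: int
    alone: int


# Codes and glyph values copied from the listing (subset only).
KAAF, LAAM, NUUN, HAA, WAW, YAA = 223, 225, 228, 229, 230, 237

TABLE = {
    KAAF: Letter(True, False, 115, 115, 116, 116),
    LAAM: Letter(True, False, 117, 117, 118, 118),
    NUUN: Letter(True, False, 121, 121, 122, 122),
    HAA: Letter(True, False, 123, 124, 125, 126),
    WAW: Letter(False, False, 128, 128, 128, 128),
    YAA: Letter(True, False, 129, 129, 130, 131),
}


def lookup(code: int) -> Letter:
    # Same default as the listing: an undefined code maps to itself,
    # does not connect, and counts as a space.
    return TABLE.get(code, Letter(False, True, code, code, code, code))


def substitute(prev: Optional[int], cur: int, nxt: Optional[int]) -> int:
    """Pick the contextual form of cur from its neighbours."""
    before_connect = False if prev is None else lookup(prev).connects
    after_space = True if nxt is None else lookup(nxt).isspace
    letter = lookup(cur)
    if not before_connect:
        return letter.alone if after_space else letter.init
    return letter.final if after_space else letter.mid


def shape(codes):
    """Map a sequence of graphemic codes to allographic codes."""
    out = []
    for i, cur in enumerate(codes):
        prev = codes[i - 1] if i > 0 else None
        nxt = codes[i + 1] if i + 1 < len(codes) else None
        out.append(substitute(prev, cur, nxt))
    return out
```

With these entries, shape([KAAF, LAAM]) returns [115, 118], the initial form of kaaf followed by the final form of laam, and shape([HAA, HAA, HAA]) returns [123, 124, 125]. Because every output code already names one specific form, a font can render the result by direct lookup with no further shaping logic.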

Claims

CLAIMS:
1. A method of displaying text from a computer readable medium onto a computer display, said method comprising the steps of:
    storing text data in a graphemic format;
    converting the graphemic text data comprising individual characters to allographic text data comprising individual characters using at least 8 bit encoding;
    storing the allographic text data on a computer readable medium;
    rendering the allographic text data on a display using a font.
2. A method as recited in claim 1, wherein said text data is stored as Arabic script characters.
3. A method as recited in claim 2, wherein the rendering step comprises one-to-one mapping of the allographic text data to glyphs.
4. An apparatus for displaying text from a computer readable medium onto a computer display, said apparatus comprising:
    a storing device for storing text data in a graphemic format;
    converting means for converting the graphemic text data comprising individual characters to allographic text data comprising individual characters, using at least 8 bit encoding;
    a second storing device for storing the allographic text data;
    rendering means for rendering the allographic text data as text on a display using a font.
5. An apparatus as recited in claim 4, wherein said first storing device stores the text data in a graphemic format as Arabic script characters.
6. An apparatus as recited in claim 5, wherein the rendering means comprises mapping means for one-to-one mapping of the allographic text data to glyphs.
PCT/US1998/025201 1997-12-03 1998-12-03 Preformatted allographic arabic text for html documents WO1999028831A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU17036/99A AU1703699A (en) 1997-12-03 1998-12-03 Preformatted allographic arabic text for html documents

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US6744297P 1997-12-03 1997-12-03
US60/067,442 1997-12-03

Publications (2)

Publication Number Publication Date
WO1999028831A1 (en) 1999-06-10
WO1999028831A9 (en) 1999-09-16

Family

ID=22076010

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1998/025201 WO1999028831A1 (en) 1997-12-03 1998-12-03 Preformatted allographic arabic text for html documents

Country Status (2)

Country Link
AU (1) AU1703699A (en)
WO (1) WO1999028831A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10147541A1 (en) * 2001-09-26 2003-04-10 Man Nutzfahrzeuge Ag Representation of Greek and Hebrew letters in a display of a motor vehicle
DE10147540A1 (en) * 2001-09-26 2003-04-10 Man Nutzfahrzeuge Ag Representation of Arabic script in a display of a motor vehicle
WO2006021973A2 (en) * 2004-08-23 2006-03-02 Geneva Software Technologies Limited A system and a method for a sim card based multi-lingual messaging application
US7801721B2 (en) * 2006-10-02 2010-09-21 Google Inc. Displaying original text in a user interface with translated text

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4176974A (en) * 1978-03-13 1979-12-04 Middle East Software Corporation Interactive video display and editing of text in the Arabic script
US5412771A (en) * 1992-02-07 1995-05-02 Signature Software, Inc. Generation of interdependent font characters based on ligature and glyph categorizations
US5556282A (en) * 1994-01-18 1996-09-17 Middlebrook; R. David Method for the geographical processsing of graphic language texts

Also Published As

Publication number Publication date
WO1999028831A1 (en) 1999-06-10
AU1703699A (en) 1999-06-16

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
AK Designated states

Kind code of ref document: C2

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: C2

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

COP Corrected version of pamphlet

Free format text: PAGES 9-11 AND 14-16, SEQUENCE LISTING, REPLACED BY NEW PAGES 9-11 AND 14-16; PAGES 4/6-6/6, DRAWINGS, REPLACED BY NEW PAGES 4/6-6/6; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase in:

Ref country code: KR

122 Ep: pct application non-entry in european phase