WO1999019814A1 - The utilisation of multi-lingual names on the internet - Google Patents

The utilisation of multi-lingual names on the internet

Publication number
WO1999019814A1
Authority
WO
WIPO (PCT)
Prior art keywords
name
multilingual
names
coded
ascii
Prior art date
Application number
PCT/AU1998/000849
Other languages
French (fr)
Inventor
Jason Pouflis
Original Assignee
Jason Pouflis
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jason Pouflis
Priority to AU95240/98A
Publication of WO1999019814A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]

Definitions

  • the present invention relates to the utilisation of multilingual names on the Internet, related networks and computer systems.
  • Multilingual names include domain names, user names, file names, email addresses, newsgroups and Universal Resource Locators (URLs).
  • a method for providing for multilingual names for utilisation on the Internet comprising the steps of: forming an initial multilingual name in a multilingual format; mapping the multilingual name to a corresponding coded name in a reversible manner, the coded name comprising a restricted subset of the ASCII character set; and utilising the corresponding coded name (on the Internet) in place of the multilingual name.
  • the mapping step further comprises adding a predetermined pseudo-root name server to the corresponding coded name, particularly when the name is a domain name, or email address.
  • the mapping can include converting the multilingual name to a corresponding Hexadecimal coded name and representing the Hexadecimal coded name in an ASCII form.
  • the corresponding coded name can be divided into a series of labels with each label having a predetermined portion comprising a control code for the label.
  • the preferred embodiment is ideally utilised in existing or future internet applications, utilities, resources or services.
  • Existing applications include, but are not limited to: web browsers, editors, e-mail, news, telnet, ftp, gopher, WAIS, whois, nslookup, trace, ping, finger, rpc, cgi programs, file names, usernames, and databases.
  • Fig. 1 illustrates the steps in the method of the preferred embodiment.
  • Description of Preferred and Other Embodiments
  • 1. Multilingual names to be represented in limited subsets of the ASCII character set; 2. Names which are compatible with existing software (applications and databases), thus requiring no change to existing software.
  • New Software and changes to existing software to be made that incorporate the processes described, which may replace, or work with existing software.
  • multilingual domain names can be utilised, without changes to existing resolver or name server software.
  • the preferred embodiment is fully backwards compatible with existing systems and does not require any changes to existing software used for processing domain names, user names, file names, email addresses, newsgroups and Universal Resource Locators (URLs).
  • the preferred embodiment allows multilingual names to be written in many languages, even a mix, and then converted to fit into a subset of ASCII characters.
  • a converting program is needed to do the conversion and display of Multilingual names.
  • any program that converts between representations of names is called a converter - this may include resolvers, name servers, web browsers, and any program that carries out the converting process.
  • the preferred embodiment proposes, and addresses the issues of
  • ASCII characters are in angle brackets, e.g. <Jason>
  • UCS-2 characters are in square brackets, e.g. [Jason]
  • a multilingual name may be a simple string, or may comprise a number of components that require parsing and interpretation, as part of conversion to a coded name. Components of names may be hierarchically organised from left to right or right to left and may contain other non-hierarchical components.
  • Converters have the choice of converting the entire string, or converting each component, since they are likely to be specialists in their target language market.
  • Converting is at least the reversible transformation of characters from a multilingual set to ASCII, and may comprise parsing of components, substitution of characters, encoding, splitting, control codes indicating the encoding or splitting, or attachment of pseudo-root names.
  • Parsing of multilingual components involves identification of separators. Each separator can now be represented by several characters from several languages.
  • the user may even be given the option of what symbols they would like to use as separator characters, e.g. instead of "@", it is possible to choose " at ", so that a corresponding example email address would be "Jason at OneAccount.net".
  • Parts of a multilingual name may have special meaning, for instance, the file name extension, or protocol to use.
  • a Japanese language user may prefer to see and use the
  • Converters may substitute ASCII characters in place of the synonymous multilingual characters.
  • Base Equivalent Characters may substitute ASCII characters in place of the synonymous multilingual characters.
  • Sigma one only for use at the end of a word, when the word is lowercase.
  • Another type of comparison could be character shape.
  • IBM Latin, Greek or Cyrillic.
  • a language insensitive search could force them all to Latin.
  • Control codes can be attached to a coded name, or to each component of a coded name, to indicate the type of encoding, and the split sequence.
  • a particular example is
  • control codes can be attached to the coded name to indicate the method of encoding.
  • a component of a multilingual name is too long when converted to fit into a single component of a coded name, it may be split across several components of a coded name.
  • Control codes attached to each component of the coded name can indicate which part of a multilingual component it belongs to, ie its order in a split component. This is particularly useful for hierarchical names with limits on the length of components, such as domain names .
  • UCS-2 as Hex in ASCII is an encoding of multilingual names. Its 3 octet control code is <X-n> where n is an
  • the control code is prepended to the coded component.
  • Each UCS-2 character becomes four ASCII characters in the ranges <0>-<9>, <A>-<F>, representing the value of the UCS-2 character in Hexadecimal. For example, [J] (value 004A in Hexadecimal) becomes <004A>.
  • a pseudo-root name is a predetermined name attached to coded hierarchical names, such as newsgroups and domain names, so that they become part of a predetermined hierarchy. By adding the pseudo-root name to all coded names, that branch of the hierarchy effectively becomes the root of a pseudo-hierarchy.
  • a pseudo-root can be made in a part of the hierarchy in which control is exercised.
  • Simple multilingual strings such as user names, might merely be converted to a coded form with a control code attached indicating the encoding method, such as X-0.
  • Strings with components might also have special words substituted with synonymous characters. For instance, if a Japanese file name is suffixed by Japanese characters that indicate it is an executable program, these characters may be replaced by the file name extension ".exe".
  • Newsgroups
  • Newsgroups are also known as Internet News, and Usenet.
  • Coded names can be used as the names of newsgroups, and displayed to users as multilingual newsgroup names.
  • domain names is left to implementors of converters.
  • the implementors, or even the users may select appropriate separator, quote, and escape symbols, along with special words, and the direction of the hierarchy (left to right, right to left, etc.).
  • Each domain label could even be entered in separate text fields, eliminating the need for separate characters. However, it is often easier to write and type a domain name with separating characters.
  • the domain name system is concerned with the format of binary data between resolvers and name servers. Due to compatibility issues, only a limited subset of ASCII is used in labels: the characters 'A'-'Z', 'a'-'z', '0'-'9' and '-'. It is an object of the preferred embodiment to allow multilingual domain names to be represented in this subset of ASCII.
  • A process for representing multilingual domain names is shown in Fig. 1.
  • 1. Parsing, and Substitution of Special Words
  • Special words may be substituted for selected or typed labels. For instance, replacing the Arabic label for Australia with “au”, or the Thai label for business with “com”.
  • English domain names are case insensitive, so lowercase Latin should be replaced with uppercase. Other languages may have different preferences. Defining the sets of equivalent characters can be left to implementors, and specialists in that language.
  • labels could be made of 8bit (ASCII, ISO8859), 16bit (UCS-2), 32bit (UCS-4), or variable length characters (UTF-8, UTF-7). Labels could even be made of other data, such as bitmaps (pictures), or sound data.
  • the preferred method of encoding is UCS-2 to Hex in ASCII, as it is fully compatible with existing DNS tools.
  • Since each UCS-2 character maps to 4 ASCII characters and the control code occupies 3, any label that is longer than 15 UCS-2 characters must be split, so that it fits into the maximum label length of 63 octets (3 + 15 × 4 = 63). It is further recommended that labels which are exactly 15 UCS-2 characters long should be split with a coded blank second part. This allows for separation of control of the common part of a shared domain label, as will be further explained below.
  • a pseudo-root domain name is added to the coded domain name, for the reasons mentioned in "Pseudo-Root Names".
  • Name servers for the pseudo-root may be specialised for the processing of names in a particular encoding, or language.
  • the recommended pseudo-root domain name to add is <X-X>.<NET>. That is, "X-X.NET.".
  • 5. Presenting coded form of name
  • a converter may have to present the coded form in a way which is useable by applications.
  • the traditional way is specified in RFC1035 - labels separated by dots, with the highest level label to the right.
  • Converters that query the DNS themselves, may not need to concatenate the labels into a contiguous string.
  • Email mailboxes and addresses can use a larger part of the ASCII character set than DNS.
  • an email address comprises a mailbox name (local part) at a domain name .
  • a multilingual email address could be formed in some other way, using the language's own symbols for addressing. For instance, [Jason at Home, Australia] instead of Jason@HOME.AU. Converters or mail programs are responsible for processing the email addresses correctly.
  • Multilingual addresses could be processed in a number of ways: 1. Parsed, coded and sent to a mailbox at a domain Parsed
  • URLs encompass file names, newsgroups, domain names, email, and many other names.
  • a larger part of the ASCII character set is available for names, and encoding of octets is provided for.
  • the schemes that URLs encompass remain restricted in the characters they can use, so there is a need for coded multilingual URLs.
  • Substitution of special words and symbols URLs are currently defined for the US-ASCII character set. Multilingual users may prefer to use symbols from their own language, in place of the specific scheme names, reserved and special characters. Converters would then parse these symbols and replace them with the US-ASCII symbols.
  • Schemes that use Internet protocols are formatted as: "<scheme>://<user>:<password>@<host>:<port>/<url-path>".
  • Multilingual schemes should be parsed into a coded form like this. Conversion of components using the UCS-2 as Hex in ASCII can be applied to the user name, password, and host name (which is usually a domain name), and components of the url-path.
  • Multilingual port numbers should be converted into synonymous ASCII numbers, if written as non-ASCII numbers such as in Chinese or Sanskrit numerals.
  • the url-path may be further parsed, and broken down into special and reserved characters, path names, file names, search, argument names, and argument values.
  • the method of the preferred embodiment can take many different forms of implementation, for example, as follows:
  • Stand Alone Converter
  • This form takes in a multilingual name, and outputs a coded name as an ASCII string, or some other representation.
  • the converter may be created to work for particular kinds of names, such as URLs or email addresses, and/or to work with particular applications, such as web browsers .
  • Converters may have controls to send, or may automatically send, the ASCII string to relevant applications. They may allow a user to copy and paste to and from their applications.
  • Incorporated into applications
  • the conversion function may be incorporated into applications such as browsers, editors, email, telnet, ftp, and news.
  • Plug-in or add-on to application
  • the converter may be a program or library that plugs in or adds onto the existing applications, providing the application with the added multilingual name functionality.
  • the converter may take the form of a control that the application can use. Examples are Web pages that include javascript, Java controls, or Active-X controls.
  • Such controls and plug-ins may replace, or overlay, a browser's current URL entry field with a multilingual name field. This field both displays the multilingual name, and allows entry of multilingual URLs.
  • Coded names are passed back and forth from converter to browser.
  • Web Page interfaces to converter
  • A converter may run on a web server, with access to the converter being provided through multilingual web pages. Users access a multilingual URL/domain name service such as "http://X-X.NET/". If their browser requests a particular language, a web page in that language is provided (if available), otherwise a multilingual page is provided.
  • the web page can typically provide a form, so that the user may type in a multilingual URL. Users may select common parts from lists such as the encoding scheme, organisation type, and country. These lists may have defaults on a per user, or per language basis.
  • the converter server has several options:
  • Multilingual Registries may also provide a web interface to provide for registration of multilingual names, such as domain names and email addresses.
  • Converter packaged with other facilities
  • Converters may be packaged with other facilities. For instance, a program may parse a multilingual name in several ways, and perform several searches such as DNS lookup, whois search, and web page search. It might present information to a user, or return specific information to a client application.
  • Resolvers
  • The resolver accepts the multilingual name direct from applications, but then converts it before querying name servers. Resolvers may query name servers for both the binary and sub-ASCII representations of the multilingual domain name. The resolver may also try variations on the name.
  • When performing recursive queries, the name server accepts sub-ASCII or binary multilingual domain names, and queries other name servers with sub-ASCII or binary multilingual domain names.
  • the name server may convert from binary name to another format before querying its database and may return answers for either form.
  • the name server may respond with additional records for binary or sub-ASCII forms (including CNAME and A records) that match, or are variations of, the queried name. For example, if there are minor spelling errors, if they differ only in case, or their base equivalent characters are the same.
  • Databases may keep records in binary or sub-ASCII form. Conversion between them, and conversion for client or server programs may be required.
  • Other areas of application
  • The principle of having the first 3 characters in a field represent the encoding scheme can be applied generally. This can be applied to directory services, such as Whois, LDAP, and to search engines, and to databases.
  • the preferred embodiment provides for the representation of Multilingual characters, in more limited character sets.
  • the process includes converting UCS-2 to Hex in ASCII, applied to internet names used in the Domain Name System (DNS), email, news and Uniform Resource Locators.
  • DNS Domain Name System
  • a multilingual domain label is represented in one or more sub-ASCII labels.
  • the first 3 characters identify the label's encoding scheme, leaving a maximum of 60 sub-ASCII characters for encoded data in each domain name label .
  • the first and second characters are the name of the scheme 'X-'; and the third character identifies the part of the split multilingual label.
  • the name of the pseudo root server "X-X.NET" is attached to the sub-ASCII representation of the multilingual domain name.
  • the pseudo root server is visible in the current domain name space.
  • the first three characters of the local-part identify the local-part's encoding scheme.
  • the domain name follows the rules for DNS.
  • the entire email address is encoded, and sent to the relevant mail server, exchanger or gateway for processing or forwarding.
  • URL identifies the encoding scheme.
  • the encoding and representation can be implemented in the form of various software devices, such as upgrades or add ons to existing software, incorporation in new software, stand-alone applications, databases, servers, clients, resolvers, name servers.
  • the first three characters identify the encoding scheme to a converter, so that it may display the name in the right character set.
  • These characters mean nothing to existing DNS, E.mail and web systems, simply identifying the name of a domain, mailbox, file or other data. Hence variations utilising different encoding identifiers can also be easily used.
  • This scheme can be designed for temporary use, up until applications and databases, (including name servers and resolvers) become compliant with a multilingual character set such as ISO10646 or Unicode.
  • Multilingual Name - made of non-ASCII characters may be a string of characters, or several labels or fields.
  • URLs Universal Resource Locators
  • Coded Name - a string, or fields, of ASCII characters that represent a Multilingual name in some encoding.
  • Converter - any program that converts from one representation of names to another. Especially, converting from UCS-2 to Hex in ASCII and back. Converters may incorporate resolvers, and other functions such as substitution for equivalent characters.
  • ASCII - A character set that contains the English Alphabet, Arabic Numerals, punctuation marks and some computer control codes. There are several varieties of ASCII.
  • Sub-ASCII - The limited subset of the ASCII character set that has been used in domain names: 'A'-'Z', 'a'-'z', '0'-'9', and '-' (dash).
  • UCS - Universal multi-byte Character Set encodings of ISO10646 and Unicode, which cover most living languages.
  • UCS-2 is 2 bytes (16 bits).
  • UCS-4 is 4 bytes.
  • Equivalent Characters - characters that are mapped to the same base character by a program. In English 'A' and 'a' differ only in case. To case insensitive programs, such as DNS, they are equivalent. In other languages, equivalent characters may differ in other ways. E.g. in Greek, there are two lowercase sigmas, one of which is for use at the end of a word. Developers of programs for different language markets are specialists in these areas; they decide on which characters are equivalent.
  • Domain Name - a name up to 255 octets made of several labels, one for each level in the hierarchy. "www.x-x.net." is a domain in the "x-x.net." domain in the "net." domain.
  • The DNS stores information related to domain names.
  • Label - part of a domain name, up to 63 octets.
  • DNS - The Domain Name System. A distributed database that is accessed by resolvers asking name servers. The DNS stores computers' names, IP addresses, and more. See RFC 1034, 1035 and others.
  • IP address - A 4 byte internet network address.
  • Resolver - a program that applications use to query the DNS.
  • a resolver in turn asks Name Servers for information.
  • Name Server - a name server has information about its domain that it gives to resolvers and other name servers. If it doesn't know, it may query other name servers.
  • Root Name Servers - the name servers at the top of all hierarchies.
  • Pseudo-Root Name Servers - some applications may add a predetermined name to all of their domain name queries, making it seem as if that name server is at the top of all hierarchies.
  • RFC - Request for Comments documents describe how the internet works. The Internet Engineering Task Force draws internet standards from the list of RFCs.
  • Introduction to the Domain Name System (DNS)
  • By way of introduction to the internet's Domain Name System, we illustrate with an example.
  • a superannuation web page URL is "http://www.superannuation.net/index.htm".
  • www.superannuation.net is a domain name, that is, the name of the computer on which the page is kept. That computer's IP address (internet number) must be found to get the page. This is done by asking the DNS.
  • the web browser asks a DNS Resolver to find the IP address of the domain name.
  • the Resolver asks the local name server for the address. If the local name server doesn't know, it then tracks down the address by asking other name servers.
  • the local name server asks the net. domain name server where the superannuation.net. domain name server is. Then it asks this subdomain name server for the IP address of the domain name www.superannuation.net, which is 105.42.3.5 (just an example address).
  • the local name server then tells the resolver the IP address, which in turn informs the web browser.
  • the web browser now asks the computer at that IP address for the web page via http: "//www.superannuation.net/index.htm".
  • domain name labels may contain up to 63 octets of binary data. It is suggested that the names be made from the characters A-Z, 0-9 and - dash, a restricted subset of US ASCII, so that legacy applications keep working.
  • RFC822 Format of ARPA Internet Text Messages defines internet mail, and specifies the format of email addresses.
  • RFC1035 Domain Names - Implementation and Specification defines the DNS protocol, and specifies a format for domain names as a sequence of labels separated by dots. Labels begin with a letter, and may contain characters from 'A'-'Z', 'a'-'z', '0'-'9' and '-' (dash).
  • RFC1123 Requirements for Internet Hosts allows domain labels to begin with letters or numbers.
  • RFC1738 Uniform Resource Locators (URL) specifies the format of URLs, in a subset of US-ASCII that permits binary data as octets represented by %HH, where H is 0-9, A-F, more commonly known as 'Hex in ASCII'.
  • RFC2152 UTF-7 A mail safe transformation format for Unicode specifies methods for encoding Unicode into mail messages, but not for mail addresses, domain names, nor URLs.
  • RFC2181 Clarifications to the DNS Specification clarifies that 'any binary string whatever can be used as the label'.
  • RFC2070 Internationalisation of the Hypertext Markup Language is one of many RFCs, that describe multilingual documents, but do not address the issue of DNS, email or URLs.
  • RFC1468 for Japanese, RFC1557 for Korean, RFC1922 for Chinese specify encodings for these character sets, that begin with escape sequences. It would be appreciated by a person skilled in the art that numerous variations and/or modifications may be made to the present invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects to be illustrative and not restrictive.
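The label-splitting rule stated in the definitions above (a 3-character control code, 4 ASCII characters per UCS-2 character, a 63-octet label limit, and the pseudo-root "X-X.NET.") can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function names `split_label` and `to_coded_domain` are hypothetical, part numbers are assumed to start at 0 and to be written as a single digit, split parts are assumed to become adjacent labels in left-to-right order, and the "coded blank second part" is assumed to be a label holding only its control code. The specification does not fix these details in the text shown here.

```python
from typing import List

MAX_LABEL_OCTETS = 63                                         # DNS label limit cited in the text
CONTROL_CODE_LEN = 3                                          # "X-" plus one part character
CHARS_PER_PART = (MAX_LABEL_OCTETS - CONTROL_CODE_LEN) // 4   # 15 UCS-2 characters
PSEUDO_ROOT = ["X-X", "NET"]                                  # recommended pseudo-root name


def split_label(label: str) -> List[str]:
    """Encode one multilingual label as one or more coded sub-ASCII labels."""
    if not label:
        raise ValueError("empty label")
    if any(ord(ch) > 0xFFFF for ch in label):
        raise ValueError("character outside the 16-bit UCS-2 range")
    chunks = [label[i:i + CHARS_PER_PART]
              for i in range(0, len(label), CHARS_PER_PART)]
    if len(label) == CHARS_PER_PART:
        chunks.append("")       # recommended blank second part (layout assumed)
    if len(chunks) > 10:
        raise ValueError("more parts than a single digit can number (assumption)")
    return ["X-%d%s" % (n, "".join("%04X" % ord(ch) for ch in chunk))
            for n, chunk in enumerate(chunks)]


def to_coded_domain(labels: List[str]) -> str:
    """Code every label, then append the pseudo-root domain name."""
    coded: List[str] = []
    for label in labels:
        coded.extend(split_label(label))
    return ".".join(coded + PSEUDO_ROOT) + "."


print(to_coded_domain(["ab"]))   # X-000610062.X-X.NET.
```

The sketch does not check the 255-octet limit on a whole domain name, which a real converter would also need to respect.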

Abstract

A method for providing for multi-lingual names for use on the Internet, related networks, and computers is disclosed, the method comprising the steps of: forming an initial multi-lingual name in a multi-lingual format; mapping the multi-lingual name to a corresponding coded name in a reversible manner, the coded name comprising a restricted subset of the ASCII character set; and utilising the corresponding coded name on the Internet, related networks and computers in place of the multi-lingual name. Preferably the mapping step further comprises adding a predetermined pseudo-root name server to the corresponding coded name. The mapping can include converting the multi-lingual name to a corresponding hexadecimal name and representing the hexadecimal name in an ASCII form. The corresponding coded name can be divided into a series of labels with each label having a predetermined portion comprising a control code for the label. The preferred embodiment is ideally utilised in existing or future Internet applications, utilities, resources or services. Existing uses include, but are not limited to: web browsers, editors, e-mail, news, telnet, ftp, gopher, WAIS, whois, nslookup, trace, ping, finger, rpc, cgi programs, usernames, and databases. When performing queries the name server may respond with additional records for binary or sub-ASCII forms that match, or are variations of, the queried name. For example, if there are minor spelling errors, if they differ only in case, or their base equivalent characters are the same.

Description

The Utilisation of Multi-Lingual Names on the Internet
Field of the Invention
The present invention relates to the utilisation of multilingual names on the Internet, related networks and computer systems. Multilingual names include domain names, user names, file names, email addresses, newsgroups and Universal Resource Locators (URLs).
Background of the Invention
In recent times, the internet has undergone an explosive growth in utilisation. The original formation of the internet was based around the utilisation of English language character formats and, as a result, such formats dominate domain name structures, URLs etc. A large proportion of the world's population does not utilise the English language as its primary language of communication. Hence, there is a general need for the character based formats of other languages, for example: Chinese, Arabic, etc. Unfortunately, due to backward compatibility problems, these other language formats have received only restricted utilisation on the Internet. It is desired to expand the use of other languages to fundamental components of the internet, being domain names, user names, file names, email addresses, newsgroups and Universal Resource Locators (URLs). A glossary is provided, along with a brief Introduction to the Domain Name System (DNS), and references to the most relevant Request for Comments (RFCs).
Summary of the Invention
It is an object of the present invention to provide for an extended use of multilingual names on the internet, related networks and computer systems.
In accordance with a first aspect of the present invention, there is provided a method for providing for multilingual names for utilisation on the Internet, the method comprising the steps of: forming an initial multilingual name in a multilingual format; mapping the multilingual name to a corresponding coded name in a reversible manner, the coded name comprising a restricted subset of the ASCII character set; and utilising the corresponding coded name (on the Internet) in place of the multilingual name.
Preferably the mapping step further comprises adding a predetermined pseudo-root name server to the corresponding coded name, particularly when the name is a domain name, or email address. The mapping can include converting the multilingual name to a corresponding Hexadecimal coded name and representing the Hexadecimal coded name in an ASCII form. The corresponding coded name can be divided into a series of labels with each label having a predetermined portion comprising a control code for the label.
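As a minimal sketch of the reversible mapping described above, and not the applicant's own implementation, the following Python writes each UCS-2 character as four hexadecimal digits behind a three-character control code. The helper names `encode_component`, `decode_component` and `attach_pseudo_root` are hypothetical. The digit after "X-" is assumed to be a part number starting at 0, and the pseudo-root "X-X.NET" is the name recommended elsewhere in this specification.

```python
from typing import List, Tuple

CONTROL_PREFIX = "X-"           # scheme name given in the specification
PSEUDO_ROOT = ["X-X", "NET"]    # recommended pseudo-root domain name


def encode_component(text: str, part: int = 0) -> str:
    """Return the control code followed by four hex digits per UCS-2 character."""
    if not 0 <= part <= 9:
        raise ValueError("part number must be a single digit (assumption)")
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character outside the 16-bit UCS-2 range")
    return CONTROL_PREFIX + str(part) + "".join("%04X" % ord(ch) for ch in text)


def decode_component(coded: str) -> Tuple[int, str]:
    """Reverse encode_component; accepts either case, since DNS ignores case."""
    coded = coded.upper()
    payload = coded[3:]
    if (len(coded) < 3 or not coded.startswith(CONTROL_PREFIX)
            or coded[2] not in "0123456789" or len(payload) % 4 != 0):
        raise ValueError("not a coded component")
    chars = [chr(int(payload[i:i + 4], 16)) for i in range(0, len(payload), 4)]
    return int(coded[2]), "".join(chars)


def attach_pseudo_root(coded_labels: List[str]) -> str:
    """Join coded labels with dots and append the pseudo-root name."""
    return ".".join(coded_labels + PSEUDO_ROOT) + "."


coded = encode_component("Jason")
print(coded)                        # X-0004A00610073006F006E
print(decode_component(coded))      # (0, 'Jason')
print(attach_pseudo_root([coded]))  # X-0004A00610073006F006E.X-X.NET.
```

Because the coded form uses only letters, digits and the dash, it passes unchanged through software that expects ordinary ASCII names, while a converter can recover the original characters from it.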
The preferred embodiment is ideally utilised in existing or future internet applications, utilities, resources or services. Existing applications include, but are not limited to: web browsers, editors, e-mail, news, telnet, ftp, gopher, WAIS, whois, nslookup, trace, ping, finger, rpc, cgi programs, file names, usernames, and databases.
Brief Description of the Drawings
Notwithstanding any other forms which may fall within the scope of the present invention, preferred forms of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:
Fig. 1 illustrates the steps in the method of the preferred embodiment.
Description of Preferred and Other Embodiments
The preferred embodiment discloses processes that allow:
1. Multilingual names to be represented in limited subsets of the ASCII character set,
2. Names which are compatible with existing software (applications and databases), thus requiring no change to existing software.
3. New Software (and changes to existing software) to be made that incorporate the processes described, which may replace, or work with existing software.
Using the processes described, multilingual domain names can be utilised, without changes to existing resolver or name server software.
The preferred embodiment is fully backwards compatible with existing systems and does not require any changes to existing software used for processing domain names, user names, file names, email addresses, newsgroups and Universal Resource Locators (URLs).
Existing programs don't need to be changed, however it is expected they will progressively be adapted to make it easy for non-English alphabets to be read and typed in the form of domain names, email addresses, etc.
The preferred embodiment allows multilingual names to be written in many languages, even a mix, and then converted to fit into a subset of ASCII characters. A converting program is needed to do the conversion and display of Multilingual names.
By way of definition any program that converts between representations of names (multilingual name <—> coded name) is called a converter - this may include resolvers, name servers, web browsers, and any program that carries out the converting process.
The preferred embodiment proposes, and addresses the issues of:
1. General Methods that allow a variety and mix of representations of multilingual names;
2. Substitution of Characters for special words, or base equivalent characters;
3. Control Codes that indicate the encoding used, and splitting of names that are too long;
a. UCS-2 as Hex in ASCII which is a particular encoding and splitting method;
4. Pseudo-Root Names attached to hierarchical names, to indicate an alternative hierarchy;
5. Application to Names of particular types: strings, newsgroups, domain names, email addresses, and URLs;
6. Forms of Implementation covering software and interfaces.
Conventions
The following conventions are used in the examples below.
<> ASCII characters are in angle brackets, eg. <Jason>
[ ] UCS-2 characters are in square brackets, eg. [Jason]
Names with components or a hierarchy have usually been written with separators between the components such as the at symbol '@', dot '.', or slash '/', eg. news: "comp.law.patents"; email: "Jason@OneAccount.net"; URL: "http://www.OneAccount.net/login.cgi". Since this invention allows these symbols to be used within components, these symbols only act as separators outside of brackets, eg. "<Jason>@<OneAccount>.<net>"; "<http:>//<www>.<OneAccount>.<net>/<login.cgi>".
General Methods
A multilingual name may be a simple string, or may comprise a number of components that require parsing and interpretation, as part of conversion to a coded name. Components of names may be hierarchically organised from left to right or right to left and may contain other non-hierarchical components.
Implementors of converters have the choice of converting the entire string, or converting each component, since they are likely to be specialists in their target language market. Converting is at least the reversible transformation of characters from a multilingual set to ASCII, and may comprise parsing of components, substitution of characters, encoding, splitting, control codes indicating the encoding or splitting, or attachment of pseudo-root names.
Parsing of multilingual components involves identification of separators. Each separator can now be represented by several characters from several languages.
The user may even be given the option of what symbols they would like to use as separator characters, eg. instead of "@", it is possible to choose " at ", so that a corresponding example email address would be "Jason at OneAccount.net".
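The parsing of user-selected separators can be sketched as follows. This is a minimal illustration only: the function name `parse_address` and the default separator lists are assumptions made for this example, and quote and escape characters are not handled.

```python
def parse_address(text, at_separators=("@", " at "), label_separators=(".", ", ")):
    """Split an address into (local_part, [domain labels]).

    The separator tuples are hypothetical defaults; a converter could let
    the implementor or the user choose them.
    """
    for sep in at_separators:
        if sep in text:
            local, _, domain = text.partition(sep)
            break
    else:
        raise ValueError("no mailbox separator found in %r" % text)

    # Split the domain on every configured label separator in turn.
    labels = [domain]
    for sep in label_separators:
        labels = [piece for label in labels for piece in label.split(sep)]
    return local, labels


print(parse_address("Jason at OneAccount.net"))
```

Both "Jason@OneAccount.net" and "Jason at OneAccount.net" parse to the same components, which can then be converted individually.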
Substitution of Characters
Special Words
Parts of a multilingual name may have special meaning, for instance, the file name extension, or protocol to use.
A Japanese language user may prefer to see and use the Japanese characters for ".exe", or "http:".
Converters may substitute ASCII characters in place of the synonymous multilingual characters.
Base Equivalent Characters
Sometimes, it is desirable to ignore the case of characters in English, such as for searching or matching names. We call this being case insensitive. To make comparisons, it is usual to force all the characters to upper or lower case. Other languages' alphabets have different rules. For instance, Greek has three forms of Sigma, one only for use at the end of a word, when the word is lowercase.
Different kinds of comparisons may be done for each alphabet. We therefore define sets of characters that are equivalent to each other for purposes of comparison. From each set, one character is said to be the Base Equivalent Character. When making that comparison, equivalent characters are forced to the base equivalent character. For Case Insensitive comparisons on UCS-2, it is preferred that the base character be the earliest character of each set in ISO10646 order, from within the language. This forces Latin, Greek and Cyrillic to uppercase and Hiragana and Katakana to lowercase. So, for instance, Greek lowercase alpha is substituted with Greek uppercase alpha, but not with Latin "A", nor Cyrillic "A".
Another type of comparison could be character shape.
The letters "IBM" could be Latin, Greek or Cyrillic. A language insensitive search could force them all to Latin.
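A case insensitive base equivalent mapping might be sketched as below. This is an approximation under stated assumptions: it treats a character's equivalence set as the character plus its single-character upper and lower case forms as reported by Python, and picks the earliest code point. The function names are hypothetical, and the sketch does not cover kana or the shape-based comparison just described.

```python
def base_equivalent(ch):
    """Return the earliest code point among ch and its one-character case variants."""
    candidates = [ch]
    for variant in (ch.upper(), ch.lower()):
        # Skip case mappings that expand to several characters.
        if len(variant) == 1:
            candidates.append(variant)
    return min(candidates)


def to_base(name):
    """Force every character of a name to its base equivalent character."""
    return "".join(base_equivalent(ch) for ch in name)


print(to_base("Jason"))
print(to_base("\u03c3\u03c2") == "\u03a3\u03a3")  # both lowercase sigmas map to capital Sigma
```

Because the mapping stays within each script, Greek lowercase alpha becomes Greek uppercase alpha (U+0391) and never Latin "A".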
Control Codes
Control codes can be attached to a coded name, or to each component of a coded name, to indicate the type of encoding, and the split sequence. A particular example is UCS-2 as Hex in ASCII.
Method of Encoding
When a multilingual name is converted into a coded name, control codes can be attached to the coded name to indicate the method of encoding.
Split Sequence
If a component of a multilingual name is too long when converted to fit into a single component of a coded name, it may be split across several components of a coded name.
Control codes attached to each component of the coded name can indicate which part of a multilingual component it belongs to, ie. its order in a split component. This is particularly useful for hierarchical names with limits on the length of components, such as domain names.
UCS-2 as Hex in ASCII
UCS-2 as Hex in ASCII is an encoding of multilingual names. Its 3 octet control code is <X-n>, where n is an ASCII number from <1> to <9> when it comprises a split component, and <0> when the component is not split. The control code is prepended to the coded component.
Each UCS-2 character becomes four ASCII characters in the ranges <0>-<9>, <A>-<F>, representing the value of the UCS-2 character in Hexadecimal.
An example of UCS-2 to ASCII, not split:
[Jason] -> <X-0004A00610073006F006E>
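The encoding just described can be sketched in a few lines. The function names `encode_component` and `decode_component` are assumptions for illustration, not part of the specification.

```python
def encode_component(text, part=0):
    """Encode text as UCS-2 Hex in ASCII, prefixed with the X-n control code."""
    hex_digits = []
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError("character outside the UCS-2 range: %r" % ch)
        hex_digits.append("%04X" % code)  # four hex digits per character
    return "X-%d" % part + "".join(hex_digits)


def decode_component(coded):
    """Return (part_number, text) for a coded component."""
    if len(coded) < 3 or coded[:2].upper() != "X-" or coded[2] not in "0123456789":
        raise ValueError("missing X-n control code")
    body = coded[3:]
    if len(body) % 4:
        raise ValueError("hex body must be a multiple of 4 characters")
    text = "".join(chr(int(body[i:i + 4], 16)) for i in range(0, len(body), 4))
    return int(coded[2]), text


print(encode_component("Jason"))
print(decode_component("X-0004A00610073006F006E"))
```

Decoding reverses the mapping exactly, which is what makes the conversion reversible.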
An example of split ASCII to UCS-2:
<X-30065>.<X-2006E>.<X-1004F> -> [One]
Pseudo-Root Names
A pseudo-root name is a predetermined name attached to coded hierarchical names, such as newsgroups and domain names, so that they become part of a predetermined hierarchy. By adding the pseudo-root name to all coded names, that branch of the hierarchy effectively becomes the root of a pseudo-hierarchy.
This has several useful properties:
1. Separation of Names
Coded names won't be mixed up with normal ASCII names, so it is less confusing for users.
2. Separation of Risk
Technical, business or political changes to the pseudo-root hierarchy names won't adversely affect the real root or other branches.
3. Separation of processing load
In hierarchical distributed systems, such as DNS, the processing load arising from multilingual names is allocated to computers serving the pseudo-hierarchy.
4. Specialisation
Pseudo-root hierarchies can specialise in a particular type of encoding or language. Different converters can attach different pseudo-root names, meaning the converter programs and hierarchies can specialise.
5. Politics
A pseudo-root can be made in a part of the hierarchy in which control is exercised.
It is recommended that all coded domain names are subdomains of "X-X.NET", and coded newsgroups created under "alt.x-".
Application to Names
Many combinations of processes may be applied to various kinds of names:
Strings
Simple multilingual strings, such as user names, might merely be converted to a coded form with a control code attached indicating the encoding method, such as X-0.
Strings with components, such as file names, might also have special words substituted with synonymous characters. For instance, if a Japanese file name is suffixed by Japanese characters that indicate it is an executable program, these characters may be replaced by the file name extension ".exe".
Newsgroups
Newsgroups are also known as Internet News, and Usenet.
Coded names can be used as the names of newsgroups, and displayed to users as multilingual newsgroup names.
To name newsgroups in multilingual characters, the following steps are used, shown with an example for a newsgroup about patent law in English:
[Law, Patent] (English language)
1. Substitute with base equivalent characters. Substitute ISO language code for language.
<EN>.[LAW].[PATENT]
2. Convert UCS-2 to ASCII and add control codes.
<EN>.<X-0004C00410057>.<X-00050004100540045004E0054>
3. Add pseudo-root for multilingual news hierarchy.
<ALT>.<X->.<EN>.<X-0004C00410057>.<X-00050004100540045004E0054>
4. Present the normal ASCII name of the newsgroup.
"ALT.X-.EN.X-0004C00410057.X-00050004100540045004E0054"
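The four newsgroup steps above can be reproduced with a short sketch. The function names are hypothetical, and Python's `str.upper()` stands in for base equivalent substitution, which is an assumption that only holds for alphabets whose base characters are uppercase.

```python
def encode_component(text):
    """UCS-2 as Hex in ASCII with the unsplit control code X-0."""
    return "X-0" + "".join("%04X" % ord(ch) for ch in text)


def coded_newsgroup(language_code, components, pseudo_root=("ALT", "X-")):
    """Build the coded newsgroup name for a list of multilingual components."""
    # Steps 1 and 2: base equivalents, then hex encoding with control codes.
    coded = [encode_component(component.upper()) for component in components]
    # Step 3: pseudo-root, then the ISO language code as the top level name.
    labels = list(pseudo_root) + [language_code.upper()] + coded
    # Step 4: present as a normal dot-separated ASCII name.
    return ".".join(labels)


print(coded_newsgroup("en", ["Law", "Patent"]))
```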
It is recommended that, since some alphabets are shared by many languages, the top level newsgroup names be the 2 letter ISO language codes.
Domain Names
A brief introduction to the domain name system is supplied later. For details see the referenced RFCs.
Domain Names are hierarchical names commonly used to identify organisations on the internet. RFC1035 specifies the presentation of domain names as domain labels separated by '.' dots, with the highest level domain label on the right, and subdomains proceeding to the left. For example in "www.example.com.au.", "au" is the top level label for Australia, "com" is the second level label for commercial enterprises, "example" is the third level label - the name of the enterprise, and "www" is the fourth level label identifying a computer in the enterprise. This is the traditional way of writing domain names.
In the preferred embodiment, the presentation of domain names is instead left to implementors of converters. The implementors, or even the users, may select appropriate separator, quote, and escape symbols, along with special words, and the direction of the hierarchy (left to right, right to left, etc.). Each domain label could even be entered in separate text fields, eliminating the need for separator characters. However, it is often easier to write and type a domain name with separating characters.
The domain name system is concerned with the format of binary data between resolvers and name servers. Due to compatibility issues, only a limited subset of ASCII is used in labels: the characters 'A'-'Z', 'a'-'z', '0'-'9' and '-'. It is an object of the preferred embodiment to allow multilingual domain names to be represented in this subset of ASCII.
A process for representing multilingual domain names is shown in Fig. 1:
1. Parsing, and Substitution of Special Words 1;
2. Substitution of Base Equivalent Characters 2;
3. Encoding, Splitting and Control codes 3;
4. Adding pseudo-root domain name 4;
5. Presenting coded form of names 5.
1. Parsing, and Substitution of Special Words
Converters may accept domain name labels in a variety of ways, such as selection from a list of countries, or typing a partial domain name into a text field. Converters which allow labels to be typed together into one field need to parse the parts of the domain name into labels. Separator, quote, and escape characters may be defined by implementors of the converter, or be left to the user's choice.
Special words may be substituted for selected or typed labels. For instance, replacing the Arabic label for Australia with "au", or the Thai label for business with "com".
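A sketch of parsing followed by special word substitution is given below. The table `SPECIAL_WORDS` is a hypothetical stand-in that uses English words in place of the Arabic and Thai labels mentioned above, and the ", " separator is likewise an assumption.

```python
# Hypothetical special-word table: multilingual word -> ASCII label.
SPECIAL_WORDS = {"australia": "AU", "business": "COM"}


def parse_and_substitute(name, separator=", "):
    """Parse a typed name into labels, tagging each as ASCII or UCS-2.

    Returns a list of (kind, text) pairs, where kind is "ascii" for a
    substituted special word and "ucs2" for a label still to be encoded.
    """
    labels = []
    for part in name.split(separator):
        part = part.strip()
        key = part.casefold()
        if key in SPECIAL_WORDS:
            labels.append(("ascii", SPECIAL_WORDS[key]))
        else:
            labels.append(("ucs2", part))
    return labels


print(parse_and_substitute("Glebe, Traveller's Rescue, Australia"))
```

Only the labels tagged "ucs2" go on to base equivalent substitution and encoding; substituted special words are already in the permitted ASCII subset.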
2. Substitution of Base Equivalent Characters
English domain names are case insensitive, so lowercase Latin should be replaced with uppercase. Other languages may have different preferences. Defining the sets of equivalent characters can be left to implementors, and specialists in that language.
3. Encoding, Splitting and Control codes
The Internet standard RFC1035 specifies that domain names have an overall limit of 255 octets, and that each label has a limit of 63 octets. Currently, labels only contain ASCII characters 'A'-'Z', 'a'-'z', '0'-'9' and '-'.
It is possible in future that labels could be made of 8bit (ASCII, ISO8859), 16bit (UCS-2), 32bit (UCS-4), or variable length characters (UTF-8, UTF-7). Labels could even be made of other data, such as bitmaps (pictures), or sound data.
For the representation of multilingual domain names, the preferred method of encoding is UCS-2 to Hex in ASCII, as it is fully compatible with existing DNS tools.
Since each UCS-2 character maps to 4 ASCII characters, any label that is longer than 15 UCS-2 characters must be split, so that it fits into the maximum label length of 63 octets. It is further recommended that labels which are 15 UCS-2 characters long, should be split with a coded blank second part. This allows for separation of control of the common part of a shared domain label, as will be further explained below.
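The splitting rule can be sketched as follows, assuming the 3 octet control code leaves 60 octets, or 15 UCS-2 characters, per label. The helper names are hypothetical. Parts are returned in presentation order, so the highest part number comes first and part 1 sits next to the parent domain.

```python
MAX_CHARS = (63 - 3) // 4  # 15 UCS-2 characters fit after the control code


def hex_of(text):
    """Four uppercase hex digits per UCS-2 character."""
    return "".join("%04X" % ord(ch) for ch in text)


def split_label(text):
    """Return the coded labels for one multilingual label, leftmost first."""
    if len(text) < MAX_CHARS:
        return ["X-0" + hex_of(text)]
    chunks = [text[i:i + MAX_CHARS] for i in range(0, len(text), MAX_CHARS)]
    if len(text) == MAX_CHARS:
        chunks.append("")  # recommended coded blank second part
    if len(chunks) > 9:
        raise ValueError("label needs more than 9 split parts")
    coded = ["X-%d%s" % (n, hex_of(chunk)) for n, chunk in enumerate(chunks, start=1)]
    return list(reversed(coded))


for label in split_label("TRAVELLER'S RESCUE"):
    print(len(label), label)
```

A label of exactly 15 characters yields a bare "X-2" followed by a full 63 octet "X-1" label, so that control of the shared first part can be delegated separately.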
There may be several businesses that share the first part of their name. Rather than giving control of the common part to one of these businesses, it is possible to give control to a neutral third party, such as the superdomain. For example:
[Traveller's Rescue].<AU>,
[Traveller's Rest].<AU>, and
[Traveller's Res].<AU>
when split and prefixed would become
<X-2>[cue].<X-1>[Traveller's Res].<AU>,
<X-2>[t].<X-1>[Traveller's Res].<AU>, and
<X-2>[].<X-1>[Traveller's Res].<AU>
Control of the common domain <X-1>[Traveller's Res].<AU> could be given to <AU>, or shared by the organisations. Each organisation can have control over its <X-2> subdomain.
4. Adding pseudo-root domain name
A pseudo-root domain name is added to the coded domain name, for the reasons mentioned in "Pseudo-Root Names". Name servers for the pseudo-root may be specialised for the processing of names in a particular encoding, or language.
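Attaching and removing a pseudo-root, and presenting the result in the traditional dotted form, might look like the sketch below. It assumes the "X-X.NET" pseudo-root recommended earlier; the function names are hypothetical.

```python
PSEUDO_ROOT = ("X-X", "NET")


def add_pseudo_root(labels, pseudo_root=PSEUDO_ROOT):
    """Append the pseudo-root labels to a list of coded labels."""
    return list(labels) + list(pseudo_root)


def strip_pseudo_root(labels, pseudo_root=PSEUDO_ROOT):
    """Return the labels below the pseudo-root, or None for an ordinary name."""
    n = len(pseudo_root)
    tail = [label.upper() for label in labels[-n:]]
    if tail != [label.upper() for label in pseudo_root]:
        return None
    return list(labels[:-n])


def present(labels):
    """Join labels with dots, highest level label on the right."""
    return ".".join(labels) + "."


print(present(add_pseudo_root(["X-00047004C004500420045", "AU"])))
```

A converter can use the `None` result to tell ordinary ASCII names apart from coded names without inspecting individual labels.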
The recommended pseudo-root domain name to add is <X-X>.<NET>. That is, "X-X.NET.".
5. Presenting coded form of name
A converter may have to present the coded form in a way which is usable by applications. The traditional way is specified in RFC1035: labels separated by dots, with the highest level label to the right. Converters that query the DNS themselves may not need to concatenate the labels into a contiguous string.
Example of converting Multilingual Domain Name
The following provides an example of the domain name conversion process of the preferred embodiment.
"Glebe, Traveller's Rescue, Australia"
1. Parsing, and Substitution of Special Words
-> [Glebe].[Traveller's Rescue].<AU>
2. Substitution of Base Equivalent Characters
-> [GLEBE].[TRAVELLER'S RESCUE].<AU>
3. Encoding, Splitting and Control codes
Encoding UCS-2 characters as Hex in ASCII
-> <0047004C004500420045>.<00540052004100560045004C004C00450052002700530020005200450053004300550045>.<AU>
Splitting and Prefixing with Control codes
-> <X-00047004C004500420045>.<X-2004300550045>.<X-100540052004100560045004C004C00450052002700530020005200450053>.<AU>
4. Adding pseudo-root domain name
-> <X-00047004C004500420045>.<X-2004300550045>.<X-100540052004100560045004C004C00450052002700530020005200450053>.<AU>.<X-X>.<NET>.
5. Presenting coded form of name
-> X-00047004C004500420045.X-2004300550045.X-100540052004100560045004C004C00450052002700530020005200450053.AU.X-X.NET.
Email
Email mailboxes and addresses can use a larger part of the ASCII character set than DNS. Normally, an email address comprises a mailbox name (local part) at a domain name. A multilingual email address could be formed in some other way, using the language's own symbols for addressing. For instance, [Jason at Home, Australia] instead of Jason@HOME.AU. Converters or mail programs are responsible for processing the email addresses correctly. Multilingual addresses could be processed in a number of ways:
1. Parsed, coded and sent to a mailbox at a domain
Parsed -> [Jason]@[Home].<AU>
Coded -> <X-0>[Jason]@<X-0>[HOME].<AU>.<X-X>.<NET>.
2. Coded, and sent to a converting mail exchanger
-> <X-0>[Jason at Home, Australia]@<MAIL>.<X-X>.<NET>.
3. Coded, and resolved by DNS
-> <X-0>[Jason at Home, Australia].<MAIL>.<X-X>.<NET>.
4. Parsed, coded, and resolved by DNS
-> <X-0>[Jason at Home].<AU>.<MAIL>.<X-X>.<NET>.
Universal Resource Locators (URLs)
URLs encompass file names, newsgroups, domain names, email, and many other names. A larger part of the ASCII character set is available for names, and encoding of octets is provided for. However, the schemes that URLs encompass remain restricted in the characters they can use, so there is a need for coded multilingual URLs.
Substitution of special words and symbols
URLs are currently defined for the US-ASCII character set. Multilingual users may prefer to use symbols from their own language, in place of the specific scheme names, reserved and special characters. Converters would then parse these symbols and replace them with the US-ASCII symbols.
For instance [Secure Web] -> <https:> or [web] -> <http:>.
Schemes that use Internet protocols are formatted as: "<scheme>://<user>:<password>@<host>:<port>/<url-path>". Multilingual schemes should be parsed into a coded form like this. Conversion of components using UCS-2 as Hex in ASCII can be applied to the user name, password, and host name (which is usually a domain name), and components of the url-path.
Multilingual port numbers should be converted into the synonymous ASCII number, if written as a non-ASCII number such as in Chinese or Sanskrit numerals. The url-path may be further parsed, and broken down into special and reserved characters, path names, file names, search, argument names, and argument values.
It is left to implementors of converters to elect the characters and symbols in their language that will substitute for scheme names, special and reserved characters.
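As one concrete case, a port number written in non-ASCII decimal digits can be converted with Python's standard `unicodedata` module. The function name `ascii_port` is hypothetical, and the sketch covers only positional decimal digits such as Devanagari or Arabic-Indic digits; ideographic numerals such as Chinese would need a separate table.

```python
import unicodedata


def ascii_port(text):
    """Convert a port written in any Unicode decimal digits to an int."""
    digits = []
    for ch in text.strip():
        # unicodedata.decimal raises ValueError for non-decimal characters.
        digits.append(str(unicodedata.decimal(ch)))
    if not digits:
        raise ValueError("empty port number")
    return int("".join(digits))


print(ascii_port("\u096e\u0966"))  # Devanagari digits eight, zero
```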
Some Examples - parsed and substituted, but not coded.
[Mail: Jason at Home, Australia]
-> <mailto://[Jason]@[Home].AU.X-X.NET>
[News: English, Patent Law]
-> <news://alt.x-.en.[law].[patent]>
[Secure Web: OneAccount - login (Jason)]
-> <http://[OneAccount].X-X.NET./[login].cgi?[login]=[Jason]>
[Local File: Patents - Multilingual Test, program]
-> <file://localhost/[Patents]/[Multilingual Test].exe>
Forms of Implementation
The method of the preferred embodiment can take many different forms of implementation, for example, as follows:
Stand Alone Converter
This form takes in a multilingual name, and outputs a coded name as an ASCII string, or some other representation. The converter may be created to work for particular kinds of names, such as URLs or email addresses, and/or to work with particular applications, such as web browsers.
Converters may have controls to, or automatically, send the ASCII string to relevant applications. They may allow a user to copy and paste to and from their applications.
Incorporated into applications
Alternatively, the conversion function may be incorporated into the applications such as browsers, editors, email, telnet, ftp, and news.
Plug-in or add-on to application
The converter may be a program or library that plugs in or adds onto the existing applications, providing the application with the added multilingual name functionality.
Application loadable control
The converter may take the form of a control that the application can use. Examples are Web pages that include javascript, Java controls, or Active-X controls.
Such controls and plug-ins may replace, or overlay, a browser's current URL entry field with a multilingual name field. This field both displays the multilingual name, and allows entry of multilingual URLs. Coded names are passed back and forth from converter to browser.
Web Page interfaces to converter
A converter may run on a web server, with access to the converter being provided through multilingual web pages. Users access a multilingual URL/domain name service such as "http://X-X.NET/". If their browser requests a particular language, a web page in that language is provided (if available), otherwise a multilingual page is provided.
The web page can typically provide a form, so that the user may type in a multilingual URL. Users may select common parts from lists such as the encoding scheme, organisation type, and country. These lists may have defaults on a per user, or per language basis.
When the multilingual URL form is submitted, the converter server has several options:
1. returning the coded URL as an ASCII string, which the user may link to, or use as they please.
2. providing a redirection to the coded URL.
3. presenting a frame view, where one frame contains the requested coded URL, and another contains a multilingual URL form, for typing other URLs.
Multilingual Registries may also provide a web interface to provide for registration of multilingual names, such as domain names and email addresses.
Converter packaged with other facilities
Converters may be packaged with other facilities. For instance, a program may parse a multilingual name in several ways, and perform several searches such as DNS lookup, whois search, and web page search. It might present information to a user, or return specific information to a client application.
Resolvers
The resolver accepts the multilingual name direct from applications, but then converts it before querying name servers. Resolvers may query name servers for both the binary and sub-ASCII representations of the multilingual domain name. The resolver may also try variations on the name.
Name Servers
When performing recursive queries, the name server accepts sub-ASCII or binary multilingual domain names; and queries other name servers with sub-ASCII or binary Multilingual domain names.
The name server may convert from binary name to another format before querying its database and may return answers for either form.
In responses, the name server may respond with additional records for binary or sub-ASCII forms (including CNAME and A records) that match, or are variations of, the queried name. For example, if there are minor spelling errors, if they differ only in case, or their base equivalent characters are the same.
Databases
Databases may keep records in binary or sub-ASCII form. Conversion between them, and conversion for client or server programs may be required.
Other areas of application
The principle of having the first 3 characters in a field represent the encoding scheme can be applied generally. This can be applied to directory services, such as Whois, LDAP, and to search engines, and to databases.
It can therefore be generally seen that the preferred embodiment provides for the representation of Multilingual characters in more limited character sets. In particular, the process includes converting UCS-2 to Hex in ASCII, applied to internet names used in the Domain Name System (DNS), email, news and Uniform Resource Locators. For DNS, a multilingual domain label is represented in one or more sub-ASCII labels. The first 3 characters identify the label's encoding scheme, leaving a maximum of 60 sub-ASCII characters for encoded data in each domain name label.
In UCS-2 to Hex in ASCII encoding the first and second characters are the name of the scheme 'X-'; and the third character identifies the part of the split multilingual label. The name of the pseudo root server "X-X.NET" is attached to the sub-ASCII representation of the multilingual domain name. The pseudo root server is visible in the current domain name space. For email, the first three characters of the local-part identify the local-part's encoding scheme. The domain name follows the rules for DNS.
Alternatively, the entire email address is encoded, and sent to the relevant mail server, exchanger or gateway for processing or forwarding. For URLs, the first three characters of each component (name, label, argument) in the URL identify the encoding scheme.
The encoding and representation can be implemented in the form of various software devices, such as upgrades or add ons to existing software, incorporation in new software, stand-alone applications, databases, servers, clients, resolvers, name servers.
The first three characters identify the encoding scheme to a converter, so that it may display the name in the right character set. These characters mean nothing to existing DNS, email and web systems, simply identifying the name of a domain, mailbox, file or other data. Hence variations utilising different encoding identifiers can also be easily used. This scheme can be designed for temporary use, up until applications and databases (including name servers and resolvers) become compliant with a multilingual character set such as ISO10646 or Unicode.
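How a converter might recognise the three character control code and recover a displayable name can be sketched as follows. The names `CODED`, `decode_hex` and `display_name` are hypothetical. The sketch assumes split parts appear in presentation order, with part 1 rightmost, and leaves every label without a valid control code untouched.

```python
import re

# "X-", one digit, then zero or more groups of four hex digits.
CODED = re.compile(r"^X-([0-9])((?:[0-9A-F]{4})*)$", re.IGNORECASE)


def decode_hex(body):
    return "".join(chr(int(body[i:i + 4], 16)) for i in range(0, len(body), 4))


def display_name(name):
    """Return the display labels of a dotted name, rejoining split parts."""
    shown, pending = [], []
    for label in name.rstrip(".").split("."):
        match = CODED.match(label)
        if not match:
            shown.append(label)  # ordinary ASCII label, shown as-is
            continue
        part, text = int(match.group(1)), decode_hex(match.group(2))
        if part == 0:
            shown.append(text)
        else:
            pending.insert(0, text)  # later parts appear further left
            if part == 1:
                shown.append("".join(pending))
                pending = []
    return shown


print(display_name("X-30065.X-2006E.X-1004F"))
```

Existing systems see each coded label as an opaque ASCII name, which is why no change to them is needed.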
It is further possible under this scheme to have several pseudo roots. This allows multiple registries to run, specialising in particular languages. However, it is recommended that one pseudo root be selected, with registries sharing the pseudo root's database.
It would be further appreciated by a person skilled in the art that numerous variations and/or modifications may be made to the present invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The described present embodiments are, therefore, to be considered in all respects to be illustrative and not restrictive.
Glossary
The following terms are hereinafter defined for ease of understanding:
Multilingual Name - made of non-ASCII characters, may be a string of characters, or several labels or fields. This specifically includes, and is not limited to, domain names, user names, file names, email addresses, newsgroups and Universal Resource Locators (URLs).
Coded Name - a string, or fields, of ASCII characters that represent a Multilingual name in some encoding.
Converter - any program that converts from one representation of names to another. Especially, converting from UCS-2 to Hex in ASCII and back. Converters may incorporate resolvers, and other functions such as substitution for equivalent characters.
ASCII - A character set that contains the English Alphabet, Arabic Numerals, punctuation marks and some computer control codes. There are several varieties of ASCII.
Sub-ASCII - The limited subset of the ASCII character set that has been used in domain names: 'A'-'Z', 'a'-'z', '0'-'9', and '-' (dash).
UCS - Universal multi-byte Character Set encodings of ISO10646 and Unicode, which cover most living languages. UCS-2 is 2 bytes (16 bits), UCS-4 is 4 bytes.
Equivalent Characters - characters that are mapped to the same base character by a program. In English 'A' and 'a' differ only in case. To case insensitive programs, such as DNS, they are equivalent. In other languages, equivalent characters may differ in other ways. Eg. In Greek, there are two lowercase sigmas; one for use at the end of a word. Developers of programs for different language markets are specialists in these areas; they decide on which characters are equivalent.
Domain Name - a name up to 255 octets made of several labels, one for each level in the hierarchy. "www.x-x.net." is a domain in the "x-x.net." domain in the "net." domain. The DNS stores information related to domain names.
Label - part of a domain name, up to 63 octets.
DNS - The Domain Name System. A distributed database that is accessed by resolvers asking name servers. The DNS stores computers' names, IP addresses, and more. See RFC 1034, 1035 and others.
IP address - A 4 byte internet network address.
Resolver - a program that applications use to query the DNS. A resolver in turn asks Name Servers for information.
Name Server - a name server has information about its domain that it gives to resolvers and other name servers. If it doesn't know it may query other name servers.
Root Name Servers - the name servers at the top of all hierarchies.
Pseudo-Root Name Servers - some applications may add a predetermined name to all of their domain name queries, making it seem as if that name server is at the top of all hierarchies.
RFC - Request for Comments documents describe how the internet works. The Internet Engineering Task Force draws internet standards from the list of RFCs.
Introduction to the Domain Name System (DNS)
By way of introduction to the internet's Domain Name System, we illustrate with an example.
When a user wants to view a web page, they may type in or select its URL. For example, a superannuation web page URL is "http://www.superannuation.net/index.htm". "www.superannuation.net" is a domain name, that is, the name of the computer on which the page is kept. That computer's IP address (internet number) must be found to get the page. This is done by asking the DNS.
The web browser asks a DNS Resolver to find the IP address of the domain name. The Resolver asks the local name server for the address. If the local name server doesn't know, it then tracks down the address by asking other name servers. The local name server asks the net. domain name server where the superannuation.net. domain name server is. Then it asks this subdomain name server for the IP address of the domain name www.superannuation.net, which is 105.42.3.5 (just an example address).
The local name server then tells the resolver the IP address, which in turn informs the web browser. The web browser now asks the computer at that IP address for the web page via http: "//www.superannuation.net/index.htm".
Internet Applications such as web browsers, ftp, telnet and email programs all use resolvers to ask the DNS for the address of domain names. Sensible domain names are easier for people to remember than IP addresses, especially when they are in their own language. To date, DNS implementations have required names to be in a small subset of ASCII: the letters A-Z, digits 0-9, and the dash -.
Internet standard documents are readily available on the Internet. The most pertinent to this patent application is RFC1035: Domain Names - Implementation and Specification, which describes how DNS works, and the format of names in detail.
The DNS specification RFC1035, with further updates and clarifications, states that domain name labels may contain up to 63 octets of binary data. It is suggested that the names be made from the characters A-Z, 0-9 and - dash, a restricted subset of US ASCII, so that legacy applications keep working.
Until all internet applications and protocols (including resolvers, name servers, and databases) are able to handle binary labels, it is desirable to represent binary labels in this subset of ASCII, especially multilingual domain names.
Existing RFCs and Drafts
By way of background, a number of RFC documents and internet-drafts are available from the Internet Engineering Task Force at "http://ietf.org/".
http://dxcoms.cern.ch/wwwcs/public/ip/draftslist.html
Although these documents frame the way in which the internet should work, a number of recommendations have not been adopted, nor implemented.
RFC822 Format of ARPA Internet Text Messages defines internet mail, and specifies the format of email addresses.
RFC1035 Domain Names - Implementation and Specification defines the DNS protocol, and specifies a format for domain names as a sequence of labels separated by dots. Labels begin with a letter, and may contain characters from 'A'-'Z', 'a'-'z', '0'-'9' and '-' dash.
RFC1123 Requirements for Internet Hosts allows domain labels to begin with letters or numbers.
RFC1738 Uniform Resource Locators (URL) specifies the format of URLs, in a subset of US-ASCII that permits binary data as octets represented by %HH, where H is 0-9, A-F, more commonly known as 'Hex in ASCII'.
RFC2130 Character Set Workshop Report recommends ISO10646 as the base character set for the internet, and also says DNS should stay in limited ASCII format.
RFC2152 UTF-7 A mail safe transformation format for Unicode specifies methods for encoding Unicode into mail messages, but not for mail addresses, domain names, nor URLs.
RFC2181 Clarifications to the DNS Specification clarifies that 'any binary string whatever can be used as the label'.
RFC2070 Internationalisation of the Hypertext Markup Language is one of many RFCs that describe multilingual documents, but do not address the issue of DNS, email or URLs.
RFC1468 for Japanese, RFC1557 for Korean, RFC1922 for Chinese specify encodings for these character sets, that begin with escape sequences.
It would be appreciated by a person skilled in the art that numerous variations and/or modifications may be made to the present invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects to be illustrative and not restrictive.

Claims

We Claim:
1. A method for providing for multilingual names for use on the Internet, related networks and computers, said method comprising the steps of: forming an initial multilingual name in a multilingual format; mapping said multilingual name to a corresponding coded name in a reversible manner, said coded name comprising a restricted subset of the ASCII character set; utilising said corresponding coded name on the Internet and related networks in place of said multilingual name.
2. A method as claimed in claim 1 wherein said mapping step further comprises adding a predetermined pseudo-root name to said corresponding coded name.
3. A method as claimed in any preceding claim wherein said mapping includes converting said multilingual name to a corresponding hexadecimal coded name and representing said hexadecimal coded name in an ASCII form.
4. A method as claimed in any preceding claim wherein said corresponding coded name is divided into a series of labels with each label having a predetermined portion comprising a control code for said label.
5. A method as claimed in any preceding claim wherein a multilingual name is parsed or broken down into components.
6. A method as claimed in any preceding claim wherein components of a multilingual name that have special, reserved, or schematic meaning are replaced with synonymous components in the coded name.
7. A method as claimed in any preceding claim wherein the characters of a multilingual name are replaced with their base equivalent characters.
8. A method for providing in domain name systems, answers that contain additional information about names that are similar to names in questions to the domain name system, the names being multilingual, coded or ordinary ASCII.
9. A method as claimed in any preceding claim wherein said method is used in applications that also directly or indirectly use internet protocols or services.
10. A method as claimed in any preceding claim wherein said method is used in internet applications, utilities, resources or services.
11. A method as claimed in any preceding claim wherein a multilingual name is represented by a coded name for the purposes of sending, receiving or otherwise processing one of email, talk, chat, IRC, the coded name being a name for a user of a program, computer system, or network.
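By way of illustration only, the reversible mapping of claims 1 to 4 might be sketched as follows. Every specific here is an assumption made for the sketch and is not taken from the claims: the pseudo-root `ml.example`, the single-letter control code `x` placed at the start of each label, the choice of UTF-8 as the multilingual format, and the function names are all hypothetical.

```python
PSEUDO_ROOT = "ml.example"  # hypothetical pseudo-root name (claim 2)
CONTROL = "x"               # hypothetical per-label control code (claim 4)


def encode_name(name: str) -> str:
    """Map a multilingual name to a coded name built from 0-9, a-f, 'x' and dots."""
    # Each label becomes the hexadecimal form of its UTF-8 octets (claim 3).
    labels = [CONTROL + label.encode("utf-8").hex() for label in name.split(".")]
    return ".".join(labels + [PSEUDO_ROOT])


def decode_name(coded: str) -> str:
    """Reverse encode_name, recovering the original multilingual name."""
    suffix = "." + PSEUDO_ROOT
    if not coded.endswith(suffix):
        raise ValueError("coded name lacks the expected pseudo-root")
    decoded = []
    for label in coded[: -len(suffix)].split("."):
        if not label.startswith(CONTROL):
            raise ValueError("label lacks the expected control code")
        decoded.append(bytes.fromhex(label[len(CONTROL):]).decode("utf-8"))
    return ".".join(decoded)
```

With these assumptions, `encode_name("名.com")` gives `xe5908d.x636f6d.ml.example`, and `decode_name` returns `名.com`. The sketch does not enforce any limit on label length, which a practical scheme would need to consider because the hexadecimal form is twice as long as the underlying octets.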
PCT/AU1998/000849 1997-10-14 1998-10-13 The utilisation of multi-lingual names on the internet WO1999019814A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU95240/98A AU9524098A (en) 1997-10-14 1998-10-13 The utilisation of multi-lingual names on the internet

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AUPO9779 1997-10-14
AUPO9779A AUPO977997A0 (en) 1997-10-14 1997-10-14 The utilisation of multi-lingual names on the internet

Publications (1)

Publication Number Publication Date
WO1999019814A1 true WO1999019814A1 (en) 1999-04-22

Family

ID=3804059

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU1998/000849 WO1999019814A1 (en) 1997-10-14 1998-10-13 The utilisation of multi-lingual names on the internet

Country Status (2)

Country Link
AU (1) AUPO977997A0 (en)
WO (1) WO1999019814A1 (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20000030089A (en) * 1999-11-11 2000-06-05 김형근 Native Language Transition Software
WO2000056035A1 (en) * 1999-03-18 2000-09-21 Walid, Inc. Method and system for internationalizing domain names
WO2001017190A2 (en) * 1999-08-30 2001-03-08 Ying Tuo Method and apparatus for using non-english characters in domain names and e-mail addresses
WO2001059605A1 (en) * 2000-02-12 2001-08-16 Jang Deuk Kul Method for using native characters in domain names
EP1123533A1 (en) * 1998-09-29 2001-08-16 Eli Abir Method and system for alternate internet resource identifiers and addresses
US6314469B1 (en) 1999-02-26 2001-11-06 I-Dns.Net International Pte Ltd Multi-language domain name service
GB2366635A (en) * 2000-09-07 2002-03-13 Joint Forture Technology Inter Mother language domain name conversion system
EP1192566A1 (en) * 1999-06-18 2002-04-03 Multex.Com, Inc. A method and system for referencing, archiving and retrieving symbolically linked information
WO2002031702A1 (en) * 2000-10-09 2002-04-18 Enic Corporation Registering and using multilingual domain names
WO2002069607A2 (en) * 2001-02-28 2002-09-06 Characterisation Gmbh Method for providing internet addresses that contain special characters
KR100383861B1 (en) * 2000-01-28 2003-05-12 주식회사 한닉 Korean dns system
WO2003090115A1 (en) * 2002-04-22 2003-10-30 Thomas Arnfeldt Andersen Digital identity and method of producing same
FR2842056A1 (en) * 2002-07-08 2004-01-09 Speeq TELECOMMUNICATIONS METHOD, TERMINAL AND SERVER
US7020602B1 (en) 2000-08-21 2006-03-28 Kim Ki S Native language domain name registration and usage
WO2009111869A1 (en) * 2008-03-10 2009-09-17 Afilias Limited Platform independent idn e-mail storage translation
WO2010012085A1 (en) * 2008-08-01 2010-02-04 Research In Motion Limited Electronic mail system providing message character set formatting features and related methods
US7725816B2 (en) * 2000-02-09 2010-05-25 Microsoft Corporation Creation and delivery of customized content
EP2692102A1 (en) * 2011-03-30 2014-02-05 Afilias Limited Transmitting messages between internationalized email systems and non-internationalized email systems
US9344379B2 (en) 2006-09-14 2016-05-17 Afilias Limited System and method for facilitating distribution of limited resources
US10140282B2 (en) 2014-04-01 2018-11-27 Verisign, Inc. Input string matching for domain names

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5047932A (en) * 1988-12-29 1991-09-10 Talent Laboratory, Inc. Method for coding the input of Chinese characters from a keyboard according to the first phonetic symbols and tones thereof
US5337233A (en) * 1992-04-13 1994-08-09 Sun Microsystems, Inc. Method and apparatus for mapping multiple-byte characters to unique strings of ASCII characters for use in text retrieval
US5572668A (en) * 1995-02-07 1996-11-05 Oracle Corporation Method and apparatus for universal national language support program testing
EP0817099A2 (en) * 1996-06-24 1998-01-07 Sun Microsystems, Inc. Client-side, Server-side and collaborative spell check of URL's

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
http://www.iahc.org/contrib/draft-duerst-dns-i18n-00.txt, "Internationalisation of Domain Names", M. DUERST, UNIVERSITY OF ZURICH, 10 December 1996, Viewed on 9 November 1998. *

Cited By (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1123533A4 (en) * 1998-09-29 2004-03-31 Eli Abir Method and system for alternate internet resource identifiers and addresses
EP1123533A1 (en) * 1998-09-29 2001-08-16 Eli Abir Method and system for alternate internet resource identifiers and addresses
US6314469B1 (en) 1999-02-26 2001-11-06 I-Dns.Net International Pte Ltd Multi-language domain name service
US6446133B1 (en) 1999-02-26 2002-09-03 I-Dns.Net International Pte Ltd. Multi-language domain name service
WO2000056035A1 (en) * 1999-03-18 2000-09-21 Walid, Inc. Method and system for internationalizing domain names
US6182148B1 (en) 1999-03-18 2001-01-30 Walid, Inc. Method and system for internationalizing domain names
US6829653B1 (en) 1999-03-18 2004-12-07 Idn Technologies Llc Method and system for internationalizing domain names
EP1192566A1 (en) * 1999-06-18 2002-04-03 Multex.Com, Inc. A method and system for referencing, archiving and retrieving symbolically linked information
US7398262B1 (en) 1999-06-18 2008-07-08 Multex.Com, Inc. Method and system for referencing, archiving and retrieving symbolically linked information
EP1192566A4 (en) * 1999-06-18 2006-05-10 Multex Com Inc A method and system for referencing, archiving and retrieving symbolically linked information
WO2001017190A3 (en) * 1999-08-30 2001-09-07 Ying Tuo Method and apparatus for using non-english characters in domain names and e-mail addresses
WO2001017190A2 (en) * 1999-08-30 2001-03-08 Ying Tuo Method and apparatus for using non-english characters in domain names and e-mail addresses
KR20000030089A (en) * 1999-11-11 2000-06-05 김형근 Native Language Transition Software
KR100383861B1 (en) * 2000-01-28 2003-05-12 주식회사 한닉 Korean dns system
US7949944B2 (en) * 2000-02-09 2011-05-24 Microsoft Corporation Creation and delivery of customized content
US7725816B2 (en) * 2000-02-09 2010-05-25 Microsoft Corporation Creation and delivery of customized content
WO2001059605A1 (en) * 2000-02-12 2001-08-16 Jang Deuk Kul Method for using native characters in domain names
EP2375700A1 (en) * 2000-08-21 2011-10-12 Ki Seok Kim Native language domain name usage
DE10193513B3 (en) * 2000-08-21 2011-11-17 Ki Seok Kim Registration and use of domain names in your own language
ES2255338A1 (en) * 2000-08-21 2006-06-16 Ki Seok Kim Native language domain name registration and usage
US7020602B1 (en) 2000-08-21 2006-03-28 Kim Ki S Native language domain name registration and usage
GB2366635A (en) * 2000-09-07 2002-03-13 Joint Forture Technology Inter Mother language domain name conversion system
US7774432B2 (en) 2000-10-09 2010-08-10 Verisign, Inc. Registering and using multilingual domain names
WO2002031702A1 (en) * 2000-10-09 2002-04-18 Enic Corporation Registering and using multilingual domain names
WO2002069607A3 (en) * 2001-02-28 2003-01-23 Characterisation Gmbh Method for providing internet addresses that contain special characters
WO2002069607A2 (en) * 2001-02-28 2002-09-06 Characterisation Gmbh Method for providing internet addresses that contain special characters
WO2003090115A1 (en) * 2002-04-22 2003-10-30 Thomas Arnfeldt Andersen Digital identity and method of producing same
FR2842056A1 (en) * 2002-07-08 2004-01-09 Speeq TELECOMMUNICATIONS METHOD, TERMINAL AND SERVER
WO2004008341A1 (en) * 2002-07-08 2004-01-22 Speeq Method, terminal and server for selecting a server address
US9344379B2 (en) 2006-09-14 2016-05-17 Afilias Limited System and method for facilitating distribution of limited resources
US8719355B2 (en) 2008-03-10 2014-05-06 Afilias Limited Platform independent IDN e-mail storage translation
WO2009111869A1 (en) * 2008-03-10 2009-09-17 Afilias Limited Platform independent idn e-mail storage translation
WO2010012085A1 (en) * 2008-08-01 2010-02-04 Research In Motion Limited Electronic mail system providing message character set formatting features and related methods
US10992613B2 (en) 2008-08-01 2021-04-27 Blackberry Limited Electronic mail system providing message character set formatting features and related methods
EP2692102A4 (en) * 2011-03-30 2014-12-10 Afilias Ltd Transmitting messages between internationalized email systems and non-internationalized email systems
EP2692102A1 (en) * 2011-03-30 2014-02-05 Afilias Limited Transmitting messages between internationalized email systems and non-internationalized email systems
US10140282B2 (en) 2014-04-01 2018-11-27 Verisign, Inc. Input string matching for domain names

Also Published As

Publication number Publication date
AUPO977997A0 (en) 1997-11-06

Similar Documents

Publication Publication Date Title
WO1999019814A1 (en) The utilisation of multi-lingual names on the internet
Berners-Lee Universal resource identifiers in WWW: a unifying syntax for the expression of names and addresses of objects on the network as used in the World-Wide web
Berners-Lee Universal resource identifiers in www
JP3492580B2 (en) Multilingual domain name service
CA2319750C (en) Www addressing
US6182148B1 (en) Method and system for internationalizing domain names
US7774432B2 (en) Registering and using multilingual domain names
US9141717B2 (en) Methods, systems, products, and devices for processing DNS friendly identifiers
KR20020082461A (en) Network address server
US20040019697A1 (en) Method and system for correcting the spelling of incorrectly spelled uniform resource locators using closest alphabetical match technique
US20030177274A1 (en) Virtual subdomain address file suffix
KR100433982B1 (en) System for acc esing web page using real names and method thereof
KR100503677B1 (en) Native language domain name registration and usage
Berners-Lee RFC1630: Universal Resource Identifiers in WWW: A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as used in the World-Wide Web
KR100706702B1 (en) Korean Internet contents address service method and system using original DNS
KR20010066754A (en) system for using domain names in the user's preferred language on the internet
KR100338666B1 (en) System for accesing web page using many languages and method thereof
KR20010075446A (en) Method and system for alternate internet resource identifiers and addresses
KR20050099943A (en) System for accessing web page and method thereof
KR20020075314A (en) System for acc esing web page using real names and method thereof
KR20030024294A (en) System for accesing web page using many languages and method thereof
Locators et al. Uniform Resource Locators
KR20010069028A (en) Multi language internet domain name sytem and its method
AU4003700A (en) Multi-language domain name service
KR20030015272A (en) A method of resolving a non-latin character url

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
NENP Non-entry into the national phase

Ref country code: KR

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: CA