CA2411227A1 - System and method of creating and using compact linguistic data - Google Patents

Publication number
CA2411227A1
CA2411227A1 CA002411227A CA2411227A CA2411227A1 CA 2411227 A1 CA2411227 A1 CA 2411227A1 CA 002411227 A CA002411227 A CA 002411227A CA 2411227 A CA2411227 A CA 2411227A CA 2411227 A1 CA2411227 A1 CA 2411227A1
Authority
CA
Canada
Prior art keywords
words
creating
linguistic data
mapped
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA002411227A
Other languages
French (fr)
Other versions
CA2411227C (en)
Inventor
Vadim Fux
Michael G. Elizarov
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BlackBerry Ltd
Original Assignee
2012244 Ontario Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2002-07-03
Filing date
2002-11-07
Publication date
2004-01-03
Application filed by 2012244 Ontario Inc
Priority to PCT/CA2003/001023 (WO2004006122A2)
Priority to AT03762372T (ATE506651T1)
Priority to JP2004518331A (JP4382663B2)
Priority to DE60336856T (DE60336856D1)
Priority to AU2003249793A (AU2003249793A1)
Priority to EP03762372A (EP1631920B1)
Publication of CA2411227A1
Priority to HK06108040.7A (HK1091668A1)
Application granted
Publication of CA2411227C
Priority to JP2009145681A (JP2009266244A)
Anticipated expiration
Expired - Lifetime

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G06F40/216: Parsing using statistical methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10: TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S: TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00: Data processing: database and file management or data structures
    • Y10S707/99931: Database or file accessing
    • Y10S707/99937: Sorting

Abstract

A system and method of creating and using compact linguistic data are provided. Frequencies of words appearing in a corpus are calculated. Each unique character in the words is mapped to a character index, and characters in the words are replaced with the character indexes. Sequences of characters are mapped to substitution indexes, and the sequences of characters in the words are replaced with the substitution indexes. The words are grouped by common prefixes, and each prefix is mapped to location information for the group of words which start with the prefix.
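The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how sequences are chosen, how long a prefix is, or what form the location information takes. The sketch assumes that sequences are adjacent pairs of character indexes ranked by how many words contain them, that prefixes have a fixed length, and that location information is a (start, count) span in a sorted word list. All function and parameter names are hypothetical.

```python
from collections import Counter


def word_frequencies(corpus):
    """Count how often each whitespace-separated word appears in the corpus."""
    return Counter(corpus.lower().split())


def build_char_index(words):
    """Map each unique character to a small integer (sorted for determinism)."""
    chars = sorted({ch for word in words for ch in word})
    return {ch: i for i, ch in enumerate(chars)}


def encode_chars(word, char_index):
    """Replace every character of a word with its character index."""
    return [char_index[ch] for ch in word]


def build_substitutions(encoded_words, first_free, max_subs):
    """Assign substitution indexes to the most common adjacent index pairs.

    Assumption: a "sequence" is a pair of neighbouring character indexes.
    Substitution indexes start at first_free so they never collide with
    character indexes.
    """
    pairs = Counter()
    for codes in encoded_words:
        pairs.update(zip(codes, codes[1:]))
    return {pair: first_free + n
            for n, (pair, _count) in enumerate(pairs.most_common(max_subs))}


def apply_substitutions(codes, subs):
    """Greedily replace known pairs with their substitution index, left to right."""
    out, i = [], 0
    while i < len(codes):
        pair = tuple(codes[i:i + 2])
        if len(pair) == 2 and pair in subs:
            out.append(subs[pair])
            i += 2
        else:
            out.append(codes[i])
            i += 1
    return out


def group_by_prefix(words, prefix_len):
    """Sort the words and map each prefix to a (start, count) span.

    Assumption: "location information" is the offset of the first word of the
    group in the sorted list plus the number of words in the group.
    """
    ordered = sorted(words)
    table = {}
    for pos, word in enumerate(ordered):
        prefix = word[:prefix_len]
        start, count = table.get(prefix, (pos, 0))
        table[prefix] = (start, count + 1)
    return ordered, table
```

A lookup for words beginning with a given prefix would then read the span from the table and slice the sorted list, rather than scanning every word.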

Priority Applications (8)

Application Number Priority Date Filing Date Title
AT03762372T ATE506651T1 (en) 2002-07-03 2003-07-03 SYSTEM AND METHOD FOR GENERATING AND USING COMPACT LINGUISTIC DATA
JP2004518331A JP4382663B2 (en) 2002-07-03 2003-07-03 System and method for generating and using concise linguistic data
DE60336856T DE60336856D1 (en) 2002-07-03 2003-07-03 SYSTEM AND METHOD FOR THE PRODUCTION AND USE OF COMPACT LINGUISTIC DATA
AU2003249793A AU2003249793A1 (en) 2002-07-03 2003-07-03 System and method of creating and using compact linguistic data
PCT/CA2003/001023 WO2004006122A2 (en) 2002-07-03 2003-07-03 System and method of creating and using compact linguistic data
EP03762372A EP1631920B1 (en) 2002-07-03 2003-07-03 System and method of creating and using compact linguistic data
HK06108040.7A HK1091668A1 (en) 2002-07-03 2006-07-18 System and method of creating and using compact linguistic data
JP2009145681A JP2009266244A (en) 2002-07-03 2009-06-18 System and method of creating and using compact linguistic data

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US39390302P 2002-07-03 2002-07-03
US60/393,903 2002-07-03

Publications (2)

Publication Number Publication Date
CA2411227A1 (en) 2004-01-03
CA2411227C (en) 2007-01-09

Family

ID=30770900

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002411227A Expired - Lifetime CA2411227C (en) 2002-07-03 2002-11-07 System and method of creating and using compact linguistic data

Country Status (6)

Country Link
US (3) US7269548B2 (en)
JP (1) JP2009266244A (en)
CN (1) CN1703692A (en)
AT (1) ATE506651T1 (en)
CA (1) CA2411227C (en)
HK (1) HK1091668A1 (en)

Families Citing this family (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
USRE43082E1 (en) 1998-12-10 2012-01-10 Eatoni Ergonomics, Inc. Touch-typable devices based on ambiguous codes and methods to design such devices
US7091885B2 (en) * 2004-06-02 2006-08-15 2012244 Ontario Inc. Handheld electronic device with text disambiguation
US7312726B2 (en) 2004-06-02 2007-12-25 Research In Motion Limited Handheld electronic device with text disambiguation
US7711542B2 (en) * 2004-08-31 2010-05-04 Research In Motion Limited System and method for multilanguage text input in a handheld electronic device
US7895218B2 (en) 2004-11-09 2011-02-22 Veveo, Inc. Method and system for performing searches for television content using reduced text input
FR2878344B1 (en) * 2004-11-22 2012-12-21 Sionnest Laurent Guyot DATA CONTROLLER AND INPUT DEVICE
EP1817691A4 (en) * 2004-12-01 2009-08-19 Whitesmoke Inc System and method for automatic enrichment of documents
US7788266B2 (en) 2005-08-26 2010-08-31 Veveo, Inc. Method and system for processing ambiguous, multi-term search queries
US7779011B2 (en) 2005-08-26 2010-08-17 Veveo, Inc. Method and system for dynamically processing ambiguous, reduced text search queries and highlighting results thereof
US7644054B2 (en) 2005-11-23 2010-01-05 Veveo, Inc. System and method for finding desired results by incremental search using an ambiguous keypad with the input containing orthographic and typographic errors
US7774341B2 (en) 2006-03-06 2010-08-10 Veveo, Inc. Methods and systems for selecting and presenting content based on dynamically identifying microgenres associated with the content
US8073860B2 (en) 2006-03-30 2011-12-06 Veveo, Inc. Method and system for incrementally selecting and providing relevant search engines in response to a user query
EP4209927A1 (en) 2006-04-20 2023-07-12 Veveo, Inc. User interface methods and systems for selecting and presenting content based on user navigation and selection actions associated with the content
US7646868B2 (en) * 2006-08-29 2010-01-12 Intel Corporation Method for steganographic cryptography
US8423908B2 (en) * 2006-09-08 2013-04-16 Research In Motion Limited Method for identifying language of text in a handheld electronic device and a handheld electronic device incorporating the same
US7752193B2 (en) * 2006-09-08 2010-07-06 Guidance Software, Inc. System and method for building and retrieving a full text index
US7536384B2 (en) 2006-09-14 2009-05-19 Veveo, Inc. Methods and systems for dynamically rearranging search results into hierarchically organized concept clusters
WO2008045690A2 (en) 2006-10-06 2008-04-17 Veveo, Inc. Linear character selection display interface for ambiguous text input
US20080091427A1 (en) * 2006-10-11 2008-04-17 Nokia Corporation Hierarchical word indexes used for efficient N-gram storage
US8078884B2 (en) 2006-11-13 2011-12-13 Veveo, Inc. Method of and system for selecting and presenting content based on user identification
WO2008064041A2 (en) * 2006-11-19 2008-05-29 Rmax, Llc Internet-based computer for mobile and thin client users
US8048363B2 (en) * 2006-11-20 2011-11-01 Kimberly Clark Worldwide, Inc. Container with an in-mold label
US8103499B2 (en) * 2007-03-22 2012-01-24 Tegic Communications, Inc. Disambiguation of telephone style key presses to yield Chinese text using segmentation and selective shifting
WO2008148012A1 (en) 2007-05-25 2008-12-04 Veveo, Inc. System and method for text disambiguation and context designation in incremental search
US8176419B2 (en) * 2007-12-19 2012-05-08 Microsoft Corporation Self learning contextual spell corrector
JP2009245308A (en) * 2008-03-31 2009-10-22 Fujitsu Ltd Document proofreading support program, document proofreading support method, and document proofreading support apparatus
US7663511B2 (en) * 2008-06-18 2010-02-16 Microsoft Corporation Dynamic character encoding
US7730061B2 (en) * 2008-09-12 2010-06-01 International Business Machines Corporation Fast-approximate TFIDF
CN101533403B (en) * 2008-11-07 2010-12-01 广东国笔科技股份有限公司 Derivative generating method and system
US20100332215A1 (en) * 2009-06-26 2010-12-30 Nokia Corporation Method and apparatus for converting text input
US20110191332A1 (en) 2010-02-04 2011-08-04 Veveo, Inc. Method of and System for Updating Locally Cached Content Descriptor Information
CN103052951B (en) * 2010-08-06 2016-01-06 国际商业机器公司 Text string generation method and system
JP5392227B2 (en) * 2010-10-14 2014-01-22 株式会社Jvcケンウッド Filtering apparatus and filtering method
JP5392228B2 (en) * 2010-10-14 2014-01-22 株式会社Jvcケンウッド Program search device and program search method
JP5605288B2 (en) * 2011-03-31 2014-10-15 富士通株式会社 Appearance map generation method, file extraction method, appearance map generation program, file extraction program, appearance map generation device, and file extraction device
JPWO2012150637A1 (en) * 2011-05-02 2014-07-28 富士通株式会社 Extraction method, information processing method, extraction program, information processing program, extraction device, and information processing device
US8924446B2 (en) 2011-12-29 2014-12-30 Verisign, Inc. Compression of small strings
CN102831224B (en) * 2012-08-24 2018-09-04 北京百度网讯科技有限公司 Generation method and device are suggested in a kind of method for building up in data directory library, search
US9329778B2 (en) * 2012-09-07 2016-05-03 International Business Machines Corporation Supplementing a virtual input keyboard
US9584642B2 (en) 2013-03-12 2017-02-28 Google Technology Holdings LLC Apparatus with adaptive acoustic echo control for speakerphone mode
US10381001B2 (en) 2012-10-30 2019-08-13 Google Technology Holdings LLC Voice control user interface during low-power mode
US10373615B2 (en) 2012-10-30 2019-08-06 Google Technology Holdings LLC Voice control user interface during low power mode
US10304465B2 (en) 2012-10-30 2019-05-28 Google Technology Holdings LLC Voice control user interface for low power mode
USD772898S1 (en) 2013-03-15 2016-11-29 H2 & Wf3 Research, Llc Display screen with graphical user interface for a document management system
US9805018B1 (en) 2013-03-15 2017-10-31 Steven E. Richfield Natural language processing for analyzing internet content and finding solutions to needs expressed in text
US8788263B1 (en) * 2013-03-15 2014-07-22 Steven E. Richfield Natural language processing for analyzing internet content and finding solutions to needs expressed in text
USD788115S1 (en) 2013-03-15 2017-05-30 H2 & Wf3 Research, Llc. Display screen with graphical user interface for a document management system
US11568080B2 (en) * 2013-11-14 2023-01-31 3M Innovative Properties Company Systems and method for obfuscating data using dictionary
US8768712B1 (en) 2013-12-04 2014-07-01 Google Inc. Initiating actions based on partial hotwords
US20160170971A1 (en) * 2014-12-15 2016-06-16 Nuance Communications, Inc. Optimizing a language model based on a topic of correspondence messages
US9799049B2 (en) * 2014-12-15 2017-10-24 Nuance Communications, Inc. Enhancing a message by providing supplemental content in the message
KR20180031291A (en) * 2016-09-19 2018-03-28 삼성전자주식회사 Multilingual Prediction and Translation Keyboard
US10120860B2 (en) * 2016-12-21 2018-11-06 Intel Corporation Methods and apparatus to identify a count of n-grams appearing in a corpus
US10877998B2 (en) * 2017-07-06 2020-12-29 Durga Turaga Highly atomized segmented and interrogatable data systems (HASIDS)
US10740381B2 (en) * 2018-07-18 2020-08-11 International Business Machines Corporation Dictionary editing system integrated with text mining
CN110673836B (en) * 2019-08-22 2023-05-23 创新先进技术有限公司 Code complement method, device, computing equipment and storage medium

Family Cites Families (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4403303A (en) * 1981-05-15 1983-09-06 Beehive International Terminal configuration manager
US4500955A (en) 1981-12-31 1985-02-19 International Business Machines Corporation Full word coding for information processing
US4814746A (en) * 1983-06-01 1989-03-21 International Business Machines Corporation Data compression method
US4843389A (en) * 1986-12-04 1989-06-27 International Business Machines Corp. Text compression and expansion method and apparatus
US4864503A (en) * 1987-02-05 1989-09-05 Toltran, Ltd. Method of using a created international language as an intermediate pathway in translation between two national languages
US5126739A (en) * 1989-01-13 1992-06-30 Stac Electronics Data compression apparatus and method
US5146221A (en) * 1989-01-13 1992-09-08 Stac, Inc. Data compression apparatus and method
DE69118250T2 (en) * 1990-01-19 1996-10-17 Hewlett Packard Ltd ACCESS FOR COMPRESSED DATA
US5254990A (en) * 1990-02-26 1993-10-19 Fujitsu Limited Method and apparatus for compression and decompression of data
EP0471518B1 (en) * 1990-08-13 1996-12-18 Fujitsu Limited Data compression method and apparatus
DE69131779T2 (en) * 1990-12-21 2004-09-09 British Telecommunications P.L.C. VOICE CODING
US5325091A (en) * 1992-08-13 1994-06-28 Xerox Corporation Text-compression technique using frequency-ordered array of word-number mappers
US5657423A (en) * 1993-02-22 1997-08-12 Texas Instruments Incorporated Hardware filter circuit and address circuitry for MPEG encoded data
US5509088A (en) * 1993-12-06 1996-04-16 Xerox Corporation Method for converting CCITT compressed data using a balanced tree
JPH07192095A (en) 1993-12-27 1995-07-28 Nec Corp Character string input device
US5798721A (en) 1994-03-14 1998-08-25 Mita Industrial Co., Ltd. Method and apparatus for compressing text data
US5684478A (en) * 1994-12-06 1997-11-04 Cennoid Technologies, Inc. Method and apparatus for adaptive data compression
US5847697A (en) * 1995-01-31 1998-12-08 Fujitsu Limited Single-handed keyboard having keys with multiple characters and character ambiguity resolution logic
US5818437A (en) * 1995-07-26 1998-10-06 Tegic Communications, Inc. Reduced keyboard disambiguating computer
GB2305746B (en) 1995-09-27 2000-03-29 Canon Res Ct Europe Ltd Data compression apparatus
US5778361A (en) * 1995-09-29 1998-07-07 Microsoft Corporation Method and system for fast indexing and searching of text in compound-word languages
JP3566441B2 (en) 1996-01-30 2004-09-15 シャープ株式会社 Dictionary creation device for text compression
US6169672B1 (en) * 1996-07-03 2001-01-02 Hitachi, Ltd. Power converter with clamping circuit
US5951623A (en) * 1996-08-06 1999-09-14 Reynar; Jeffrey C. Lempel-Ziv data compression technique utilizing a dictionary pre-filled with frequent letter combinations, words and/or phrases
US6023670A (en) * 1996-08-19 2000-02-08 International Business Machines Corporation Natural language determination using correlation between common words
US6414610B1 (en) * 1997-02-24 2002-07-02 Rodney J Smith Data compression
US6618506B1 (en) * 1997-09-23 2003-09-09 International Business Machines Corporation Method and apparatus for improved compression and decompression
JPH11143877A (en) * 1997-10-22 1999-05-28 Internatl Business Mach Corp <Ibm> Compression method, method for compressing entry index data and machine translation system
US5896321A (en) * 1997-11-14 1999-04-20 Microsoft Corporation Text completion system for a miniature computer
US6075470A (en) * 1998-02-26 2000-06-13 Research In Motion Limited Block-wise adaptive statistical data compressor
US6646573B1 (en) * 1998-12-04 2003-11-11 America Online, Inc. Reduced keyboard text input system for the Japanese language
US6219731B1 (en) * 1998-12-10 2001-04-17 Eatoni Ergonomics, Inc. Method and apparatus for improved multi-tap text input
GB2347240A (en) * 1999-02-22 2000-08-30 Nokia Mobile Phones Ltd Communication terminal having a predictive editor application
US6668092B1 (en) * 1999-07-30 2003-12-23 Sun Microsystems, Inc. Memory efficient variable-length encoding/decoding system
US6904402B1 (en) * 1999-11-05 2005-06-07 Microsoft Corporation System and iterative method for lexicon, segmentation and language model joint optimization
US6516305B1 (en) * 2000-01-14 2003-02-04 Microsoft Corporation Automatic inference of models for statistical code compression
EP1213643A1 (en) * 2000-12-05 2002-06-12 Inventec Appliances Corp. Intelligent dictionary input method
US7103534B2 (en) * 2001-03-31 2006-09-05 Microsoft Corporation Machine learning contextual approach to word determination for text input via reduced keypad keys
US6400286B1 (en) * 2001-06-20 2002-06-04 Unisys Corporation Data compression method and apparatus implemented with limited length character tables
US6587057B2 (en) * 2001-07-25 2003-07-01 Quicksilver Technology, Inc. High performance memory efficient variable-length coding decoder
US6653954B2 (en) * 2001-11-07 2003-11-25 International Business Machines Corporation System and method for efficient data compression
US20030182279A1 (en) * 2002-03-19 2003-09-25 Willows Kevin John Progressive prefix input method for data entry
US6657565B2 (en) * 2002-03-21 2003-12-02 International Business Machines Corporation Method and system for improving lossless compression efficiency

Also Published As

Publication number Publication date
HK1091668A1 (en) 2007-01-26
CN1703692A (en) 2005-11-30
ATE506651T1 (en) 2011-05-15
JP2009266244A (en) 2009-11-12
US7809553B2 (en) 2010-10-05
US20100211381A1 (en) 2010-08-19
CA2411227C (en) 2007-01-09
US7269548B2 (en) 2007-09-11
US20040006455A1 (en) 2004-01-08
US20080015844A1 (en) 2008-01-17

Legal Events

Date Code Title Description
EEER Examination request
MKEX Expiry (effective date: 2022-11-07)