UST922008I4 - Enter from step - Google Patents
Enter from step
- Publication number
- UST922008I4
- Authority
- US
- United States
- Prior art keywords
- character
- matrix
- logical
- string
- row
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/02—Comparing digital values
Abstract
AN IMPROVED PROGRAM METHOD IS SHOWN FOR DETERMINING THE SIMILARITY OF TWO CHARACTER STRINGS WHERE THEY ARE NOT NECESSARILY EQUAL CHARACTER BY CHARACTER. EACH CHARACTER OF ONE CHARACTER STRING IS COMPARED WITH EACH CHARACTER OF THE OTHER CHARACTER STRING, AND THE RESULTS ARE STORED IN A MATRIX. EACH ROW OF THE MATRIX CORRESPONDS TO A RESPECTIVE ONE OF THE CHARACTERS OF THE FIRST STRING, AND EACH COLUMN OF THE MATRIX CORRESPONDS TO A RESPECTIVE CHARACTER OF THE OTHER STRING. A LOGICAL ONE VALUE IS STORED IN EACH MATRIX POSITION WHERE CORRESPONDING CHARACTERS OF THE FIRST AND SECOND STRING ARE EQUAL, AND A LOGICAL ZERO IS STORED IN EACH POSITION WHERE THEY ARE NOT EQUAL. IN THE EVENT THAT THERE IS MORE THAN ONE LOGICAL ONE IN EACH ROW (OR IN EACH COLUMN, OR BOTH), ALL OF THE LOGICAL ONES IN THAT ROW (OR COLUMN, OR BOTH) EXCEPT THAT WHICH IS CLOSEST TO THE MAJOR DIAGONAL OF THE MATRIX ARE CHANGED TO LOGICAL ZERO. IN THE EVENT THAT TWO LOGICAL ONES IN A ROW (OR COLUMN, OR BOTH) ARE EQUIDISTANT FROM THE MAJOR DIAGONAL, BOTH MAY BE RETAINED. THE REMAINING LOGICAL ONES IN THE MATRIX ARE THEN CONSIDERED AS POINTS ON AN X, Y COORDINATE SYSTEM, AND THE STANDARD CORRELATION COEFFICIENT (WHICH MEASURES LINEAR DEPENDENCE) OF THE POINTS IS CALCULATED TO DETERMINE THE DEGREE OF SIMILARITY BETWEEN THE TWO CHARACTER STRINGS. A DECISION THAT SUFFICIENT SIMILARITY EXISTS TO MAKE A DETERMINATION THAT A MATCH HAS OCCURRED MAY BE BASED UPON VARIOUS KNOWN TESTS, FOR EXAMPLE, EXCEEDING PREDETERMINED THRESHOLDS; OR, IN THE EVENT THAT A MATCH IS MADE AGAINST SEVERAL CHARACTER STRINGS, THE HIGHEST COEFFICIENT CALCULATED CAN BE DETERMINED AS THE MATCH. THE IMPROVED METHOD CAN BE USED IN PROGRAMMING AREAS SUCH AS INQUIRY (OR QUERY) SYSTEMS AND THE JOB CONTROL LANGUAGE PROCESSOR OF AN OPERATING SYSTEM. THE PROGRAM USUALLY HANDLES MISSING CHARACTERS, TRANSPOSED CHARACTERS AND OTHER COMMON ERRORS. THE PROGRAM CAN BE EITHER A FIXED OR AN ADAPTIVE TYPE.
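The abstract names "the standard correlation coefficient (which measures linear dependence)" without giving a formula. A plausible reading, stated here as an assumption rather than as the applicants' definition, is the Pearson product-moment coefficient. For the n retained logical ones, written as points (x_k, y_k) with x the column index and y the row index, it would be:

```latex
\[
r \;=\; \frac{\sum_{k=1}^{n} (x_k - \bar{x})(y_k - \bar{y})}
             {\sqrt{\sum_{k=1}^{n} (x_k - \bar{x})^{2}\;\sum_{k=1}^{n} (y_k - \bar{y})^{2}}},
\qquad
\bar{x} = \frac{1}{n}\sum_{k=1}^{n} x_k,
\quad
\bar{y} = \frac{1}{n}\sum_{k=1}^{n} y_k .
\]
```

Under this reading, two identical strings with no repeated characters leave every retained point on the major diagonal (x_k = y_k), so r = 1. The coefficient is undefined when fewer than two points remain or when all points share a row or a column.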
Description
DEFENSIVE PUBLICATION UNITED STATES PATENT OFFICE Published at the request of the applicant or owner in accordance with the Notice of Dec. 16, 1969, 869 O.G. 687. The abstracts of Defensive Publication applications are identified by distinctly numbered series and are arranged chronologically. The heading of each abstract indicates the number of pages of specification, including claims and sheets of drawings contained in the application as originally filed. The files of these applications are available to the public for inspection and reproduction may be purchased for 30 cents a sheet.
Defensive Publication applications have not been examined as to the merits of alleged invention. The Patent Office makes no assertion as to the novelty of the disclosed subject matter.
PUBLISHED MAY 7, 1974 922 O.G. 10
T922,008
DATA PROCESSOR PROGRAM FOR DETERMINING CHARACTER STRING SIMILARITY
Donald C. Gause, Owego, and Eduardo Kellerman and Gerald L. Rouse, Endicott, N.Y., assignors to International Business Machines Corporation, Armonk, N.Y.
Continuation of application Ser. No. 154,093, June 17, 1971. This application Oct. 18, 1973, Ser. No. 407,733
Int. Cl. G05b 19/28; G06f 7/34
U.S. Cl. 444-1
4 Sheets Drawing. 27 Pages Specification
An improved program method is shown for determining the similarity of two character strings where they are not necessarily equal character by character. Each character of one character string is compared with each character of the other character string, and the results are stored in a matrix. Each row of the matrix corresponds to a respective one of the characters of the first string, and each column of the matrix corresponds to a respective character of the other string. A logical one value is stored in each matrix position where corresponding characters of the first and second string are equal, and a logical zero is stored in each position where they are not equal. In the event that there is more than one logical one in each row (or in each column, or both), all of the logical ones in that row (or column, or both) except that which is closest to the major diagonal of the matrix are changed to logical zero. In the event that two logical ones in a row (or column, or both) are equidistant from the major diagonal, both may be retained. The remaining logical ones in the matrix are then considered as points on an X, Y coordinate system, and the standard correlation coefficient (which measures linear dependence) of the points is calculated to determine the degree of similarity between the two character strings. A decision that sufficient similarity exists to make a determination that a match has occurred may be based upon various known tests, for example, exceeding predetermined thresholds; or, in the event that a match is made against several character strings, the highest coefficient calculated can be determined as the match. The improved method can be used in programming areas such as inquiry (or query) systems and the job control language processor of an operating system. The program usually handles missing characters, transposed characters and other common errors. The program can be either a fixed or an adaptive type.
FIG. 2a FIG. 2b FIG. 2c
FIG. 1
FORM A MATRIX, M, BY LETTING M(I;J) (WHERE I INDICATES THE ROW AND J INDICATES THE COLUMN) BE A 1 IF, AND ONLY IF, A(I)=B(J); OTHERWISE, LET M(I;J)=0.
IF A ROW OF M CONTAINS MORE THAN ONE 1, THEN RETAIN ONLY THE ONE CLOSEST TO THE MAIN DIAGONAL (THE MAIN DIAGONAL IS M(1;1), M(2;2), ...).
STEP 4: CONSIDER THE 1'S IN M AS POINTS ON AN X-Y COORDINATE SYSTEM. THAT IS, IF M(I;J) IS 1 THEN WE HAVE A POINT WITH THE Y-COORDINATE EQUAL TO I AND THE X-COORDINATE EQUAL TO J.
STEP 5: DETERMINE THE DEGREE OF SIMILARITY BY COMPUTING THE STANDARD CORRELATION COEFFICIENT (WHICH MEASURES LINEAR DEPENDENCY) OF THE POINTS.
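The matrix procedure of FIG. 1 can be sketched in Python as follows. This is a minimal illustration, not the applicants' program: the function names are hypothetical, only the row rule is applied (the abstract also allows a column rule), the correlation is assumed to be the Pearson coefficient, and returning 0.0 when the coefficient is undefined (fewer than two points, or zero variance) is a choice made here, not something the publication specifies.

```python
from math import sqrt


def match_matrix(a, b):
    """M[i][j] is 1 when character i of `a` equals character j of `b`, else 0."""
    return [[1 if ca == cb else 0 for cb in b] for ca in a]


def prune_rows(m):
    """In each row with several 1s, keep only those closest to the main diagonal.

    Ties (equal distance |i - j|) are all retained, as the abstract permits.
    """
    pruned = []
    for i, row in enumerate(m):
        ones = [j for j, v in enumerate(row) if v == 1]
        if len(ones) > 1:
            best = min(abs(i - j) for j in ones)
            ones = [j for j in ones if abs(i - j) == best]
        pruned.append([1 if j in ones else 0 for j in range(len(row))])
    return pruned


def correlation(points):
    """Pearson correlation of (x, y) points; 0.0 when undefined (assumption)."""
    n = len(points)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def similarity(a, b):
    """Degree of similarity of two strings: x is the column index, y the row index."""
    m = prune_rows(match_matrix(a, b))
    points = [(j, i) for i, row in enumerate(m) for j, v in enumerate(row) if v]
    return correlation(points)


if __name__ == "__main__":
    print(similarity("abcd", "abcd"))  # identical strings: 1.0
    print(similarity("abcd", "abdc"))  # transposed characters: 0.8
    print(similarity("abcd", "abd"))   # missing character: high, below 1
```

In this sketch a transposition or a dropped character moves a few points off the diagonal and lowers the coefficient without collapsing it, which is consistent with the abstract's remark that the program usually handles such errors.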
EDUARDO KELLERMAN, GERALD L. ROUSE, DONALD C. GAUSE
May 7, 1974 D. C. GAUSE ET AL T922,008
DATA PROCESSOR PROGRAM FOR DETERMINING CHARACTER STRING SIMILARITY
Original Filed June 17, 1971 4 Sheets
FIG. 3 (legible labels: THRESHOLD, CORRELATION, TABLE LENGTH, SCORE, MATRIX, COORDINATE VECTOR X, COORDINATE VECTOR Y, MAX, PROCESSOR, CPU, I/O DEVICES, TERMINAL)
FIG. 4 (legible flowchart text): READ INTO BUF1; ALL DONE; WAS A MATCH FOUND?; PRINT THE ANSWER ASSOCIATED WITH QUESTION IMAX; PRINT 'WAS THAT RIGHT?'; INCREASE THRESHOLD FOR QUESTION IMAX; CARRIAGE RETURN; ADD NEW QUESTION AND NEW ANSWER TO LIST.
FIG. 5 (legible flowchart text):
ENTER FROM STEP 35
44: IS I GREATER THAN NUMBER OF QUESTIONS?
45: SCORE[I] = 'DEGREE OF SIMILARITY' OF BUF2 AND Q[I;]
PLACE A ZERO IN THOSE POSITIONS OF SCORE WHERE THE VALUE OF SCORE IS LESS THAN THE THRESHOLD OF THE CORRESPONDING QUESTION.
LET MAX BE THE HIGHEST VALUE IN SCORE
SET ZERO IN POSITION IN SCORE DEFINED BY IMAX
NO MATCH: TO STEP 56, FIG. 4
TO STEP 28, FIG. 4
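The thresholding and maximum-selection boxes of FIG. 5, together with the "increase threshold" box of FIG. 4, can be sketched as below. This is a hedged illustration only: `select_match`, `raise_threshold`, and the step size of 0.05 are hypothetical and do not appear in the publication, and the scores are assumed to be similarity values already computed for each stored question.

```python
def select_match(scores, thresholds):
    """Return the index of the best-scoring question, or None if no match.

    Scores below the threshold of the corresponding question are zeroed
    first, then the highest remaining value is taken as the match.
    """
    masked = [s if s >= t else 0.0 for s, t in zip(scores, thresholds)]
    best = max(masked, default=0.0)
    if best <= 0.0:
        return None  # nothing cleared its threshold
    return masked.index(best)


def raise_threshold(thresholds, index, step=0.05):
    """Adaptive step: make a question harder to match after a wrong answer.

    The step size and the cap at 1.0 are assumptions made for this sketch.
    """
    thresholds[index] = min(1.0, thresholds[index] + step)


if __name__ == "__main__":
    thresholds = [0.6, 0.8, 0.8]
    scores = [0.5, 0.9, 0.7]
    i_max = select_match(scores, thresholds)
    print(i_max)  # 1: only the second score clears its threshold
    raise_threshold(thresholds, i_max)  # user said the answer was wrong
```

A fixed variant of the program would simply never call `raise_threshold`; the adaptive variant adjusts per-question thresholds from user feedback.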
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US40773373A | 1973-10-18 | 1973-10-18 |
Publications (1)
Publication Number | Publication Date |
---|---|
UST922008I4 (en) | 1974-05-07 |
Family
ID=23613302
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US922008D Pending UST922008I4 (en) | Enter from step | 1973-10-18 | 1973-10-18 |
Country Status (1)
Country | Link |
---|---|
US (1) | UST922008I4 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0136379A2 (en) * | 1983-10-03 | 1985-04-10 | Proximity Technology Inc. | Word comparator means and method |
EP0136379A3 (en) * | 1983-10-03 | 1986-08-06 | Proximity Devices Corporation | Associative memory circuit system and method |
US5142619A (en) * | 1990-02-21 | 1992-08-25 | International Business Machines Corporation | Method and apparatus for visually comparing files in a data processing system |
US20080282073A1 (en) * | 2007-05-10 | 2008-11-13 | Mason Cabot | Comparing text strings |
US7991987B2 (en) * | 2007-05-10 | 2011-08-02 | Intel Corporation | Comparing text strings |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
JP2882569B2 (en) | Document format recognition execution method and apparatus | |
EP0268373B1 (en) | Method and apparatus for determining a data base address | |
US6725223B2 (en) | Storage format for encoded vector indexes | |
Bjork | Recovery scenario for a DB/DC system | |
US20180307743A1 (en) | Mapping method and device | |
CN106250319A (en) | Static code scanning result treating method and apparatus | |
UST922008I4 (en) | Enter from step | |
CA2006230C (en) | Method and apparatus for validating character strings | |
US11715030B2 (en) | Automatic object optimization to accelerate machine learning training | |
US20190220363A1 (en) | Method, apparatus and computer program product for improving inline pattern detection | |
CN111045670A (en) | Method and device for identifying multiplexing relationship between binary code and source code | |
US5946493A (en) | Method and system in a data processing system for association of source code instructions with an optimized listing of object code instructions | |
EP0446117B1 (en) | Fast determination of subtype relationship in a single inheritance type hierarchy | |
EP0097818B1 (en) | Spelling verification method and typewriter embodying said method | |
US3613086A (en) | Compressed index method and means with single control field | |
CN109388685B (en) | Method and device for warehousing spatial data used by planning industry | |
US6886161B1 (en) | Method and data structure for compressing file-reference information | |
US7016535B2 (en) | Pattern identification apparatus, pattern identification method, and pattern identification program | |
Wichmann | Some statistics from ALGOL programs | |
GB1479122A (en) | Method of operating a computer to produce test case programmes | |
JP3259781B2 (en) | Database search system and database search method | |
Ehrich | SAM: A Configurable Experimental Text Editor for Investigating Human Factors Issues in Text Processing and Understanding | |
CN113159221A (en) | Network security protection method based on big data | |
WO2020245993A1 (en) | Information processing program, information processing method and information processing device | |
KR900007727B1 (en) | Character recognition apparatus |