UST922008I4 - Data processor program for determining character string similarity

Info

Publication number
UST922008I4
Authority
US
United States
Prior art keywords
character
matrix
logical
string
row
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00: Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/02: Comparing digital values

Abstract

AN IMPROVED PROGRAM METHOD IS SHOWN FOR DETERMINING THE SIMILARITY OF TWO CHARACTER STRINGS WHERE THEY ARE NOT NECESSARILY EQUAL CHARACTER BY CHARACTER. EACH CHARACTER OF ONE CHARACTER STRING IS COMPARED WITH EACH CHARACTER OF THE OTHER CHARACTER STRING, AND THE RESULTS ARE STORED IN A MATRIX. EACH ROW OF THE MATRIX CORRESPONDS TO A RESPECTIVE ONE OF THE CHARACTERS OF THE FIRST STRING, AND EACH COLUMN OF THE MATRIX CORRESPONDS TO A RESPECTIVE CHARACTER OF THE OTHER STRING. A LOGICAL ONE VALUE IS STORED IN EACH MATRIX POSITION WHERE CORRESPONDING CHARACTERS OF THE FIRST AND SECOND STRING ARE EQUAL, AND A LOGICAL ZERO IS STORED IN EACH POSITION WHERE THEY ARE NOT EQUAL. IN THE EVENT THAT THERE IS MORE THAN ONE LOGICAL ONE IN EACH ROW (OR IN EACH COLUMN, OR BOTH), ALL OF THE LOGICAL ONES IN THAT ROW (OR COLUMN, OR BOTH) EXCEPT THAT WHICH IS CLOSEST TO THE MAJOR DIAGONAL OF THE MATRIX ARE CHANGED TO LOGICAL ZERO. IN THE EVENT THAT TWO LOGICAL ONES IN A ROW (OR COLUMN, OR BOTH) ARE EQUIDISTANT FROM THE MAJOR DIAGONAL, BOTH MAY BE RETAINED. THE REMAINING LOGICAL ONES IN THE MATRIX ARE THEN CONSIDERED AS POINTS ON AN X, Y COORDINATE SYSTEM, AND THE STANDARD CORRELATION COEFFICIENT (WHICH MEASURES LINEAR DEPENDENCE) OF THE POINTS IS CALCULATED TO DETERMINE THE DEGREE OF SIMILARITY BETWEEN THE TWO CHARACTER STRINGS. A DECISION THAT SUFFICIENT SIMILARITY EXISTS TO MAKE A DETERMINATION THAT A MATCH HAS OCCURRED MAY BE BASED UPON VARIOUS KNOWN TESTS, FOR EXAMPLE, EXCEEDING PREDETERMINED THRESHOLDS; OR, IN THE EVENT THAT A MATCH IS MADE AGAINST SEVERAL CHARACTER STRINGS, THE HIGHEST COEFFICIENT CALCULATED CAN BE DETERMINED AS THE MATCH. THE IMPROVED METHOD CAN BE USED IN PROGRAMMING AREAS SUCH AS INQUIRY (OR QUERY) SYSTEMS AND THE JOB CONTROL LANGUAGE PROCESSOR OF AN OPERATING SYSTEM. THE PROGRAM USUALLY HANDLES MISSING CHARACTERS, TRANSPOSED CHARACTERS AND OTHER COMMON ERRORS. THE PROGRAM CAN BE EITHER A FIXED OR AN ADAPTIVE TYPE.
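The method in the abstract can be sketched in Python roughly as follows. This is an illustrative reading, not the program from the specification. The function names are hypothetical, rows are pruned before columns, and a score of 0.0 is returned when fewer than two points remain or when all points share one coordinate. That last convention is an assumption, since the abstract does not cover those cases.

```python
from math import sqrt


def match_matrix(a, b):
    """M[i][j] is 1 where a[i] == b[j], else 0 (rows follow a, columns follow b)."""
    return [[1 if ca == cb else 0 for cb in b] for ca in a]


def prune_rows(m):
    """In each row keep only the 1(s) closest to the main diagonal; ties are retained."""
    for i, row in enumerate(m):
        cols = [j for j, v in enumerate(row) if v]
        if len(cols) > 1:
            best = min(abs(i - j) for j in cols)
            for j in cols:
                if abs(i - j) != best:
                    row[j] = 0
    return m


def prune_columns(m):
    """Same rule applied to columns, done by transposing and reusing prune_rows."""
    transposed = [list(col) for col in zip(*m)]
    prune_rows(transposed)
    return [list(row) for row in zip(*transposed)]


def correlation(points):
    """Standard (Pearson) correlation coefficient of a list of (x, y) points."""
    n = len(points)
    if n < 2:
        return 0.0  # assumption: too few points to define a correlation
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: degenerate point sets score zero
    return sxy / sqrt(sxx * syy)


def similarity(a, b):
    """Degree of similarity of strings a and b under the sketch above."""
    m = prune_columns(prune_rows(match_matrix(a, b)))
    # A remaining 1 at row i, column j becomes the point x = j, y = i.
    points = [(j, i) for i, row in enumerate(m) for j, v in enumerate(row) if v]
    return correlation(points)
```

Under these assumptions, `similarity("ALGOL", "ALGOL")` evaluates to 1.0, and the transposed spelling `similarity("ALGOL", "ALOGL")` evaluates to about 0.9.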

Description

DEFENSIVE PUBLICATION UNITED STATES PATENT OFFICE Published at the request of the applicant or owner in accordance with the Notice of Dec. 16, 1969, 869 O.G. 687. The abstracts of Defensive Publication applications are identified by distinctly numbered series and are arranged chronologically. The heading of each abstract indicates the number of pages of specification, including claims and sheets of drawings contained in the application as originally filed. The files of these applications are available to the public for inspection and reproduction may be purchased for 30 cents a sheet.
Defensive Publication applications have not been examined as to the merits of alleged invention. The Patent Office makes no assertion as to the novelty of the disclosed subject matter.
PUBLISHED MAY 7, 1974 922 O.G. 10
T922,008
DATA PROCESSOR PROGRAM FOR DETERMINING CHARACTER STRING SIMILARITY
Donald C. Gause, Owego, and Eduardo Kellerman and Gerald L. Rouse, Endicott, N.Y., assignors to International Business Machines Corporation, Armonk, N.Y.
Continuation of application Ser. No. 154,093, June 17, 1971. This application Oct. 18, 1973, Ser. No. 407,733
Int. Cl. G05b 19/28; G06f 7/34
U.S. Cl. 444-1
4 Sheets Drawing. 27 Pages Specification

An improved program method is shown for determining the similarity of two character strings where they are not necessarily equal character by character. Each character of one character string is compared with each character of the other character string, and the results are stored in a matrix. Each row of the matrix corresponds to a respective one of the characters of the first string, and each column of the matrix corresponds to a respective character of the other string. A logical one value is stored in each matrix position where corresponding characters of the first and second string are equal, and a logical zero is stored in each position where they are not equal. In the event that there is more than one logical one in each row (or in each column, or both), all of the logical ones in that row (or column, or both) except that which is closest to the major diagonal of the matrix are changed to logical zero. In the event that two logical ones in a row (or column, or both) are equidistant from the major diagonal, both may be retained. The remaining logical ones in the matrix are then considered as points on an X, Y coordinate system, and the standard correlation coefficient (which measures linear dependence) of the points is calculated to determine the degree of similarity between the two character strings. A decision that sufficient similarity exists to make a determination that a match has occurred may be based upon various known tests, for example, exceeding predetermined thresholds; or, in the event that a match is made against several character strings, the highest coefficient calculated can be determined as the match. The improved method can be used in programming areas such as inquiry (or query) systems and the job control language processor of an operating system. The program usually handles missing characters, transposed characters and other common errors. The program can be either a fixed or an adaptive type.

[FIG. 2a, FIG. 2b, FIG. 2c: drawings not reproduced]

FIG. 1
STEP 1: FORM A MATRIX, M, BY LETTING M(I;J) = 1 (WHERE I INDICATES THE ROW AND J INDICATES THE COLUMN) IF, AND ONLY IF, A(I) = B(J); OTHERWISE, LET M(I;J) = 0.
STEP 2: IF A ROW OF M CONTAINS MORE THAN ONE 1, THEN RETAIN ONLY THE ONE CLOSEST TO THE MAIN DIAGONAL (THE MAIN DIAGONAL IS M(1;1), M(2;2), ...).
STEP 3: IF A COLUMN OF M CONTAINS MORE THAN ONE 1, THEN RETAIN ONLY THE ONE CLOSEST TO THE MAIN DIAGONAL (THE MAIN DIAGONAL IS M(1;1), M(2;2), ...).
STEP 4: CONSIDER THE 1'S IN M AS POINTS ON AN X-Y COORDINATE SYSTEM. THAT IS, IF M(I;J) IS 1, THEN WE HAVE A POINT WITH THE Y-COORDINATE EQUAL TO I AND THE X-COORDINATE EQUAL TO J.
STEP 5: DETERMINE THE DEGREE OF SIMILARITY BY COMPUTING THE STANDARD CORRELATION COEFFICIENT, WHICH MEASURES LINEAR DEPENDENCY OF THE POINTS.
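For reference, the standard (Pearson) correlation coefficient named in Step 5 can be written for the n points (x_k, y_k) that remain after Steps 2 and 3. The symbols n, x_k, y_k, and r are notation assumed here and do not appear in the drawing. The five points in the worked line are hypothetical and chosen only to show the arithmetic.

```latex
\[
r = \frac{\sum_{k=1}^{n} (x_k - \bar{x})(y_k - \bar{y})}
         {\sqrt{\sum_{k=1}^{n} (x_k - \bar{x})^2}\,\sqrt{\sum_{k=1}^{n} (y_k - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{k=1}^{n} x_k,\quad
\bar{y} = \frac{1}{n}\sum_{k=1}^{n} y_k
\]

% Worked example with hypothetical points (0,0), (1,1), (3,2), (2,3), (4,4):
% \bar{x} = \bar{y} = 2, the numerator is 4 + 1 + 0 + 0 + 4 = 9,
% and both sums of squares equal 10.
\[
r = \frac{9}{\sqrt{10}\,\sqrt{10}} = 0.9
\]
```

Points lying exactly on the main diagonal give r = 1, so a transposed pair of characters lowers the score only slightly.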
EDUARDO KELLERMAN, GERALD L. ROUSE, DONALD C. GAUSE
May 7, 1974 D. C. GAUSE ET AL T922,008
DATA PROCESSOR PROGRAM FOR DETERMINING CHARACTER STRING SIMILARITY
Original Filed June 17, 1971 4 Sheets

[FIG. 3: block diagram. Legible labels: PROCESSOR, CPU, I/O DEVICES, TERMINAL, THRESHOLD, CORRELATION, TABLE, LENGTH, VALUE, LIST, Q, SCORE, MATRIX, COORDINATE VECTOR X, COORDINATE VECTOR Y, MAX, REC. 1, REC. N]

[FIG. 4: flowchart. Legible boxes: READ INTO BUF1; ALL DONE; WAS A MATCH FOUND; PRINT THE ANSWER ASSOCIATED WITH QUESTION IMAX; PRINT 'WAS THAT RIGHT?'; INCREASE THRESHOLD FOR QUESTION IMAX; ADD NEW QUESTION AND NEW ANSWER TO LIST]

[FIG. 5: flowchart. Legible boxes:]
44: IS I GREATER THAN NUMBER OF QUESTIONS? (YES / NO)
45: SCORE[I] = 'DEGREE OF SIMILARITY' OF BUF2 AND Q[I;]
ENTER FROM STEP 35
47: PLACE A ZERO IN THOSE POSITIONS OF SCORE WHERE THE VALUE OF SCORE IS LESS THAN THE THRESHOLD OF THE CORRESPONDING QUESTION.
SET ZERO IN POSITION IN SCORE DEFINED BY IMAX
48: LET MAX BE THE HIGHEST VALUE IN SCORE
NO MATCH: TO STEP 56, FIG. 4
TO STEP 28, FIG. 4
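The scoring steps legible in FIG. 5 (score the input against every stored question, zero the scores that fall below each question's threshold, then take the highest remaining value) might look like the sketch below. Only those three steps come from the drawing. The names `best_match` and `toy_similarity` are hypothetical, the similarity measure is passed in as a parameter, and treating an all-zero score list as "no match" is an assumption.

```python
def best_match(query, questions, thresholds, similarity):
    """Return (index, score) of the best stored question, or None if nothing qualifies.

    `similarity` is any function of two strings that returns a number.
    """
    # Score the input against every stored question.
    score = [similarity(query, q) for q in questions]
    # Place a zero wherever the score is below that question's threshold.
    score = [s if s >= t else 0.0 for s, t in zip(score, thresholds)]
    if not score:
        return None
    # Let MAX be the highest value in SCORE; imax is its position.
    imax = max(range(len(score)), key=score.__getitem__)
    if score[imax] <= 0.0:
        return None  # assumption: an all-zero SCORE means no match
    return imax, score[imax]


def toy_similarity(a, b):
    """Stand-in measure for the demo only: fraction of positions that agree."""
    same = sum(ca == cb for ca, cb in zip(a, b))
    return same / max(len(a), len(b), 1)


questions = ["WHAT TIME", "WHAT DATE"]
thresholds = [0.5, 0.5]
result = best_match("WHAT TIMF", questions, thresholds, toy_similarity)
```

Keeping one threshold per question matches the FIG. 4 box that increases the threshold for a single question.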
US922008D 1973-10-18 1973-10-18 Data processor program for determining character string similarity Pending UST922008I4 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US40773373A 1973-10-18 1973-10-18

Publications (1)

Publication Number Publication Date
UST922008I4 true UST922008I4 (en) 1974-05-07

Family

ID=23613302

Family Applications (1)

Application Number Title Priority Date Filing Date
US922008D Pending UST922008I4 (en) 1973-10-18 1973-10-18 Data processor program for determining character string similarity

Country Status (1)

Country Link
US (1) UST922008I4 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0136379A2 (en) * 1983-10-03 1985-04-10 Proximity Technology Inc. Word comparator means and method
EP0136379A3 (en) * 1983-10-03 1986-08-06 Proximity Devices Corporation Associative memory circuit system and method
US5142619A (en) * 1990-02-21 1992-08-25 International Business Machines Corporation Method and apparatus for visually comparing files in a data processing system
US20080282073A1 (en) * 2007-05-10 2008-11-13 Mason Cabot Comparing text strings
US7991987B2 (en) * 2007-05-10 2011-08-02 Intel Corporation Comparing text strings
