CA2224457C - Protecting confidential information in a database for enabling targeted advertising in a communications network - Google Patents
- Publication number
- CA2224457C CA002224457A CA2224457A
- Authority
- CA
- Canada
- Prior art keywords
- processor
- electronically
- tuples
- public
- attribute values
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/02—Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
- H04L63/0227—Filtering policies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
- G06F21/6254—Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0407—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the identity of one or more communicating identities is hidden
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/10—Network architectures or network communication protocols for network security for controlling access to devices or network resources
- H04L63/102—Entity profiles
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/2866—Architectures; Arrangements
- H04L67/30—Profiles
- H04L67/306—User profiles
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/53—Network services using third party service providers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/40—Network security protocols
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/30—Definitions, standards or architectural aspects of layered protocol stacks
- H04L69/32—Architecture of open systems interconnection [OSI] 7-layer type protocol stacks, e.g. the interfaces between the data link level and the physical level
- H04L69/322—Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions
- H04L69/329—Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions in the application layer [OSI layer 7]
Abstract
Protecting a database against the deduction of confidential values contained therein is accomplished by partitioning the database into public and private values (202), some of which public values are deemed more important than others (203). The private attribute values are electronically processed (204-226) to reduce any high correlation between the public values and the private values. Specifically, the processor partitions the database (204-210) into safe tuples and unsafe tuples, which unsafe tuples have highly correlative public values (216-218). The processor then selectively combines the public attribute values of the tuples (220) to camouflage such tuples from deduction of their private attribute values beyond a threshold level of uncertainty (226).
Description
CA 02224457 1997-12-11 WO 96/42059 PCT/US96/09703

Protecting Confidential Information in a Database for Enabling Targeted Advertising in a Communications Network

Field of the Invention

The present invention is related to a system and method for maintaining the confidentiality of certain information in a database. According to an illustrative embodiment, the database illustratively contains demographic information regarding customers of a communication network. The system and method can enable advertisers to target specific ones of the customers, whose demographics meet an advertiser-specified profile, for advertising via the communications network. In particular, the method and system relate to processing the demographics database to ensure that private information of the customers cannot be deduced by the advertisers beyond a controllable level of uncertainty, so that an advertiser cannot deduce the specific confidential information belonging to a specific customer.
Background of the Invention

The present invention is relevant to delivery of information in any kind of information infrastructure. The invention is illustrated herein using a communications network type of information infrastructure which can deliver video programming.
In a typical network in which advertisements or other video programming are delivered, such as a conventional cable television network, the advertisements are delivered to many customers indiscriminately. This is disadvantageous for the customers because some customers are subjected to advertisements in which they have no interest. It is also disadvantageous to the advertisers because the advertisers must pay to deliver the advertisement to a large audience of customers including the customers they desire to reach and the customers who have no interest in the advertisement.
In a preferred advertisement strategy, the advertisers target a selected group of the customers who are more likely to be interested in the advertisements and deliver the advertisements to only the selected group of customers. Until recently,
such targeted advertisement was not possible in broadcast communications because the communications network in which the advertisements were delivered did not permit delivery of advertisements to only specified customers. However, recent advances in communications networks have made such selective delivery of broadcast advertisements possible. FIG 1 depicts one such illustrative improved prior art communications network 10. Illustratively, the communications network 10 may be any kind of network such as a telephone network, a computer network, a local area network (LAN), a wide area network (WAN), a cable television network, etc. As shown, the network 10 interconnects sources 21 and 22, such as advertisers, to destinations 31, 32, 33 and 34, such as customers. The communications network 10 can transport video, audio and other data from a source, e.g., the source 21, to only specific ones of the destinations 31-34, e.g., the destinations 31 and 33. For example, the video, audio and data may be transmitted as a bitstream which is organized into packets. Each packet contains a header portion which includes at least one identifier, for a destination 31, 32, 33 and/or 34, that is unique over the network 10 (e.g., the identifiers for the destinations 31 and 33). These identifiers are referred to as network addresses. The packet is routed by the communications network 10 only to those destinations 31 and 33 as specified by the network addresses contained in the header of the packet.
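The selective delivery just described can be sketched as follows. This is a minimal illustration, not part of the patent: the address strings and the `Packet` structure are invented, and a real network would route packets hop by hop rather than in one function call.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    dest_addresses: set   # network addresses carried in the packet header
    payload: str          # the video/audio/advertisement data

def route(packet, destinations):
    """Deliver the payload only to destinations named in the packet header."""
    return {d: packet.payload for d in destinations if d in packet.dest_addresses}

# Source 21 addresses only destinations 31 and 33; the other destinations
# connected to the network never receive the payload.
packet = Packet(dest_addresses={"addr-31", "addr-33"}, payload="advertisement")
delivered = route(packet, ["addr-31", "addr-32", "addr-33", "addr-34"])
print(sorted(delivered))  # only the two addressed destinations appear
```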
In order to implement the targeted advertising strategy, the advertisers must be able to determine the customers to which the advertisements are targeted.
Advantageously, demographic data regarding the customers is compiled into a database. A database is defined as a collection of data items, organized according to a data model, and accessed via queries. The invention herein is illustrated using a relational database model. A relational database or relation may be organized into a two-dimensional table containing rows and columns of information. Each column of the relation corresponds to a particular attribute and has a domain which comprises the data values of that attribute. Each row of a relation, which includes one value from each attribute, is known as a record or tuple.
FIG 2 shows an exemplary relational database (prior art) Y. The relation Y of FIG 2 contains data pertaining to a population group. The relation Y has six attributes or columns 2-1, 2-2, 2-3, 2-4, 2-5 and 2-6, for storing, respectively, name, age, weight, height, social security number and telephone extension data values of the population. The database also has twelve records or tuples 3-1, 3-2, 3-3, ...,
3-12. Each tuple 3-1, 3-2, 3-3, ..., 3-12 has one data value from each attribute. For instance, the tuple 3-10 has the name attribute value "lee", the age attribute value 40, the weight attribute value 171, the height attribute value 180, the social security number attribute value 999 88 7654 and the telephone extension attribute value 0123.
To identify the targeted customers for an advertisement, a profile containing queries is executed against the database. A query is used to identify tuples which meet criteria of interest from the database. A query usually includes a predicate which specifies the criteria of interest. For instance, the following query executed against the relation Y:
Select from Y where Y.Age < 15 OR Y.Age > 50

includes the predicate "where Y.Age < 15 OR Y.Age > 50", which specifies that only those tuples having an Age attribute value less than 15 or greater than 50 are to be identified. The advertiser can thus construct a profile for execution against the relational database to identify the targeted audience of customers.
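The predicate evaluation described above can be sketched in a few lines. The relation contents below are invented for illustration; only the predicate mirrors the example query.

```python
# A toy relation Y: each tuple holds one value per attribute.
relation_Y = [
    {"name": "alice", "age": 12},
    {"name": "bob",   "age": 34},
    {"name": "lee",   "age": 40},
    {"name": "pat",   "age": 62},
]

def query(relation, predicate):
    """Return the tuples of the relation that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

# Mirrors the predicate "where Y.Age < 15 OR Y.Age > 50".
matches = query(relation_Y, lambda t: t["age"] < 15 or t["age"] > 50)
print([t["name"] for t in matches])  # → ['alice', 'pat']
```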
The problem with implementing such a targeted advertising scheme is that customers may be reluctant to wholesale disclose the necessary demographic data for constructing the relational database. In particular, customers may be concerned about:
(1) direct release of raw information about an individual customer,
(2) deduction of non-released information of an individual customer from information regarding the identity of the customers who match a given profile, and
(3) deduction of non-released information of a specific individual customer from knowledge of a series of profiles, together with the number of individual customers that received or would receive the advertisements corresponding to those profiles.
The first two threats to privacy can be overcome by modifying the communications network in a fashion similar to what has been done for protecting the anonymity of customers who retrieve video in Hardt-Kornacki & Yacobi, Securing End-User Privacy During Information Filtering, PROC. OF THE CONF. ON HIGH PERF. INFO. FILTERING, 1991. Such a modified network is shown in FIG 3. As shown, the communications network 50 interconnects sources (advertisers) 61, 62 and destinations (customers) 71, 72, 73 and 74 similar to the network 10 of FIG 1.
However, a filter station 80 and name translator station 90 are also provided which are connected to the communications network 50. Illustratively, the filter station 80 has a memory 82 for maintaining the database of customer demographic data.
Furthermore, the filter station 80 has a processor 84 which can execute queries against the demographics database stored in the memory 82. Each source, such as the source 62, has a server 64 and a memory 66. The server 64 of the source 62 transmits one or more profiles (containing queries for identifying particular target audiences) to the processor 84 of the filter station 80. The processor 84 executes each profile query against the relational database stored in the memory 82 to retrieve the aliases assigned to each customer identified by each query. The processor 84 then transmits the corresponding aliases for each profile back to the server 64 of the source 62, which may be stored in the memory 66 for later use.
When the advertiser-source 62 desires to transmit the advertisement to the targeted customer destinations, e.g., the destinations 72 and 74, the server 64 transmits the advertisement and the aliases into the network 50. The network 50 delivers the advertisement and aliases to the processor 92 of the name translator station 90. The processor 92 then translates the aliases to their corresponding network addresses, for example, using information stored in a memory 94. The processor 92 of the name translator station 90 then transmits the advertisement to the customer destinations 72, 74 using the network addresses.
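The split of knowledge between the filter station and the name translator station can be sketched as two functions over two separate tables. All data values, alias names, and addresses below are invented for illustration; the point is that the advertiser-facing function returns only aliases, and only the translator holds the alias-to-address table.

```python
demographics = {            # held only by the filter station (memory 82)
    "alias-7": {"zip": "07090", "hobby": "stamps"},
    "alias-9": {"zip": "07090", "hobby": "coins"},
}
alias_to_address = {        # held only by the name translator station (memory 94)
    "alias-7": "net-addr-72",
    "alias-9": "net-addr-74",
}

def filter_station(profile):
    """Return aliases (never network addresses) of matching customers."""
    return [a for a, demo in demographics.items() if profile(demo)]

def name_translator(aliases, advertisement):
    """Translate aliases to network addresses and deliver the advertisement."""
    return {alias_to_address[a]: advertisement for a in aliases}

aliases = filter_station(lambda d: d["hobby"] == "stamps")
delivered = name_translator(aliases, "ad")
print(aliases, list(delivered))  # advertiser sees aliases; network sees addresses
```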
In the modified communications system, the customer-destination, e.g., the destination 72, knows its own demographic information. The advertiser-source, e.g., the source 62, knows its advertisement, its profiles and how many customers will receive the advertisement. The advertiser only receives aliases for the individual customers 71-74. Thus, the advertiser does not possess the raw demographic information and is not given information for identifying the customers 71-74 (such as the network addresses). The filter station 80 contains information regarding the entire demographics database and receives the profiles submitted by the advertisers. The name translator station 90 contains only the translations of aliases to network addresses and receives the aliases and advertisements. The network 50 only receives the advertisement and network addresses of the destinations.
Despite such protections, the advertiser still obtains some results of the execution of the queries of the profiles against the demographics database, such as the number of customers which match each profile. This may be sufficient information to deduce personal information of the customer. For example, suppose the advertiser knows the identities of 100 customers in the zip code 07090 who collect stamps. Furthermore, suppose the advertiser submits a profile for targeting all customers in zip code 07090 who collect stamps and who have an annual income of $50,000-$100,000. If 100 aliases are returned to the advertiser, then the advertiser successfully deduces the salary range of all 100 stamp collectors.
The above threat, wherein query results can lead to deducing private information, is referred to as a "tracker attack." Stated more generally, a "tracker" is a special case of a linear system which involves solving the equation:
HX = Q    (1)

where:
H is a matrix which represents the tuples that satisfy the corresponding queries, where each column j represents a different tuple, each row i represents a different query, and each matrix element h_ij = 1 if the jth tuple satisfies the predicate C_i of the ith query and 0 otherwise,
C is a vector representing the predicates used in each ith query,
X is a vector representing the (unknown) tuples which satisfy the predicates C (to be solved by equation (1)), and
Q is a vector of counts or other results returned by each ith query, containing elements q_i, where each q_i is the sum (or other result returned from the ith query) over an attribute of the tuples retrieved by the ith query.
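A small numerical instance of this linear system shows why aggregate answers alone can be enough. The matrix, salaries, and query shapes below are invented for illustration: three overlapping "sum of income" queries over three tuples happen to give an invertible H, so the attacker can solve HX = Q exactly.

```python
import numpy as np

# h_ij = 1 when tuple j satisfies the predicate of query i.
H = np.array([[1, 1, 1],
              [1, 1, 0],
              [1, 0, 1]], dtype=float)

# Private attribute values (unknown to the attacker).
X_true = np.array([50_000.0, 80_000.0, 120_000.0])

# The only thing the attacker legitimately receives: one aggregate per query.
Q = H @ X_true

# Because this H is invertible, the aggregates pin down every individual's
# private value exactly -- a tracker attack.
X_recovered = np.linalg.solve(H, Q)
print(X_recovered)  # each tuple's private value, deduced from aggregates alone
```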
The prior art has proposed some solutions for protecting statistical relational databases from tracker attacks. Dobkin, Jones & Lipton, Secure Databases: Protection Against User Inference, ACM TRANS. ON DATABASE SYS., vol. 4, no. 1, Mar. 1979, pp. 97-106, proposes to restrict query set overlap, i.e., to prevent submission of multiple similar query sets, to prevent this kind of attack. However, such a control is difficult to implement because a history of all previously submitted query sets must be maintained and compared against the most recent submitted query. A "cell-suppression" technique has also been proposed wherein statistics, or other query execution results, that may reveal sensitive information are never released. However, cell-suppression techniques are best used for queries which produce two- and three-dimensional tables but not for arbitrary queries which are of concern in implementing targeted advertising.
Random noise techniques have been proposed wherein a random number is subtracted from the results returned by a query. This solution is not satisfactory for implementing targeted advertising because the result presented to the advertiser would then be inherently inaccurate. In an alternative scheme proposed in Warner, Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias, 60 J. OF THE AM. STAT. ASSOC. pp. 63-69 (1965), individuals may enter erroneous values into the relational database a certain percentage of the time. The problem with this strategy is that the advertisers would then target advertisements to the wrong audience a certain percentage of the time. Denning, Secure Statistical Databases Under Random Sample Queries, ACM TRANS. ON DATABASE SYS., vol. 5, no. 3, Sept. 1980, pp. 291-315, discloses a noise technique wherein the queries are applied to only random subsets of the tuples rather than all of the tuples in the relational database. In addition to the specific disadvantages mentioned above, one or more of the above-described noise addition techniques may be subverted by a variety of noise removal methods.
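The drawback the text notes for the Warner-style scheme can be made concrete with a sketch. This is a simplified "forced response" variant, not Warner's exact protocol, and the probability parameter is an assumption for illustration: each individual reports truthfully with probability 0.75 and otherwise flips a coin, so individual rows are wrong a predictable fraction of the time even though the aggregate rate remains estimable.

```python
import random

def randomized_response(true_value: bool, p_truth: float = 0.75) -> bool:
    """Report the true value with probability p_truth; otherwise report a
    fair coin flip (a simplified variant of Warner's randomized response)."""
    if random.random() < p_truth:
        return true_value
    return random.random() < 0.5

# A population whose true answer is always "yes" reports "yes" about
# 0.75 + 0.25 * 0.5 = 87.5% of the time: 12.5% of the stored rows are wrong,
# which is exactly why targeting individual rows becomes unreliable.
random.seed(0)
reports = [randomized_response(True) for _ in range(100_000)]
rate = sum(reports) / len(reports)
print(round(rate, 3))
```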
Yu & Chin, A Study on the Protection of Statistical Databases, PROC. ACM SIGMOD INT'L CONF. ON THE MGMT. OF DATA, pp. 169-181 (1977) and Chin & Ozsoyoglu, Security in Partitioned Dynamic Statistical Databases, PROC. IEEE COMPSAC CONF., pp. 594-601 (1979) disclose methods for partitioning the relational database into disjoint partitions.
All of the above methods were developed primarily for statistical databases and do not have properties which enable the implementation of targeted advertising.
In particular, the above methods do not provide precise identification of tuples which satisfy queries or do not provide an accurate count (or other returned query result) of such retrieved tuples. However, both of these properties are important in targeted advertising. First, it is important to accurately target all customers whose demographic data matches a submitted profile. Second, it is vital to obtain an accurate count of the identified customers for purposes of billing the advertiser and for purposes of deciding whether or not the profile identified a desirable number of customers for receiving the advertisement.
It is therefore an object of the present invention to overcome the disadvantages of the prior art. It is another object of the present invention to provide a targeted advertising method which preserves the privacy of confidential information of the customer. In particular, it is an object of the present invention to reduce the advertisers' ability to deduce confidential information about the customers from the results of one or more profile queries executed against a demographics relational database.
Summary of the Invention
These and other objects are achieved according to the present invention.
According to one embodiment, the present invention can maintain the confidentiality of information in a database for use in a communications system environment. As in the prior art communications system, this embodiment provides a communications network which interconnects an advertiser, customers, a filter station and a name translator station. Illustratively, the filter station maintains a demographics database of information regarding the customers. However, the invention can work with databases storing any kind of information and can work for both relational and non-relational databases. In order to obtain a target audience for an advertisement, the advertiser can submit one or more profiles containing queries to the filter station.
The filter station executes the profile queries against the demographics database in order to identify tuples corresponding to customers who match the profile of the targeted audience. To preserve the anonymity of the customers, the filter station transmits aliases, instead of identifying information, for the customers identified by the profile to the advertiser. When the advertiser desires to deliver an advertisement to the target audience of customers, the advertiser transmits the advertisement and the aliases via the communications network to the name translator station. The name translator station then translates the received aliases to the network addresses of the customers using its translation table and then transmits the advertisement to the customers via the communications network.
Like the conventional communications network, the communications network according to an embodiment of the present invention restricts the access of the advertisers to the demographics relational database and discloses aliases to the advertisers in lieu of the actual network addresses of the customers. This prevents:
(1) disclosure of the raw information in the database to the advertiser, and
(2) deduction of confidential information from the identity of customers.
However, unlike the conventional communications system, the present invention also provides for reducing the advertiser's ability to deduce confidential information from the results returned by the filter station in response to the profile queries submitted by the advertiser. That is, the present invention protects against tracker attacks and other kinds of confidentiality breaches, wherein the advertiser attempts to deduce confidential information about the customers in the database from, for example, the mere number of aliases returned in response to a profile query. To achieve this protection in the present invention, the attributes are divided into two classes, namely, public attributes, for which no confidentiality protection is provided, and private attributes, for which confidentiality protection is provided. In order to prevent an advertiser from deducing private attribute values, the database is thereafter processed to reduce any high correlation between public attribute values and private attribute values. A vector of one or more particular public attribute values is said to have a high correlation with a private attribute value, if:
(1) the vector of particular public attribute values identifies a group of tuples of the database which have public attribute values that match the vector of public attribute values, and
(2) the level of uncertainty regarding the values of the private attribute of the identified group is less than a predetermined threshold.
Stated another way, a specific vector of public attribute values of tuples may correspond to a small number of private attribute values, thus reducing the uncertainty about the private attribute values when the public attribute values are known. In the worst case, the vector of public attribute values would correspond to only a single private attribute value. Thus, there might be a high level of certainty in determining the actual private attribute values of the group of tuples identified by a given vector of public attributes. Illustratively, if the number of distinctly different private attribute values for the group identified by such a vector is less than a predetermined threshold number of values, then the correlation of the public attributes is unacceptably high. Herein, a public attribute value with an unacceptably high correlation with one or more private attribute values is referred to as a "highly correlative public attribute value".
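The distinct-value test just described can be sketched directly: group the tuples by their vector of public attribute values and flag any group whose private attribute takes fewer distinct values than a chosen threshold. The attribute split, the data values, and the threshold below are assumptions for illustration.

```python
from collections import defaultdict

tuples = [
    # (public vector: zip, hobby)   (private: income band)
    (("07090", "stamps"),           "50-100k"),
    (("07090", "stamps"),           "50-100k"),  # every stamp collector here
    (("07090", "stamps"),           "50-100k"),  # shares one income band
    (("07090", "coins"),            "0-50k"),
    (("07090", "coins"),            "100k+"),
]
THRESHOLD = 2  # require at least 2 distinct private values per group

# Collect the set of private values seen for each public vector.
groups = defaultdict(set)
for public_vec, private_val in tuples:
    groups[public_vec].add(private_val)

# A group below the threshold pins down the private value too tightly:
# its public vector is "highly correlative" in the sense defined above.
unsafe = [vec for vec, privates in groups.items() if len(privates) < THRESHOLD]
print(unsafe)
```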
According to one embodiment, tuples containing public attribute values that are highly correlated with private attribute values are processed in a fashion either to camouflage the public attributes of the tuple or to remove such tuples from identification in the database. Tuples are "camouflaged" by combining the specific public attribute values of the tuples, that are highly correlated with one or more specific private attribute values of the tuples, with other public attribute values of the tuples to reduce the correlation.
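One way the combining step can be pictured is to merge a highly correlative public value with another public value into a single combined category, so the merged group spans more distinct private values. The merge rule and data below are invented for illustration and are not the patent's specific algorithm.

```python
tuples = [
    # (public: hobby)  (private: income band)
    ("stamps", "50-100k"),
    ("stamps", "50-100k"),   # unsafe: "stamps" maps to a single income band
    ("coins",  "0-50k"),
    ("coins",  "100k+"),
]

def camouflage(rows, value_a, value_b):
    """Combine two public attribute values into one merged category."""
    merged = f"{value_a}|{value_b}"
    return [(merged if v in (value_a, value_b) else v, p) for v, p in rows]

combined = camouflage(tuples, "stamps", "coins")
privates = {p for v, p in combined if v == "stamps|coins"}
print(len(privates))  # the merged group now spans 3 income bands
```

Knowing that a customer's public value is "stamps|coins" now leaves three possible income bands instead of one, which is the increase in uncertainty the camouflaging step is meant to achieve.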
A method and system are therefore provided wherein attributes are classified as private or public and wherein the correlation between public and private attributes is reduced by camouflaging highly correlative public attribute values. The invention provides for introduction of an adjustable level of uncertainty in deducing private information from the results of queries executed against the demographics relational database.
Brief Description of the Drawing

FIG 1 depicts an ordinary prior art communications network.
FIG 2 depicts a prior art demographics relational database.
FIG 3 depicts a prior art communications network with privacy protection of customer network addresses.
FIG 4 depicts a communications network according to an embodiment of the present invention with anonymity protection of private customer information.
FIG 5 schematically depicts a flowchart illustrating a method according to one embodiment of the present invention.

Detailed Description of the Invention
As mentioned before, the present invention can protect the confidentiality of virtually any kind of information in both relational and non-relational databases and in a variety of environments including communication networks. For purposes of simplicity and clarity, the invention is illustrated below using a communications network environment and a relational database containing demographics information. In the embodiment discussed below, advertisers submit queries for execution against the relational demographics database for purposes of identifying a target audience for advertising. Again, this is illustrative; the invention can also work in other applications wherein queries are submitted to achieve other goals. FIG 4 shows an illustrative communications network 100 according to the present invention. As shown, advertisers 121 and 122, customers 131, 132, 133 and 134, and a name translation station 140 are provided which are connected to the communications network 100. Furthermore, a filter station 150 is provided which is adapted according to the present invention. The filter station 150 has a processor 155 and a memory 160 connected thereto.
Like the processor 84 and memory 82 (FIG 3) of the conventional filter station 80 (FIG 3), the processor 155 and memory 160 can perform various functions for preventing disclosure to the advertisers 121-122 of the raw data. The processor 155 and memory 160 can also perform functions for preventing deduction by the advertisers 121-122 of private information from the identification of customers (from their network addresses). The processor 155 can receive demographics information from the customers 131-134 and can construct a demographics relational database. The processor 155 can store the demographics relational database in the memory 160. The processor 155 can also receive from the advertisers 121-122, such as the advertiser 122, profiles containing queries for execution against the relational database. In response, the processor 155 identifies the tuples of the relational database which match the profile. The processor 155 then transmits the identified aliases to the advertiser 122.
The processor 155 and memory 160 of the filter station 150 are also capable of processing the demographics relational database to reduce the ability of advertisers to deduce private information from results returned by the filter station 150 in response to profile queries submitted by the advertisers. In the discussions below, it is presumed that the advertisers use the number of returned aliases to deduce private information, although the discussion is general enough to apply to any result returned in response to profile queries.
The processing of the processor 155 and memory 160 can be summarized as partitioning the database into public attributes, for which no confidentiality protection need be provided, and private attributes, for which confidentiality protection is provided. In providing confidentiality protection, it should be noted that some of the information of the demographics relational database is already assumed to be public, or otherwise not worthy of confidentiality protection. For instance, consider a frequent flyer database which contains the following attributes:
zip code, telephone number, occupation, dietary restrictions and income level. The telephone number of an individual customer may be widely published in a telephone directory. Furthermore, the occupation of an individual customer, while not widely published, may be considered non-confidential or non-personal. On the other hand, other information such as dietary restrictions and income level may be presumed to be personal and confidential information. After partitioning the database, the correlation between public attributes and private attributes is reduced by camouflaging some highly correlative public attribute values and outright removing some tuples containing highly correlative public attribute values which are difficult to camouflage.
The processor 155 may also partition out an identification attribute from the database which uniquely identifies each tuple. Such an identification could be a network address, social security number, etc. Such information can only be the subject of a profile query if that query does not execute against private attributes or is merely used to update the corresponding tuple of the database.
Illustratively, the public attributes are further divided into important public attributes and non-important public attributes. Advertisers are permitted to specify attribute values of important public attributes with a greater degree of certainty than non-important public attributes. Illustratively, the advertisers may specify which of the attributes are to be treated as important. The invention is illustrated below with important and non-important public attribute partitioning.
In the discussion below, the vector A represents the public attributes of a specified set or group of tuples and each component <A1,...,An> of A represents an individual public attribute vector. The vector A' represents the important public attributes of a specified set or group of tuples and each component <A'1,...,A'm> of A' represents an individual important public attribute vector. The vector A'' represents the non-important public attributes of a specified set or group of tuples and each component <A''1,...,A''p> of A'' represents an individual non-important public attribute vector. The vector P represents the private attributes of a specified set or group of tuples and each component <P1,...,Pq> represents an individual private attribute vector. The vector K represents a vector of uncertainty thresholds for the private attributes P. Illustratively, each scalar component kj of K is a threshold count of distinctly different private attribute values in Pj. Each threshold of uncertainty kj can be fixed or dynamically adjusted by the processor 155 to adjust the level of confidentiality protection. The vectors V, V', V'', V''' and U represent distinct vectors of particular scalar attribute values <v1,...,vn>, <v'1,...,v'j,...,v'm>, etc. for the public attributes A, A', or A'' of a single tuple. Herein, the notation A1=v1,...,An=vn refers to a single tuple (i.e., row of the relational database) for which each designated public attribute vector, e.g., A1, takes on the corresponding, distinct, scalar attribute value, e.g., v1.
FIG 5 is a flowchart which schematically illustrates a process executed by the processor 155 and memory 160 for ensuring the confidentiality of demographic information from deduction by the advertisers 121-122. In a first step 202, the processor 155 partitions the attributes of the database into public attributes A1,...,An, containing non-confidential information, and private attributes P1,...,Pq, containing confidential information. For example, suppose the attributes are age, height, religious affiliation and salary. The attributes age and height might be designated as public attributes whereas the attributes religious affiliation and salary might be designated as private attributes.
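Step 202 might be sketched in code as follows. This is a minimal illustration only: the representation of the schema as a Python list and the helper name `partition_attributes` are assumptions for the sketch, not part of the disclosure.

```python
# Sketch of step 202: split the attribute list of the database into
# public (non-confidential) and private (confidential) attributes.
# The classification itself is supplied by the database operator.

def partition_attributes(attributes, public_attributes):
    """Return (public, private) sublists of `attributes`."""
    public = [a for a in attributes if a in public_attributes]
    private = [a for a in attributes if a not in public_attributes]
    return public, private

# The age/height/religious affiliation/salary example from the text:
attrs = ["age", "height", "religious_affiliation", "salary"]
public, private = partition_attributes(attrs, {"age", "height"})
print(public)   # ['age', 'height']
print(private)  # ['religious_affiliation', 'salary']
```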
Next, in steps 204-226, the processor 155 removes high correlations between public and private attributes of tuples in the database. Stated another way, consider a specific vector of particular attribute values V such that A1=v1, A2=v2,...,An=vn. This vector V identifies a group of tuples which have values for public attributes A1,...,An that match V. The database is processed to ensure that for any such group of tuples identified by any vector V, there is a threshold level of uncertainty kj about the values of any jth private attribute Pj in the identified set. For example, consider a database having only public attributes of age and occupation and only private attributes of salary range. The database may have certain vectors of age and occupation (e.g., <age:35, occupation: doctor>) for which there are relatively few different values for salary (e.g., salary: top 5%). In processing the database, certain attribute values are combined in an attempt to "camouflage" tuples which otherwise would have easily deducible private attributes. Other tuples which cannot be camouflaged are removed.
(As discussed in greater detail below, "removed" tuples can be treated in one of a number of ways. For instance, the removed tuples can be excluded from query execution and thus would never receive a targeted advertisement. Alternatively, the "removed" tuples are not excluded from either query execution or targeted advertising. However, the processor 155 must take steps to ensure that the confidentiality of private attribute values of such removed tuples is not compromised by query execution.)
In steps 204-210, the processor 155 partitions the database into a "safe" set F and an "unsafe" set R of tuples. In step 204, the processor forms each possible vector of important public attribute values V', which vector V' includes one attribute value <v'1,...,v'j,...,v'm> for each important public attribute A'1,...,A'j,...,A'm. For example, the following are distinct vectors which may be formed on a database with important public attributes age, weight and occupation and private attribute salary:
<age=53, occupation=doctor>; <age=35, occupation=doctor>; <age=35, occupation=minister>; etc. A group of tuples corresponds to each of these vectors V'. That is, each tuple in a particular group contains the same important attribute values as the vector V' to which the group corresponds. For example, the vector <age=35, occupation=minister> might identify the tuples:
age=35, occupation=minister, salary= 70%
age=35, occupation=minister, salary= 70%
age=35, occupation=minister, salary= 65%
age=35, occupation=minister, salary= 35%
age=35, occupation=minister, salary= 40%
age=35, occupation=minister, salary= 40%
age=35, occupation=minister, salary= 15%
In step 206, for each group thus formed, the processor 155 compares the number of distinct attribute values in each jth private attribute Pj of the group to the corresponding uncertainty threshold kj. If there are at least kj distinct private attribute values in the group for each jth private attribute Pj, the processor 155 adds the group of tuples to the set F in step 208. Otherwise, the processor 155 adds the group of tuples to the set R in step 210. For example, suppose that kj is set to 4 in the above age, occupation, salary example. In such a case, there are 5 distinct values for the private attribute salary, namely, 70%, 65%, 40%, 35% and 15%.
Thus, all of these tuples may be added to the set F. On the other hand, suppose another group of tuples was identified for the vector <age=35, occupation=doctor> as follows:
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=10%
age=35, occupation=doctor, salary=10%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=10%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=15%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=15%
This group has only 3 distinct salary attribute values, namely, 5%, 10%, and 15%.
Thus, the processor 155 would add these tuples to the set R.
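Steps 204-210 can be sketched in code. The sketch below is illustrative only: the representation of tuples as Python dictionaries and the helper name `partition_safe_unsafe` are assumptions, not part of the disclosure.

```python
from collections import defaultdict

def partition_safe_unsafe(tuples, important_attrs, private_attrs, k):
    """Steps 204-210: group tuples by their important public attribute
    values (the vectors V'), then place a group in the safe set F if,
    for every private attribute Pj, the group contains at least k[Pj]
    distinct values; otherwise place the group in the unsafe set R."""
    groups = defaultdict(list)
    for t in tuples:
        key = tuple(t[a] for a in important_attrs)  # the vector V'
        groups[key].append(t)

    F, R = [], []
    for group in groups.values():
        safe = all(
            len({t[p] for t in group}) >= k[p] for p in private_attrs
        )
        (F if safe else R).extend(group)
    return F, R

# The minister group (5 distinct salaries) and the doctor group
# (3 distinct salaries) from the text, with threshold k = 4 for salary:
ministers = [{"age": 35, "occupation": "minister", "salary": s}
             for s in (70, 70, 65, 35, 40, 40, 15)]
doctors = [{"age": 35, "occupation": "doctor", "salary": s}
           for s in (5, 5, 10, 10, 5, 10, 5, 15, 5, 5, 15)]
F, R = partition_safe_unsafe(ministers + doctors,
                             ["age", "occupation"], ["salary"],
                             {"salary": 4})
# The 7 minister tuples land in F; the 11 doctor tuples land in R.
```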
Next, in steps 212-222, the processor 155 combines selected important public attribute values. In step 212, the processor 155 selects an important attribute A'j. Illustratively, the processor 155 selects each jth important attribute in decreasing order of the number of distinct attribute values over the entire database. The processor 155 then executes the steps 214-226 with the selected important public attribute A'j. In step 214, the processor 155 identifies each distinct value v'j of the selected important public attribute A'j in the set R. In step 216, the processor 155 then identifies each tuple in both sets F and R having each important public attribute value v'j (identified in the set R) for the important public attribute A'j. For example, suppose age is selected as the attribute A'j. Then age=35 is a public attribute value that is contained by the tuples with public attribute values <age=35, occupation=doctor> in the set R. Age=35 is also a public attribute value contained by the tuples with public attribute values <age=35, occupation=minister> in the set F. Therefore, the following tuples in sets R and F are identified:
age=35, occupation=minister, salary= 70%
age=35, occupation=minister, salary= 70%
age=35, occupation=minister, salary= 65%
age=35, occupation=minister, salary= 35%
age=35, occupation=minister, salary= 40%
age=35, occupation=minister, salary= 40%
age=35, occupation=minister, salary= 15%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=10%
age=35, occupation=doctor, salary=10%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=10%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=15%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=15%
Next, in step 218, the processor identifies each distinct vector V'' in the identified tuples of sets F and R, where the vector V'' includes important public attribute values v''1,...,v''j-1,v''j+1,...,v''m on the important public attributes A'1,...,A'j-1,A'j+1,...,A'm other than A'j. A group of the tuples which were identified in the sets R and F corresponds to each distinct vector V''. That is, each tuple in a particular group has the attribute values of the particular attribute value vector V'' to which the group corresponds. Such tuples are identified by the processor 155 in step 218.
For example, suppose the public attributes are age, weight and height and the private attribute is salary. Suppose the values v'j=35 and v'j=53 identify the following tuples:
age=35, weight=150, height=6', salary= 5%
age=53, weight=150, height=6', salary= 10%
age=35, weight=160, height=6', salary= 10%
age=53, weight=160, height=5.5', salary= 15%
age=35, weight=150, height=5.5', salary= 5%
age=53, weight=150, height=5.5', salary= 10%
age=35, weight=150, height=5.5', salary= 15%
age=53, weight=160, height=6', salary= 20%
The vectors V'' are: <weight=150, height=6'>; <weight=160, height=6'>; <weight=150, height=5.5'> and <weight=160, height=5.5'>. The identified groups are as follows:
weight=150, height=6'
age=35, weight=150, height=6', salary= 5%
age=53, weight=150, height=6', salary= 10%
weight=160, height=6'
age=35, weight=160, height=6', salary= 10%
age=53, weight=160, height=6', salary= 20%
weight=160, height=5.5'
age=53, weight=160, height=5.5', salary= 15%
weight=150, height=5.5'
age=35, weight=150, height=5.5', salary= 5%
age=53, weight=150, height=5.5', salary= 10%
age=35, weight=150, height=5.5', salary= 15%
Next, in step 220, if there are at least kj distinct private attribute values in a group for each jth private attribute Pj, the processor 155 combines all of the values in the group for the important public attribute A'j. Illustratively, each value v'j may only be combined once. For example, suppose kj=3 for salary. Then the group corresponding to vector V''=<weight=150, height=5.5'> satisfies the threshold of uncertainty. The age attribute values are therefore combined to produce the tuples:
age={35,53}, weight=150, height=5.5', salary= 5%
age={35,53}, weight=150, height=5.5', salary= 10%
age={35,53}, weight=150, height=5.5', salary= 15%
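Steps 214-220 might be sketched as follows, using the age/weight/height/salary example above. The dictionary representation and the helper name `combine_values` are illustrative assumptions; for brevity the sketch takes the identified tuples and the unsafe values of the selected attribute directly as inputs.

```python
from collections import defaultdict

def combine_values(tuples, unsafe_values, attr, other_attrs,
                   private_attrs, k):
    """Steps 214-220: for the selected important attribute `attr`, take
    every tuple whose value of `attr` also occurs in the unsafe set R,
    group those tuples by the remaining important attributes (the
    vectors V''), and combine the `attr` values of any group that meets
    the uncertainty thresholds k."""
    groups = defaultdict(list)
    for t in tuples:
        if t[attr] in unsafe_values:
            key = tuple(t[a] for a in other_attrs)  # the vector V''
            groups[key].append(t)
    for group in groups.values():
        if all(len({t[p] for t in group}) >= k[p] for p in private_attrs):
            combined = frozenset(t[attr] for t in group)
            for t in group:
                t[attr] = combined  # e.g. age = {35, 53}
    return tuples

# The eight tuples from the text, with k=3 for salary and both age
# values (35 and 53) occurring in the unsafe set R:
rows = [dict(age=a, weight=w, height=h, salary=s) for a, w, h, s in [
    (35, 150, 6.0, 5), (53, 150, 6.0, 10), (35, 160, 6.0, 10),
    (53, 160, 5.5, 15), (35, 150, 5.5, 5), (53, 150, 5.5, 10),
    (35, 150, 5.5, 15), (53, 160, 6.0, 20)]]
combine_values(rows, {35, 53}, "age", ["weight", "height"],
               ["salary"], {"salary": 3})
# Only the <weight=150, height=5.5'> group has 3 distinct salaries,
# so only its three tuples end up with the combined value age={35, 53}.
```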
In step 222, the processor 155 substitutes a representative public attribute value for each combination. Continuing with our example, the representative value may be the first public attribute value v'j selected, i.e., age=35, to produce the tuples:
age=35, weight=150, height=5.5', salary= 5%
age=35, weight=150, height=5.5', salary= 10%
CA 022244~7 1997-12-11 W O 96142059 PCTAJSg'1~703 age=35, weight=150, height=5.5', salary= 15%
In step 224, the processor 155 identifies each distinct vector V"' of the important public attributes A' in the set F. In step 226, the processor 155 alsoidentifies each vector U of non-important public attribute values, i.e., the values 5 ul,,ut such that A",=u" A'2=u2,,A"Fut, which occur with each distinct attribute value vector V"'of the important public attributes A'. In step 226, the processor 155 combines each vector U of non-important public attribute values with the distinct attribute value vector V"'of the important public attributes A'with which it occurs.
For example, suppose the set F contained the important attributes sex and age, the non-important attributes height and weight and the private attribute salary.
Furthermore, suppose the set F contains the following tuples before this step:
sex=M, age=35, weight=180, height=6', salary=10%
sex=M, age=35, weight=175, height=5', salary=15%
sex=M, age=35, weight=180, height=6', salary=25%
sex=M, age=35, weight=180, height=6', salary=15%
sex=M, age=35, weight=175, height=6', salary=15%
sex=M, age=35, weight=180, height=5', salary=10%
sex=M, age=35, weight=175, height=5', salary=10%
sex=F, age=35, weight=120, height=6', salary=10%
sex=F, age=35, weight=120, height=6', salary=15%
sex=F, age=35, weight=120, height=5', salary=25%
sex=F, age=30, weight=110, height=5', salary=10%
sex=F, age=30, weight=110, height=5', salary=15%
sex=F, age=30, weight=120, height=6', salary=15%
sex=F, age=30, weight=110, height=5', salary=25%
The distinct vectors V''' of important public attribute values A' are <sex=F, age=35>, <sex=F, age=30> and <sex=M, age=35>. The vectors U occurring with V'''=<sex=F, age=35> are <weight=120, height=6'> and <weight=120, height=5'>. The vectors U occurring with V'''=<sex=F, age=30> are <weight=110, height=5'> and <weight=120, height=6'>. The vectors U occurring with V'''=<sex=M, age=35> are
<weight=180, height=6'>, <weight=175, height=6'>, <weight=175, height=5'> and <weight=180, height=5'>. The combined tuples are as follows:
sex=M, age=35, <weight=180,175>, <height=6',5'>, salary=10%
sex=M, age=35, <weight=180,175>, <height=6',5'>, salary=15%
sex=M, age=35, <weight=180,175>, <height=6',5'>, salary=25%
sex=M, age=35, <weight=180,175>, <height=6',5'>, salary=15%
sex=M, age=35, <weight=180,175>, <height=6',5'>, salary=15%
sex=M, age=35, <weight=180,175>, <height=6',5'>, salary=10%
sex=M, age=35, <weight=180,175>, <height=6',5'>, salary=10%
sex=F, age=35, <weight=120,110>, <height=6',5'>, salary=10%
sex=F, age=35, <weight=120,110>, <height=6',5'>, salary=15%
sex=F, age=35, <weight=120,110>, <height=6',5'>, salary=25%
sex=F, age=30, <weight=120,110>, <height=6',5'>, salary=10%
sex=F, age=30, <weight=120,110>, <height=6',5'>, salary=15%
sex=F, age=30, <weight=120,110>, <height=6',5'>, salary=15%
sex=F, age=30, <weight=120,110>, <height=6',5'>, salary=25%
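Steps 224-226 can be sketched similarly. Note that, unlike the important attributes, the non-important values are pooled without any threshold test; the helper name `combine_unimportant` and the dictionary representation are again illustrative assumptions.

```python
from collections import defaultdict

def combine_unimportant(F, important_attrs, unimportant_attrs):
    """Steps 224-226: group the safe set F by the vectors V''' of
    important attribute values, then pool the values each non-important
    attribute takes within a group and attach the pooled set to every
    tuple of the group.  No uncertainty threshold is checked here."""
    groups = defaultdict(list)
    for t in F:
        key = tuple(t[a] for a in important_attrs)  # the vector V'''
        groups[key].append(t)
    for group in groups.values():
        for a in unimportant_attrs:
            pooled = frozenset(t[a] for t in group)
            for t in group:
                t[a] = pooled  # e.g. weight = {120, 110}
    return F

# The sex=F, age=30 group from the text:
F = [dict(sex="F", age=30, weight=w, height=h, salary=s)
     for (w, h, s) in [(110, 5, 10), (110, 5, 15),
                       (120, 6, 15), (110, 5, 25)]]
combine_unimportant(F, ["sex", "age"], ["weight", "height"])
# Every tuple now carries weight={110, 120} and height={5, 6},
# while the private salary values are untouched.
```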
Note that in the above process, where the public attributes are partitioned into important public attributes and non-important public attributes, only the important public attributes are checked to determine if they might require camouflaging. The non-important public attributes are simply combined as set out in step 224. As mentioned above, the advertisers illustratively specify which of the public attributes A are important public attributes A' and which are non-important public attributes A''. This is significant because the partitioning of the public attributes into important and non-important governs which public attributes are checked to determine if they require camouflaging and which public attributes are simply combined in step 224.
After executing steps 202-224, the processor 155 can store the tuples of the set F as the new demographics relational database. Illustratively, the processor 155 discards, i.e., does not execute queries against, the tuples of the set R.
Queries may then be executed against the new demographics relational database.
However, the advertisers must be cognizant of the existence of combined values and should refer to the combined public attribute values in formulating the profile queries.
Alternatively, instead of constructing a new demographics relational database, the processor 155 maintains a record in the memory 160 indicating the partitioning of the attribute values. Consider the above database discussed in connection with step 224. The following are examples of partitions resulting from steps 202-224:
(1) for sex=F, age=35, the tuples:
sex=F, age=35, <weight=120,110>, <height=6',5'>, salary=10%
sex=F, age=35, <weight=120,110>, <height=6',5'>, salary=15%
sex=F, age=35, <weight=120,110>, <height=6',5'>, salary=25%
(2) for sex=F, age=30, the tuples:
sex=F, age=30, <weight=120,110>, <height=6',5'>, salary=10%
sex=F, age=30, <weight=120,110>, <height=6',5'>, salary=15%
sex=F, age=30, <weight=120,110>, <height=6',5'>, salary=15%
sex=F, age=30, <weight=120,110>, <height=6',5'>, salary=25%
The processor 155 maintains a record containing indications of the partitions.
However, if this is done, the processor 155 must perform some post processing to ensure that no profile queries violate the partition. That is, queries which identify all tuples within a partition do not violate the partition. However, queries which attempt to identify only some of the tuples within a partition violate the partition. More formally stated, a query is said to violate a partition if the following occurs. Suppose there are two tuples, represented as database row vectors T1=<A1=v1,...,Ak=vk,...,An=vn> and T2=<A1=u1,...,Ak=uk,...,An=un>, wherein both tuples T1 and T2 are in the same partition. That is, for each important attribute A1,...,Ak, v1=u1, v2=u2,..., and vk=uk. A query violates the partition if it has criteria directed to both public and private attributes and if the query is satisfied by the tuple T1 but not by the tuple T2. To determine if a profile query violates the partition, the processor 155 can execute the profile query against the demographics relational database. The processor 155 can then compare the tuples identified by the profile query to the non-identified tuples of the demographics relational database to determine if a non-identified tuple T2 and an identified tuple T1 exist for which the corresponding attribute values are in the same partitions as described above.
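The post-processing test just described might be sketched as follows. This is a simplified illustration: the full test also requires that the query have criteria directed to both public and private attributes, which is omitted here, and the helper name and dictionary representation are assumptions.

```python
def violates_partition(identified, non_identified, important_attrs):
    """A profile query violates a partition if some identified tuple T1
    and some non-identified tuple T2 agree on every important public
    attribute, i.e. the query splits the tuples of a partition."""
    def key(t):
        return tuple(t[a] for a in important_attrs)
    identified_keys = {key(t) for t in identified}
    return any(key(t) in identified_keys for t in non_identified)

# A query that matches only one of the two sex=F, age=35 tuples
# splits that partition and is therefore a violation:
t1 = dict(sex="F", age=35, salary=10)  # identified by the query
t2 = dict(sex="F", age=35, salary=25)  # not identified
print(violates_partition([t1], [t2], ["sex", "age"]))  # True
```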
If a profile query violates the partition, the processor 155 can outright reject the profile query. Alternatively, the processor 155 modifies the set of identified tuples by also identifying, i.e., including, the tuples T2 which were not initially identified by the query, to remove the partition violation. However, if such modifications are performed, the processor 155 should notify the advertiser of the modification and its nature. Illustratively, the processor 155 achieves this by describing the contents of the partitions of the attributes specified in the advertiser's query. For example, the processor 155 can transmit a message regarding the modifications to the advertiser.
In short, a system and method are disclosed for protecting a database against deduction of confidential attribute values therein. A memory is provided for storing the database and a processor is provided for processing the database.
Using the processor, the database is electronically partitioned into public attributes, containing non-confidential attribute values, and private attributes, containing private attribute values. The processor is then used to electronically process the private attribute values to reduce any high correlation between public attribute values and private attribute values. Specifically, the processor can partition the database into safe tuples and unsafe tuples, such that each unsafe tuple is a member of a group:
(1) identified by a vector of attribute values (i.e., each tuple of the group has public attribute values matching the vector), and
(2) which group has a level of uncertainty as to at least one value of a private attribute that is less than a threshold level of uncertainty.
The processor can then selectively combine the public attribute values of the tuples to camouflage such tuples from deduction of their private attribute values beyond a threshold level of uncertainty or remove such tuples from the database. This is achieved by:
(1) identifying all tuples containing particular attribute values for a selected public attribute, which particular values are contained by at least one tuple with a highly correlative public attribute value,
(2) identifying groups of tuples corresponding to, i.e., containing public attribute values that match, distinct vectors of values for the public attributes other than the selected public attribute,
(3) combining values of the selected public attribute of each group if there is at least a threshold level of uncertainty for each private attribute value in the group, and
(4) removing unsafe tuples for which no combination can be performed to camouflage the unsafe tuples.
Finally, the above discussion is intended to be merely illustrative of the invention. Numerous alternative embodiments may be devised by those having ordinary skill in the art without departing from the spirit and scope of the following claims.
of FIG 2 contains data pertaining to a population group. The relation Y has six attributes or columns 2-1, 2-2, 2-3, 2-4, 2-5 and 2-6, for storing, respectively, name, age, weight, height, social security number and telephone extension data values of the population. The database also has twelve records or tuples 3-1, 3-2, 3-3,..., 3-12. Each tuple 3-1, 3-2, 3-3,..., 3-12 has one data value from each attribute. For instance, the tuple 3-10 has the name attribute value "lee", the age attribute value 40, the weight attribute value 171, the height attribute value 180, the social security number attribute value 99~ ~8 7654 and the telephone extension attribute value 0123.
To identify the targeted customers for an advertisement, a profile containing queries is executed against the database. A query is used to identify tuples which meet criteria of interest from the database. A query usually includes a predicate which specifies the criteria of interest. For instance, the following query executed against the relation Y:
Select from Y where Y.Age < 15 OR Y.Age > 50
includes the predicate "where Y.Age < 15 OR Y.Age > 50" which specifies that only those tuples having an Age attribute value less than 15 or greater than 50 are to be identified. The advertiser can thus construct a profile for execution against the relational database to identify the targeted audience of customers.
The problem with implementing such a targeted advertising scheme is that customers may be reluctant to disclose wholesale the necessary demographic data for constructing the relational database. In particular, customers may be concerned about:
(1) direct release of raw information about an individual customer,
(2) deduction of non-released information of an individual customer from information regarding the identity of the customers who match a given profile, and
(3) deduction of non-released information of a specific individual customer from knowledge of a series of profiles, together with the number of individual customers that received or would receive the advertisements corresponding to those profiles.
The first two threats to privacy can be overcome by modifying the communications network in a fashion similar to that used for protecting the anonymity of customers who retrieve video in Hardt-Kornacki & Yacobi, Securing End-User Privacy During Information Filtering, PROC. OF THE CONF. ON HIGH PERF. INFO. FILTERING, 1991. Such a modified network is shown in FIG 3. As shown, the communications network 50 interconnects sources (advertisers) 61, 62 and destinations (customers) 71, 72, 73 and 74 similar to the network 10 of FIG 1.
However, a filter station 80 and name translator station 90 are also provided which are connected to the communications network 50. Illustratively, the filter station 80 has a memory 82 for maintaining the database of customer demographic data.
Furthermore, the filter station 80 has a processor 84 which can execute queries against the demographics database stored in the memory 82. Each source, such as the source 62, has a server 64 and a memory 66. The server 64 of the source 62 transmits one or more profiles (containing queries for identifying particular target audiences) to the processor 84 of the filter station 80. The processor 84 executes each profile query against the relational database stored in the memory 82 to retrieve the aliases assigned to each customer identified by each query. The processor 84 then transmits the corresponding aliases for each profile back to the server 64 of the source 62 which may be stored in the memory 66 for later use.
When the advertiser-source 62 desires to transmit the advertisement to the targeted customer destinations, e.g., the destinations 72 and 74, the server 64 transmits the advertisement and the aliases into the network 50. The network 50 delivers the advertisement and aliases to the processor 92 of the name translator station 90. The processor 92 then translates the aliases to their corresponding network addresses, for example, using information stored in a memory 94. The processor 92 of the name translator station 90 then transmits the advertisement to the customer destinations 72, 74 using the network addresses.
In the modified communications system, the customer-destination, e.g., the destination 72, knows its own demographic information. The advertiser-source, e.g., the source 62, knows its advertisement, its profiles and how many customers will receive the advertisement. The advertiser only receives aliases for the individual customers 71-74. Thus, the advertiser does not possess the raw demographic information and is not given information for identifying the customers 71-74 (such as the network addresses). The filter station 80 contains information regarding the entire demographics database and receives the profiles submitted by the advertisers. The name translator station 90 contains only the translations of aliases to network addresses and receives the aliases and advertisements. The network 50 only receives the advertisement and network addresses of the destinations.
Despite such protections, the advertiser still obtains some results of the execution of the queries of the profiles against the demographics database, such as the number of customers which match each profile. This may be sufficient information to deduce personal information of the customer. For example, suppose the advertiser knows the identities of 100 customers in the zip code 07090 who collect stamps. Furthermore, suppose the advertiser submits a profile for targeting all customers in zip code 07090 who collect stamps and who have an annual income of $50,000-$100,000. If 100 aliases are returned to the advertiser, then the advertiser successfully deduces the salary range of all 100 stamp collectors.
The above threat, wherein query results can lead to deducing private information, is referred to as a "tracker attack." Stated more generally, a "tracker" is a special case of a linear system which involves solving the equation:
HX = Q    (1)
where:
H is a matrix which represents tuples that satisfy corresponding queries, where each column j represents a different tuple, each row i represents a different query, and where each matrix element hij = 1 if the jth tuple satisfies the predicate Ci of the ith query and 0 otherwise,
C is a vector representing the predicates used in each ith query,
X is a vector representing the (unknown) tuples which satisfy the predicates C (to be solved by equation (1)), and
Q is a vector of counts or other results returned by each ith query containing elements qi, where each qi is the sum (or other result returned from the ith query) over an attribute of the tuples retrieved by the ith query.
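A small numerical instance of equation (1) may help. The sketch below revisits the stamp-collector example above; the concrete counts and the two-query setup are hypothetical, chosen only to show how intersecting query counts let an attacker solve HX = Q for individual contributions.

```python
# Tracker sketch: rows of H are queries, columns are groups of tuples,
# H[i][j] = 1 if group j satisfies query i, and Q[i] is the count
# returned by query i.  Let x1 be the known stamp collectors in zip
# 07090 whose income is $50,000-$100,000 and x2 those outside that
# band, so that:
#
#   query 1 (zip=07090, stamps):               x1 + x2 = Q[0]
#   query 2 (zip=07090, stamps, income band):  x1      = Q[1]

H = [[1, 1],
     [1, 0]]
Q = [100, 100]

x1 = Q[1]        # second row of H X = Q gives x1 directly
x2 = Q[0] - x1   # back-substitute into the first row

# x2 == 0: every known collector falls in the income band, so the
# advertiser has deduced each individual's private income range.
print(x1, x2)
```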
The prior art has proposed some solutions for protecting statistical relational databases from tracker attacks. Dobkin, Jones & Lipton, Secure Databases:
Protection Against User Inference, ACM TRANS. ON DATABASE SYS., vol. 4, no. 1, Mar., 1979, p. 97-106 proposes to restrict query set overlap, i.e., to prevent submission of multiple similar query sets, to prevent this kind of attack. However, such a control is difficult to implement because a history of all previously submitted query sets must be maintained and compared against the most recently submitted query. A "cell-suppression" technique has also been proposed wherein statistics, or other query execution results, that may reveal sensitive information are never released. However, cell-suppression techniques are best used for queries which produce two and three dimensional tables but not for arbitrary queries which are of concern in implementing targeted advertising.
Random noise techniques have been proposed wherein a random number is subtracted from the results returned by a query. This solution is not satisfactory for implementing targeted advertising because the result presented to the advertiser would then be inherently inaccurate. In an alternative scheme proposed in Warner, Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias, 60 J. OF THE AM. STAT. ASSOC. p. 63-69 (1965), individuals may enter erroneous values into the relational database a certain percentage of the time. The problem with this strategy is that the advertisers would then target advertisements to the wrong audience a certain percentage of the time. Denning, Secure Statistical Databases Under Random Sample Queries, ACM TRANS. ON DATABASE SYS., vol. 5, no. 3, Sept., 1980, p. 291-315 discloses a noise technique wherein the queries are applied to only random subsets of the tuples rather than all of the tuples in the relational database. In addition to the specific disadvantages mentioned above, one or more of the above-described noise addition techniques may be subverted by a variety of noise removal methods.
Yu & Chin, A Study on the Protection of Statistical Databases, PROC. ACM SIGMOD INT'L CONF. ON THE MGMT. OF DATA, p. 169-181 (1977) and Chin & Ozsoyoglu, Security in Partitioned Dynamic Statistical Databases, PROC. IEEE COMPSAC CONF., p. 594-601 (1979) disclose methods for partitioning the relational database into disjoint partitions.
All of the above methods were developed primarily for statistical databases and do not have properties which enable the implementation of targeted advertising. In particular, the above methods do not provide precise identification of tuples which satisfy queries or do not provide an accurate count (or other returned query result) of such retrieved tuples. However, both of these properties are important in targeted advertising. First, it is important to accurately target all customers whose demographic data matches a submitted profile. Second, it is vital to obtain an accurate count of the identified customers for purposes of billing the advertiser and for purposes of deciding whether or not the profile identified a desirable number of customers for receiving the advertisement.
It is therefore an object of the present invention to overcome the disadvantages of the prior art. It is another object of the present invention to provide a targeted advertising method which preserves the privacy of confidential information of the customer. In particular, it is an object of the present invention to reduce the advertisers' ability to deduce confidential information about the customers from the results of one or more profile queries executed against a demographics relational database.
Summary of the Invention
These and other objects are achieved according to the present invention.
According to one embodiment, the present invention can maintain the confidentiality of information in a database for use in a communications system environment. As in the prior art communications system, this embodiment provides a communications network which interconnects an advertiser, customers, a filter station and a name translator station. Illustratively, the filter station maintains a demographics database of information regarding the customers. However, the invention can work with databases storing any kind of information and can work for both relational and non-relational databases. In order to obtain a target audience for an advertisement, the advertiser can submit one or more profiles containing queries to the filter station.
The filter station executes the profile queries against the demographics database in order to identify tuples corresponding to customers who match the profile of the targeted audience. To preserve the anonymity of the customers, the filter station transmits aliases, instead of identifying information, for the customers identified by the profile to the advertiser. When the advertiser desires to deliver an advertisement to the target audience of customers, the advertiser transmits the advertisement and the aliases via the communications network to the name translator station. The name translator station then translates the received aliases to the network addresses of the customers using its translation table and then transmits the advertisement to the customers via the communications network.
Like the conventional communications network, the communications network according to an embodiment of the present invention restricts the access of the advertisers to the demographics relational database and discloses aliases to the advertisers in lieu of the actual network addresses of the customers. This prevents: (1) disclosure of the raw information in the database to the advertiser, and (2) deduction of confidential information from the identity of customers.
However, unlike the conventional communications system, the present invention also provides for reducing the advertiser's ability to deduce confidential information from the results returned by the filter station in response to the profile queries submitted by the advertiser. That is, the present invention protects against tracker attacks and other kinds of confidentiality breaches, wherein the advertiser attempts to deduce confidential information about the customers in the database from, for example, the mere number of aliases returned in response to a profile query. To achieve this protection in the present invention, the attributes are divided into two classes, namely, public attributes, for which no confidentiality protection is provided, and private attributes, for which confidentiality protection is provided. In order to prevent an advertiser from deducing private attribute values, the database is thereafter processed to reduce any high correlation between public attribute values and private attribute values. A vector of one or more particular public attribute values is said to have a high correlation with a private attribute value, if:
(1) the vector of particular public attribute values identifies a group of tuples of the database which have public attribute values that match the vector of public attribute values, and (2) the level of uncertainty regarding the values of the private attribute of the identified group is less than a predetermined threshold.
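The two-part test above can be sketched in a few lines. This is an illustrative reading of the definition, with hypothetical function and attribute names: group the tuples by their public-attribute vector and flag any vector whose group exhibits fewer than a threshold number k of distinct private values.

```python
from collections import defaultdict

def highly_correlative_vectors(tuples, public_keys, private_key, k):
    """Return the public-attribute vectors whose groups show fewer than k
    distinct private attribute values (i.e., too little uncertainty)."""
    groups = defaultdict(set)
    for t in tuples:
        vector = tuple(t[a] for a in public_keys)
        groups[vector].add(t[private_key])
    return [v for v, privates in groups.items() if len(privates) < k]

tuples = [
    {"age": 35, "occupation": "doctor",   "salary": "top 5%"},
    {"age": 35, "occupation": "doctor",   "salary": "top 5%"},
    {"age": 35, "occupation": "minister", "salary": "70%"},
    {"age": 35, "occupation": "minister", "salary": "40%"},
    {"age": 35, "occupation": "minister", "salary": "15%"},
]

# With k=3, <age=35, occupation=doctor> is highly correlative: its group
# exhibits only one distinct salary value.
print(highly_correlative_vectors(tuples, ["age", "occupation"], "salary", 3))
# [(35, 'doctor')]
```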
Stated another way, a specific vector of public attribute values of tuples may correspond to a small number of private attribute values, thus reducing the uncertainty about the private attribute values when the public attribute values are known. In the worst case, the vector of public attribute values would correspond to only a single private attribute value. Thus, there might be a high level of certainty in determining the actual private attribute values of the group of tuples identified by a given vector of public attributes. Illustratively, if the number of distinctly different private attribute values for the group identified by such a vector is less than a predetermined threshold number of values, then the correlation of the public attributes is unacceptably high. Herein, a public attribute value with an unacceptably high correlation with one or more private attribute values is referred to as a "highly correlative public attribute value".
According to one embodiment, tuples containing public attribute values that are highly correlated with private attribute values are processed in a fashion either to camouflage the public attributes of the tuple or to remove such tuples from identification in the database. Tuples are "camouflaged" by combining the specific public attribute values of the tuples, that are highly correlated with one or more specific private attribute values of the tuples, with other public attribute values of the tuples to reduce the correlation.
A method and system are therefore provided wherein attributes are classified as private or public and wherein the correlation between public and private attributes is reduced by camouflaging highly correlative public attribute values. The invention provides for introduction of an adjustable level of uncertainty in deducing private information from the results of queries executed against the demographics relational database.
Brief Description of the Drawing
FIG 1 depicts an ordinary prior art communications network.
FIG 2 depicts a prior art demographics relational database.
FIG 3 depicts a prior art communications network with privacy protection of customer network addresses.
FIG 4 depicts a communications network according to an embodiment of the present invention with anonymity protection of private customer information.
FIG 5 schematically depicts a flowchart illustrating a method according to one embodiment of the present invention.
Detailed Description of the Invention
As mentioned before, the present invention can protect the confidentiality of virtually any kind of information in both relational and non-relational databases and in a variety of environments including communication networks. For purposes of simplicity and clarity, the invention is illustrated below using a communications network environment and a relational database containing demographics information. In the embodiment discussed below, advertisers submit queries for execution against the relational demographics database for purposes of identifying a target audience for advertising. Again, this is illustrative; the invention can also work in other applications wherein queries are submitted to achieve other goals. FIG 4 shows an illustrative communications network 100 according to the present invention. As shown, advertisers 121 and 122, customers 131, 132, 133 and 134, and a name translation station 140 are provided which are connected to the communications network 100. Furthermore, a filter station 150 is provided which is adapted according to the present invention. The filter station 150 has a processor 155 and a memory 160 connected thereto.
Like the processor 84 and memory 82 of the conventional filter station 80 (FIG 3), the processor 155 and memory 160 can perform various functions for preventing disclosure to the advertisers 121-122 of the raw data. The processor 155 and memory 160 can also perform functions for preventing deduction by the advertisers 121-122 of private information from the identification of customers (from their network addresses). The processor 155 can receive demographics information from the customers 131-134 and can construct a demographics relational database. The processor 155 can store the demographics relational database in the memory 160. The processor 155 can also receive from the advertisers 121-122, such as the advertiser 122, profiles containing queries for execution against the relational database. In response, the processor 155 identifies the tuples of the relational database which match the profile. The processor 155 then transmits the identifier and the aliases to the advertiser 122.
The processor 155 and memory 160 of the filter station 150 are also capable of processing the demographics relational database to reduce the ability of advertisers to deduce private information from results returned by the filter station 150 in response to profile queries submitted by the advertisers. In the discussions below, it is presumed that the advertisers use the number of returned aliases to deduce private information, although the discussion is general enough to apply to any result returned in response to profile queries.
The processing of the processor 155 and memory 160 can be summarized as partitioning the database into public attributes, for which no confidentiality protection need be provided, and private attributes, for which confidentiality protection is provided. In providing confidentiality protection, it should be noted that some of the information of the demographics relational database is already assumed to be public, or otherwise not worthy of confidentiality protection. For instance, consider a frequent flyer database which contains the following attributes: zip code, telephone number, occupation, dietary restrictions and income level. The telephone number of an individual customer may be widely published in a telephone directory. Furthermore, the occupation of an individual customer, while not widely published, may be considered non-confidential or non-personal. On the other hand, other information such as dietary restrictions and income level may be presumed to be personal and confidential information. After partitioning the database, the correlation between public attributes and private attributes is reduced by camouflaging some highly correlative public attribute values and outright removing some tuples containing highly correlative public attribute values which are difficult to camouflage.
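The attribute partitioning for the frequent-flyer example above can be recorded as a simple policy table. This is a minimal illustrative sketch (the split is a policy decision made by the database operator, not something computed from the data):

```python
# Classification of the frequent-flyer attributes named in the text.
PUBLIC  = {"zip code", "telephone number", "occupation"}
PRIVATE = {"dietary restrictions", "income level"}

def is_private(attribute):
    """Return True if the attribute requires confidentiality protection."""
    return attribute in PRIVATE

print(is_private("income level"), is_private("occupation"))  # True False
```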
The processor 155 may also partition out an identification attribute from the database which uniquely identifies each tuple. Such an identification could be a network address, social security number, etc. Such information can only be the subject of a profile query if that query does not execute against private attributes or is merely used to update the corresponding tuple of the database.
Illustratively, the public attributes are further divided into important public attributes and non-important public attributes. Advertisers are permitted to specify attribute values of important public attributes with a greater degree of certainty than non-important public attributes. Illustratively, the advertisers may specify which of the attributes are to be treated as important. The invention is illustrated below with important and non-important public attribute partitioning.
In the discussion below, the vector A represents the public attributes of a specified set or group of tuples and each component <A1,...,An> of A represents an individual public attribute vector. The vector A' represents the important public attributes of a specified set or group of tuples and each component <A'1,...,A'm> of A' represents an individual important public attribute vector. The vector A" represents the non-important public attributes of a specified set or group of tuples and each component <A"1,...,A"p> of A" represents an individual non-important public attribute vector. The vector P represents the private attributes of a specified set or group of tuples and the components <P1,...,Pq> represent an individual private attribute vector. The vector K represents a vector of uncertainty thresholds for the private attributes P. Illustratively, each scalar component kj of K is a threshold count of distinctly different private attribute values in Pj. Each threshold of uncertainty kj can be fixed or dynamically adjusted by the processor 155 to adjust the level of confidentiality protection. The vectors V, V', V", V"' and U represent distinct vectors of particular scalar attribute values <v1,...,vn>, <v'1,...,v'j,...,v'm>, etc. for the public attributes A, A', or A" of a single tuple. Herein, the notation A1=v1,..., An=vn refers to a single tuple (i.e., row of the relational database) for which each designated public attribute vector, e.g., A1, takes on the corresponding, distinct, scalar attribute value, e.g., v1.
FIG 5 is a flowchart which schematically illustrates a process executed by the processor 155 and memory 160 for ensuring the confidentiality of demographic information from deduction by the advertisers 121-122. In a first step 202, the processor 155 partitions the attributes of the database into public attributes A1,...,An, containing non-confidential information, and private attributes P1,...,Pq, containing confidential information. For example, suppose the attributes are age, height, religious affiliation and salary. The attributes age and height might be designated as public attributes whereas the attributes religious affiliation and salary might be designated as private attributes.
Next, in steps 204-226, the processor 155 removes high correlations between public and private attributes of tuples in the database. Stated another way, consider a specific vector of particular attribute values V such that A1=v1, A2=v2,..., An=vn. This vector V identifies a group of tuples which have values for public attributes A1,...,An that match V. The database is processed to ensure that for any such group of tuples identified by any vector V, there is a threshold level of uncertainty kj about the values of any jth private attribute Pj in the identified set. For example, consider a database having only public attributes of age and occupation and only private attributes of salary range. The database may have certain vectors of age and occupation (e.g., <age:35, occupation: doctor>) for which there are relatively few different values for salary (e.g., salary: top 5%). In processing the database, certain attribute values are combined in an attempt to "camouflage" tuples which otherwise would have easily deducible private attributes. Other tuples which cannot be camouflaged are removed.
(As discussed in greater detail below, "removed" tuples can be treated in one of a number of ways. For instance, the removed tuples can be excluded from query execution and thus would never receive a targeted advertisement. Alternatively, the "removed" tuples are not excluded from either query execution or targeted advertising. However, the processor 155 must take steps to ensure that the confidentiality of private attribute values of such removed tuples is not compromised by query execution.) In steps 204-210, the processor 155 partitions the database into a "safe" set F and an "unsafe" set R of tuples. In step 204, the processor forms each possible vector of important public attribute values V', which vector V' includes one attribute value <v'1,...,v'j,...,v'm> for each important public attribute A'1,...,A'j,...,A'm. For example, the following are distinct vectors which may be formed on a database with important public attributes age, weight and occupation and private attribute salary:
<age=53, occupation=doctor>; <age=35, occupation=doctor>; <age=35, occupation=minister>; etc. A group of tuples corresponds to each of these vectors V'. That is, each tuple in a particular group contains the same important attribute values as the vector V' to which the group corresponds. For example, the vector <age=35, occupation=minister> might identify the tuples:
age=35, occupation=minister, salary= 70%
age=35, occupation=minister, salary= 70%
age=35, occupation=minister, salary= 65%
age=35, occupation=minister, salary= 35%
age=35, occupation=minister, salary= 40%
age=35, occupation=minister, salary= 40%
age=35, occupation=minister, salary= 15%
In step 206, for each group thus formed, the processor 155 compares the number of distinct attribute values in each jth private attribute Pj of the group to the corresponding uncertainty threshold kj. If there are at least kj distinct private attribute values in the group for each jth private attribute Pj, the processor 155 adds the group of tuples to the set F in step 208. Otherwise, the processor 155 adds the group of tuples to the set R in step 210. For example, suppose that k is set to 4 in the above age, occupation, salary example. In such a case, there are 5 distinct values for the private attribute salary, namely, 70%, 65%, 40%, 35% and 15%.
Thus, all of these tuples may be added to the set F. On the other hand, suppose another group of tuples was identified for the vector <age=35, occupation=doctor> as follows:
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=10%
age=35, occupation=doctor, salary=10%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=10%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=15%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=15%
This group has only 3 distinct salary attribute values, namely, 5%, 10%, and 15%.
Thus, the processor 155 would add these tuples to the set R.
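Steps 204-210 can be sketched as follows. This is an illustrative simplification with hypothetical names, using a single private attribute, that reproduces the minister/doctor example above: each group of tuples sharing an important-attribute vector goes to set F if it has at least k distinct private values, otherwise to set R.

```python
from collections import defaultdict

def partition_safe_unsafe(tuples, important_keys, private_key, k):
    """Partition tuples into a safe set F and an unsafe set R (steps 204-210)."""
    groups = defaultdict(list)
    for t in tuples:
        groups[tuple(t[a] for a in important_keys)].append(t)
    F, R = [], []
    for group in groups.values():
        distinct_private = {t[private_key] for t in group}
        (F if len(distinct_private) >= k else R).extend(group)
    return F, R

ministers = [{"age": 35, "occupation": "minister", "salary": s}
             for s in ["70%", "70%", "65%", "35%", "40%", "40%", "15%"]]
doctors = [{"age": 35, "occupation": "doctor", "salary": s}
           for s in ["5%", "5%", "10%", "10%", "5%", "10%", "5%",
                     "15%", "5%", "5%", "15%"]]

F, R = partition_safe_unsafe(ministers + doctors,
                             ["age", "occupation"], "salary", 4)
# Ministers show 5 distinct salaries (>= 4) -> F; doctors only 3 -> R.
print(len(F), len(R))  # 7 11
```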
Next in steps 212-222, the processor 155 combines selected important public attribute values. In step 212, the processor 155 selects an important attribute A'j. Illustratively, the processor 155 selects each jth important attribute in decreasing order of the number of distinct attribute values over the entire database. The processor 155 then executes the steps 214-226 with the selected important public attribute A'j. In step 214, the processor 155 identifies each distinct value v'j of the selected important public attribute A'j in the set R. In step 216, the processor 155 then identifies each tuple in both sets F and R having each important public attribute value v'j (identified in the set R) for the important public attribute A'j. For example, suppose age is selected as the attribute A'j. Then age=35 is a public attribute value that is contained by the tuples with public attribute values <age=35, occupation=doctor> in the set R. Age=35 is also a public attribute value contained by the tuples with public attribute values <age=35, occupation=minister> in the set F. Therefore, the following tuples in sets R and F are identified:
age=35, occupation=minister, salary= 70%
age=35, occupation=minister, salary= 70%
age=35, occupation=minister, salary= 65%
age=35, occupation=minister, salary= 35%
age=35, occupation=minister, salary= 40%
age=35, occupation=minister, salary= 40%
age=35, occupation=minister, salary= 15%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=10%
age=35, occupation=doctor, salary=10%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=10%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=15%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=15%
Next in step 218, the processor identifies each distinct vector V" in the identified tuples of sets F and R, where the vector V" includes important public attribute values v"1,...,v"j-1,v"j+1,...,v"m on the important public attributes A'1,...,A'j-1,A'j+1,...,A'm other than A'j. A group of the tuples which were identified in the sets R and F corresponds to each distinct vector V". That is, each tuple in a particular group has the attribute values of the particular attribute value vector V" to which the group corresponds. Such tuples are identified by the processor 155 in step 218.
For example, suppose the public attributes are age, weight and height and the private attribute is salary. Suppose the values v'j=35 and v'j=53 identify the following tuples:
age=35, weight=150, height=6', salary= 5%
age=53, weight=150, height=6', salary= 10%
age=35, weight=160, height=6', salary= 10%
age=53, weight=160, height=5.5', salary= 15%
age=35, weight=150, height=5.5', salary= 5%
age=53, weight=150, height=5.5', salary= 10%
age=35, weight=150, height=5.5', salary= 15%
age=53, weight=160, height=6', salary= 20%
The vectors V" are: <weight=150, height=6'>; <weight=160, height=6'>; <weight=150, height=5.5'>; and <weight=160, height=5.5'>. The identified groups are as follows:
weight=150, height=6':
age=35, weight=150, height=6', salary= 5%
age=53, weight=150, height=6', salary= 10%
weight=160, height=6':
age=35, weight=160, height=6', salary= 10%
age=53, weight=160, height=6', salary= 20%
weight=160, height=5.5':
age=53, weight=160, height=5.5', salary= 15%
weight=150, height=5.5':
age=35, weight=150, height=5.5', salary= 5%
age=53, weight=150, height=5.5', salary= 10%
age=35, weight=150, height=5.5', salary= 15%
Next, in step 220, if there are at least kj distinct private attribute values in a group for each jth private attribute Pj, the processor 155 combines all of the values in the group for the important public attribute A'j. Illustratively, each value v'j may only be combined once. For example, suppose k=3 for salary. Then the group corresponding to vector V"=<weight=150, height=5.5'> satisfies the threshold of uncertainty. The age attribute values are therefore combined to produce the tuples:
age={35,53}, weight=150, height=5.5', salary= 5%
age={35,53}, weight=150, height=5.5', salary= 10%
age={35,53}, weight=150, height=5.5', salary= 15%
In step 222, the processor 155 substitutes a representative public attribute value for each combination. Continuing with our example, the representative value may be the first public attribute value v'j selected, i.e., age=35, to produce the tuples:
age=35, weight=150, height=5.5', salary= 5%
age=35, weight=150, height=5.5', salary= 10%
age=35, weight=150, height=5.5', salary= 15%
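The combine-and-substitute camouflage of steps 214-222 can be sketched for a single selected important attribute (here, age). This is a simplified illustration of the patent's procedure, not a full implementation; function names are hypothetical.

```python
from collections import defaultdict

def camouflage(tuples, selected, other_keys, private_key, k):
    """Group tuples by the other important attributes; where a group shows at
    least k distinct private values, combine the selected attribute's values
    by substituting one representative value (steps 220-222)."""
    groups = defaultdict(list)
    for t in tuples:
        groups[tuple(t[a] for a in other_keys)].append(t)
    out = []
    for group in groups.values():
        if len({t[private_key] for t in group}) >= k:
            representative = group[0][selected]
            for t in group:
                out.append({**t, selected: representative})
        else:
            out.extend(group)   # left for removal or further processing
    return out

tuples = [
    {"age": 35, "weight": 150, "height": "5.5'", "salary": "5%"},
    {"age": 53, "weight": 150, "height": "5.5'", "salary": "10%"},
    {"age": 35, "weight": 150, "height": "5.5'", "salary": "15%"},
]
result = camouflage(tuples, "age", ["weight", "height"], "salary", 3)
print(sorted({t["age"] for t in result}))  # [35]
```

With k=3 the group <weight=150, height=5.5'> has three distinct salaries, so the ages {35, 53} are collapsed to the representative value 35, matching the substituted tuples shown above.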
In step 224, the processor 155 identifies each distinct vector V"' of the important public attributes A' in the set F. In step 226, the processor 155 also identifies each vector U of non-important public attribute values, i.e., the values u1,...,ut such that A"1=u1, A"2=u2,..., A"t=ut, which occur with each distinct attribute value vector V"' of the important public attributes A'. In step 226, the processor 155 combines each vector U of non-important public attribute values with the distinct attribute value vector V"' of the important public attributes A' with which it occurs.
For example, suppose the set F contained the important attributes sex and age, the non-important attributes height and weight and the private attribute salary.
Furthermore, suppose the set F contains the following tuples before this step:
sex=M, age=35, weight=180, height=6', salary=10%
sex=M, age=35, weight=175, height=5', salary=15%
sex=M, age=35, weight=180, height=6', salary=25%
sex=M, age=35, weight=180, height=6', salary=15%
sex=M, age=35, weight=175, height=6', salary=15%
sex=M, age=35, weight=180, height=5', salary=10%
sex=M, age=35, weight=175, height=5', salary=10%
sex=F, age=35, weight=120, height=6', salary=10%
sex=F, age=35, weight=120, height=6', salary=15%
sex=F, age=35, weight=120, height=5', salary=25%
sex=F, age=30, weight=110, height=5', salary=10%
sex=F, age=30, weight=110, height=5', salary=15%
sex=F, age=30, weight=120, height=6', salary=15%
sex=F, age=30, weight=110, height=5', salary=25%
The distinct vectors V"' of important public attribute values A' are <sex=F, age=35>, <sex=F, age=30> and <sex=M, age=35>. The vectors U occurring with V"'=<sex=F, age=35> are <weight=120, height=6'> and <weight=120, height=5'>. The vectors U occurring with V"'=<sex=F, age=30> are <weight=110, height=5'> and <weight=120, height=6'>. The vectors U occurring with V"'=<sex=M, age=35> are <weight=180, height=6'>, <weight=175, height=6'>, <weight=175, height=5'> and <weight=180, height=5'>. The combined tuples are as follows:
sex=M, age=35, <weight=180,175>, <height=6',5'>, salary=10%
sex=M, age=35, <weight=180,175>, <height=6',5'>, salary=15%
sex=M, age=35, <weight=180,175>, <height=6',5'>, salary=25%
sex=M, age=35, <weight=180,175>, <height=6',5'>, salary=15%
sex=M, age=35, <weight=180,175>, <height=6',5'>, salary=15%
sex=M, age=35, <weight=180,175>, <height=6',5'>, salary=10%
sex=M, age=35, <weight=180,175>, <height=6',5'>, salary=10%
sex=F, age=35, <weight=120,110>, <height=6',5'>, salary=10%
sex=F, age=35, <weight=120,110>, <height=6',5'>, salary=15%
sex=F, age=35, <weight=120,110>, <height=6',5'>, salary=25%
sex=F, age=30, <weight=120,110>, <height=6',5'>, salary=10%
sex=F, age=30, <weight=120,110>, <height=6',5'>, salary=15%
sex=F, age=30, <weight=120,110>, <height=6',5'>, salary=15%
sex=F, age=30, <weight=120,110>, <height=6',5'>, salary=25%
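The merging of non-important values in steps 224-226 can be sketched as follows. This is an illustrative reading with hypothetical names and a reduced copy of the data above: for each important-attribute vector V"', the non-important values observed with it are pooled and substituted into every tuple of that group.

```python
from collections import defaultdict

def combine_nonimportant(tuples, important_keys, nonimportant_keys):
    """For each important vector V''', pool the non-important values observed
    with it and substitute the pooled set into every tuple (steps 224-226)."""
    observed = defaultdict(lambda: defaultdict(set))
    for t in tuples:
        v = tuple(t[a] for a in important_keys)
        for a in nonimportant_keys:
            observed[v][a].add(t[a])
    out = []
    for t in tuples:
        v = tuple(t[a] for a in important_keys)
        merged = {a: tuple(sorted(observed[v][a])) for a in nonimportant_keys}
        out.append({**t, **merged})
    return out

F = [
    {"sex": "F", "age": 35, "weight": 120, "height": "6'", "salary": "10%"},
    {"sex": "F", "age": 35, "weight": 120, "height": "5'", "salary": "25%"},
    {"sex": "F", "age": 30, "weight": 110, "height": "5'", "salary": "10%"},
]
combined = combine_nonimportant(F, ["sex", "age"], ["weight", "height"])
print(combined[0]["height"])  # ("5'", "6'")
```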
Note that in the above process, where the public attributes are partitioned into important public attributes and non-important public attributes, only the important public attributes are checked to determine if they might require camouflaging. The non-important public attributes are simply combined as set out in step 224. As mentioned above, the advertisers illustratively specify which of the public attributes A are important public attributes A' and which are non-important public attributes A". This is significant because the partitioning of the public attributes into important and non-important governs which public attributes are checked to determine if they require camouflaging and which public attributes are simply combined in step 224.
After executing steps 202-224, the processor 155 can store the tuples of the set F as the new demographics relational database. Illustratively, the processor 155 discards, i.e., does not execute queries against, the tuples of the set R.
Queries may then be executed against the new demographics relational database.
However, the advertisers must be cognizant of the existence of combined values and should refer to the combined public attribute values in formulating the profile queries.
Alternatively, instead of constructing a new demographics relational database, the processor 155 maintains a record in the memory 160 indicating the partitioning of the attribute values. Consider the above database discussed in connection with step 224. The following are examples of partitions resulting from steps 202-224:
(1) for sex=F, age=35, the tuples:
sex=F, age=35, <weight=120,110>, <height=6',5'>, salary=10%
sex=F, age=35, <weight=120,110>, <height=6',5'>, salary=15%
sex=F, age=35, <weight=120,110>, <height=6',5'>, salary=25%
(2) for sex=F, age=30, the tuples:
sex=F, age=30, <weight=120,110>, <height=6',5'>, salary=10%
sex=F, age=30, <weight=120,110>, <height=6',5'>, salary=15%
sex=F, age=30, <weight=120,110>, <height=6',5'>, salary=15%
sex=F, age=30, <weight=120,110>, <height=6',5'>, salary=25%
The processor 155 maintains a record containing indications of the partitions.
However, if this is done, the processor 155 must perform some post processing to ensure that no profile queries violate the partition. That is, queries which identify all tuples within a partition do not violate the partition. However, queries which attempt to identify only some of the tuples within a partition violate the partition. More formally stated, a query is said to violate a partition if the following occurs. Suppose there are two tuples, represented as database row vectors T1=<A1=v1,...,Ak=vk,...,An=vn> and T2=<A1=u1,...,Ak=uk,...,An=un>, wherein both tuples T1 and T2 are in the same partition. That is, for each important attribute A1,...,Ak, v1=u1, v2=u2,..., and vk=uk. A query violates the partition if it has criteria directed to both public and private attributes and if the query is satisfied by the tuple T1 but not by the tuple T2. To determine if a profile query violates the partition, the processor 155 can execute the profile query against the demographics relational database. The processor 155 can then compare the tuples identified by the profile query to the non-identified tuples of the demographics relational database to determine if a non-identified tuple T2 and an identified tuple T1 exist for which the corresponding attribute values are in the same partitions as described above.
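The partition-violation test can be sketched as follows. This is a hedged illustration with hypothetical names: a query violates a partition when it is satisfied by some, but not all, tuples sharing the same important-attribute vector.

```python
def violates_partition(query, tuples, important_keys):
    """Return True if the query splits any partition, i.e., matches some but
    not all tuples sharing the same important-attribute vector."""
    partitions = {}
    for t in tuples:
        partitions.setdefault(tuple(t[a] for a in important_keys), []).append(t)
    for members in partitions.values():
        results = {bool(query(t)) for t in members}
        if results == {True, False}:   # partially identifies a partition
            return True
    return False

tuples = [
    {"sex": "F", "age": 30, "salary": "10%"},
    {"sex": "F", "age": 30, "salary": "25%"},
]
# A query with criteria on both public and private attributes that identifies
# only one of the two tuples in the <sex=F, age=30> partition.
bad = lambda t: t["sex"] == "F" and t["salary"] == "10%"
print(violates_partition(bad, tuples, ["sex", "age"]))  # True
```

A query touching only public attributes (e.g., all tuples with sex=F) matches every member of the partition and therefore does not violate it.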
If a profile query violates the partition, the processor 155 can outright reject the profile query. Alternatively, the processor 155 modifies the set of identified tuples by also identifying, i.e., including, the tuples T2 which were not initially identified by the query, to remove the partition violation. However, if such modifications are performed, the processor 155 should notify the advertiser of the modification and its nature. Illustratively, the processor 155 achieves this by describing the contents of the partitions of the attributes specified in the advertiser's query. For example, the processor 155 can transmit a message regarding the modifications to the advertiser.
In short, a system and method are disclosed for protecting a database against deduction of confidential attribute values therein. A memory is provided for storing the database and a processor is provided for processing the database.
Using the processor, the database is electronically partitioned into public attributes, containing non-confidential attribute values, and private attributes, containing private attribute values. The processor is then used to electronically process the private attribute values to reduce any high correlation between public attribute values and private attribute values. Specifically, the processor can partition the database into safe tuples and unsafe tuples, such that each unsafe tuple is a member of a group:
(1) identified by a vector of attribute values (i.e., each tuple of the group has public attribute values matching the vector), and
(2) which group has a level of uncertainty as to at least one value of a private attribute that is less than a threshold level of uncertainty.
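The safe/unsafe split just described can be sketched as follows. The uncertainty measure here is simply the count of distinct private values within a group (one of the measures the claims suggest); attribute names and the single-private-attribute simplification are illustrative assumptions:

```python
from collections import defaultdict

def split_safe_unsafe(tuples, public_attrs, private_attr, threshold):
    """Group tuples by their public-attribute vector; a group is unsafe
    when it offers fewer than `threshold` distinct private values."""
    groups = defaultdict(list)
    for t in tuples:
        groups[tuple(t[a] for a in public_attrs)].append(t)

    safe, unsafe = [], []
    for members in groups.values():
        distinct = {m[private_attr] for m in members}
        (safe if len(distinct) >= threshold else unsafe).extend(members)
    return safe, unsafe
```

With a threshold of 2, a group of tuples that all share the same income value would be classified unsafe, since an attacker matching the group's public vector could deduce that income exactly.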
The processor can then selectively combine the public attribute values of the tuples to camouflage such tuples from deduction of their private attribute values beyond a threshold level of uncertainty, or remove such tuples from the database. This is achieved by:
(1) identifying all tuples containing particular attribute values for a selected public attribute, which particular values are contained by at least one tuple with a highly correlative public attribute value,
(2) identifying groups of tuples corresponding to, i.e., containing public attribute values that match, distinct vectors of values for the public attributes other than the selected public attribute,
(3) combining values of the selected public attribute of each group if there is at least a threshold level of uncertainty for each private attribute value in the group, and
(4) removing unsafe tuples for which no combination can be performed to camouflage the unsafe tuples.
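Steps (2) through (4) above can be sketched roughly as follows. This is one plausible reading rather than the patented method: tuples are dictionaries, the combined value is rendered as a joined set of the original values, and a group that still lacks enough distinct private values is dropped entirely, as in step (4):

```python
from collections import defaultdict

def camouflage(tuples, selected_attr, other_public, private_attr, threshold):
    """For each distinct vector of the other public attributes, merge the
    values of `selected_attr` into one combined value; keep the merged
    group only if it then offers enough distinct private values."""
    groups = defaultdict(list)
    for t in tuples:
        groups[tuple(t[a] for a in other_public)].append(t)

    kept = []
    for members in groups.values():
        if len({m[private_attr] for m in members}) >= threshold:
            # Substitute one representative combined value (here, the
            # sorted set of original values) for the selected attribute.
            combined = "|".join(sorted({str(m[selected_attr]) for m in members}))
            for m in members:
                kept.append({**m, selected_attr: combined})
        # else: step (4) -- remove tuples that cannot be camouflaged
    return kept
```

After camouflaging, a query on the selected attribute can no longer separate the merged tuples, so their private values remain uncertain to at least the threshold degree.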
Finally, the above discussion is intended to be merely illustrative of the invention. Numerous alternative embodiments may be devised by those having ordinary skill in the art without departing from the spirit and scope of the following claims.
Claims (47)
1. A method for protecting a database against deduction of confidential attribute values therein comprising the steps of:
using a processor, electronically partitioning said database into public attributes, containing public attribute values, and private attributes containing private attribute values, and using a processor, electronically processing said values to reduce any high correlation between public attribute values and private attribute values.
2. The method of claim 1 wherein said step of processing further comprises the step of:
using said processor, electronically partitioning tuples of said database into a safe set and an unsafe set.
3. The method of claim 2 wherein said step of processing further comprises the step of:
using said processor, electronically combining a plurality of public attribute values of tuples in said safe and unsafe sets.
4. The method of claim 2 wherein tuples are partitioned into said unsafe set if:
a vector of attribute values exists which identifies a group of tuples having said vector of attribute values for corresponding public attributes thereof wherein a level of uncertainty as to a value of at least one of said private attributes of said group is less than a threshold level of uncertainty.
5. The method of claim 4 wherein said level of uncertainty as to a value of a private attribute of said group is less than said threshold level of uncertainty if said group contains fewer distinct ones of said values of said one private attribute than a threshold number.
6. The method of claim 2 wherein said public attribute values are further partitioned into important public attribute values and non-important public attribute values and wherein said tuples are partitioned into said unsafe set if:
a vector of attribute values exists which identifies a group of tuples having said vector of attribute values for corresponding important public attributes thereof wherein a level of uncertainty as to a value of at least one of said private attributes of said group is less than a threshold level of uncertainty.
7. The method of claim 2 wherein said step of partitioning said tuples into safe and unsafe sets further comprises the steps of:
using said processor, electronically forming different possible vectors of public attribute values for said public attributes, and using said processor, for each group of tuples identified by said vectors of public attribute values, electronically partitioning said tuples of said group into said safe set if there is at least a threshold level of uncertainty for private attribute values in said group and partitioning said tuples of said group into said unsafe set otherwise.
8. The method of claim 7 wherein each possible vector is formed in said step of forming.
9. The method of claim 7 wherein said vectors contain only important public attribute values.
10. The method of claim 1 wherein said step of processing further comprises the step of:
using said processor, electronically combining a plurality of public attribute values of tuples so as to prevent deduction, beyond a threshold level of uncertainty, of private attribute values of said tuples.
11. The method of claim 10 wherein only important public attribute values are combined in said step of combining.
12. The method of claim 10 further comprising the steps of:
using said processor, electronically identifying all tuples containing particular values for a selected public attribute, which particular values are contained by at least one tuple with a highly correlative public attribute value, using said processor, electronically identifying distinct vectors having a particular value for each public attribute other than said selected public attribute, and electronically identifying a group of tuples for each one of said distinct vectors, wherein each tuple of said identified group has said distinct vector of values for public attributes thereof, other than said particular public attribute, and using said processor, electronically combining values of said selected public attribute of one of said groups corresponding to one of said distinct vectors if there is at least a threshold level of uncertainty for each private attribute value in said group corresponding to said distinct vector.
13. The method of claim 12 wherein each possible distinct vector is identified in said step of identifying.
14. The method of claim 12 wherein each at least one tuple with a highly correlative attribute value is a member of a group of tuples which satisfies:
a vector of attribute values exists which identifies said group of tuples having said vector of attribute values for corresponding public attributes thereof wherein a level of uncertainty as to a value of a private attribute of said group is less than a threshold level of uncertainty.
15. The method of claim 10 further comprising the step of:
using said processor, electronically substituting a representative value for said combined public attribute values.
16. The method of claim 10 wherein said public attributes are divided into important public attributes and non-important public attributes, wherein said step of combining is performed only on said important public attribute values and wherein said method further comprises the steps of:
using said processor, electronically identifying each distinct vector of important public attribute values, and using said processor, electronically combining each distinct vector of non-important public attribute values which occur with each of said distinct vectors of important public attribute values.
17. The method of claim 1 further comprising the steps of:
using said processor, electronically storing in a memory a database resulting from said steps of partitioning and processing, using said processor, electronically receiving a profile query from an advertiser, and using said processor, electronically executing said profile query against said database stored in said memory.
18. The method of claim 1 further comprising the steps of:
prior to said steps of partitioning and processing, using said processor, electronically storing a database in a memory, and after said steps of partitioning and processing, using said processor, electronically storing indications of modifications to said database stored in said memory which modifications result from said steps of partitioning and processing, using said processor, electronically receiving a profile query from an advertiser, using said processor, electronically executing said profile query against said database stored in said memory, and using said processor, electronically rejecting said query if said query violates a partition of said database, which partition is indicated by said indications stored in said memory.
19. The method of claim 18 wherein said query violates said partition if:
said indications indicate that said database includes first and second tuples in the same partition, and said profile query specifies criteria directed to both public and private attributes and said query is satisfied by said first tuple but not said second tuple.
20. The method of claim 1 further comprising the steps of:
prior to said steps of partitioning and processing, using said processor, electronically storing a database in a memory, and after said steps of partitioning and processing, using said processor, electronically storing indications of modifications to said database stored in said memory which modifications result from said steps of partitioning and processing, using said processor, electronically receiving a profile query from an advertiser, using said processor, electronically executing said profile query against said database stored in said memory, and using said processor, if said profile query violates a partition of said database, which partition is indicated by said indications stored in said memory, then identifying tuples of said database including those tuples which said query failed to identify and which violate said partition of said database.
21. The method of claim 1 further comprising the steps of:
after said steps of partitioning and processing, using said processor, electronically receiving a profile query from an advertiser, using said processor, electronically executing said profile query against said database, and using said processor, electronically transmitting an identifier corresponding to said profile query and aliases of tuples identified by said profile query to said advertiser.
22. The method of claim 21 further comprising the steps of:
using said processor, electronically constructing a table for translating said tuple aliases to network addresses of said tuples, and using said processor, electronically transmitting said identifier for said profile query and said table to a name translator station.
23. The method of claim 22 further comprising the steps of:
transmitting an advertisement, said tuple aliases and said profile query identifier from said advertiser to a communications network, receiving said advertisement, said tuple aliases and said profile query identifier from said communications network at said name translator station, at said name translator station, translating said tuple aliases into network addresses of said tuples using said table, and transmitting said advertisement to customers via said communications network using said network addresses of said tuples.
24. A system for protecting a database against deduction of confidential attribute values therein comprising:
a memory for storing said database, and a processor, for electronically partitioning said database into public attributes, containing public attribute values, and private attributes containing private attribute values, and for electronically processing said values to reduce any high correlation between public attribute values and private attribute values.
25. The system of claim 24 wherein said processor electronically partitions tuples of said database into a safe set and an unsafe set.
26. The system of claim 25 wherein said processor electronically combines a plurality of public attribute values of tuples in said safe and unsafe sets.
27. The system of claim 25 wherein said processor partitions said tuples into said unsafe set if:
a vector of attribute values exists which identifies a group of tuples having said vector of attribute values for corresponding public attributes thereof wherein a level of uncertainty as to a value of at least one of said private attributes of said group is less than a threshold level of uncertainty.
28. The system of claim 27 wherein said level of uncertainty as to a value of a private attribute of said group is less than said threshold level of uncertainty if said group contains fewer distinct ones of said values of said one private attribute than a threshold number.
29. The system of claim 25 wherein said processor further partitions said public attribute values into important public attribute values and non-important public attribute values and wherein said processor partitions said tuples into said unsafe set if:
a vector of attribute values exists which identifies a group of tuples having said vector of attribute values for corresponding important public attributes thereof wherein a level of uncertainty as to a value of at least one of said private attributes of said group is less than a threshold level of uncertainty.
30. The system of claim 25 wherein said processor electronically forms different possible vectors of public attribute values for said public attributes, and for each group of tuples identified by said vectors of public attribute values, electronically partitions said tuples of said group into said safe set if there is at least a threshold level of uncertainty for private attribute values in said group and partitions said tuples of said group into said unsafe set otherwise.
31. The system of claim 30 wherein said processor electronically forms each possible vector of public attribute values.
32. The system of claim 30 wherein said vectors contain only important public attribute values.
33. The system of claim 24 wherein said processor electronically combines a plurality of public attribute values of tuples so as to prevent deduction, beyond a threshold level of uncertainty, of private attribute values of said tuples.
34. The system of claim 33 wherein only important public attribute values are combined by said processor.
35. The system of claim 33 wherein said processor electronically identifies all tuples containing particular values for a selected public attribute, which particular values are contained by at least one tuple with a highly correlative public attribute value, wherein said processor electronically identifies distinct vectors having a particular value for each public attribute other than said selected public attribute, wherein said processor electronically identifies a group of tuples for each one of said distinct vectors, wherein each tuple of said identified group has said distinct vector of values for public attributes thereof, other than said particular public attribute, and wherein said processor electronically combines values of said selected public attribute of one of said groups corresponding to one of said distinct vectors if there is at least a threshold level of uncertainty for each private attribute value in said group corresponding to said distinct vector.
36. The system of claim 35 wherein said processor electronically identifies each distinct vector having a particular value for each public attribute other than said selected public attribute.
37. The system of claim 35 wherein each at least one tuple with a highly correlative attribute value is a member of a group of tuples which satisfies:
a vector of attribute values exists which identifies said group of tuples having said vector of attribute values for corresponding public attributes thereof wherein a level of uncertainty as to a value of a private attribute of said group is less than a threshold level of uncertainty.
38. The system of claim 33 wherein said processor electronically substitutes a representative value for said combined public attribute values.
39. The system of claim 33 wherein said processor partitions said public attributes into important public attributes and non-important public attributes, wherein said processor combines only said important public attribute values, wherein said processor electronically identifies each distinct vector of important public attribute values, and wherein said processor electronically combines each distinct vector of non-important public attribute values which occur with each of said distinct vectors of important public attribute values.
40. A communications system comprising:
a filter station comprising:
a memory for storing a database, a processor for electronically partitioning said database into public attributes, containing public attribute values, and private attributes containing private attribute values, and for electronically processing said values to reduce any high correlation between public attribute values and private attribute values, and an advertiser, for transmitting a profile query to said processor of said filter station.
41. The communications system of claim 40 wherein said processor electronically stores in a memory a database resulting from said partitioning and processing of said database, and wherein said processor electronically executes said profile query against said database stored in said memory.
42. The communications system of claim 40 wherein prior to said partitioning and processing of said database, said processor electronically stores a database in said memory, and after said steps of partitioning and processing, said processor electronically stores indications of modifications to said database stored in said memory which modifications result from said steps of partitioning and processing, said processor electronically executes said profile query against said database stored in said memory, and said processor electronically rejects said query if said query violates a partition of said database, which partition is indicated by said indications stored in said memory.
43. The communications system of claim 42 wherein said profile query violates said partition if:
said indications indicate that said database includes first and second tuples in the same partition, said profile query specifies criteria directed to both public and private attributes and said query is satisfied by said first tuple but not said second tuple.
44. The communications system of claim 40 wherein prior to said partitioning and processing of said database, said processor electronically stores a database in said memory, and after said steps of partitioning and processing, said processor electronically stores indications of modifications to said database stored in said memory which modifications result from said steps of partitioning and processing, said processor electronically executes said profile query against said database stored in said memory, and, if said profile query violates a partition of said database, which partition is indicated by said indications stored in said memory, then said processor electronically identifies tuples of said database including those tuples which said query failed to identify and which violate said partition of said database.
45. The communications system of claim 40 wherein after said processor electronically partitions and processes said database, said processor electronically executes said profile query against said database, and wherein said processor electronically transmits an identifier corresponding to said profile query and aliases of tuples identified by said profile query to said advertiser.
46. The communications system of claim 45 further comprising:
a name translator station, and wherein said processor electronically constructs a table for translating said tuple aliases to network addresses of said tuples, and electronically transmits said identifier for said profile query and said table to said name translator station.
47. The communications system of claim 46 further comprising:
a plurality of customers, each of said customers having a network address for delivery of advertisements, and a communications network interconnecting said advertiser, said processor of said filter station, said name translator station and said plurality of customers, wherein said advertiser transmits an advertisement, said tuple aliases and said profile query identifier to said communications network, and wherein said name translator station receives said advertisement, said tuple aliases and said profile query identifier from said communications network, translates said tuple aliases into network addresses of said tuples using said table, and transmits said advertisement to particular ones of said plurality of customers via said communications network using said network addresses of said tuples.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08/490,001 | 1995-06-12 | ||
US08/490,001 US5614927A (en) | 1995-01-13 | 1995-06-12 | Protecting confidential information in a database for enabling targeted advertising in a communications network |
PCT/US1996/009703 WO1996042059A1 (en) | 1995-06-12 | 1996-06-10 | Protecting confidential information in a database for enabling targeted advertising in a communications network |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2224457A1 CA2224457A1 (en) | 1996-12-27 |
CA2224457C true CA2224457C (en) | 2001-05-15 |
Family
ID=23946195
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA002224457A Expired - Lifetime CA2224457C (en) | 1995-06-12 | 1996-06-10 | Protecting confidential information in a database for enabling targeted advertising in a communications network |
Country Status (6)
Country | Link |
---|---|
US (1) | US5614927A (en) |
EP (1) | EP0834142A4 (en) |
AU (1) | AU697133B2 (en) |
CA (1) | CA2224457C (en) |
NZ (1) | NZ310293A (en) |
WO (1) | WO1996042059A1 (en) |
Families Citing this family (51)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5799301A (en) * | 1995-08-10 | 1998-08-25 | International Business Machines Corporation | Apparatus and method for performing adaptive similarity searching in a sequence database |
IL119444A (en) * | 1995-10-20 | 2001-10-31 | Yeda Res & Dev | Private information retrieval |
WO1997015885A1 (en) * | 1995-10-25 | 1997-05-01 | Open Market, Inc. | Managing transfers of information in a communications network |
US5805820A (en) * | 1996-07-15 | 1998-09-08 | At&T Corp. | Method and apparatus for restricting access to private information in domain name systems by redirecting query requests |
WO1998019255A1 (en) * | 1996-10-28 | 1998-05-07 | Pfu Limited | Information reception/distribution system |
US6125376A (en) * | 1997-04-10 | 2000-09-26 | At&T Corp | Method and apparatus for voice interaction over a network using parameterized interaction definitions |
IL120684A (en) | 1997-04-16 | 2009-08-03 | Handelman Doron | Entertainment system |
SE510438C2 (en) * | 1997-07-02 | 1999-05-25 | Telia Ab | Method and system for collecting and distributing information over the Internet |
US6119098A (en) * | 1997-10-14 | 2000-09-12 | Patrice D. Guyot | System and method for targeting and distributing advertisements over a distributed network |
JP4006796B2 (en) | 1997-11-17 | 2007-11-14 | 株式会社日立製作所 | Personal information management method and apparatus |
US6769019B2 (en) | 1997-12-10 | 2004-07-27 | Xavier Ferguson | Method of background downloading of information from a computer network |
US7328350B2 (en) * | 2001-03-29 | 2008-02-05 | Arcot Systems, Inc. | Method and apparatus for secure cryptographic key generation, certification and use |
US7454782B2 (en) * | 1997-12-23 | 2008-11-18 | Arcot Systems, Inc. | Method and system for camouflaging access-controlled data |
US20080034113A1 (en) | 1998-05-04 | 2008-02-07 | Frank Montero | Method of contextually determining missing components of an incomplete uniform resource locator |
US6133912A (en) * | 1998-05-04 | 2000-10-17 | Montero; Frank J. | Method of delivering information over a communication network |
US6360222B1 (en) * | 1998-05-06 | 2002-03-19 | Oracle Corporation | Method and system thereof for organizing and updating an information directory based on relationships between users |
US6327574B1 (en) | 1998-07-07 | 2001-12-04 | Encirq Corporation | Hierarchical models of consumer attributes for targeting content in a privacy-preserving manner |
EP1126392A3 (en) * | 1998-07-07 | 2001-10-17 | Encirq Corporation | Customization of electronic content based on consumer attributes |
US7246150B1 (en) | 1998-09-01 | 2007-07-17 | Bigfix, Inc. | Advice provided for offering highly targeted advice without compromising individual privacy |
US6256664B1 (en) | 1998-09-01 | 2001-07-03 | Bigfix, Inc. | Method and apparatus for computed relevance messaging |
US8914507B2 (en) * | 1998-09-01 | 2014-12-16 | International Business Machines Corporation | Advice provided for offering highly targeted advice without compromising individual privacy |
US7197534B2 (en) * | 1998-09-01 | 2007-03-27 | Big Fix, Inc. | Method and apparatus for inspecting the properties of a computer |
US6263362B1 (en) * | 1998-09-01 | 2001-07-17 | Bigfix, Inc. | Inspector for computed relevance messaging |
US6480850B1 (en) * | 1998-10-02 | 2002-11-12 | Ncr Corporation | System and method for managing data privacy in a database management system including a dependently connected privacy data mart |
US6275824B1 (en) * | 1998-10-02 | 2001-08-14 | Ncr Corporation | System and method for managing data privacy in a database management system |
US6253203B1 (en) * | 1998-10-02 | 2001-06-26 | Ncr Corporation | Privacy-enhanced database |
US7277919B1 (en) | 1999-03-19 | 2007-10-02 | Bigfix, Inc. | Relevance clause for computed relevance messaging |
US6202063B1 (en) * | 1999-05-28 | 2001-03-13 | Lucent Technologies Inc. | Methods and apparatus for generating and using safe constraint queries |
US20020026351A1 (en) * | 1999-06-30 | 2002-02-28 | Thomas E. Coleman | Method and system for delivery of targeted commercial messages |
US6732113B1 (en) | 1999-09-20 | 2004-05-04 | Verispan, L.L.C. | System and method for generating de-identified health care data |
AU7596500A (en) | 1999-09-20 | 2001-04-24 | Quintiles Transnational Corporation | System and method for analyzing de-identified health care data |
US9451310B2 (en) | 1999-09-21 | 2016-09-20 | Quantum Stream Inc. | Content distribution system and method |
CA2298194A1 (en) * | 2000-02-07 | 2001-08-07 | Profilium Inc. | Method and system for delivering and targeting advertisements over wireless networks |
US6618721B1 (en) * | 2000-04-25 | 2003-09-09 | Pharsight Corporation | Method and mechanism for data screening |
GB2366051B (en) * | 2000-05-02 | 2005-01-05 | Ibm | Method, system and program product for private data access or use based on related public data |
WO2002030037A1 (en) * | 2000-10-05 | 2002-04-11 | Ira Spector | Apparatus and method of uploading and downloading anonymous data to and from a central database by use of a key file |
WO2002042982A2 (en) * | 2000-11-27 | 2002-05-30 | Nextworth, Inc. | Anonymous transaction system |
US7603317B2 (en) * | 2001-06-19 | 2009-10-13 | International Business Machines Corporation | Using a privacy agreement framework to improve handling of personally identifiable information |
US7962962B2 (en) * | 2001-06-19 | 2011-06-14 | International Business Machines Corporation | Using an object model to improve handling of personally identifiable information |
US20020184530A1 (en) * | 2002-05-29 | 2002-12-05 | Ira Spector | Apparatus and method of uploading and downloading anonymous data to and from a central database by use of a key file |
US20060108880A1 (en) * | 2004-11-24 | 2006-05-25 | Lg Electronics Inc. | Linear compressor |
US20060293950A1 (en) * | 2005-06-28 | 2006-12-28 | Microsoft Corporation | Automatic ad placement |
US9355273B2 (en) | 2006-12-18 | 2016-05-31 | Bank Of America, N.A., As Collateral Agent | System and method for the protection and de-identification of health care data |
US8121896B1 (en) | 2007-01-05 | 2012-02-21 | Coolsoft, LLC | System and method for presenting advertisements |
US7860859B2 (en) * | 2007-06-01 | 2010-12-28 | Google Inc. | Determining search query statistical data for an advertising campaign based on user-selected criteria |
US20100198865A1 (en) * | 2009-01-30 | 2010-08-05 | Bering Media Incorporated | System and method for detecting, managing, and preventing location inference in advertising over a communications network |
US9141758B2 (en) | 2009-02-20 | 2015-09-22 | Ims Health Incorporated | System and method for encrypting provider identifiers on medical service claim transactions |
US9704203B2 (en) * | 2009-07-31 | 2017-07-11 | International Business Machines Corporation | Providing and managing privacy scores |
US20110238482A1 (en) * | 2010-03-29 | 2011-09-29 | Carney John S | Digital Profile System of Personal Attributes, Tendencies, Recommended Actions, and Historical Events with Privacy Preserving Controls |
EP2426891A1 (en) * | 2010-08-31 | 2012-03-07 | Alcatel Lucent | A system to profile and expose mobile subscriber data without compromising privacy |
US10938561B2 (en) * | 2018-06-21 | 2021-03-02 | International Business Machines Corporation | Tuple level security for streams processing |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4962533A (en) * | 1989-02-17 | 1990-10-09 | Texas Instrument Incorporated | Data protection for computer systems |
EP0390563A3 (en) * | 1989-03-31 | 1992-12-02 | Matsushita Electric Industrial Co., Ltd. | Fuzzy multi-stage inference apparatus |
US5481700A (en) * | 1991-09-27 | 1996-01-02 | The Mitre Corporation | Apparatus for design of a multilevel secure database management system based on a multilevel logic programming system |
US5355474A (en) * | 1991-09-27 | 1994-10-11 | Thuraisngham Bhavani M | System for multilevel secure database management using a knowledge base with release-based and other security constraints for query, response and update modification |
- 1995
  - 1995-06-12 US US08/490,001 patent/US5614927A/en not_active Expired - Lifetime
- 1996
  - 1996-06-10 NZ NZ310293A patent/NZ310293A/en unknown
  - 1996-06-10 EP EP96918385A patent/EP0834142A4/en not_active Withdrawn
  - 1996-06-10 WO PCT/US1996/009703 patent/WO1996042059A1/en not_active Application Discontinuation
  - 1996-06-10 AU AU61063/96A patent/AU697133B2/en not_active Ceased
  - 1996-06-10 CA CA002224457A patent/CA2224457C/en not_active Expired - Lifetime
Also Published As
Publication number | Publication date |
---|---|
WO1996042059A1 (en) | 1996-12-27 |
NZ310293A (en) | 1998-07-28 |
CA2224457A1 (en) | 1996-12-27 |
AU6106396A (en) | 1997-01-09 |
EP0834142A4 (en) | 1998-09-30 |
US5614927A (en) | 1997-03-25 |
AU697133B2 (en) | 1998-09-24 |
EP0834142A1 (en) | 1998-04-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CA2224457C (en) | Protecting confidential information in a database for enabling targeted advertising in a communications network | |
US11949676B2 (en) | Query analysis using a protective layer at the data source | |
Komishani et al. | PPTD: Preserving personalized privacy in trajectory data publishing by sensitive attribute generalization and trajectory local suppression | |
US20240007543A1 (en) | Anonymous eCommerce Behavior Tracking | |
US20110010563A1 (en) | Method and apparatus for anonymous data processing | |
US20160335455A1 (en) | Method and apparatus for managing access to a database | |
US20230370245A1 (en) | Privacy-Preserving Domain Name Services (DNS) | |
KR20150115778A (en) | Privacy against interference attack for large data | |
US11836243B2 (en) | Centralized applications credentials management | |
US20240031274A1 (en) | Techniques for in-band topology connections in a proxy | |
CN103685318B (en) | Data processing method and device for network safety prevention | |
US20230198960A1 (en) | Data masking | |
JP3270483B2 (en) | System and method for protecting sensitive information in a database and enabling targeted advertising in a communication network | |
MXPA97010080A (en) | Protection of confidential information in a database to activate announcements objectives in a communication network | |
Thomas et al. | Emendation of undesirable attack on multiparty data sharing with anonymous Id assignment using AIDA algorithm | |
CN117941321A (en) | Privacy preserving Domain Name Service (DNS) | |
Elmeleegy et al. | Preserving Privacy and Fairness in Peer Data Management Systems | |
Salleh et al. | A Technique of Data Privacy Preservation in Deploying Third Party Mining Tools over the Cloud Using SVD and LSA | |
Soltani et al. | Separating Indexes from Data: A Distributed Scheme for Secure Database Outsourcing. | |
Kiyomoto et al. | A First Step towards Privacy Leakage Diagnosis and Protection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
EEER | Examination request | ||
MKEX | Expiry |
Effective date: 20160610 |