WO2015103342A1

WO2015103342A1 - A dynamic mechanism for selling online advertisements with user feedback

Info

Publication number: WO2015103342A1
Application number: PCT/US2014/072904
Authority: WO
Inventors: Nadia FAWAZ; Fernando Jorge Silveira Filho; Vijay Sukumar KAMBLE
Original assignee: Thomson Licensing
Priority date: 2013-12-31
Filing date: 2014-12-30
Publication date: 2015-07-09

Abstract

A method to present advertisements to a viewer of a streaming video includes selecting a class of advertisement to be presented in the streaming video and arranging multiple advertisements of the selected class into an ordered list of decreasing value. The value is related to a bid value that advertisers place on any one advertisement for insertion into the video. The highest value advertisement is inserted first into the video and presented to a viewer. The viewer provides feedback concerning the inserted advertisement. If the viewer feedback is positive, then more advertisements from the selected class are shown. If the feedback is negative, then a new class is selected and the most valued advertisement of the new class is presented to the viewer.

Description

A DYNAMIC MECHANISM FOR SELLING ONLINE ADVERTISEMENTS WITH USER FEEDBACK

CROSS REFERENCES

This application claims priority to a U.S. Provisional Application Serial No. 61/922198, filed on December 31, 2013, which is herein incorporated by reference in its entirety.

FIELD

[0001] The present invention relates generally to feedback systems. More specifically, the invention relates to a system to insert advertisements into a video relying on user feedback.

BACKGROUND

[0002] Video consumption on the web (e.g., movies, TV shows) is on a steady rise due to the combination of over-the-top services¹ and Internet-enabled reception devices such as smart TVs and set-top boxes. Today, video already accounts for more than half of Internet traffic and it is projected to increase by a third before 2016. As TV audiences begin to move towards Internet video streaming, so do the billions of dollars spent every year in advertising. However, the annual spending in online video advertisement (ads) in the US is still an order of magnitude smaller than that of TV ads. These facts show that there is still a large potential for the growth of targeted advertising in online video platforms.

[0003] While there is a lot of research on mechanisms to sell display ad slots in sponsored search, the online video setting has received very little, if any, attention. This video setting presents some novel design problems which mechanisms employed in sponsored search do not fully account for. Overall, placing ads in online video content combines benefits from both TV and sponsored search ads. On one hand, video ad opportunities are like commercial breaks, effectively diverting the viewer's attention from the main content (e.g., a TV show or movie) to the advertisement. This makes video ad slots more valuable than traditional sponsored search slots. On the other hand, the delivery of online video ads can be easily tracked and profiled, enabling per-user targeting, which is common in online ad platforms but not possible with broadcast TV.

¹ Over-the-top (OTT) services distribute content to an end user without being associated to the user's Internet Service Provider (ISP). Examples, as of this writing, include Hulu, Netflix, and Amazon Instant Video.

[0004] The potential to exploit such fine-grained ad targeting depends on the ability to determine whether an ad is relevant to a user. Current online video services explicitly ask users about the relevance of ads shown in commercial interruptions². In turn, users provide binary feedback (i.e., the ad is either relevant or not) which is used by the service to learn about user preferences. While the service tries to exploit that knowledge to decide what ads to show next in order to optimize the user's overall satisfaction, it also seeks to collect revenue from competing advertisers at the same time. It is desirable to design generic mechanisms with provably good properties which synergistically perform both of these two tasks.

SUMMARY

[0005] This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. The Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

[0006] Inspired by the problem of designing market mechanisms for selling advertisement opportunities in video hosting websites, the inventors developed a generic dynamic ad allocation mechanism applicable in such markets. One feature is that a user is successively presented with ads during a play session and the service elicits binary feedback about the user experience after showing each ad. The mechanism dynamically learns the preferences of the user and shows the most relevant ads, while at the same time generating revenue from competing advertisers who are strategic. This design exploits the assumption that ads can be divided into categories (e.g., cars, shoes, etc.) and that user preferences are sensitive to categories rather than individual ads. The mechanism has a two-layered structure: the lower layer adaptively selects categories of advertisers to optimize the relevance feedback from the user, and the upper layer is a truthful auction mechanism that allots ad opportunities to advertisers in the categories chosen by the lower layer, charging them contingent on relevance. The relevance optimization problem in the lower layer was analyzed, and certain properties of the optimal policy were proven. The instance where the user dynamics in the session are assumed to be Markovian is considered, such that a user either stays to see each additional ad with a fixed probability β, independent of the past, or leaves the session. Using these properties of the optimal policy, it is proven that in this case an appropriately defined greedy algorithm achieves a constant factor of the optimal payoff. Truthful dynamic auctions are designed for the upper layer, motivated by the classical goals of efficiency and revenue optimality.

² As of this writing, examples include Hulu and YouTube, each with millions of users.

[0007] Aspects of the invention include a general framework of a two layered mechanism, which can be used to dynamically allocate advertisements when a binary feedback can be obtained from the user after each allocation. The mechanism is structured around the idea that user preferences are sensitive to classes of ads rather than individual ads. One technical contribution is the analysis of the optimal dynamic allocation problem in the lower layer, which captures any such scenario where a designer needs to adaptively make a sequence of relevant recommendations to a user by using his feedback.

[0008] Additional features and advantages of the invention will be made apparent from the following detailed description of illustrative embodiments which proceeds with reference to the accompanying figures.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] The foregoing summary of the invention, as well as the following detailed description of illustrative embodiments, is better understood when read in conjunction with the accompanying drawings, which are included by way of example, and not by way of limitation with regard to the claimed invention.

Figure 1 illustrates a system diagram that serves as an environment for the invention;

Figure 2 illustrates a flow of activities within the environment;

Figure 3 illustrates an example media playback device block diagram;

Figure 4 illustrates an example web publisher block diagram; and

Figure 5 illustrates an example information dependency diagram.

DETAILED DISCUSSION OF THE EMBODIMENTS

[0010] In the following description of various illustrative embodiments, reference is made to the accompanying drawings, which form a part thereof, and in which is shown, by way of illustration, various embodiments wherein the invention may be practiced. It is to be understood that other embodiments may be utilized and structural and functional modification may be made without departing from the scope of the present invention.

[0011] A model is considered which assumes that ads can be divided into a finite number of categories (e.g., ads on cars, travel, apparel, or other products or services) and that a user's preference profile is a binary vector where each dimension represents an ad category and its value determines whether ads in the corresponding category are relevant for the user. This preference profile is not known when the user arrives, but the service is assumed to know a probability distribution over possible profiles, which are called 'types' of a user. A probabilistic model is assumed for the number of ad opportunities that are available during a viewing session. The challenge is then to find an adaptive mechanism that fulfils two goals. First, the mechanism should learn the user profile based on the user feedback obtained after each ad, in order to maximize the number of relevant ads shown during a session. This algorithm needs to take into account the limited tolerance that users have to watching repetitions of the same advertisement (ad). Studies have shown that users are most likely to skip ads that they have seen repeatedly. Second, this mechanism should also prescribe how to assign these ad slots to competing strategic advertisers given their willingness to pay, and how to charge them to generate revenue.

[0012] One aspect of the invention is the design and analysis of a generic dynamic ad allocation mechanism which achieves these objectives. The design exploits the observation that the preferences of the users are sensitive to categories of ads rather than individual ads. Using this, the inventors have developed a mechanism with a two layered alternating structure. The lower layer, concerned with relevance optimization, adaptively selects categories of advertisers to maximize the relevance feedback from the user. The upper layer is a truthful auction mechanism that dynamically allots advertisement opportunities to competing advertisers in categories prescribed by the lower layer, charging them contingent on relevance.

[0013] First defined and analyzed is the learning problem solved by the lower layer to maximize ad relevance based on user feedback. This is a stochastic dynamic programming problem on a large state space which is computationally hard to solve. However, various properties of the structure of the optimal allocation rule are derived. The specific case where the dynamics of the user in a session are Markovian, resulting in a geometric distribution on the number of ad opportunities in a viewing session, is considered. For this case, it is shown, using the properties proven for the optimal policy, that an appropriately defined Greedy heuristic algorithm achieves a constant factor of the optimal payoff. Next, the problem of designing truthful auction mechanisms for the upper layer is considered. Here, the transition from the lower layer to the upper layer is discussed, where it is shown that the two layers can be treated separately for the sake of analysis, a fact which depends crucially on a property proved for the optimal allocation rule in the lower layer. Truthful auctions are designed inspired by the two classical goals of efficiency and revenue optimality. An implementation of the efficient Vickrey-Clarke-Groves (VCG) mechanism is used, which is called a 'drop-out price' (DOP) auction. The optimal mechanism is characterized. In the case where the valuations of the advertisers are i.i.d., it can be implemented as a DOP auction with a reserve price.

[0014] MAB Learning. The allocation problem in the lower layer of the mechanism is related to the class of multi-armed bandit (MAB) problems with correlated arms that have been studied in recent years. Much of the prior work focuses on a specific correlation structure in which the rewards of the different arms are a linear function of a hidden global state vector. The written art considers the objective of minimizing the rate of growth of regret due to incomplete information. In the present arrangement, the rewards are binary, the correlation structure is completely general, and there is a natural discounting on the sequence of rewards. Hence, solving this problem in an exact manner requires stochastic dynamic programming over a large state space. Further, because of the correlation between arms, index-based optimal policies may not exist.

[0015] Static Ad Auctions. The class of online ad auctions closest to the upper layer setting is position auctions, which are used in the context of sponsored search to sell advertisement space, called ad slots. In the static setting, the publisher is concerned with allocating at once a block of n impressions to advertisers, who are most commonly charged on a per-click basis. Ad slots differ in the attention they can draw from users and in their click-through rates (CTR), which is the probability of a user clicking on an ad displayed in that position. Usually, slots located higher on a page attract more attention and more clicks, and hence they can be ranked from top to bottom. A popular mechanism to sell these slots in practice is the Generalized Second Price auction (GSP). GSP ranks all advertisers in the order of their bids, which signal the maximum amount they are willing to pay per click/impression, and allots the ad slots in the order of their ranking: the highest bidder is allotted the highest ranked slot. The GSP payment rule is that each advertiser who is allotted a position pays, per click/impression, a price equal to the bid of the advertiser ranked immediately below. The key difference in the current setting, inspired from video, is that the aspect of dynamic tracking and online feedback can be used to learn the preferences of users and adaptively decide the order in which the ads are shown, whereas in position auctions this allocation is done at once a priori.
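The GSP ranking and payment rule described above can be sketched in a few lines. This is only an illustration of the textbook rule, not code from the present arrangement; the function name `gsp_auction`, the advertiser labels, and the choice of a zero price when no lower bidder exists (i.e., no reserve price) are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def gsp_auction(bids: Dict[str, float], num_slots: int) -> List[Tuple[str, float]]:
    """Return (advertiser, price) for each slot, best slot first.

    Hypothetical helper: advertisers are ranked by bid, and each winner
    pays the bid of the advertiser ranked immediately below.
    """
    # Rank by descending bid; ties are broken by name to stay deterministic.
    ranked = sorted(bids.items(), key=lambda kv: (-kv[1], kv[0]))
    results = []
    for pos in range(min(num_slots, len(ranked))):
        name = ranked[pos][0]
        # Assumption: with no lower-ranked bidder (and no reserve), the price is 0.
        price = ranked[pos + 1][1] if pos + 1 < len(ranked) else 0.0
        results.append((name, price))
    return results


if __name__ == "__main__":
    print(gsp_auction({"a": 5.0, "b": 3.0, "c": 1.0}, num_slots=2))
```

In this example advertiser "a" wins the top slot and pays the bid of "b", while "b" wins the second slot and pays the bid of "c".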

[0016] Another mechanism used in sponsored search is the Vickrey-Clarke-Groves (VCG) mechanism, which is a generic mechanism that maximizes the social welfare. In this mechanism the publisher, upon receiving the bids, makes an allocation which maximizes the social welfare and makes each advertiser pay the 'externality' (the loss in utility) he imposes on other advertisers at the social optimum. Since the publisher cannot anticipate which ad will be clicked and has to make the allocation in a single block, the average social welfare is instead optimized using the click-through rates for the different slots. This leads to the same allocation of ads as the GSP, but the externality computations and hence the payments depend on the CTRs. In the current setting, inspired from video, since a user and his feedback can be tracked dynamically, the publisher has complete control over the order in which the user sees the ads. Hence the externality computations required in the implementation of VCG can be done per session, thus obviating the need for average metrics like CTRs in the payment computations.
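To make the CTR-based externality computation concrete, the following sketch computes the expected VCG payment of each slot winner in a static position auction. It illustrates the classical sponsored-search setting that this paragraph contrasts with, not the per-session mechanism of the present arrangement. The function name `vcg_position_payments` and the input layout (one per-click bid per advertiser, one CTR per slot) are assumptions for the example.

```python
from typing import List


def vcg_position_payments(bids: List[float], ctrs: List[float]) -> List[float]:
    """Expected VCG payment for the winner of each filled slot, best slot first.

    The winner of slot i pays the welfare loss imposed on lower-ranked
    bidders, each of whom would move up one slot if that winner were absent.
    """
    b = sorted(bids, reverse=True)            # bids, highest first
    a = sorted(ctrs, reverse=True) + [0.0]    # slot CTRs, plus 0 past the last slot
    num_slots = len(ctrs)
    filled = min(len(b), num_slots)
    payments = []
    for i in range(filled):
        total = 0.0
        for k in range(i, num_slots):
            # Bidder ranked k+1 would gain (a[k] - a[k+1]) expected clicks.
            next_bid = b[k + 1] if k + 1 < len(b) else 0.0
            total += (a[k] - a[k + 1]) * next_bid
        payments.append(total)
    return payments


if __name__ == "__main__":
    print(vcg_position_payments([10.0, 4.0, 2.0], [0.5, 0.25]))
```

With bids 10, 4, 2 and CTRs 0.5 and 0.25, the top winner pays 0.25·4 + 0.25·2 = 1.5 in expectation and the second winner pays 0.25·2 = 0.5, which shows how the payments depend on average CTRs rather than on what happens in an individual session.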

[0017] Dynamic MAB mechanisms. In the dynamic setting of sponsored search, the closest work to the current arrangement has looked at the problem of designing truthful mechanisms which combine the two aspects of learning the CTRs as well as finding good allocation and pricing rules (maximizing revenue or social welfare), which come under the general umbrella term of 'MAB mechanisms'. The present arrangement can be seen as a contribution to this body of literature, although in a different setting.

[0018] Figure 1 depicts an environment 100 in which the invention may be practiced. The system environment includes a router/gateway 106 connected via cable or wireless connection 111 to an internet service provider (ISP) 110 enabling access via connections 113 to a network 120. The router can be a standard router that is compatible with internet service provider equipment for routing internet protocol packets or a suitable gateway or modem. The router may be either a public or a private router. The router or gateway 106 can provide access to the network 120 via wired or wireless links to multiple user equipment such as user playback device 104. Although only one user playback or content consumption device is shown, many such devices can be supported. Each properly configured playback device 104 can operate independently to access a media service.

[0019] Also included in the environment of Figure 1 are a publisher 118 and multiple advertisers 102a, 102b, and 102c. The publisher is a web-based entity that provides a download of multimedia content, such as a video, to the user media player 104 via the ISP 110 and gateway 106. The publisher 118 can also provide streaming video to the user's media playback device 104. In this context, the publisher can inject advertisements into the video stream so that the user can view the advertisements. Advertisers 102a, 102b, and 102c have an interest in placing their product or service advertisements in the video stream of the media playback device 104 so that the user can be afforded the opportunity to see the advertiser's product or service. Although only three advertisers are shown, many may be present.

[0020] In accordance with aspects of the current arrangement, web-based publisher 118 communicates with advertisers and users via connection 119 to the network 120. Publisher 118 communicates with advertisers 102a, 102b, and 102c via connections 117a, 117b, and 117c respectively. Publisher 118 requests monetary bids from advertisers 102 to place advertisements (ads) into the user's multimedia video stream for playback on the user's playback device 104. Advertisers respond to the request for bids from the publisher and, according to aspects of the invention, the publisher inserts the advertisements into the video stream of the user playback device 104. The ads may be available to the publisher via a database 121 of advertisements from the advertisers.

[0021] After viewing an ad, the user has an opportunity to rate the ad by providing binary feedback as to whether the ad is relevant to the user or not. According to aspects of the invention, the selection of advertisements for insertion is calculated according to the bid prices offered by the advertisers and the user's feedback response. In order for the advertiser to gain maximum potential for a sale and in order for a publisher to gain maximum profit from successful ad placements, a mathematical basis for ad placement and monetization is described below.

[0022] User preference model. Consider the setting of a publisher hosting multiple video files on a website and L advertisers, each having a single ad, interested in showing their ads during the play session of a particular user. These are video ads which are displayed by interrupting the video. A play session begins when the user enters the website and starts watching a video and ends when he leaves the website after watching possibly several videos. The advertisers are divided into categories, each category representing a set of similar advertisers, e.g., selling similar products such as shoes or car insurance. Let $j = 1, \ldots, H$ denote these category labels. Each category j has $L_j$ advertisers. A given user considers some set of categories to be relevant to him. A priori this set is not known, and an important feature of the ad allocation problem is to learn these user preferences. For this purpose, it is assumed that the ad allocation mechanism can elicit explicit feedback about the relevance of an ad after it is shown. This feedback is obtained as an answer to an explicit question, is assumed to be binary, and takes value 1 when the ad is relevant and 0 when the ad is not relevant.

[0023] The uncertainty in the preferences of the user is captured by assuming that the user is one of N possible types and the actual type of the user is a latent random variable which is not observed at the beginning of the session. Let $X \in \{1, \ldots, N\}$ denote this random variable. Let $P_X$ be the corresponding probability distribution. It is assumed that the publisher only knows the distribution $P_X$ on the types. This distribution is assumed to be estimated from the behavior of different users in past sessions. For each user type i and for each advertiser category j, let $q_j^i \in \{0,1\}$ denote the binary relevance feedback of the user of type i to that category. The type of the user is not known, and so for each category j, introduce a random variable $Y_j \in \{0,1\}$ which represents the binary feedback of a user for any advertisement in that category. Note that it is implicitly assumed here that the relevance of an ad depends only on its category. The probability distribution of $Y_j$ is thus induced by $P_X$ and the feedback values $\{q_j^i\}$ as

$$P(Y_j = 1) = \sum_{i=1}^{N} q_j^i \, P_X(i).$$
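As a small numerical illustration of this formula, the sketch below computes the marginal relevance probability of every category from a relevance matrix and a prior over types. The matrix and prior used in the example are hypothetical values chosen for the illustration (they are not Table 1), and the function name `category_relevance_probs` is likewise an assumption.

```python
from typing import List


def category_relevance_probs(Q: List[List[int]], prior: List[float]) -> List[float]:
    """P(Y_j = 1) for each category j.

    Q[i][j] is 1 if type i finds category j relevant, else 0;
    prior[i] is the probability of type i.
    """
    num_categories = len(Q[0])
    return [
        sum(p * row[j] for row, p in zip(Q, prior))
        for j in range(num_categories)
    ]


if __name__ == "__main__":
    # Hypothetical example: 3 user types (rows), 3 ad categories (columns).
    Q_example = [
        [1, 0, 1],
        [0, 1, 1],
        [1, 1, 0],
    ]
    prior_example = [0.5, 0.25, 0.25]
    print(category_relevance_probs(Q_example, prior_example))
```

Each output entry is simply the total prior mass of the types whose row has a 1 in that column.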

[0024] It is convenient to associate each user type i with an H-length binary vector of the $\{q_j^i\}$ values for different categories. Hence, define an $N \times H$ "relevance" matrix $Q = \{q_j^i\}$, whose rows represent user types and whose columns represent ad categories³. Table 1 is an example of a relevance matrix with six types of users labeled 1 to 6 and five ad categories labeled A to E. Each category has the specified number of advertisers.

Table 1. A sample relevance matrix

Type, probability    A  B  C  D  E
Type 3, p=0.1        1  1  0  0  1
Type 4, p=0.3        1  1  1  0  1
Type 5, p=0A         0  1  0  1  0
Type 6, p=0A         0  0  1  1  0

³ This type space is quite general. If a user has a joint distribution over finding different categories relevant, such a user can be decomposed into a probability distribution over realizations of binary relevance vectors.

[0025] The term display rules is defined as a set of fixed rules which decide when an ad can be shown to a user during the session. For each possible play dynamics of a user, a display rule dynamically assigns ads to the user at different time instances in the play session. No bound is placed on the number of ad opportunities that may come up in a play session. However, it is assumed that an advertisement cannot be shown more than once to the same user during the session, a constraint that will be called the matching constraint. Hence the maximum number of ads that can be shown is restricted to L, the number of advertisers. Most of the following analysis may be extended appropriately if there is instead any finite bound on the number of repetitions of an advertisement (ad).

[0026] Since the intended play dynamics of a user during a play session are not known, it is not known in advance how many opportunities will be available. The number of display opportunities is modeled as a random variable $C \in \{1, 2, \ldots\}$ with a probability distribution $P_C$. It is further assumed that the random variable C is independent of the user type X.

[0027] Once a user enters the website, at each ad opportunity the publisher has to decide which ad should be shown to the user under the matching constraint. The publisher can elicit feedback from the user after every ad. However, since the feedback for an ad is the same for every ad in the category, eliciting feedback again for every such ad is vacuous. Thus one can assume that the feedback is obtained only when the ad allotted belongs to a category which has not been shown before. The present approach relies on a dynamic mechanism which iterates between two layers, which are discussed below. More precisely, the lower layer exploits user feedback to allocate ad opportunities to ad categories, in order to maximize relevance. On top of the lower layer, the upper layer runs a truthful ad auction mechanism to allocate each ad opportunity to an advertiser from the categories determined by the lower layer.

[0028] The lower layer: Dynamic relevance maximization. One objective of the present mechanism is to maximize the expected number of relevant ads shown to a user in the session under the constraint that no advertisement is shown more than once. This forms the lower layer of the mechanism. The objective is defined formally as follows. A policy $\psi$ for the publisher is the sequence of maps $\psi = \{\psi_1, \ldots, \psi_L\}$, where each map $\psi_t : H_t \to A_t$ is a mapping from the set of possible observations of user feedback until time t, denoted by $H_t$, to the set of possible actions $A_t$, which is the set of choices of advertisers, such that each of these maps obeys the matching constraint, that is, an advertisement cannot be shown more than once to the same user during the session. Let $\Psi$ be the set of all feasible policies. The objective of the publisher is to find a policy which maximizes the expected number of relevant ads shown in a session. Let $U_t$ be the random variable denoting the advertiser chosen at time t under a policy $\psi$ and, with some abuse of notation, let $g(U_t)$ be the category of the advertiser. Then the objective of the publisher is:

$$\max_{\psi \in \Psi} \; \mathbb{E}\left[\sum_{t=1}^{C} Y_{g(U_t)}\right] \qquad \text{Equation (1)}$$

[0029] Solving the relevance optimization problem requires stochastic dynamic programming over a large state space, which could be computationally prohibitive. Instead, a few key structural properties of the optimal allocation policy, which solves equation (1), are proven. These properties form the basis of most of the later results.

[0030] Properties of the optimal allocation policy. If there is an ad which is surely relevant conditional on past observations, then it should be shown immediately.

[0031] Lemma 1 : The optimal dynamic ad allocation policy has the property that if at any opportunity there exists a set of advertisers who will generate a positive feedback with probability 1 conditional on the past observations, then they are all scheduled to be allotted immediately in any order. A proof is given in Appendix 1. This property implies that if a positive feedback is received for an ad belonging to a particular category, then all advertisers of that category are scheduled to be allotted in the immediately following opportunities.

[0032] To describe the next property, a few ideas are first defined. In the dynamic allocation of ads to the opportunities, an opportunity t is termed an experimentation opportunity if, conditional on information obtained until time $t-1$, there is not a single category j such that $Y_j = 1$ with probability 1. If such a category existed, the previous lemma would be used to exhaust all the advertisers in that category. But since there is no such category, an experimentation opportunity brings one to the non-trivial problem of deciding which category to present to the user next. Thus all the non-trivial decisions in the optimal dynamic allocation rule are taken at the experimentation opportunities. Note that after observing the feedback from the allocation made at an experimentation opportunity, the set of possible user types, i.e., the set of values i of the random variable X such that $P(X = i \mid H_t) > 0$, reduces by at least 1. Let $S(t)$ be the set of probable user types conditional on the information obtained until opportunity $t-1$. For each category j which has not been exhausted until time t, let $M_j(t) = \{i \in S(t) : q_j^i = 1\}$, which is the set of user types in $S(t)$ that find category j relevant.
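The shrinking set of probable types after each feedback is a plain Bayesian update. A minimal sketch follows, assuming the relevance matrix is stored as a dictionary from type label to a row of 0/1 values; the function name `update_belief` and the example matrix are hypothetical and only illustrate the update.

```python
from typing import Dict, List


def update_belief(
    Q: Dict[int, List[int]],
    belief: Dict[int, float],
    category: int,
    feedback: int,
) -> Dict[int, float]:
    """Posterior over user types after observing binary feedback for a category.

    Types whose relevance value for the category disagrees with the observed
    feedback are eliminated; the remaining probabilities are renormalized.
    """
    survivors = {
        i: p for i, p in belief.items()
        if p > 0 and Q[i][category] == feedback
    }
    total = sum(survivors.values())
    if total == 0:
        raise ValueError("feedback has zero probability under the current belief")
    return {i: p / total for i, p in survivors.items()}


if __name__ == "__main__":
    # Hypothetical 3-type, 2-category example.
    Q_example = {1: [1, 0], 2: [0, 1], 3: [1, 1]}
    prior = {1: 0.5, 2: 0.25, 3: 0.25}
    print(update_belief(Q_example, prior, category=0, feedback=1))
```

The keys of the returned dictionary play the role of $S(t)$: at an experimentation opportunity either outcome removes at least one type, because no category is relevant with probability 1.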

[0033] Definition 1: A category j is said to dominate category j' at opportunity t if $M_{j'}(t) \subset M_j(t)$. The categories which are not dominated by any other category will be called non-dominated categories. For example, in Table 1, A, B, C and D are the only non-dominated categories, since A dominates E. Then, one can show that:

[0034] Lemma 2. In the optimal allocation rule, at any experimentation opportunity, the ad is allotted to an advertiser in a category which is not dominated by any other category. The proof uses an interchange argument and is deferred to Appendix 2.
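Definition 1 and Lemma 2 suggest a simple filtering step before each experimentation opportunity. The sketch below computes the sets $M_j$ and keeps only the non-dominated categories. It assumes that domination means a proper subset (so that categories with identical type sets do not dominate each other), and it uses a hypothetical relevance matrix rather than Table 1; the function names are illustrative.

```python
from typing import Dict, List, Set


def relevant_type_sets(
    Q: Dict[int, Dict[str, int]], live_types: Set[int]
) -> Dict[str, Set[int]]:
    """M_j: the live types that find category j relevant."""
    categories = next(iter(Q.values())).keys()
    return {j: {i for i in live_types if Q[i][j] == 1} for j in categories}


def non_dominated(M: Dict[str, Set[int]]) -> List[str]:
    """Categories whose type set is not a proper subset of another category's."""
    return sorted(j for j in M if not any(M[j] < M[k] for k in M))


if __name__ == "__main__":
    # Hypothetical example with three types and three categories.
    Q_example = {
        1: {"A": 1, "B": 0, "E": 1},
        2: {"A": 1, "B": 1, "E": 0},
        3: {"A": 0, "B": 1, "E": 0},
    }
    M_example = relevant_type_sets(Q_example, {1, 2, 3})
    print(non_dominated(M_example))
```

Here category E is relevant only to type 1, a proper subset of the types that find A relevant, so E is dominated and would not be chosen at an experimentation opportunity.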

[0035] Approximate optimality of a Greedy algorithm for Markovian user dynamics. In this section it is assumed that the number of display opportunities C is a geometric random variable with distribution $P_C(k) = \beta^{k-1}(1-\beta)$. In other words, a Markovian model is assumed for the user dynamics, such that a user stays for each additional ad opportunity with probability β independent of the past. Here it is assumed that at least one opportunity is always available, for example by showing an ad at the beginning of a session. The optimization problem (1) takes the following form:

$$\max_{\psi \in \Psi} \; \mathbb{E}\left[\sum_{t=1}^{\infty} \beta^{t-1}\, Y_{g(U_t)}\right] \qquad \text{Equation (2)}$$
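The step from objective (1) to equation (2) can be spelled out as follows. This is a sketch that assumes objective (1) is the expected number of relevant ads summed over the C opportunities of the session, and that the event $\{C \ge t\}$ is independent of the feedback collected at opportunity t (the text states only that C is independent of the user type X).

```latex
\begin{align*}
P(C \ge t) &= \sum_{k=t}^{\infty} \beta^{k-1}(1-\beta) = \beta^{t-1}, \\
\mathbb{E}\Big[\sum_{t=1}^{C} Y_{g(U_t)}\Big]
  &= \mathbb{E}\Big[\sum_{t=1}^{\infty} \mathbf{1}\{C \ge t\}\, Y_{g(U_t)}\Big]
   = \sum_{t=1}^{\infty} \beta^{t-1}\, \mathbb{E}\big[Y_{g(U_t)}\big].
\end{align*}
```

In this form β acts as a discount factor: a relevant ad at opportunity t counts only if the user is still present, which happens with probability $\beta^{t-1}$.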

[0036] In this case, it is shown that an appropriately defined greedy policy achieves a constant factor of the optimal payoff. First, a few more definitions are presented.

[0037] Non-dominated equivalence classes: Let S be the set of probable (with positive probability) types at the first ad opportunity, with a probability distribution $P_X$, and let E be the set of ad categories available. For each $j \in E$, let $M_j = \{i \in S : q_j^i = 1\}$ be the set of user types which find category j relevant. As described earlier, category j dominates category j' if $M_{j'}(t) \subset M_j(t)$. Let J be the set of non-dominated ad categories. Note that $\bigcup_{j \in J} M_j = S$. According to Lemma 2, the immediate ad category shown in the optimal allocation is one in J.

[0038] Now let U be a class of categories such that $M_j = M_U$ for all $j \in U$. Thus U is the class of categories found relevant by exactly the same set of types. U will be called a non-dominated equivalence class of categories, and $M_U$ denotes the set of types which find the class U relevant. Singleton classes are allowed in the definition, so suppose there are K such non-dominated equivalence classes $\{U_1, \ldots, U_K\}$ which partition the set of non-dominated categories in the relevance matrix. If furthermore the sets of types $\{M_{U_1}, \ldots, M_{U_K}\}$ are mutually disjoint, then the set of non-dominated equivalence classes partitions the type space. This means that the relevance matrix is of the form of a permutation matrix of K smaller block matrices, with each block matrix corresponding to an equivalent non-dominated class. Such a small block is composed of columns of all 1s, one for each category in the class, and columns corresponding to the categories that the class dominates. Let $L^k = \sum_{j \in U_k} L_j$ and define $L^{\min} = \min_k L^k$, which is the minimum number of advertisers that can be shown in any non-dominated equivalence class of ad categories in the relevance matrix at the first ad opportunity.
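The grouping of non-dominated categories into equivalence classes, together with the per-class advertiser counts, can be sketched as follows. The function names, the tuple layout of a class, and the example data are assumptions made for illustration; domination is again taken to mean a proper subset of types.

```python
from typing import Dict, List, Set, Tuple

# A class is (category names, set of types that find it relevant, number of advertisers).
EquivClass = Tuple[List[str], Set[int], int]


def equivalence_classes(
    M: Dict[str, Set[int]], num_ads: Dict[str, int]
) -> List[EquivClass]:
    """Group non-dominated categories that share exactly the same type set."""
    non_dominated = [j for j in sorted(M) if not any(M[j] < M[k] for k in M)]
    grouped: Dict[frozenset, List[str]] = {}
    for j in non_dominated:
        grouped.setdefault(frozenset(M[j]), []).append(j)
    return [
        (cats, set(types), sum(num_ads[c] for c in cats))
        for types, cats in grouped.items()
    ]


def min_class_size(classes: List[EquivClass]) -> int:
    """Smallest number of advertisers over all non-dominated equivalence classes."""
    return min(size for _, _, size in classes)


if __name__ == "__main__":
    # Hypothetical type sets and advertiser counts per category.
    M_example = {"A": {1, 2}, "B": {1, 2}, "C": {3}, "D": {1}}
    ads_example = {"A": 2, "B": 1, "C": 4, "D": 5}
    classes = equivalence_classes(M_example, ads_example)
    print(classes, min_class_size(classes))
```

In the example, A and B form one class with three advertisers in total, C forms a singleton class with four, and D is dominated by A and B and therefore excluded.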

[0039] As ads are presented and as the relevance matrix is recomputed after each feedback, one loses non-dominated categories or new categories may become non-dominated. Thus the set of non-dominated equivalence classes will change. But categories in an equivalence class in the relevance matrix at the first ad opportunity will continue to remain in the same class as long as they are non-dominated and they have not been presented. Suppose that a class U_k in subsequent ad opportunities is identified by equivalence to the set of categories in U_k at the first ad opportunity. At the heart of the result is the following property of the ad allocation policy.

[0040] Lemma 3: Consider an initial set of non-dominated classes of ad categories $\{U_1, \ldots, U_K\}$ (which have not yet been presented), and suppose a category from a class $U_{k^*}$ is presented. If a negative feedback is received for this category, then the set of non-dominated equivalence classes of ad categories for the new relevance matrix left after computing the posterior distribution results from the removal of class $U_{k^*}$ and further can only result from:

- Removal of some other classes since they become dominated.

- Merging of pairs of classes into a single class since they become equivalent.

- Addition of new members to the classes.

Here when two classes merge, they are identified with any of the original classes. Hence as long as the publisher keeps receiving negative feedback, the set of non-dominated equivalence classes of ad categories at each step does not grow, it can only shrink.

[0041] On the other hand, after a positive feedback, completely new non-dominated equivalence classes can appear in the new relevance matrix computed after the posterior update. Lemma 2 coupled with this property reveals the following structure of the optimal policy. Beginning from a set of non-dominated equivalence classes of categories, dynamic programming can be used to decide the optimal order in which to present these classes as long as the publisher keeps getting negative feedback. If any class obtains a positive feedback in the process, then one can 'zoom in' to the next level (eliminating all the other types) and restart with a new set of non-dominated equivalence classes. This observation hints at considering a simple heuristic policy, where the order in which to present the non-dominated equivalence classes is chosen greedily. Following is the proposed greedy policy.

[0042] Definition 2: A greedy algorithm is defined as the one in which at each experimentation opportunity, amongst all non-dominated ad categories, any ad category belonging to that non-dominated equivalence class which has the maximum expected number of ads with positive feedback conditional on the history is presented. If the feedback is positive then all the advertisers in the categories in that class are exhausted.

[0043] Note that the expectation in the definition is over the randomness in the number of future ad opportunities as well as the type of the user. The following theorem follows.
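The greedy choice of Definition 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: it assumes the user stays for each additional ad opportunity with probability β independently of the past, and it scores a class by the probability that the class is relevant multiplied by the expected number of its ads that can be shown, $(1 - \beta^{L^k})/(1 - \beta)$. The function names and the example data are hypothetical.

```python
def expected_relevant_ads(prob_relevant, class_size, beta):
    """Expected number of relevant ads from a class, assuming the user
    stays for each further opportunity with probability beta (assumption)."""
    return prob_relevant * (1 - beta ** class_size) / (1 - beta)


def greedy_choice(classes, posterior, beta):
    """Pick the class with the largest expected number of relevant ads.

    classes: dict mapping a class name to (set of types that find it
             relevant, number of advertisers in the class).
    posterior: dict mapping each type to its current probability.
    """
    best_name, best_score = None, -1.0
    for name, (types, size) in classes.items():
        prob_relevant = sum(posterior[t] for t in types)
        score = expected_relevant_ads(prob_relevant, size, beta)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical example with three user types and two classes.
posterior = {0: 0.5, 1: 0.3, 2: 0.2}
classes = {"A": ({0}, 1), "B": ({1, 2}, 3)}
choice, score = greedy_choice(classes, posterior, beta=0.5)
```

Both classes are relevant with probability 0.5 here, but class B holds three advertisers against one for class A, so the greedy rule picks B.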

[0044] Theorem 1: The Greedy algorithm is $\max\left\{ \frac{1 - \beta^{L^{\min}}}{1 + \beta - \beta^{K}},\ \beta^{L-1} \right\}$-optimal, where L is the total number of advertisers.

[0045] Proof. The lower bound of $\frac{1 - \beta^{L^{\min}}}{1 + \beta - \beta^{K}}$ is proven first.

Consider the set $\{U_1, \dots, U_K\}$ of non-dominated classes of ad categories at the first experimentation opportunity. Consider a class of policies which shows the categories in $\{U_1, \dots, U_K\}$ in some fixed order (by showing any category in a particular class) until it gets a positive feedback. Once it gets a positive feedback, it exhausts all advertisers in that class and allots optimally from that point onwards. For each $k = 1, \dots, K$ and each $\pi \subset \{1, \dots, K\}$ such that $k \notin \pi$, let $E_k^{\pi}$ be the event in which the user type has a negative feedback for all categories in π but a positive feedback for class $U_k$. Let $P(\pi, k)$ be the probability of this event. This is the probability that the user type is one of the set of types which satisfies this condition. Let $V_k^{\pi}$ be the expected payoff to go on the event $E_k^{\pi}$, under a policy which has shown all the categories in π in some fixed order and which will now present class $U_k$. Note that on learning this event, the policy will first present all the ads in category class $U_k$ and then allot optimally thereafter. Thus:

$V_k^{\pi} = P(\pi, k) \left( \frac{1 - \beta^{L^k}}{1 - \beta} + \beta^{L^k} V_k \right)$.

Here $L^k$ is the number of ads in class k and $V_k$ is the optimal payoff-to-go conditional on the event $E_k^{\pi}$ given that class k is also used up. An approximate payoff $\mu_k^{\pi}$ is defined as

$\mu_k^{\pi} = P(\pi, k) \frac{1 - \beta^{L^k}}{1 - \beta}$.

The ratio of the two quantities is

$\frac{\mu_k^{\pi}}{V_k^{\pi}} = \frac{1 - \beta^{L^k}}{1 - \beta^{L^k} + (1 - \beta)\beta^{L^k} V_k} \geq 1 - \beta^{L^k} \geq 1 - \beta^{L^{\min}}$ Equation (3)

where the first inequality follows since $\beta^{L^k} V_k \leq \frac{\beta^{L^k}}{1 - \beta}$ and the second inequality follows from the definition of $L^{\min}$, since Lemma 3 says that the number of ads in a class can only grow. Now the optimal policy finds the best order in which to present the non-dominated equivalence classes of ad categories, which solves the following optimization problem.

$\mathrm{OPT} = \max_{(t_1, \dots, t_K)} V_{t_1}^{\emptyset} + \beta V_{t_2}^{\{t_1\}} + \beta^2 V_{t_3}^{\{t_1, t_2\}} + \cdots + \beta^{K-1} V_{t_K}^{\{t_1, \dots, t_{K-1}\}}$ Equation (4)

Consider instead

$\mathrm{OPT}' = \max_{(t_1, \dots, t_K)} \mu_{t_1}^{\emptyset} + \beta \mu_{t_2}^{\{t_1\}} + \cdots + \beta^{K-1} \mu_{t_K}^{\{t_1, \dots, t_{K-1}\}}$ Equation (5)

Clearly equation (3) implies that $\frac{\mathrm{OPT}'}{\mathrm{OPT}} \geq 1 - \beta^{L^{\min}}$. Now the greedy algorithm attains at least a fraction $\frac{1}{1 + \beta - \beta^{K}}$ of the optimum in equation 5. To prove this, one can use induction in the dynamic programming problem that solves equation 5. Let $\alpha_i$ be the lower bound on the ratio of the payoff to go under the greedy policy and that under the optimal policy when the number of categories left is i, where i varies from 1 to K in the problem of equation 5. The aim is to prove that $\alpha_K \geq \frac{1}{1 + \beta - \beta^{K}}$. Now if $t_1, \dots, t_{K-1}$ is decided then there is only one option left for $t_K$, and hence the greedy policy gives the same payoff as the optimal payoff to go. Thus $\alpha_1 = 1$. Now fix an $i \geq 2$ and consider the payoff to go under the optimal policy when $K - i$ classes in the order have been selected. Let the set of these classes already selected be denoted by π and denote this optimal payoff to go by $G_{\mathrm{OPT}'}^{\pi}$. Denote the payoff to go under the greedy policy by $G_g^{\pi}$. Let the class selected by the greedy policy next be $U_k$ for $k \in \{1, \dots, K\} \setminus \pi$. Then, by the definition of $\alpha_{i-1}$,

$G_g^{\pi} = \mu_k^{\pi} + \beta G_g^{\pi \cup \{k\}} \geq \mu_k^{\pi} + \beta \alpha_{i-1} G_{\mathrm{OPT}'}^{\pi \cup \{k\}}$ Equation (6)

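Now,

$G_{\mathrm{OPT}'}^{\pi} = \max_{k' \in \{1, \dots, K\} \setminus \pi} \mu_{k'}^{\pi} + \beta G_{\mathrm{OPT}'}^{\pi \cup \{k'\}}$ Equation (7)

Now suppose an oracle tells you the feedback for the class $U_k$ for free at this point. Then the optimal payoff under this new information is at least as high as the optimal payoff if you don't have this information, i.e. $G_{\mathrm{OPT}'}^{\pi}$ (because you can always choose to ignore the oracle). Denote this optimal payoff under the new information structure as $G_{\mathrm{OPT}'}^{\pi, o}$. Then under this new information, clearly if the oracle lets you know that the feedback for $U_k$ is positive then you exhaust all the advertisers in $U_k$, whereas if the oracle tells you that the feedback is negative then you remove $U_k$ from the set of available categories and move on without wasting any opportunity on testing $U_k$. Thus

$G_{\mathrm{OPT}'}^{\pi, o} = \mu_k^{\pi} + G_{\mathrm{OPT}'}^{\pi \cup \{k\}} \geq G_{\mathrm{OPT}'}^{\pi}$ Equation (8)

And thus,

$G_{\mathrm{OPT}'}^{\pi \cup \{k\}} \geq G_{\mathrm{OPT}'}^{\pi} - \mu_k^{\pi}$ Equation (9)

Substituting equation 9 into equation 6:

$G_g^{\pi} \geq (1 - \beta \alpha_{i-1}) \mu_k^{\pi} + \beta \alpha_{i-1} G_{\mathrm{OPT}'}^{\pi}$ Equation (10)

Further observe that since the greedy policy chooses k, then $G_{\mathrm{OPT}'}^{\pi} \leq \mu_k^{\pi} \frac{1 - \beta^{i}}{1 - \beta}$, or $\mu_k^{\pi} \geq \frac{1 - \beta}{1 - \beta^{i}} G_{\mathrm{OPT}'}^{\pi}$, and thus:

$\frac{G_g^{\pi}}{G_{\mathrm{OPT}'}^{\pi}} \geq \frac{(1 - \beta)(1 - \beta \alpha_{i-1})}{1 - \beta^{i}} + \beta \alpha_{i-1} \geq \frac{(1 - \beta)(1 - \beta \alpha_{i-1})}{1 - \beta^{K}} + \beta \alpha_{i-1}$ Equation (11)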

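Here the second inequality follows since $i \leq K$. Now consider the recurrence equation

$\alpha_i = \frac{(1 - \beta)(1 - \beta \alpha_{i-1})}{1 - \beta^{K}} + \beta \alpha_{i-1}$ Equation (12)

Now, $\alpha_i < \alpha_{i-1}$ for $\alpha_{i-1} > \frac{1}{1 + \beta - \beta^{K}}$, and hence the sequence $(\alpha_i)$ generated by the recurrence relation with $\alpha_1 = 1$ is decreasing as long as $\alpha_i > \frac{1}{1 + \beta - \beta^{K}}$. Further, for $\alpha_{i-1} \geq \frac{1}{1 + \beta - \beta^{K}}$,

$\alpha_i = \frac{1 - \beta}{1 - \beta^{K}} + \frac{\beta(\beta - \beta^{K})}{1 - \beta^{K}} \alpha_{i-1} \geq \frac{1}{1 + \beta - \beta^{K}}$.

Thus one can conclude that the sequence $\{\alpha_i\}$ is uniformly bounded below by $\alpha^* = \frac{1}{1 + \beta - \beta^{K}}$, which is the fixed point of the recurrence equation. Thus $\alpha_K \geq \frac{1}{1 + \beta - \beta^{K}}$, which proves the result.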

A lower bound on the ratio of payoffs under the greedy algorithm and the optimal algorithm can also be obtained for values of β close to 1. This follows from the observation that if the user stays for long enough so that the number of ad opportunities available is at least the number of advertisers L, then any policy obtains all the positive feedback that one can possibly obtain. For a user of type i, the total number of advertisements with positive feedback is given by $r_i = \sum_{j} q_{ij} L_j$, where the sum runs over the ad categories. Thus the expected total number of advertisements with positive feedback is

$R = E[r_X]$.

On the event W that the number of advertisement opportunities is at least L, the greedy policy obtains the full payoff of R. Thus its expected payoff is bounded by

$V_G \geq P(W) R = P(C \geq L) R = \beta^{L-1} R$.

Further, the optimal policy cannot attain a payoff greater than R. Thus the ratio of the payoff under the greedy policy and that under the optimal policy is at least $\beta^{L-1}$.
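The two lower bounds discussed in the proof can be checked numerically. The Python sketch below is only an illustration under assumptions: it takes the first bound to be $(1 - \beta^{L^{\min}})/(1 + \beta - \beta^{K})$ and the second to be $\beta^{L-1}$, and it assumes the recurrence of equation 12 reads $\alpha_i = (1 - \beta)(1 - \beta\alpha_{i-1})/(1 - \beta^{K}) + \beta\alpha_{i-1}$ with $\alpha_1 = 1$. The function names and the parameter values are hypothetical.

```python
def greedy_bounds(beta, K, L, L_min):
    """Return the two lower bounds on the greedy-to-optimal payoff ratio
    (forms assumed as stated in the lead-in)."""
    first = (1 - beta ** L_min) / (1 + beta - beta ** K)
    second = beta ** (L - 1)
    return first, second


def alpha_sequence(beta, K):
    """Iterate the assumed recurrence from alpha_1 = 1 up to alpha_K."""
    alphas = [1.0]
    for _ in range(2, K + 1):
        prev = alphas[-1]
        nxt = (1 - beta) * (1 - beta * prev) / (1 - beta ** K) + beta * prev
        alphas.append(nxt)
    return alphas


beta, K = 0.5, 4
fixed_point = 1 / (1 + beta - beta ** K)
alphas = alpha_sequence(beta, K)
first, second = greedy_bounds(beta, K, L=6, L_min=2)
```

For these values the sequence decreases monotonically towards the fixed point without crossing it, which is the behavior the induction argument relies on. The first bound dominates for moderate β, while $\beta^{L-1}$ takes over as β approaches 1.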

[0046] The upper layer : A dynamic auction for selling ad opportunities. The above was concerned with finding the optimal dynamic ad allocation mechanism for allotting ad opportunities to ad categories. In what follows, the problem of designing a market mechanism to allot these opportunities to individual advertisers in the different categories is considered. The overall mechanism has a two layered structure: the lower layer is concerned with relevance maximization and the upper layer is a market mechanism. This proposal is facilitated by the following observation. At every experimentation opportunity, since the optimal policy (or the greedy heuristic) requires one to present a non-dominated equivalence class of ad categories, one can do so by presenting any ad from any of the categories in the class. This enables consideration of the problem of designing an auction for advertisers in that class separately. Suppose this auction recommends allotting the highest bidder first, then one can test the relevance for the class using this bidder. If the feedback is positive, then one continues the auction since no other class emerges to be relevant with probability 1 after the posterior update.

[0047] Mechanism design. The model for the valuations of the advertisers for showing their ads is now discussed. Consider a non-dominated equivalence class S with L advertisers. Conditional on the ad category being deemed relevant with a positive feedback, advertisers have some privately known valuation for showing their ad to the user in the session, which is the maximum amount they are willing to pay for the ad shown. Assume that the private valuation of an advertiser i is drawn from a continuous distribution $F_i$ with support on the set $V_i = [0, a_i]$ for $i = 1, \dots, L$. Further, the valuations of the advertisers are assumed to be mutually independent. Assume quasi-linear utility functions for the advertisers, i.e. if an advertiser i with valuation $v_i$ gets his ad shown and is charged an amount $p_i$, and if the ad is relevant to the user, then its utility is $u_i = v_i - p_i$. Similarly, if $p_i$ is the payment received by the publisher from advertiser i, then the publisher's revenue is $u_p = \sum_{i=1}^{L} p_i$. Further assume that the advertiser has no value for an ad shown if it is deemed irrelevant with a negative feedback.

[0048] The efficient auction: Drop-out price auction. An auction mechanism called the drop-out price auction is defined, which is an implementation of the efficient VCG mechanism for the present setting.

1. Bidding rule : Each advertiser i submits a bid $b_i$ which is the maximum amount it is willing to pay for an ad shown and deemed relevant.

2. Allocation rule : The publisher ranks the advertisers in descending order of their bids and allocates the ad opportunities in the order of the ranking till either the user leaves or all the advertisers in the class S are exhausted.

3. Payment rule : All the advertisers whose ads get shown pay an amount equal to the bid of the next highest bidder in the ranking when the user leaves, called the 'drop-out' price. If all advertisers are exhausted, no-one pays anything.

[0049] The revenue optimal auction: Drop-out price auction with reserve price. A drop-out price auction with a reserve price is a modification of the above mechanism so that there is a fixed price r which is the minimum amount any advertiser has to pay if he gets allotted. Thus the only advertisers ranked are the ones whose bid is higher than the reserve price. As stated before, the valuation $v_i$ of advertiser i is distributed according to a continuous cumulative distribution function $F_i$ defined on the support $v_i \in [0, a_i]$. Assume that this distribution has a density function $f_i$. Then the function $h_i(v) = \frac{f_i(v)}{1 - F_i(v)}$ is called the hazard rate function of the distribution. Define the virtual valuation of an advertiser i to be

$c_i(v_i) = v_i - \frac{1}{h_i(v_i)}$ for $v_i \in [0, a_i]$.

[0050] The mechanism design problem is said to be regular if this function is monotone non-decreasing in $v_i$, which is true if $h_i$ is non-decreasing. The following theorem is proven in Appendix 3.
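As a worked illustration of the regularity condition, consider the hypothetical case in which a valuation is uniformly distributed on $[0, a]$. This choice of distribution is an assumption made for the example only. The hazard rate is increasing, so the problem is regular, and the virtual valuation (the valuation minus the reciprocal of the hazard rate) is linear:

```latex
\begin{align*}
F(v) &= \frac{v}{a}, \qquad f(v) = \frac{1}{a}, \qquad v \in [0, a], \\
h(v) &= \frac{f(v)}{1 - F(v)} = \frac{1}{a - v}, \\
c(v) &= v - \frac{1}{h(v)} = v - (a - v) = 2v - a .
\end{align*}
```

The virtual valuation is zero at $v = a/2$, negative below it, and positive above it.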

[0051] Theorem 2. Assume that the hazard rate functions $h_i$ are non-decreasing for the distributions of the valuations of each advertiser i. Then the optimal auction for allotting the advertisement opportunities ranks all the advertisers i for whom $c_i(b_i) > 0$ in decreasing order of $c_i(b_i)$ and allots the opportunities in that order. Let $c^*$ be the virtual bid of the advertiser who is next on the ranked list when the user drops out, if such an advertiser exists, and 0 otherwise. Then each advertiser i whose ad was shown pays an amount equal to $c_i^{-1}(c^*)$.

[0052] Corollary 1. In the case where the valuations of all advertisers are identically distributed, with a cumulative distribution function F, density f, and hazard rate h, the optimal auction can be implemented as a drop-out price auction with reserve price r, where r is a solution of $v = \frac{1}{h(v)}$.

Note that the optimal auction may prescribe not allotting an advertisement opportunity to an advertiser if his valuation is too low even if he is relevant. Hence this auction goes against the objective of relevance optimality of the lower layer. This is not the case with the drop-out price auction.
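The bidding, allocation, and payment rules of the drop-out price auction, with and without a reserve price, can be sketched in Python as follows. This is a minimal sketch under assumptions: the relevance test on the first ad is left out, the number of ad opportunities before the user leaves is passed in as a known integer, ties between equal bids are broken by input order, and when every ranked advertiser has been shown the price falls back to the reserve (zero when no reserve is set). The function name and the example bids are hypothetical.

```python
def drop_out_price_auction(bids, opportunities, reserve=0.0):
    """Return (advertisers shown, price each of them pays).

    bids: dict mapping advertiser name to bid.
    opportunities: number of ad slots before the user leaves (assumed known).
    reserve: reserve price; only bids strictly above it are ranked.
    """
    ranked = sorted(
        (name for name, bid in bids.items() if bid > reserve),
        key=lambda name: bids[name],
        reverse=True,
    )
    shown = ranked[:opportunities]
    if not shown:
        return [], 0.0
    if len(shown) < len(ranked):
        # Drop-out price: bid of the next advertiser in the ranking.
        price = bids[ranked[len(shown)]]
    else:
        # All ranked advertisers were shown: only the reserve is due.
        price = reserve
    return shown, price


bids = {"a": 5.0, "b": 3.0, "c": 8.0, "d": 1.0}
two_slots = drop_out_price_auction(bids, opportunities=2)
exhausted = drop_out_price_auction(bids, opportunities=10)
with_reserve = drop_out_price_auction(bids, opportunities=10, reserve=2.0)
```

With two opportunities, advertisers c and a are shown and each pays 3.0, the bid of b. With the reserve of 2.0, advertiser d is never ranked, and the three remaining advertisers pay the reserve once the class is exhausted.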

[0053] Overall Mechanism Discussion.

Lower layer: Follow an optimal or greedy dynamic allocation policy. If at an experimentation opportunity, the policy prescribes allotting an advertiser from a non-dominated equivalence class S, choose the top advertiser prescribed by the upper layer in the auction mechanism run on the set of advertisers in S. If the feedback is negative, move on to the next experimentation opportunity. If it is positive, switch to the upper layer and continue with the auction on the class S.

Upper layer: At any experimentation opportunity in the lower layer, for a prescribed non-dominated equivalence class S, create an order of advertisers to be presented according to either of the two auction mechanisms. If the top advertiser is deemed relevant, continue allotting according to the order till the user drops out or all advertisers scheduled to be allotted are exhausted, and then charge the allotted advertisers according to the payment rule of the auction.

[0054] Figure 2 depicts a flow diagram 200 of a method according to aspects of the invention. The method 200 starts at step 205. The ads are to be inserted into the video file at times given by the display rules discussed earlier. The ads may be organized according to categories. Some categories may be indicative of products whereas other categories may be indicative of services. Product categories may include such categories as cars, electronics, stationery, major appliances, personal items and the like. Service categories may include such categories as travel agents, financial planning, catering, event planning, elder care, and the like. Multiple advertisers may have ads in each category. Thus, advertisers can bid on ads in a category. As an option, the advertisers can also bid on ads at step 220 below. The process 200 moves to step 210 where a starting class is chosen. The class determined in step 210 is chosen as the non-dominated equivalence class of ad categories with the maximum expected relevant ads. As such, it may show ads in the same class but from different categories. An equivalence class of ad categories is a set of ad categories. Each category contains multiple ads.

[0055] In step 220, the ads related to the chosen class are arranged in order of their bid values from the various advertisers who belong to that class. The first advertisement in the ordered list is the advertisement having the highest bid value in the class chosen. It is then displayed to a user while viewing a video. At step 230, once this first and most relevant ad in the chosen class is shown to the user, then the user is asked for feedback on whether the ad is relevant to her. The user responds with a binary yes or no response. The user response in this instance is solicited and provided only once per ad equivalence class.

[0056] If the ad is relevant to the user, the method moves to step 240. At step 240, the presentation of ads to the user in the category of ads continues. As such, the insertion of ads into the video stream is continued. The ads presented are from different advertisers derived from the ordered list of ads created in step 220. The presentation of ads continues until the class of ads is exhausted or until the user is determined to have stopped viewing the video (i.e. the user exits). The class of ads becomes exhausted when all of the ads from all of the advertisers in that class have been inserted into the video being viewed by the user. At step 240, the advertisers are charged according to the drop out price rule or the drop out price with reserve price rule.

[0057] If at step 230, it is determined that the ad was not relevant to the user, then the method 200 moves to step 235 where a fee may be charged to the advertiser for presentation of his ad in the video. Even though the user found it non-relevant, the service of advertisement placement was provided and the single fee is appropriate. After registering the fee, the method moves to step 250 where it is determined if the user has exited the video viewing session. If the user has exited, then the process stops at step 265. If the user has not exited, then it is determined that the class of ads is not of interest to the viewer and a different class of ads may be chosen in step 260. The detection of an exit can be automatically sensed by observing a user's online presence.

[0058] At step 260, the current class of ads is removed because it has no further relevance to the user viewing the video. A new relevance matrix is computed. The system returns to step 210 for a new class of categories of ads to be selected by finding the class of categories that has the next highest expected relevance. After selection of the next class of categories of ads to display to the user, the method returns to step 220.

[0059] At step 220, the order of advertisers in the newly selected class of advertisement is generated. The first advertisement from the highest bidder in this new class of ads is then displayed to a user in the video being viewed by the user. Step 220 flows to step 230 and the steps are repeated according to their functions described above.
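The control flow of method 200 can be summarized in a short Python sketch. It is a simplification under several assumptions that are not part of the described method: each class carries a fixed expected-relevance score that stands in for recomputing the relevance matrix at step 260, the number of ad opportunities before the user exits is known in advance, the session ends once a relevant class is exhausted, and payments (steps 235 and 240) are left out. The function name, argument names, and example data are hypothetical.

```python
def run_session(classes, is_relevant, max_opportunities):
    """Return the list of (advertiser, class name) pairs shown to the user.

    classes: list of (class name, expected relevance score, {advertiser: bid}).
    is_relevant: callable giving the user's yes/no feedback for a class,
                 asked once per class.
    max_opportunities: number of ad slots before the user exits (assumed).
    """
    # Step 210: consider classes in order of expected relevance.
    remaining = sorted(classes, key=lambda c: c[1], reverse=True)
    shown = []
    slots = max_opportunities
    for name, _score, bids in remaining:
        if slots == 0:  # Step 250: the user has exited.
            break
        # Step 220: order the advertisers of the class by bid value.
        order = sorted(bids, key=bids.get, reverse=True)
        if not order:
            continue
        shown.append((order[0], name))  # show the highest bidder first
        slots -= 1
        if is_relevant(name):  # Step 230: binary user feedback.
            # Step 240: keep showing ads from the same ordered list.
            for advertiser in order[1:]:
                if slots == 0:
                    break
                shown.append((advertiser, name))
                slots -= 1
            break
        # Step 260: negative feedback, drop this class and try the next.
    return shown


classes = [
    ("cars", 0.6, {"x": 4, "y": 9}),
    ("travel", 0.3, {"z": 2, "w": 5, "v": 1}),
]
likes_travel = lambda name: name == "travel"
full = run_session(classes, likes_travel, max_opportunities=4)
short = run_session(classes, likes_travel, max_opportunities=3)
```

In the example the user rejects the first class after one ad, accepts the second, and then sees the remaining ads of that class in bid order until the opportunities run out.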

[0060] In one alternative embodiment, it may not be advisable to exhaust all the advertisers in a category once a user deems a category relevant, since the user may want to look at something else. One way to account for this in the mechanism is to put a finite bound on the number of advertisers shown in any category. This does not pose a problem in the lower layer; one moves on to experiment as soon as that bound is reached. In the upper layer also, irrespective of the auction mechanism used, the ranking of advertisers in a non-dominated equivalence class can be constrained so that the ranked advertisers of any category in the class do not exceed this bound. Since such a bound is exogenous to the bids, one can easily see that such a modification does not disturb the strategy-proofness of the mechanisms. Another possible option is choosing the optimal auction mechanism for the upper layer, which only allots the advertisers with high enough valuations so that it could naturally restrict the number of advertisers shown of a particular category.

[0061] Figure 3 is an example block diagram of the media device 104 of Figure 1. The block diagram configuration includes a bus-oriented 350 configuration interconnecting a processor 320, and a memory 345. The configuration of Figure 3 also includes a network interface 101 to a gateway, such as router or gateway 106 of Figure 1. The router or gateway may utilize either a wired or a wireless interface to the media device.

[0062] Processor 320 provides computation functions for the media device, such as the one depicted in Figure 1. The processor 320 can be any form of CPU or controller that utilizes communications between elements of the media device to control communication and computation processes. Those of skill in the art recognize that bus 350 provides a communication path between the various elements of embodiment 104 and that other point-to-point interconnection options (e.g. non-bus architecture) are also feasible.

[0063] Memory 345 can act as a repository for memory related to any of the methods that incorporate the functionality of the media device. Memory 345 can provide the repository for storage of information such as program memory, downloads, uploads, or scratchpad calculations as well as the storage of streamed video and inserted advertisements. Those of skill in the art will recognize that memory 345 may be incorporated all or in part of processor 320. Network interface 325 has both receiver and transmitter elements for communication as known to those of skill in the art.

[0064] User interface and display 310 is driven by interface circuit 315. The interface 310 is used as a multimedia interface having both audio and video capability to display streamed audio and/or video obtained via network interface 325 and connection 101 to a network. The user interface also allows the user of the media device to supply feedback to the publisher 118 concerning the relevance of an advertisement placed in the streaming multimedia being rendered for the user. In one embodiment, the publisher sends a request for feedback to the media device after display of an advertisement. The user interface and display 310 allows the user of the media device to respond to the publisher's request for feedback.

[0065] Figure 4 is an example block diagram of the publisher of Figure 1. The block diagram configuration is simplified and is depicted as if the publisher were a single device. However, as is known to those of skill in the art, a web-entity, such as the publisher 118 may include multiple electronic systems interconnected to function as a web server or other equivalent web entity. If configured as a single device, the publisher 118 of Figure 1 may take the embodiment of Figure 4. One of skill in the art recognizes that the publisher depicted in Figure 4 is only one embodiment and that many embodiments, including those of a multi-element system, may include some of the features shown in Figure 4 even if publisher 118 is a distributed entity.

[0066] In the embodiment of the publisher shown in Figure 4, the publisher includes a bus-oriented 450 configuration interconnecting a processor 420, and a memory 445. The configuration of Figure 4 also includes a network interface 425 to a network link 119, such as an interface to a wide area public or private network. The network interface 425 may utilize either a wired or a wireless interface to connect the publisher with the network 120.

[0067] Processor 420 provides computation functions for the publisher. However, one of skill in the art recognizes that such processor 420 may be a single processing device or a distributed capability. If located in one device, the processor 420 can be any form of CPU or controller that utilizes communications between elements of the publisher to control communication and computation processes. Those of skill in the art recognize that bus 450 provides a communication path between the various elements of embodiment 118 and that other point-to-point interconnection options (e.g. non-bus type architecture) are also feasible that could support distributed elements over a public or private network.

[0068] Memory 445 can act as a repository for memory related to any of the methods that incorporate the functionality of the publisher. Memory 445 can provide the repository for storage of information such as program memory, downloads, uploads, or scratchpad calculations as well as the storage of streamed video and inserted advertisements. Those of skill in the art will recognize that memory 445 may be incorporated all or in part of processor 420. Network interface 425 has both receiver and transmitter elements for communication as known to those of skill in the art. Database interface 435 is used to connect the publisher to a database of advertisements. In one embodiment, as shown in Figure 1, the advertisement database 121 is external to the publisher. In alternate embodiments, the database 121 can be accessed via a public or private network and the database interface 435 represents the network interface connecting to the database. In another embodiment, the database interface 435 is used to connect to a local database. User interface 410 and its respective display driver 415 is used to locally manage the publisher.

[0069] Although specific architectures are shown for the implementation of an advertisement placement system such as those of Figures 1, 4, and 5, one of skill in the art will recognize that implementation options exist such as distributed functionality of components, consolidation of components, and location of the publisher in a server. Such options are equivalent to the functionality and structure of the depicted and described arrangements.

[0070] Appendix 1. Proof of Lemma 1

Proof. The result follows from a simple interchange argument. Suppose at time t, the posterior distribution over the set of possible types is $P_t$ and the set of remaining advertisers is $A_t$. Consider the event W with a fixed realization of the user type X = i and a fixed realization of the random variable C = c, which is the time at which the user leaves. Thus on this event, the string of binary feedback for the different categories is $\{q_{ij}\}_j$. Then for any fixed policy ψ, the sequence of allocations of the ads from time t onwards till time c is dictated by the policy and is determinate. Let this sequence of allocations be $\{u_t, u_{t+1}, \dots, u_c\}$ and the corresponding sequence of feedback be $\{y_{g(u_t)}, y_{g(u_{t+1})}, \dots, y_{g(u_c)}\}$.

Suppose there exists an advertiser $k^* \in A_t$ such that, conditional on observations till time $t - 1$, $y_{g(k^*)} = 1$ w.p. 1. Now consider 2 cases:

[0071] Case 1 : Assume that on event W, for policy ψ, $k^* \in \{u_t, u_{t+1}, \dots, u_c\}$. Say $u_{t'} = k^*$ for $t' \in \{t, \dots, c\}$. Then if $t' \neq t$, construct a policy ψ' which generates the sequence of allocations $\{k^*, u_t, \dots, u_{t'-1}, u_{t'+1}, \dots, u_c\}$ and thus gives the same payoff on event W. This policy is the following:

- Allot advertiser $k^*$ at time t.

- From time t + 1 onwards follow policy ψ, assuming $H_{t+1} = H_t$.

- When ψ prescribes allotting $k^*$ at time t' + 1, use the information $y_{g(k^*)} = 1$ to update the history to $H_{t'+1}$ and remove $k^*$ from the set of available advertisers. Follow the prescription of ψ for allotting advertisers from t' + 1 onwards.

Clearly, this policy gives the same payoff on event W since only the time at which $k^*$ is allotted has been interchanged.

[0072] Case 2 : Assume that on event W, for policy ψ, $k^* \notin \{u_t, u_{t+1}, \dots, u_c\}$. Then observe that the policy ψ' generates the following sequence of allocations: $\{k^*, u_t, u_{t+1}, \dots, u_{c-1}\}$. Thus the difference in payoff is given by

$y_{g(k^*)} - y_{g(u_c)} = 1 - y_{g(u_c)} \geq 0$

Thus the number of relevant ads shown under policy ψ' is at least as high as that under policy ψ for every such event W from a set of disjoint events whose union is the entire probability space. Thus the expected number of relevant ads shown is also at least as high.

[0073] Appendix 2. Proof of Lemma 2

Proof. The proof follows from a slightly more involved interchange argument than the one used for Lemma 1. Suppose that at opportunity t, the optimal policy ψ allots an advertiser k' of category j' while there exists a category j such that $M_{j'}(t) \subset M_j(t)$. Let k be any generic advertiser of category j. Now consider a policy ψ' which allots category j before category j' by allotting k at opportunity t. To further describe this policy, consider two cases:

- If the user finds j relevant, it exhausts all the advertisers in that category and then moves on to allotting j'. After it allots j' it behaves as if j was never allotted until ψ prescribes allotting j, upon which you update the information that j is relevant and move on to allot the next category prescribed by ψ, and so on.

- If the user finds j not relevant, then from time t + 1 onwards it acts as if it allotted j' at time t and found that j' is not relevant. Then when ψ prescribes allotting j, you update the information that j is not relevant and move on to allot the next category prescribed by ψ, and so on.

[0074] On every disjoint event of the underlying probability space you show at least as many relevant ads by following policy ψ'. Consider an event W on which the user leaves after opportunity $c \geq t$ and on which the realization of the user type is X = i. Then for the policy ψ, the sequence of allocations of the ads from time t onwards till time c is dictated by the policy and is determinate. Let this sequence of allocations be $\{u_t, u_{t+1}, \dots, u_c\}$ and the corresponding sequence of feedback be $\{y_{g(u_t)}, y_{g(u_{t+1})}, \dots, y_{g(u_c)}\}$. These allocations and feedback depend on the type i that was realized on W. For this, consider 3 mutually exclusive and exhaustive cases:

[0075] Case 1: First consider the case where on the event W, $X = i \in M_{j'}(t)$. Then observe that $y_{g(u_t)} = y_{g(k')} = 1$ and immediately the publisher deduces that $Y_j = Y_{j'} = 1$. Thus since ψ is optimal, by the previous lemma w.l.o.g. the first $m_{j'} + m_j$ allocations in the sequence $\{u_t, u_{t+1}, \dots, u_c\}$ can be assumed to be all the advertisers belonging to categories j' and j, and thus the feedback is a sequence of $m_{j'} + m_j$ 1s. Note that policy ψ' will be operating under case (1) and thus it will also generate a sequence of allocations in which the first $m_{j'} + m_j$ allocations in the sequence $\{u_t, u_{t+1}, \dots, u_c\}$ will be all the advertisers belonging to categories j' and j (in a different order), and after that the rest of the sequence of allocations is identical to that under ψ. Thus on such an event W both the policies generate the same sequence of relevance feedback.

[0076] Case 2: Now consider the case where on the event W, $X = i \in M_j(t) - M_{j'}(t)$. In this case $y_{g(u_t)} = y_{g(k')} = 0$ but $Y_j = 1$. In this case the policy ψ generates the sequence of allocations $\{k', u_{t+1}, \dots, u_c\}$ and gets the feedback $\{0, y_{g(u_{t+1})}, \dots, y_{g(u_c)}\}$. Whereas observe that the policy ψ' operates under case (1) and the publisher discovers that $Y_j = 1$ by allotting k and then continues to exhaust all the advertisers in j before switching to the prescriptions of ψ. Thus ψ' generates a sequence of allocations in which all the advertisers in j are allotted first and then the prescription of ψ is followed as described in case (1). In the case where ψ prescribed allotting j at some opportunity and was able to allot all the advertisers in j until the final opportunity c, this leads to a sequence of allocations which is just a different ordering of the elements of the sequence $\{k', u_{t+1}, \dots, u_c\}$ and thus generates the same number of relevant ads shown till time c. In the case where ψ allotted $0 < r < m_j$ advertisers of category j up till the final opportunity c, then under policy ψ' the last $m_j - r$ advertisers in the sequence $\{y_{g(u_t)}, y_{g(u_{t+1})}, \dots, y_{g(u_c)}\}$ are dropped out in lieu of that many advertisers in category j in the beginning. But since all advertisers in j are relevant, the number of relevant ads under policy ψ' is still at least as high as that under ψ.

[0077] Case 3: Now consider the case where on the event W, $X = i \in S(t) - M_j(t)$. In this case $y_{g(u_t)} = y_{g(k')} = 0$, and ψ generates the sequence of allocations $\{k', u_{t+1}, \dots, u_c\}$ and gets the feedback $\{0, y_{g(u_{t+1})}, \dots, y_{g(u_c)}\}$. In this case ψ' operates under the condition (2). Now in the case where $k \notin \{k', u_{t+1}, \dots, u_c\}$ for any advertiser k in category j, ψ' generates the same sequence of feedback $\{0, y_{g(u_{t+1})}, \dots, y_{g(u_c)}\}$. In the case where $k \in \{k', u_{t+1}, \dots, u_c\}$ for some advertiser k in category j, then under ψ', since j has already been tested in the beginning, the negative feedback of category j is not repeated by re-allotting it. In lieu of that the policy moves on and a new feedback is obtained at the end which may be 1 or 0. Thus ψ' generates at least as many relevant ads as ψ.

[0078] Thus the number of relevant ads shown under policy ψ' is at least as high as that under policy ψ for every such event W from a set of disjoint events whose union is the entire probability space. Thus the expected number of relevant ads shown is also at least as high.

[0079] Appendix 3. Proof of Theorem 2

An auction mechanism consists of two rules specified by the publisher: the allocation rule and the payment rule. The set of possible rules are defined as follows.

[0080] The allocation rule: Let Ψ be the set of all possible infinite-length orderings composed of elements from the set A = {1, …, L, ω}, where the elements 1, …, L correspond to the advertisers i = 1, …, L and ω is an element corresponding to a null assignment. No advertiser can appear more than once in an ordering. Ψ is then the set of feasible allocations of ad opportunities. For any element ψ ∈ Ψ, ψ_j is the allocation for the j'th ad opportunity: if opportunity j comes up and ψ_j = i, then advertiser i gets the allotment, but if ψ_j = ω, then no one gets the slot and the ad opportunity is passed up. An allocation rule is a mapping from the set of possible bid vectors to the set of probability distributions over Ψ, denoted by Δ(Ψ), i.e. a function φ(b_1, …, b_L): ×_{i=1}^{L} V_i → Δ(Ψ). Here, w.l.o.g., assume that the set of possible bids for advertiser i is the same as his set of possible valuations, which is justified by the revelation principle in mechanism design (see discussion below). Denote the probability of choosing a particular ordering ψ by φ_ψ(b_1, …, b_L) and, with some abuse of notation, denote the random variable for the chosen ordering by ψ(b_1, …, b_L). The interpretation of the allocation rule is that, once the bids have been collected, the publisher draws the ordering ψ(b_1, …, b_L) according to the distribution φ(b_1, …, b_L) and implements it.

[0081] The payment rule: The payment rule consists of specifying functions M_i: ×_{k=1}^{L} V_k → ℝ for all i = 1, …, L. Here M_i(b_1, …, b_L) is the payment made by advertiser i to the publisher given the reported bids.

[0082] Once the allocation rule and the payment rule are fixed, the bidding strategies of the advertisers constitute a Nash equilibrium in the resulting game. Amongst all the possible joint bidding, allocation and payment rules, it is desirable to find a rule which results in the maximum expected revenue for the publisher in the resulting equilibrium. Here the expectation is over the probability space of possible valuations of the advertisers and the possible number of advertisement opportunities C.

[0083] Designing allocation and payment rules which have certain desirable properties in the equilibria they induce seems prohibitively difficult. But the revelation principle in mechanism design says that one can restrict the search to mechanisms in which it is optimal for the agents to participate in the mechanism and to bid their true valuations. In order to define this space of mechanisms, first define what are known as the interim expected allocations and payments of each advertiser, dependent on what valuations they report. Consider the function π_i(b_i) = E[1{∃ j ∈ {1, …, C} s.t. ψ_j(b_i, v^{−i}) = i}] for each advertiser i and for each b_i ∈ V_i, which is the probability that advertiser i gets an ad opportunity allotted to him when he reports his valuation to be b_i, assuming that all other advertisers are reporting their true valuations. The randomness in this expectation is due to the randomness of C, the randomness of the valuations of the other advertisers v^{−i}, and the randomness of the ordering chosen as a function of the resulting bid vector.

[0084] Similarly, consider the functions m_i(b_i) = E[M_i(b_i, v^{−i})] for each advertiser i and for each b_i ∈ V_i, which is the expected payment of buyer i when he reports b_i, assuming that all other buyers are reporting their true valuations. Now, specify the design space of possible mechanisms by specifying the constraints for it to be optimal for each advertiser to participate (termed the individual rationality constraint) and to bid truthfully (termed the Bayesian incentive compatibility constraint), and state the problem of finding a mechanism which maximizes the expected revenue of the publisher.
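The interim allocation probability defined above can be computed exactly in a small discrete setting. The following is a minimal sketch, not the mechanism of the disclosure: it assumes a hypothetical rank-by-bid allocation rule with ties broken by lower index, valuations of the other advertisers drawn independently and uniformly from a finite set, and a known distribution for the number of ad opportunities C. The function name `interim_allocation` and all inputs are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def interim_allocation(i, b_i, values, num_advertisers, c_dist):
    """Exact pi_i(b_i) under an assumed rank-by-bid allocation rule."""
    others = [k for k in range(num_advertisers) if k != i]
    weight = Fraction(1, len(values) ** len(others))  # uniform over v^{-i}
    total = Fraction(0)
    for profile in product(values, repeat=len(others)):
        bids = dict(zip(others, profile))
        bids[i] = b_i
        # Assumed ordering psi: decreasing bid, ties broken by lower index.
        ordering = sorted(bids, key=lambda k: (-bids[k], k))
        position = ordering.index(i) + 1  # 1-based slot of advertiser i
        # Advertiser i is shown iff at least `position` opportunities arise.
        p_shown = sum(
            (p for c, p in c_dist.items() if c >= position), Fraction(0)
        )
        total += weight * p_shown
    return total


# Hypothetical inputs: three advertisers, valuations in {1, 2, 3},
# and C equal to 1 or 2 with probability 1/2 each.
c_dist = {1: Fraction(1, 2), 2: Fraction(1, 2)}
pi_0 = [interim_allocation(0, b, [1, 2, 3], 3, c_dist) for b in (1, 2, 3)]
print(pi_0)
```

Under these assumptions the interim allocation of advertiser 0 is 1/3, 2/3 and 1 for reports 1, 2 and 3, which is non-decreasing in the report, as the incentive compatibility condition requires.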

[0085] Appendix 4. The revenue maximization problem

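The optimization problem of finding the optimal mechanism can be stated as follows:

$$\max \; E\left[\sum_{i=1}^{L} M_i(v_1, \ldots, v_L)\right] = \sum_{i=1}^{L} E\left[m_i(v_i)\right] \qquad \text{(Equation 13)}$$

subject to

$$\text{B.I.C.:}\quad v_i \pi_i(v_i) - m_i(v_i) \ge v_i \pi_i(b_i) - m_i(b_i) \quad \forall\, i = 1, \ldots, L \text{ and } v_i, b_i \in V_i$$

$$\text{I.R.:}\quad v_i \pi_i(v_i) - m_i(v_i) \ge 0 \quad \forall\, i = 1, \ldots, L \text{ and } v_i \in V_i$$

$$\text{(A)}\quad \pi_i(b_i) = E\left[\mathbf{1}\{\exists\, j \in \{1, \ldots, C\} \text{ s.t. } \psi_j(b_i, v^{-i}) = i\}\right] \quad \forall\, i \text{ and } b_i \in V_i$$

$$\text{(B)}\quad m_i(b_i) = E\left[M_i(b_i, v^{-i})\right] \quad \forall\, i \text{ and } b_i \in V_i$$

$$\varphi(b_1, \ldots, b_L) \in \Delta(\Psi) \quad \forall\, (b_1, \ldots, b_L) \in \times_{i=1}^{L} V_i$$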

[0086] Analyzing the problem in a similar way as in the classical analysis, a mechanism is incentive compatible if and only if the associated π_i is non-decreasing and the associated interim payments satisfy

$$m_i(v_i) = m_i(0) + \pi_i(v_i)\, v_i - \int_0^{v_i} \pi_i(t_i)\, dt_i \qquad \text{(Equation 14)}$$

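Hence, the publisher's objective is to find a mechanism that maximizes

$$\sum_{i=1}^{L} m_i(0) + \int_{\times_{k=1}^{L} V_k} \sum_{i=1}^{L} \left(u_i - \frac{1 - F_i(u_i)}{f_i(u_i)}\right) E\left[\mathbf{1}\{\exists\, j \in \{1, \ldots, C\} \text{ s.t. } \psi_j(u_1, \ldots, u_L) = i\}\right] \prod_{k=1}^{L} f_k(u_k)\, du_k \qquad \text{(Equation 15)}$$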

subject to the constraint that the mechanism is:

• B.I.C. Bayesian incentive compatible, which is equivalent to the requirement that π_i is non-decreasing for each i and that (14) is satisfied.

• I.R. Individually rational, which is equivalent to the condition that m_i(0) ≤ 0 for all i. Maximizing the expression above, it is clear that m_i(0) = 0 for all i.

Choosing an allocation rule φ(u_1, …, u_L) which maximizes the expression

$$\sum_{i=1}^{L} \left(u_i - \frac{1 - F_i(u_i)}{f_i(u_i)}\right) E\left[\mathbf{1}\{\exists\, j \in \{1, \ldots, C\} \text{ s.t. } \psi_j(u_1, \ldots, u_L) = i\}\right]$$

for each possible (u_1, …, u_L) ∈ ×_{i=1}^{L} V_i thus maximizes the integral above. To do so, allot the maximum probability of having an ad shown to the advertiser with the maximum value of u_i − (1 − F_i(u_i))/f_i(u_i), as long as it is non-negative, the next highest probability to the advertiser with the next highest non-negative value of u_i − (1 − F_i(u_i))/f_i(u_i), and so on. Specifically, for each vector of valuations (u_1, …, u_L), choose an ordering ψ(u_1, …, u_L) which ranks the advertisers in decreasing order of their virtual valuations g_i(u_i) = u_i − (1 − F_i(u_i))/f_i(u_i), as long as g_i(u_i) > 0. If fewer than L advertisers are ranked in this way, then the rest of the ordering is filled with the null element ω, i.e. the ad opportunities are passed up. This allocation rule clearly maximizes the expression for the expected revenue above.

It remains to check for incentive compatibility, i.e. that the resulting interim allocation rule is non-decreasing in the valuations of the advertisers. This will be true if g_i(u_i) is non-decreasing in u_i for all i. But this is true by the regularity assumption on the hazard rate functions of the distributions, namely that h_i(u) = f_i(u)/(1 − F_i(u)) is non-decreasing for each i, which implies that g_i(u_i) is also non-decreasing. Thus this is indeed the optimal allocation rule. Observe that this is exactly the allocation rule of the drop-out price auction with reserve prices. The reserve prices ensure that only those advertisers form a part of the ranking for whom g_i(u_i) > 0.
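The ranking by virtual valuations can be illustrated with a short sketch. It assumes, purely for illustration, that every advertiser's valuation is uniform on [0, 1], so that the virtual valuation u − (1 − F(u))/f(u) reduces to 2u − 1 and the implied reserve price is 0.5. The function names and the use of `None` for the null element ω are hypothetical, and the infinite ordering is truncated to its first L entries.

```python
def virtual_valuation(u, cdf, pdf):
    """Virtual valuation g_i(u) = u - (1 - F_i(u)) / f_i(u)."""
    return u - (1.0 - cdf(u)) / pdf(u)


def optimal_ordering(valuations, cdfs, pdfs):
    """Rank advertisers by decreasing positive virtual valuation.

    Slots that cannot be filled are padded with None, standing in for
    the null element (the ad opportunity is passed up).
    """
    scored = [
        (virtual_valuation(u, cdf, pdf), i)
        for i, (u, cdf, pdf) in enumerate(zip(valuations, cdfs, pdfs))
    ]
    ranked = [i for g, i in sorted(scored, key=lambda s: -s[0]) if g > 0]
    return ranked + [None] * (len(valuations) - len(ranked))


# Assumed distribution: uniform on [0, 1] for every advertiser.
def uniform_cdf(u):
    return u


def uniform_pdf(u):
    return 1.0


ordering = optimal_ordering(
    [0.9, 0.3, 0.6], [uniform_cdf] * 3, [uniform_pdf] * 3
)
print(ordering)
```

With these assumed inputs, advertiser 0 is ranked first, advertiser 2 second, and advertiser 1 (valuation 0.3, below the implied reserve) is replaced by the null element.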

[0087] The corresponding payment rule could be deduced from equation 14. Instead, observe that any payment rule which makes truthful reporting optimal for the advertisers and for which m_i(0) = 0 has to give the same expected interim payment for the advertisers as that given by (14), and hence the same expected payment to the publisher. This is the case with the payment rule in the drop-out price auction with reserve prices: truthful reporting of valuations is optimal. Also, if the valuation of an advertiser is 0, then his payment is 0. Thus the payment rule of the drop-out price auction is the optimal payment rule corresponding to the optimal allocation rule.

[0088] Appendix 5. An efficient optimal algorithm for the lower layer

At a first glance, observe that the dynamic allocation problem for user optimality can be solved by a recursive algorithm, which defines a function that takes as input the set of types, the set of ad categories, the probability distribution over the types and the relevance matrix and gives the optimal payoff and the optimal immediate action. In the process of computing the optimal payoff, it calls itself on a smaller sub-problem and so on. But it is well known that such recursive algorithms can be very inefficient. The usual problem is when recursion leads to repeating work. That happens when you have overlapping subproblems, which is unfortunately the case in the allocation problem. Turning one of these inefficient recursive solutions into efficient iterative solutions is the role of dynamic programming.

[0089] Solving this dynamic allocation problem using iterative stochastic dynamic programming requires defining a state space of possible histories for each opportunity k (the categories used up and the feedback obtained for each category). In order to avoid working with a state space which is larger than needed, one must ascertain which histories are reachable at opportunity k. This itself is fairly cumbersome: the only way to do so is to actually enumerate all the feasible sequences of binary feedbacks by fixing one user type at a time. This is the aspect where a recursive algorithm is better, since it evaluates the optimal payoff only for the states that it encounters in its path. This section provides an efficient algorithm which combines the two approaches to solve this dynamic allocation problem by using the properties proved in lemma 2.

[0090] Continuing with the notation that was introduced in section 4, let S be the set of probable types at the first ad opportunity, with a probability distribution P_1, and let E be the set of ad categories available. For each j ∈ E, let M_j = {i ∈ S : q_j^i = 1} be the set of user types which find category j relevant. As described earlier, say that category j dominates category j' if M_{j'} ⊂ M_j. Let J be the set of non-dominated ad categories. According to lemma 2, the immediate ad category shown in the optimal allocation is one in J.

[0091] As mentioned earlier, an important property of the optimal ad allocation mechanism is the following. Suppose you start with a set of non-dominated ad categories (which have not yet been presented) and you present one of these categories. Now, if you receive a negative feedback for this category, then the set of non-dominated categories for the new relevance matrix left after computing the posterior distribution is a subset of the set of non-dominated categories you started off with. Hence, as long as you keep receiving negative feedback, the set of non-dominated categories does not grow; it can only shrink. On the other hand, after a positive feedback, a completely new set of categories may become non-dominated. This presents a line of attack. Beginning from a set of non-dominated categories, use dynamic programming to decide the optimal order in which to present these categories as long as one gets a negative feedback. If any category obtains a positive feedback in the process, then 'zoom in' to the next level (eliminating all the other types) and restart with a new set of non-dominated categories. The benefit of doing so is that the state space for the dynamic program at each level can be efficiently defined. Dynamic programs for finding an optimal ordering of n objects are typically solved by computing the optimal 'payoff-to-go' for every fixed choice of the first k elements in the order, for each k < n, and inducting backwards. For the current problem, the key point is that this payoff-to-go for each choice of k categories does not depend on the order in which these categories were shown. This reduces the state space considerably, since a state does not need to remember the order in which categories were presented, thus eliminating a lot of redundant work that a recursive algorithm would have performed.

[0092] The description of the resulting algorithm is facilitated by constructing an 'information dependency graph', which is described next. For each non-dominated ad category j ∈ J, consider the partition F_j = {M_j, S \ M_j} = {F_j^1, F_j^2} of S into a set of types which like that category and a set which does not. Thus, by obtaining the user feedback for category j, one knows which element of the partition F_j contains the true type of the user. Let the partition F be the coarsest common refinement of the set of partitions {F_j : j ∈ J}, defined as

$$F = \left\{ \bigcap_{j \in J} F_j^{k_j} : k_j \in \{1, 2\} \ \forall j \right\} \qquad \text{(Equation 16)}$$

[0093] Re-label the elements of F as F = {S_1, …, S_K}, where S = ∪_{k=1}^{K} S_k. The idea behind the coarsest common refinement of the partitions is that, knowing the feedback from all the categories in J, one knows precisely which element of F contains the true type of the user. Now, first note that each S_k ⊆ M_j for at least one j. Then define the following graph with two kinds of nodes:

- |J| nodes representing non-dominated ad categories (boxes), one for each ad category in J

- K nodes representing sets of user types (circles), one for each set S_k ∈ F

A type node k is connected to a category node j if S_k ⊆ M_j. This is called the information dependency graph. Consider the example in Table 1. There are 6 types labeled from 1 to 6 and 5 ad categories labeled A to E. The number of advertisers in each category is shown in the table, and so is the probability distribution of the types. Observe that category E is dominated by category A. None of the other categories are dominated and hence J = {A, B, C, D}. The information partitions for the different ad categories are as follows: F_A = {(1,2,3,4), (5,6)}, F_B = {(3,4,5), (1,2,6)}, F_C = {(4,6), (1,2,3,5)} and F_D = {(5,6), (1,2,3,4)}. The coarsest common refinement of these partitions is F = {(1,2), (3), (4), (5), (6)}. The information dependency graph is shown in Figure 5.
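The construction in the example above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the sets M_A through M_D are read off the partitions listed in the example, and the set for category E is an assumption (the text only states that E is dominated by A). Categories with identical type sets are not treated as dominating each other here, which is also an assumption.

```python
def non_dominated(likes):
    """Keep categories whose type set is not a strict subset of another's."""
    return {
        j: m
        for j, m in likes.items()
        if not any(m < other for other in likes.values())
    }


def coarsest_common_refinement(types, likes):
    """Group types by the exact set of categories they find relevant."""
    blocks = {}
    for i in sorted(types):
        signature = frozenset(j for j, m in likes.items() if i in m)
        blocks.setdefault(signature, []).append(i)
    return blocks


def information_dependency_graph(types, likes):
    """Map each non-dominated category j to the blocks S_k inside M_j."""
    nd = non_dominated(likes)
    blocks = coarsest_common_refinement(types, nd)
    return {
        j: [tuple(block) for sig, block in blocks.items() if j in sig]
        for j in nd
    }


types = {1, 2, 3, 4, 5, 6}
likes = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5},
    "C": {4, 6},
    "D": {5, 6},
    "E": {1, 2},  # assumed; the text only says E is dominated by A
}
nd = non_dominated(likes)
blocks = coarsest_common_refinement(types, nd)
graph = information_dependency_graph(types, likes)
print(sorted(nd), sorted(blocks.values()), graph)
```

With these inputs the non-dominated categories are A, B, C and D, and the blocks are (1,2), (3), (4), (5) and (6), matching the refinement stated in the example.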

[0094] The information dependency graph has an important property. Since all the category nodes are non-dominated, there does not exist a category node in the graph such that all the type nodes connected to it are also connected to some other category node.

[0095] A summary of the algorithm is as follows. First decide the optimal order in which the categories in J are presented to the user. Continue presenting the categories in this order as long as one continues to get a negative feedback, each time pruning off the node corresponding to the shown category along with the type nodes connected to it. The process of pruning a category node may lead to some other category node being dominated. It may also lead to a category node being redundant, which means that no type node is connected to it. Thus all such nodes have to be removed from the information dependency graph after the pruning of a node. If for any category the feedback is 1, then exhaust all the advertisers in that category and consider a new problem with the type set being the type nodes connected to this category and the set of ad categories being all the categories that are yet to be exhausted. Define the information graph for this new problem and perform the same procedure again, and so on. Thus the basic idea is that in each 'stage' you gain feedback from the non-dominated categories in an optimal order. Once you get a positive feedback from some category, you 'zoom in' to the next stage and restart the procedure. For any set W of user types with a probability distribution P, a set of ad categories E, and the feedback matrix Q, let V(W, P, E, Q) be the expected payoff in a session (from equation (2)) under the optimal allocation policy. The optimal order in which the categories in J are presented is decided through a dynamic program which makes recursive calls to this function V. To describe the algorithm, define a few functions as follows.

• A(W, E, Q): Given a set of user types W, a set of ad categories E and the feedback matrix Q, this function identifies the non-dominated category nodes and returns an information dependency graph.

• Φ(I, J): Given an information dependency graph I and a set of category nodes J, this function returns an information graph with all the nodes of categories in J stripped off. It also removes redundant category nodes and dominated category nodes that result from this removal.

• q(j, I): Given an information dependency graph I and a category node j, this function returns the set of user types attached to the category node j.

• Let g(I) be the set of all types and h(I) be the set of categories in an information dependency graph I.
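The helper functions listed above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of the disclosure: the graph is represented as a dictionary from category to the set of individual user types attached to it (rather than to blocks of the refinement), a category is treated as dominated when its type set is a strict subset of another's, and the names `prune`, `attached_types`, `all_types` and `all_categories` are hypothetical stand-ins for Φ, q, g and h.

```python
def prune(graph, shown):
    """Sketch of Phi(I, J): strip shown categories and their types.

    Afterwards drop redundant categories (no types left) and categories
    that have become dominated (strict subset of another category).
    """
    removed_types = set().union(*(graph[j] for j in shown))
    remaining = {
        j: m - removed_types for j, m in graph.items() if j not in shown
    }
    remaining = {j: m for j, m in remaining.items() if m}  # drop redundant
    return {
        j: m
        for j, m in remaining.items()
        if not any(m < other for other in remaining.values())
    }


def attached_types(j, graph):
    """Sketch of q(j, I): user types attached to category node j."""
    return set(graph[j])


def all_types(graph):
    """Sketch of g(I): all user types present in the graph."""
    return set().union(*graph.values())


def all_categories(graph):
    """Sketch of h(I): all category nodes present in the graph."""
    return set(graph)


# Type sets taken from the example partitions F_A, F_B, F_C, F_D.
graph = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {4, 6}, "D": {5, 6}}
after_a = prune(graph, {"A"})
after_c = prune(graph, {"C"})
print(after_a, after_c)
```

In this sketch, a negative feedback on category A removes types 1 to 4, after which B and C become dominated by D and only D remains. A negative feedback on category C instead leaves A and B, with D dominated by B.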

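Define the function V(W, P, E, Q).

Function V(W, P, E, Q):

• If |W| … categories in E such that q_j^i = 1. Let M' …

• Else, let J be the set of categories in A(W, E, Q). For each C ∈ 2^J, define U(C) iteratively by backward induction from |C| = |J| to |C| = 0 as follows:

U(J) = 0.

If … = 0 then U(C) = 0, otherwise

$$U(C) = \max_{j \in h(\Phi(I,C))} \left\{ \frac{P(i \in q(j,\Phi(I,C)))}{P(g(\Phi(I,C)))} \left[ \cdots + \beta^{m_j}\, V\!\left(q(j,\Phi(I,C)),\ \frac{P}{P(q(j,\Phi(I,C)))},\ J \setminus (C \cup \{j\}),\ Q\right) \right] + \beta \left(1 - \frac{P(q(j,\Phi(I,C)))}{P(g(\Phi(I,C)))}\right) U(C \cup \{j\}) \right\}$$

Return V(W, P, E, Q) = U(∅).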

Claims


1. A method to present advertisements to a viewer of a streaming video, the method comprising:

selecting a first class of advertisement to be presented in the streaming video;

arranging multiple advertisements of the first selected class into a first ordered list of decreasing value;

inserting a first advertisement of the first ordered list into the streaming video for presentation to the viewer;

requesting feedback from the viewer concerning the first advertisement of the first ordered list;

inserting a next advertisement from the first ordered list into the streaming video when the feedback is positive, the inserting step continuing sequentially until the first ordered list is exhausted, wherein feedback is requested only on the first advertisement of the first ordered list;

selecting a second class of advertisement upon exhaustion of the first ordered list, wherein the second class is also selected when the feedback is negative.

2. The method of claim 1, wherein the first class is a non-dominated equivalence class.

3. The method of claim 1, where the first class contains multiple categories of advertisement.

4. The method of claim 1, wherein arranging multiple advertisements of the first selected class into a first ordered list of decreasing value comprises arranging multiple advertisements where all of the advertisements of the first selected class are from different advertisers.

5. The method of claim 1, further comprising:

arranging multiple advertisements of the second selected class into a second ordered list of decreasing value;

inserting a first advertisement of the second ordered list into the streaming video for presentation to the viewer;

requesting feedback from the viewer concerning the first advertisement of the second ordered list; and

inserting a next advertisement from the ordered list into the streaming video when the feedback is positive, the inserting step continuing sequentially until the second ordered list is exhausted, wherein feedback is requested only on the first advertisement of the second ordered list.

6. The method of claim 1, wherein inserting a next advertisement from the first ordered list into the streaming video when the feedback is positive comprises inserting a next advertisement that is within a same category of advertisements of the first class of advertisements.

7. The method of claim 1, further comprising:

charging an advertiser for presentation of the first ordered list to the viewer.

8. The method of claim 7, wherein charging an advertiser for presentation of the first ordered list to the viewer comprises using a drop-out price.

9. The method of claim 7, wherein charging an advertiser for presentation of the first ordered list to the viewer comprises using a drop-out price with reserve price.

10. The method of claim 1, wherein the step of arranging multiple advertisements of the first selected class into a first ordered list of decreasing value comprises arranging the multiple advertisements of the first selected class according to bid amounts provided by various advertisers of the multiple advertisements of the first selected class.

11. The method of claim 1, wherein selecting a first class of advertisement to be presented in the streaming video comprises using a dynamic allocation policy.

12. The method of claim 1, wherein selecting a first class of advertisement to be presented in the streaming video comprises using a greedy allocation policy.

13. An apparatus that generates a streaming video for a viewer of a media device, the apparatus comprising:

a processor having access to memory, the processor acting to select a first class of advertisement to be inserted in the streaming video;

a database interface to access multiple advertisements included in the selected first class;

a network interface used to transmit the streaming video to a media device over a network, the streaming video having inserted advertisements, the inserted advertisements presented according to a first ordered list of advertisements of the first class of advertisement, the network interface also used to request and receive feedback from the media device concerning a first advertisement in a category of advertisements;

wherein the processor inserts a next advertisement from the first ordered list into the streaming video when the feedback is positive, the inserting step continuing sequentially until the first ordered list is exhausted, wherein feedback is requested only on the first advertisement of the first ordered list, and wherein the processor selects a second class of advertisement upon exhaustion of the first ordered list, wherein the second class is also selected when the feedback is negative.

14. The apparatus of claim 13, wherein the processor further acts to arrange multiple advertisements of the second selected class into a second ordered list of decreasing value, and inserts a first advertisement of the second ordered list into the streaming video for presentation to the viewer.

15. The apparatus of claim 14, wherein the processor further acts to request feedback from the media device concerning the first advertisement of the second ordered list, and inserts a next advertisement from the ordered list into the streaming video when the feedback is positive, the insertion of a next advertisement continuing sequentially until the second ordered list is exhausted, wherein feedback is requested only on the first advertisement of the second ordered list.

16. The apparatus of claim 15, wherein the insertion of a next advertisement from the first ordered list into the streaming video when the feedback is positive comprises inserting a next advertisement that is within a same category of advertisements of the second class of advertisements.