8 January 2024

T 0183/21 - Inventive training method for machine learning

Key points


  • The application relates to a system for recommending content in a video-on-demand platform. The system is trained and retrained.
  • The application states that "providing data from a huge number of users to retrain a recommender system presents challenges in that it takes up system resources." Moreover, "training activities have a high computational cost for the recommender system. Thus, the need to retrain must be balanced against the quality of recommendations being provided."
  • The claim specifies, essentially, measuring the performance, comparing it to a desired performance level, and using the comparison result to set the amount of training data provided to the recommender system. Moreover, the recommender system is trained for each user separately.
  • "[t]he recommender input controller 16 seeks to adapt the type and amount of usage data provided as training data to the recommender system 18 for each individual client device 22 to provide the minimum amount of data to drive the recommender system 18 towards the predetermined level of recommendation performance yref for each client device 22"
  • " there is a positive correlation between the amount of training data specified by the control parameter and the measured performance metric received in the subsequent iteration"
  •  "In the subsequent iteration, training data are derived from "usage data" from the client device associated with a user to which the recommendations are provided. The amount of training data derived from the usage data is based on the generated value or values of the control parameter"
  • " The technical effect of the distinguishing features listed under point 8.1 is that the use of network bandwidth required to provide the training data to the recommender system is minimised, as is the amount of storage necessary for storing said training data in the communications system including the client device and the recommender system"
  • "The board has come to the conclusion that this technical effect is achieved, on average, over substantially the whole scope of the claim (see, for example, G 0001/19, point 82)."
  • " Since achieving a maximum performance metric of the recommender system is of paramount importance in the method of document D1, the skilled person would not use a "reference performance metric" which might be different from [i.e. lower than] a "maximum achievable performance metric", and would have no motivation to consider using a closed-loop control algorithm as claimed."
  • "Starting from the disclosure of document D1, if the reference performance metric is exceeded, the skilled person would stop changing the amount of training data but would not decrease it, so that the measured performance metric oscillates towards the reference performance metric" [i.e. the possibility of decreasing the amount of training data results in oscillation of the performance].
  • The board therefore considers the subject-matter of claim 10 and that of the corresponding claims 1 and 14 of the main request to be inventive (Article 56 EPC). It follows that the decision of the examining division is to be set aside.
EPO 
The link to the decision is provided after the jump, as well as (an extract of) the decision text.




Application

1. The application relates to controlling a recommender configured to provide up-to-date predictions of user preferences for products within a large set, for example within a Video on Demand (VOD) catalogue (see the description as published, page 1, lines 3 to 7).

2. The description of the application states that providing data from a huge number of users to retrain a recommender system presents challenges in that it takes up system resources. For example, transferring user preference information from a large number of client devices takes up bandwidth within the communications network connecting the client devices to the recommender system. Moreover, training activities have a high computational cost for the recommender system. Thus, the need to retrain must be balanced against the quality of recommendations being provided (see page 2, lines 15 to 21).

3. The type and amount of training data provided to the recommender system is thus controlled in order to drive the performance of the recommender system towards a desired performance value (or to be maintained within a desired performance range). The desired performance value may not be the optimum level achievable, but may be a level available to the majority (if not all) of the users of the services being offered. The performance of the recommendations is determined using the user interaction data in the system shown in Figure 1 (see page 10, lines 4 to 11).

4. A model of a damped harmonic oscillator can be used to model the recommender system (see page 36, lines 7 to 8).
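
[Editorial note, not part of the decision text: the application's equations are not reproduced in this extract. As a rough sketch of what modelling the recommender system as a damped harmonic oscillator can mean, the controller may be tuned so that the tracking error e(t) = y_ref - y(t) approximately follows the standard second-order form

    \ddot{e}(t) + 2\zeta\omega_0\,\dot{e}(t) + \omega_0^2\,e(t) = 0,

so that, depending on the damping ratio \zeta, the measured performance either converges smoothly towards y_ref or oscillates around it with decaying amplitude. The mapping of e(t), \zeta and \omega_0 onto the quantities of the application is an assumption made here for illustration only.]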

Inventive step - claim 10 of the main request

5. Lack of novelty of feature (A)

5.1 Claim 10 specifies a method of automatically controlling the performance of a recommender system in a communications system. The communications system includes a client device associated with a user to which the recommendations are provided (feature (A) of claim 10 of the main request).

5.1.1 Document D1 discloses a product recommendation system in which client computers 320, 320a and a server computer 322 are connected to communications network 380 by way of communications interfaces 382 (see paragraphs [0068], [0069] and Figure 3).

5.1.2 In the method disclosed by document D1, consumers use client computers 320, 320a to communicate subjective and/or objective consumer data 310 to server 322. Server 322 then acts upon and/or stores the consumer data in data storage element 370. Server 322 uses the consumer data as well as other information stored in storage element 370 to generate product recommendations 314. The product recommendations 314 are delivered over communications network 380 for presentation to the consumer at the requesting client computer 320, 320a (see paragraph [0078]). Figure 3 of document D1 is reproduced below:

[Figure 3 of document D1 - not reproduced in this extract]

5.1.3 Therefore, document D1 discloses "a recommender system in a communications system, the communications system including a client device associated with a user to which the recommendations are provided", i.e. a part of feature (A).

5.1.4 In the method of document D1, product recommendations and ancillary information are generated to periodically improve the accuracy of the recommendations (see paragraph [0003]). The product recommendation engine of document D1 may utilise a neural network (see paragraph [0082]). Where the recommendation engine utilises a neural network, predictions and actual consumer responses to product use are used periodically to re-train the algorithms residing in the hidden layers so that its future outputs (e.g., product recommendations) correlate more closely with the consumer feedback (see paragraph [0084]).

5.1.5 Thus document D1 also discloses the remaining part of feature (A), i.e. a method of automatically controlling the performance of the recommender system in a communications system.

6. Lack of novelty of feature (B)

6.1 In the method of claim 10, a "measured performance metric" of the recommender system derived from "a combination of one or more recommendations previously provided by said recommender system to a user and usage data from the client device associated with the user to which the recommendations were provided" is iteratively received (feature (B)).

6.2 In the method of document D1, a consumer's experience with a product (for example a skin care product such as soap, see paragraphs [0032] and [0066]) is recorded in terms of preference and/or performance metrics. Preference reflects the user's overall experience, and may include factors related to any perceived improvement in the consumer's various concerns, as well as more subjective aesthetic factors. Performance rates the extent to which a product reduced the signs or other conditions or symptoms associated with each concern in a category and may comprise subjective and objective components. Diagnostic data may be obtained from one or more measurement tools that measure a property related to a concern of the consumer (see paragraphs [0092] to [0094]). The use of diagnostic tools enables objective measurements that help dimension the needs levels of consumers to be obtained (system input) and/or the responses of a substrate to a particular product to be tracked (performance feedback) (see paragraph [0099]).

During the retraining of the product recommendation engine of system 1200, the product recommendations 1204 are compared to actual consumer feedback 1208 in order to adjust product attributes 1201 (see paragraph [0170] and Figure 12, as well as paragraph [0100] and Figure 7).

In the forward or recommending aspect of the method, the state or condition and any historical diagnostic responses of a substrate measured with the devices may be used to generate product recommendations. In the reverse or retraining aspect of the method, the objective measurements of substrate responses to products may be used to retrain the product recommendation engine, which may include product attribute refinement, and/or to update consumer profiles (see paragraph [0101]).

6.3 The board has mapped the "one or more recommendations" of feature (B) of claim 10 to the "product recommendations" feature of document D1 and the "usage data" of feature (B) of claim 10 to one or more of the "performance feedback" and "consumer feedback" features of document D1.

6.4 Therefore, document D1 also discloses feature (B).

7. Recommending products is not generally recognised as having technical character

7.1 The board notes that recommending products is not generally recognised as having technical character (see T 1869/08, Reasons 2.6 to 2.10, and T 0306/10, Reasons 5.2). Based on the minutes of the oral proceedings before the examining division, point 2, the appellant agreed that recommending products does not have technical character; it argued that the purpose of the invention was rather to limit the amount of resources used.

8. The distinguishing features of claim 10 of the main request over document D1

8.1 Features C to H constitute the distinguishing features of claim 10 of the main request over document D1.

8.1.1 The method of claim 10 differs from the method disclosed by document D1 in that a "predetermined reference performance metric (yref)" of the recommender system is compared with the "received measured performance metric" of a previous iteration (y(ti-1)) (feature (C)). In this way, a "difference value" of the previous iteration (e(ti-1)) is determined.

The description of the application states that the "predetermined reference performance metric (yref)" is received from a service platform or service provider (see the description of the published application, page 31, lines 1, 2 and 18 to 24; Figure 8). The board notes that achieving this "predetermined reference performance metric (yref)" is not a technical purpose but is instead part of a requirements specification, because the term "performance of a recommender" relates here to the relevance to the user(s) of the recommendation(s) provided and not to a performance related to a technical effect. According to the description, the term "performance level" or "metric" is used to represent a measurable performance characteristic such as the percentage of recommended items being interacted with in some way by the user, e.g. one out of 100 recommendations was selected by a user (see description, page 11, lines 15 to 18).

Moreover, the "predetermined reference performance metric (yref)" may not be the optimum level achievable (see the description, page 10, lines 4 to 8).
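
[Editorial sketch, not part of the decision text: a minimal Python illustration of the measured performance metric and the difference value discussed in point 8.1.1 above. The metric is computed as the share of previously recommended items the user interacted with (cf. "one out of 100 recommendations was selected by a user"); the function and variable names are assumptions made for illustration, not taken from the application.]

def measured_performance(recommended_items, interacted_items):
    """Fraction of previously recommended items the user interacted with."""
    if not recommended_items:
        return 0.0
    interacted = set(interacted_items)
    hits = sum(1 for item in recommended_items if item in interacted)
    return hits / len(recommended_items)

y_prev = measured_performance(["a", "b", "c", "d"], ["c"])  # y(t_{i-1}) = 0.25
y_ref = 0.20                                                # predetermined reference performance metric
e_prev = y_ref - y_prev                                     # difference value e(t_{i-1}) = -0.05

[In this toy example the measured performance exceeds the reference, so the difference value is negative and, via the closed-loop control algorithm discussed next, the amount of training data for the following cycle would be reduced.]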

8.1.2 In the method of claim 10, the difference value of the previous iteration (e(ti-1)) is input to a closed-loop control algorithm in order to generate a value or values of a control parameter or parameters (u(ti)) (feature (D)).

Equation (6) of the description and Figure 9 of the application are reproduced below. These illustrate an example of control parameter(s) (u(ti)) (or number of (new) training samples):

[Equation (6) of the description - not reproduced in this extract]

[Figure 9 of the application - not reproduced in this extract]

In this example, a PID (proportional, integral, differential) closed-loop controller is used with three control variables (referred to below as C, B and D for the proportional, integral and differential control variables respectively), the values of C, B and D preferably being chosen using a model of a damped harmonic oscillator to model the dynamic performance of the recommender system (see the description, page 31, line 26, to page 37, line 32; see also page 10, lines 29 to 32).

This is based on the analysis which reveals that the rate of change of performance is proportional to the number of new training samples u(t) entering the system (see page 32, lines 3 to 5). In fact, users continue to rate items over time (see page 31, lines 5 to 7).
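
[Editorial sketch, not part of the decision text: since Equation (6) is not reproduced in this extract, the following Python snippet shows a generic discrete PID control step using the board's C, B and D naming for the proportional, integral and differential gains. It is meant only to illustrate feature (D) - turning the difference value e(t_{i-1}) into a value u(t_i) controlling the amount of new training data for the next cycle; the class, the example gain values and the rounding to a non-negative integer are assumptions made for illustration.]

class PIDController:
    """Generic discrete PID controller (an illustrative sketch, not Equation (6))."""

    def __init__(self, C, B, D):
        # proportional, integral and differential gains
        self.C, self.B, self.D = C, B, D
        self.integral = 0.0
        self.prev_error = 0.0

    def control(self, error, dt=1.0):
        """Return u(t_i), the number of new training samples for the next cycle."""
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        u = self.C * error + self.B * self.integral + self.D * derivative
        return max(0, round(u))  # a negative amount of training data makes no sense

controller = PIDController(C=500.0, B=50.0, D=100.0)  # illustrative gain values only
u_next = controller.control(error=0.05)               # error = y_ref - y(t_{i-1})

[Because of the positive correlation between the amount of training data and the resulting performance (feature (F)), a measured performance below y_ref (positive error) leads to more training data and hence a higher metric in the next cycle, while a performance above y_ref leads to less.]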

8.1.3 The control parameter or parameters (u(ti)) control(s) the recommender system (18) in such a way as to cause the difference value in a subsequent iteration to tend towards zero, or, in other words, the measured performance metric of the recommendation system in the subsequent iteration (y(ti)) tends towards the predetermined reference performance metric (yref). This is achieved by providing the calculated amount u(ti) of training data to the recommender system in the subsequent iteration (feature (E) and the part of feature (F) after the phrase "such that").

The board refers to page 31, line 5, of the description of the application, which states that the amount of data needed to drive y(t) towards yref is determined, and also to page 31, lines 12 to 16, which states that "[t]he recommender input controller 16 seeks to adapt the type and amount of usage data provided as training data to the recommender system 18 for each individual client device 22 to provide the minimum amount of data to drive the recommender system 18 towards the predetermined level of recommendation performance yref for each client device 22" (emphasis added by the board).

8.1.4 The training data have the characteristic that there is a positive correlation between the amount of training data specified by the control parameter and the measured performance metric received in the subsequent iteration (y(ti)) (the remaining part of feature (F)).

The board notes that the "iteration" corresponds to the "recommender cycle" which is the "repeatable process by which the recommender system is trained". The duration of a recommender cycle is the time interval which elapses between recommendations being generated (see description, page 3, lines 31 to 35).

The board also notes that a "positive correlation" exists between two variables when one variable decreases as the other variable decreases, or when one variable increases while the other increases. The term "positive correlation" was first used in dependent claims 2 and 12 as originally filed. The description states that "[t]he amount of training data submitted may be increased or decreased by specifying that only data of a certain level of ranking (in terms of likeliness of indicating a user preference for some particular content) is submitted, and then increasing or lowering the ranking of data records and/or data categories assigned according to their likely usefulness to the recommender system which are sent to the recommendation system so that the recommendations are driven towards a desired level of accuracy in terms of predicting future user preferences" (see the description, page 5, lines 9 to 16; see also page 4, lines 4 to 9, and page 10, lines 4 to 8).

The description also states that by means of a modelling of the dynamic variation of the recommendation performance "the recommender input controller 16a adjusts the amount of training data it provides to drive the recommender system 18 in cycle i to provide recommendations in cycle i+1 which have a lower performance (e.g. they are less accurate recommendations) if in the i-1 cycle the recommendations for that user were determined to be above a reference accuracy/performance value (yREF), and to drive the recommendations provided towards an improved performance level in cycle i+1 if in cycle i-1 the recommendations were less accurate than specified by the reference accuracy value (yREF) (i.e. if the measured recommender system performance is below the reference performance level yREF)" (see page 10, line 29 to page 11, line 7).

8.1.5 In the subsequent iteration, training data are derived from "usage data" from the client device associated with a user to which the recommendations are provided. The amount of training data derived from the usage data is based on the generated value or values of the control parameter or parameters (u(ti)) (feature (G)).

The board notes that the "training data" are thus data chosen according to their estimated usefulness for training the recommender system as to the user's preferences. In other words, the client device 22 processes stored "user interaction data" to generate "likeness records" (specifying an indication of a heuristic user preference towards a particular item) and to categorise and/or rank the relevance of each item or record of usage data or each generated likeness record, i.e. to determine the (possibly relative) estimated usefulness of such an item or record for training the recommender system as to that user's preferences (see the description, page 14, lines 9 to 14, and page 14, line 16, to page 15, line 5).

In other words, recommender input controller 16 is performing two separate functions based on the received usage data/heuristic data records/likeness records, namely calculating a performance metric and selecting a portion of the data to be input to the recommender system as training data (see the description, page 21, lines 1 to 4).

Moreover, a ranking condition for forwarding, for example, a minimum ranking level, is imposed on all of the user interaction data records by the input controller to ensure that only data records which have been assigned a category and/or ranking in terms of their relevance above a dynamically adjustable level are forwarded to the recommender system. This ranking condition for forwarding, however, is capable of being individually determined for each individual user interaction data record set, i.e., for each user, which enables a varying amount of data to be provided to the recommender system to retrain its performance for the respective user (see the description, page 27, lines 26 to 34).
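
[Editorial sketch, not part of the decision text: a minimal Python illustration of feature (G) as described in point 8.1.5 - forwarding only the highest-ranked likeness records as training data, with the amount of forwarded data driven by the controller output u(t_i). The record format, the example rankings and the "take the u highest-ranked records" strategy are assumptions made for illustration.]

def select_training_data(likeness_records, u):
    """Return at most u records, taken from the highest rankings downwards.

    likeness_records: list of (record, ranking) pairs, where a higher ranking means
    the record is estimated to be more useful for training the recommender system.
    """
    ranked = sorted(likeness_records, key=lambda pair: pair[1], reverse=True)
    return [record for record, _ranking in ranked[:max(0, u)]]

usage_records = [("watched item 42 to the end", 0.9),
                 ("skipped item 7 after 10 seconds", 0.4),
                 ("browsed the drama category", 0.6)]
training_batch = select_training_data(usage_records, u=2)  # then provided to the recommender system (feature (H))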

8.1.6 The derived training data are provided to the recommender system (18) via the communications system (feature (H)).

9. The technical effect of the distinguishing features

9.1 The technical effect of the distinguishing features listed under point 8.1 is that the use of network bandwidth required to provide the training data to the recommender system is minimised, as is the amount of storage necessary for storing said training data in the communications system including the client device and the recommender system (see the description, page 5, lines 8 and 9; see also feature (A)). The amount of training data is indirectly limited via the tendency/convergence of the measured performance metric towards, or oscillation around, the predetermined level of recommendation performance yref, which is not necessarily the maximum achievable level of recommendation performance.

9.2 The board has come to the conclusion that this technical effect is achieved, on average, over substantially the whole scope of the claim (see, for example, G 0001/19, point 82).

10. The objective technical problem to be solved

10.1 The objective technical problem to be solved is therefore to reduce the use of network bandwidth and the amount of storage in a communications system including a client device and a recommender system in communication with the client device.

11. Obviousness of the claimed solution

Starting from document D1, the skilled person would not arrive at the subject-matter of claim 1 of the main request for the following reasons:

11.1 Document D1 discloses that "[i]n embodiments of the invention that periodically examine the quality of predictions, the neural network operating on all available inputs can find better predictive models for each output parameter" (D1, paragraph [0163]).

Therefore, document D1 aims at finding better predictive models and thus at achieving a maximum performance (metric) of the recommender system.

11.2 Document D1 also discloses that "[t]he data processing algorithms of the invention are re-trained to reduce the differences between actual feedback and earlier predictions" [note from the board: of feedback or preferences and/or performances] (see paragraphs [0168] and [0169]).

Therefore, document D1 always strives to achieve a maximum performance (metric) of the recommender system.

11.3 Since achieving a maximum performance metric of the recommender system is of paramount importance in the method of document D1, the skilled person would not use a "reference performance metric" which might be different from a "maximum achievable performance metric", and would have no motivation to consider using a closed-loop control algorithm as claimed.

11.4 However, even if the skilled person were to use such a "reference performance metric", they would not be able, without exercising inventive skills, to derive the amount of training data and also the specific training data per se (i.e. implicitly the nature of the training data or the chosen training data, see point 8.1.5), from a "positive correlation" with the measured performance metric (yielding either an increasing or decreasing amount of training data) and usage data respectively.

11.5 But the board is of the opinion that the skilled person would incrementally increase the amount of training data until the reference performance metric is at least almost achieved. Starting from the disclosure of document D1, if the reference performance metric is exceeded, the skilled person would stop changing the amount of training data but would not decrease it, so that the measured performance metric oscillates towards the reference performance metric.

11.6 The board believes that a skilled person reading claim 1 with a mind willing to understand and in the light of the description (in particular the passages cited above), would also understand claim 10, and therefore claim 10 is clear (see Article 84 EPC).

12. The board therefore considers the subject-matter of claim 10 and that of the corresponding claims 1 and 14 of the main request to be inventive (Article 56 EPC). It follows that the decision of the examining division is to be set aside.

13. The board has not, however, examined the dependent claims. Consequently, the case is to be remitted to the department of first instance for further prosecution (Article 111(1) EPC).
