15 January 2025

T 1425/21 - Would improved machine learning be technical?

Key points

  • This decision was published in April 2024, but I missed it then. Credit to Phyllis Luana Graf for flagging it on LinkedIn.
  • The invention concerns machine learning; the applicant is Google. The parent application, filed in 2015, was granted.
  • "The application relates to machine learning models such as deep neural networks. It proposes to approximate "cumbersome" machine learning models with "distilled" machine learning models which require less computation and/or memory when deployed. For instance the distilled model may be a neural network with fewer layers or fewer parameters. The cumbersome model may be an ensemble classifier, possibly combining full classifiers with specialist classifiers. The distilled model is trained on a "plurality of training inputs" and the associated outputs of the cumbersome model, so as to "generate outputs that are not significantly less accurate than outputs generated by the cumbersome machine learning model"
  • "The Board notes that the features differentiating the invention from D1, or even the entire set of features defining the distilled model and its training, as a difference to a known cumbersome learning model, are mathematical methods which cannot, under the established case law of the boards of appeal (the "COMVIK" approach), be taken into account for inventive step unless they contribute in a causal manner to a technical effect."
  • "[The Board] accepts that the distilled model has reduced memory requirements when compared to the cumbersome model; after all this is expressly claimed. However, a reduction in storage or computational requirements of a machine learning model is insufficient, by itself, to establish a technical effect. One also has to consider the performance of the "reduced" learning model"
    • The Board could have explained its reasoning in more detail. Is a technical effect the same as technical progress? Is it a generally accepted principle that a technical advantage constitutes no "technical effect" if it comes with a technical disadvantage? E.g., for a chemical reactor (my technical field), does a higher throughput constitute no technical effect if the reactor is more expensive (or vice versa)? Or is this a rule only for software?
  • "It is not credible in general that any model with fewer parameters can be as accurate as the more complex one it is meant to replace. For example, the complexity or architecture of the reduced model may be insufficient or inadequate for the given problem."
  • "The Board also does not see that the temperature-based training process ensures that the smaller model has an equivalent accuracy. It is not clear how exactly the temperature must be first set (for both models), and then varied, and what accuracy may be expected. The application simply does not discuss this."
    • "The soft outputs [of the distilled model] represent a class probability obtained according to a form of the softmax equation using a "temperature" parameter T, which is set higher during training than during subsequent use." [The softmax equation uses e^(x/T) in place of e^x.]
    • I can't evaluate the Board's technical assessment, of course. Apparently, there was not a lot of evidence on the file.
  • "In principle, it appears possible to argue that the smaller model represents a "good" trade-off between resource requirements and accuracy, i.e. that the smaller model may be less accurate but have (predictably) smaller resource requirements. However, the application lacks any information in that regard."
  • Is the decision consistent with T 1952/21?
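For readers unfamiliar with the temperature trick the Board discusses: raising T in the softmax "softens" the cumbersome model's output distribution, which is what the application proposes to train the distilled model on. A minimal sketch (the function name and the example logits are my own illustration, not taken from the application or the decision):

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Temperature-scaled softmax: normalises e^(x/T) over the classes.

    With T > 1 the distribution is softened, exposing the relative
    probabilities the cumbersome model assigns to the "wrong" classes,
    which the distilled model is then trained to reproduce.
    """
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [3.0, 1.0, 0.2]
print(softmax_with_temperature(logits, T=1.0))   # sharply peaked on class 0
print(softmax_with_temperature(logits, T=10.0))  # softened, closer to uniform
```

Note that nothing in this arithmetic guarantees the accuracy of the distilled model, which is exactly the gap the Board identifies in reasons 19.1 below.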

The link to the decision and an extract of it can be found after the jump.


16. The Board notes that the features differentiating the invention from D1, or even the entire set of features defining the distilled model and its training, as a difference to a known cumbersome learning model, are mathematical methods which cannot, under the established case law of the boards of appeal (the "COMVIK" approach), be taken into account for inventive step unless they contribute in a causal manner to a technical effect.

17. It accepts that the distilled model has reduced memory requirements when compared to the cumbersome model; after all this is expressly claimed. However, a reduction in storage or computational requirements of a machine learning model is insufficient, by itself, to establish a technical effect. One also has to consider the performance of the "reduced" learning model (see decision T 702/20, reasons 14.1, from this same Board).

18. It is not credible in general that any model with fewer parameters can be as accurate as the more complex one it is meant to replace. For example, the complexity or architecture of the reduced model may be insufficient or inadequate for the given problem.

19. The Board disagrees with the Appellant's counter-argument that the invention (by "knowledge transfer" see point 8 above) reliably ensures that any given smaller network can provide the same accuracy as the given larger one. The input and output complexity is the same for both networks. Hence, also the smaller network must be complex enough to be able to model the input-output relationship (see e.g. D1, section 4.3, for a discussion on accuracy and complexity of approximating classifiers of a single type).

19.1 The Board also does not see that the temperature-based training process ensures that the smaller model has an equivalent accuracy. It is not clear how exactly the temperature must be first set (for both models), and then varied, and what accuracy may be expected. The application simply does not discuss this.

19.2 Since, in the Board's view, the claim does not imply a step of selecting or obtaining a smaller model, but simply defines one as a given, the Appellant's arguments relating to trial and error are not pertinent (and even if they were, they would not succeed, see below).

20. The Board concludes therefore that the technical effect advanced by the Appellant (see point 11 above) cannot be acknowledged over the whole scope of the claim, i.e. for all sets of smaller and larger models. The second model may use fewer resources, but it cannot be said to produce the same results and many smaller models will, in fact, be considerably worse.

20.1 In principle, it appears possible to argue that the smaller model represents a "good" trade-off between resource requirements and accuracy, i.e that the smaller model may be less accurate but have (predic­tably) smaller resource requirements. However, the application lacks any information in that regard.

20.2 Since no technical effect can be acknowledged, claim 1 of the main request lacks an inventive step.
