8 September 2016

T 0022/12 - Complexity of algorithm

Key points

  • The refused application relates to spam detection for emails. The claim specifies the use of a specific mathematical method ("SVM"). 
  • The Board finds that "the classification of messages as a function of their content is not technical per se. In this regard, it is immaterial whether the messages are electronic messages, because, even though an email has technical properties, it is the content of the email that is classified. Furthermore, mathematical methods as such are not technical and the application of a mathematical method as such in a non-technical analysis of message content does not change that."
  • The Board applies established case law, and holds that it "does not consider that reducing the complexity of an algorithm is necessarily a technical effect, or evidence of underlying technical considerations. That is because complexity is an inherent property of the algorithm as such. If the design of the algorithm were motivated by a problem related to the internal workings of the computer, e.g. if it were adapted to a particular computer architecture, it could, arguably, be considered as technical []. However, the Board does not see any such motivations in the present case."

T 0022/12 - link

Reasons for the Decision
1. The invention
1.1 The invention concerns the classification of emails, e.g. as either spam or legitimate mail (page 10, lines 18 to 21 of the published application).
1.2 An incoming email is first analysed to determine whether it contains one or more features in a set of predetermined features that are particularly characteristic of spam (page 10, line 28 to page 11, line 3). Two types of feature are used: word-oriented and "handcrafted". The former refers to the presence of particular words, or stems of words, the latter to features determined through human judgement alone (page 22, line 25 to page 23, line 26). Examples of handcrafted features are multi-word phrases and non-word distinctions such as formatting attributes, sender address, and delivery attributes. For instance, most spam messages are sent at night from ".com" or ".net" domains (page 23, lines 11 and 18 - 19).
1.3 An N-dimensional feature vector, with one element for each feature in the set, is produced for each email (page 23, lines 28 to 30). The feature vector is input to a probabilistic classifier which generates an "output confidence level". The output confidence level is then compared with a threshold to produce an indication of whether or not the email is spam (page 24, lines 3 to 16).
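For readers less familiar with this kind of pipeline, the steps in points 1.2 and 1.3 can be sketched roughly as follows. This is only an illustration, not the applicant's implementation: the feature lists, the email dictionary layout, the "before 6 am" rule and the function names are all assumptions made for the example, and the confidence value is passed in by hand rather than computed by a classifier.

```python
import re

# Hypothetical word-oriented features (presence of single words).
WORD_FEATURES = ["free", "winner", "meeting"]


def handcrafted_features(email):
    """Binary handcrafted features; the concrete choices are illustrative only."""
    body = email["body"].lower()
    sender = email["sender"].lower()
    return [
        1 if "weight loss" in body else 0,              # multi-word phrase
        1 if sender.endswith((".com", ".net")) else 0,  # sender address
        1 if email["hour_sent"] < 6 else 0,             # delivery attribute: sent at night
    ]


def feature_vector(email):
    """One element per feature in the set: word-oriented first, then handcrafted."""
    words = set(re.findall(r"[a-z]+", email["body"].lower()))
    word_part = [1 if w in words else 0 for w in WORD_FEATURES]
    return word_part + handcrafted_features(email)


def is_spam(confidence, threshold=0.5):
    """Compare the classifier's output confidence level with a threshold."""
    return confidence >= threshold


email = {
    "sender": "promo@deals.com",
    "hour_sent": 3,
    "body": "You are a WINNER! Free weight loss plan.",
}
x = feature_vector(email)
print(x)  # feature vector for the example email
```

The sketch shows why the Board treats the two feature types as separate classes: the handcrafted entries are simply appended to the vector and do not shorten the word-oriented part.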
1.4 In the claims of all the requests, the probabilistic classifier is a Support Vector Machine (SVM). An SVM is a learning algorithm for assigning objects one of two distinct classes. It is trained using objects with known classifications (for example, a set of emails that have been classified manually) to define a hyperplane, or hypersurface, that provides maximum separation between the two classes in feature vector space (page 39, lines 23 to 28). Objects to be classified are mapped to that same space and assigned to a class based on which side of the hyperplane they fall on.
SVMs can be implemented efficiently, because the hyperplane can normally be defined in terms of a small subset of feature vectors.
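The remark that the hyperplane can be defined by a small subset of feature vectors can be made concrete with a minimal sketch of a linear SVM decision function. The support vectors, coefficients and bias below are hand-picked toy values, not the result of any training run, and the function names are assumptions for the example.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def decision_value(x, support_vectors, coeffs, bias):
    """f(x) = sum_i coeff_i * <s_i, x> + bias, summed over the support vectors only."""
    return sum(c * dot(s, x) for s, c in zip(support_vectors, coeffs)) + bias


def classify(x, support_vectors, coeffs, bias):
    """Assign class +1 or -1 according to the side of the hyperplane x falls on."""
    return 1 if decision_value(x, support_vectors, coeffs, bias) >= 0 else -1


# Toy two-dimensional example: one support vector per class.
# Each coefficient is (multiplier * class label) for that support vector.
support_vectors = [[1.0, 1.0], [0.0, 0.0]]
coeffs = [1.0, -1.0]
bias = -1.0

print(classify([2.0, 2.0], support_vectors, coeffs, bias))
print(classify([0.2, 0.1], support_vectors, coeffs, bias))
```

Only the support vectors enter the sum, so the cost of classifying a new object depends on their number rather than on the size of the training set.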
2. Main request - inventive step (Article 56 EPC)
2.1 The Examining Division considered that the invention, in claim 1 according to the main request before it, did not involve an inventive step over a notoriously known computer, since the only contribution was to a field excluded from patentability under Article 52(2) EPC, namely mathematical algorithms used for the purpose of linguistic analysis.
2.2 The Board agrees with the Examining Division that the classification of messages as a function of their content is not technical per se. In this regard, it is immaterial whether the messages are electronic messages, because, even though an email has technical properties, it is the content of the email that is classified. Furthermore, mathematical methods as such are not technical and the application of a mathematical method as such in a non-technical analysis of message content does not change that. Thus, if there is a technical effect, it can only reside in the automation of the email classification using a computer. The technicality of the computer is not enough to establish a technical effect of any method that it executes.
2.3 The appellant argued that classification based on a combination of "handcrafted features" and "word-oriented features", in claim 1 of the main request, had the technical effect of reducing processing load. In the prior art, for example in A1, where the classification was based on automatically-detectable, word-oriented features, more processing would be required to extract the same information as that provided by the handcrafted features. For example, detecting the phrase "weight loss" (a phrase that seems to occur frequently in spam email), using only word-oriented features, would require the evaluation of all combinations of two words. Since the handcrafted features allowed a more compact representation of features, they contributed to a less complex computer implementation.
2.4 The Board is not persuaded that the alleged effect is actually achieved by the invention. There is no link between the word-oriented and handcrafted features, so that the latter reduces the processing involved in the former. The handcrafted features are, rather, a different class of features that the user considers indicative of spam, but which cannot be expressed in terms of the presence of individual words. Simply adding a second class of features to the analysis increases the load rather than reducing it.
Furthermore, the Board does not consider that the de-automation of a computer-implemented method, by making a human perform steps that a computer could do automatically, is a technical solution to a technical problem. Any reduction in computer processing would be a mere consequence of the de-automation.
2.5 In the Board's view, handcrafted features relate to information content that is considered as indicative of spam. Including such features in the analysis might, if well chosen, improve the quality of the classification, but the designation of a second class of features does not provide a technical effect.
2.6 It is common ground that the SVM, being a mathematical method, does not, as such, provide a technical effect. The appellant argued, however, that there was a technical effect in the particular combination of an SVM and a sigmoid function, as in claim 1. Performing the method in two stages, first using an SVM, and then applying an adjustable sigmoid function as a threshold to the output of the SVM, reduced the processing load, which reduced the complexity of the computer implementation. Thus, the invention was motivated by technical considerations of the computer implementation.
2.7 The Board is not persuaded by the appellant's arguments on this point. The Board does not find support, anywhere in the application, for the classifier being updated by adjusting the sigmoid parameters alone, without retraining the SVM.
As shown in figure 3B, the generation of parameters for the classifier during the training phase involves two steps (page 39, lines 6 to 15):
1) first the weight vector w is determined by conventional SVM training methods (page 39, line 18 to page 50, line 22);
2) second the optimal sigmoid parameters are calculated by using a maximum likelihood on the training data (page 50, line 24 to page 54, line 2).
There is nothing to suggest that re-training may involve only one of those steps, or that the classifier may be updated by simply adjusting the parameters A and B. On the contrary, it is the teaching of the application that, when the conditions of what is considered as spam change (e.g. when the user reclassifies a message) the whole classifier is retrained (page 36, lines 14 to 30).
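The second training step can be illustrated with a short sketch. The decision only says that sigmoid parameters A and B are fitted by maximum likelihood on the training data; the specific form P = 1 / (1 + exp(A*f + B)), the plain gradient-descent fit, the learning rate and the toy data below are assumptions made for the example, and step 1 (SVM training) is taken as already done, so the raw SVM outputs f are given.

```python
import math


def sigmoid_confidence(f, A, B):
    """Map a raw SVM output f to a confidence level in (0, 1) (assumed form)."""
    return 1.0 / (1.0 + math.exp(A * f + B))


def fit_sigmoid(outputs, labels, lr=0.1, steps=500):
    """Fit A and B by gradient descent on the negative log-likelihood.

    outputs: raw SVM outputs for the training emails
    labels:  1 for spam, 0 for legitimate mail
    """
    A, B = 0.0, 0.0
    for _ in range(steps):
        grad_A = grad_B = 0.0
        for f, y in zip(outputs, labels):
            p = sigmoid_confidence(f, A, B)
            # Derivative of the negative log-likelihood with respect to (A*f + B) is (y - p).
            grad_A += (y - p) * f
            grad_B += (y - p)
        A -= lr * grad_A
        B -= lr * grad_B
    return A, B


# Toy SVM outputs with overlapping classes (hand-made, not real data).
outputs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 1, 0, 1, 1]

A, B = fit_sigmoid(outputs, labels)
print(sigmoid_confidence(2.0, A, B) > sigmoid_confidence(-2.0, A, B))
```

In this sketch the sigmoid fit takes the SVM outputs as input, which matches the Board's reading that the two steps form one training procedure: if the SVM is retrained, the outputs change and A and B have to be fitted again.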
2.8 Furthermore, the Board does not consider that reducing the complexity of an algorithm is necessarily a technical effect, or evidence of underlying technical considerations. That is because complexity is an inherent property of the algorithm as such. If the design of the algorithm were motivated by a problem related to the internal workings of the computer, e.g. if it were adapted to a particular computer architecture, it could, arguably, be considered as technical (see T 1358/09, point 5.5, referring to T 258/03 "Auction method/HITACHI", OJ EPO 2004, 578, point 5.8). However, the Board does not see any such motivations in the present case.
2.9 Thus, the Board is not persuaded that the use of an SVM in combination with a sigmoid threshold function contributes, technically, to the computer implementation. The Board rather considers this to be a mathematical method.
2.10 In the Board's view, the technical implementation of the method consists in programming the computer to perform the method steps. The Board concurs with the Examining Division that this would have been a routine task for the skilled programmer.
2.11 Therefore, the Board concludes that the subject-matter of claim 1 lacks an inventive step (Article 56 EPC).
