11 July 2016

T 1941/11 - A central difficulty with the application

Key points

  • The present decision does not contain a particular point of case law. Rather, the decision is interesting in that in this Examination appeal, the Board seems to feel the pain of the applicant (or professional representative) for the poor drafting of the application.
  • " A central difficulty with the application has been the lack of basis for a formulation of thread scheduling in relation to pipeline structure."
  • The Board also states that " Computer processors are complicated things, as the appellant argued during oral proceedings" (at [11]). Accordingly, the detailed discussion of inventive step is omitted in this blog posts.




T 1941/11 - link


Reasons for the Decision
[] The Invention
2. The application is one of several that concern the "Sandbridge Sandblaster" processor. Two others, at least, have been the subject of appeal proceedings (T 2241/10 Pipelined multioperand adder/QUALCOMM and T 1963/11 Multithreaded processor/QUALCOMM). D1 and D4 also address aspects of this processor. The processor in question is designed for use in mobile devices, especially in view of specific requirements for digital signal processing (see, for example, D1, section 6.1). The application does not identify this context, let alone explain it. As a consequence, it has been difficult for the appellant to formulate requests that reflect the true technical background.
3. The processor, shown in Figure 4 of the application, has three processing sections. One deals with branch instructions, one with integer and load/store operations, and one with vector operations. The appellant has now restricted claim 1 to a particular situation which arises in the vector processor. It concerns the interactions between a vector processing pipeline and a plurality of threads, when a particular instruction is issued twice in succession to the vector processor (this is described in more detail, below).
4. The application sets out a problem with pipelined processors generally: when two instructions are simultaneously in the pipeline and the second requires some data generated by the first, the second will have to wait (otherwise, the second instruction will read the wrong data). This is shown in Figure 2 of the application. The pipeline is idle for one or more clock cycles, which means the processor cannot be fully utilised. The application calls such idle cycles "bubbles" and seeks to avoid them.
5. It was known completely to eliminate bubbles by processing a number of different threads. Instructions from different threads are assumed to be independent, so by interposing enough instructions from other threads, it can be assumed that a first instruction has completely finished its execution before a second instruction from the same thread enters the pipeline. If the pipeline has N stages, and there are at least N threads, then each instruction of each thread will have finished execution, before the next instruction of the same thread. This, however, assumes that each thread takes a turn at most once every N threads. As explained in D5 (page 35, right-hand column), but, unfortunately, not in the application, this has a drawback: each thread gets one N-th of the processor time, and there is no flexibility for threads that need to issue instructions more frequently or which can get by with less. D5 mentions two ways of ameliorating this. Firstly, the compiler may indicate dependencies between instructions, so that the instructions from one thread can safely be fed in to a pipeline more frequently (a sort of dependency checking). Secondly, the caching technique set out in D7 may be used. The application seeks another way of allowing instructions from a single thread safely to overlap in the pipeline.
6. As the application describes it, the invention consists in allowing one thread out of several to have two or more instructions in the pipeline at one one time, and this is achieved by having a number of threads which is smaller than the number of the pipeline stages.
7. In its full generality, the invention does not use round-robin scheduling of threads but "token triggered threading", which (again unfortunately) is not described in the application (see page 9, lines 12 - 18 of the published application), but in another application (US6842848). Although this technique is described as one possible embodiment, it is in fact the most general form of thread scheduling envisaged in terms of the order in which different threads issue instructions to the pipeline. In particular, it can implement round-robin scheduling. A central difficulty with the application has been the lack of basis for a formulation of thread scheduling in relation to pipeline structure.
8. Claim 1 according to the sole request is restricted to the situation in which one thread issues consecutive V_Mul_REDUCE instructions. There are various versions of this instruction, as the application attempts to explain (page 11, lines 14 - 23), but they all use the pipeline structure shown in the bottom row of Figure 6. Although claim 1 does not define this pipeline structure, it is the only example given and the claim must be read so as to include at least this structure. The pipeline consists of thirteen stages. When implemented in the processor of D1 (the "Sandbridge Sandblaster", also set out in US2003/0120901, to which the application refers at lines 19 - 23 on page 6) so that there are eight threads, each of which issues an instruction every eight clock cycles, the second V_Mul_REDUCE instruction enters the pipeline before the first has finished. Nevertheless, there are no bubbles. This is because the accumulator read stage (the sixth stage of the pipeline) necessarily occurs after the previous instruction has written to the accumulator. That, in turn, is because there are sufficient intervening threads.

No comments:

Post a Comment

Do not use hyperlinks in comment text or user name. Comments are welcome, even though they are strictly moderated (no politics). Moderation can take some time.