Index Previous Next

Store followed by fetch

Consider the following code:

   public class test {

       public void g() {
           int i1, i2, i3, i4 ;

           i1 = i2 = i3 = 0 ;

           i4 = 9 ;
           i1 = i4;
       }
   }

The last two assignment statements result in this assembly language code:

       bipush 9
       istore 4
       iload 4
       istore_1

That is, we push 9 on the stack, store it in local variable 4, fetch the contents of local variable 4 (which must be 9) and then store that to local variable 1. This sequence of instructions is equivalent to the shorter:

       bipush 9
       dup
       istore 4
       istore_1

Or rather, it's equivalent apart from its effect on the stack. The second sequence uses one more word on the stack than the first. One of the security mechanisms implemented by the Java Virtual Machine is a check on the number of words of stack used. The compiler encodes the maximum stack size into every method. In Jasmin this is represented using the '.limit stack' directive which appears at the start of each method in the D-Java output.

Now, the compiler can work out the stack size from its knowledge of the entire method. Our peephole optimiser doesn't have this advantage. It may be that the instructions we're trying to optimise occur when the stack is already at its maximum allowed height. Adding an extra item to the stack will then cause a runtime exception.

Since we don't know for sure if this will happen we have to err on the side of caution and increase the stack limit. However, when we're performing this optimisation we've already processed the line containing the .limit directive. To work around this I use the following optimisation rule:

       istore %1
       iload %1
   =
       dup
   ; increase stack limit by 1
       istore %1

Then, after the optimisation is complete, the jopt script uses awk to replace the comment with a suitable .limit directive. Fortunately, Jasmin uses the last .limit directive that it finds in any method. In this case the final optimised code is:

       bipush 9
       dup
       .limit stack 3
       istore 4
       istore_1

Incidentally, the Java code in the example above could be rewritten as:

        i1 = i4 = 9 ;

In this case the compiler generates the same code as our final optimisation, except that it knows that the stack limit of 2 is sufficient.

The example used a local integer variable. The same sort of technique can be applied to other types of local variable, though dup2 has to be used instead of dup for longs and doubles. Similar optimisations are also possible for class variables. Instance variables are a different matter and are considered on the next page.

Here are the results of timing tests for 1.5 million iterations:

       Instructions      Bytes    Unoptimised       Optimised
                         saved     time (ms)        time (ms)

       put/getstatic Z     2          864              760 
       put/getstatic C     2          879              762 
       put/getstatic I     2          869              770 
       put/getstatic F     2          869              761 
       put/getstatic L;    2          890              773 
       put/getstatic S     2          889              762 
       put/getstatic [     2          867              763 
       put/getstatic J     2          985              878 
       put/getstatic D     2          958              870 

       istore/iload        1          667              657 
       fstore/fload        1          665              653 
       astore/aload        1          728              657 
       lstore/lload        1          679              747 
       dstore/dload        1          749              748 

The only combination which doesn't give improved performance after optimisation is the long local variable.


Index Previous Next