Consider the following code:
public class test { public void g() { int i1, i2, i3, i4 ; i1 = i2 = i3 = 0 ; i4 = 9 ; i1 = i4; } }
The last two assignment statements result in this assembly language code:
bipush 9 istore 4 iload 4 istore_1
That is, we push 9 on the stack, store it in local variable 4, fetch the contents of local variable 4 (which must be 9) and then store that to local variable 1. This sequence of instructions is equivalent to the shorter:
bipush 9 dup istore 4 istore_1
Or rather, it's equivalent apart from its effect on the stack. The
second sequence uses one more word on the stack than the first. One of
the security mechanisms implemented by the Java Virtual Machine is a
check on the number of words of stack used. The compiler encodes the
maximum stack size into every method. In Jasmin
this is represented using the '.limit stack'
directive which
appears at the start of each method in the D-Java output.
Now, the compiler can work out the stack size from its knowledge of the entire method. Our peephole optimiser doesn't have this advantage. It may be that the instructions we're trying to optimise occur when the stack is already at its maximum allowed height. Adding an extra item to the stack will then cause a runtime exception.
Since we don't know for sure if this will happen we have to err on the
side of caution and increase the stack limit. However, when we're performing
this optimisation we've already
processed the line containing the .limit
directive. To
work around this I use the following optimisation rule:
istore %1 iload %1 = dup ; increase stack limit by 1 istore %1
Then, after the optimisation is complete, the jopt
script
uses awk to replace the comment with a suitable .limit
directive. Fortunately, Jasmin uses the last .limit
directive that it finds in any method. In this case the final optimised
code is:
bipush 9 dup .limit stack 3 istore 4 istore_1
Incidentally, the Java code in the example above could be rewritten as:
i1 = i4 = 9 ;
In this case the compiler generates the same code as our final optimisation, except that it knows that the stack limit of 2 is sufficient.
The example used a local integer variable. The same sort of technique can
be applied to other types of local variable, though dup2
has
to be used instead of dup
for longs and doubles. Similar
optimisations are also possible for class variables. Instance variables
are a different matter and are considered on the next page.
Here are the results of timing tests for 1.5 million iterations:
Instructions Bytes Unoptimised Optimised saved time (ms) time (ms) put/getstatic Z 2 864 760 put/getstatic C 2 879 762 put/getstatic I 2 869 770 put/getstatic F 2 869 761 put/getstatic L; 2 890 773 put/getstatic S 2 889 762 put/getstatic [ 2 867 763 put/getstatic J 2 985 878 put/getstatic D 2 958 870 istore/iload 1 667 657 fstore/fload 1 665 653 astore/aload 1 728 657 lstore/lload 1 679 747 dstore/dload 1 749 748
The only combination which doesn't give improved performance after optimisation is the long local variable.