Index Previous Next

Store followed by fetch (2)

On the previous page we considered how to optimise a store followed by a fetch using the same local or class variable. We might consider using the same procedure for instance variables. Take, for example, this code:

   public class test {

       int i ;

       public void g() {
           int j ;

           i = 9 ;
           j = i ;
       }
   }

When this is compiled and disassembled we find that the last two assignments generate this assembly language code:

       aload_0
       bipush 9
       putfield test/i I
       aload_0
       getfield test/i I
       istore_1

To optimise this we have to identify the store and the subsequent fetch, and ensure that they refer to the same variable. The problem is that the instructions making up the store (aload_0 and putfield test/i I) are wrapped around the instruction (or instructions) to get the value to be stored (bipush 9 in this case).

At first I thought that this rule would work:

       aload%1
       %2
       putfield %3 I
       aload%1
       getfield %3 I
   =
       aload%1
       %2
   ; increase stack limit by 1
       dup_x1
       putfield %3 I

The dup_x1 instruction takes the item on top of the stack (which is 9 in the above example) and puts a copy of it under the second-to-top item. When the top item and the object reference below it are consumed by the putfield the duplicate of the top item is left on the stack as if it had been fetched.

The number of bytes saved will depend on whether the aload is the one byte or two byte form. Replacing the aload (one or two bytes) and putfield (three bytes) with a dup_x1 (one byte) will save three or four bytes. (You may have noticed that I haven't considered the wide form of any instructions. I don't think they're sufficiently common to merit much attention.)

Now, the code to obtain the value to be stored can be more than just a bipush. In some cases this can cause the optimisation rule to generate incorrect code. For example:

   public class test {

       public int i ;

       public int f() {
           return 8 ;
       }

       public void g() {
           test b = new test() ;
           int j ;

           b.i = f() ;
           j = i ;
       }
   }

Here the value to be stored is the return value of a method call. Also, it is being stored in the instance variable, i, of one test object, but the value that's being fetched in the next statement is from the same variable in a different test object. Here's what javac generates for the two assignment statements:

       aload_1
       aload_0
       invokevirtual test/f()I
       putfield test/i I
       aload_0
       getfield test/i I
       istore_2

The code to fetch the value to be stored consists of two instructions (aload_0 and invokevirtual test/f()I). The simple optimisation rule generates a false match. This results in the incorrect code:

       aload_1
       aload_0
       invokevirtual test/f()I
       .limit stack 3
       dup_x1
       putfield test/i I
       istore_2

The correct method is called (this.f()) and the return value is put under the first two stack items. Then the value is stored in the correct instance variable (b.i), leaving the value at the top of the stack. But it isn't the value we wanted! It should be this.i, but instead it's b.i,

The problem is that the optimisation rule isn't sufficiently precise. The '%2' should only match instructions which put an integer value on the stack. To make the rule work we need to specify examples of such instructions explicitly. The following rule will certainly work:

       aload%1
       bipush %2
       putfield %3 I
       aload%1
       getfield %3 I
   =
       aload%1
       bipush %2
   ; increase stack limit by 1
       dup_x1
       putfield %3 I

In similar vein we could use iconst_%2, sipush %2, ldc %2 and iload%2. In each case we have a single instruction which pushes an integer onto the stack. Actually, ldc %2 can push a float or a String, but it's use here is clear from the context. Booleans are a simpler case, as they only require iconst_%2 and iload%2 to be handled.

It becomes quite tedious to have to consider all these special cases. The original (but wrong) substitution at least had the advantage of being concise. After adding these special cases for integers and booleans to the rules file I got bored and haven't handled any other types.

There some other special cases which are sufficiently common to merit attention. Firstly, there is the construction of an object followed by a call to one of its methods. This seems to be quite a common sequence, as objects are first created, stored to an instance variable and then used.

   public class test {
       public test t ;

       public void f() {
       }

       public void g() {
           t = new test() ;
           t.f() ;
       }
   }

The code generated by javac for this is:

       aload_0
       new test
       dup
       invokespecial test/<init>()V
       putfield test/t Ltest;
       aload_0
       getfield test/t Ltest;
       invokevirtual test/f()V

which can be rewritten as:

       aload_0
       new test
       dup
       invokespecial test/<init>()V
       .limit stack 4
       dup_x1
       putfield test/t Ltest;
       invokevirtual test/f()V

The second special case is where the value to be stored is an object reference from a local variable. This can be handled by the rule:

       aload%1
       aload%2
       putfield %3 L%4;
       aload%1
       getfield %3 L%4;
   =
       aload%1
       aload%2
   ; increase stack limit by 1
       dup_x1
       putfield %3 L%4;

Once again I have made some performance measurements:

       Instructions      Bytes    Unoptimised       Optimised
                         saved     time (ms)        time (ms)

       bipush              3         1921             1278 
       new object          3         1132              587 

1,500,000 iterations were used in the first case, 500,000 in the second. Results for sipush and ldc are much the same as for bipush and are not shown. Actually, I'd have expected the timings for the new object case to be similar as well, but even allowing for the different number of iterations they clearly aren't.


Index Previous Next