Question


Why is a WebAssembly function almost 300 times slower than the same JS function

answers: 1 / hits: 15523 / 7 years ago, Tue, January 9, 2018, 12:00:00

Find length of line 300* slower

First off, I have read the answer to "Why is my WebAssembly function slower than the JavaScript equivalent?"

But it has shed little light on the problem, and I have invested a lot of time that may well be that yellow stuff against the wall.

I do not use globals, and I do not use any memory. I have two simple functions that find the length of a line segment, and I compare them to the same thing in plain old JavaScript. Each has 4 params and 3 locals, and returns a float or double.

On Chrome the JavaScript is 40 times faster than the WebAssembly, and on Firefox the wasm is almost 300 times slower than the JavaScript.

jsPerf test case.

I have added a test case to jsPerf: WebAssembly V Javascript math

What am I doing wrong?

Either

1. I have missed an obvious bug, bad practice, or I am suffering coder stupidity.

2. WebAssembly is not for 32-bit OS (Win 10 laptop, i7 CPU).

3. WebAssembly is far from a ready technology.

Please please be option 1.

I have read the WebAssembly use cases:

Re-use existing code by targeting WebAssembly, embedded in a larger
JavaScript / HTML application. This could be anything from simple
helper libraries, to compute-oriented task offload.

I was hoping I could replace some geometry libs with webAssembly to get some extra performance. I was hoping that it would be awesome, like 10 or more times faster. BUT 300 times slower WTF.

UPDATE

This is not a JS optimisation issue.

To ensure that optimisation has as little effect as possible, I have tested using the following methods to reduce or eliminate any optimisation bias.

counter c += length(... to ensure all code is executed.

bigCount += c to ensure whole function is executed. Not needed

4 lines for each function to reduce any inlining skew. Not needed

all values are randomly generated doubles

each function call returns a different result.

add slower length calculation in JS using Math.hypot to prove code is being run.

added an empty JS call that returns its first param, to see the call overhead
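The accumulate-and-consume idea from the list above can be sketched as a small timing helper. This is only an illustration under assumptions: `timeCalls` and `values` are hypothetical names, not part of the harness used for the results in this post.

```javascript
// Hypothetical helper, not part of the original test harness.
// It calls fn over sliding windows of the input array and accumulates the
// results, so the engine cannot discard the calls as dead code.
function timeCalls(fn, values, iterations) {
    let sink = 0;
    const start = performance.now();
    for (let i = 0; i < iterations; i += 1) {
        sink += fn(values[i], values[i + 1], values[i + 2], values[i + 3]);
    }
    const elapsedMs = performance.now() - start;
    return { elapsedMs, sink };
}

// Random doubles so that every call sees different arguments.
// 10003 entries cover indices 0..10002 for 10000 iterations.
const values = Array.from({ length: 10003 }, () => Math.random() * 200000 - 100000);

// Same body as the JS function under test.
function len(x, y, x1, y1) {
    const nx = x1 - x;
    const ny = y1 - y;
    return Math.sqrt(nx * nx + ny * ny);
}

const result = timeCalls(len, values, 10000);
console.log(result.elapsedMs.toFixed(3) + " ms, sink = " + result.sink);
```

Returning `sink` to the caller and printing it is what keeps the accumulated value observable.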

// setup and associated functions

    const setOf = (count, callback) => {var a = [],i = 0; while (i < count) { a.push(callback(i ++)) } return a };

    const rand  = (min = 1, max = min + (min = 0)) => Math.random() * (max - min) + min;

    const a = setOf(100009,i=>rand(-100000,100000));

    var bigCount = 0;









    function len(x,y,x1,y1){

        var nx = x1 - x;

        var ny = y1 - y;

        return Math.sqrt(nx * nx + ny * ny);

    }

    function lenSlow(x,y,x1,y1){

        var nx = x1 - x;

        var ny = y1 - y;

        return Math.hypot(nx,ny);

    }

    function lenEmpty(x,y,x1,y1){

        return x;

    }





// Test functions in same scope as above. None is in global scope

// Each function is copied 4 times and tests are performed randomly.

// c += length(...  to ensure all code is executed. 

// bigCount += c to ensure whole function is executed.

// 4 lines for each function to reduce any inlining skew

// all values are randomly generated doubles 

// each function call returns a different result.



tests : [{

        func : function (){

            var i,c=0,a1,a2,a3,a4;

            for (i = 0; i < 10000; i += 1) {

                a1 = a[i];

                a2 = a[i+1];

                a3 = a[i+2];

                a4 = a[i+3];

                c += length(a1,a2,a3,a4);

                c += length(a2,a3,a4,a1);

                c += length(a3,a4,a1,a2);

                c += length(a4,a1,a2,a3);

            }

            bigCount = (bigCount + c) % 1000;

        },

        name : "length64",

    },{

        func : function (){

            var i,c=0,a1,a2,a3,a4;

            for (i = 0; i < 10000; i += 1) {

                a1 = a[i];

                a2 = a[i+1];

                a3 = a[i+2];

                a4 = a[i+3];

                c += lengthF(a1,a2,a3,a4);

                c += lengthF(a2,a3,a4,a1);

                c += lengthF(a3,a4,a1,a2);

                c += lengthF(a4,a1,a2,a3);

            }

            bigCount = (bigCount + c) % 1000;

        },

        name : "length32",

    },{

        func : function (){

            var i,c=0,a1,a2,a3,a4;

            for (i = 0; i < 10000; i += 1) {

                a1 = a[i];

                a2 = a[i+1];

                a3 = a[i+2];

                a4 = a[i+3];                    

                c += len(a1,a2,a3,a4);

                c += len(a2,a3,a4,a1);

                c += len(a3,a4,a1,a2);

                c += len(a4,a1,a2,a3);

            }

            bigCount = (bigCount + c) % 1000;

        },

        name : "length JS",

    },{

        func : function (){

            var i,c=0,a1,a2,a3,a4;

            for (i = 0; i < 10000; i += 1) {

                a1 = a[i];

                a2 = a[i+1];

                a3 = a[i+2];

                a4 = a[i+3];                    

                c += lenSlow(a1,a2,a3,a4);

                c += lenSlow(a2,a3,a4,a1);

                c += lenSlow(a3,a4,a1,a2);

                c += lenSlow(a4,a1,a2,a3);

            }

            bigCount = (bigCount + c) % 1000;

        },

        name : "Length JS Slow",

    },{

        func : function (){

            var i,c=0,a1,a2,a3,a4;

            for (i = 0; i < 10000; i += 1) {

                a1 = a[i];

                a2 = a[i+1];

                a3 = a[i+2];

                a4 = a[i+3];                    

                c += lenEmpty(a1,a2,a3,a4);

                c += lenEmpty(a2,a3,a4,a1);

                c += lenEmpty(a3,a4,a1,a2);

                c += lenEmpty(a4,a1,a2,a3);

            }

            bigCount = (bigCount + c) % 1000;

        },

        name : "Empty",

    }

],

Results from update.

Because there is a lot more overhead in the test the results are closer, but the JS code is still more than an order of magnitude faster (roughly 17 to 18 times).

Note how slow the function Math.hypot is. If optimisation was in effect that function would be near the faster len function.

WebAssembly 13389µs

Javascript 728µs

/*

=======================================

Performance test. : WebAssm V Javascript

Use strict....... : true

Data view........ : false

Duplicates....... : 4

Cycles........... : 147

Samples per cycle : 100

Tests per Sample. : undefined

---------------------------------------------

Test : 'length64'

Mean : 12736µs ±69µs (*) 3013 samples

---------------------------------------------

Test : 'length32'

Mean : 13389µs ±94µs (*) 2914 samples

---------------------------------------------

Test : 'length JS'

Mean : 728µs ±6µs (*) 2906 samples

---------------------------------------------

Test : 'Length JS Slow'

Mean : 23374µs ±191µs (*) 2939 samples   << This function uses Math.hypot rather than Math.sqrt

---------------------------------------------

Test : 'Empty'

Mean : 79µs ±2µs (*) 2928 samples

-All ----------------------------------------

Mean : 10.097ms Totals time : 148431.200ms 14700 samples

(*) Error rate approximation does not represent the variance.



*/

What's the point of WebAssembly if it does not optimise?

End of update

All the stuff related to the problem.

Find length of a line.

Original source in custom language

   

// declare func the < indicates export name, the param with types and return type

func <lengthF(float x, float y, float x1, float y1) float {

    float nx, ny, dist;  // declare locals float is f32

    nx = x1 - x;

    ny = y1 - y;

    dist = sqrt(ny * ny + nx * nx);

    return dist;

}

// and as double

func <length(double x, double y, double x1, double y1) double {

    double nx, ny, dist;

    nx = x1 - x;

    ny = y1 - y;

    dist = sqrt(ny * ny + nx * nx);

    return dist;

}

The code compiles to WAT for proofreading

(module

(func 

    (export "lengthF")

    (param f32 f32 f32 f32)

    (result f32)

    (local f32 f32 f32)

    get_local 2

    get_local 0

    f32.sub

    set_local 4

    get_local 3

    get_local 1

    f32.sub

    tee_local 5

    get_local 5

    f32.mul

    get_local 4

    get_local 4

    f32.mul

    f32.add

    f32.sqrt

)

(func 

    (export "length")

    (param f64 f64 f64 f64)

    (result f64)

    (local f64 f64 f64)

    get_local 2

    get_local 0

    f64.sub

    set_local 4

    get_local 3

    get_local 1

    f64.sub

    tee_local 5

    get_local 5

    f64.mul

    get_local 4

    get_local 4

    f64.mul

    f64.add

    f64.sqrt

)

)
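To make the f32 variant easier to proofread, its arithmetic can be mirrored in plain JavaScript with `Math.fround`, which rounds a double to the nearest 32-bit float. This is a sketch, and `lengthF32` is a hypothetical name. It assumes that rounding after every operation reproduces the f32 instructions, which should hold for subtraction, multiplication, addition and square root, but treat it as a reading aid rather than a reference implementation.

```javascript
// Hypothetical JS rendering of the lengthF function from the WAT above.
// Every intermediate value is rounded to f32, as the f32.* instructions would do.
function lengthF32(x, y, x1, y1) {
    const f = Math.fround;
    // Arguments are converted to f32 when they cross into the wasm function.
    x = f(x);
    y = f(y);
    x1 = f(x1);
    y1 = f(y1);
    const nx = f(x1 - x);          // get_local 2, get_local 0, f32.sub, set_local 4
    const ny = f(y1 - y);          // get_local 3, get_local 1, f32.sub, tee_local 5
    const sum = f(f(ny * ny) + f(nx * nx)); // two f32.mul, then f32.add
    return f(Math.sqrt(sum));      // f32.sqrt
}

console.log(lengthF32(0, 0, 3, 4));
```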

The compiled wasm as a hex string (note that it does not include the name section), loaded using WebAssembly.compile. The exported functions are then run against the JavaScript function len (in the snippet below).

    // hex of above without the name section

    const asm = `0061736d0100000001110260047d7d7d7d017d60047c7c7c7c017c0303020001071402076c656e677468460000066c656e67746800010a3b021c01037d2002200093210420032001932205200594200420049492910b1c01037c20022000a1210420032001a122052005a220042004a2a09f0b`

    const bin = new Uint8Array(asm.length >> 1);

    for(var i = 0; i < asm.length; i+= 2){ bin[i>>1] = parseInt(asm.substr(i,2),16) }

    var length,lengthF;



    WebAssembly.compile(bin).then(module => {

        const wasmInstance = new WebAssembly.Instance(module, {});

        lengthF = wasmInstance.exports.lengthF;

        length = wasmInstance.exports.length;

    });

    // test values are const (same result if from array or literals)

    const a1 = rand(-100000,100000);

    const a2 = rand(-100000,100000);

    const a3 = rand(-100000,100000);

    const a4 = rand(-100000,100000);



    // javascript version of function

    function len(x,y,x1,y1){

        var nx = x1 - x;

        var ny = y1 - y;

        return Math.sqrt(nx * nx + ny * ny);

    }
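One thing to watch with the loading code above: `WebAssembly.compile` is asynchronous, so `length` and `lengthF` stay undefined until the promise resolves. A minimal sketch of waiting for the exports before starting any test follows. `wasmReady` and `runTests` are hypothetical names, with `runTests` standing in for whatever actually starts the benchmark.

```javascript
// Same module bytes as the hex string above.
const hex = "0061736d0100000001110260047d7d7d7d017d60047c7c7c7c017c0303020001071402076c656e677468460000066c656e67746800010a3b021c01037d2002200093210420032001932205200594200420049492910b1c01037c20022000a1210420032001a122052005a220042004a2a09f0b";
const bytes = new Uint8Array(hex.length >> 1);
for (let i = 0; i < hex.length; i += 2) {
    bytes[i >> 1] = parseInt(hex.slice(i, i + 2), 16);
}

// Given raw bytes, WebAssembly.instantiate resolves to { module, instance }.
const wasmReady = WebAssembly.instantiate(bytes, {})
    .then(({ instance }) => instance.exports);

// Hypothetical stand-in for the benchmark entry point.
function runTests(exports) {
    return exports.length(0, 0, 3, 4);
}

// Only start once the exports actually exist.
wasmReady.then(exports => {
    console.log(runTests(exports));
});
```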

The test code is the same for all 3 functions and is run in strict mode.

 tests : [{

        func : function (){

            var i;

            for (i = 0; i < 100000; i += 1) {

               length(a1,a2,a3,a4);



            }

        },

        name : "length64",

    },{

        func : function (){

            var i;

            for (i = 0; i < 100000; i += 1) {

                lengthF(a1,a2,a3,a4);

             

            }

        },

        name : "length32",

    },{

        func : function (){

            var i;

            for (i = 0; i < 100000; i += 1) {

                len(a1,a2,a3,a4);

             

            }

        },

        name : "lengthNative",

    }

]

The test results on Firefox are

 /*

=======================================

Performance test. : WebAssm V Javascript

Use strict....... : true

Data view........ : false

Duplicates....... : 4

Cycles........... : 34

Samples per cycle : 100

Tests per Sample. : undefined

---------------------------------------------

Test : 'length64'

Mean : 26359µs ±128µs (*) 1128 samples

---------------------------------------------

Test : 'length32'

Mean : 27456µs ±109µs (*) 1144 samples

---------------------------------------------

Test : 'lengthNative'

Mean : 106µs ±2µs (*) 1128 samples

-All ----------------------------------------

Mean : 18.018ms Totals time : 61262.240ms 3400 samples

(*) Error rate approximation does not represent the variance.

*/

Answers


hugo


answered 7 Years ago julieth · Accepted Answer

Andreas describes a number of good reasons why the JavaScript implementation was initially observed to be x300 faster. However, there are a number of other issues with your code.

This is a classic 'micro benchmark', i.e. the code that you are testing is so small, that the other overheads within your test loop are a significant factor. For example, there is an overhead in calling WebAssembly from JavaScript, which will factor in your results. What are you trying to measure? raw processing speed? or the overhead of the language boundary?
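To see how much of a measurement is call overhead rather than arithmetic, one rough approach is to time the wasm export, the JS function, and an empty JS function in the same loop. The sketch below reuses the hex-encoded module from the question. The `time` helper and the labels are hypothetical, and the numbers it prints will vary by engine and version, so none are quoted here.

```javascript
// Module bytes from the question (exports length and lengthF).
const hex = "0061736d0100000001110260047d7d7d7d017d60047c7c7c7c017c0303020001071402076c656e677468460000066c656e67746800010a3b021c01037d2002200093210420032001932205200594200420049492910b1c01037c20022000a1210420032001a122052005a220042004a2a09f0b";
const bytes = Uint8Array.from(hex.match(/../g), byte => parseInt(byte, 16));

// Synchronous compilation keeps the sketch short; the module is tiny.
const wasm = new WebAssembly.Instance(new WebAssembly.Module(bytes), {}).exports;

function len(x, y, x1, y1) {
    const nx = x1 - x;
    const ny = y1 - y;
    return Math.sqrt(nx * nx + ny * ny);
}

// Does no work, so its time approximates loop plus call overhead in JS.
function lenEmpty(x) {
    return x;
}

// Hypothetical timing helper; sink keeps the calls from being optimised away.
function time(fn, calls) {
    let sink = 0;
    const start = performance.now();
    for (let i = 0; i < calls; i += 1) {
        sink += fn(i, i + 1, i + 2, i + 3);
    }
    return { ms: performance.now() - start, sink };
}

const calls = 100000;
const candidates = [["wasm length", wasm.length], ["js len", len], ["js empty", lenEmpty]];
for (const [name, fn] of candidates) {
    const { ms } = time(fn, calls);
    console.log(name + ": " + ms.toFixed(3) + " ms for " + calls + " calls");
}
```

If the wasm and empty-function timings dominate the difference, the benchmark is mostly measuring the JS to wasm boundary, not the square root.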

Your results vary wildly, from x300 to x2, due to small changes in your test code. Again, this is a micro benchmark issue. Others have seen the same when using this approach to measure performance, for example this post claims wasm is x84 faster, which is clearly wrong!

The current WebAssembly VM is very new, and an MVP. It will get faster. Your JavaScript VM has had 20 years to reach its current speed. The performance of the JS <=> wasm boundary is being worked on and optimised right now.

For a more definitive answer, see the joint paper from the WebAssembly team, which outlines an expected runtime performance gain of around 30%.

Finally, to answer your point:

What's the point of WebAssembly if it does not optimise

I think you have misconceptions around what WebAssembly will do for you. Based on the paper above, the runtime performance optimisations are quite modest. However, there are still a number of performance advantages:

Its compact binary format and low-level nature mean the browser can load, parse and compile the code much faster than JavaScript. It is anticipated that WebAssembly can be compiled faster than your browser can download it.

WebAssembly has a predictable runtime performance. With JavaScript the performance generally increases with each iteration as it is further optimised. It can also decrease due to de-optimisation.
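One way to look at the warm-up behaviour described here is to time the same function over several consecutive batches and compare the batches. This is a sketch with a hypothetical `batchTimes` helper. Whether early batches are visibly slower depends on the engine and on timer resolution, so it shows how to observe the effect and does not demonstrate any particular result.

```javascript
function len(x, y, x1, y1) {
    const nx = x1 - x;
    const ny = y1 - y;
    return Math.sqrt(nx * nx + ny * ny);
}

// Hypothetical helper: times `batches` consecutive runs of `callsPerBatch` calls.
function batchTimes(fn, batches, callsPerBatch) {
    const times = [];
    let sink = 0; // accumulated so the calls are not removed as dead code
    for (let b = 0; b < batches; b += 1) {
        const start = performance.now();
        for (let i = 0; i < callsPerBatch; i += 1) {
            sink += fn(i, b, i + 1, b + 2);
        }
        times.push(performance.now() - start);
    }
    return { times, sink };
}

const { times, sink } = batchTimes(len, 10, 100000);
times.forEach((ms, batch) => console.log("batch " + batch + ": " + ms.toFixed(3) + " ms"));
console.log("sink = " + sink);
```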

There are also a number of non-performance related advantages too.

For a more realistic performance measurement, take a look at:

Its use within Figma

Results from using it with PDFKit

Both are practical, production codebases.