June 9, 2015
A user on r/javascript posted an interesting question last night, and I figured I’d jump in to try to find out the answer.
Why would this code:
function chunk(arr, size) {
  var n = [],
      i = 0,
      x = arr.length,
      t = size;
  for (; x > 0; i++) {
    n[i] = arr.slice(0, t);
    arr = arr.splice(t);
    x -= t;
  }
  return n;
}
chunk([0, 1, 2, 3, 4, 5], 2);
be noticeably slower (about 40% slower in Firefox 40) than this code:
function chunk(arr, size) {
  // Break it up.
  var iter = Math.floor(arr.length / size);
  var out = [];
  for (var i = 0; i < iter; i++) {
    var tmp = [];
    for (var x = 0; x < size; x++) {
      tmp.push(arr[x]);
    }
    arr.splice(0, size);
    out.push(tmp);
  }
  if (arr.length > 0) {
    out.push(arr);
  }
  return out;
}
chunk([0, 1, 2, 3, 4, 5], 2);
My first step in figuring out what was going on was to bump the array up from six entries to 10,000. If we were seeing something more than just noise, that should make the gap a lot bigger. And, sure enough, with 10k entries in the array, the slower code ran at 1/5 the speed of the faster snippet.
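To reproduce that comparison yourself, here is a rough harness along those lines. The function names `chunkSlow` and `chunkFast` and the `Date.now`-based timing are mine, not from the original post (the real numbers came from running in the browsers themselves); the bodies are the two snippets above, unchanged.

```javascript
// The two implementations from the post, renamed so they can coexist.
function chunkSlow(arr, size) {
  var n = [], i = 0, x = arr.length, t = size;
  for (; x > 0; i++) {
    n[i] = arr.slice(0, t);
    arr = arr.splice(t);
    x -= t;
  }
  return n;
}

function chunkFast(arr, size) {
  var iter = Math.floor(arr.length / size);
  var out = [];
  for (var i = 0; i < iter; i++) {
    var tmp = [];
    for (var x = 0; x < size; x++) {
      tmp.push(arr[x]);
    }
    arr.splice(0, size);
    out.push(tmp);
  }
  if (arr.length > 0) {
    out.push(arr);
  }
  return out;
}

// Time one run over a fresh 10,000-element array, chunked by 2.
function time(fn) {
  var data = [];
  for (var i = 0; i < 10000; i++) data.push(i);
  var start = Date.now();
  var result = fn(data, 2);
  return { ms: Date.now() - start, chunks: result.length };
}

console.log('slow:', time(chunkSlow).ms + 'ms');
console.log('fast:', time(chunkFast).ms + 'ms');
```

A single `Date.now` sample is noisy; a proper benchmark would run each function many times and average, but even this crude version makes the gap obvious at 10k entries.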
Let’s look at that code more closely now. I’ve cleaned it up and converted it to CoffeeScript below, since I find it a bit clearer.
(arr, size) ->
  n = []
  i = 0
  x = arr.length
  while x > 0
    n[i] = arr.slice 0, size
    arr = arr.splice size
    x -= size
    i++
  return n
So, we’re going to loop arr.length / size times. Makes sense. The first thing we do is call slice, which creates a new array containing the entries from 0 to size - 1. In this test case, that means we’re copying the first two values. While that probably runs in linear time relative to size, having size ≪ arr.length should for practical purposes drop that down to a constant-time operation.
That leaves us with the splice operation, which is our culprit. First, we need to look at the behavior of Array.prototype.splice. Technically, calling splice with a single argument is outside of the spec, but it seems like most browsers have settled on a common behavior: calling splice with a single argument removes all values from that index upward in the original array and returns a new array containing the removed values. For example:
input = [0, 1, 2, 3, 4]
output = input.splice 2
# input is now [0, 1]
# output is now [2, 3, 4]
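The same thing in plain JavaScript, for anyone following along in a console (this single-argument behavior was eventually codified in ES2015):

```javascript
// splice(2) removes everything from index 2 onward, in place,
// and returns the removed values as a new array.
var input = [0, 1, 2, 3, 4];
var output = input.splice(2);
console.log(input);  // [0, 1]
console.log(output); // [2, 3, 4]
```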
So, in our code snippet above, each pass copies everything past the first two entries of the current array into a brand-new array and discards the old one. That means we’re doing an O(n) operation (copying roughly n - 2i entries on iteration i) on the order of n times, making our algorithm O(n²). It’s also worth noting that each iteration allocates a fresh array for the remainder, so the allocations total O(n²) memory when we were pretty clearly trying to do this in place.
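To make the quadratic blow-up concrete, we can count the entries copied per iteration rather than time anything. This is a back-of-the-envelope model I’m adding here, not a measurement: each pass copies size entries via slice plus everything that remains via splice, so each pass copies exactly as many entries as remain.

```javascript
// Model the total entries copied by the slow chunk(arr, size)
// for an n-element input: slice copies `size` per pass, splice
// copies the rest, so each pass copies `remaining` entries.
function copiesForSlowChunk(n, size) {
  var copies = 0;
  var remaining = n;
  while (remaining > 0) {
    copies += Math.min(size, remaining);          // slice
    copies += Math.max(remaining - size, 0);      // splice
    remaining -= size;
  }
  return copies;
}

console.log(copiesForSlowChunk(6, 2));     // small case from the post: 12
console.log(copiesForSlowChunk(10000, 2)); // about 25 million
```

Doubling n roughly quadruples the count, which is the O(n²) signature.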
Based on that, it seems like there should be a simple fix: instead of taking the result and storing it back into arr, just splice out the first size values. We can also store the result of that directly into our result array, since splice returns exactly what we wanted!
(arr, size) ->
  n = []
  i = 0
  x = arr.length
  while x > 0
    n[i] = arr.splice 0, size
    x -= size
    i++
  return n
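For anyone not reading CoffeeScript, the fixed version translates directly back to JavaScript like so:

```javascript
function chunk(arr, size) {
  var n = [];
  var i = 0;
  var x = arr.length;
  while (x > 0) {
    // splice removes the first `size` entries in place and returns
    // them, which is exactly the chunk we want to keep.
    n[i] = arr.splice(0, size);
    x -= size;
    i++;
  }
  return n;
}

chunk([0, 1, 2, 3, 4, 5], 2); // [[0, 1], [2, 3], [4, 5]]
```

Note that, like the originals, this mutates the array passed in.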
However, I am fairly sure that the splice operation is O(n), unless Chrome and/or Firefox are smart enough to use a data structure that lets it happen in constant time. That means our final algorithm is still O(n²), which is a bummer.
However, the end result of the above code is that our new version is just as fast as the original fast solution in Firefox, and significantly faster than the original fast solution in Chrome!
Is that the best we can do, though? (Hint: it’s not.) And did you notice the subtle differences in behavior among the three solutions? That’s for the next entry!