This is what I did on my first try, and it took 30 minutes to run through the code. However, when I rewrote it with two for loops it was much faster, taking only 15 seconds.
I guess the reason is that there are usually only two, maybe three, elements per key. The vectorized version is much slower because it goes through the big array too many times.
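To make that guess concrete, here is a minimal sketch in plain Python. It is not my original code: the function names, the sample data, and the choice of summing the values per key are all made up for illustration. The first function scans the whole array once per key, which is roughly what a per-key boolean mask does. The second groups the data in a single pass and then uses two for loops over the small groups.

```python
from collections import defaultdict


def sums_by_full_scan(keys, values):
    """Hypothetical slow version: one full pass over the data per distinct key."""
    result = {}
    for k in set(keys):
        # Stand-in for a vectorized mask: every key triggers a scan
        # of the entire array, so the cost is about n * number_of_keys.
        result[k] = sum(v for key, v in zip(keys, values) if key == k)
    return result


def sums_by_grouping(keys, values):
    """Hypothetical fast version: group once, then loop over tiny groups."""
    groups = defaultdict(list)
    for key, v in zip(keys, values):
        groups[key].append(v)

    result = {}
    for k, group in groups.items():  # outer loop: one iteration per key
        total = 0
        for v in group:              # inner loop: only a few elements per key
            total += v
        result[k] = total
    return result


# Assumed sample data: 1000 keys with exactly two elements each.
keys = [i // 2 for i in range(2000)]
values = list(range(2000))
assert sums_by_full_scan(keys, values) == sums_by_grouping(keys, values)
```

With n elements and about n/2 keys, the first version does on the order of n²/2 comparisons, while the second touches each element a constant number of times. That gap grows quickly with the size of the array, which would fit the 30 minutes versus 15 seconds I saw.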