Hi Khoa, thanks for the update.
It is a curious finding. In theory, SciDB should do well at this task because of chunking. Chunking keeps the data organized so that when we need to do a match, we only need to look at two chunks at a time. That makes it a good indexing strategy, and it should provide an algorithmic advantage.
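To make the chunking idea concrete, here is a toy sketch in plain Python. It is not SciDB code and not the prototype's implementation; the function names, the fixed square chunk grid, and the distance-based match are all assumptions made for illustration. Each array's points are bucketed by chunk, and a match only compares the chunk from one array against the same-keyed chunk from the other.

```python
from collections import defaultdict
from math import floor, hypot


def chunk_points(points, chunk_size):
    """Group (x, y) points by the grid chunk that contains them (hypothetical helper)."""
    chunks = defaultdict(list)
    for x, y in points:
        key = (floor(x / chunk_size), floor(y / chunk_size))
        chunks[key].append((x, y))
    return chunks


def chunked_match(left, right, chunk_size, radius):
    """Pair points within `radius`, comparing only same-keyed chunks.

    Simplifying assumption: pairs that straddle a chunk boundary are missed.
    A real system would need extra handling for that case.
    """
    left_chunks = chunk_points(left, chunk_size)
    right_chunks = chunk_points(right, chunk_size)
    matches = []
    for key, left_pts in left_chunks.items():
        # Only two chunks are examined here: one from each input.
        right_pts = right_chunks.get(key, [])
        for lx, ly in left_pts:
            for rx, ry in right_pts:
                if hypot(lx - rx, ly - ry) <= radius:
                    matches.append(((lx, ly), (rx, ry)))
    return matches
```

Calling `chunked_match(left, right, chunk_size=1.0, radius=0.2)` on two lists of `(x, y)` tuples returns the matched pairs. The cost per chunk depends on how many points land in it, which is one reason chunk sizing matters so much.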
However, there are other things to consider: chunk sizing, thread and memory tuning settings, the fact that this code is an old prototype with known bugs in it, and, as you mentioned, the overhead of storing the matches.
So it looks like there may be room for improvement if you look at the problem carefully. Always feel free to post your config, hardware, schemas and queries up here, and folks can take a look and make suggestions. Alas, I don’t have a whole lot of time to spend on this, but I’ll try to help if I can.