To get familiar with the SciDB code, I've been rewriting the input operator. While this is mostly an exercise, there are a couple of things I'd like to add. The first is support for multiple file formats, which I've partially done (and in the process I rewrote the current parser to address some edge cases). The second is support for queries such as: iquery -aq "filter(input(Two_Dim, '/home/miguel/array2d.txt'), a>1)"
… that is, to be able to filter the input directly from its location, without first having to load it into the database. (Yes, not totally realistic for a wide variety of scenarios, but this is mostly an exercise and perhaps the beginning of bigger things.)
Now, I've been trying to understand how operators are implemented and executed, in particular the filter operator. I don't fully understand this yet, so if one of the devs has a bit of time to write down, or point me to, a high-level overview, I'd be most grateful!
For instance, in InputArray::getConstIterator, iterators are instantiated only once and then reused on later calls. Q: when do they need to be reused through the getConstIterator call? An iterator that was initialized and advanced (++) earlier could be in the "wrong" position when getConstIterator hands it out again. The code as-is interacts badly with the filter; if I change it to return a new iterator on every call, the iquery command above works fine.
Having said that, while stepping through the execution of a filter operation, I was somewhat surprised by the number of times FilterArray::createChunk and FilterArray::createArrayIterator are called… but I guess that's just because I don't quite understand the implementation!
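To make the iterator question concrete, here is a minimal standalone sketch of the behaviour I think I'm seeing. None of this is SciDB code: ToyIterator, CachingArray, FreshArray and countGreaterThan are hypothetical names I made up, and the real InputArray and FilterArray are certainly more involved. It only assumes that a consumer may call getConstIterator more than once and scan from wherever the returned iterator currently stands.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a const array iterator (not the SciDB interface).
class ToyIterator {
public:
    explicit ToyIterator(const std::vector<int>& data) : data_(data), pos_(0) {}
    bool end() const { return pos_ >= data_.size(); }
    int getItem() const { assert(!end()); return data_[pos_]; }
    void operator++() { ++pos_; }
private:
    const std::vector<int>& data_;  // the owning array must outlive the iterator
    std::size_t pos_;
};

// Hypothetical array that creates one iterator and hands the same one out
// on every call, so its position is shared between all callers.
class CachingArray {
public:
    explicit CachingArray(std::vector<int> data) : data_(std::move(data)) {}
    std::shared_ptr<ToyIterator> getConstIterator() {
        if (!cached_) cached_ = std::make_shared<ToyIterator>(data_);
        return cached_;
    }
private:
    std::vector<int> data_;
    std::shared_ptr<ToyIterator> cached_;
};

// Hypothetical array that returns a fresh iterator, positioned at the start,
// on every call.
class FreshArray {
public:
    explicit FreshArray(std::vector<int> data) : data_(std::move(data)) {}
    std::shared_ptr<ToyIterator> getConstIterator() const {
        return std::make_shared<ToyIterator>(data_);
    }
private:
    std::vector<int> data_;
};

// A consumer that asks for an iterator and scans to the end, counting the
// values greater than the threshold (a toy version of "a > 1").
template <class Array>
std::size_t countGreaterThan(Array& array, int threshold) {
    std::size_t n = 0;
    for (auto it = array.getConstIterator(); !it->end(); ++(*it)) {
        if (it->getItem() > threshold) ++n;
    }
    return n;
}
```

With the caching version, a first scan over {1, 2, 3, 4} counts three values greater than 1, but a second scan gets back the same, already exhausted iterator and counts nothing. With the fresh version both scans count three. Whether that is really what happens between filter and input is exactly what I'd like to have confirmed.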