Iteration in AQL/AFL


#1

Hi friends,

I recently tried to implement an algorithm that involves iteration. Think of something like gradient descent, which gradually converges to a minimum. Is there any syntax in AQL/AFL to control a loop, e.g. “while x < y” or “for i in (1…10)”?

Thanks,
Dongfang


#2

I have been programming C++ operators to do iterative algorithms in SciDB. I think you cannot express that in pure AFL/AQL…
If you are working on a gradient descent algorithm, I would be very interested. Can you elaborate?
You can also email me at zhangyiqun9164@gmail.com.

Thanks!
Yiqun
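
If pure AFL/AQL has no loop construct, as suggested above, the loop control has to live somewhere else, for example in a client script that issues one query per iteration and checks convergence itself. A minimal Python sketch of that pattern follows. The names `iterate_until_converged` and `gradient_step` are hypothetical, and `gradient_step` is a plain-Python stand-in for the place where a real client would run a query; nothing here is SciDB API.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def iterate_until_converged(
    step: Callable[[Vector], Vector],
    x0: Vector,
    tol: float = 1e-8,
    max_iter: int = 1000,
) -> Tuple[Vector, int]:
    """Apply `step` repeatedly until the largest coordinate change is below `tol`."""
    x = list(x0)
    for i in range(1, max_iter + 1):
        x_new = step(x)
        # Largest absolute change in any coordinate since the previous iterate.
        delta = max(abs(a - b) for a, b in zip(x_new, x))
        x = x_new
        if delta < tol:
            return x, i
    return x, max_iter


def gradient_step(x: Vector, lr: float = 0.1) -> Vector:
    """One gradient descent update for f(x) = sum((x_k - 3)^2).

    Assumption: in a real setup this function would submit a query
    and read back the updated state instead of computing it locally.
    """
    grad = [2.0 * (xk - 3.0) for xk in x]
    return [xk - lr * g for xk, g in zip(x, grad)]


if __name__ == "__main__":
    x_min, n_iter = iterate_until_converged(gradient_step, [0.0, 10.0])
    print(x_min, n_iter)
```

The trade-off is one round trip per iteration, which is why the rest of this thread looks at operators and plugins that keep the loop next to the data.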


#3

Hi Yiqun,
Can you elaborate on how you implement SciDB operators in C++? High-quality iterative algorithms, in C++ or any other language, quickly get rather complicated, and you will want to use an existing mature library wherever possible. If you are already touching the data directly in C++, you may be able to pass it directly to one of these iterative algorithm libraries using a thin wrapper. Cheers, Dmitry


#4

Hi Dmitry

I saw some of Yiqun’s code here - you can take a look: https://github.com/yzhang1991/gamma-scidb
Unfortunately I haven’t had the time to try these out. I’ll let him comment on his work.

Another alternative for “touching data in parallel” is our Streaming Interface. See: https://github.com/paradigm4/stream
You can use this to implement some nontrivial computations - see the examples provided.

Also see this page for some other useful links from the community:


#5

Thanks, Alex. I’m new to SciDB, but I have experience with high-performance linear/nonlinear solvers. Having looked at Yiqun’s code (Correlation) to get an idea of how these ops are implemented, I’d recommend interfacing to one of the high-performance iterative solver libraries (e.g., PETSc, www.mcs.anl.gov/petsc). It will handle all of the linear algebra over MPI, calling out to linear algebra libraries (e.g., Elemental – a modern alternative to ScaLAPACK, SuperLU_Dist, etc.) where necessary. My guess is that it would be more efficient than BufSend().

Full disclosure: I am a PETSc developer, but my interests with SciDB are practical – to get it to handle some of the high-performance machine learning for financial industry applications. I’m currently looking at using something like streaming to fork a collective MPI job that could run whatever iterative solver necessary and return the result to a SciDB array. I’m looking at MPILauncher as a first-order approximation, but there may be better ways.

Thanks!
Dmitry.


#6

Cool, sounds interesting!
Internally we do have pieces of code that launch MPI for GEMM and GESVD operations. See include/array/mpi/MPILauncher.h. Perhaps you already found it and that’s what you meant above.

Streaming lets you prototype something quickly; the MPI route is a lot more complex, of course. Let us know how it goes!


#7

Thanks! I’ll keep looking at MPILauncher.


#8

Hi Dmitry,

I actually have a repository that does K-Means clustering in SciDB, but not calling external libraries. It is just a simple native implementation.
The only experience I had with external libraries was linking a SciDB operator with OpenACC to direct some of the matrix computations to GPU.
But what you described here is also very interesting. If you can share what you find with MPILauncher, that would be great.

Y
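
For context on why K-Means belongs in a thread about iteration: the standard (Lloyd's) algorithm alternates an assignment step and a center update until the assignments stop changing. The sketch below is a plain-Python illustration of that loop on 2-D points, not the code from the repository mentioned above; all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def assign(points: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Label each point with the index of its nearest center (squared Euclidean)."""
    labels = []
    for px, py in points:
        dists = [(px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centers]
        labels.append(dists.index(min(dists)))
    return labels


def update(
    points: Sequence[Point], labels: Sequence[int], centers: Sequence[Point]
) -> List[Point]:
    """Move each center to the mean of its members; an empty cluster keeps its center."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:
            new_centers.append(old)
            continue
        n = len(members)
        new_centers.append(
            (sum(p[0] for p in members) / n, sum(p[1] for p in members) / n)
        )
    return new_centers


def kmeans(
    points: Sequence[Point], centers: Sequence[Point], max_iter: int = 100
) -> Tuple[List[Point], List[int]]:
    """Run Lloyd's iterations until the labels stop changing or max_iter is hit."""
    centers = list(centers)
    labels = assign(points, centers)
    for _ in range(max_iter):
        centers = update(points, labels, centers)
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return centers, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans(pts, [(0.0, 0.0), (10.0, 10.0)]))
```

Each pass touches every point once, which is the part that would need to run in parallel, close to the data.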


#9

Hi Yiqun,

Using an external library doesn’t seem to be anything new for SciDB – GEMM already calls out to ScaLAPACK (although I would recommend looking at Elemental instead). In fact, a brief look at MPILauncher, MPISlaveProxy, etc. indicates that SciDB has all of the machinery in place for launching general MPI jobs to implement a query. PETSc could be wrapped exactly the same way. It implements a large number of iterative linear solvers for sparse linear systems as well as general-purpose nonlinear solvers (quasi-Newton, Picard) and a number of optimization algorithms. My plan is to provide a general mechanism for implementing optimization algorithms as SciDB queries, but it would be nice to have a small example of interest to the broader user community. KMeans could be one such example.

I’ve been running into issues building a Docker container for the dev version of SciDB (I need it to be able to reuse the MPILauncher stuff), but I’m going to give building the PETSc plugin a try. If you want to discuss how this could be applied to your specific problems, I’d be happy to chat.

Cheers,
Dmitry.