I have been implementing a parallel computable database for my own project and just stumbled onto yours.
I like very much what you are attempting, and hope to influence you towards meeting the special needs of my project.
I haven't spent enough time reviewing your documentation or code to know whether my comments are redundant.
I apologize for the enthusiastic "jumping in" that I do in this posting.
My project's goal is to solve perceptual paradoxes using model neurons and small (at first) aggregates of neurons.
I use only a few general operations to model a neuron:
- Transduce difference signals (usually an operator like tanh) in both time and space
- Pulse when sufficient difference is detected (variable threshold)
- Collect pulses along converging high speed delay lines (linear sequence of associated voxels)
- Detect coincidences where delay lines converge (variable threshold)
- Project detected coincidences over diverging high speed delay lines to transduction-points
- Move transduction-points to neighboring voxels under certain conditions
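The operations above can be sketched very compactly. The following is a minimal Python illustration under my own assumptions; every name (transduce, pulse, DelayLine, detect_coincidence) is hypothetical and chosen just to mirror the list:

```python
import math
from collections import deque

# Hypothetical sketch of the neuron operations listed above; all names are mine.

def transduce(a, b):
    """Transduce a difference signal through a saturating operator (tanh)."""
    return math.tanh(a - b)

def pulse(signal, threshold):
    """Emit a pulse (1) when the transduced difference exceeds a variable threshold."""
    return 1 if abs(signal) >= threshold else 0

class DelayLine:
    """A high-speed delay line: a linear sequence of voxels modeled as a FIFO."""
    def __init__(self, length):
        self.slots = deque([0] * length, maxlen=length)

    def step(self, incoming_pulse):
        """Advance one tick; return the pulse emerging at the far end."""
        self.slots.appendleft(incoming_pulse)
        return self.slots[-1]

def detect_coincidence(arrivals, threshold):
    """Fire when enough converging delay lines deliver a pulse on the same tick."""
    return 1 if sum(arrivals) >= threshold else 0
```

For example, two length-3 delay lines fed continuous pulses both deliver on the third tick, at which point a coincidence detector with threshold 2 fires.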
As you can see, my modeling needs are not particularly demanding.
However, when one attempts to produce aggregates of neurons,
the interdigitation of the collection and projection trees forces me
to consider very carefully the locality of data, the avoidance of race conditions,
and the co-occupation of delay lines within a given voxel.
I have solutions for these, but they are too much to describe here.
To implement my model, I chose to use a discrete Cartesian grid of voxels where
each voxel can be fragmented into a voxel-filling discrete Cartesian grid of subvoxels
and this division can be performed recursively, ad infinitum.
In addition, given a grid, I can build a grid of supervoxels around it recursively.
I chose to use a grid of 3x3x3 subvoxels with a center subvoxel having local coordinates (0,0,0).
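A minimal sketch of this fragmentation scheme, assuming a 3x3x3 split with local coordinates running over {-1, 0, 1} per axis and (0,0,0) at the center (the Voxel class and its methods are my own illustrative names):

```python
# Hypothetical sketch of recursive 3x3x3 voxel fragmentation with a
# centered coordinate convention: local coordinates run over {-1, 0, 1}
# on each axis, with (0, 0, 0) naming the center subvoxel.

class Voxel:
    def __init__(self):
        self.properties = {}   # the per-voxel property list
        self.children = None   # None until the voxel is fragmented

    def fragment(self):
        """Split this voxel into a 3x3x3 grid of subvoxels (idempotent)."""
        if self.children is None:
            self.children = {
                (x, y, z): Voxel()
                for x in (-1, 0, 1)
                for y in (-1, 0, 1)
                for z in (-1, 0, 1)
            }
        return self.children

    def descend(self, path):
        """Follow a sequence of local (x, y, z) coordinates, fragmenting on demand."""
        voxel = self
        for coords in path:
            voxel = voxel.fragment()[coords]
        return voxel

def build_supervoxel(voxel):
    """Wrap an existing grid in a 3x3x3 supervoxel grid, placing it at the center."""
    parent = Voxel()
    parent.fragment()
    parent.children[(0, 0, 0)] = voxel
    return parent
```

Descending a path like [(1, 0, -1), (0, 0, 0)] fragments lazily, and build_supervoxel gives the recursive outward growth in the other direction.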
Each voxel contains a property list which could be construed as implementing higher dimensions.
Each voxel is considered an autonomous computation element
taking its information exclusively from its property lists and immediate neighbors.
Computation is a two-step process of
1. collecting properties from neighboring voxel states
2. modifying local properties from the collected neighbor properties
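This two-phase discipline is what keeps the voxels race-free: every voxel reads only the previous generation of states, and all modifications commit at once. A minimal double-buffered sketch, assuming voxel states live in a dict keyed by integer (x, y, z) coordinates (the function names are mine):

```python
# Hypothetical sketch of the two-step voxel computation: every voxel first
# collects its neighbors' *current* states, then all voxels derive new
# states from that snapshot, so no voxel ever reads a half-updated neighbor.

def neighbors(pos, grid):
    """Yield the coordinates of the (up to 26) immediate neighbors present in the grid."""
    x, y, z = pos
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                if (dx, dy, dz) != (0, 0, 0):
                    n = (x + dx, y + dy, z + dz)
                    if n in grid:
                        yield n

def step(grid, rule):
    """One generation: collect neighbor states, then modify local states."""
    # Phase 1: collect -- read only the existing states.
    collected = {pos: [grid[n] for n in neighbors(pos, grid)] for pos in grid}
    # Phase 2: modify -- compute every new state from the collected snapshot.
    return {pos: rule(grid[pos], collected[pos]) for pos in grid}
```

With a diffusion-like rule such as averaging a voxel with its neighbors, two adjacent voxels holding 1.0 and 0.0 both become 0.5 after one step, and the original grid is left untouched.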
I chose these criteria because they offer massive simplifications of
standard classical physics calculations, such as discrete forms of Maxwell's equations and Hertz-style speed-of-light propagation.
Also, these criteria are intuitively familiar to a broader group of users
since they operate on the familiar 3-space,
yet enable support for single-voxel internal array properties
to implement dimensions deemed valuable for a project.
In addition, I chose this design because it has an intrinsic structure easy to distribute
over a cluster of machines with redundancies sufficient to tolerate failures of nodes.
My model is largely insensitive to the failure of a given voxel
just as most neural aggregates are insensitive to failure of a given neuron.
But, as I said, this is how I am approaching my own project.
I have some implementation ideas to contribute to your efforts (possibly redundant):
- Symmetric odd-length arrays with index 0 at the center eliminate the persistent, expensive offset arithmetic between coordinates and indices.
- Symmetric even-length arrays straddling 0 at the center make discrete Maxwell- and Hertz-style calculations simpler.
- Sparse arrays implemented as (not necessarily binary) trees enable local compact representations and computational space reduction.
- Avoid floating point altogether by enabling fragmentation within limits (maximum fragmentation depth).
- Implement optimized coordinate-index-coordinate transforms (I have good implementations for my own project).
- Implement equivalents to C++ mask_array, index_array, etc., to enable nested blittable subsetting.
- Implement a scientific data visualizer (like ITK/VTK) (I have developed one using OpenGL for my own project).
- Implement a "sculpture" tool and language to enable hand and automated construction of objects and subsets.
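The centered-array and coordinate-index-transform points above can be sketched as follows; this is a minimal Python illustration under my own naming, for a symmetric odd-length grid of side 2*h + 1 per axis:

```python
# Hypothetical sketch of the coordinate<->index transform for a symmetric
# odd-length grid: side 2*h + 1 per axis holds coordinates -h..+h, and
# flattening is a single shift-and-multiply per axis, with no special cases.

def coord_to_index(coords, half):
    """Flatten centered (x, y, z, ...) coordinates into a linear array index."""
    side = 2 * half + 1
    index = 0
    for c in coords:                 # row-major over the axes
        index = index * side + (c + half)
    return index

def index_to_coord(index, half, rank=3):
    """Invert coord_to_index for a rank-dimensional centered grid."""
    side = 2 * half + 1
    coords = []
    for _ in range(rank):
        coords.append(index % side - half)
        index //= side
    return tuple(reversed(coords))
```

With half = 1 this maps (-1, -1, -1) to 0, the center (0, 0, 0) to 13, and (1, 1, 1) to 26, so the center of the grid is the center of the linear array and no opposing offsets survive into the inner loops.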
Avoiding floating point is what makes nested blittable subsetting possible, so this recommendation is not lightly given.
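One way to avoid floating point under a 3x3x3 fragmentation scheme is to keep each axis position as an exact integer numerator at a known depth, i.e. the value num / 3**depth. This is a sketch under my own assumptions (the MAX_DEPTH limit and the function names are illustrative, not part of any existing design):

```python
# Hypothetical sketch of floating-point-free axis coordinates under bounded
# fragmentation: a position along one axis is the exact rational
# num / 3**depth, stored as the integer pair (num, depth). Refining one
# fragmentation level just multiplies the numerator by 3, and all
# arithmetic stays in integers, so values can be compared and copied
# ("blitted") bit-for-bit.

MAX_DEPTH = 8  # assumed project-wide maximum fragmentation depth

def refine(num, depth):
    """Go one fragmentation level deeper without losing exactness."""
    if depth >= MAX_DEPTH:
        raise ValueError("maximum fragmentation depth exceeded")
    return num * 3, depth + 1

def add(a, b):
    """Add two axis positions (num, depth) exactly, using only integers."""
    (na, da), (nb, db) = a, b
    depth = max(da, db)
    return na * 3 ** (depth - da) + nb * 3 ** (depth - db), depth
```

For example, 1/3 plus 1/9 is computed as (1, 1) + (1, 2) = (4, 2), i.e. exactly 4/9, with no rounding anywhere.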
One last recommendation.
Do not implement just one interface language to your database.
Make, instead, a set of languages that achieve different efficiencies
and make it possible for people to write their own languages
to implement their own solution spaces in the "best" language for the purpose.
After all, underneath LISP, FORTRAN, C++, Python, MATLAB and all that
is the very same processor.
Each language constrains the user in some ways while augmenting in other ways.
Ideally, there ought to be some very primitive "machine language" for your database,
on top of which is implemented the developer language, on top of which is implemented the application.
I could add value to this project if the SciDB team is interested.
I am an experienced cross-platform C++ and Intel assembly developer with deep thoughts about parallel algorithms.
I welcome both comments and invitations to further explore cross-fertilization of ideas.
Again, I apologize that I just jumped in without seeing how far your team has already gone.