Thanks for the detailed response!
We seem to be slipping and slopping from forum to forum.
- First, your create array ... statement. What you’re doing here is creating a heckuva lot of chunks:
create array starTest < oid:int32, htmid:int32, hpid:int32, rra:double, ddec:double, dpix:float, x029:int32, y029:int32, x007:int32, y007:int32, mag3:float, emag3:float, mag4:float, emag4:float, mag5:float, emag5:float >[x=0:*,1,0, y=0:*,1,0, z=0:*,1,0]
The dimension specification [x=0:*,1,0, y=0:*,1,0, z=0:*,1,0] says, “This array has three dimensions, x, y and z, which begin at 0 but are unbounded in length. I want SciDB to decompose this array into chunks which are each of length 1 in the x dimension, 1 in the y dimension, and 1 in the z dimension, and I want these chunks to have no overlap.” In other words, you will have at most 1 “cell” in each chunk.
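To make the reading of that specification concrete, here is a minimal Python sketch that pulls a dimension list apart into name, start, end, chunk length and overlap. It only models the name=start:end,chunk_length,overlap layout described above; the Dimension type and parse_dimensions function are names made up for this illustration and are not part of SciDB.

```python
import re
from typing import List, NamedTuple, Optional


class Dimension(NamedTuple):
    name: str
    start: int
    end: Optional[int]  # None stands for '*', i.e. unbounded
    chunk_length: int
    overlap: int


# name=start:end,chunk_length,overlap (the layout described in the post)
_DIM = re.compile(r"(\w+)\s*=\s*(-?\d+)\s*:\s*(\*|-?\d+)\s*,\s*(\d+)\s*,\s*(\d+)")


def parse_dimensions(spec: str) -> List[Dimension]:
    """Parse a bracketed dimension list into Dimension tuples (illustrative helper)."""
    dims = []
    for name, start, end, chunk, overlap in _DIM.findall(spec):
        dims.append(
            Dimension(
                name,
                int(start),
                None if end == "*" else int(end),
                int(chunk),
                int(overlap),
            )
        )
    return dims


for dim in parse_dimensions("[x=0:*,1,0, y=0:*,1,0, z=0:*,1,0]"):
    print(dim)
```

Each of the three dimensions comes back with chunk_length 1 and overlap 0, which is exactly the "one cell per chunk" situation.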
The problem with this is that you’re creating 15 x 170,000 chunks. One of the design points we adopted in SciDB was giving users the ability to put everything that was “close together” in logical space (in this case, the [ x, y, z] space) close together in the storage. That way, if you want to see (for example) what’s around you in any of the three dimensions, you’re pretty much guaranteed that the data will be in the chunk you’re starting with.
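A rough way to see the effect of chunk length is to count how many distinct chunks a set of occupied cells lands in. The Python sketch below is only a model of the idea, not SciDB's storage code; the cell data is hypothetical (a small, densely packed cube), and chunk_of and count_chunks are illustrative names.

```python
from typing import Iterable, Tuple

Cell = Tuple[int, int, int]


def chunk_of(cell: Cell, chunk_length: int) -> Cell:
    """Origin of the chunk holding `cell`, assuming every dimension starts at 0."""
    return tuple((c // chunk_length) * chunk_length for c in cell)


def count_chunks(cells: Iterable[Cell], chunk_length: int) -> int:
    """Number of distinct chunks needed to hold the given occupied cells."""
    return len({chunk_of(cell, chunk_length) for cell in cells})


# Hypothetical data: 1,000 occupied cells packed into a 10 x 10 x 10 cube.
cells = [(x, y, z) for x in range(10) for y in range(10) for z in range(10)]

print(count_chunks(cells, 1))   # chunk length 1: one chunk per cell
print(count_chunks(cells, 10))  # chunk length 10: the whole cube is one chunk

# The figure quoted in the post: 15 x 170,000 chunks.
print(15 * 170_000)
```

With chunk length 1 every neighbour lookup has to go to a different chunk; with a larger chunk length, cells that are close in [x, y, z] space share a chunk.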
- Second, the unbounded dimensions won’t get you the best results. SciDB supports them, but they make some calculations hard. In the next release we make it considerably easier to work with data where you don’t know the dimension bounds in advance. I assume you’re working with data in the RA / Decl space. I used SciDB to load (some of) the USNO-B catalog, and I used the following schema:
CREATE EMPTY ARRAY Objects < Proper_Motion_RA : int32, \
Error_in_Motion_RA : int32, \
Proper_Motion_DECL : int32, \
Error_in_Motion_DECL : int32, \
Obs_Epoch : double, \
B_mag : double, \
B_mag_flag : int32, \
R_mag : double, \
R_mag_flag : int32, \
B_mag2 : double, \
B_mag2_flag : int32, \
R_mag2 : double, \
R_mag2_flag : int32 > \
[ RA=0:35999999,144000,8400, DECL=0:17999999,72000,4200]
The idea is to divide the “space” into 250x250 chunks, each of which covers about 1/2 arc-degree. Then if you run a query like:
subarray ( starTest, (126.734*10000), (-43.3245 * 10000), (126.956*10000), (-41.9483*10000) );
We can very quickly find you everything in the specified “region of the sky”.
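As a back-of-the-envelope check on why a region query is cheap with that schema, the Python sketch below works out the chunk grid from the RA and DECL dimension definitions and counts how many chunks a bounding box overlaps. The box corners are hypothetical values assumed to be already expressed in array cell coordinates, the helper names are made up for illustration, and the chunk overlap (8400 / 4200) is ignored for simplicity.

```python
# (start, end, chunk_length) taken from the Objects schema in the post.
RA = (0, 35_999_999, 144_000)
DECL = (0, 17_999_999, 72_000)


def chunk_count(dim):
    """Number of chunks along one dimension (ceiling division)."""
    start, end, length = dim
    return -(-(end - start + 1) // length)


def chunk_range(dim, lo, hi):
    """Indexes of the first and last chunk overlapped by the interval [lo, hi]."""
    start, _end, length = dim
    return (lo - start) // length, (hi - start) // length


# Hypothetical box corners, assumed to be in array cell coordinates already.
ra_lo, ra_hi = 12_673_400, 12_695_600
decl_lo, decl_hi = 4_667_550, 4_805_170

ra_first, ra_last = chunk_range(RA, ra_lo, ra_hi)
decl_first, decl_last = chunk_range(DECL, decl_lo, decl_hi)

touched = (ra_last - ra_first + 1) * (decl_last - decl_first + 1)
total = chunk_count(RA) * chunk_count(DECL)
print(f"{touched} of {total} chunks")
```

For this example box only a handful of the 250 x 250 chunks need to be read, which is the whole point of chunking on the coordinates you query by.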
- SciDB exploits arrays to get its performance. Unlike SQL DBMS engines, we don’t (yet) have things like B-trees over attributes (although we fully expect to get to them). For your query:
filter( starTest, htmid=-774238732 )
Because this is a filter over an attribute, we are obliged to scan the whole array.
In the next release, we will have functionality that lets you do a better job using arrays. But … that’s not there yet.
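To illustrate the difference between a predicate on an attribute and a predicate on a dimension, here is a toy Python model. It is not how SciDB is implemented; the one-dimensional layout, the chunk length of 100, the generated htmid values and both function names are assumptions made up for the sketch. It simply counts how many cells each kind of query has to visit.

```python
from collections import defaultdict

CHUNK = 100  # hypothetical chunk length along a single dimension

# Hypothetical cells: (coordinate, htmid attribute value).
cells = [(i, i * 7 % 1000) for i in range(10_000)]

# Storage sketch: cells grouped by the chunk their coordinate falls in.
chunks = defaultdict(list)
for coord, htmid in cells:
    chunks[coord // CHUNK].append((coord, htmid))


def filter_by_attribute(target):
    """Attribute predicate: nothing says where matches live, so visit every cell."""
    visited, hits = 0, []
    for chunk in chunks.values():
        for coord, htmid in chunk:
            visited += 1
            if htmid == target:
                hits.append(coord)
    return hits, visited


def select_by_coordinate(lo, hi):
    """Dimension predicate: only chunks overlapping [lo, hi] are read."""
    visited, hits = 0, []
    for idx in range(lo // CHUNK, hi // CHUNK + 1):
        for coord, _htmid in chunks.get(idx, []):
            visited += 1
            if lo <= coord <= hi:
                hits.append(coord)
    return hits, visited


_, scanned = filter_by_attribute(42)
_, read = select_by_coordinate(250, 349)
print(scanned, read)
```

The attribute filter touches every cell in every chunk, while the coordinate selection only touches the chunks that overlap the requested range. That is why putting the values you search on into dimensions pays off.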