R limitation when retrieving data from SciDB


#1

Good morning:

I created an array and loaded some data into it using SciDB. The array is defined like this:

The array contains pixel data from a satellite image and it has 1,520,640,000 cells. As you can see, the dimensions’ start values are greater than 0 (except for time).

I cannot retrieve data using R’s data-frame notation, so I use iquery, but I can’t retrieve more than 2440 rows from that array (I’m using RStudio on my laptop to retrieve data from the server). Am I missing a configuration parameter in R, or in shim?

Thanks for your help!

Alber

library(scidb)
options(scidb.debug=TRUE)

scidbconnect(host = "myservername.mydomain.com", port = 8080L)

# Retrieves 100 rows into R
afl <- "subarray(MOD09Q1_TEST004_20140220, 57600, 43200, 46, 57609, 43209, 46)"
res9 <- iquery(afl,`return` = TRUE, afl = TRUE, iterative=FALSE)
cat("Expected 100 rows... retrieved ")
nrow(res9)

# This retrieves only 2440 rows instead of 10000. Why?
afl <- "subarray(MOD09Q1_TEST004_20140220, 57600, 43200, 46, 57699, 43299, 46)"
res99 <- iquery(afl,`return` = TRUE, afl = TRUE, iterative=FALSE)
cat("Expected 10000 rows... retrieved ")
nrow(res99)
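The expected counts in the script follow from subarray’s inclusive bounds: each dimension contributes (high − low + 1) coordinates. This assumes, as the thread does, that iquery returns one data-frame row per cell and that every cell in the requested box holds data.

$$
N = \prod_{d} \left( \mathrm{high}_d - \mathrm{low}_d + 1 \right)
$$

$$
N_{\text{first query}} = (57609 - 57600 + 1)(43209 - 43200 + 1)(46 - 46 + 1) = 10 \cdot 10 \cdot 1 = 100
$$

$$
N_{\text{second query}} = (57699 - 57600 + 1)(43299 - 43200 + 1)(46 - 46 + 1) = 100 \cdot 100 \cdot 1 = 10000
$$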

#2

Check out the ‘n’ argument to iquery. It’s there as a safety valve to limit the number of rows returned. You can set n=Inf to get everything back.

If you’re downloading a large result into R, consider using the iterator interface (iterative=TRUE) rather than retrieving it all in one shot.

See

help("iquery")

for details and examples.


#3

Hi:

Thanks! You’re right: setting n = Inf solved my problem, and now my script retrieves all 10000 rows. However, I’m still wondering: if the default value of n is 10000, why was my original script retrieving only 2440 rows?

I believe iquery creates a row in the data.frame for each cell in the SciDB array. Am I wrong?

Cheers!


#4

You’re right, this is curious, and I don’t fully understand the truncated response yet. Still investigating…