Garbage collection / temporary objects in SciDB


#1

Hello,

Should we take care of cleaning up objects that are created in SciDB by R?
For example, I have an iterative algorithm where a matrix is constantly updated according to M(t) = f(M(t-1)).
M is pretty huge and for each time step a new matrix will be created. I am afraid these computations will eat up all of scidb’s available memory pretty quickly.

Are the corresponding SciDB objects destroyed when the R object is destroyed by the garbage collector, or should this be handled manually?

Thanks & Regards,

Sébastien


#2

Hi Sebastien,

In SciDBR if I issue a statement like

A = scidbeval(B, temp=TRUE)

Then the array backing A will be removed from SciDB once the R object A is no longer referenced and R's garbage collector runs. temp=TRUE means that A will be backed by a ‘temp’ array; if there are too many temp arrays to fit in memory, SciDB will spill the least recently used chunks to disk.

You can also disable this automatic cleanup with:

A = scidbeval(B, temp=TRUE, gc=0, name="myArray")

The name argument is not a requirement but a nice-to-have so that you don’t lose track of the object.

There are some cases where the R session may quit unexpectedly and that may leak a temp array. Those arrays are easily found by name and can be cleaned up later.
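
For intuition, here is a minimal sketch of how an R handle can tie the lifetime of a server-side object to R's garbage collector, using base R's reg.finalizer. This is not SciDBR's actual code: `store`, `make_handle` and `drop_array` are hypothetical names, and the "server" is simulated by a plain R environment so the example runs without SciDB.

```r
# Simulated server: each name in this environment stands for an array
# that exists on the server side (assumption, for illustration only).
store <- new.env()

# Hypothetical finalizer: drops the backing array unless cleanup was disabled.
drop_array <- function(e) {
  if (e$remove && exists(e$name, envir = store, inherits = FALSE)) {
    rm(list = e$name, envir = store)
  }
}

# Hypothetical constructor: creates the "array" and returns an R handle.
# gc = 0 (or FALSE) mimics opting out of automatic cleanup.
make_handle <- function(name, gc = TRUE) {
  assign(name, TRUE, envir = store)
  handle <- new.env()
  handle$name <- name
  handle$remove <- isTRUE(as.logical(gc))
  # onexit = TRUE also runs the finalizer on a clean exit from R;
  # a crashed or killed session skips it, which is how arrays can leak.
  reg.finalizer(handle, drop_array, onexit = TRUE)
  handle
}

a <- make_handle("tmp_array")         # cleaned up automatically
b <- make_handle("my_array", gc = 0)  # kept until removed by hand

rm(a, b)        # drop the last R references to both handles
invisible(gc()) # force a collection so pending finalizers run

print(ls(store)) # only the gc = 0 array should remain
```

The point of the sketch is that cleanup happens when the collector runs, not at the moment the variable goes away, so in a long loop the old arrays may linger until the next collection.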

Hope it helps.


#3

Thank you Alexander, I understand better now.


#4

When I run A <- as.scidb(iris),
it creates a temporary array with a random name such as R_arrayb6e46642d2c1479347672637250679…
Most commands then fail when I try to work on it. I cannot use A@attributes, and many other commands give the error public.R_arrayb6e46642d2c1479347672637250679’ does not exist. How can I turn this temporary array into one that persists in SciDB?


#5

Yes, you can use “name=” to give it any name you want, and “gc=0” so that R’s garbage collector does not remove it:

> a = as.scidb(iris, name="my_iris_array", gc=0)
Warning message:
In df2scidb(X, name = name, gc = gc, ...) :
  Attribute names have been changed
> str(a)
SciDB expression  my_iris_array
SciDB schema  <Sepal_Length:double,Sepal_Width:double,Petal_Length:double,Petal_Width:double,Species:string> [tuple_no=0:*,100000,0,dst_instance_id=0:15,1,0,src_instance_id=0:15,1,0] 
         variable dimension   type nullable start end  chunk
1        tuple_no      TRUE  int64    FALSE     0   * 100000
2 dst_instance_id      TRUE  int64    FALSE     0  15      1
3 src_instance_id      TRUE  int64    FALSE     0  15      1
4    Sepal_Length     FALSE double     TRUE                 
5     Sepal_Width     FALSE double     TRUE                 
6    Petal_Length     FALSE double     TRUE                 
7     Petal_Width     FALSE double     TRUE                 
8         Species     FALSE string     TRUE                 
> rm(a)
> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 492287 26.3     940480 50.3   811563 43.4
Vcells 762735  5.9    1851483 14.2  1463709 11.2

> b = scidb("my_iris_array") ##Still there!
> count(b)
[1] 150