SLOW start/stop/initall databasename


Hi Experts,

I am running a cluster with 128 instances. Whenever I run startall, stopall, or initall, it processes the instances serially. Each instance takes several seconds, so starting or stopping the cluster takes about half an hour. Two questions:

  1. Is it normal for each instance to take several seconds? (Maybe I haven’t set up DNS correctly, so each ssh connection from paramiko is slow? But I can ssh from one node to another instantly.)
  2. Could you consider parallelizing them in a future release? That would save a lot of time.
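
One way to narrow down question 1 is to time a no-op command over ssh to each host. The following is only a minimal sketch: it assumes passwordless ssh is already configured, it uses the system `ssh` client rather than paramiko (so it measures the connection itself, not paramiko's overhead), and `time_remote_noop` is a hypothetical helper that is not part of SciDB.

```python
import subprocess
import time


def run_ssh_noop(host):
    """Run the no-op command `true` on `host` using the system ssh client."""
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # ConnectTimeout=5 gives up on an unreachable host after 5 seconds.
    subprocess.run(
        ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )


def time_remote_noop(host, runner=run_ssh_noop):
    """Return the wall-clock seconds `runner(host)` takes to complete."""
    start = time.perf_counter()
    runner(host)
    return time.perf_counter() - start
```

Calling `time_remote_noop("node1")` for each host in the cluster (the host name here is a placeholder) shows whether connection setup alone accounts for the several seconds per instance. If these times are small, the delay is more likely in the per-instance work than in DNS or ssh.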



Well …

  1. We’re hoping that folk won’t be stopping and starting SciDB very often. And if they are, it’s probably because they’re encountering some kind of problem that we really, really need to talk about.

  2. I’ve created an entry on our “List of Things to Do” - “Parallelize Installation Startup”. Drop me a note and I’ll send you the description I’ve entered there to check that we’re both on the same page.




I have a case where I need to restart SciDB. When a hardware error causes fsync() to fail, the query also fails. Everything unwinds to a valid state, and that’s great. And here is the “but …”: sometimes locks on arrays are not released. When that happens, a new query on the same array just hangs indefinitely.

When I kill the query, the locked array remains locked. I run stopall and startall to fix the situation. And I agree, it takes way too long.
Does SciDB have a practical limit on the number of instances it can handle? At some point it becomes impractical to wait that long for an orderly restart.

So, yes, please, give us parallel startall/stopall!
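
For illustration, the general idea of a parallel start or stop can be sketched as a fan-out over hosts with a thread pool. This is not how SciDB's own scripts work and not the planned implementation. It is a minimal sketch assuming passwordless ssh, with `ssh_command` and `run_on_all` as hypothetical helpers. The per-host command is left as a parameter because the actual per-instance command is not shown in this thread.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_command(host, command):
    """Run `command` on `host` via the system ssh client; return its exit code."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command], check=False
    )
    return result.returncode


def run_on_all(hosts, command, run_one=ssh_command, max_workers=16):
    """Run `command` on every host concurrently; return {host: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the calls overlap, then collect results.
        futures = {host: pool.submit(run_one, host, command) for host in hosts}
        return {host: future.result() for host, future in futures.items()}
```

With 128 instances taking about half an hour in total, the serial cost works out to roughly 14 seconds per instance. If the per-instance work is mostly waiting on the network, a pool of 16 workers could in principle bring the total down to a few minutes, although any real ordering requirements between instances (for example, a coordinator that must start first) would have to be respected.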

BTW: is there a non-blocking function that tells me immediately whether an array is locked by another ongoing query?

Cheers, George


I have a little more news on this.

The pace of our internal test cycles was giving us problems, so we’ve made some changes to speed up the startall step. Look for these changes in Cheshire (12.10).