Data Duplication, Server Redundancy, and Master Failover


#1

I’m evaluating SciDB for a high-availability application, but there appear to be multiple single points of failure, and I’m having a hard time distinguishing between SciDB and Paradigm4 (P4) functionality.

According to http://www.scidb.org/forum/viewtopic.php?f=6&t=701 if any node’s disk goes away, all data stored on that disk is unavailable unless you have an unnamed P4 extension. Is that interpretation correct? Will the database continue running without that disk? Or will it crash?

According to http://www.scidb.org/forum/viewtopic.php?f=6&t=515 there may be some sort of worker redundancy built in so that k servers can go down without bringing down the whole SciDB database. But P4 extensions are mentioned, so is that native or P4 functionality? If it is native, how does it work when, per the previous link, no data redundancy is provided?

What happens if the master node fails? What would a decent failover procedure look like?

When answering, please be clear about what is provided by native open-source SciDB and what is provided by proprietary P4 extensions.


#2

Hello,

Yes, SciDB by itself cannot survive a node failure at the moment. After a node fails, the system will refuse to execute queries. To use pure SciDB with some durability, you’d have to rely on regular backups and/or redundant hardware (RAIDed drives, redundant network cards, etc).

There is a built and tested P4 redundancy plugin. At the moment, it allows you to survive temporary failures. With that plugin, a node failure puts the system into a read-only state - you can issue read queries and analytics but can’t add new data. After the failed node is restored, the system will recognize it and become fully operational again. The plugin also allows you to set the number of tolerable node failures. It does not protect against coordinator failure or Postgres failure.
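To make the behaviour described above concrete, here is a minimal sketch of it as a small state function. This is not the plugin’s code or API: the names `ClusterMode` and `cluster_mode` are made up for illustration, and the "unavailable" case for more failures than the configured limit is an assumption, since the post only describes what happens within the tolerated number.

```python
from enum import Enum


class ClusterMode(Enum):
    READ_WRITE = "read-write"    # all nodes healthy
    READ_ONLY = "read-only"      # some nodes down, within the tolerated number
    UNAVAILABLE = "unavailable"  # ASSUMPTION: more nodes down than tolerated


def cluster_mode(failed_nodes: int, tolerable_failures: int) -> ClusterMode:
    """Hypothetical model of the cluster state with the redundancy plugin."""
    if failed_nodes < 0 or tolerable_failures < 0:
        raise ValueError("counts must be non-negative")
    if failed_nodes == 0:
        return ClusterMode.READ_WRITE
    if failed_nodes <= tolerable_failures:
        # Reads and analytics still work; new data cannot be added.
        return ClusterMode.READ_ONLY
    return ClusterMode.UNAVAILABLE
```

With `tolerable_failures=0` this also models plain SciDB as described above: any single node failure leaves the system unable to run queries.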

As far as Postgres, it can be hosted on a different cluster / machine altogether and can have its own redundancy story. SciDB nodes only need to be able to talk to it.
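Since the only requirement stated here is that SciDB nodes can talk to Postgres, a simple TCP reachability check run from each node can help when moving Postgres to another machine. The sketch below uses only the Python standard library; the function name `can_reach` is made up, and the host name in the usage note is a placeholder. It only confirms that the port accepts connections, not that credentials or the catalog database are set up correctly.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `can_reach("catalog-db.example.internal", 5432)` would test a hypothetical Postgres host on 5432, the default Postgres port.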

If you’d like more info about proprietary P4 stuff (and P4 does sometimes build custom plugins for various customers) - contact info@paradigm4.com.

Hope it helps answer your question.

  • Alex Poliakov

#3

are there any plans to add this to SciDB core? this seems like an important feature for people running even medium sized clusters – i have a hard time imagining anyone who is running a production cluster would be able to do it without some kind of redundancy.

if not, is this something that a motivated and skilled outsider could add, via the plug-in interface or otherwise? (is there documentation for the plug in interface?)

m


#4

hello –

didn’t get a response so i’m repeating my questions.

are there any plans to add redundancy / failure tolerance to SciDB core? if not, is this something that a motivated and skilled outsider could add, via the plug-in interface or otherwise?

m



#5

Hi Midfield,

At the moment - no immediate plans. By default, if you set “redundancy=…”, SciDB will actually make copies of the chunks and replicate them. There is also a P4-only plugin that takes advantage of that and keeps the system online if an instance becomes unresponsive. P4 may, from time to time, “demote” plugins to make them open source - but there are no such plans for this plugin at this time. P4 also sometimes gives these plugins for free to academics and lab scientists. If that’s you - send a request via the “Try it” form at paradigm4.com.
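To illustrate why chunk copies make it possible to stay online, here is a rough sketch of replicated chunk placement. The placement rule (primary on `chunk_id % num_instances`, copies on the following instances) is an assumption chosen for simplicity, not SciDB’s actual algorithm, and both function names are hypothetical.

```python
def replica_instances(chunk_id: int, num_instances: int, redundancy: int) -> list[int]:
    """Instances holding a chunk: one primary plus `redundancy` copies.

    ASSUMPTION: simple round-robin placement, for illustration only.
    """
    if not 0 <= redundancy < num_instances:
        raise ValueError("redundancy must be in [0, num_instances)")
    return [(chunk_id + offset) % num_instances for offset in range(redundancy + 1)]


def all_chunks_readable(num_chunks: int, num_instances: int,
                        redundancy: int, failed: set[int]) -> bool:
    """True if every chunk still has at least one copy on a live instance."""
    return all(
        any(inst not in failed
            for inst in replica_instances(chunk, num_instances, redundancy))
        for chunk in range(num_chunks)
    )
```

Under this placement, a redundancy of 1 on four instances keeps every chunk readable after any single instance failure, while two adjacent failures can lose both copies of some chunk.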

Hope it helps.