SciDB doesn't start


#1

I was running the redimension operator to load data.

However, I got a “setposition” error after the sort and merge phase of redimension,
so I wanted to restart SciDB.

Now, when SciDB starts, I get the following error.

2014-12-04 00:59:47,001 [0x2ba3430b89e0] [DEBUG]: Network manager is intialized
2014-12-04 00:59:47,001 [0x2ba3430b89e0] [DEBUG]: NetworkManager::run()
2014-12-04 00:59:47,002 [0x2ba3430b89e0] [DEBUG]: Performing rollback
2014-12-04 00:59:47,003 [0x2ba3430b89e0] [DEBUG]: End of log at position 138720 rc=136
2014-12-04 00:59:47,003 [0x2ba3430b89e0] [DEBUG]: End of log at position 0 rc=0
2014-12-04 00:59:47,018 [0x2ba3430b89e0] [DEBUG]: Rollback complete
2014-12-04 00:59:47,018 [0x2ba3430b89e0] [DEBUG]: SystemCatalog::deleteArrayLocks instanceId = 0 role = 1 queryId = 18446744073709551615
2014-12-04 00:59:47,019 [0x2ba3430b89e0] [DEBUG]: Performing rollback
2014-12-04 00:59:47,019 [0x2ba3430b89e0] [DEBUG]: End of log at position 138720 rc=136
2014-12-04 00:59:47,019 [0x2ba3430b89e0] [DEBUG]: End of log at position 0 rc=0
2014-12-04 00:59:47,035 [0x2ba3430b89e0] [DEBUG]: Rollback complete
2014-12-04 00:59:47,035 [0x2ba3430b89e0] [DEBUG]: SystemCatalog::deleteArrayLocks instanceId = 0 role = 2 queryId = 18446744073709551615
2014-12-04 00:59:47,186 [0x2ba3430b89e0] [ERROR]: Error during SciDB execution: SystemException in file: src/smgr/io/Storage.cpp function: initChunkMap line: 448
Error id: scidb::SCIDB_SE_STORAGE::SCIDB_LE_DATABASE_HEADER_CORRUPTED
Error description: Storage error. Database header has been corrupted.
2014-12-04 00:59:47,218 [0x2ba3430b89e0] [INFO ]: SciDB instance. SciDB Version: 14.8.8457. Build Type: RelWithDebInfo. Copyright (C) 2008-2014 SciDB, Inc. is exiting.

I don’t want to lose the data I have already loaded into SciDB.


#2

Having the same issue. Unable to restart the server following this error:

2015-03-15 07:11:44,182 [0x7f2cc5f26840] [ERROR]: Error during SciDB execution: SystemException in file: src/smgr/io/Storage.cpp function: initChunkMap line: 448
Error id: scidb::SCIDB_SE_STORAGE::SCIDB_LE_DATABASE_HEADER_CORRUPTED
Error description: Storage error. Database header has been corrupted.

Can someone point me to additional information about the database header? Once corrupted, can it be restored?


#3

Which version of SciDB are you running?
Enterprise or Community?

If it’s Community and SciDB crashed, I don’t think you can recover.
For Enterprise, you would need to have the replication feature enabled to be able to recover.


#4

This corruption is a known bug in 14.8. In 14.12 we eliminated at least one root cause of the bug, and so far in our testing that appears to have significantly lowered the probability of hitting it. In the meantime, if you don’t have replication and the enterprise extensions running, there is still one thing (a hack) you can do. It requires some familiarity with SQL commands and involves manually updating the Postgres catalog, so of course it is not officially supported.

The most likely reason SciDB is not starting is that, due to a power outage or a software fault, SciDB shut down unexpectedly during an update to an array. Normally, this would not be a problem, since SciDB is designed for ACID compliance. However, the bug I mentioned above introduced a small possibility that during recovery from a crash SciDB could allocate the same disk block to two different chunks. Once this is discovered by the system, the system will crash and not be able to restart. The corruption is most likely limited to a single array. If you can remove that array from the system catalog, then SciDB will start properly. Of course, if all of your data is contained in the corrupted array, this will not help you at all. But if the corrupted array represents a small portion of your data set, then doing this will allow you to preserve the rest of your uncorrupted data.

So you need to do two things:

  1. Determine which array is corrupted. In 14.8 the logging info is not as detailed as it is in 14.12, so it won’t be obvious from the error message in the log. One possibility is to look into the log and see what the last transaction was that was running before SciDB crashed. If you can determine the array that is corrupted, then:
  2. Remove the array from the array table in Postgres. This requires you to use the psql client to open the system catalog DB and update the array table. Remove all versions of the affected array from the “array” table. Make sure that the cluster is stopped when you are performing this operation.
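
For step 1, here is a rough Python sketch of one way to pull out the last log entries written before the corruption error first appears. It assumes the log lines look like the excerpts in this thread (timestamp, thread id, level, message). The function names and the `contains` filter are made up for illustration and are not part of SciDB. What the log actually records about running queries depends on your logging settings, so treat the output as a starting point for reading the log by hand.

```python
import re

# One log entry as it appears in this thread, for example:
# 2014-12-04 00:59:47,001 [0x2ba3430b89e0] [DEBUG]: NetworkManager::run()
ENTRY_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \[(\w+)\] \[(\w+)\s*\]: (.*)$"
)


def parse_entries(lines):
    """Group raw log lines into entries; lines without a timestamp are
    treated as continuations of the previous entry."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = ENTRY_RE.match(line)
        if m:
            ts, thread, level, msg = m.groups()
            entries.append(
                {"time": ts, "thread": thread, "level": level, "message": msg}
            )
        elif entries and line.strip():
            entries[-1]["message"] += "\n" + line
    return entries


def entries_before_error(lines,
                         error_id="SCIDB_LE_DATABASE_HEADER_CORRUPTED",
                         keep=20, contains=None):
    """Return up to `keep` entries logged before the first entry that
    mentions `error_id`, optionally only those containing `contains`."""
    if keep <= 0:
        return []
    entries = parse_entries(lines)
    before = entries
    for i, entry in enumerate(entries):
        if error_id in entry["message"]:
            before = entries[:i]
            break
    if contains is not None:
        before = [e for e in before if contains in e["message"]]
    return before[-keep:]
```

To use it, open your instance's log file, pass the file object to `entries_before_error`, and print the `time` and `message` of each returned entry. Passing `contains` with an array name you suspect narrows the output to entries that mention it.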

If you need more details regarding these steps, let us know…
–Steve F
P4