SCIDB_SE_IO::SCIDB_LE_PWRITE_ERROR


#1

Continuing from my previous post: in subsequent testing I created a multi-dimensional array and ran a redimension_store, which failed after 3 hours with the following error:

SystemException in file: src/util/FileIO.cpp function: writeAll line: 94 Error id: scidb::SCIDB_SE_IO::SCIDB_LE_PWRITE_ERROR Error description: I/O error. pwrite failed to write 1742 byte(s) to the position 18522365952 with error 28. Failed query id: 1100867376282

[code]iquery -aq "
CREATE EMPTY ARRAY pipe_ndim
<
id : int64,
scheduledVolume : double NULL,
actualVolume : double NULL,
utlization : double NULL,
designCapacity : double NULL,
operationalCapacity : double NULL,
actualCapacity : double NULL,
operationallyAvailable : double NULL,
flowdirection : string NULL,
interruptibleflow : string NULL
>
[ pipelineId=0:*,32,0, mesasurementDate(datetime)=*,32,0, pointId=0:*,32,0, segmentId=0:*,32,0, nomCycle(string)=*,8,0, locationType(string)=*,16,0 ]
"
Query was executed successfully

iquery -naq "
redimension_store (
pipe_load,
pipe_ndim
)
"
SystemException in file: src/util/FileIO.cpp function: writeAll line: 94
Error id: scidb::SCIDB_SE_IO::SCIDB_LE_PWRITE_ERROR
Error description: I/O error. pwrite failed to write 1742 byte(s) to the position 18522365952 with error 28.
Failed query id: 1100867376282

[/code]

I am not sure whether this is a data issue, an OS (VM) issue, or a SciDB issue. I did not build from source; I installed the binaries.


#2

Hi,

You are getting errno 28, which is “no space left on device”.

During redimension_store, SciDB writes data to its primary data partition as well as to the temp partition. One of these locations is running out of space; my guess is that it's the temp. The location of temp is set by the config option “tmp-dir”. Regardless of whether you set it or not, you can check scidb.log for where it is.
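If it helps, here is a quick way to check; the config and data paths below are placeholders, since the actual locations depend on your installation:

[code]# Find where tmp-dir points (the config.ini path here is an assumption;
# adjust for your installation).
grep tmp-dir /opt/scidb/etc/config.ini

# Then check free space on the partitions holding the data dir and tmp-dir,
# substituting the paths reported by your config / scidb.log.
df -h /path/to/scidb-data /path/to/tmp-dir
[/code]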

Also, what kind of sparsity are you expecting in the result array? In the array pipe_ndim, your chunk logical size is 134+ million cells. Based on your data, are you expecting 100% occupancy? 50%? 1%?
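For reference, that figure is just the product of the six per-dimension chunk lengths declared in pipe_ndim:

[code]32 * 32 * 32 * 32 * 8 * 16 = 134,217,728 logical cells per chunk[/code]

At 100% occupancy, a single double attribute alone would need 8 bytes × 134 million ≈ 1 GB per chunk before compression, so the answer matters a lot.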


#3

Thank you for the response. There are 70 million rows and 16 attributes. There are not many gaps in the data, so it may not be sparse.

To make it work, I tweaked the chunk sizes and split the array up, creating a separate array for each attribute. This did work once I removed the other attributes. The initial query execution (aggregations) takes around 2 minutes, but later queries take 8 seconds. The chunk sizes are still not correct. This was before your response:

[code]AFL% show(bentek_scheduledVolume);
i,schema
0,"bentek_scheduledVolume<scheduledVolume:double NULL> [mesasurementDate(datetime)=*,366,0,pipelineId=0:260,10,0,segmentId=0:60564,100,0,pointId=0:60989,100,0,nomCycle(string)=*,1,0,locationType(string)=*,1,0,flowdirection(string)=*,1,0,interruptibleflow(string)=*,1,0]"
[/code]

I will use repart to change the dimension chunk sizes to something close to the recommended size, test the performance, and post the results.
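Something along these lines, perhaps; the chunk sizes and the output array name below are illustrative only and will need tuning against the actual data density:

[code]iquery -naq "
store (
  repart (
    bentek_scheduledVolume,
    <scheduledVolume:double NULL>
    [mesasurementDate(datetime)=*,366,0, pipelineId=0:260,261,0,
     segmentId=0:60564,500,0, pointId=0:60989,500,0,
     nomCycle(string)=*,1,0, locationType(string)=*,1,0,
     flowdirection(string)=*,1,0, interruptibleflow(string)=*,1,0]
  ),
  bentek_scheduledVolume_repart
)
"
[/code]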

-Vijay


#4

Yup.

At the moment, many SciDB operations proceed through an entire “logical” chunk at a time. So if you have (for example) 100 attributes and you want to re-organize the data, SciDB tries to re-organize 100 data chunks in a single gulp. If each of these chunks is 10 MB, that's 1 GB of chunks per instance, read and written. You can see how, given this kind of math, we chew up available memory pretty quickly.

The solution is pretty obvious: rather than trying to chew through all of the attributes at once, we should proceed through a few attributes at a time. That won't degrade run-time (the algorithm will have the same complexity), and it will use far fewer resources during query execution.

Until then, breaking your arrays up into per-attribute arrays is going to get you closer. That, or smaller chunk sizes.
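For example, a per-attribute version of the original pipe_ndim might look like the sketch below; the target array name is illustrative, and the chunk sizes are just carried over from your earlier schema:

[code]iquery -aq "
CREATE EMPTY ARRAY pipe_scheduledVolume
<scheduledVolume : double NULL>
[ pipelineId=0:*,32,0, mesasurementDate(datetime)=*,32,0, pointId=0:*,32,0,
  segmentId=0:*,32,0, nomCycle(string)=*,8,0, locationType(string)=*,16,0 ]
"

# redimension_store picks up only the source attributes named in the
# target schema (as attributes or dimensions); the rest are ignored.
iquery -naq "
redimension_store (
  pipe_load,
  pipe_scheduledVolume
)
"
[/code]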