Input parser expected literal error; issue loading arrays


#1

Hello All,

I created a one dimensional array using:

AQL% CREATE ARRAY raw1new <X:double,Y:double,filename: string>[i];

and I then attempted to load my .scidb file into this array, and I received this error:

AQL% LOAD raw1new FROM '/home/scidb/Desktop/Final/output/graphs/NewSciDB/2.scidb';
UserException in file: src/query/ops/input/InputArray.cpp function: end line: 49
Error id: scidb::SCIDB_SE_IMPORT_ERROR::SCIDB_LE_FILE_IMPORT_FAILED
Error description: Import error. Import from file '/home/scidb/Desktop/Final/output/graphs/NewSciDB/2.scidb' (instance 0) to array 'raw1new' failed at line 603, column 1, offset 14773, value='2.dat': Input parser expected literal.
Failed query id: 1100853483146

It’s a 3-column by 600-row file, where the 3rd column consists of strings; the rows have the form “(#,#,1.dat)”.

It seems like a very simple sort of issue, but I can’t figure out how to get this file to upload properly. Any help would be greatly appreciated.

Best,
Anthony


#2

Anthony -

A couple of things:

  1. When you use an unadorned dimension name in the CREATE ARRAY ... statement, SciDB is obliged to make a number of assumptions about the dimension. The engine assumes that your dimension values start at zero and have an unbounded size. The engine also guesses at the chunking - 500,000 cells per chunk, or about 4 Meg per chunk.

$ iquery -aq "CREATE ARRAY foo < a1 : int64 > [I]"
Query was executed successfully
$ iquery -aq "show ( foo )"
[("foo<a1:int64> [I=0:*,500000,0]")]
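If you know the extent of your data up front, you can spell the dimension out explicitly instead of taking the defaults. A hypothetical explicit version of the array from post #1 (the bounds and chunk size here are illustrative, sized for a 600-row file):

```sql
-- 600 cells starting at 0, one 600-cell chunk, no overlap.
CREATE ARRAY raw1explicit <X:double, Y:double, filename:string> [i=0:599,600,0];
```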

  2. I suspect that you have a malformed load file. What do lines 600 through to the end of the load file look like?
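For example, sed can print the window of lines around the failure point, so you can see whether the string attribute there is quoted. A generic sketch on a stand-in file; point FILE at the real load file and use the line numbers from the error message:

```shell
# Stand-in for the real load file; the second row's string is unquoted,
# which is the kind of thing that triggers "Input parser expected literal".
FILE=/tmp/load_sample.scidb
printf '(1.5,2.5,"1.dat")\n(3.5,4.5,2.dat)\n' > "$FILE"

# Print a window of lines around the failure point (the real error
# reported line 603, so there you would use something like '598,606p').
sed -n '1,2p' "$FILE"
```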

KR

Pb


#3

Anthony,

Could you post a few lines of the SciDB load file?
I assume you have put the CSV file through csv2scidb first, and you are trying to load the output of csv2scidb.
Do the string attributes appear with quotes (single quotes)?

Suchi
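One quick way to check whether the strings came out quoted (a sketch on a stand-in file; point FILE at your actual csv2scidb output):

```shell
# Stand-in file: first row's string is quoted correctly, second row's is not.
FILE=/tmp/check.scidb
printf '(1.5,2.5,"1.dat")\n(3.5,4.5,2.dat)\n' > "$FILE"

# Print any rows whose last field is not wrapped in quotes.
grep -n ',[^",)]*)' "$FILE"
# -> 2:(3.5,4.5,2.dat)
```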


#4

I am learning how to use SciDB, and I have the same problem as you do.

I created a one-dimensional array:

AQL% CREATE ARRAY A <…lots of attributes…>[ i=0:*,100000,0 ]

and I have put the CSV file through csv2scidb first, but the string attributes do not appear with quotes. Why?


#5

What’s the precise command you’re using to generate the SciDB load format data from the .csv? Here’s an example of something very similar that just ran.

$ ./genTS 2 10 
AAC,1/2/2011 8:0:1,111.52,51
AAC,1/2/2011 8:0:2,110.52,82
AAC,1/2/2011 8:0:3,110.52,78
AAC,1/2/2011 8:0:4,106.52,34
AAC,1/2/2011 8:0:5,110.52,83
AAC,1/2/2011 8:0:6,109.52,79
AAC,1/2/2011 8:0:7,110.52,41
AAC,1/2/2011 8:0:8,112.52,67
AAC,1/2/2011 8:0:9,112.52,41
AAC,1/2/2011 8:0:10,110.52,44
AACC,1/2/2011 8:0:1,54.50,338
AACC,1/2/2011 8:0:2,53.50,389
AACC,1/2/2011 8:0:3,54.50,680
AACC,1/2/2011 8:0:4,52.50,613
AACC,1/2/2011 8:0:5,53.50,269
AACC,1/2/2011 8:0:6,52.50,618
AACC,1/2/2011 8:0:7,53.50,405
AACC,1/2/2011 8:0:8,51.50,449
AACC,1/2/2011 8:0:9,52.50,676
AACC,1/2/2011 8:0:10,52.50,440

Don’t worry too much about genTS. It’s an internal tool we use to generate fake timeseries data to test things. But the key point is that its format is a string, then a datetime (which we will load as a string), then a double, then an integer. So, to load that:

$ ./genTS 2 10 | csv2scidb -c 10 -p SSNN
{0}[
("AAC","1/2/2011 8:0:1",67.59,404),
("AAC","1/2/2011 8:0:2",65.59,328),
("AAC","1/2/2011 8:0:3",67.59,406),
("AAC","1/2/2011 8:0:4",66.59,438),
("AAC","1/2/2011 8:0:5",67.59,303),
("AAC","1/2/2011 8:0:6",67.59,350),
("AAC","1/2/2011 8:0:7",65.59,234),
("AAC","1/2/2011 8:0:8",65.59,188),
("AAC","1/2/2011 8:0:9",66.59,201),
("AAC","1/2/2011 8:0:10",67.59,295)
];
{10}[
("AACC","1/2/2011 8:0:1",74.39,677),
("AACC","1/2/2011 8:0:2",71.39,379),
("AACC","1/2/2011 8:0:3",75.39,915),
("AACC","1/2/2011 8:0:4",76.39,670),
("AACC","1/2/2011 8:0:5",74.39,856),
("AACC","1/2/2011 8:0:6",74.39,625),
("AACC","1/2/2011 8:0:7",72.39,538),
("AACC","1/2/2011 8:0:8",74.39,376),
("AACC","1/2/2011 8:0:9",75.39,502),
("AACC","1/2/2011 8:0:10",74.39,945)
];

That’s plonked the quotation marks around the two strings, and it’s identified the two numerical values.

I suspect that there’s something malformed in the file.

Also, a pro tip: you don’t have to create a copy of the original .csv file and name it something like foo.scidb. Internally, when we’re loading data, we always use a named pipe rather than a file name, because that avoids a disk bounce.

#!/bin/sh
#
#  Generate a stream of appropriately formatted SciDB data, and pipe
# it into a named pipe. Then in the SciDB load() command, read 
# from the named pipe. 
#
rm -rf /tmp/load_pipe
mkfifo /tmp/load_pipe
./genTS 2000 2000000 | csv2scidb -c 100000 -p SSNN > /tmp/load_pipe &
#
time iquery -aq "load ( Signal_Timeseries_Raw, '/tmp/load_pipe' )" -r /dev/null

#6

When using csv2scidb or loadcsv.py, you can supply an optional format string. That format string looks like “NssNNSS”: it contains one character per field, and the meanings are:
N: do not put the field in quotes; an empty field (“,,”) becomes “,null,”
S: put the field in quotes; an empty field becomes ‘,"",’
s: put the field in quotes; an empty field becomes “,null,”

So if you want the tool to put the quotes and nulls in, supply this string.
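As a rough illustration (this is not csv2scidb itself, just a sketch of the conversion the format string describes), here is what “NNS” would do to one CSV row with an empty middle field:

```shell
# Mimic csv2scidb's per-field handling for format string "NNS":
# N fields pass through, with empty fields becoming null;
# S fields are quoted, with empty fields becoming "".
echo '1.5,,2.dat' | awk -F, -v fmt=NNS '{
  for (i = 1; i <= NF; i++) {
    t = substr(fmt, i, 1)
    if (t == "S")      o = ($i == "") ? "\"\"" : ("\"" $i "\"")
    else if (t == "s") o = ($i == "") ? "null" : ("\"" $i "\"")
    else               o = ($i == "") ? "null" : $i
    printf "%s%s", o, (i < NF ? "," : "\n")
  }
}'
# -> 1.5,null,"2.dat"
```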
Another tip - when debugging it’s useful to run loadcsv.py with these options:

 -M                    Create Intermediate DLF Files (not FIFOs)
 -L                    Leave Intermediate DLF Files

Then you can examine the load error from SciDB, go to the offset in the file in the SciDB data directory, and figure out what went wrong.


#7

Thank you very much!