Variable-length CSV files


#1

Hi, I am trying to model purchasing behaviour in SciDB. I have a list of customers with a fixed number of fields and no nulls. Each customer will have shopped a number of times over the year, buying a different number of different items each time. So I have a CSV file with one line per shopping trip, containing the list of items purchased and the price paid. The items (including how many of them) and the prices vary from trip to trip.

The sorts of questions I want to answer are along the lines of: male customers tend to shop at time X and purchase these items; people who shop at time Y tend to buy these items, etc.

Coming from relational land, I’m struggling to model this in SciDB. Can I load a CSV with a variable number of fields per row?

Thanks

Kevin


#2

I think I’ve solved it by thinking in SciDB arrays. I was too fixated on rows and columns. If I turn the data around so that each CSV row is a single item bought by a customer at one time, I think it will work. It's going to be a lot of rows!
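
For anyone wanting to try the same reshaping, here is a rough Python sketch of the "one item per row" idea. The input layout is an assumption, since the real file isn't shown in this thread: two leading fields (customer id, shopping time) followed by repeating item, quantity, price groups. The names FIXED, GROUP and to_long_rows are made up for the illustration.

```python
import csv
import io

FIXED = 2  # assumed leading fields: customer_id, shopped_at
GROUP = 3  # assumed repeating group: item, quantity, price


def to_long_rows(wide_rows):
    """Yield one [customer_id, shopped_at, item, quantity, price] row per item."""
    for row in wide_rows:
        if not row:
            continue  # skip blank lines
        head, tail = row[:FIXED], row[FIXED:]
        if len(head) < FIXED or len(tail) % GROUP != 0:
            raise ValueError(f"unexpected field count: {row!r}")
        # Each complete group in the tail becomes its own fixed-width row.
        for i in range(0, len(tail), GROUP):
            yield head + tail[i:i + GROUP]


# Made-up sample data in the assumed layout: rows have different lengths.
sample = io.StringIO(
    "c1,2016-01-05T10:00,milk,2,1.50,bread,1,0.90\n"
    "c2,2016-01-05T18:30,beer,6,7.20\n"
)
long_rows = list(to_long_rows(csv.reader(sample)))
for r in long_rows:
    print(",".join(r))
```

Every output row has the same five fields, so the result is an ordinary fixed-width CSV, at the cost of repeating the customer id and time on each row.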


#3

@townheadbluesboy your one data point per row idea sounds solid.

Also, if you know in advance the maximum number of columns in one row, you can probably try aio_input from the accelerated_io_tools plugin: https://github.com/Paradigm4/accelerated_io_tools#trivial-end-to-end-example
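
If the maximum isn't known up front, it can be found with a quick pass over the file before loading. Below is a minimal Python sketch that does this on made-up sample data; max_field_count and pad_rows are hypothetical helper names. The padding step is optional and only there in case your loader wants every row to be the same width, which is an assumption on my part and not something taken from the plugin docs.

```python
import csv
import io


def max_field_count(rows):
    """Return the largest number of fields found in any row (0 if no rows)."""
    return max((len(r) for r in rows), default=0)


def pad_rows(rows, width, fill=""):
    """Right-pad every row with `fill` so all rows have exactly `width` fields."""
    return [r + [fill] * (width - len(r)) for r in rows]


# Made-up sample: rows with 8 and 5 fields respectively.
sample = io.StringIO(
    "c1,2016-01-05T10:00,milk,2,1.50,bread,1,0.90\n"
    "c2,2016-01-05T18:30,beer,6,7.20\n"
)
rows = list(csv.reader(sample))
width = max_field_count(rows)
padded = pad_rows(rows, width)
print(width)
print(padded[1])
```

The width found this way is the "maximum number of columns" you would then hand to the loader; check the plugin README linked above for how it expects that to be specified.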