I plan to use multiple EC2 instances to process each SciDB query in parallel. I have launched some instances from the AMI specified in the documentation, but I have two questions:
- I see that there is a sample configuration file, “config.ini.ec2”, and it seems that I need to customize its server list. Should I replace each server IP there with the private IP of the corresponding EC2 instance? And should I then run the scidb.py script to apply the updated configuration?
- SciDB is described as processing queries on a shared-nothing cluster, so where should I store input data that all the instances need to read? When I run other distributed programs on EC2, I keep the input data in S3, but that does not seem to be an option for SciDB, because all the data must be loaded into the database first.
Perhaps setting up such a distributed environment on EC2 is too complex for me, so I am also considering running multiple queries concurrently on a single EC2 instance. I think I can customize the configuration file by adding the “operator-threads” field. For example, if my EC2 machine has 16 cores and operator-threads = 4, then 4 cores will work on each query in parallel. Thus, I can run at most 4 queries concurrently, right?
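
To make the arithmetic in that last question concrete, here is a small Python sketch of the model I have in mind. It is only my assumption, not something taken from the SciDB documentation: the function name max_concurrent_queries is made up for illustration, and it assumes that each query occupies exactly operator-threads cores and that nothing else competes for them.

```python
def max_concurrent_queries(total_cores: int, operator_threads: int) -> int:
    """Upper bound on concurrent queries under my assumed model.

    Assumption (not confirmed by the SciDB docs): every running query
    uses exactly `operator_threads` cores, and cores are not shared.
    """
    if total_cores <= 0 or operator_threads <= 0:
        raise ValueError("both values must be positive")
    # Integer division: leftover cores cannot host another full query.
    return total_cores // operator_threads


if __name__ == "__main__":
    # My example: a 16-core machine with operator-threads = 4.
    print(max_concurrent_queries(16, 4))
```

If this model is wrong, for example because SciDB schedules threads differently or uses extra threads per query, I would like to know what the real limit is.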