Cost of evaluating SciDB on AWS


#1

Hi,

If I launch one of the SciDB AMIs on EC2 (e.g. SciDB 14.3, ami-7592881c), will it run for free? (I have 12 months of AWS free-tier usage.)

The free tier works with EC2 micro instances but it’s not clear to me how the free tier works with pre-built AMIs.

Regards,
Mike


#2

Partially answered my own question…

When launching the AMI, I can select the target machine, and the default is the (free tier eligible) micro instance.

The AWS free tier comes with 30GB of EBS disk space.

I guess I should have followed the instructions (!) and used the v12 image but I figured documentation ages out pretty fast and that I should go for the latest version.

The latest SciDB AMI is built with a 200GB image, which far exceeds the 30GB free-tier allowance.

But if I select an older version (the 13.9 AMI is built with a 16GB image) it all works.

I can even launch 2 instances now!

Next I want to try building a small cluster - 4 nodes over 2 VMs…

It would be good to know how to do this without disturbing the default example.

E.g., can I create this by:

  • creating a new postgres account
  • defining a new instance in the existing scidb config.ini file (referring to the new postgres account)
  • restarting the scidb services (referring to the new instance)?

#3

Hi Mike,

Keep in mind, a SciDB process can be a little greedy memory-wise. The micro instance may not have enough RAM.

When setting up EC2 clusters, the big headache is IP addresses, password-less SSH and ports. EC2 has VPC and “Placement Group” features that may help ease the pain with IP addresses.
You need to make sure postgres is running on the coordinator and that the pg_hba.conf file allows the other instances to connect.

Ports:
22 for SSH. Instances need to be able to password-less ssh to each other as the scidb user.
5432 for postgres, accessible to all instances.
All instances need to be able to communicate with all other instances.
The SciDB coordinator runs on port 1239.
Other instances run on ports 1240+: the first instance on each machine starts at 1240 and each additional instance increments by 1.
So if you are running 8 instances on each machine, you would need to open 1239-1247.
If you are running 1 instance on each machine, open 1239-1240.
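
The port arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rule as described in this post (coordinator on 1239, per-machine instances starting at 1240); the function name `scidb_port_range` is made up for the example and is not part of SciDB.

```python
# Sketch of the port rule described above. The two constants come from
# the post; scidb_port_range is a hypothetical helper, not a SciDB API.
COORDINATOR_PORT = 1239
FIRST_INSTANCE_PORT = 1240


def scidb_port_range(instances_per_machine):
    """Return (lowest, highest) port to open between the machines."""
    if instances_per_machine < 1:
        raise ValueError("need at least one instance per machine")
    # The first instance on a machine uses 1240, the next 1241, and so on.
    highest = FIRST_INSTANCE_PORT + instances_per_machine - 1
    return COORDINATOR_PORT, highest


if __name__ == "__main__":
    for n in (1, 8):
        low, high = scidb_port_range(n)
        print(f"{n} instance(s) per machine: open {low}-{high}")
```

For 8 instances per machine this gives 1239-1247, and for 1 instance it gives 1239-1240, matching the numbers above.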

Optional ports:
8080 on coordinator accessible to the outside world (for HTTP clients if you want to use it)
8787 on coordinator accessible to the outside world (for RStudio server if you want to use it)
8888 on coordinator accessible to the outside world (for IPython notebook if you want to use it)
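
Once the security group rules are in place, a plain TCP connect test run from one of the other instances is a quick way to see which ports are reachable. The sketch below uses only the Python standard library. The helper name `port_is_open` and the placeholder host are assumptions for illustration, so replace the host with your coordinator's private IP. A port shows as closed both when it is blocked and when nothing is listening on it yet, so run this after the services are started.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


if __name__ == "__main__":
    host = "127.0.0.1"  # placeholder: use the coordinator's private IP
    # Ports taken from the lists above: SSH, postgres, SciDB, optional UIs.
    for port in (22, 5432, 1239, 1240, 8080, 8787, 8888):
        state = "open" if port_is_open(host, port) else "closed/unreachable"
        print(f"{host}:{port} {state}")
```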

For high-performance linear algebra, SciDB uses ScaLAPACK/MPI, and we recently discovered that MPI will pick a random port on every query.
So to run linear algebra (operators gemm and gesvd) you need to open nearly all ports:
10000-65535 accessible to all instances. Note: you don’t need to open them to the outside world, just for the instances to talk to each other.
If you don’t do this, operators gemm/gesvd won’t work, but the rest of the system will run OK.

Let us know how it goes!

- Alex Poliakov