First of all, the fact that Mr. Stonebraker is involved in this project makes it a very interesting one to follow. He has certainly come up with some of the best technologies in the database realm, so SciDB looks like a good candidate from the start.
We are looking for a database that meets the following requirements:
- There is a powerful open source version. Even if there are commercial “versions”, at least one open version is required => OK
- Allows for horizontal scalability and large amounts of data => OK
- Data is never overwritten or updated. Only new data is added (time series) and summary data is computed => OK
- Fast ad hoc queries as well as background jobs => ?
So, as I understand it, it meets all the requirements, with query speed still an open question.
Our project (open source) stores network monitoring data over time from multiple devices. Currently we aggregate data over different time frames but lose the detail as data gets older (for example, raw data is kept for only 3 hours; after that, data is aggregated / summarized further and further, down to just 1 point per day once it is older than 1 month). We do this to keep data size under control and to make queries fast enough.
Our idea would be to still compute those summaries (for faster queries, reports, etc.) but keep ALL the original data for further detail (legal requirements, specific diagnoses, etc.). Right now we don't know whether the best way is to split raw data / summary data between NoSQL / SQL or to use SciDB.
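
To make the idea a bit more concrete, here is a minimal Python sketch of what we mean by "summaries on top of untouched raw data". It is only an illustration, not our actual code and not tied to SciDB or any other database: the `summarize` function, the `BUCKETS` table and the sample data are all hypothetical, and apart from "1 point per day" the bucket sizes are assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical bucket widths in seconds. Only the daily tier comes from the
# description above; the 5-minute and hourly tiers are assumed for illustration.
BUCKETS = {"5min": 300, "hour": 3600, "day": 86400}


def summarize(samples, bucket_seconds):
    """Average (timestamp, value) samples into fixed-width time buckets.

    Returns a dict mapping bucket start time -> mean value. The input list is
    not modified, so the raw samples stay available for detailed queries.
    """
    grouped = defaultdict(list)
    for ts, value in samples:
        bucket_start = ts - (ts % bucket_seconds)
        grouped[bucket_start].append(value)
    return {start: mean(values) for start, values in sorted(grouped.items())}


# Append-only raw series for one device: one sample per minute for 2 hours.
raw = [(t, float(t // 60)) for t in range(0, 7200, 60)]

# Summaries are derived data; the raw list is kept in full.
hourly = summarize(raw, BUCKETS["hour"])
daily = summarize(raw, BUCKETS["day"])
```

The point is that the summaries are purely derived and can be recomputed at any time, while the raw samples are only ever appended to. What we need from the database is the same pattern at a much larger scale.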