Group-By Aggregates: rows Holding the Group-wise Maximum


#1

Just started to learn to use SciDB. Have gone through the SciDB manual (14.8), could not find any instructions on how to achieve the following: for each article, find the dealer or dealers with the most expensive price. I understand that SciDB can give the highest price for each article by using “group by artical”. But I like to know the dealer who has the highest price for that article. Is this doable by SciDB? Thanks!

The following is just how this can be easily achieved in MySQL:

SELECT article, dealer, price
FROM shop s1
WHERE price=(SELECT MAX(s2.price)
FROM shop s2
WHERE s1.article = s2.article);

±--------±-------±------+
| article | dealer | price |
±--------±-------±------+
| 0001 | B | 3.99 |
| 0002 | A | 10.99 |
| 0003 | C | 1.69 |
| 0004 | D | 19.95 |
±--------±-------±------+


#2

Sure, this comes up pretty often. Here’s an AFL example. While surely more verbose, it has some parallels to the SQL query you’ve pointed out:

create array foo <val:double> [x=0:9,10,0, y=0:9,10,0];
store(build(foo, random()%100), foo);

project(filter(cross_join(foo as A, aggregate(foo, max(val) as max, x) as B, A.x, B.x), val=max), val);
{x,y} val
{0,3} 91
{1,0} 96
{2,1} 91
{3,4} 96
{4,4} 77
{4,7} 77
{5,1} 94
{6,1} 88
{6,3} 88
{7,5} 96
{8,9} 72
{9,0} 72

A little more concise in SciDB-R:

> iqdf(subset(merge(foo, aggregate(foo, FUN=max, by="x")), "val=val_max")$val)
    i x y val
1   1 0 3  91
2   2 1 0  96
...

There were a couple of cases where a single pass over the data was strongly preferred and we used the technique of concatenating the coordinate value with the data in a user-defined type, instantiated during the query, followed by a special user-defined aggregate.

There is some ongoing thought and discussion for making the internal aggregate framework more flexible, where aggregate objects can opt to keep track of cell coordinates. But that work is lower priority to some of the more fundamental changes folks are currently working on.


#3

Thank you for the quick helpful response. It might be great if topics like this can be put in the HowTos to help the SciDB beginners like me.


#4

Thanks for your suggestion, Alin. A link to this forum post has been added to “SciDB Usage Notes” at trac.scidb.net/wiki/public.