Hello, is it possible to perform nearest neighbor, linear, bilinear, bicubic or other type of interpolation on the array?
Use case: given a wind speed on a 0.5 x 0.5 grid, obtain wind speed on 0.25 x 0.25 grid.
Thanks.
Hello, is it possible to perform nearest neighbor, linear, bilinear, bicubic or other type of interpolation on the array?
Use case: given a wind speed on a 0.5 x 0.5 grid, obtain wind speed on 0.25 x 0.25 grid.
Thanks.
The short answer is, use xgrid(…). But a longer answer with an example implementation follows.
This is actually a pretty common problem. For example, suppose I have a remote sensing image where each pixel of the image is 0.25 degrees x 0.25 degrees, and I want to combine it with another data set where each pixel is (say) 0.75 degrees x 0.75 degrees. This kind of thing happens all the time when you try to combine data generated by different devices or sensors constructed with different purposes. What we’ll explore here is a pretty simple-minded solution to the problem. But its also quite flexible and works well enough in practice.
The basic problem is easy enough to illustrate. Suppose we have two sensors (say, a satellite, and RADAR) that are are monitoring some phenomenon like cloud cover. The RADAR data might have “pixels” that are 1 degree by 1 degree, while the satellite might be 0.1 degree by 0.1 degree. That is, there are 100 satellite pixels within each RADAR pixel.
Now … here’s a critical (initial) assumption that we’ll talk about more in a minute. Let’s start with the assumption that the [0,0] pixel of the RADAR data is geo-spatially aligned with the [0,0] pixel of the satellite data. So the satellite cell at [10,10] is geo-spatially aligned with the RADAR data at [1,1]. We might represent these two data sets in the following way. Each cell of the RADAR array might contain a different value that refers to a sub-portion of the area covered by the single cell of the Satellite array.
RADAR Satellite
. . . . . . . . . . . . .
.+--+--+--+--+--+--+--+--+--+--+. .+-----------------------------+.
09 | | | | | | | | | | | | |
.+--+--+--+--+--+--+--+--+--+--+. | |
08 | | | | | | | | | | | | |
.+--+--+--+--+--+--+--+--+--+--+. | |
07 | | | | | | | | | | | | |
.+--+--+--+--+--+--+--+--+--+--+. | |
06 | | | | | | | | | | | | |
.+--+--+--+--+--+--+--+--+--+--+. | |
05 | | | | | | | | | | | | |
.+--+--+--+--+--+--+--+--+--+--+. 00 | |
04 | | | | | | | | | | | | |
.+--+--+--+--+--+--+--+--+--+--+. | |
03 | | | | | | | | | | | | |
.+--+--+--+--+--+--+--+--+--+--+. | |
02 | | | | | | | | | | | | |
.+--+--+--+--+--+--+--+--+--+--+. | |
01 | | | | | | | | | | | | |
.+--+--+--+--+--+--+--+--+--+--+. | |
00 | | | | | | | | | | | | |
.+--+--+--+--+--+--+--+--+--+--+. .+-----------------------------+.
00 01 02 03 04 05 06 07 08 09 00
SciDB has two operators that first, “break a single cell up into lots of little cells”, and second “combine some number of smaller cells into a single, bigger one”. These are the xgrid(…) an regrid(…) operators, respectively. A key thing to keep in mind about both of these operators is that although they change the logical number of pixels in the array, this change does not reflect any change in the size of the physical region that the array’s data represents. It’s just like two different scale maps of the same geographic region. That is, even though you might increase the number of cells in the Satellite array a hundred-fold it does not follow that the region covered by the array gets any larger. In the language of the OP, the values are “interpolated” within the space, with each now smaller region holding the larger data value.
Let’s work an example to see how this works.
First, a little hygiene to clean up in case we’re re-running the script …
SET LANG AFL;
remove ( RADAR );
remove ( Satellite );
Second, let’s create some example arrays.
The first array holds a 10,000 x 10,000 array with RADAR data. The second a 1,000 x 1,000 array of Satellite data The origin of both of these arrays at [0,0] is the same geographic location and the [ 9999,9999 ] cell in the RADAR ends at the same location as the Satellite [999,999] cell. As a result, the Satellite [0,0] cell covers the same region as the RADAR[0,0]->[10,10] block of cells.
CREATE ARRAY RADAR
< value : int32 >
[ X=0:9999,1000,0, Y=0:9999,1000,0 ];
--
CREATE ARRAY Satellite
< value : int32 >
[ X=0:999,1000,0, Y=0:999,1000,0 ];
Just to give us something to work with, let’s populate these arrays with random() data. But to keep things close to reasonable, I’ll divide the “space” into 4 sub-regions, and vary the data by a random amount on within each square.
SET LANG AFL;
SET NO FETCH;
store (
build (
RADAR,
((X/5000) * 2 + (Y/5000)) * 100 + random()%100 - 50
),
RADAR
);
SET NO FETCH;
store (
build (
Satellite,
((X/500) * 2 + (Y/500)) * 100 + random()%100 - 50
),
Satellite
);
And here’s what the data looks like in those four quadrants of each array.
regrid (
RADAR,
5000, 5000,
AVG ( value )
);
{X,Y} value_avg
{0,0} -0.500138
{0,1} 99.4979
{1,0} 199.504
{1,1} 299.511
store (
build (
Satellite,
((X/500) * 2 + (Y/500)) * 100 + random()%100 - 50
),
Satellite
);
{X,Y} value_avg
{0,0} -0.437736
{0,1} 99.5467
{1,0} 199.58
{1,1} 299.526
Get the idea? Each array covers the same region, but at different “scales”.
How do you “extrapolate” the 1,000 x 1,000 Satellite array to the same scale as the 10,000 x 10,000 RADAR array? Use the SciDB xgrid(…) operator. xgrid() takes as input an array, and for each dimension in the input array a “scale factor”. Then it produces as output an array that is scaled up along each dimension by the scale factor with each block of new cells containing values from one cell in the input.
The following figure provides a small illustration of what xgrid(…) does:
Input xgrid ( Input, 2, 3 );
0 1 2 0 1 2 3 4 5
+-----+-----+-----+ +-----+-----+-----+-----+-----+-----+
2 | 0 | 1 | 2 | 08 | 0 | 0 | 1 | 1 | 2 | 2 |
+-----+-----+-----+ +-----+-----+-----+-----+-----+-----+
1 | 3 | 4 | 5 | 07 | 0 | 0 | 1 | 1 | 2 | 2 |
+-----+-----+-----+ +-----+-----+-----+-----+-----+-----+
0 | 6 | 7 | 8 | 06 | 0 | 0 | 1 | 1 | 2 | 2 |
+-----+-----+-----+ +-----+-----+-----+-----+-----+-----+
05 | 3 | 3 | 4 | 4 | 5 | 5 |
+-----+-----+-----+-----+-----+-----+
04 | 3 | 3 | 4 | 4 | 5 | 5 |
+-----+-----+-----+-----+-----+-----+
03 | 3 | 3 | 4 | 4 | 5 | 5 |
+-----+-----+-----+-----+-----+-----+
02 | 6 | 6 | 7 | 7 | 8 | 8 |
+-----+-----+-----+-----+-----+-----+
01 | 6 | 6 | 7 | 7 | 8 | 8 |
+-----+-----+-----+-----+-----+-----+
00 | 6 | 6 | 7 | 7 | 8 | 8 |
+-----+-----+-----+-----+-----+-----+
A more realistic question is something like, “What is the biggest and the smallest difference between what the RADAR data records for one of its cells, and what the Satellite records for the same location?” To answer this, we first “interpolate” (or xgrid(…)) the small scale data set to the same size as the larger, join(…) the two, find the difference, and then find the largest difference. Such a query looks like this:
SET LANG AFL;
SET FETCH;
aggregate (
apply (
join (
RADAR,
xgrid (
repart (
Satellite,
< value : int32 > [ X=0:999,100,0, Y=0:999,100,0 ]
),
10, 10
) AS SAT
),
difference, abs( RADAR.value - SAT.value )
),
MIN ( difference ) AS smallest_difference,
MAX ( difference ) AS largest_difference
);
Some quick notes about this query …
See the innermost repart(…)? The one that turns the Satellite array from chunks of 1000 x 1000 to 100 x 100? That’s not strictly necessary. But recall: the output of the xgrid(…) will be an array that is (logically) 100x the size of the input array, with chunks that are 100x larger. In fact it won’t be that big, as the internal encoding scheme will cut the size down. But th repart(…) will move data to the instances on which the “expanded” data will find itself, where the xgrid(…) will move data after it has expanded. In this case, moving pre-expansion is a better idea. So, I’ve illustrated how to use the inner repart. (This is something our optimizer will do automatically … )
The semantic correctness of this query depends on the assumption that the RADAR[0,0] and the Satellite[0,0] refer to the same point, and that the RADAR[9999,9999] and the Satellite[999,999] also refer to the same point. Another way to say this is that the region covered by both arrays is the same. Of course, if you’re looking at real RADAR and Satellite data then things like the curvature of the earth become relevant … the detailed sizes and shapes of the data will never align perfectly. (We’re working on the general problem of how to integrate open-source tools like GDAL into SciDB, and a google search will reveal some encouraging early work on this.)
OK. So what happens if the RADAR[0,0] and the Satellite[0,0] don’t align? Suppose Satellite[0,0]->[999,999] actually overlaps with RADAR[1500,1500]->[5999,5999], say? Now there are two issues:
The size of the RADAR data that overlaps the Satellite data is 4,500 x 4,500, while the RADAR is 1,000 x 1,000. In the general case, the two input sizes might be co-prime. That is, there might not be a convenient integer multiplier that re-sizes the first array into the second array’s size.
The regions that the arrays cover don’t coincide. So we need to move one of them.
The first problem can be overcome by using xgrid(…) on both inputs to size them up so they can be joined. This means calculating for each dimension of each input array, the Lowest Common Multiple of each chunk length. In this case, the LCM is 9,000. We will need to scale up the RADAR by 2, and the Satellite by 9.
The second problem can be resolved by “clipping” out the relevant region using subarray(…). Recall that, in contrast with between(…) or filter(…), subarray(…) changes the size of the array. It transposes the indicated section from the input array back to the origin, that is, back to [0,0], which aligns the origins of the two inputs.
The following query illustrates both both of these techniques. We take the subarray(…) of the RADAR array that covers the common region, and then adjust the sizes of the two inputs (using the same repart(…) and then xgrid(…) procedure introduced above) so that they can be joined and compared.
SET LANG AFL;
aggregate (
apply (
join (
xgrid (
repart (
subarray ( RADAR, 1500, 1500, 5999, 5999 ),
< value : int32 > [ X=0:4499,450,0, Y=0:4499,450,0 ]
),
2, 2
),
xgrid (
repart (
Satellite,
< value : int32 > [ X=0:999,100,0, Y=0:999,100,0 ]
),
9, 9
) AS SAT
),
difference, abs( RADAR.value - SAT.value )
),
MIN ( difference ) AS smallest_difference,
MAX ( difference ) AS largest_difference
);
And if you want to change the shape of the array to a smaller scale, use the redimension(…) operator. Which we illustrate blow …
This last query:
SET LANG AFL;
redimension (
apply (
join (
xgrid (
repart (
subarray ( RADAR, 1500, 1500, 5999, 5999 ),
< value : int32 > [ X=0:4499,450,0, Y=0:4499,450,0 ]
),
2, 2
),
xgrid (
repart (
Satellite,
< value : int32 > [ X=0:999,100,0, Y=0:999,100,0 ]
),
9, 9
) AS SAT
),
difference, abs( RADAR.value - SAT.value ),
I, X/9,
J, Y/9
),
< smallest_difference : int32, largest_difference : int32,
average_difference : double >
[ I=0:999,1000,0, J=0:999,1000,0 ],
MIN ( difference ) AS smallest_difference,
MAX ( difference ) AS largest_difference,
AVG ( difference ) AS average_difference
);
Hope this helps!
Wow!
Thanks for the detailed guidlines!
So, xgrid(…) provides only nearest neighbor filling? Is it possible to achieve any other kind of interpolation, e.g. linear, bilinear, bicubic?
Thanks again!
Oh boy …
All of those other interpolations involve some kind of convolution, in addition to the basic interpolation.
We don’t currently have any convolution functionality. Watch this space! It’s something we’re working on.