How to make a 3D array from 2D arrays


#1

I have 2D image arrays that were taken at different times. Each 2D image array has a longitude dimension and a latitude dimension.
With these 2D arrays, I want to make a 3D array that has a time dimension plus the two image dimensions (longitude, latitude).

How can I make this in AFL?

Thanks for your attention.


#2

Hi!

I’ve prepared a little script and some commentary to try to answer your question. If you cut and paste all of the “code” bits here, you’ll get a single, long script that illustrates how these options work.

#!/bin/sh
#
#  Some throat clearing. Useful bash/bourne shell functions. 
#
exec_afl_query () {
    echo "Query: ${1}"
    /usr/bin/time -f "Elapsed Time: %E" iquery -o dcsv ${2} -aq "${1}"
};
#
exec_aql_query () {
    echo "Query: ${1}"
    /usr/bin/time -f "Elapsed Time: %E" iquery -o dcsv ${2} -q "${1}"
};
#
#---------------------------------------------------------------------------------

OK. It isn’t completely clear to me, from your post, quite what you’re up to, so I’ve tried to come up with two answers, one for each of two possibilities.

First, suppose you have managed to load the image data into a single 2D array, where each data point has an [ X, Y ] location but the points have been gathered over time. This is the case with, for example, MODIS data, where the satellite “pings” the ground as it orbits and gets a single data point each time.

If your goal here is to take the datetime value, and to create one “image per day” (say), then you can create a 3D array as follows, and use the redimension_store(…) command to convert your 2D data (where time is an attribute) to a 3D array (where date is a dimension).

To see how this would work, let’s first create a 2D array to hold the source data:

CMD_HYGIENE="remove ( Two_D_Image_Data )"
exec_afl_query "${CMD_HYGIENE};"
#
CMD_CREATE_TWO_D_IMAGE_ARRAY="
CREATE ARRAY Two_D_Image_Data 
<
  r : int32, 
  g : int32, 
  b : int32, 
  t : datetime
>
[ X=0:1999,1000,0, Y=0:1999,1000,0 ]
"
exec_afl_query "${CMD_CREATE_TWO_D_IMAGE_ARRAY};"
#
CMD_POPULATE_TWO_D_IMAGE_ARRAY="
store ( 
  project ( 
    apply ( 
      cross (
        build ( < i : int32 > [ X=0:1999,1000,0 ], X ),
        build ( < i : int32 > [ Y=0:1999,1000,0 ], Y )
      ),
      r, int32(random()%255),
      g, int32(random()%255),
      b, int32(random()%255),
      t, datetime('2013-01-01 00:00:00') + ((random()%365) * 86400)
    ),
    r, g, b, t
  ),
  Two_D_Image_Data
)
"
exec_afl_query "${CMD_POPULATE_TWO_D_IMAGE_ARRAY};" -n
#
#  What does this data look like? 
#
exec_afl_query "analyze ( Two_D_Image_Data );"
#  {attribute_number} attribute_name,min,max,distinct_count,non_null_count
#  {0} 'b','0','254',255,4000000
#  {1} 'g','0','254',255,4000000
#  {2} 'r','0','254',255,4000000
#  {3} 't','\'2013-01-01 00:00:00\'','\'2013-12-31 00:00:00\'',365,4000000
#  Elapsed Time: 0:06.04

OK. Now we create a 3D target array. In this case, I’m using an integer dimension td to hold the day, counted from the first day of the source data, but you should be able to turn the seconds computed with datetime() - datetime() into any interval you want.

CMD_HYGIENE="remove ( Three_D_Image_Data )"
exec_afl_query "${CMD_HYGIENE};"
#
CMD_CREATE_THREE_D_IMAGE_ARRAY="
CREATE ARRAY Three_D_Image_Data 
<
  r : int32, 
  g : int32, 
  b : int32
>
[ td=0:*,1,0, X=0:1999,1000,0, Y=0:1999,1000,0 ]
"
exec_afl_query "${CMD_CREATE_THREE_D_IMAGE_ARRAY};"
#
CMD_REDIMENSION_TWO_D_TO_THREE_D="
redimension_store ( 
  project ( 
    apply ( 
      Two_D_Image_Data,
      td, ((t - datetime('2013-01-01 00:00:00')) / 86400)
    ),
    r, g, b, td
  ),
  Three_D_Image_Data
)
"
exec_afl_query "${CMD_REDIMENSION_TWO_D_TO_THREE_D};" -n
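
The td expression above is just elapsed seconds divided by 86400 (the number of seconds in a day). As a rough illustration of that arithmetic only, here is a small shell sketch. The day_index function is a hypothetical helper, not part of SciDB, and it takes plain second counts rather than datetime values. Shell integer division truncates; the generated test data always falls on midnight, so the division is exact there, but for timestamps in the middle of a day it is worth checking how AFL rounds.

```shell
#!/bin/sh
# Illustration only: day_index is a hypothetical helper, not part of SciDB.
# Both arguments are timestamps in whole seconds.
# Shell arithmetic truncates the division, so partial days are dropped.
day_index () {
    echo $(( ($1 - $2) / 86400 ))
}

BASE=0                       # stands in for 2013-01-01 00:00:00
day_index 0 "${BASE}"        # same instant as the base
day_index 86400 "${BASE}"    # exactly one day later
day_index 216000 "${BASE}"   # two and a half days later
```

The three calls print 0, 1 and 2. Dividing by 3600 or 604800 instead would give hourly or weekly slices.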

Of course, the nature of redimension_store(…) means that you can populate your 3D target from (say) a 1D load file just as easily.

There’s a slight, practical question that we need to address here: what if the source data contains several values for the same day at the same [X, Y] location? One of the things you can do with redimension_store(…) is to apply an aggregate over the input to produce the output. For example:

CMD_HYGIENE="remove ( Three_D_Image_Data )"
exec_afl_query "${CMD_HYGIENE};"
#
CMD_CREATE_THREE_D_IMAGE_ARRAY="
CREATE ARRAY Three_D_Image_Data 
<
  ar : double NULL, 
  ag : double NULL, 
  ab : double NULL
>
[ td=0:*,1,0, X=0:1999,1000,0, Y=0:1999,1000,0 ]
"
exec_afl_query "${CMD_CREATE_THREE_D_IMAGE_ARRAY};"
#
CMD_REDIMENSION_TWO_D_TO_THREE_D="
redimension_store ( 
  project ( 
    apply ( 
      Two_D_Image_Data,
      td, ((t - datetime('2013-01-01 00:00:00')) / 86400)
    ),
    r, g, b, td
  ),
  Three_D_Image_Data,
  avg ( r ) as ar, 
  avg ( g ) as ag,
  avg ( b ) as ab
)
"
exec_afl_query "${CMD_REDIMENSION_TWO_D_TO_THREE_D};" -n
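
To make the idea of aggregating collisions concrete outside of SciDB, here is a small sketch using awk. It is only an analogy for what the avg(…) aggregates do: the average_collisions function and the sample rows (td,X,Y,r) are made up for illustration, and two of the rows deliberately land on the same cell.

```shell
#!/bin/sh
# Analogy only: average_collisions is a hypothetical helper, not a SciDB call.
# Input rows are "td,X,Y,r". Rows sharing the same (td,X,Y) key are averaged,
# which mirrors what avg ( r ) as ar does when several inputs hit one cell.
average_collisions () {
    awk -F, '
        { key = $1 "," $2 "," $3; sum[key] += $4; n[key]++ }
        END { for (k in sum) printf "%s,%.1f\n", k, sum[k] / n[k] }
    ' | sort
}

# Two readings collide on cell (td=0, X=5, Y=7); one reading sits alone on td=1.
printf '%s\n' '0,5,7,100' '0,5,7,200' '1,5,7,50' | average_collisions
```

The colliding pair collapses to a single row with r averaged to 150.0, while the lone row passes through unchanged.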

Alternatively, you might have 2 (or more) 2D arrays that you want to combine into a single 3D target. This is pretty common when you’re building the target from a series of loads over time. The solution here is to look to the insert(…) operator.

Let’s start by creating and populating two input arrays. Note that they both have the same size / chunking, and the same attributes. They’re just two independent arrays. And, of course, they might be simply 1D load arrays taking their data from a .csv file, or a binary file.

CMD_HYGIENE="remove ( Two_D_Image_Data_First )"
exec_afl_query "${CMD_HYGIENE};"
#
CMD_CREATE_TWO_D_IMAGE_ARRAY_FIRST="
CREATE ARRAY Two_D_Image_Data_First 
<
  r : int32, 
  g : int32, 
  b : int32
>
[ X=0:1999,1000,0, Y=0:1999,1000,0 ]
"
exec_afl_query "${CMD_CREATE_TWO_D_IMAGE_ARRAY_FIRST};"
#
CMD_POPULATE_TWO_D_IMAGE_ARRAY_FIRST="
store ( 
  project ( 
    apply ( 
      cross (
        build ( < i : int32 > [ X=0:1999,1000,0 ], X ),
        build ( < i : int32 > [ Y=0:1999,1000,0 ], Y )
      ),
      r, int32(random()%255),
      g, int32(random()%255),
      b, int32(random()%255)
    ),
    r, g, b
  ),
  Two_D_Image_Data_First
)
"
exec_afl_query "${CMD_POPULATE_TWO_D_IMAGE_ARRAY_FIRST};" -n
#
#
CMD_HYGIENE="remove ( Two_D_Image_Data_Second )"
exec_afl_query "${CMD_HYGIENE};"
#
CMD_CREATE_TWO_D_IMAGE_ARRAY_SECOND="
CREATE ARRAY Two_D_Image_Data_Second 
<
  r : int32, 
  g : int32, 
  b : int32
>
[ X=0:1999,1000,0, Y=0:1999,1000,0 ]
"
exec_afl_query "${CMD_CREATE_TWO_D_IMAGE_ARRAY_SECOND};"
#
CMD_POPULATE_TWO_D_IMAGE_ARRAY_SECOND="
store ( 
  project ( 
    apply ( 
      cross (
        build ( < i : int32 > [ X=0:1999,1000,0 ], X ),
        build ( < i : int32 > [ Y=0:1999,1000,0 ], Y )
      ),
      r, int32(random()%255),
      g, int32(random()%255),
      b, int32(random()%255)
    ),
    r, g, b
  ),
  Two_D_Image_Data_Second
)
"
exec_afl_query "${CMD_POPULATE_TWO_D_IMAGE_ARRAY_SECOND};" -n

Let’s suppose that Two_D_Image_Data_First is the image for day = 1, and Two_D_Image_Data_Second is the image for day = 2. The goal is to get these two together into an array that looks like the 3D array created earlier: one with the same two dimensions as the image, plus a third dimension that reflects the date. Of course, this implies that, for each value of ‘td’ (the time dimension), the values of X and Y (the two image dimensions) align correctly.

CMD_HYGIENE="remove ( Three_D_Image_Data )"
exec_afl_query "${CMD_HYGIENE};"
#
CMD_CREATE_THREE_D_IMAGE_ARRAY="
CREATE ARRAY Three_D_Image_Data 
<
  r : int32, 
  g : int32, 
  b : int32
>
[ td=0:*,1,0, X=0:1999,1000,0, Y=0:1999,1000,0 ]
"
exec_afl_query "${CMD_CREATE_THREE_D_IMAGE_ARRAY};"

The insert(…) operator is a storing operator. It takes the array you give it in its first argument, and it stores it inside the array you name in the second. This means that the size / shape / attribute list of the source data must be the same as the target. So to get the data into a form that we can use in the insert(…), you need to use redimension(…) first. The following queries illustrate how this works.

#   
CMD_INSERT_THE_FIRST_IMAGE="
insert ( 
  redimension ( 
    apply ( 
      Two_D_Image_Data_First,
      td, 1
    ),
    Three_D_Image_Data
  ),
  Three_D_Image_Data
)
"
exec_afl_query "${CMD_INSERT_THE_FIRST_IMAGE};" -n
#   
CMD_INSERT_THE_SECOND_IMAGE="
insert ( 
  redimension ( 
    apply ( 
      Two_D_Image_Data_Second,
      td, 2
    ),
    Three_D_Image_Data
  ),
  Three_D_Image_Data
)
"
exec_afl_query "${CMD_INSERT_THE_SECOND_IMAGE};" -n
#
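
If you have more than two images, the same insert(…) pattern can be generated in a loop. What follows is only a sketch: build_insert_query is a hypothetical helper, and it assumes you have one 2D array per day and list them in day order. It prints the query text; the commented-out line shows where the exec_afl_query function from the top of the script would run it.

```shell
#!/bin/sh
# Sketch only: build_insert_query is a hypothetical helper that prints the
# same insert ( redimension ( apply ( ... ) ) ) pattern used above.
#   $1 = source 2D array, $2 = value for the td dimension, $3 = 3D target array
build_insert_query () {
    printf 'insert ( redimension ( apply ( %s, td, %s ), %s ), %s )' \
        "${1}" "${2}" "${3}" "${3}"
}

# Assumed naming: one 2D array per day, listed in day order.
day=1
for src in Two_D_Image_Data_First Two_D_Image_Data_Second; do
    query=$(build_insert_query "${src}" "${day}" Three_D_Image_Data)
    echo "${query}"
    # To actually run it, with exec_afl_query defined as earlier in the script:
    # exec_afl_query "${query};" -n
    day=$((day + 1))
done
```

Each pass through the loop produces one insert query, with td counting up from 1.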

At this point, you’ve inserted the data from the two 2D image arrays into the 3D target using the value in the td dimension to separate them. Some queries to show you that this is the case:

First, the number of cells in the Three_D_Image_Data is equal to the sum of the cells in the Two_D_Image_Data_First and Two_D_Image_Data_Second arrays.

exec_afl_query "count ( Three_D_Image_Data );"
#
#  {i} count
#  {0} 8000000
#   Elapsed Time: 0:00.20
#
CMD_SUM_OF_CELLS_IN_INPUT_ARRAYS="
apply ( 
  join ( 
    aggregate ( Two_D_Image_Data_First, count(*) as cnt1 ),
    aggregate ( Two_D_Image_Data_Second, count(*) as cnt2 )
  ), 
  total,
  cnt1 + cnt2 
)
"
exec_afl_query "${CMD_SUM_OF_CELLS_IN_INPUT_ARRAYS};"
#
# {i} cnt1,cnt2,total
# {0} 4000000,4000000,8000000
# Elapsed Time: 0:00.14

And finally, when we look at the two “slices” of Three_D_Image_Data, you can see where the data from the two input arrays has gone.

exec_afl_query "aggregate ( slice ( Three_D_Image_Data, td, 1 ), sum ( r ) );"
#
#  {i} r_sum
#  {0} 508004194
#  Elapsed Time: 0:00.71
#
exec_afl_query "aggregate ( Two_D_Image_Data_First, sum ( r ) );"
#
#  {i} r_sum
#  {0} 508004194
#  Elapsed Time: 0:00.13
#
exec_afl_query "aggregate ( slice ( Three_D_Image_Data, td, 2 ), sum ( r ) );"
#
#  {i} r_sum
#  {0} 507772278
#  Elapsed Time: 0:00.13
#
exec_afl_query "aggregate ( Two_D_Image_Data_Second, sum ( r ) );"
#
# {i} r_sum
# {0} 507772278
# Elapsed Time: 0:00.73

Hope this helps!