Description

ecodata provides tools to efficiently subset large geospatial datasets to an area of interest. A few examples are shown on this page, but see the Examples section for more demonstrations!

Key Features

  • Efficiently subset large geospatial datasets to a smaller area of interest, and write the results to new files that can be used in other analysis tools

  • Choose from three options for providing a subsetting boundary:

    • Provide bounding box coordinates

    • Provide another geospatial datalayer (e.g., a shapefile with a region boundary)

    • Provide a csv file of Movebank animal track data, and a boundary will be drawn that encompasses the track locations

  • Options to use both rectangular subsetting boundaries and irregular boundary shapes

Note

ecodata is in the early stages of development, and any feedback is welcome! If you have suggestions or feature requests, encounter any bugs, or come across places where the documentation is unclear, please submit a GitHub issue!

A few examples:

Subsetting the GRIP global roads dataset using a bounding box

If you only need to subset a small area of a spatial dataset, this can be done very quickly even if you are subsetting from a very large dataset (in this example, the GRIP global roads dataset, which is over 4 GB):

%%time
roads_subset, boundary = eco.subset_data(roadsfile, bbox=bbox)
CPU times: user 11.4 ms, sys: 4.26 ms, total: 15.7 ms
Wall time: 20.7 ms
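Conceptually, bounding-box subsetting keeps only the features whose coordinates fall within a rectangle (xmin, ymin, xmax, ymax). The pure-Python sketch below illustrates that idea on point data. It is not ecodata's implementation: the (xmin, ymin, xmax, ymax) tuple order and the helper names `in_bbox` and `subset_points` are assumptions made for illustration. Real features such as road segments are lines, so GIS tools generally test whether a geometry intersects the box rather than checking single coordinates.

```python
# Illustrative sketch only, NOT ecodata's implementation.
# Assumption: bbox = (xmin, ymin, xmax, ymax), in the same CRS as the points.
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]


def in_bbox(pt: Point, bbox: BBox) -> bool:
    """Return True if the point lies inside or on the edge of the box."""
    x, y = pt
    xmin, ymin, xmax, ymax = bbox
    return xmin <= x <= xmax and ymin <= y <= ymax


def subset_points(points: Iterable[Point], bbox: BBox) -> List[Point]:
    """Keep only the points that fall within the bounding box."""
    return [pt for pt in points if in_bbox(pt, bbox)]


# Hypothetical (lon, lat) points; only the first lies inside the box.
pts = [(-115.0, 51.0), (-120.0, 60.0), (-100.0, 40.0)]
print(subset_points(pts, (-118.0, 49.0, -112.0, 53.0)))  # [(-115.0, 51.0)]
```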

Larger subsets with a lot of features will take a bit longer, but are still quite manageable. The requested subset in the next example has over 300,000 records:

%%time
roads_subset, boundary = eco.subset_data(roadsfile, bbox=bbox)
CPU times: user 10.4 s, sys: 374 ms, total: 10.8 s
Wall time: 11.1 s

Subsetting using bounding geometry from a file

This example subsets the roads dataset using the Y2Y region boundary. The roads dataset and the GIS layer containing the Y2Y boundary are in different projections; ecodata handles this automatically.

You can use a rectangular bounding box around the region boundary for subsetting, but you can also use the actual region boundary, or a convex hull around it. Each of these options can be combined with a buffer around the boundary if you also want to capture features that lie close to, but outside, the boundary.
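A rectangular boundary around a region is simply the minimum and maximum of its vertex coordinates, and a buffer expands that rectangle on every side. The sketch below shows this for a hypothetical L-shaped region. `buffered_bounds` is a hypothetical helper, not part of ecodata, and the buffer is assumed to be in the same units as the coordinates (degrees for longitude/latitude data). Buffering the actual boundary shape or a convex hull requires offsetting a polygon, which is usually left to a geometry library such as Shapely (`geometry.buffer(distance)`).

```python
# Hypothetical sketch, NOT ecodata's code: rectangular bounds of a polygon,
# optionally expanded by a buffer given in the coordinates' own units.
from typing import Sequence, Tuple

Point = Tuple[float, float]


def buffered_bounds(vertices: Sequence[Point], buffer: float = 0.0):
    """Return (xmin, ymin, xmax, ymax) of the vertices, grown by `buffer`."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs) - buffer, min(ys) - buffer,
            max(xs) + buffer, max(ys) + buffer)


# A hypothetical L-shaped region boundary.
region = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (1.0, 1.0), (1.0, 3.0), (0.0, 3.0)]
print(buffered_bounds(region))       # (0.0, 0.0, 4.0, 3.0)
print(buffered_bounds(region, 0.5))  # (-0.5, -0.5, 4.5, 3.5)
```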

This example uses the actual region boundary for subsetting. If the boundary has an irregular shape and you don't need the features that fall inside its bounding box but outside the shape itself, this option can save a lot of time:

%%time
roads_subset, boundary = eco.subset_data(roadsfile, bounding_geom=y2yfile, 
                                         boundary_type="mask")
CPU times: user 2.18 s, sys: 98.8 ms, total: 2.28 s
Wall time: 2.54 s
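The difference between a bounding-box subset and a mask subset can be illustrated with a point-in-polygon test. The sketch below uses the even-odd ray-casting rule on a hypothetical L-shaped region: a point in the empty corner of the region's bounding box is kept by the box but dropped by the mask. This only illustrates the concept; it is not how ecodata performs masking, and `point_in_polygon` is a hypothetical helper.

```python
# Illustrative sketch, NOT ecodata's implementation: bounding-box subset
# versus "mask" subset (points inside the actual polygon only).
from typing import Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, poly: Sequence[Point]) -> bool:
    """Even-odd ray casting; points exactly on an edge may go either way."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does a horizontal ray running right from pt cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical L-shaped region; its bounding box is (0, 0, 4, 3).
region = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (1.0, 1.0), (1.0, 3.0), (0.0, 3.0)]
xs = [x for x, _ in region]
ys = [y for _, y in region]
points = [(0.5, 2.0), (3.0, 0.5), (3.0, 2.5)]  # the last is in the empty corner

bbox_subset = [p for p in points
               if min(xs) <= p[0] <= max(xs) and min(ys) <= p[1] <= max(ys)]
mask_subset = [p for p in points if point_in_polygon(p, region)]
print(len(bbox_subset), len(mask_subset))  # 3 2
```

With an irregular region, the mask keeps fewer features than the bounding box, which is where the time savings described above come from.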

Subsetting using animal track data

Instead of providing bounding box coordinates or a region boundary for subsetting, you can also provide a csv file with animal track data. A boundary will be drawn around all the track points, with the option for either a rectangular boundary or convex hull.

You can also specify a buffer size with this option.

The next example uses a convex hull around the track points, with a buffer. Note that the track data in this case include almost 250,000 points.

%%time
roads_subset, boundary = eco.subset_data(roadsfile, track_points=track_file, 
                                         boundary_type='convex_hull', buffer=0.2)
CPU times: user 8.89 s, sys: 85.2 ms, total: 8.97 s
Wall time: 8.98 s
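A boundary around track points can be sketched in a few steps: read the coordinates from the CSV file, then take either their min/max extent (a rectangle) or their convex hull. The sketch below assumes Movebank's standard `location-long` and `location-lat` column names and uses Andrew's monotone chain algorithm for the hull. It is not ecodata's implementation, the helper names are hypothetical, and buffering the hull is omitted.

```python
# Illustrative sketch, NOT ecodata's implementation. Assumes Movebank-style
# columns "location-long" and "location-lat"; helper names are hypothetical.
import csv
import io
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def read_track_points(lines: Iterable[str]) -> List[Point]:
    """Read (lon, lat) pairs, skipping rows with missing coordinates."""
    points = []
    for row in csv.DictReader(lines):
        lon, lat = row["location-long"], row["location-lat"]
        if lon and lat:
            points.append((float(lon), float(lat)))
    return points


def bounds(points: List[Point]) -> Tuple[float, float, float, float]:
    """Rectangular boundary: (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def cross(o: Point, a: Point, b: Point) -> float:
    """Z component of (a - o) x (b - o); > 0 means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# A tiny in-memory stand-in for a track CSV; the last row has no location.
sample = io.StringIO(
    "event-id,location-long,location-lat\n"
    "1,0.0,0.0\n2,2.0,0.0\n3,1.0,1.0\n4,2.0,2.0\n5,0.0,2.0\n6,,\n"
)
pts = read_track_points(sample)
print(bounds(pts))       # (0.0, 0.0, 2.0, 2.0)
print(convex_hull(pts))  # [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
```

The interior point (1.0, 1.0) falls inside both boundaries but is not a hull vertex, so for spread-out tracks the hull can be considerably tighter than the rectangle.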
