Time selection for gridded datasets

There are several options for selecting from a gridded dataset based on time:

  • Select data within a given time range

  • Conditional selection (e.g., selecting only certain seasons or only daytime data). This is done by selecting data from certain years, months, days of year, or hours of the day.

import ecodata as eco 
import xarray as xr 
import pandas as pd 
# Print the start and end time in the dataset
def print_dataset_start_end(ds):
    print(f"Dataset start: {ds.time.min().values}")
    print(f"Dataset end: {ds.time.max().values}")
# ECMWF dataset 
filein = eco.get_path("ECMWF_subset.nc")
ds = xr.load_dataset(filein)
print_dataset_start_end(ds)
Dataset start: 2008-01-01T00:00:00.000000000
Dataset end: 2008-12-31T23:00:00.000000000

Selecting data within a certain time range

select_time_range selects data within a given time range, specified by its start and end. If the start or end is not provided, the function defaults to the earliest or latest time in the dataset, respectively. As the examples below show, both bounds are inclusive, and a date-only end time such as '2008-01-11' covers that entire day.

# Selecting a time slice 
ds2 = eco.select_time_range(ds, start_time='2008-02-01 05:00', end_time='2008-03-01 13:00')
print_dataset_start_end(ds2)
Dataset start: 2008-02-01T05:00:00.000000000
Dataset end: 2008-03-01T13:00:00.000000000
# Selecting a time slice - give only start time 
ds2 = eco.select_time_range(ds, start_time = '2008-02-01')
print_dataset_start_end(ds2)
Dataset start: 2008-02-01T00:00:00.000000000
Dataset end: 2008-12-31T23:00:00.000000000
# Selecting a time slice - give only end time 
ds2 = eco.select_time_range(ds, end_time = '2008-01-11')
print_dataset_start_end(ds2)
Dataset start: 2008-01-01T00:00:00.000000000
Dataset end: 2008-01-11T23:00:00.000000000
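
For comparison, the same three selections can be sketched with xarray alone. This is only an illustration, assuming that select_time_range behaves like label-based slicing with Dataset.sel; the source does not describe how the function is implemented. The synthetic hourly dataset ds_demo and its variable t2m are hypothetical stand-ins for the ECMWF file.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic hourly dataset covering 2008 (hypothetical stand-in for ECMWF_subset.nc)
times = pd.date_range("2008-01-01 00:00", "2008-12-31 23:00", freq=pd.Timedelta(hours=1))
ds_demo = xr.Dataset({"t2m": ("time", np.zeros(times.size))}, coords={"time": times})

# Both bounds given: label-based slices in xarray include both endpoints
both = ds_demo.sel(time=slice("2008-02-01 05:00", "2008-03-01 13:00"))

# Only a start: an open-ended slice runs to the last time step
start_only = ds_demo.sel(time=slice("2008-02-01", None))

# Only an end: a date-only bound covers that whole day (through 23:00)
end_only = ds_demo.sel(time=slice(None, "2008-01-11"))

for name, sub in [("both", both), ("start_only", start_only), ("end_only", end_only)]:
    print(name, sub.time.values[0], sub.time.values[-1])
```

The start and end of each subset match the outputs shown above for the corresponding select_time_range calls.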

Conditional selection

select_time_cond selects data from certain years, months, days of year, or hours of day. Each condition can be given as a list of specific values, as a range (inclusive at both ends, as the outputs below show), or as both, and conditions on different variables can be applied in combination.

# MODIS dataset (MOD13A1)
filein = eco.get_path("MOD13A1.006_500m_aid0001_all.nc")
ds = xr.load_dataset(filein)
print_dataset_start_end(ds)
Dataset start: 2000-02-18 00:00:00
Dataset end: 2009-02-18 00:00:00

The function can be used to select a list of specific (non-consecutive) years:

ds2 = eco.select_time_cond(ds, years=[2000, 2005])

# Years in the resulting dataset
pd.unique(ds2.time.dt.year)
array([2000, 2005])

A range of years can also be specified:

ds2 = eco.select_time_cond(ds, year_range=[2001,2004])

# Years in the resulting dataset
pd.unique(ds2.time.dt.year)
array([2001, 2002, 2003, 2004])

A list of specific values and a range can be used in combination:

ds2 = eco.select_time_cond(ds, months=[1, 2], month_range=[10,12])

# Months in the resulting dataset
sorted(pd.unique(ds2.time.dt.month))
[1, 2, 10, 11, 12]
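
The output above is consistent with a list and a range for the same variable being combined as a union, with the range inclusive at both ends. A minimal xarray sketch of that behaviour follows. It assumes the selection amounts to a boolean mask built from the time.dt accessor, which the source does not confirm; the helper month_mask and the synthetic 16-day dataset ds_demo (with variable ndvi) are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic 16-day series (hypothetical stand-in for the MODIS file)
times = pd.date_range("2000-02-18", "2009-02-18", freq="16D")
ds_demo = xr.Dataset({"ndvi": ("time", np.zeros(times.size))}, coords={"time": times})

def month_mask(time, months=None, month_range=None):
    """True where the month is in `months` OR inside the inclusive `month_range`."""
    wanted = set(months or [])
    if month_range is not None:
        lo, hi = month_range
        wanted |= set(range(lo, hi + 1))  # +1 makes the upper bound inclusive
    return time.dt.month.isin(sorted(wanted))

mask = month_mask(ds_demo.time, months=[1, 2], month_range=[10, 12])
subset = ds_demo.isel(time=mask.values)  # keep only the time steps where the mask is True

# Months present in the resulting subset
print(sorted(int(m) for m in np.unique(subset.time.dt.month)))
```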
# Reload the ECMWF dataset
filein = eco.get_path("ECMWF_subset.nc")
ds = xr.load_dataset(filein)

Conditions on different variables can also be combined:

ds2 = eco.select_time_cond(ds, years=[2008], dayofyear_range=[209,220], hour_range=[10,15])
# Days of year in the resulting dataset
sorted(pd.unique(ds2.time.dt.dayofyear))
[209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220]
# Hours of day in the resulting dataset
sorted(pd.unique(ds2.time.dt.hour))
[10, 11, 12, 13, 14, 15]
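
The outputs above are consistent with conditions on different variables being intersected: a time step is kept only if it satisfies the year, day-of-year, and hour conditions together. The sketch below shows that logic with plain xarray masks. It is an assumption about the behaviour, not the library's implementation, and the synthetic hourly dataset ds_demo with variable t2m is hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic hourly dataset covering 2008 (hypothetical stand-in for ECMWF_subset.nc)
times = pd.date_range("2008-01-01 00:00", "2008-12-31 23:00", freq=pd.Timedelta(hours=1))
ds_demo = xr.Dataset({"t2m": ("time", np.zeros(times.size))}, coords={"time": times})

t = ds_demo.time.dt

# One mask per variable; ranges are written as inclusive comparisons
year_ok = t.year.isin([2008])
doy_ok = (t.dayofyear >= 209) & (t.dayofyear <= 220)
hour_ok = (t.hour >= 10) & (t.hour <= 15)

# A time step is kept only if it satisfies every condition (logical AND)
keep = (year_ok & doy_ok & hour_ok).values
subset = ds_demo.isel(time=keep)

print(sorted(int(d) for d in np.unique(subset.time.dt.dayofyear)))
print(sorted(int(h) for h in np.unique(subset.time.dt.hour)))
```

With 12 days and 6 hours per day, the subset holds 72 hourly time steps.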