Time selection for gridded datasets
There are two ways to select from a gridded dataset based on time:
Select data within a given time range.
Conditional selection (e.g., selecting only certain seasons or only daytime data). This is done by selecting data from certain years, months, days of year, or hours of the day.
import ecodata as eco
import xarray as xr
import pandas as pd
# Print the start and end time in the dataset
def print_dataset_start_end(ds):
    print(f"Dataset start: {ds.time.min().values}")
    print(f"Dataset end: {ds.time.max().values}")
# ECMWF dataset
filein = eco.get_path("ECMWF_subset.nc")
ds = xr.load_dataset(filein)
print_dataset_start_end(ds)
Dataset start: 2008-01-01T00:00:00.000000000
Dataset end: 2008-12-31T23:00:00.000000000
Selecting data within a certain time range
select_time_range selects data within a given time range, specified by the start
and end of the range. If either is omitted, the function defaults to the earliest
or latest time in the dataset, respectively.
# Selecting a time slice
ds2 = eco.select_time_range(ds, start_time='2008-02-01 05:00', end_time='2008-03-01 13:00')
print_dataset_start_end(ds2)
Dataset start: 2008-02-01T05:00:00.000000000
Dataset end: 2008-03-01T13:00:00.000000000
# Selecting a time slice - give only start time
ds2 = eco.select_time_range(ds, start_time = '2008-02-01')
print_dataset_start_end(ds2)
Dataset start: 2008-02-01T00:00:00.000000000
Dataset end: 2008-12-31T23:00:00.000000000
# Selecting a time slice - give only end time
ds2 = eco.select_time_range(ds, end_time = '2008-01-11')
print_dataset_start_end(ds2)
Dataset start: 2008-01-01T00:00:00.000000000
Dataset end: 2008-01-11T23:00:00.000000000
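For comparison, the same kind of range selection can be sketched with plain xarray label slicing. This is not necessarily how select_time_range is implemented; it is a minimal illustration on a synthetic hourly dataset (the `demo` dataset and the `t2m` variable name are made up for this example). It shows why a date-only end such as '2008-01-11' keeps data through 23:00 on that day.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic hourly dataset for 2008 (a hypothetical stand-in for ECMWF_subset.nc)
time = pd.date_range("2008-01-01 00:00", "2008-12-31 23:00", freq="h")
demo = xr.Dataset({"t2m": ("time", np.zeros(time.size))}, coords={"time": time})

# Label-based slicing on a datetime index includes both endpoints
sub = demo.sel(time=slice("2008-02-01 05:00", "2008-03-01 13:00"))

# Passing None leaves that side of the range open
from_feb = demo.sel(time=slice("2008-02-01", None))

# A date-only end label covers the whole day, so 23:00 on 11 January is kept
to_jan11 = demo.sel(time=slice(None, "2008-01-11"))

for name, d in [("sub", sub), ("from_feb", from_feb), ("to_jan11", to_jan11)]:
    print(name, d.time.min().values, d.time.max().values)
```

If an exclusive end is needed, one option is to pass an explicit timestamp for the last wanted time step (for example '2008-01-10 23:00') rather than a date-only string.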
Conditional selection
select_time_cond is used to select data from certain years, months, days of
year, or hours of day. These conditions can be applied in combination, and can be
specified as either a list of specific values or as a range.
# MODIS dataset (MOD13A1)
filein = eco.get_path("MOD13A1.006_500m_aid0001_all.nc")
ds = xr.load_dataset(filein)
print_dataset_start_end(ds)
Dataset start: 2000-02-18 00:00:00
Dataset end: 2009-02-18 00:00:00
The function can be used to select a list of specific (non-consecutive) years:
ds2 = eco.select_time_cond(ds, years=[2000, 2005])
# Years in the resulting dataset
pd.unique(ds2.time.dt.year)
array([2000, 2005])
A range of years can also be specified; both endpoints are included:
ds2 = eco.select_time_cond(ds, year_range=[2001,2004])
# Years in the resulting dataset
pd.unique(ds2.time.dt.year)
array([2001, 2002, 2003, 2004])
A list of specific values and a range can be used in combination, in which case values matching either one are kept:
ds2 = eco.select_time_cond(ds, months=[1, 2], month_range=[10,12])
# Months in the resulting dataset
sorted(pd.unique(ds2.time.dt.month))
[1, 2, 10, 11, 12]
# ECMWF dataset
filein = eco.get_path("ECMWF_subset.nc")
ds = xr.load_dataset(filein)
Conditions on different time components can be combined, so that only times satisfying all of them are kept:
ds2 = eco.select_time_cond(ds, years=[2008], dayofyear_range=[209,220], hour_range=[10,15])
# Days of year in the resulting dataset
sorted(pd.unique(ds2.time.dt.dayofyear))
[209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220]
# Hours of day in the resulting dataset
sorted(pd.unique(ds2.time.dt.hour))
[10, 11, 12, 13, 14, 15]
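The behavior seen in the examples above (a list and a range for the same component are merged, while conditions on different components must all hold) can be reproduced with boolean masks built from xarray's `.dt` accessor. The sketch below is an illustration under those assumptions, not the ecodata implementation; the `demo` dataset and `t2m` variable are synthetic and hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic hourly dataset for 2008 (a hypothetical stand-in for ECMWF_subset.nc)
time = pd.date_range("2008-01-01 00:00", "2008-12-31 23:00", freq="h")
demo = xr.Dataset({"t2m": ("time", np.zeros(time.size))}, coords={"time": time})

year = demo.time.dt.year
month = demo.time.dt.month
doy = demo.time.dt.dayofyear
hour = demo.time.dt.hour

# Same component: explicit values OR an inclusive range
month_mask = month.isin([1, 2]) | ((month >= 10) & (month <= 12))
by_month = demo.isel(time=np.flatnonzero(month_mask.values))

# Different components: every condition must hold (logical AND)
combo_mask = (
    (year == 2008)
    & (doy >= 209) & (doy <= 220)
    & (hour >= 10) & (hour <= 15)
)
combo = demo.isel(time=np.flatnonzero(combo_mask.values))

print(np.unique(by_month.time.dt.month.values))
print(np.unique(combo.time.dt.dayofyear.values))
print(np.unique(combo.time.dt.hour.values))
```

Writing the masks out explicitly can be useful when a condition falls outside what the keyword arguments cover, such as a range that wraps around the end of the year.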