{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Time selection for gridded datasets\n", "\n", "There are several options for selecting from a gridded dataset based on time:\n", "- Select data within a given time range\n", "- Conditional selection (e.g., selecting only certain seasons, only daytime data, etc). \n", " This can be achieved by selecting data from certain years, months, days of year, or hours of the day." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import ecodata as eco \n", "import xarray as xr \n", "import pandas as pd " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Print the start and end time in the dataset\n", "def print_dataset_start_end(ds):\n", " print(f\"Dataset start: {ds.time.min().values}\")\n", " print(f\"Dataset end: {ds.time.max().values}\")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# ECMWF dataset \n", "filein = eco.get_path(\"ECMWF_subset.nc\")\n", "ds = xr.load_dataset(filein)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset start: 2008-01-01T00:00:00.000000000\n", "Dataset end: 2008-12-31T23:00:00.000000000\n" ] } ], "source": [ "print_dataset_start_end(ds)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Selecting data within a certain time range \n", "\n", "``select_time_range`` is used to select data within a given time range by specifying the \n", "start and end of the time range. If the start or end are not provided, the function will \n", "default to using the earliest or latest time in the dataset. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset start: 2008-02-01T05:00:00.000000000\n", "Dataset end: 2008-03-01T13:00:00.000000000\n" ] } ], "source": [ "# Selecting a time slice \n", "ds2 = eco.select_time_range(ds, start_time='2008-02-01 05:00', end_time='2008-03-01 13:00')\n", "print_dataset_start_end(ds2)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset start: 2008-02-01T00:00:00.000000000\n", "Dataset end: 2008-12-31T23:00:00.000000000\n" ] } ], "source": [ "# Selecting a time slice - give only start time \n", "ds2 = eco.select_time_range(ds, start_time = '2008-02-01')\n", "print_dataset_start_end(ds2)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset start: 2008-01-01T00:00:00.000000000\n", "Dataset end: 2008-01-11T23:00:00.000000000\n" ] } ], "source": [ "# Selecting a time slice - give only end time \n", "ds2 = eco.select_time_range(ds, end_time = '2008-01-11')\n", "print_dataset_start_end(ds2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conditional selection \n", "\n", "``select_time_cond`` is used to select data from certain years, months, days of \n", "year, or hours of day. These conditions can be applied in combination, and can be \n", "specified as either a list of specific values or as a range. " ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset start: 2000-02-18 00:00:00\n", "Dataset end: 2009-02-18 00:00:00\n" ] } ], "source": [ "# ECMWF dataset \n", "filein = eco.get_path(\"MOD13A1.006_500m_aid0001_all.nc\")\n", "ds = xr.load_dataset(filein)\n", "print_dataset_start_end(ds)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The function can be used to select a list of specific (non-consecutive) years:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([2000, 2005])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds2 = eco.select_time_cond(ds, years=[2000, 2005])\n", "\n", "# Years in the resulting dataset\n", "pd.unique(ds2.time.dt.year)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A range of years can also be specified:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([2001, 2002, 2003, 2004])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds2 = eco.select_time_cond(ds, year_range=[2001,2004])\n", "\n", "# Years in the resulting dataset\n", "pd.unique(ds2.time.dt.year)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A list of specific values and a range can be used in combination:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 10, 11, 12]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds2 = eco.select_time_cond(ds, months=[1, 2], month_range=[10,12])\n", "\n", "# Months in the resulting dataset\n", "sorted(pd.unique(ds2.time.dt.month))" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# ECMWF dataset \n", "filein = eco.get_path(\"ECMWF_subset.nc\")\n", "ds = xr.load_dataset(filein)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using a combination of different variables:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "ds2 = eco.select_time_cond(ds, years=[2008], dayofyear_range=[209,220], hour_range=[10,15])" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Days of year in the resulting dataset\n", "sorted(pd.unique(ds2.time.dt.dayofyear))" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[10, 11, 12, 13, 14, 15]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Hours of day in the resulting dataset\n", "sorted(pd.unique(ds2.time.dt.hour))" ] } ], "metadata": { "kernelspec": { "display_name": "pmv-dev", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.15" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "5b6f9a2c562b6e60868cbf0c86ab18522a63e76e5fe9fe366e3c27fb9acdc7d5" } } }, "nbformat": 4, "nbformat_minor": 2 }