---
jupytext:
  formats: md:myst
  text_representation:
    extension: .md
    format_name: myst
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

# Data by Variable

This page is an entry point for exploring the measurement data collected by the TCO group. The **data is organised by variable rather than by instrument**, so it is aimed at users who are interested in a specific quantity, such as temperature, humidity or wind, but may not yet be familiar with the instruments that measure it. For each variable, the tables below give a rough overview of which instruments provide it, at which location, over which time period and with what vertical coverage. More detailed information on individual datasets and instruments can be found on the respective instrument pages.


```{note}
The same variable may be measured by different instruments, each of which uses a distinct measurement principle. These differences can affect the accuracy of the data, as well as its temporal and spatial resolution. Therefore, always consider the method behind the measurement when interpreting the results.
```

```{code-cell} ipython3
:tags: [remove-input]

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import intake
import cf_xarray.units
import pint_xarray
import pandas as pd
from IPython.display import display, Markdown, HTML

links = {
    "CORAL": "BCO/raman_lidars/coral.html",
    "EARLI": "BCO/raman_lidars/earli.html",
    "LICHT": "BCO/raman_lidars/licht.html",
    "MBR2": "BCO/cloud_radars/mbr2.html",
    "MBRS": "BCO/cloud_radars/mbrs.html",
    "mrr": "BCO/mrr/index.html",
    "mrr_pro": "BCO/mrr/index.html",
    "PARSIVEL451995": "BCO/parsivel2/index.html",
    "PARSIVEL451996": "BCO/parsivel2/index.html",
    "PARSIVEL451997": "BCO/parsivel2/index.html",
    "radiation": "BCO/solar_radiation/index.html",
    "WAD": "BCO/rain_gauge_wad/index.html",
    "windlidar1": "BCO/wind_lidars/index.html",
    "windlidar2": "BCO/wind_lidars/index.html",
    "WXT": "BCO/surfacemet/index.html",
    "KATRIN": "BCO/cloud_radars/katrind.html",
    "HATPRO": "BCO/hatpro/index.html",
    "ceilometer": "BCO/ceilometer/index.html",
}

cat = intake.open_catalog("../intake/catalog.yaml")

def all_entries(cat, prefix=None):
    prefix = prefix or []
    try:
        entries = list(cat.items())
    except Exception:
        # leaf entries (data sources) have no sub-entries to iterate over
        yield ".".join(prefix), cat
    else:
        for name, entry in entries:
            yield from all_entries(entry, prefix + [name])

def guess_instrument_and_location(key):
    parts = key.split(".")
    location = parts[0]
    iparts = parts[-1].split("_")

    a = iparts[0].lower()
    # dataset names without an underscore have no second part to check
    b = iparts[1].upper() if len(iparts) > 1 else ""

    if b in links:
        instrument = b
    elif a in links:
        instrument = a
    else:
        raise ValueError(f"couldn't guess instrument for dataset {key}")

    return instrument, location

def gen_overview_frame(cat):
    for k, v in all_entries(cat):
        try:
            ds = v(chunks=None).to_dask()
        except Exception as e:
            # skip datasets that cannot be opened, but report why
            print(f"{k} is broken: {e}")
            continue
        if "time" in ds:
            ds = ds.assign(time=ds["time"].assign_attrs({"axis": "T", "standard_name": "time"}))
        for varname, variable in ds.items():
            if "T" not in variable.cf.axes:
                continue

            tstart, tend = variable.cf["time"][[0, -1]].values
            instrument, location = guess_instrument_and_location(k)
            is_vertical = "Z" in variable.cf.axes
            entry = {
                "dataset": k,
                "variable": varname,
                "standard_name": variable.attrs.get("standard_name"),
                "is_vertical": is_vertical,
                "time_start": tstart,
                "time_end": tend,
                "instrument": instrument,
                "location": location,
            }
            if is_vertical:
                # take the magnitude explicitly instead of letting numpy strip the units
                zrange = variable.cf["Z"][[0, -1]].pint.quantify().pint.to("km").pint.magnitude
                entry["alt_min"] = min(zrange)
                entry["alt_max"] = max(zrange)

            yield entry
    

df = pd.DataFrame.from_records(list(gen_overview_frame(cat)))

def format_z(row):
    if row["is_vertical"]:
        return "column ({alt_min:.0f} - {alt_max:.0f} km)".format(**row)
    else:
        return "surface"

def short_table(df):
    # copy to avoid pandas' SettingWithCopyWarning when adding columns
    out = df[["instrument", "location"]].copy()
    out["start"] = df["time_start"].dt.date
    out["end"] = df["time_end"].dt.date
    return out

def _build_html_table(df):
    yield "<tr><th>Instrument</th><th>Vertical Coverage</th><th>Time Period</th><th>Location</th></tr>"
    for key, row in df.iterrows():
        yield f"<tr><td><a href=\"{links[row['instrument']]}\">{row['instrument']}</a></td><td>{format_z(row)}</td><td>{row['time_start']:%Y-%m} ... {row['time_end']:%Y-%m}</td><td>{row['location']}</td></tr>"
    
def build_html_table(df):
    df = df[["instrument", "location", "time_start", "time_end", "is_vertical", "alt_min", "alt_max"]].drop_duplicates()
    return HTML("<table class=\"colwidths-auto table\">" + "\n".join(list(_build_html_table(df))) + "\n</table>")

#short_table(df[df["standard_name"] == "air_temperature"])

def show_table(*keys):
    return display(build_html_table(df[df["standard_name"].isin(keys) | df["variable"].isin(keys)]))

# these are for debugging
#pd.set_option('display.max_rows', None)
#df
```
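Each table on this page is generated from the intake catalog: the hidden cell above walks the nested catalog, records every time-dependent variable together with its CF `standard_name`, and keeps the rows whose standard name (or variable name) matches the requested keys. The following is a minimal, standard-library-only sketch of that lookup. It assumes the catalog can be represented as nested dictionaries and each dataset as a mapping from variable names to standard names; the catalog contents and the helper names `walk` and `find_datasets` are hypothetical and only illustrate the idea.

```python
from typing import Iterator

# Hypothetical stand-in for the nested intake catalog: sub-catalogs are dicts of
# dicts, leaves map variable names to CF standard names (None if not set).
CATALOG = {
    "BCO": {
        "surfacemet_wxt_v1": {"T": "air_temperature", "RH": "relative_humidity"},
        "coral_LR_t": {"ta": "air_temperature"},
        "windlidar1_v1": {"ws": "wind_speed", "wdir": "wind_from_direction"},
    },
}


def walk(node: dict, prefix: tuple[str, ...] = ()) -> Iterator[tuple[str, dict]]:
    """Yield (dotted key, dataset) pairs for every leaf of the nested catalog."""
    for name, child in node.items():
        # a node whose values are all dicts is a sub-catalog; otherwise it is a dataset
        if child and all(isinstance(v, dict) for v in child.values()):
            yield from walk(child, prefix + (name,))
        else:
            yield ".".join(prefix + (name,)), child


def find_datasets(catalog: dict, *keys: str) -> list[tuple[str, str]]:
    """Return (dataset, variable) pairs whose standard name or variable name is in keys."""
    hits = []
    for key, variables in walk(catalog):
        for varname, standard_name in variables.items():
            if standard_name in keys or varname in keys:
                hits.append((key, varname))
    return hits
```

For example, `find_datasets(CATALOG, "air_temperature")` returns the two (hypothetical) temperature entries, one from the surface station and one from the lidar. Matching on the standard name rather than on the variable name is what allows datasets from different instruments, with different internal naming, to end up in the same table.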

## Temperature

```{code-cell} ipython3
:tags: [remove-input]
show_table("air_temperature")
```

<!--
| Instrument | z | time period | location | comment |
| --- | --- | --- | --- | --- |
| [WXT](BCO/surfacemet/index.md) | surface | 2010-12-16 ... now | BCO | basic weather station |
| [LICHT](BCO/raman_lidars/licht.md) | column (0 - 15 km) | 2016-07-17 ... 2017-06-13 | BCO | raman lidar |
| [LICHT](BCO/raman_lidars/licht.md) | column (0 - 15 km) | 2017-06-17 ... 2019-12-06 | BCO | raman lidar |
| [CORAL](BCO/raman_lidars/coral.md) | column (0 - 29 km) | 2019-01-21 ... now | BCO | raman lidar |
| EARLI lidar | vertical column (0 - X km) |
| HATPRO | vertical column | | | (not really) |
-->

## Humidity

```{code-cell} ipython3
:tags: [remove-input]
show_table("relative_humidity", "specific_humidity", "humidity_mixing_ratio")
```

<!--
* WXT
* HATPRO
-->
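The humidity table combines relative humidity, specific humidity and mixing ratio, which are not interchangeable. Specific humidity q (mass of water vapour per mass of moist air) and mixing ratio r (mass of water vapour per mass of dry air) are related exactly by q = r / (1 + r). Converting to relative humidity additionally needs pressure, temperature and an empirical saturation vapour pressure formula. The sketch below assumes the Magnus approximation over liquid water with one common set of coefficients; the processing of a given instrument may use a different formulation, and the function names are illustrative only.

```python
import math

EPSILON = 0.622  # ratio of gas constants R_d / R_v (approximate)


def specific_humidity_from_mixing_ratio(r: float) -> float:
    """Specific humidity q (kg/kg) from mixing ratio r (kg/kg): q = r / (1 + r)."""
    return r / (1.0 + r)


def mixing_ratio_from_specific_humidity(q: float) -> float:
    """Mixing ratio r (kg/kg) from specific humidity q (kg/kg): r = q / (1 - q)."""
    return q / (1.0 - q)


def saturation_vapour_pressure_hpa(t_celsius: float) -> float:
    """Saturation vapour pressure over liquid water in hPa (Magnus approximation, assumed coefficients)."""
    return 6.1094 * math.exp(17.625 * t_celsius / (t_celsius + 243.04))


def relative_humidity_from_mixing_ratio(r: float, p_hpa: float, t_celsius: float) -> float:
    """Relative humidity (0..1) from mixing ratio (kg/kg), pressure (hPa) and temperature (degC)."""
    e = r * p_hpa / (EPSILON + r)  # water vapour partial pressure in hPa
    return e / saturation_vapour_pressure_hpa(t_celsius)
```

Because the Magnus formula is an approximation, relative humidity recomputed this way can differ slightly from the value an instrument reports directly.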

## Wind

```{code-cell} ipython3
:tags: [remove-input]
show_table("wind_speed", "wind_from_direction", "eastward_wind", "northward_wind")
```

<!--
* WXT
* Wind Lidar
-->
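Some datasets report wind as speed and direction, others as eastward (u) and northward (v) components. In the CF conventions, `wind_from_direction` is the direction the wind is coming from, measured clockwise from north, so the component signs have to be flipped when converting. A minimal sketch of both conversions (function names are illustrative):

```python
import math


def speed_and_direction(u: float, v: float) -> tuple[float, float]:
    """Wind speed and CF wind_from_direction (degrees clockwise from north)
    from eastward (u) and northward (v) components.

    The direction is meaningless for calm wind (u = v = 0).
    """
    speed = math.hypot(u, v)
    # the wind comes FROM the side opposite to where the vector points, hence -u, -v
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction


def components(speed: float, from_direction_deg: float) -> tuple[float, float]:
    """Eastward and northward wind components from speed and wind_from_direction."""
    rad = math.radians(from_direction_deg)
    return -speed * math.sin(rad), -speed * math.cos(rad)
```

For example, `speed_and_direction(5.0, 0.0)` describes a 5 m/s wind blowing towards the east, i.e. coming from the west (270°).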

## Rain

```{code-cell} ipython3
:tags: [remove-input]
show_table("rainfall_amount", "rainfall_rate")
```

<!--
* WAD
* Parsivel
* MRR
* WXT
-->
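`rainfall_rate` and `rainfall_amount` describe the same process in different forms: the amount over a period is the rate integrated over time (for liquid water, an amount of 1 kg m-2 corresponds to a depth of 1 mm). Comparing a dataset that reports rates with one that reports accumulations therefore requires converting one into the other. A minimal sketch, assuming rates in mm/h sampled at a fixed interval, with each sample treated as constant over its interval:

```python
def accumulate(rates_mm_per_h: list[float], interval_s: float) -> list[float]:
    """Running rainfall amount (mm) from rates (mm/h) sampled every interval_s seconds.

    Each rate is assumed constant over its sampling interval (rectangle rule),
    so amount_i = sum over k <= i of rate_k * interval_s / 3600.
    """
    total = 0.0
    amounts = []
    for rate in rates_mm_per_h:
        total += rate * interval_s / 3600.0
        amounts.append(total)
    return amounts


def mean_rate(amount_mm: float, duration_s: float) -> float:
    """Average rain rate (mm/h) over a period from the accumulated amount (mm)."""
    return amount_mm * 3600.0 / duration_s
```

For example, `accumulate([6.0, 6.0, 0.0, 12.0], 600)` gives `[1.0, 2.0, 2.0, 4.0]`: ten minutes at 6 mm/h add 1 mm. Rain is highly intermittent, so coarse sampling intervals can make this estimate noticeably less accurate.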

## Radiation

```{code-cell} ipython3
:tags: [remove-input]
show_table(
    "downwelling_shortwave_radiance_in_air",
    "downwelling_longwave_radiance_in_air",
    "downwelling_radiance_per_unit_wavelength_in_air",
    "surface_direct_downwelling_shortwave_flux_in_air",
    "surface_diffuse_downwelling_shortwave_flux_in_air",
    "surface_downwelling_shortwave_flux_in_air",
    "surface_downwelling_longwave_flux_in_air",
    "surface_upwelling_radiance_in_air",
    "surface_upwelling_radiance_per_unit_wavelength_in_air",
    "upwelling_longwave_radiance_in_air",
    "upwelling_radiance_per_unit_wavelength_in_air",
    "upwelling_shortwave_radiance_in_air",
)
```

<!--
* surface radiation
-->
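The radiation table lists direct, diffuse and total downwelling shortwave flux separately. If the direct component is measured perpendicular to the solar beam (as a pyrheliometer does) and the diffuse component on a horizontal surface, the three should approximately satisfy total = direct × cos(solar zenith angle) + diffuse, which is a useful consistency check. The sketch below assumes exactly that measurement geometry and takes the solar zenith angle as an input; how the individual BCO datasets define their direct component should be checked on the instrument page.

```python
import math


def global_from_components(direct_normal: float, diffuse: float, zenith_deg: float) -> float:
    """Global horizontal shortwave flux (W m-2) from direct-normal and diffuse flux.

    Assumes direct_normal is measured perpendicular to the solar beam and
    diffuse on a horizontal surface; the direct part is projected with cos(zenith).
    """
    cos_z = max(math.cos(math.radians(zenith_deg)), 0.0)  # sun below horizon: no direct part
    return direct_normal * cos_z + diffuse


def closure_ratio(measured_global: float, direct_normal: float, diffuse: float,
                  zenith_deg: float) -> float:
    """Ratio of measured to reconstructed global flux; close to 1 for consistent data."""
    return measured_global / global_from_components(direct_normal, diffuse, zenith_deg)
```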

## Clouds

```{code-cell} ipython3
:tags: [remove-input]
show_table(
    "equivalent_reflectivity_factor",
    "cloud_mask",
    "cloud_base_altitude",
)
```

<!--
* radar
* lidar
* ceilometer
-->
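The radar reflectivity factor (`equivalent_reflectivity_factor`) spans many orders of magnitude and is therefore commonly given in dBZ, i.e. 10 log10(Z) with Z in mm6 m-3. Averages should be computed in linear units and only then converted back, because averaging dBZ values directly underestimates the mean reflectivity. A minimal sketch (function names are illustrative; values are assumed to be in dBZ):

```python
import math


def dbz_to_linear(dbz: float) -> float:
    """Reflectivity factor Z in mm^6 m^-3 from dBZ: Z = 10 ** (dBZ / 10)."""
    return 10.0 ** (dbz / 10.0)


def linear_to_dbz(z: float) -> float:
    """dBZ from reflectivity factor Z in mm^6 m^-3 (Z must be positive)."""
    return 10.0 * math.log10(z)


def mean_dbz(values_dbz: list[float]) -> float:
    """Average reflectivity, computed in linear units and converted back to dBZ."""
    linear = [dbz_to_linear(v) for v in values_dbz]
    return linear_to_dbz(sum(linear) / len(linear))
```

For example, averaging 10 dBZ and 30 dBZ in linear units gives about 27 dBZ rather than the naive 20 dBZ, because the stronger echo dominates.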
