{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# Preliminaries"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1",
   "metadata": {},
   "source": [
    "## Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "import holoviews as hv\n",
    "import panel as pn\n",
    "\n",
    "from bridge.providers.vision import Coco2017Detection\n",
    "\n",
    "hv.extension(\"bokeh\")\n",
    "pn.extension()\n",
    "\n",
    "TMP_NOTEBOOK_ROOT = Path(\"/tmp/bridge-ds/tutorials\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3",
   "metadata": {},
   "source": [
    "## Load Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4",
   "metadata": {},
   "outputs": [],
   "source": [
    "root_dir = TMP_NOTEBOOK_ROOT / \"coco\"\n",
    "\n",
    "provider = Coco2017Detection(root_dir, split=\"val\", img_source=\"stream\")\n",
    "ds = provider.build_dataset()\n",
    "ds"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5",
   "metadata": {},
   "source": [
    "# LoadMechanism\n",
    "\n",
    "In this tutorial we will learn about the **LoadMechanism**, Bridge's way of loading raw data from different sources.\n",
    "\n",
    "A quick reminder: to access the raw data within each element, we need to use the **SampleAPI** with `sample.data / element.data`. The column `data` in the **TableAPI** usually (but not always) contains a reference to the data rather than the data itself:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds.samples.head(1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7",
   "metadata": {},
   "source": [
    "When we want to access data for a given element, we need to call the `element.data` property. In COCO, we have elements for _images_ and for _bboxes_. Because COCO is a **SingularDataset**, every sample has a special element, in this case the image, and we can access its data directly with `sample.data`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8",
   "metadata": {},
   "outputs": [],
   "source": [
    "sample = ds.iget(0)\n",
    "print(\"img_data:\", sample.data.shape, \"\\n\")\n",
    "\n",
    "bbox_elements = [ann for ann in sample.elements[\"bbox\"]]\n",
    "print(*[bb.data for bb in bbox_elements], sep=\"\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9",
   "metadata": {},
   "source": [
    "Every element holds a **LoadMechanism**, an object responsible for loading data from different sources. In this case, for images, `element.data` will perform an HTTP request and load the image in the response. For bboxes, which already exist in-memory (note that we can see them directly in the `annotations` table), `element.data` will simply load the stored object.\n",
    "\n",
    "The **LoadMechanism** is defined by two variables:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10",
   "metadata": {},
   "outputs": [],
   "source": [
    "img_element = sample.element\n",
    "print(\"Image element, loaded over HTTP:\")\n",
    "print(\"url_or_data:\", img_element._load_mechanism.url_or_data)\n",
    "print(\"category:\", img_element._load_mechanism.category)\n",
    "print()\n",
    "print(\"Bbox elements, loaded from memory:\")\n",
    "print(\"url_or_data:\", bbox_elements[0]._load_mechanism.url_or_data)\n",
    "print(\"category:\", bbox_elements[0]._load_mechanism.category)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11",
   "metadata": {},
   "source": [
    "- **url_or_data**, as its name suggests, contains either a url that references the object (url broadly speaking - including local paths, s3 paths, etc.), or contains the actual object, in case we want to store it directly in-memory.\n",
    "- **category** - accepts a string that is used to determine which logic is used to load the object. Should we load the image using PIL? or a text file using simple `with open()`? this value determines that. To find which categories are supported, use `list_registered_categories`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12",
   "metadata": {},
   "source": [
    "## In summary\n",
    "1. Bridge loads data lazily, only when `element.data` is called\n",
    "2. The loading mechanism function accepts **url_or_data** which defines where to load from (or what to load), and **category** which defines _how_ to load it."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}