{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": {}, "source": [ "# Preliminaries" ] }, { "cell_type": "markdown", "id": "1", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": null, "id": "2", "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "import holoviews as hv\n", "import panel as pn\n", "\n", "from bridge.providers.vision import Coco2017Detection\n", "\n", "hv.extension(\"bokeh\")\n", "pn.extension()\n", "\n", "TMP_NOTEBOOK_ROOT = Path(\"/tmp/bridge-ds/tutorials\")" ] }, { "cell_type": "markdown", "id": "3", "metadata": {}, "source": [ "## Load Dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "4", "metadata": {}, "outputs": [], "source": [ "root_dir = TMP_NOTEBOOK_ROOT / \"coco\"\n", "\n", "provider = Coco2017Detection(root_dir, split=\"val\", img_source=\"stream\")\n", "ds = provider.build_dataset()\n", "ds" ] }, { "cell_type": "markdown", "id": "5", "metadata": {}, "source": [ "# LoadMechanism\n", "\n", "In this tutorial we will learn about the **LoadMechanism**, Bridge's way of loading raw data from different sources.\n", "\n", "A quick reminder: to access the raw data within each element, we need to use the **SampleAPI** with `sample.data / element.data`. The column `data` in the **TableAPI** usually (but not always) contains a reference to the data rather than the data itself:\n" ] }, { "cell_type": "code", "execution_count": null, "id": "6", "metadata": {}, "outputs": [], "source": [ "ds.samples.head(1)" ] }, { "cell_type": "markdown", "id": "7", "metadata": {}, "source": [ "When we want to access data for a given element, we need to call the `element.data` property. In COCO, we have elements for _images_ and for _bboxes_. Because COCO is a **SingularDataset**, every sample has a special element, in this case the image, and we can access its data directly with `sample.data`." ] }, { "cell_type": "code", "execution_count": null, "id": "8", "metadata": {}, "outputs": [], "source": [ "sample = ds.iget(0)\n", "print(\"img_data:\", sample.data.shape, \"\\n\")\n", "\n", "bbox_elements = [ann for ann in sample.elements[\"bbox\"]]\n", "print(*[bb.data for bb in bbox_elements], sep=\"\\n\")" ] }, { "cell_type": "markdown", "id": "9", "metadata": {}, "source": [ "Every element holds a **LoadMechanism**, an object responsible for loading data from different sources. In this case, for images, `element.data` will perform an HTTP request and load the image in the response. For bboxes, which already exist in-memory (note that we can see them directly in the `annotations` table), `element.data` will simply load the stored object.\n", "\n", "The **LoadMechanism** is defined by two variables:" ] }, { "cell_type": "code", "execution_count": null, "id": "10", "metadata": {}, "outputs": [], "source": [ "img_element = sample.element\n", "print(\"Image element, loaded over HTTP:\")\n", "print(\"url_or_data:\", img_element._load_mechanism.url_or_data)\n", "print(\"category:\", img_element._load_mechanism.category)\n", "print()\n", "print(\"Bbox elements, loaded from memory:\")\n", "print(\"url_or_data:\", bbox_elements[0]._load_mechanism.url_or_data)\n", "print(\"category:\", bbox_elements[0]._load_mechanism.category)" ] }, { "cell_type": "markdown", "id": "11", "metadata": {}, "source": [ "- **url_or_data**, as its name suggests, contains either a url that references the object (url broadly speaking - including local paths, s3 paths, etc.), or contains the actual object, in case we want to store it directly in-memory.\n", "- **category** - accepts a string that is used to determine which logic is used to load the object. Should we load the image using PIL? or a text file using simple `with open()`? this value determines that. To find which categories are supported, use `list_registered_categories`." ] }, { "cell_type": "markdown", "id": "12", "metadata": {}, "source": [ "## In summary\n", "1. Bridge loads data lazily, only when `element.data` is called\n", "2. The loading mechanism function accepts **url_or_data** which defines where to load from (or what to load), and **category** which defines _how_ to load it." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 5 }