Display Engine

Preliminaries

Imports

from pathlib import Path

TMP_NOTEBOOK_ROOT = Path("/tmp/bridge-ds/tutorials")

DisplayEngine

In the previous tutorial, we wrote a DatasetProvider for the Large Movie Review Dataset, a text classification dataset. However, we didn’t use a custom DisplayEngine, so the visualization was limited to a raw dump of each element’s fields:

from bridge.display.basic import SimplePrints
from bridge.providers.text import LargeMovieReviewDataset

provider = LargeMovieReviewDataset(root=TMP_NOTEBOOK_ROOT / "imdb", split="train", download=True)
ds = provider.build_dataset(display_engine=SimplePrints())
ds.iget(0).show()

Archive file aclImdb_v1.tar.gz already exists, skipping download.
{
    "Sample ID 3944_3": {
        "Elements": {
            "etype=text": [
                {
                    "element_id": "text_3944_3",
                    "element_type": "text",
                    "sample_id": "3944_3",
                    "data": "/tmp/bridge-ds/tutorials/imdb/aclImdb/train/neg/3944_3.txt",
                    "category": "text",
                    "is_example": true
                }
            ],
            "etype=class_label": [
                {
                    "element_id": "label_3944_3",
                    "element_type": "class_label",
                    "sample_id": "3944_3",
                    "data": "neg",
                    "category": "obj",
                    "is_example": false
                }
            ]
        }
    }
}
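The dump above groups each sample’s elements by type (etype=text, etype=class_label) and prints every element’s fields. The sketch below imitates that layout with plain dictionaries so the structure is easy to see; it is an illustration of the output format only, not SimplePrints’ actual implementation, and the dump_sample helper is hypothetical.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List

# Element fields copied from the dump above; plain dicts stand in for Bridge elements.
elements: List[Dict[str, Any]] = [
    {
        "element_id": "text_3944_3",
        "element_type": "text",
        "sample_id": "3944_3",
        "data": "/tmp/bridge-ds/tutorials/imdb/aclImdb/train/neg/3944_3.txt",
        "category": "text",
        "is_example": True,
    },
    {
        "element_id": "label_3944_3",
        "element_type": "class_label",
        "sample_id": "3944_3",
        "data": "neg",
        "category": "obj",
        "is_example": False,
    },
]


def dump_sample(sample_id: str, elements: List[Dict[str, Any]]) -> str:
    """Hypothetical helper: group elements by type and pretty-print them as JSON."""
    by_type: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for el in elements:
        by_type[f"etype={el['element_type']}"].append(el)
    return json.dumps({f"Sample ID {sample_id}": {"Elements": dict(by_type)}}, indent=4)


print(dump_sample("3944_3", elements))
```

Grouping by etype is also why a display engine can later look up a sample’s labels by type, as sample.annotations["class_label"] does in this tutorial.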

Class Structure

We can improve this visualization by writing our own DisplayEngine. To start, let’s see which methods we need to implement:

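from typing import Any, Dict

from bridge.display.basic import DisplayEngine


class MyDisplayEngine(DisplayEngine):
    def show_element(
        self,
        element,
        element_plot_kwargs: Dict[str, Any] | None = None,
    ):
        # Render a single element, e.g. one text or one class label
        pass

    def show_sample(
        self,
        sample,
        element_plot_kwargs: Dict[str, Any] | None = None,
        sample_plot_kwargs: Dict[str, Any] | None = None,
    ):
        # Render one sample together with its annotations
        pass

    def show_dataset(
        self,
        dataset,
        element_plot_kwargs: Dict[str, Any] | None = None,
        sample_plot_kwargs: Dict[str, Any] | None = None,
        dataset_plot_kwargs: Dict[str, Any] | None = None,
    ):
        # Render the whole dataset, or an interface for browsing it
        pass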

Seems straightforward enough: a DisplayEngine implements one method each for showing individual elements (such as texts or annotations), samples, and datasets.
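The three methods form a hierarchy: elements make up samples, and samples make up datasets. One way to organize an engine is to have each level reuse the one beneath it, so supporting a new element type only touches show_element. The sketch below shows that pattern with hypothetical stand-in classes (ToyElement, ToySample, ToyDisplayEngine, PlainTextEngine are not Bridge classes) and plain-text output; it is a design option for illustration, while the TextClassification engine in this tutorial renders samples directly.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, List


# Hypothetical minimal stand-ins for Bridge elements and samples.
@dataclass
class ToyElement:
    etype: str
    data: Any


@dataclass
class ToySample:
    sample_id: str
    elements: List[ToyElement] = field(default_factory=list)


class ToyDisplayEngine(ABC):
    """Mirrors the three-level interface: element -> sample -> dataset."""

    @abstractmethod
    def show_element(self, element: ToyElement) -> str: ...

    def show_sample(self, sample: ToySample) -> str:
        # A sample view is assembled from its element views.
        lines = [f"Sample {sample.sample_id}"]
        lines += [self.show_element(el) for el in sample.elements]
        return "\n".join(lines)

    def show_dataset(self, samples: List[ToySample]) -> str:
        # A dataset view is assembled from its sample views.
        return "\n\n".join(self.show_sample(s) for s in samples)


class PlainTextEngine(ToyDisplayEngine):
    def show_element(self, element: ToyElement) -> str:
        # Dispatch on element type, like the TextClassification engine in this tutorial.
        renderers = {
            "text": lambda d: f"  text: {d}",
            "class_label": lambda d: f"  label: {d}",
        }
        if element.etype not in renderers:
            raise NotImplementedError(element.etype)
        return renderers[element.etype](element.data)


engine = PlainTextEngine()
sample = ToySample("3944_3", [ToyElement("text", "review text here"), ToyElement("class_label", "neg")])
print(engine.show_dataset([sample]))
```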

Let’s build our own DisplayEngine from the bottom up, starting with text and class label elements. We will use Panel, but you are free to implement yours with any library you like:

from typing import Any, Dict

import panel as pn

from bridge.display.basic import DisplayEngine

pn.extension()


class TextClassification(DisplayEngine):
    def show_element(self, element, element_plot_kwargs: Dict[str, Any] | None = None):
        if element.etype == "class_label":
            return pn.pane.Markdown(element.to_pd_series().to_frame().T.to_markdown())
        elif element.etype == "text":
            return pn.pane.Markdown(element.data)
        else:
            raise NotImplementedError()

    def show_sample(
        self,
        sample,
        element_plot_kwargs: Dict[str, Any] | None = None,
        sample_plot_kwargs: Dict[str, Any] | None = None,
    ):
        pass

    def show_dataset(
        self,
        dataset,
        element_plot_kwargs: Dict[str, Any] | None = None,
        sample_plot_kwargs: Dict[str, Any] | None = None,
        dataset_plot_kwargs: Dict[str, Any] | None = None,
    ):
        pass

To test it:

engine = TextClassification()
sample = ds.iget(0)
text_element = sample.element  # SingularSample exposes the text element specifically
label_element = sample.annotations["class_label"][0]  # the class labels in this case are annotations

pn.Column(engine.show_element(label_element), engine.show_element(text_element))

Looks good. Next, let’s display an entire sample rather than individual elements:

from typing import Any, Dict

import pandas as pd
import panel as pn

pn.extension()


class TextClassification(DisplayEngine):
    def show_element(self, element, element_plot_kwargs: Dict[str, Any] | None = None):
        if element.etype == "class_label":
            return pn.pane.Markdown(element.to_pd_series().to_frame().T.to_markdown())
        elif element.etype == "text":
            return pn.pane.Markdown(element.data)
        else:
            raise NotImplementedError()

    def show_sample(
        self,
        sample,
        element_plot_kwargs: Dict[str, Any] | None = None,
        sample_plot_kwargs: Dict[str, Any] | None = None,
    ):
        annotations_md = pd.DataFrame([ann.to_pd_series() for ann in sample.annotations["class_label"]]).to_markdown()
        text_display = pn.pane.Markdown(sample.data)
        return pn.Column("# Sample Text:", text_display, "# Annotations Data:", annotations_md)

    def show_dataset(
        self,
        dataset,
        element_plot_kwargs: Dict[str, Any] | None = None,
        sample_plot_kwargs: Dict[str, Any] | None = None,
        dataset_plot_kwargs: Dict[str, Any] | None = None,
    ):
        pass
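In show_sample, each class-label annotation is converted to a pandas Series with ann.to_pd_series(), and the list of Series is stacked into a DataFrame with one row per annotation before being rendered as Markdown. The snippet below isolates that stacking step. The hand-written Series assumes that to_pd_series() exposes the same fields SimplePrints printed earlier; that field layout is an assumption, not something this tutorial shows directly.

```python
import pandas as pd

# Hand-made Series standing in for ann.to_pd_series(); field values copied
# from the SimplePrints output earlier in this tutorial.
annotations = [
    pd.Series(
        {
            "element_id": "label_3944_3",
            "element_type": "class_label",
            "sample_id": "3944_3",
            "data": "neg",
            "category": "obj",
            "is_example": False,
        }
    ),
]

# Stack the Series: one row per annotation, one column per field.
table = pd.DataFrame(annotations)
print(table[["element_id", "data"]])
```

DataFrame.to_markdown(), which the engine uses, requires the optional tabulate package; the sketch prints the frame instead so it runs with pandas alone.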

engine = TextClassification()
engine.show_sample(ds.iget(0))

Good. Finally, let’s use the Panel DiscreteSlider widget to create an interface to browse all samples in our Dataset:

from typing import Any, Dict

import pandas as pd
import panel as pn

from bridge.display.basic import DisplayEngine


class TextClassification(DisplayEngine):
    def show_element(self, element, element_plot_kwargs: Dict[str, Any] | None = None):
        if element.etype == "class_label":
            return pn.pane.Markdown(element.to_pd_series().to_frame().T.to_markdown())
        elif element.etype == "text":
            return pn.pane.Markdown(element.data)
        else:
            raise NotImplementedError()

    def show_sample(
        self,
        sample,
        element_plot_kwargs: Dict[str, Any] | None = None,
        sample_plot_kwargs: Dict[str, Any] | None = None,
    ):
        annotations_md = pd.DataFrame([ann.to_pd_series() for ann in sample.annotations["class_label"]]).to_markdown()
        text_display = pn.pane.Markdown(sample.data)
        return pn.Column("# Sample Text:", text_display, "# Annotations Data:", annotations_md)

    def show_dataset(
        self,
        dataset,
        element_plot_kwargs: Dict[str, Any] | None = None,
        sample_plot_kwargs: Dict[str, Any] | None = None,
        dataset_plot_kwargs: Dict[str, Any] | None = None,
    ):
        sample_ids = dataset.sample_ids
        sample_ids_wig = pn.widgets.DiscreteSlider(name="Sample ID", options=sample_ids, value=sample_ids[0])

        @pn.depends(sample_ids_wig.param.value)
        def plot_sample_by_widget(sample_id):
            return self.show_sample(dataset.get(sample_id), element_plot_kwargs, sample_plot_kwargs)

        return pn.Column(sample_ids_wig, plot_sample_by_widget)

engine = TextClassification()
engine.show_dataset(ds)

Done! We now have a working interface for browsing our dataset.
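Conceptually, show_dataset wires a widget value to a render function: pn.depends(sample_ids_wig.param.value) makes Panel re-run plot_sample_by_widget with the newly selected sample ID. The plain-Python sketch below imitates that wiring with a hypothetical ValueWidget class and a list of callbacks. It is not how Panel or param implement reactivity internally; it only illustrates the select-then-re-render idea.

```python
from typing import Callable, Dict, List, Tuple


class ValueWidget:
    """Hypothetical stand-in for a selection widget; not Panel's API."""

    def __init__(self, options: List[str], value: str) -> None:
        self.options = options
        self._value = value
        self._watchers: List[Callable[[str], None]] = []

    def watch(self, fn: Callable[[str], None]) -> None:
        # Register a callback to run whenever the value changes.
        self._watchers.append(fn)

    @property
    def value(self) -> str:
        return self._value

    @value.setter
    def value(self, new: str) -> None:
        if new not in self.options:
            raise ValueError(f"{new!r} is not one of the options")
        self._value = new
        for fn in self._watchers:
            fn(new)


def browse(samples: Dict[str, str]) -> Tuple[ValueWidget, Dict[str, str]]:
    """Wire a widget to a view that re-renders the selected sample."""
    ids = list(samples)
    widget = ValueWidget(options=ids, value=ids[0])
    view = {"content": samples[ids[0]]}  # initial render, like the slider's first value

    def rerender(sample_id: str) -> None:
        # Plays the role of self.show_sample(dataset.get(sample_id), ...)
        view["content"] = samples[sample_id]

    widget.watch(rerender)
    return widget, view


widget, view = browse({"a": "first review", "b": "second review"})
widget.value = "b"
print(view["content"])  # second review
```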

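We can also pass our TextClassification engine directly when the dataset is built. The following code is enough to reproduce everything we’ve written so far; it additionally uses select_samples to drop the unlabeled (“unsup”) reviews before calling ds.show():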

ds = LargeMovieReviewDataset(TMP_NOTEBOOK_ROOT / "imdb", split="train", download=False).build_dataset(
    display_engine=TextClassification()
)

ds = ds.select_samples(lambda samples, anns: anns[anns.data != "unsup"].index.get_level_values("sample_id"))
ds.show()
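The selector passed to select_samples receives the samples and annotations tables and returns the IDs of the samples to keep. The pandas sketch below runs the same filtering expression on a tiny hand-built annotations frame. The index layout (a MultiIndex with sample_id and element_id levels) and the toy IDs s2 and s3 are assumptions for illustration, inferred from the get_level_values("sample_id") call rather than taken from Bridge’s documentation.

```python
import pandas as pd

# Hypothetical annotations table: a MultiIndex with a "sample_id" level,
# and the class label string in a "data" column.
anns = pd.DataFrame(
    {"data": ["neg", "unsup", "pos"]},
    index=pd.MultiIndex.from_tuples(
        [("3944_3", "label_3944_3"), ("s2", "label_s2"), ("s3", "label_s3")],
        names=["sample_id", "element_id"],
    ),
)

# Same expression as the lambda above: keep labeled rows, return their sample IDs.
keep = anns[anns.data != "unsup"].index.get_level_values("sample_id")
print(list(keep))  # ['3944_3', 's3']
```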

In Summary

DisplayEngines are tools used to visualize data through the methods element.show(), sample.show(), and ds.show().
We built a DisplayEngine using HoloViz Panel, but this is not a requirement: users can implement their own DisplayEngines with whichever libraries they like.

Up Next

So far, we’ve learned how to create Bridge Datasets and how to use them. In the following tutorials, we will learn how to transform these Datasets into ones that can be used to train models (e.g. PyTorch Datasets).