Display Engine#
Preliminaries#
Imports#
[1]:
from pathlib import Path
TMP_NOTEBOOK_ROOT = Path("/tmp/bridge-ds/tutorials")
DisplayEngine#
In the previous tutorial, we wrote a DatasetProvider for the Large Movie Review Dataset, a text classification dataset. However, we didn’t use a custom DisplayEngine, so the visualization was limited to a plain printout of each sample’s fields:
[2]:
from bridge.display.basic import SimplePrints
from bridge.providers.text import LargeMovieReviewDataset
provider = LargeMovieReviewDataset(root=TMP_NOTEBOOK_ROOT / "imdb", split="train", download=True)
ds = provider.build_dataset(display_engine=SimplePrints())
ds.iget(0).show()
Archive file aclImdb_v1.tar.gz already exists, skipping download.
{
    "Sample ID 3944_3": {
        "Elements": {
            "etype=text": [
                {
                    "element_id": "text_3944_3",
                    "element_type": "text",
                    "sample_id": "3944_3",
                    "data": "/tmp/bridge-ds/tutorials/imdb/aclImdb/train/neg/3944_3.txt",
                    "category": "text",
                    "is_example": true
                }
            ],
            "etype=class_label": [
                {
                    "element_id": "label_3944_3",
                    "element_type": "class_label",
                    "sample_id": "3944_3",
                    "data": "neg",
                    "category": "obj",
                    "is_example": false
                }
            ]
        }
    }
}
Class Structure#
We can improve this “viz” by writing our own DisplayEngine. For starters, let’s see which methods we need to implement:
from typing import Any, Dict

from bridge.display.basic import DisplayEngine


class MyDisplayEngine(DisplayEngine):
    def show_element(
        self,
        element,
        element_plot_kwargs: Dict[str, Any] | None = None,
    ):
        pass

    def show_sample(
        self,
        sample,
        element_plot_kwargs: Dict[str, Any] | None = None,
        sample_plot_kwargs: Dict[str, Any] | None = None,
    ):
        pass

    def show_dataset(
        self,
        dataset,
        element_plot_kwargs: Dict[str, Any] | None = None,
        sample_plot_kwargs: Dict[str, Any] | None = None,
        dataset_plot_kwargs: Dict[str, Any] | None = None,
    ):
        pass
Seems straightforward enough. The DisplayEngine object implements one method for each level of the data: show_element for individual elements, show_sample for samples, and show_dataset for whole datasets.
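One way to picture how these methods get called: the engine passed to build_dataset is presumably what runs behind sample.show(). The sketch below is a toy model of that delegation using stand-in classes. ToySample and PrintEngine are hypothetical, and Bridge’s real internals may differ; the sketch only illustrates the division of labor between a sample and its engine.

```python
from dataclasses import dataclass


class PrintEngine:
    """Toy engine: returns a string instead of rendering anything."""

    def show_sample(self, sample, element_plot_kwargs=None, sample_plot_kwargs=None):
        return f"sample {sample.sample_id}: {sample.data!r}"


@dataclass
class ToySample:
    """Hypothetical stand-in for a Bridge sample; not the real class."""

    sample_id: str
    data: str
    display_engine: PrintEngine

    def show(self, **kwargs):
        # Assumed behaviour: forward the call to the engine attached to the sample.
        return self.display_engine.show_sample(self, **kwargs)


sample = ToySample("3944_3", "placeholder review text", PrintEngine())
result = sample.show()
print(result)
```

Under this model, swapping the engine changes how every sample is shown without touching the sample class itself.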
Let’s build our own DisplayEngine from the bottom up, starting with text and class label elements. We will use Panel here, but you can implement yours with any library you like:
[3]:
from typing import Any, Dict

import panel as pn

from bridge.display.basic import DisplayEngine

pn.extension()


class TextClassification(DisplayEngine):
    def show_element(self, element, element_plot_kwargs: Dict[str, Any] | None = None):
        if element.etype == "class_label":
            return pn.pane.Markdown(element.to_pd_series().to_frame().T.to_markdown())
        elif element.etype == "text":
            return pn.pane.Markdown(element.data)
        else:
            raise NotImplementedError()

    def show_sample(
        self,
        sample,
        element_plot_kwargs: Dict[str, Any] | None = None,
        sample_plot_kwargs: Dict[str, Any] | None = None,
    ):
        pass

    def show_dataset(
        self,
        dataset,
        element_plot_kwargs: Dict[str, Any] | None = None,
        sample_plot_kwargs: Dict[str, Any] | None = None,
        dataset_plot_kwargs: Dict[str, Any] | None = None,
    ):
        pass
To test it:
[4]:
engine = TextClassification()
sample = ds.iget(0)
text_element = sample.element # SingularSample exposes the text element specifically
label_element = sample.annotations["class_label"][0] # the class labels in this case are annotations
pn.Column(engine.show_element(label_element), engine.show_element(text_element))
[4]:
Looks good. Next, let’s implement show_sample so that we can display an entire sample rather than individual elements:
[5]:
from typing import Any, Dict

import pandas as pd
import panel as pn

pn.extension()


class TextClassification(DisplayEngine):
    def show_element(self, element, element_plot_kwargs: Dict[str, Any] | None = None):
        if element.etype == "class_label":
            return pn.pane.Markdown(element.to_pd_series().to_frame().T.to_markdown())
        elif element.etype == "text":
            return pn.pane.Markdown(element.data)
        else:
            raise NotImplementedError()

    def show_sample(
        self,
        sample,
        element_plot_kwargs: Dict[str, Any] | None = None,
        sample_plot_kwargs: Dict[str, Any] | None = None,
    ):
        annotations_md = pd.DataFrame([ann.to_pd_series() for ann in sample.annotations["class_label"]]).to_markdown()
        text_display = pn.pane.Markdown(sample.data)
        return pn.Column("# Sample Text:", text_display, "# Annotations Data:", annotations_md)

    def show_dataset(
        self,
        dataset,
        element_plot_kwargs: Dict[str, Any] | None = None,
        sample_plot_kwargs: Dict[str, Any] | None = None,
        dataset_plot_kwargs: Dict[str, Any] | None = None,
    ):
        pass
[6]:
engine = TextClassification()
engine.show_sample(ds.iget(0))
[6]:
Good. Finally, let’s use the Panel DiscreteSlider widget to create an interface to browse all samples in our Dataset:
[7]:
from typing import Any, Dict


class TextClassification(DisplayEngine):
    def show_element(self, element, element_plot_kwargs: Dict[str, Any] | None = None):
        if element.etype == "class_label":
            return pn.pane.Markdown(element.to_pd_series().to_frame().T.to_markdown())
        elif element.etype == "text":
            return pn.pane.Markdown(element.data)
        else:
            raise NotImplementedError()

    def show_sample(
        self,
        sample,
        element_plot_kwargs: Dict[str, Any] | None = None,
        sample_plot_kwargs: Dict[str, Any] | None = None,
    ):
        annotations_md = pd.DataFrame([ann.to_pd_series() for ann in sample.annotations["class_label"]]).to_markdown()
        text_display = pn.pane.Markdown(sample.data)
        return pn.Column("# Sample Text:", text_display, "# Annotations Data:", annotations_md)

    def show_dataset(
        self,
        dataset,
        element_plot_kwargs: Dict[str, Any] | None = None,
        sample_plot_kwargs: Dict[str, Any] | None = None,
        dataset_plot_kwargs: Dict[str, Any] | None = None,
    ):
        sample_ids = dataset.sample_ids
        sample_ids_wig = pn.widgets.DiscreteSlider(name="Sample ID", options=sample_ids, value=sample_ids[0])

        @pn.depends(sample_ids_wig.param.value)
        def plot_sample_by_widget(sample_id):
            return self.show_sample(dataset.get(sample_id), element_plot_kwargs, sample_plot_kwargs)

        return pn.Column(sample_ids_wig, plot_sample_by_widget)
[8]:
engine = TextClassification()
engine.show_dataset(ds)
[8]:
Done! Now we have an operable interface to browse our dataset.
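Note that the engine above forwards element_plot_kwargs and sample_plot_kwargs but never reads them. One possible use, sketched below with a stand-in object rather than real Bridge classes, is to let callers cap how much of a long review gets rendered. The max_chars key, the render_text helper, and the ToyElement class are all hypothetical and not part of Bridge.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ToyElement:
    """Hypothetical stand-in exposing the two attributes the tutorial engine reads."""

    etype: str
    data: str


def render_text(element: ToyElement, element_plot_kwargs: Optional[Dict[str, Any]] = None) -> str:
    # Treat a missing kwargs dict as "no options given".
    kwargs = element_plot_kwargs or {}
    max_chars = kwargs.get("max_chars")  # hypothetical option name
    text = element.data
    if max_chars is not None and len(text) > max_chars:
        # Cut the text and mark the truncation.
        text = text[:max_chars].rstrip() + "..."
    return text


review = ToyElement(etype="text", data="placeholder review text " * 10)
short = render_text(review, {"max_chars": 20})
full = render_text(review)
print(short)
```

In the Panel engine, the string returned by such a helper could be handed to pn.pane.Markdown inside the "text" branch of show_element.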
We can also pass our TextClassification engine when the dataset is built, so that ds.show() uses it directly. The following code is enough to reproduce everything we’ve done so far; it additionally filters out samples whose label is "unsup":
[9]:
ds = LargeMovieReviewDataset(TMP_NOTEBOOK_ROOT / "imdb", split="train", download=False).build_dataset(
    display_engine=TextClassification()
)
# Keep only samples whose class label is not "unsup".
ds = ds.select_samples(lambda samples, anns: anns[anns.data != "unsup"].index.get_level_values("sample_id"))
# show() now delegates to TextClassification.show_dataset.
ds.show()
[9]:
In Summary#
DisplayEngines are tools used to visualize data through the methods element.show(), sample.show(), and ds.show().
We built a DisplayEngine using HoloViz Panel, but this is not a requirement: users can implement their own DisplayEngines using whichever libraries they’d like.
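To illustrate that Panel is optional, here is a minimal sketch of an engine with the same three method names that returns plain strings using only the standard library. It is written against stand-in objects and, so that it runs on its own, does not subclass Bridge’s DisplayEngine; a real engine would. ToySample, ToyDataset, and their attributes are assumptions made for illustration.

```python
import textwrap
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToySample:
    """Hypothetical stand-in for a sample with one text and one label."""

    sample_id: str
    data: str
    label: str


class ToyDataset:
    """Hypothetical stand-in offering the two members show_dataset relies on."""

    def __init__(self, samples: List[ToySample]):
        self._samples: Dict[str, ToySample] = {s.sample_id: s for s in samples}

    @property
    def sample_ids(self) -> List[str]:
        return list(self._samples)

    def get(self, sample_id: str) -> ToySample:
        return self._samples[sample_id]


class PlainTextEngine:
    """Same method names as the tutorial engine; the kwargs are accepted but unused."""

    def show_element(self, text, element_plot_kwargs=None):
        # Wrap long text to 40 columns.
        return textwrap.fill(text, width=40)

    def show_sample(self, sample, element_plot_kwargs=None, sample_plot_kwargs=None):
        body = self.show_element(sample.data, element_plot_kwargs)
        return f"[{sample.sample_id}] label={sample.label}\n{body}"

    def show_dataset(self, dataset, element_plot_kwargs=None, sample_plot_kwargs=None, dataset_plot_kwargs=None):
        blocks = [self.show_sample(dataset.get(sid)) for sid in dataset.sample_ids]
        return "\n\n".join(blocks)


toy_ds = ToyDataset(
    [
        ToySample("a_1", "first placeholder review", "pos"),
        ToySample("b_2", "second placeholder review", "neg"),
    ]
)
report = PlainTextEngine().show_dataset(toy_ds)
print(report)
```

Each level builds on the one below it, mirroring how show_dataset calls show_sample in the Panel version.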
Up Next#
So far, we’ve learned how to create Bridge Datasets and how to use them. In the following tutorials we will learn how to transform these Datasets into ones which are usable to train models (e.g. into PyTorch Datasets).