Photographic benthic surveys produce vast amounts of imagery that must be interpreted to meet scientific objectives. The diagram below summarises the main data interpretation techniques.
Currently, much effort is spent on manual annotation, as illustrated in Figure (a). With the increasing adoption of automated benthic survey techniques, this arduous, labour-intensive manual work is infeasible for the volumes of data involved. Because data are accumulating rapidly and too few people are available to interpret and annotate them, typically less than 1 to 2% of the collected data ends up being processed and annotated. In addition, inconsistent and subjective labelling across human annotators leads to erroneous, incomparable results.
Unsupervised clustering techniques, illustrated in Figure (b), can process large amounts of data very quickly and require little or no human intervention. While these methods are useful for summarising and exploring patterns in the data, without a human in the loop there is no guarantee that the resulting clusters represent information that is relevant to end users.
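To make the clustering idea concrete, here is a minimal k-means sketch in plain Python. It assumes each image has already been reduced to a numeric feature vector (for example, a few colour or texture statistics); the feature extraction step, the function names `squared_distance` and `kmeans`, and the toy data are all hypothetical, and k-means is only one of many clustering algorithms that could fill this role.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Group feature vectors into k clusters with Lloyd's algorithm.

    Centroids start at the first k points so the result is deterministic;
    returns one integer cluster id per input point.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: every point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical per-image feature vectors (two numbers per image).
features = [(0.0, 0.0), (10.0, 10.0), (0.2, 0.1), (9.8, 10.1), (0.1, 0.3), (10.2, 9.9)]
print(kmeans(features, k=2))  # → [0, 1, 0, 1, 0, 1]
```

No labels were needed, but the output cluster ids are just integers: a person still has to decide whether cluster 0 corresponds to anything scientifically meaningful, which is exactly the limitation described above.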
Figure (c) shows a supervised classification setup. Supervised classification techniques train a classification algorithm on human-labelled examples and then use it to automatically classify the remaining data. However, these traditional supervised techniques still generally require substantial human input in the form of labelled examples, and they often allocate human effort inefficiently during annotation, because the examples to be labelled are typically chosen at random or for convenience rather than for how much they would improve the classifier.
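The train-then-classify workflow can be sketched with a deliberately simple nearest-centroid classifier. This is an illustrative stand-in, not a claim about which classifier any particular benthic pipeline uses; the function names `train_nearest_centroid` and `classify`, the class names "sand" and "coral", and the feature values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_nearest_centroid(examples: List[Tuple[Vector, str]]) -> Dict[str, List[float]]:
    """Learn one centroid (mean feature vector) per human-assigned label."""
    grouped: Dict[str, List[Vector]] = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(model: Dict[str, List[float]], features: Vector) -> str:
    """Predict the label whose centroid lies closest to the feature vector."""
    return min(model, key=lambda label: squared_distance(features, model[label]))


# Hypothetical toy data: a few human-labelled images train the model ...
labelled = [((0.1, 0.2), "sand"), ((0.0, 0.1), "sand"),
            ((0.9, 0.8), "coral"), ((1.0, 0.9), "coral")]
model = train_nearest_centroid(labelled)
# ... which then labels the remaining, unannotated images automatically.
unlabelled = [(0.05, 0.15), (0.95, 0.85)]
print([classify(model, x) for x in unlabelled])  # → ['sand', 'coral']
```

Note that nothing in this workflow decides *which* images the human should label; the `labelled` list is simply whatever was annotated, which is where the inefficiency arises.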
Active learning is a supervised machine learning framework in which the learning algorithm interactively queries a human annotator for labels, choosing the examples it expects to be most informative, with the aim of minimising human effort while maximising classification performance. Active learning is illustrated in Figure (d).
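A minimal pool-based active learning loop is sketched below. It assumes uncertainty sampling as the query strategy (one common choice among several), with uncertainty measured as the gap between the distances to the two nearest class centroids of a nearest-centroid model. The `oracle` callable stands in for the human annotator; all function names, the labels, and the one-dimensional toy features are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Model = Dict[str, List[float]]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit(labelled: List[Tuple[Vector, str]]) -> Model:
    """Nearest-centroid model: one mean feature vector per label."""
    grouped: Dict[str, List[Vector]] = defaultdict(list)
    for features, label in labelled:
        grouped[label].append(features)
    return {label: [sum(d) / len(vs) for d in zip(*vs)] for label, vs in grouped.items()}


def predict(model: Model, features: Vector) -> str:
    """Return the label of the nearest centroid."""
    return min(model, key=lambda label: squared_distance(features, model[label]))


def uncertainty(model: Model, features: Vector) -> float:
    """Gap between the two nearest centroid distances; a smaller gap means less certain."""
    distances = sorted(squared_distance(features, c) for c in model.values())
    return distances[1] - distances[0] if len(distances) > 1 else float("inf")


def active_learning(pool: List[Vector], seed: List[Tuple[Vector, str]],
                    oracle: Callable[[Vector], str], budget: int):
    """Spend at most `budget` human labels on the samples the model is least sure about."""
    labelled = list(seed)
    unlabelled = list(pool)
    for _ in range(min(budget, len(unlabelled))):
        model = fit(labelled)
        # Query the sample the current model is least certain about ...
        query = min(unlabelled, key=lambda x: uncertainty(model, x))
        unlabelled.remove(query)
        # ... and ask the human annotator (the oracle) to label it.
        labelled.append((query, oracle(query)))
    return fit(labelled), labelled


# Hypothetical annotator whose true class boundary lies at 3.0.
def annotator(x: Vector) -> str:
    return "sand" if x[0] < 3.0 else "coral"


seed = [((0.0,), "sand"), ((10.0,), "coral")]
pool = [(1.0,), (2.0,), (4.0,), (7.0,), (8.0,), (9.0,)]
model, labelled = active_learning(pool, seed, annotator, budget=2)
print([x for x, _ in labelled[len(seed):]])  # → [(4.0,), (2.0,)]
```

With a budget of two labels, the loop asks about the samples nearest the current decision boundary rather than the easy ones at the extremes (1.0, 8.0, 9.0), which is how active learning concentrates human effort where it most improves the classifier.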