Hierarchy:

To understand the hierarchy, look at nodedef.py. In this file Mike builds the hierarchical tree. He starts from the leaf nodes and adds children to parents. The file also shows which 'types' are merged into a single class. When nodedef.py is run, it generates a RootNode of the class ClassificationTreeNode, which gives access to all the child nodes. Have a look at hierarchical.py to see what is possible with this class.

If you don't care about the tree structure but only about the groupings, there are these 13 groups:

1: 'ECK'
2: 'BOCF', 'PHY'
3: 'SAR'
4: 'THAM'
5: 'ECOR'
6: 'SOND'
7: 'BUST', 'COD', 'CAL', 'UNA', 'DRIFT', 'RFOL', 'GO', 'TURF'
8: 'BRY1', 'BRY2', 'BRY3', 'BRY4', 'BRY5', 'BRY6', 'BRY7', 'BRY8'
9: 'C2S', 'C3BR', 'C4BR', 'C5OS', 'C6SB', 'C7SY', 'G1P', 'G2R', 'HYD1', 'SPEN', 'SW1', 'PARA1', 'ANEM1'
10: 'CENOL', 'HOL', 'SS', 'BS', 'URCH'
11: 'A1WF', 'A2GR', 'A3PT', 'A4OF', 'A5W', 'A6Y', 'A7PU', 'A8T', 'A9OT', 'A10F', 'A11OF', 'A12BT', 'A13O', 'A14B', 'A15WS', 'A16BT', 'C1W', 'C2WF', 'C3B', 'C4BT', 'C5R', 'C6PT', 'C7LPFT', 'C8Y', 'E1OR', 'E2OR', 'E3Y', 'E4BL', 'E5BR', 'E6WH', 'E7G', 'F1OR', 'F2BR', 'F3OF', 'F4PI', 'F5PE', 'F6Y', 'F7ORT', 'F8BT', 'F9ORT', 'F10', 'F11PT', 'F12BT', 'F13OF', 'F14WT', 'F15OT', 'G1OR', 'G2WH', 'G3BL', 'G4O', 'L1PS', 'L2O', 'L3W', 'L4P', 'L5Y', 'M1', 'M2', 'M3OR', 'M4DON', 'M5', 'M6VEL', 'M7BL', 'M8', 'M9WH', 'M10BR', 'M11WH', 'M12YP', 'M13WP', 'M14OR', 'M15PS', 'M16P', 'M17WL', 'M18OH', 'M19', 'M20P', 'P1SU', 'P2Y', 'P3B', 'P4LO', 'R1B', 'T1PI', 'T2', 'T3WC', 'T4T', 'T5', 'T6WT', 'T7PT', 'T8OR', 'T9PP', 'T10OT', 'T11B', 'T12PO', 'T13', 'T14S'
12: 'MOL', 'ABAL', 'SCAL', 'A1Cl', 'A2Cl', 'AS3O', 'A4Sy', 'A5Sol', 'A6R', 'A7Sol', 'A8O', 'FISH', 'TW1', 'BIOT'
13: 'MATR', 'UNK', 'BRUB', 'UNID', 'NZSS'

========================================================================================
Labels:

keypointdata.csv is the file that contains the information about the labels: their location, which image they belong to, and so on. Note that each image has 55 labels, and the first 5 are 'global' whole-image labels, such as the labeler. These 5 labels should be ignored.

The file co-occurrence.txt contains an adjacency matrix giving the number of co-occurrences for each pair of classes. The class labels for this matrix are given in cooccurrence_labels.txt.

========================================================================================
Datasets:

I have also included datasets.csv, which specifies which labels should be used for training, validation, and testing. You are of course welcome to split the data any way you like; this is just the reference split we use to compare various algorithms.

========================================================================================
Furthermore, it is important to note that the dataset is extremely unbalanced, with some of the classes having very few labeled instances.
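
If only the groupings from the Hierarchy section are needed, a flat lookup from label code to group number may be enough. The sketch below is an excerpt only: it copies the smaller groups (1 to 7, 10 and 13) from the list in this README and leaves out groups 8, 9, 11 and 12 for brevity. The names GROUPS, LABEL_TO_GROUP and group_of are made up for this illustration and do not come from nodedef.py or hierarchical.py.

```python
# Excerpt of the 13 groups listed in the Hierarchy section.
# Groups 8, 9, 11 and 12 are omitted here and would be added the same way.
GROUPS = {
    1: ['ECK'],
    2: ['BOCF', 'PHY'],
    3: ['SAR'],
    4: ['THAM'],
    5: ['ECOR'],
    6: ['SOND'],
    7: ['BUST', 'COD', 'CAL', 'UNA', 'DRIFT', 'RFOL', 'GO', 'TURF'],
    10: ['CENOL', 'HOL', 'SS', 'BS', 'URCH'],
    13: ['MATR', 'UNK', 'BRUB', 'UNID', 'NZSS'],
}

# Invert the table: label code -> group number.
LABEL_TO_GROUP = {
    label: group for group, labels in GROUPS.items() for label in labels
}


def group_of(label):
    """Return the group number for a label code, or None if it is not in the table."""
    return LABEL_TO_GROUP.get(label)
```

For example, group_of('PHY') gives 2, and a code that is not in the excerpt gives None.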
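
The two notes above (ignore the first 5 labels of every image, and expect a strong class imbalance) can be checked with a few lines of Python. This is a rough sketch under assumptions: the column layout of keypointdata.csv is not described here, so the functions take plain (image_id, label) pairs that you would extract yourself, and they assume the pairs arrive in file order so that the first 5 seen for an image are the global ones. The function names are hypothetical.

```python
from collections import Counter, defaultdict

N_GLOBAL = 5  # per the Labels section: the first 5 labels of each image are global


def drop_global_labels(rows, n_global=N_GLOBAL):
    """Skip the first n_global labels seen for each image.

    rows: iterable of (image_id, label) pairs, assumed to be in file order.
    Returns the remaining pairs as a list.
    """
    seen = defaultdict(int)  # how many labels have been seen per image so far
    kept = []
    for image_id, label in rows:
        seen[image_id] += 1
        if seen[image_id] > n_global:
            kept.append((image_id, label))
    return kept


def class_counts(rows):
    """Count how many labeled instances each class has."""
    return Counter(label for _, label in rows)
```

Calling class_counts(drop_global_labels(rows)).most_common() lists the classes from most to least frequent, which makes the rare classes easy to spot before choosing a loss or sampling scheme.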