Datasets Summary Table
All datasets currently available in Delve are summarized in the table
below. The interpretation of the columns is explained
after the table.
Name          Usage  Origin  Task types  Attrs  Cases  Methods  View Results
census-house  A      C       R             139  22784        2  click
abalone       A      N       R               9   4177        2  click
adult         A      N       C              16  48842        0  none
splice        A      N       C              61   3175        8  click
titanic       A      N       C               4   2201        3  click
bank-32fh     A      S       R              33   8192        9  none
bank-32fm     A      S       R              33   8192        9  none
bank-32nh     A      S       R              33   8192        9  none
bank-32nm     A      S       R              33   8192        9  none
bank-8fh      A      S       R               9   8192        9  none
bank-8fm      A      S       R               9   8192        9  none
bank-8nh      A      S       R               9   8192        9  none
bank-8nm      A      S       R               9   8192        9  none
pumadyn-32fh  A      S       R              33   8192       25  click
pumadyn-32fm  A      S       R              33   8192       25  click
pumadyn-32nh  A      S       R              33   8192       25  click
pumadyn-32nm  A      S       R              33   8192       25  click
pumadyn-8fh   A      S       R               9   8192       25  click
pumadyn-8fm   A      S       R               9   8192       25  click
pumadyn-8nh   A      S       R               9   8192       25  click
pumadyn-8nm   A      S       R               9   8192       25  click
demo          D      A       C R             5   2048       11  click
mushrooms     D      A       C              23   8124        1  click
comp-activ    D      C       R              27   8192        2  click
image-seg     D      C       C              19   2310        8  click
boston        D      N       R              14    506       10  click
kin-32fh      D      S       R              33   8192       22  click
kin-32fm      D      S       R              33   8192       22  click
kin-32nh      D      S       R              33   8192       22  click
kin-32nm      D      S       R              33   8192       22  click
kin-8fh       D      S       R               9   8192       22  click
kin-8fm       D      S       R               9   8192       22  click
kin-8nh       D      S       R               9   8192       22  click
kin-8nm       D      S       R               9   8192       22  click
letter        D      S       C              17  20000        8  click
add10         H      A       R              11   9792        2  click
hwang         H      A       R              12  13600        0  none
ringnorm      H      A       C              21   7400        3  click
twonorm       H      A       C              21   7400        3  click
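As an illustration of how a row of the table can be read, the following minimal Python sketch parses one row into a record and expands the single-letter codes described in the column notes on this page. The `DatasetRow` type and `parse_row` function are hypothetical names introduced here and are not part of Delve. The sketch also assumes that "click" in the last column means a results plot is available and "none" means it is not.

```python
from dataclasses import dataclass
from typing import List

# Code letters taken from the column descriptions on this page.
USAGE = {"A": "Assessment", "D": "Development", "H": "Historical"}
ORIGIN = {"N": "Natural", "C": "Cultivated", "S": "Simulated", "A": "Artificial"}
TASK = {"R": "Regression", "C": "Classification", "D": "Density estimation"}


@dataclass
class DatasetRow:
    """Hypothetical record for one table row (not a Delve data structure)."""
    name: str
    usage: str
    origin: str
    task_types: List[str]
    attrs: int
    cases: int
    methods: int
    has_results: bool


def parse_row(line: str) -> DatasetRow:
    """Parse one whitespace-separated row of the summary table."""
    fields = line.split()
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 fields, got {len(fields)}")
    name, usage, origin = fields[0], fields[1], fields[2]
    # The last four fields are fixed (Attrs, Cases, Methods, View Results);
    # everything in between is one or more task-type codes.
    attrs, cases, methods, view = fields[-4:]
    task_codes = fields[3:-4]
    return DatasetRow(
        name=name,
        usage=USAGE[usage],
        origin=ORIGIN[origin],
        task_types=[TASK[code] for code in task_codes],
        attrs=int(attrs),
        cases=int(cases),
        methods=int(methods),
        # Assumption: "click" marks an available plot, "none" marks no plot.
        has_results=(view == "click"),
    )


if __name__ == "__main__":
    print(parse_row("demo D A C R 5 2048 11 click"))
    print(parse_row("adult A N C 16 48842 0 none"))
```

Splitting from both ends keeps the parser tolerant of rows such as demo, which lists two task types.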
The meaning of the columns is as follows:
- Clicking on the
dataset name in the left column displays the
documentation for the
dataset.
- The suggested Usage of the dataset is coded as one of
Assessment (A), Development (D) or Historical (H). See the Delve
manual, chapter 3 for an elaboration of these terms.
- The Origin of the dataset can be one of
Natural (N), Cultivated (C),
Simulated (S) or Artificial (A). See the Delve
manual, chapter 3 for an elaboration of these terms.
- Task types indicates the types of tasks associated with
the dataset. We distinguish Regression (R),
Classification (C) and Density estimation (D) task
types depending on the prior information provided about the task's
targets. A dataset can have more than one task type; demo, for example, is
listed with both C and R.
- Attrs indicates the total number of attributes in the
dataset.
- Cases is the total number of cases in the dataset.
- Methods shows the number of learning methods in the Delve
repository which have been run on one or more prototasks in the dataset. Clicking
on the number lists the methods.
- Clicking in the View Results column gives a summary plot
of the performance of the different methods on the dataset.
This type of plot condenses a lot of information into a single
figure. Briefly, it shows the expected performance of each method as the
height of the solid bar on each of the training set sizes of a
prototask. Squared-error loss is used for regression prototasks and 0-1 loss
for classification prototasks. The standard error of the mean is indicated by
the thin line on top of the bar. Below the plot are boxes that give the
P value of a paired test of whether two learning methods differ in
performance. (See the Delve manual, chapter 8 for details.) The methods
compared are listed down the left edge. The ordering of the bars from left to
right within a training set size is the same as the method list from top to
bottom. Therefore, an entry in the (i,j) cell of the box shows the P value
of the comparison between the i-th and j-th methods. Only
P values less than 0.05 are indicated. Scanning along a row quickly
indicates how a method compares to others in the plot; an entry in the row
indicates that the method is significantly worse than the method corresponding
to the column. Poorly performing methods will have many entries in their rows.
On the other hand, a column that has entries is indicative of a method with
superior performance.
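The reading rule for the plot and the P value boxes can be sketched in Python as follows. This is only an illustration under stated assumptions: the standard error here is the plain sample standard deviation divided by the square root of the number of values, which may differ from the estimate Delve actually uses (see the Delve manual, chapter 8); the pairwise P values are taken as given input rather than computed; and the function names and all numbers are made up for the example.

```python
import math
import statistics

ALPHA = 0.05  # only P values below this threshold are shown in the box


def mean_and_sem(losses):
    """Return the mean loss and a simple standard error of the mean.

    Assumption: SEM = sample standard deviation / sqrt(n). Delve's own
    estimate may be computed differently.
    """
    mean = statistics.fmean(losses)
    sem = statistics.stdev(losses) / math.sqrt(len(losses))
    return mean, sem


def significance_box(mean_loss, p_values):
    """Build the box of P values for one training set size.

    Cell (i, j) holds the P value only when it is below ALPHA and
    method i has the higher (worse) mean loss; other cells are None.
    """
    n = len(mean_loss)
    box = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and p_values[i][j] < ALPHA and mean_loss[i] > mean_loss[j]:
                box[i][j] = p_values[i][j]
    return box


def count_entries(box):
    """Count entries per row (evidence of being worse) and per column (better)."""
    n = len(box)
    rows = [sum(cell is not None for cell in row) for row in box]
    cols = [sum(box[i][j] is not None for i in range(n)) for j in range(n)]
    return rows, cols


if __name__ == "__main__":
    # Made-up mean losses and pairwise P values for three hypothetical methods.
    mean_loss = [0.10, 0.25, 0.12]
    p_values = [
        [1.00, 0.01, 0.40],
        [0.01, 1.00, 0.03],
        [0.40, 0.03, 1.00],
    ]
    rows, cols = count_entries(significance_box(mean_loss, p_values))
    print(rows, cols)  # rows -> [0, 2, 0], cols -> [1, 0, 1]
```

In this made-up example the second method has two entries in its row, so it is significantly worse than both of the others, while the first and third methods each have one entry in their column.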