Datasets Summary Table

All datasets currently available in Delve are summarized in the table below. The columns are explained after the table.

Name          Usage  Origin  Task types  Attrs  Cases  Methods  View Results
census-house  A      C       R             139  22784        2  click
abalone       A      N       R               9   4177        2  click
adult         A      N       C              16  48842        0  none
splice        A      N       C              61   3175        8  click
titanic       A      N       C               4   2201        3  click
bank-32fh     A      S       R              33   8192        9  none
bank-32fm     A      S       R              33   8192        9  none
bank-32nh     A      S       R              33   8192        9  none
bank-32nm     A      S       R              33   8192        9  none
bank-8fh      A      S       R               9   8192        9  none
bank-8fm      A      S       R               9   8192        9  none
bank-8nh      A      S       R               9   8192        9  none
bank-8nm      A      S       R               9   8192        9  none
pumadyn-32fh  A      S       R              33   8192       25  click
pumadyn-32fm  A      S       R              33   8192       25  click
pumadyn-32nh  A      S       R              33   8192       25  click
pumadyn-32nm  A      S       R              33   8192       25  click
pumadyn-8fh   A      S       R               9   8192       25  click
pumadyn-8fm   A      S       R               9   8192       25  click
pumadyn-8nh   A      S       R               9   8192       25  click
pumadyn-8nm   A      S       R               9   8192       25  click
demo          D      A       C R             5   2048       11  click
mushrooms     D      A       C              23   8124        1  click
comp-activ    D      C       R              27   8192        2  click
image-seg     D      C       C              19   2310        8  click
boston        D      N       R              14    506       10  click
kin-32fh      D      S       R              33   8192       22  click
kin-32fm      D      S       R              33   8192       22  click
kin-32nh      D      S       R              33   8192       22  click
kin-32nm      D      S       R              33   8192       22  click
kin-8fh       D      S       R               9   8192       22  click
kin-8fm       D      S       R               9   8192       22  click
kin-8nh       D      S       R               9   8192       22  click
kin-8nm       D      S       R               9   8192       22  click
letter        D      S       C              17  20000        8  click
add10         H      A       R              11   9792        2  click
hwang         H      A       R              12  13600        0  none
ringnorm      H      A       C              21   7400        3  click
twonorm       H      A       C              21   7400        3  click

Summary table
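
For readers who want to work with the table programmatically, the following is a minimal sketch of how one row could be parsed. It is not part of Delve. The names DatasetRow and parse_row are hypothetical, and the single-letter codes are inferred from the first letters of the terms in the column descriptions; in particular, "D" for density estimation is an assumption, since no row in the table uses it. The sketch relies on the first three and last four fields having a fixed position, so that whatever lies between them is the list of task types.

```python
from dataclasses import dataclass
from typing import List

# Letter codes inferred from the column descriptions (first letter of each term).
USAGE = {"A": "Assessment", "D": "Development", "H": "Historical"}
ORIGIN = {"N": "Natural", "C": "Cultivated", "S": "Simulated", "A": "Artificial"}
# Assumption: "D" stands for density estimation; it does not occur in the table.
TASK_TYPE = {"R": "Regression", "C": "Classification", "D": "Density estimation"}


@dataclass
class DatasetRow:
    """One row of the summary table (hypothetical container, not a Delve type)."""
    name: str
    usage: str
    origin: str
    task_types: List[str]
    attrs: int
    cases: int
    methods: int
    has_results: bool


def parse_row(line: str) -> DatasetRow:
    """Parse one whitespace-separated row of the summary table."""
    tokens = line.split()
    if len(tokens) < 8:
        raise ValueError(f"expected at least 8 fields, got {len(tokens)}: {line!r}")
    name, usage, origin = tokens[:3]
    attrs, cases, methods, view = tokens[-4:]
    # Everything between the origin and the attribute count is a task-type code;
    # a dataset may list more than one (for example "C R").
    task_codes = tokens[3:-4]
    return DatasetRow(
        name=name,
        usage=USAGE[usage],
        origin=ORIGIN[origin],
        task_types=[TASK_TYPE[code] for code in task_codes],
        attrs=int(attrs),
        cases=int(cases),
        methods=int(methods),
        has_results=(view == "click"),
    )


row = parse_row("demo D A C R 5 2048 11 click")
print(row.task_types)  # ['Classification', 'Regression']
```

Slicing from both ends keeps the parser independent of how many task types a dataset lists, which is the only variable-width part of a row.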

The meanings of the columns are as follows:
  1. Clicking on the dataset name in the left column displays the documentation for the dataset.
  2. The suggested Usage of the dataset is coded as one of Assessment, Development or Historical. See the Delve manual, chapter 3, for an elaboration of these terms.
  3. The Origin of the dataset can be one of Natural, Cultivated, Simulated or Artificial. See the Delve manual, chapter 3, for an elaboration of these terms.
  4. Task type indicates the types of tasks associated with the dataset. We distinguish Regression, Classification and Density estimation task types depending on the prior information provided about the task's targets. It is possible for a dataset to have more than one kind of task type.
  5. Attrs indicates the total number of attributes in the dataset.
  6. Cases is the total number of cases in the dataset.
  7. Methods shows the number of learning methods in the Delve repository that have been run on one or more prototasks in the dataset. Clicking on the number lists the methods.
  8. Clicking in the View Results column gives a summary plot of the performance of the different methods on the dataset.
    This type of plot condenses a lot of information into a single figure. Briefly, for each training set size of a prototask, the height of the solid bar shows the expected performance of each method. Squared-error loss is used for regression prototasks and 0-1 loss for classification prototasks. The standard error of the mean is indicated by the thin line on top of the bar. Below the plot are boxes that give the P value of a paired test of whether the performances of two learning methods differ. (See the Delve manual, chapter 8 for details.) The methods compared are listed down the left edge. The ordering of the bars from left to right within a training set size is the same as the method list from top to bottom. Therefore, an entry in the (i,j) cell of the box shows the P value of the comparison between methods i and j. Only P values less than 0.05 are shown. Scanning along a row quickly indicates how a method compares to the others in the plot: an entry in the row indicates that the method is significantly worse than the method corresponding to the column. Poorly performing methods will have many entries in their rows. Conversely, a column that has entries indicates a method with superior performance.
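
To make the quantities in the plot concrete, here is a small sketch of the two loss functions, the mean loss with its standard error, and a paired comparison of two methods. It is only an illustration under simplifying assumptions: the function names are hypothetical, the losses are treated as independent per-case values, and the comparison is a plain paired t statistic. The procedure Delve actually uses is the one described in chapter 8 of the Delve manual and may differ from this.

```python
import math
import statistics


def squared_error_loss(target, prediction):
    """Loss used for regression prototasks."""
    return (target - prediction) ** 2


def zero_one_loss(target, prediction):
    """Loss used for classification prototasks: 0 if correct, 1 otherwise."""
    return 0 if target == prediction else 1


def mean_and_sem(losses):
    """Mean loss (bar height) and standard error of the mean (thin line)."""
    n = len(losses)
    if n < 2:
        raise ValueError("need at least two losses to estimate a standard error")
    mean = statistics.fmean(losses)
    sem = statistics.stdev(losses) / math.sqrt(n)
    return mean, sem


def paired_t_statistic(losses_a, losses_b):
    """Paired t statistic on per-case loss differences (a minus b).

    Assumption: a plain paired t-test stands in for Delve's own analysis.
    A positive value means method a has the larger average loss.
    """
    if len(losses_a) != len(losses_b):
        raise ValueError("paired comparison needs equally long loss lists")
    diffs = [a - b for a, b in zip(losses_a, losses_b)]
    mean_diff, sem_diff = mean_and_sem(diffs)
    return mean_diff / sem_diff


# Hypothetical per-case losses of two methods on the same test cases.
method_a = [2.0, 3.0, 5.0]
method_b = [1.0, 1.0, 2.0]
print(mean_and_sem(method_a))
print(paired_t_statistic(method_a, method_b))
```

Converting the statistic to a P value requires the t distribution with n - 1 degrees of freedom, which the Python standard library does not provide; a statistics package would be needed for that step. In the plot, only comparisons whose P value falls below 0.05 would then be shown.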


Last Updated 21 May 1998
Comments and questions to: delve@cs.toronto.edu
