Datasets for molecular descriptor comparisons
Introduction
To promote the comparison of new and existing molecular descriptors, to evaluate their predictive ability, and to better understand their meaning, the International Academy of Mathematical Chemistry suggests the use of some benchmark data sets.

You can freely download the data sets and
a) calculate your molecular descriptors on the provided molecules (available formats: SMILES, HyperChem, MDL SDF)
b) optionally compare your descriptors with some well-known descriptors, which are also provided in the data files
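
As an illustration of step a), the sketch below computes one classical topological descriptor, the Wiener index (the sum of shortest-path distances between all pairs of non-hydrogen atoms), for two octane isomers. It is only a minimal example under stated assumptions: the bond lists are written by hand rather than parsed from the provided SMILES or SDF files, and the function name `wiener_index` is hypothetical and not part of any provided file or software.

```python
from collections import deque


def wiener_index(n_atoms, bonds):
    """Sum of shortest-path distances over all atom pairs
    of a hydrogen-suppressed molecular graph."""
    adjacency = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)

    total = 0
    for source in range(n_atoms):
        # Breadth-first search gives topological distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if neighbor not in dist:
                    dist[neighbor] = dist[atom] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
    return total // 2  # every pair was counted twice


# Hand-written carbon skeletons (assumed atom numbering, hydrogens omitted).
# n-octane: a chain of eight carbons.
n_octane = [(i, i + 1) for i in range(7)]
# 2,2,3,3-tetramethylbutane: two bonded central carbons (0 and 1),
# each carrying three methyl groups.
tetramethylbutane = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5), (1, 6), (1, 7)]

w_linear = wiener_index(8, n_octane)
w_branched = wiener_index(8, tetramethylbutane)
```

The linear isomer gives the larger value, which is the usual behaviour of this index: branching shortens the paths between atoms.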

Notes.
Unknown property values are identified by -999.
The descriptors were calculated with the DRAGON software (version 5.4).
Data references are given in the readme text file.
The full list of labels and meanings for the provided descriptors is collected in the file Descriptor_Label_List.
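
Because unknown values are coded as -999, they should be masked before any statistics are computed. The sketch below reads a tab-delimited table and turns every -999 into `None`. It assumes that the first row of the file holds the column labels; the function name `read_table` and the two-row sample are hypothetical and are not taken from the data files.

```python
import csv
import io

MISSING = -999.0  # sentinel for unknown property values, as stated in the notes


def read_table(handle):
    """Read a tab-delimited table from an open text handle.
    Numeric cells become floats, -999 becomes None, other cells stay text."""
    rows = []
    for record in csv.DictReader(handle, delimiter="\t"):
        clean = {}
        for key, value in record.items():
            try:
                number = float(value)
            except (TypeError, ValueError):
                clean[key] = value  # non-numeric cell, e.g. a molecule name
                continue
            clean[key] = None if number == MISSING else number
        rows.append(clean)
    return rows


# Made-up sample with an assumed header row; not real data.
sample = "Name\tBP\tMP\nmol_A\t100.5\t-999\nmol_B\t-999\t-20.0\n"
table = read_table(io.StringIO(sample))
```

With a downloaded file, the same function could be called as `read_table(open("C8_Properties.txt", newline=""))`, provided the file layout matches the assumption above.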

You are invited to contribute to this initiative by sending us new data sets (molecular structures, properties and, optionally, your new descriptors), which would be an important support to the development of the field of molecular descriptors.
You can also send us your molecular descriptors calculated on the provided data sets. We will add your results to the provided files (with your reference).
 
18 octane isomers (C8)
The data set consists of 18 octane isomers (C8).
The following properties are given, both in Excel format (C8_Properties.xls) and in a tab-delimited text format (C8_Properties.txt):

1) boiling point (BP)
2) melting point (MP)
3) heat capacity at T constant (CT)
4) heat capacity at P constant (CP)
5) Entropy (S)
6) density (DENS)
7) enthalpy of vaporization (HVAP)
8) standard enthalpy of vaporization (DHVAP)
9) enthalpy of formation (HFORM)
10) standard enthalpy of formation (DHFORM)
11) motor octane number (MON)
12) molar refraction (MR)
13) acentric factor (AcenFac)
14) total surface area (TSA)
15) octanol-water partition coefficient (LogP)
16) molar volume (MV)

A set of 102 molecular descriptors (topological descriptors) is also given, both in Excel format (C8_Descriptors.xls) and in a tab-delimited text format (C8_Descriptors.txt).
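
One simple way to compare a descriptor with a property is the Pearson correlation coefficient computed over the molecules for which both values are known. The sketch below is a minimal stdlib version under stated assumptions: missing values have already been converted to `None`, the two lists are in the same molecule order, and the function name `pearson_r` and the numbers are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equally ordered columns,
    skipping molecules where either value is missing (None)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant column has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Made-up values, not taken from the data files.
descriptor = [1.0, 2.0, 3.0, 4.0]
prop = [2.0, None, 6.0, 8.0]  # second molecule has an unknown property
r = pearson_r(descriptor, prop)
```

A correlation on 18 molecules is only a first indication; a proper evaluation of predictive ability would also need some form of validation.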

download dataset
 
82 polyaromatic hydrocarbons (PAH)
The data set consists of 82 polyaromatic hydrocarbons (PAH).
The following properties are given, both in Excel format (PAH_Properties.xls) and in a tab-delimited text format (PAH_Properties.txt):

1) melting point (MP)
2) boiling point (BP)
3) octanol-water partition coefficient (LogP)

A set of 112 molecular descriptors (topological descriptors) is also given, both in Excel format (PAH_Descriptors.xls) and in a tab-delimited text format (PAH_Descriptors.txt).

download dataset
 
209 polychlorobiphenyls (PCB)
The data set consists of 209 polychlorobiphenyls (PCB).
The following properties are given, both in Excel format (PCB_Properties.xls) and in a tab-delimited text format (PCB_Properties.txt):

1) melting point (MP)
2) relative retention time (RTT)
3) octanol-water partition coefficient (logP)
4) total surface area (TSA)
5) log Henry constant (logH)
6) log water solubility (logSw)
7) log water activity coefficient (logYw)
8) relative enthalpy of formation (dHf)

A set of 106 molecular descriptors (topological descriptors) is also given, both in Excel format (PCB_Descriptors.xls) and in a tab-delimited text format (PCB_Descriptors.txt).

download dataset
 
22 Phenethylamines
The data set consists of 22 phenethylamines with two substituent sites (Phenet).
The following property is given:

1) biological activity: log(1/C)

A set of 110 molecular descriptors (topological descriptors) is also given, both in Excel format (Phenet_Descriptors.xls) and in a tab-delimited text format (Phenet_Descriptors.txt).

download dataset