Using the TOPDB database, a benchmark set was created from entries that have a known 3D structure and whose corresponding PDB entry covers all transmembrane (TM) helices of the TOPDB entry. After this selection, the sequences were filtered to 40% identity using CD-HIT, which resulted in 320 sequences. Next, the sequences that had previously been used to train any of the 10 selected methods, or CCTOP itself, were collected and used as a filter: entries with 40% or higher identity to any training sequence were removed from the 320-protein set. This procedure resulted in 170 sequences, which were used to measure the accuracy of the different prediction methods.
The benchmark set can be downloaded from here.
The accuracy of the various methods and CCTOP on this benchmark set can be seen here.
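
The two filtering stages described above (redundancy reduction at 40% identity, then removal of entries similar to any training sequence) can be illustrated with a minimal Python sketch. This is not the pipeline actually used: the real redundancy filtering was done with CD-HIT, whereas the `naive_identity` function here is a hypothetical, alignment-free stand-in, and the function names, toy sequences, and the longest-first greedy order are assumptions made for illustration only.

```python
from typing import Callable, Dict, List

Identity = Callable[[str, str], float]


def naive_identity(a: str, b: str) -> float:
    """Hypothetical stand-in for a real identity measure (NOT CD-HIT):
    fraction of identical positions over the shorter sequence, no alignment."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def greedy_cluster(seqs: Dict[str, str], threshold: float,
                   identity: Identity = naive_identity) -> List[str]:
    """Stage 1: keep representatives, longest first, so that no kept
    sequence is >= threshold identical to an earlier kept one."""
    kept: List[str] = []
    for name in sorted(seqs, key=lambda k: (-len(seqs[k]), k)):
        if all(identity(seqs[name], seqs[r]) < threshold for r in kept):
            kept.append(name)
    return kept


def remove_similar_to_training(seqs: Dict[str, str], candidates: List[str],
                               training: List[str], threshold: float,
                               identity: Identity = naive_identity) -> List[str]:
    """Stage 2: drop candidates that are >= threshold identical
    to any sequence in the training set."""
    return [name for name in candidates
            if all(identity(seqs[name], t) < threshold for t in training)]


# Toy data (assumed, not taken from TOPDB)
toy = {
    "a": "AAAAAAAAAA",
    "b": "AAAAAAAAAC",  # near-duplicate of "a"
    "c": "CCCCCCCCCC",
    "d": "GGGGGGGGGG",
}
toy_training = ["GGGGGGGGTT"]  # similar to "d"

non_redundant = greedy_cluster(toy, threshold=0.40)
benchmark = remove_similar_to_training(toy, non_redundant, toy_training, 0.40)
print(non_redundant, benchmark)
```

In this toy run, stage 1 discards the near-duplicate and stage 2 discards the entry that resembles a training sequence, mirroring the reduction from 320 to 170 sequences described above. In practice the identity measure should come from a proper alignment or clustering tool rather than position-wise comparison.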