This file describes in detail the error checking scheme implemented by M. G. Ehm, R. W. Cottingham, Jr., and M. Kimmel. Used in conjunction with FASTLINK, the error checking system identifies individuals and loci likely to contain errors using a likelihood-based method.
The scheme is described in the following papers:
M. G. Ehm, R. W. Cottingham Jr., and M. Kimmel. Error Detection in Genetic Linkage Data Using Likelihood Based Methods. Journal of Biological Systems, Vol. 3, No. 1 (1995) 13-25.
M. G. Ehm, R. W. Cottingham Jr., and M. Kimmel. Error Detection in Genetic Linkage Data Using Likelihood Based Methods. American Journal of Human Genetics, Vol. 58, No. 1 (1996).
This documentation refers to version 1.0 of the error detection scheme described in the papers above. The error detection algorithm, called GenoCheck, uses an altered version of the ILINK program, called ILINKERR, from FASTLINK 2.2.
The occurrence of laboratory typing error in pedigree data for linkage analysis cannot be ignored. When studying linked markers between which crossovers rarely occur, errors in the data will often result in false recombinations. Erroneous recombinations in a dense map are given substantial weight, thereby increasing the estimate of theta, the recombination fraction. In dense maps, theta approaches the error rate and most observed crossovers will be spurious. We present a method for detecting errors in pedigree data. Its index is a variant of the likelihood ratio test statistic and is used to test, for each individual at each locus, the null hypothesis of no error against the alternative hypothesis of error. High values of the index pinpoint individuals and loci with relatively unlikely genotypes. Power and significance studies using Monte Carlo methods show that the index detects errors for small values of theta with a small false positive rate.
When pedigree data are obtained by typing individuals, the observed genotype is equal to the true genotype unless a typing error has occurred. We represent error in pedigree data as incomplete penetrance of genotypes. The observed genotypes are considered phenotypes and may not correspond to the true genotypes due to errors. Therefore, modeling error in pedigree data is easily accomplished using the likelihood method of genetic linkage analysis by altering the penetrance function. Our method is designed to identify individuals and loci likely to contain errors. The method is equivalent to a hypothesis test for error for each individual and locus in the pedigree.
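To illustrate what "error as incomplete penetrance" can look like, the sketch below writes the altered penetrance as the probability of an observed genotype given the true genotype. It is only an illustration under stated assumptions: it uses a uniform mistyping model with a single hypothetical parameter `error_rate`, and the function name and genotype labels are invented for the example. The text above does not specify the error model or parameter values that GenoCheck itself uses.

```python
def error_penetrance(observed, true, genotypes, error_rate):
    """Return P(observed genotype | true genotype).

    ASSUMPTION (not from the GenoCheck documentation): a uniform
    mistyping model. With probability 1 - error_rate the true genotype
    is reported; otherwise one of the other genotypes is reported,
    each with equal probability.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    if len(genotypes) < 2:
        raise ValueError("need at least two possible genotypes")
    if observed not in genotypes or true not in genotypes:
        raise ValueError("unknown genotype label")
    if observed == true:
        return 1.0 - error_rate
    return error_rate / (len(genotypes) - 1)


# Hypothetical two-allele marker with three unordered genotypes.
GENOTYPES = ["1/1", "1/2", "2/2"]

if __name__ == "__main__":
    for true in GENOTYPES:
        row = [error_penetrance(obs, true, GENOTYPES, 0.01) for obs in GENOTYPES]
        print(true, row)
```

Setting `error_rate` to 0 recovers the usual case in which the observed genotype always equals the true genotype, which corresponds to the null hypothesis of no error.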
Each hypothesis test entails:
The GenoCheck program implements steps (1)-(3). Its output is a file containing the values of the test statistic separated by family and locus and ranked in decreasing order.
The following is a list of the files associated with GenoCheck.
In order to perform error checking on marker data, you must complete the following checklist.
Pedigree Options: General Pedigrees
General Pedigree Analysis Options: ILINK
ILINK - Order Options: Specific order
ILINK - Sex Difference Options: No sex difference
ILINK - Locus Order Specification: (Specify the most likely order with recombination fractions equal to 0.1, or the published values if available.)
In the file PosError, the test statistics are separated by locus within each pedigree. Within each pedigree and locus, each individual is listed with its associated test statistic in order of decreasing test statistic. As briefly described above, test statistics with relatively large values are indicative of an unlikely genotype for that individual at that locus. Test statistics greater than 0.0 are of particular interest. Note that test statistics are not comparable across different pedigrees or loci. In the presence of multiple errors, the program is likely to catch only some of them. It is therefore important to correct any errors found and rerun the program.
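The reading rules above can be sketched as a small post-processing step. The example below is a hypothetical helper and not part of GenoCheck. It assumes the statistics have already been loaded into memory as `(pedigree, locus, individual, statistic)` tuples, because the on-disk layout of PosError is not specified here; the pedigree, locus, and individual labels are invented. It keeps statistics above 0.0 and ranks them only within each pedigree and locus, since values are not comparable across pedigrees or loci.

```python
from collections import defaultdict


def retyping_priorities(records, threshold=0.0):
    """Group statistics by (pedigree, locus) and rank them within each group.

    records: iterable of (pedigree, locus, individual, statistic) tuples.
    This in-memory shape is an ASSUMPTION made for illustration; it is
    not the PosError file format.
    Returns a dict mapping (pedigree, locus) to a list of
    (individual, statistic) pairs in decreasing order of statistic,
    keeping only statistics strictly above the threshold.
    """
    groups = defaultdict(list)
    for pedigree, locus, individual, statistic in records:
        if statistic > threshold:
            groups[(pedigree, locus)].append((individual, statistic))
    return {
        key: sorted(members, key=lambda member: member[1], reverse=True)
        for key, members in groups.items()
    }


if __name__ == "__main__":
    # Invented example values, not real GenoCheck output.
    example = [
        ("ped1", "locusA", "101", -0.4),
        ("ped1", "locusA", "102", 2.3),
        ("ped1", "locusA", "103", 0.7),
        ("ped2", "locusA", "201", 1.1),
    ]
    for key, ranked in retyping_priorities(example).items():
        print(key, ranked)
```

Because ranking happens inside each group, the helper never compares a statistic from one pedigree or locus with a statistic from another.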
The ordered list of individuals within pedigree and locus given in PosError should be thought of as a priority list for retyping. Interpreting an error checking run includes the following steps:
NOTE: To use GenoCheck at NIH, please contact CIT/DCB/BIMAS. GenoCheck requires a customized executable for most datasets. BIMAS will examine your dataset(s) and create the needed GenoCheck executable(s) for you. GenoCheck will run on helix and other UNIX workstations.