the allele confers a protective effect. An association is considered significant
when the confidence interval (CI) does not include 1.00.
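The significance criterion above can be illustrated with a minimal sketch. It assumes the effect measure is an odds ratio from a 2x2 allele-by-status table and that the CI is computed with the standard log (Woolf) method; neither detail is stated in the text, and the function names and counts are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI from a 2x2 table.

    a: cases carrying the allele      b: cases not carrying it
    c: controls carrying the allele   d: controls not carrying it
    Assumption: Woolf's log-based interval; all four counts must be > 0.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

def is_significant(lower, upper):
    """The association is significant when the CI does not include 1.00."""
    return lower > 1.0 or upper < 1.0

# Hypothetical counts: the allele is rarer in cases than in controls.
or_, lo, hi = odds_ratio_ci(30, 70, 50, 50)
print(round(or_, 2), is_significant(lo, hi))
```

With these invented counts the whole interval lies below 1.00, which corresponds to the protective case described above.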
An important tool when studying multiple SNPs in a region is linkage
disequilibrium (LD). Alleles of two neighbouring SNPs tend to segregate
together, and LD is the non-random, or non-independent, association between
alleles at two or more loci. For example, suppose allele A and allele B at
different loci have allele frequencies p(A) and p(B) in the population. If the
loci are independent, one would expect the AB haplotype at a frequency of
p(A)p(B). If the AB haplotype frequency is either higher or lower than
p(A)p(B), meaning that the alleles are observed together more or less often
than expected by chance, then the two loci are said to be in LD [12].
LD between two SNPs or markers is measured by D’ or by the correlation
coefficient, r². Both the D’ and r² measures are based on the
pairwise-disequilibrium coefficient, D, which is the difference between the
probability of observing alleles from two markers on the same haplotype and the
probability of observing them independently in the population [13].
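The relation between D, D’ and r² can be sketched as follows. The text only states that both measures are based on D; the formulas used here are the commonly used textbook definitions (D’ as |D| scaled by its maximum attainable value, r² as D² divided by the product of the four allele frequencies) and are an assumption about what the author had in mind. The function name and the example frequencies are hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the AB haplotype
    p_a, p_b: frequencies of allele A and allele B
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    # D: observed haplotype frequency minus the frequency expected
    # under independence, p(A)p(B).
    d = p_ab - p_a * p_b
    # Largest |D| that is possible given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Allele A always occurs together with allele B, but B is more common than A:
# D' reaches 1 while r^2 stays well below 1.
print(ld_measures(p_ab=0.1, p_a=0.1, p_b=0.4))
```

The example shows why the two measures can disagree: D’ only asks whether one haplotype is absent, whereas r² also penalises unequal allele frequencies.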
D’ values range from 0 to 1. A D’ value of 1 means that all copies of the minor
allele for one SNP are always observed with one of the two alleles of the other
SNP. A D’ value of 0 means that the two loci are inherited independently. The
r² measure also ranges between 0 and 1, and an r² value of 0 likewise implies
independence. However, an r² value of 1 has a stricter interpretation than a
D’ value of 1: r² = 1.0 only when the allele frequencies are identical for the
two SNPs. An r² of 1.0 for two SNPs means that, given the allele frequencies
for one SNP, one can predict the allele frequencies for the other SNP. D’ is
very unstable for small sample sizes, so r² is used more frequently in the
assessment of LD. Knowledge about LD is invaluable when constructing haplotypes
and interpreting results from genetic association studies. An association
between an allele and a phenotype can occur for a number of reasons: it could
be a true association, it could arise by chance, or it could be an artefact
owing to the allele being in LD with the real phenotype-causing allele. LD is
also very useful when fine-mapping regions of interest. For example, the PTPN22
gene harbours 47 SNPs (Figure 5) according to the build 36 assembly from the
HapMap data. Using pairwise tagging with an r² threshold of 0.8, only 10 of
those 47 SNPs (called tagSNPs) need to be genotyped, since their allele
frequencies can be used to predict the allele frequencies of the other 37 SNPs.
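The idea behind pairwise tagging can be sketched with a simple greedy selection. This is an illustration only, not necessarily the algorithm used to obtain the 10 tagSNPs mentioned above; the function name, the SNP names and the r² values are hypothetical.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Pick tagSNPs so every SNP is itself a tag or has r^2 >= threshold with one.

    snps: list of SNP names
    r2:   dict mapping frozenset({snp1, snp2}) -> pairwise r^2
          (pairs missing from the dict are treated as r^2 = 0)
    """
    def captured_by(tag):
        # A SNP captures itself and every SNP in strong LD with it.
        return {s for s in snps
                if s == tag or r2.get(frozenset((tag, s)), 0.0) >= threshold}

    untagged = set(snps)
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs;
        # ties are broken by the order of the input list.
        best = max(snps, key=lambda s: len(captured_by(s) & untagged))
        tags.append(best)
        untagged -= captured_by(best)
    return tags

# Invented example: s2 is in strong LD with both s1 and s3, s4 with none.
pairwise_r2 = {
    frozenset(("s1", "s2")): 0.90,
    frozenset(("s2", "s3")): 0.85,
    frozenset(("s1", "s3")): 0.50,
}
print(greedy_tag_snps(["s1", "s2", "s3", "s4"], pairwise_r2))
```

In this toy example two tags suffice for four SNPs, in the same spirit as 10 tagSNPs covering 47 SNPs.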
There are some pitfalls in genetic statistics. When many tests are performed,
the risk of type I errors (the test result is a false positive) increases. This
is normally corrected for using the conservative Bonferroni correction, which
in turn increases the risk of type II errors (the test result is a false
negative). Another way of correcting for multiple testing is to perform a
permutation test, which is a computerised test where χ²-tests are calculated
based on the observed frequencies from randomised samplings of the test
population.
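Both corrections can be sketched in a few lines. The sketch assumes a single SNP coded as allele carrier or non-carrier, a 2x2 χ² statistic without continuity correction, and randomisation by shuffling the case/control labels; the text does not specify these details, and all names and data below are hypothetical.

```python
import random

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def table_statistic(carriers, labels):
    """Build the 2x2 table (carrier status x case status) and return chi-square."""
    a = sum(1 for x, y in zip(carriers, labels) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(carriers, labels) if x == 0 and y == 1)
    c = sum(1 for x, y in zip(carriers, labels) if x == 1 and y == 0)
    d = sum(1 for x, y in zip(carriers, labels) if x == 0 and y == 0)
    return chi2_2x2(a, b, c, d)

def permutation_p_value(carriers, labels, n_perm=1000, seed=0):
    """Empirical p-value: share of label shufflings with chi-square >= observed."""
    rng = random.Random(seed)
    observed = table_statistic(carriers, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if table_statistic(carriers, shuffled) >= observed:
            hits += 1
    # Adding 1 to both counts avoids reporting a p-value of exactly 0.
    return (hits + 1) / (n_perm + 1)

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests

# Invented data: 50 cases (label 1) and 50 controls (label 0).
carriers = [1] * 40 + [0] * 10 + [1] * 10 + [0] * 40
labels = [1] * 50 + [0] * 50
print(permutation_p_value(carriers, labels), bonferroni_threshold(0.05, 10))
```

The Bonferroni threshold simply divides the significance level by the number of tests, whereas the permutation approach estimates the null distribution from the data themselves.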
Figure 5. LD pattern with r² values for the SNPs in the PTPN22 gene. r² values
are presented in a grey scale, where black corresponds to an r² value of 1.0
and white corresponds to an r² value of 0.