OPEN ACCESS Review article

Arab gene geography: From population diversities to personalized medical genomics

Ghazi O. Tadmouri 1, Konduru S. Sastry 2, Lotfi Chouchane 2,*

ABSTRACT

Genetic disorders are not equally distributed over the geography of the Arab region. While a number of

disorders have a wide geographical presence encompassing 10 or more Arab countries, almost half of

these disorders occur in a single Arab country or population. Nearly one-third of the genetic disorders

in Arabs result from congenital malformations and chromosomal abnormalities, which are also

responsible for a significant proportion of neonatal and perinatal deaths in Arab populations.

Strikingly, about two-thirds of these diseases in Arab patients follow an autosomal recessive mode of

inheritance. High fertility rates together with increased consanguineous marriages, generally noticed in

Arab populations, tend to increase the rates of genetic and congenital abnormalities. Many of the

nearly 500 genes studied in Arab people revealed striking spectra of heterogeneity with many novel

and rare mutations causing large arrays of clinical outcomes. In this review, we provide an overview of

Arab gene geography and of various genetic abnormalities in Arab populations, including blood,

metabolic, and circulatory disorders and neoplasms, and discuss the molecules or genes

responsible for these disorders. Although studying Arab-specific genetic disorders

has resulted in a high-value knowledge base, approximately 35% of genetic diseases in Arabs do not have

a defined molecular etiology. This is a clear indication that comprehensive research is required in this

area to understand the molecular pathologies causing diseases in Arab populations.

Keywords: Arab populations, neolithic, population genetics, gene geography, genetic disorders, neoplasms

Cite this article as: Tadmouri GO, Sastry KS, Chouchane L. Arab gene geography: From population diversities to personalized medical genomics, Global Cardiology Science and Practice 2014:54 http://dx.doi.org/10.5339/gcsp.2014.54

Submitted: 1 September 2014 Accepted: 11 December 2014 © 2014 Tadmouri, Sastry & Chouchane, licensee Bloomsbury Qatar Foundation Journals. This is an open access article distributed under the terms of the Creative Commons Attribution license CC BY 4.0, which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited.

1 Faculty of Public Health, Jinan University, Tripoli, Lebanon

2 Laboratory of Genetic Medicine and Immunology, Weill Cornell Medical College in Qatar, Qatar Foundation, Doha, Qatar

*Email: [email protected]

A DEFINITION OF ‘ARAB POPULATIONS’

The term “Arabs” indicates a panethnicity of peoples of various ancestral origins, religious

backgrounds, and historic identities. It is possible to define the geographical area inhabited by Arabs

using one of the two following approaches:

(1) The linguistic approach is a relaxed definition and it includes all populations speaking the

Arabic language and living in a vast area extending from south of Iran in the east to Morocco in

the west including parts in the south-east of Asia Minor, East, and West Africa.

(2) The political definition of Arabs is more conservative as it only includes those populations

residing in 23 Arab States, namely: Algeria, Bahrain, Comoros, Djibouti, Egypt, Eritrea, Iraq,

Jordan, Kuwait, Lebanon, Libya, Mauritania, Morocco, Oman, Palestine, Qatar, Saudi Arabia,

Somalia, Sudan, Syria, Tunisia, United Arab Emirates (UAE), and Yemen.

In the subsequent parts of this paper, it is the political definition that will mainly be used to define

the term “Arab region” or simply “the region”. In all cases, the Arab geocultural unit is the largest in the

world after Russia and Anglo-America. The size of this unit exceeds 375 million people and spans more

than 14 million square kilometers. 1

PALEOLITHIC OUT-OF-AFRICA MIGRATIONS

Archeological excavations, historical records, and molecular analyses, mainly based on the study of

uniparental Y-chromosome and mitochondrial DNA (mtDNA), provided considerable information

regarding the early evolutionary history of modern humans in the vast geographical region embracing

Arab populations. The advent of genomic methodologies based on the simultaneous analysis of

hundreds of thousands of single nucleotide polymorphisms allowed the drawing of conclusions on the

genetic structures of Arab populations with a higher resolution. 2

DNA evidence indicates that modern humans originated in East Africa about 200-100 kiloyears (kyr)

ago then established regional populations throughout the continent. 3 Archeological artifacts excavated

from Taforalt in today’s Morocco indicate that human habitation of the present-day Maghreb region

(i.e., modern-day Morocco, Algeria, Tunisia, and Libya) dates back some 82 kyr. 4 At that time,

settlements in the region were characterized by developed cultural manifestations that appeared

in Europe only 40 millennia later. 5 According to the Recent Out-of-Africa model, members of one

branch of anatomically modern humans left Africa for the Near East some 70-45 kyr ago. 6,7 Phylogenies

constructed on the basis of mtDNA comparisons indicate two possible migration routes in this

episode of human history (see Figure 1):

(1) A major route lay across the Bab-el-Mandeb strait in the Red Sea, linking modern-day Eritrea and

Djibouti in Africa to Yemen, hence probably making the Arabian Peninsula the initial

staging post in the first successful migration of anatomically modern humans out of Africa

70-60 kyr ago. 7-9 Y-chromosome diversity studies in modern Saudi males support this view, as

14% of them exhibit a pool typical of African biogeographic ancestry. 10 High diversity in the

Y-haplogroup substructure in samples from the region extends the geography of this active

route to include southern Arabia, South Iran, and South Pakistan. This route has possibly

maintained its important role in influencing gene flow from Africa along the coastal

crescent-shaped corridor of the Gulf of Oman and could have facilitated human dispersals into

the region until nearly 2500 years ago. 11

Figure 1. Out-of-Africa migration routes.

Tadmouri, Sastry & Chouchane. Global Cardiology Science and Practice 2014:54

(2) Another route followed the Nile from East Africa, heading northwards and crossing through the

Sinai Peninsula into the Levant and resulted in a noticeable gene flow during the Upper

Paleolithic and Mesolithic periods between 40-14 kyr ago. 12-14

Recent data from Alu/short tandem repeat compound systems and genome-wide polymorphisms support this

view, with 4-15% of the Levantine groups harboring African ancestry, while this influence barely

reaches 1-3% in Southern Europeans. 15-18

Human populations in the Near East then branched

in several directions, some heading north into Europe and others heading east into Asia. 19-22

Y-chromosome analysis supports this view and demonstrates the absence of any significant

genetic barrier in the Levant, where a remarkable genetic variation was attained and gene flow

followed the “isolation-by-distance” model. This is in contrast to a strong north-south genetic

barrier, for both male and female gene flow, in the western Mediterranean basin, defined by

the Gibraltar Strait. 23,24

Paleoanthropological evidence and mtDNA variation analysis indicate that both the Levantine

corridor and the Horn of Africa served, repeatedly, as migratory passageways between Africa and

Eurasia. 14,25

Some of the oldest known genetic mutations that could have followed this route include:

(1) the delta F508 (c.1521_1523delCTT) mutation of the CFTR gene, which is responsible today for a

majority of cases with cystic fibrosis in Europe, 26

and (2) the p.Glu6Val sickle cell mutation associated

with the Benin haplotype and frequently observed in the western coastal region of the Arabian

Peninsula, the Levant, Egypt, and in the Maghreb region. 27

Some studies also support the view that regions near, but external to northeast Africa, like the

Levant, the southern-Arabian Peninsula, or Mesopotamia could have served as incubators for the early

diversification of non-African lineages and the development of local cultural techniques. 10,28,29

Again, the p.Glu6Val sickle cell mutation provides supporting evidence for this view, since the mutation

associated with the Arab/Asian haplotype seems to be restricted to the eastern coastal regions of the

Arabian Peninsula with milder presence in Mesopotamia and the Levant. 27

THE EARLY FARMERS

Around 12 kyr ago, Neolithic human populations adopted agricultural technologies

that brought about a far-reaching shift in subsistence and lifestyle. Improvement of the climatic

conditions in the area along with the practice of agriculture helped in the establishment of major

historical settlements with sizeable densities that could have contributed enormously to the genetic

makeup of modern Arab populations. Yet, farming was almost always associated with settlements near

mosquito-infested soft and marshy soil causing large malarial outbreaks. 30-32

These outbreaks imposed selective pressure on the human genome and amplified the frequencies of several genetic

disorders, including sickle cell disease, β-thalassemia, and glucose-6-phosphate dehydrogenase

(G6PD) deficiency. 33-35 Infectious agents that favored humid conditions could also have played major

roles in conferring a selective advantage on a variety of other genetic traits. 36,37 A major example in this category

is the heterozygote advantage of cystic fibrosis carriers against tuberculosis. 38

On the other

hand, adapting to an active lifestyle along with calorie-restricted diets, common in communities at the

time, could have provided protective features that suppressed the expression of celiac disease, type 2

diabetes, and inflammatory bowel disease. 39
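
The malaria-driven selection described in this section is commonly illustrated with the textbook heterozygote-advantage (overdominance) model. The Python sketch below is an illustration under that assumption and is not taken from the studies cited here. The fitness costs s and t, the starting frequency, and all function names are hypothetical, chosen only to show how a disease allele can settle at a stable frequency of s / (s + t).

```python
def equilibrium_frequency(s, t):
    """Stable frequency of the variant allele under heterozygote advantage.

    s: fitness cost to non-carrier homozygotes (e.g., from malaria)
    t: fitness cost to variant homozygotes (e.g., from the disorder)
    Both values are illustrative assumptions, not measured quantities.
    """
    if s < 0 or t < 0 or s + t == 0:
        raise ValueError("s and t must be non-negative and not both zero")
    return s / (s + t)


def next_frequency(q, s, t):
    """Apply one generation of selection to variant allele frequency q."""
    p = 1.0 - q
    w_normal, w_carrier, w_affected = 1.0 - s, 1.0, 1.0 - t
    mean_fitness = p * p * w_normal + 2 * p * q * w_carrier + q * q * w_affected
    # Variant alleles come from affected homozygotes and half of the carriers.
    return (q * q * w_affected + p * q * w_carrier) / mean_fitness


def simulate(q0, s, t, generations):
    """Iterate selection for a number of generations starting from q0."""
    q = q0
    for _ in range(generations):
        q = next_frequency(q, s, t)
    return q


if __name__ == "__main__":
    s, t = 0.1, 0.8  # hypothetical selection coefficients
    print("predicted equilibrium:", equilibrium_frequency(s, t))
    print("after 2000 generations:", simulate(0.001, s, t, 2000))
```

With these hypothetical values the allele rises from rarity and settles near 0.11. When s is set to zero, which corresponds to removing the infectious pressure, the same allele declines steadily.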

In this phase of human history, the Arabian Peninsula, Sub-Saharan Africa, the Levant and Iran saw

local population expansions from refugia that could have participated in the building of the primitive

Arabian population. Y-chromosome and mtDNA haplogroup data support this view. 9,40

For example, approximately 62–69% of today’s males in Saudi Arabia share common Y-chromosome structures with those in the

Near East, which suggests an important role for the Levant in shaping the Neolithic

dispersal of human settlements in the Gulf. 10 This genetic evidence is consistent with archeological

interpretations of the expansion of sedentary Natufian hamlets in the Levant during the wet phase

15–13 kyr ago. 41 Male lineage estimates for these prominent Levantine haplogroups indicate a

north-to-south influence with a history of almost 12 kyr in Saudi Arabia, 11 kyr in Yemen, mainly in the western

region, and only about 7 kyr in Qatar and the UAE. 10 Detailed analyses hint at a major terrestrial

colonization of the eastern Arabian Peninsula, followed by population isolation

from the western Arabian Peninsula, with significant genetic affinities to Near Eastern

populations. 42

Many of the earliest disease-causing genetic mutations might have followed these

steps. 43

In particular, the c.208-2A>G mutation in the human amnionless homolog (AMN) gene,

found in 15% of Imerslund-Gräsbeck syndrome cases, could have emerged in the region around

13.6 kyr ago. Today, this mutation is responsible for over 50% of the Imerslund-Gräsbeck syndrome cases

among Arabic, Turkish, and Sephardic Jewish families. 44

On the other hand, studies of mtDNA

variability confirm a notable sub-Saharan African female-driven flow in the Arabian Peninsula. 9,25,29

An

Iranian influence also existed, but this was weakened by the presence of barriers to gene flow posed by

the two major Iranian deserts and the Zagros mountain range. 45

Analysis of the pattern of Y-chromosome and mtDNA variations in North Africa provides evidence of

the relatively young population history of North Africa mainly influenced by a strong demic expansion

of Neolithic pastoralists from the Levant and possible admixture with original settlers. 7,46,47

Some of

these earliest civilizations in the Maghreb region include immigrant Berbers who originated from the

Sahara 10,000 years ago and left considerable gene imprints in the gene pool of the populations

inhabiting the area between modern day Mauritania and southern Egypt. 48,49

Nearly 2,000 years later,

Mesolithic Capsians became the next influential genetic stock in the region. 50

MAJOR EVENTS IN ANCIENT HISTORY

In the Arabian Peninsula, Semitic-speaking peoples of Arabian origin migrated into the valley of the

Tigris and Euphrates rivers in Mesopotamia some 7,000-5,500 years ago. 51,52

Analysis of Y-chromosome and mitochondrial DNA in Iraqi Marsh Arabs revealed a prevalent autochthonous Middle

Eastern component for both male and female gene pools, with weak Southwest Asian and African

contributions. 29

The detailed analysis of genome-wide variation patterns among Qataris indicates that

the Southwest Asian influence is derived from Greater Persia rather than from China while the African

stock has a sub-Saharan origin and not a Southern African Bantu origin. 53 Data from the neighboring

Bahraini and Emirati populations reveal an increasing North-to-South influence of the Southwest Asian

component with a high contribution of 23% and 24%, respectively. 54

This could also explain the

exceptionally high frequencies of the Asian sickle cell mutation in the region extending from Kuwait to

the United Arab Emirates. 27

Archeological evidence further indicates that another group of Semites left Arabia around 4,500

years ago during the Early Bronze Age and settled along the Levant and mixed in with the local

populations there. Some 3,500 years ago, the Phoenician civilization of Lebanon became a developed

enterprising maritime trading culture. Phoenician traders spread across the Mediterranean and

established major cities and colonies that harbored their pathologic or polymorphic gene

variations. 55-58

Among the pathologic gene variations that could have followed Phoenician footsteps

are (1) the IVS-I-110 (c.93-21G>A) beta-globin gene mutation, the most frequently encountered

beta-thalassemia mutation among Arabs, and (2) the p.G542X mutation in the CFTR gene, a frequently

observed cystic fibrosis mutation in the Mediterranean basin. 56,59

Results of the Genographic

Consortium from Y-chromosome variations indicate that as many as 1 in 17 men living today on the

coasts of North Africa and southern Europe may have a Phoenician direct male-lineage ancestry. 60

The genetic pool was further enriched in Mesopotamia by the Persians, while the Romans maintained

settlements throughout most of the region for some 600 years before being succeeded by the

Byzantines. 61

MAJOR EVENTS IN MEDIEVAL HISTORY

Soon after the rise of Islam 1,400 years ago, the Arab Caliphates unified the region flanking the

Mediterranean and amalgamated the dominant ethnic identity that persists today in the Near East, the

Levant, the Maghreb, and Andalusia in the Iberian Peninsula. 58,62,63

The Arabian Peninsula gained an

increasing role and linked distant populations of China and India to communities of the Mediterranean

and beyond. During this period, demographical dynamics were predominantly governed by cultural

change in endogenous populations rather than demic influences with significant gene flow. 64 This view

is strongly supported by Y-chromosome analysis of Muslim expansion in India and mtDNA

haplogroups in the Sinai Peninsula and North Africa. 24,65,66

During the 11th-13th centuries CE, the Levant witnessed major Crusader settlements that could have

caused remarkable genetic drift and bottlenecks and introduced western European lineages. 58 In the

16th century CE, the impact of the western European gene stock extended to the eastern Arabian

Peninsula, where major parts, including today’s Bahrain, fell under the authority of the Portuguese for

nearly 150 years. This presence left clear impressions in the mutational spectrum of common disorders

in the eastern Arabian Peninsula, as in the frequent observation of the western Mediterranean Codon 39

(c.118C>T) β-thalassemia mutation 67-69 (reviewed in Obeid and Tadmouri 27 ). Conversely, some

other disorders from the region have possibly spread out to geographically distant locations under this

Portuguese influence, as suggested by the growing evidence on the world

distribution of Machado-Joseph disease. 70,71 During the 13th-19th centuries CE, the Ottomans controlled

much of the lands surrounding the Mediterranean, then expanded their influence to cover all the

Arabian Peninsula and further contributed to the enrichment of the genetic pool in the region. 72 After

the 19th century, areas of the Maghreb were colonized by France, Spain, and Italy, while the Levant,

Egypt, and the Arabian Peninsula were colonized by France and England.

Despite this long trail of historical admixtures, genetic isolates persisted in the Arab region. Some of

these isolates include the inhabitants of the Island of Jerba in Tunisia, 73 the Bedouins of Sinai, 65 the

dwellers of the Dead Sea region in Jordan, 74 the Druze of the Levant, 58 and the Kurdish population of

Northern Iraq. 75

THE GENETIC HETEROGENEITY OF ARABS

Arab populations display some of the highest rates of consanguineous marriages in the world

including a large proportion of first cousin marriages. 76 At a macrogenomic level, this norm permits the

reunion of ancestral chromosomal segments in a homozygous pattern referred to as the autozygome. 77

At a microgenomic level, however, populations in the region exhibit exceptionally high levels of

variance within those runs of homozygosity. 2 This variance seems to follow a sexually asymmetric

model with higher heterogeneity recorded among the female groups while paternal lineages are mostly

of autochthonous origin. 29,78

Either way, this variance leads phenotypically to a wide array of more

than 1,100 genetic disorders described in the region, of which 44% are confined to a single population

or region, a diversity of affected body systems and of clinical outcomes, and a diversity of disease

incidence and geographical distributions (reviewed in Tadmouri 79 ).
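
The effect of autozygosity on recessive disease can be made concrete with the standard population-genetic expression for homozygote frequency under inbreeding, q^2 + F*q*(1 - q). The Python sketch below is illustrative only and is not part of the reviewed studies: the allele frequency is hypothetical, the function names are invented for this example, and F = 1/16 is the textbook inbreeding coefficient for the offspring of first cousins.

```python
# Textbook inbreeding coefficient for offspring of first cousins (assumption
# introduced for illustration, not a figure reported in this review).
FIRST_COUSIN_F = 1 / 16


def recessive_homozygote_frequency(q, inbreeding_f=0.0):
    """Expected frequency of homozygotes for a recessive allele.

    q: frequency of the recessive allele (hypothetical input)
    inbreeding_f: inbreeding coefficient F; 0 means random mating
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie between 0 and 1")
    if not 0.0 <= inbreeding_f <= 1.0:
        raise ValueError("inbreeding_f must lie between 0 and 1")
    return q * q + inbreeding_f * q * (1.0 - q)


def relative_risk(q, inbreeding_f):
    """Ratio of homozygote frequency with inbreeding to random mating."""
    if q <= 0.0:
        raise ValueError("q must be positive to compute a ratio")
    baseline = recessive_homozygote_frequency(q)
    return recessive_homozygote_frequency(q, inbreeding_f) / baseline


if __name__ == "__main__":
    q = 0.01  # hypothetical rare recessive allele
    print("random mating:", recessive_homozygote_frequency(q))
    print("first cousins:", recessive_homozygote_frequency(q, FIRST_COUSIN_F))
    print("relative risk:", relative_risk(q, FIRST_COUSIN_F))
```

For a rare allele (q = 0.01 in this hypothetical case), the expected frequency of affected offspring of first cousins is roughly seven times that under random mating, and the ratio grows as the allele becomes rarer.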

While the common practice of consanguinity seems to have also contributed to the preponderance

of more autosomal recessive (60%) than autosomal dominant (28%) disorders in the region, 76

it is probably the large spectra of pathological gene mutations associated with many genetic disorders in

the region that best illustrate the genetic heterogeneity of Arab populations. The following

disease families represent a few examples of a continuously growing list of disorders related to a long list

of mutations, many of which have possibly originated in the region.

BLOOD DISORDERS

β-Thalassemia

β-thalassemia syndromes are a group of hereditary disorders characterized by a genetic deficiency in

the synthesis of beta-globin chains. A meta-analysis of 6,652 β-thalassemia alleles from 17 Arab

populations indicated the presence of 73 out of the ~250 β-globin gene mutations occurring worldwide.

In contrast to many world populations, this heterogeneity seems to be a common

observation in many Arab populations irrespective of the size of pooled β-thalassemia alleles. This

is clearly demonstrated in Algeria, Egypt, Morocco, Tunisia, and the United Arab Emirates,

which exhibit the largest heterogeneity, with more than 20 β-thalassemia mutation types described in each

population so far (reviewed in Obeid and Tadmouri 27 ).

Glucose-6-Phosphate Dehydrogenase (G6PD) deficiency

G6PD deficiency is an X-linked inherited disorder caused by a defect or deficiency in the production of

an important red blood cell enzyme called G6PD. G6PD deficiency may cause the sudden, premature

destruction of red blood cells, leading to hemolytic anemia since the body cannot compensate for the

destroyed cells. In Tunisia, the African G6PD*A− variant is the most prevalent among G6PD-deficient patients and

causes severe hemolytic anemia following the ingestion of fava beans. 80 This mutation is

followed in frequency by the G6PD*Mediterranean (c.563C>T; p.Ser188Phe) and the G6PD*Aures

(c.143T>C; p.Ile48Thr) mutations. The latter was originally described in Algeria and then in Saudi

Arabia. 81,82 The analysis of mildly affected males revealed the association of

c.1311C>T, a newly described silent mutation in exon 12, with the c.93T>C polymorphism in

intron 11 and two single intronic base deletions: IVS-V-17 (-C) and IVS-VIII-43 (-G). 80

In Sudan, the G6PD*B variant represents the most common type of enzyme in all the population groups. However,

the variant G6PD*A+ enzyme, which has normal activity, is prevalent among individuals of African

descent. Among the deficiency-causing variants, G6PD*Mediterranean and G6PD*A− are the most

common. 83 The genetic heterogeneity of G6PD continues in the Arabian Peninsula. In the United

Arab Emirates, G6PD*B+ is the major allele described among non-deficient subjects, while the

G6PD*Mediterranean mutation is the most common cause of G6PD deficiency among Emirati

patients. 84 Other mutations detected include the African G6PD*A− (c.202G>A) and the G6PD*Aures

mutations. 84 This spectrum of mutations seems to be shared with neighboring Kuwait, where the

G6PD*Mediterranean and the African G6PD*A− genotypes are the most common, followed by the less frequent

G6PD*Chatham and G6PD*Aures alleles. 85 The Saudi population is no exception: G6PD*A−,

G6PD*Mediterranean, and G6PD*B+ are the major variants producing a severe deficiency state among

affected individuals. These variants exhibit a significant difference in their frequencies, with the highest

recorded in areas that were endemic for malaria and have high frequencies of sickle cell disease and

β-thalassemia, namely, the Eastern and the Southern Regions. 86,87

In neighboring Jordan, molecular

screening of G6PD alleles revealed a higher incidence of the disease in the Jordan Valley, known for its

historically higher rates of malaria, than in the Amman area, and showed the

existence of six mutations: the c.563C>T G6PD*Mediterranean mutation (53%), the African G6PD*A−

(c.376A>G + 202G>A; p.Asn126Asp + Val68Met) mutation, G6PD*Chatham (c.1003G>A;

p.Ala335Thr), G6PD*Valladolid (c.406C>T), G6PD*Aures (c.143T>C), and G6PD*Asahi

(c.202G>A). 88 Molecular screening of G6PD alleles in Iraqi Kurdish males indicated that the

G6PD*Mediterranean variant was the most common (88%), followed by the G6PD*Chatham variant

(c.1003G>A; 9%). 89 A study of 21 unrelated individuals with G6PD*Mediterranean 90 confirmed

that almost all patients from Saudi Arabia, Iraq, Iran, Jordan, Lebanon, and Palestine share the

c.563C>T mutation.

METABOLIC DISORDERS

Cystic fibrosis

Cystic fibrosis is a multi-system life threatening inherited disorder that primarily affects the lungs and

digestive system. The spectrum of cystic fibrosis mutations in Arab populations reveals a major

difference from worldwide observations. For example, more than 70% of cystic fibrosis patients with

European ancestry show the delta F508 (c.1521_1523delCTT) mutation of the CFTR gene. In Arab

patients, these figures are far from homogeneous. A comprehensive meta-analysis of 827 cystic fibrosis

alleles encompassing 15 Arab populations revealed a wide spectrum of 56 CFTR gene

mutations responsible for the disease in the region (unpublished observations). This heterogeneity

seems to continue at the regional level as well. For instance, the cystic fibrosis population of the Arabian

Peninsula exhibits 17 CFTR mutations. In Saudi Arabians, the 3120+1G>A (c.2988+1G>A) CFTR

mutation is the most common, while in Kuwait it is replaced by the delta F508 mutation. In neighboring

Bahrain, three other mutations seem to prevail: 2043delG (c.1911delG), 548A>T

(c.416A>T), and 4041C>G (c.3909C>G). This battery of mutations is replaced in Qatar by the

commonly observed c.3700A>G (p.I1234V) mutation in the CFTR gene. The picture changes further in

Oman and the United Arab Emirates, where the c.1647T>G (p.S549R) mutation is common and the

delta F508 occurs at relatively low frequencies, but exclusively in patients of Baluchi descent (reviewed

in Obeid and Tadmouri 27 ).

Lipoid congenital adrenal hyperplasia

This is a severe genetic disorder of steroid hormone biosynthesis, in which the production of all adrenal

and gonadal steroids is significantly impaired by a severe defect in the conversion of cholesterol to

pregnenolone. Worldwide, lipoid congenital adrenal hyperplasia is caused by nearly 35 mutations in

the steroidogenic acute regulatory (StAR) protein gene. Collective results of 20 Arab patients from

Libya, Egypt, Palestine, Jordan, Kuwait, Qatar, Saudi Arabia, and Yemen indicate the presence of 12

mutations including five novel ones in the StAR gene (reviewed in Obeid and Tadmouri 27 ).

DISORDERS OF THE CIRCULATORY SYSTEM

An extensive survey on genetic disorders in Arab people indicated that there are at least 27 disorders of

the circulatory system known to run in Arab families. 79

However, unlike blood disorders and common

metabolic abnormalities, appreciation of the genetic etiologies of diseases of the circulatory system

has only occurred in the last decade. As a result, only scant information hints at

specific genetic signatures characteristic of Arab patients with cardiovascular disorders.

Congenital heart disease (CHD)

CHD is a structural abnormality of the heart or intra-thoracic great vessels. It is the most common birth

defect worldwide representing one third of all congenital malformations presenting in the neonatal

period. Arabs are liable to have more children with congenital defects including CHD because of high

fertility rates. 76,91

The presence of small isolated communities in different parts of the Arab world (e.g., Armenians,

Bedouins, Druzes, Jews, Kurds, Nubians, Berbers, Tebo, and Twareq) with the common practice of

consanguinity is another factor associated with a high incidence of CHD. A molecular study in Lebanese

CHD patients identified a differential duplication of a 44-bp intronic segment within the Rel-family

transcription factor gene, NFATC1, suggesting that this gene could be a potential ventricular septal

defect-susceptibility gene. 92

In a prospective study involving 60 Jordanian babies with cleft lip and/or

cleft palate, 47% had CHD. However, no chromosomal studies were performed in these patients. 93

Coronary artery disease (CAD)

A study of Arabs living in Kuwait found a strong association between a C-to-G substitution

in the 3-prime untranslated region (3’UTR) of the APOC3 gene and coronary artery disease. The

population in the study included adults from Kuwait, Jordan, Palestine, Lebanon, Syria, Egypt, and

Iraq. 94

In Saudi individuals, CAD was also found to be associated with the 3’UTR allele of the APOC3

gene, 95 and other associations were found with the MTHFR c.677C>T variant, a platelet

glycoprotein receptor IIIa (PlA1/PlA1) genotype, 96,97 and the null genotypes of GSTT1 and GSTM1. 98 In

support of a probable specificity of the genotypic etiology of coronary artery disease in Arabs, no

association was found with the lipoprotein lipase (LPL) polymorphisms (LPL-HindIII and LPL-PvuII); 99

the infrequent 3.2-kb band of the apolipoprotein A-I/C-III; 100 the insertion/deletion sites in the

polymorphic region of intron 16 of the angiotensin I-converting enzyme (ACE) gene; 101 the p.W64R

polymorphism of the β3-adrenoceptor (β3-AR) gene; 102 the PvuII polymorphism in the LPL gene; 103 and the

c.677C>T and c.1298A>C variants of the MTHFR gene. 104

Hypertrophic cardiomyopathy (HCM)

HCM is characterized by an abnormal thickening of the heart muscles, resulting from mutations in one

of several genes that result in defects in the protein component of the cardiac muscles. An apical

hypertrophic cardiomyopathy in father and daughter of a Lebanese Christian family has been reported.

In both, identical segments of the left ventricle were involved by the hypertrophic process with differing

degrees of severity. 105

In an analysis of data pertaining to all patients less than 50 years of age in Qatar,

six of 42 Qataris were diagnosed with HCM, making it the second most frequently encountered cardiomyopathy in this

group after dilated cardiomyopathy. HCM occurred in two peaks: one below 15 years of age, and

the other between 36 and 50 years of age. About 27% of the children (between 1 and 15 years) were

found to have HCM. The prevalence rate of HCM was calculated as 3.1 per 100,000 of the population. 106

Arterial tortuosity syndrome

Probably the earliest account of the disease in the region dates back to 2000, with the description

of 12 patients from eight different families in Saudi Arabia. 107 The first mutations associated with the

disease, however, were reported six years later in patients of Moroccan origin who were homozygous

for the c.510G>A (p.W170X) mutation and for a frameshift c.961delG (p.V321fsX391) mutation in the SLC2A10

gene. 108

In Qatar, two mutations, a novel p.R105C and a recurrent p.S81R, were recently described in the

SLC2A10 gene in seven patients from two unrelated families. 109

Other disorders

In two consanguineous Saudi families, long QT syndrome (LQTS) was described as segregating with a

novel homozygous splicing mutation in the KCNQ1 gene. The observation of the same mutation in both

families indicated that this could be a founder mutation. 110

In contrast, Naxos disease, a rare

cardiomyopathy, failed to exhibit linkage with the previously identified plakoglobin gene in

two Saudi patients, 111 indicating that the disease might have a private signature in the region.

NEOPLASMS

Neoplasms are not typically regarded as population-specific disorders. However, several aspects of

these disorders differ by race and ethnicity. Among Arabs, several types of cancers show many distinct

features that are quite different from those seen in other populations worldwide. Very preliminary data

from the CTGA (Catalogue for Transmission Genetics in Arabs) Database for genetic disorders in Arab

populations indicate the presence of at least 55 cancer types in Arab people. 112

Breast, ovarian, lung,

and colorectal cancers are the main cancers that run in Arab families. Cancer susceptibility genes for

many of these cancers have been reported. Yet, other cancers with familial types, such as prostate,

pancreatic, and testicular cancers, have not revealed specific cancer-susceptibility genes so far.

Breast and ovarian cancer

Broadly speaking, 90% of breast cancer cases are sporadic and the processes leading to gene

mutations in such cases are not well understood. Defined genetic predisposition accounts for only

about 5–10% of cases, which constitute the inherited breast cancer types. In either familial or sporadic cases, multiple genetic

etiologies, related to mutations in oncogenes and tumor suppressor genes, characterize breast

carcinomas in Arab patients. 113

A large fraction of inherited cases of breast cancer are usually

associated with mutations of the BRCA1 and BRCA2 genes. Other genes have also been implicated,

such as BRCATA, BRCA3, TP53, BRIP1, PTEN, and STK11. In sporadic breast cancer, increased

susceptibility has been attributed to mutations in low-penetrance genes including TNFA, HSP70-2, and

TNFRII. These private signatures of the disease in the region have probably contributed to the peculiar

clinical characteristics of the disease in Arab women, particularly the earlier mean age of onset, which is

at least a decade earlier than in women of other ethnicities, and the more aggressive course of the

disease. 114

According to a study by Rouba et al., the proportion of BRCA1 and BRCA2 mutations could be

higher in Arab women than in other populations. 115

In Morocco, five deleterious mutations

in the BRCA1 gene were encountered in families with breast/ovarian cancer, including the novel

compound deletional c.2805delA/2924delA mutation. 116

In Algerian women, four of 11 familial cases

were associated with BRCA1 alterations. 117

In neighboring Tunisia, the prevalence of breast cancer is

calculated to be between 16% and 38%. 118,119

There, four BRCA1 mutations have been identified

including a novel Tunisian-specific c.212+2insG mutation and a frequently observed c.798_799delTT

Tunisian and North African founder mutation. 119,120

In Egypt, the p.Arg841Trp BRCA1 disease-associated

mutation was detected while a novel p.Glu1373X mutation in exon 12 of the BRCA1 gene was identified

in ovarian or breast cancer patients in Arab kindred from East Jerusalem. 121

An extensive analysis of

familial breast cancer in Lebanon revealed the presence of 38 BRCA1 sequence variants, many of which

are novel. 122

Adding to this heterogeneity, two other unclassified BRCA1 variants, p.Phe486Leu and

p.Asn550His, were detected in Saudi patients. 123

In the case of the BRCA2 gene, the picture is no different. Four mutations in the BRCA2 gene cause

breast/ovarian cancer in Moroccan families including three novel ones (c.3381delT/3609delT,

c.7110delA/7338delA, and c.7235insG/7463insG). 116 The same study also identified a large number of

distinct polymorphisms and unclassified variants in BRCA2 as well as in BRCA1 that were described for

the first time. 116

In four unrelated Tunisian families, two novel c.1313dupT and c.7654dupT mutations in

exons 10 and 16 of the BRCA2 gene were reported. 124

In an Arab patient of Palestinian descent with

breast cancer, the c.2482delGACT novel BRCA2 truncating mutation was observed. 123

An extensive

analysis of familial breast cancer in Lebanon revealed the presence of 40 BRCA2 gene sequence

variants, many of which are novel. 122

In Saudi patients, an unclassified p.Asp1420Tyr BRCA2 variant

was detected. 123

This array of region-specific mutations seems to extend to Arab diasporas as well. For

example, the c.5804del4 mutation in exon 11 of BRCA2 gene was seen in nearly half of the carriers of

deleterious mutations in Arab American women. This mutation has not been previously associated with

a particular Arab ethnicity and may represent a founder mutation of recent origin. 125

Another frequently mutated gene in Arab breast cancer patients is the TP53 gene. In fact, the

frequency of TP53 mutations among Saudi patients is one of the highest in the world. The list of

mutations includes seven novel ones, of which five are found in exon 4 of the TP53 gene. In brief, tumors

from Arab breast cancer patients have a high prevalence (29%) of TP53 mutations in exons 4 and 5,

whereas the smallest proportion of TP53 mutations (10%) is found in exon 7. Also, an excess of

G:C>A:T transitions (49%) at non-CpG sites was noted, suggesting exposure to particular

environmental carcinogens such as N-nitroso compounds. 126

In addition, several single nucleotide

polymorphisms in Arab patients seem to be specific to the indigenous populations and could be

associated with increased risk of breast cancer. Examples include: the p.Pro72Pro in the TP53 gene and

the c.309GG in the MDM2 gene in Saudi women, the c.-251A IL8 allele in Tunisian women, and the

c.1298A>C DNA polymorphism in the MTHFR gene in patients of Syrian ancestry. 127-129

In western societies, mutation of the TP53 gene is highly associated with epithelial ovarian cancers

(50–80%); however, only 32% of Arab patients with this neoplasm exhibit TP53 gene mutations. Instead,

PIK3CA amplification, but not PIK3CA mutation, is the single most common genetic alteration in Arab

cases (60%) and is mutually exclusive with gene mutations in both PI3 Kinase and MAPK pathways

(PIK3CA, KRAS, and BRAF). 130-132

This finding suggests a significant role for the dysregulated

PI3K/Akt pathway in the pathogenesis of ovarian cancers. 132

Colorectal carcinoma (CRC)

This type of neoplasm is a further example of genetic heterogeneity in the region, in

which not only different alleles of the same gene are involved but several genes also seem to be of

importance for the emergence of this ailment. In Moroccan patients with attenuated polyposis, the

homozygous p.Tyr165Cys and c.1186_1187insGG mutations of the MYH gene were reported 133,134

whereas in neighboring Tunisia, a large deletion involving exon 6 of MLH1, a DNA mismatch repair

gene, was observed in a family with six patients diagnosed with colorectal or endometrial cancer

and characterized by a severe phenotype and an early onset. 135

Another study in Tunisians

demonstrated a significant association between the p.E1317Q, p.D1822V, and p.I1307K variants of the

adenomatous polyposis coli (APC) gene and colorectal carcinoma risk. 136

The p.I1307K mutation

seems to have a long history in the region, as demonstrated by the repeated observation of the allele

among many of its populations. In 1999, the p.I1307K mutation was first described among

Ashkenazi and Yemenite Jews. 137

A study on the general population demonstrated a carrier frequency

of the allele in Yemenite Jews of approximately 5%. 138

A more extensive analysis showed the p.I1307K

mutation existed in Sephardi Jews of Syrian, Egyptian, Moroccan, Yemeni, and Palestinian origins, as

well as in Muslim and Christian individuals of Arab descent. This study also demonstrated that the

ancestor of modern p.I1307K alleles existed some 2.2–2.95 kya. 139

The portrait of colorectal carcinoma

becomes more interesting still with a recent study that investigated methylation

patterns in colorectal carcinomas from Egypt and Jordan and showed that differing gene methylation

patterns and mutation frequencies are also involved, indicating dissimilar molecular

pathogenesis and probably reflecting different environmental exposures. 140

Prostate cancer

In Tunisians, a significantly increased prostate cancer risk was associated with the VEGF-634 (GC + CC)

combined genotype while the VEGF-634C allele was associated with high histological grade. However,

the VEGF-1154A/-634G haplotype was negatively associated with prostate cancer risk and high tumor

grade. 141

No association was observed between the p.N700S TSP1 polymorphism and prostate cancer

risk or severity. Yet, subjects carrying one copy of the MMP9-1562T allele exhibited a threefold higher

risk of developing prostate cancer. 142

Other neoplasms

The CYP1A1*2C, GSTT1 null, and GSTP1 TT genotypes demonstrated significant association with diffuse

large B-cell lymphoma (DLBCL) in the Saudi population. 143

The CYP1A1 c.4887C>A genotypes CA and AA

and the variant allele A were associated with a significantly greater risk of developing

papillary thyroid cancer in Saudi patients compared with the wild-type genotype CC. Also, in thyroid cancer,

GSTT1 null showed a higher risk while GSTM1 null showed a protective effect. 144

Tunisian smokers carrying

this latter allele had an approximately 2.2-fold higher risk of bladder cancer. 145

Furthermore, individuals

carrying at least one copy of the methionine synthase (MS) c.2756A>G variant allele and those

heterozygous for the c.1298A>C MTHFR polymorphism displayed 2.33- and 1.8-fold increased risks

of developing bladder cancer, respectively. 146

FINAL NOTE

A multitude of studies reviewed in this paper clearly indicate that the Arab region was an important

milieu for the early adaptations of modern human populations to the out-of-Africa environment. The

experience gained in that period certainly allowed human populations to establish further

settlements and cover many areas in the rest of the world. The tidal movements of historical

populations in and out of the Arab region allowed the area to become an important bridge for the flow

of genes between Africa, Asia, and Europe. This characteristic made the area a focal point of attraction

for many population geneticists seeking to fill the gap in the interpretation of benign or lethal genomic

variations in world populations.

While we could be fascinated with the extent of the genetic heterogeneity that characterizes Arab

populations, understanding the genetic structure of populations and exploring their biogeographical

heterogeneities may also yield a better understanding of the genetic processes and, eventually,

disease etiologies in the region. In many instances, studying Arab families, with Arab-specific genetic

disorders, has resulted in a high-value knowledge base, linked many genes to well-defined

phenotypes, and helped a great deal in global genome annotation efforts. 147

Yet, many of the nearly

500 genes studied in Arab people revealed striking spectra of heterogeneities with many rare and novel

mutations causing large arrays of clinical outcomes, thus, considerably complicating proper counseling

and diagnosis for many disorders. Unfortunately, the materialization of large-scale personalized

medical genomics may not be expected in the near future especially because of the presence of

hundreds of genetic disorders in Arabs with no defined molecular determinants and because of the

limited economic resources available to sustain genomic research throughout the region.

REFERENCES

[1] US Census Bureau. http://www.census.gov/population/international/data/idb/informationGateway.php, visited: 2.3.2014.

[2] Hunter-Zinck H, Musharoff S, Salit J, Al-Ali KA, Chouchane L, Gohar A, Matthews R, Butler MW, Fuller J, Hackett NR, Crystal RG, Clark AG. Population genetic structure of the people of Qatar. Am J Hum Genet. 2010;87(1):17–25.

[3] Liu H, Prugnolle F, Manica A, Balloux F. A geographically explicit genetic model of worldwide human-settlement history. Am J Hum Genet. 2006;79(2):230–237.

[4] Ferembach D. Human remains from the epipaleolithic period in the Taforalt grotto in eastern Morocco. C R Hebd Seances Acad Sci. 1959;248(24):3465–3467.

[5] Bouzouggar A, Barton N, Vanhaeren M, d’Errico F, Collcutt S, Higham T, Hodge E, Parfitt S, Rhodes E, Schwenninger JL, Stringer C, Turner E, Ward S, Moutmir A, Stambouli A. 82,000-year-old shell beads from North Africa and implications for the origins of modern human behavior. Proc Natl Acad Sci U S A. 2007;104(24):9964–9969.

[6] Relethford JH. Genetic evidence and the modern human origins debate. Heredity. 2008;100(6):555–563.

[7] Fernandes V, Alshamali F, Alves M, Costa MD, Pereira JB, Silva NM, Cherni L, Harich N, Cerny V, Soares P, Richards MB, Pereira L. The Arabian cradle: Mitochondrial relicts of the first steps along the southern route out of Africa. Am J Hum Genet. 2012;90(2):347–355.

[8] Bailey GN, Flemming NC, King GCP, Lambeck K, Momber G, Moran LJ, Al-Sharekh A, Vita-Finzi C. Coastlines, submerged landscapes, and human evolution: The Red Sea Basin and the Farasan Islands. J Island Coastal Archaeol. 2007;2:127–160.

[9] Cerný V, Mulligan CJ, Rı́dl J, Zaloudková M, Edens CM, Hájek M, Pereira L. Regional differences in the distribution of the sub-Saharan, West Eurasian, and South Asian mtDNA lineages in Yemen. Am J Phys Anthropol. 2008;136(2):128–137.

[10] Abu-Amero KK, Hellani A, González AM, Larruga JM, Cabrera VM, Underhill PA. Saudi Arabian Y-Chromosome diversity and its relationship with nearby regions. BMC Genet. 2009;10:59.

[11] Cadenas AM, Zhivotovsky LA, Cavalli-Sforza LL, Underhill PA, Herrera RJ. Y-chromosome diversity characterizes the Gulf of Oman. Eur J Hum Genet. 2008;16(3):374–386.

[12] Cann RL, Stoneking M, Wilson AC. Mitochondrial DNA and human evolution. Nature. 1987;325(6099):31–36.

[13] Ingman M, Kaessmann H, Pääbo S, Gyllensten U. Mitochondrial genome variation and the origin of modern humans. Nature. 2000;408(6813):708–713.

[14] Luis JR, Rowold DJ, Regueiro M, Caeiro B, Cinnioğlu C, Roseman C, Underhill PA, Cavalli-Sforza LL, Herrera RJ. The Levant versus the Horn of Africa: Evidence for bidirectional corridors of human migrations. Am J Hum Genet. 2004;74(3):532–544.

[15] Pérez-Miranda AM, Alfonso-Sánchez MA, Peña JA, Herrera RJ. Qatari DNA variation at a crossroad of human migrations. Hum Hered. 2006;61(2):67–79.

[16] Ferri G, Tofanelli S, Alù M, Taglioli L, Radheshi E, Corradini B, Paoli G, Capelli C, Beduschi G. Y-STR variation in Albanian populations: Implications on the match probabilities and the genetic legacy of the minority claiming an Egyptian descent. Int J Legal Med. 2010;124(5):363–370.

[17] González-Pérez E, Esteban E, Via M, Gayà-Vidal M, Athanasiadis G, Dugoujon JM, Luna F, Mesa MS, Fuster V, Kandil M, Harich N, Bissar-Tadmouri N, Saetta A, Moral P. Population relationships in the Mediterranean revealed by autosomal genetic data (Alu and Alu/STR compound systems). Am J Phys Anthropol. 2010;141(3):430–439.

[18] Moorjani P, Patterson N, Hirschhorn JN, Keinan A, Hao L, Atzmon G, Burns E, Ostrer H, Price AL, Reich D. The history of African gene flow into Southern Europeans, Levantines, and Jews. PLoS Genet. 2011;7(4):e1001373.

[19] Ke Y, Su B, Song X, Lu D, Chen L, Li H, Qi C, Marzuki S, Deka R, Underhill P, Xiao C, Shriver M, Lell J, Wallace D, Wells RS, Seielstad M, Oefner P, Zhu D, Jin J, Huang W, Chakraborty R, Chen Z, Jin L. African origin of modern humans in East Asia: A tale of 12,000 Y chromosomes. Science. 2001;292(5519):1151–1153.

[20] Maca-Meyer N, González AM, Larruga JM, Flores C, Cabrera VM. Major genomic mitochondrial lineages delineate early human expansions. BMC Genet. 2001;2:13.

[21] Underhill PA, Passarino G, Lin AA, Shen P, Mirazón Lahr M, Foley RA, Oefner PJ, Cavalli-Sforza LL. The phylogeography of Y chromosome binary haplotypes and the origins of modern human populations. Ann Hum Genet. 2001;65(Pt 1):43–62.

[22] Lahr MM, Field JS. Assessment of the southern dispersal: GIS-based analyses of potential routes at oxygen isotopic stage 4. J World Prehistory. 2005;19:1–45.

[23] Manni F, Leonardi P, Barakat A, Rouba H, Heyer E, Klintschar M, McElreavey K, Quintana-Murci L. Y-chromosome analysis in Egypt suggests a genetic regional continuity in Northeastern Africa. Hum Biol. 2002;74(5):645–658.

[24] Ennafaa H, Cabrera VM, Abu-Amero KK, González AM, Amor MB, Bouhaha R, Dzimiri N, Elgaaı̈ed AB, Larruga JM. Mitochondrial DNA haplogroup H structure in North Africa. BMC Genet. 2009;10:8.

[25] Rowold DJ, Luis JR, Terreros MC, Herrera RJ. Mitochondrial DNA geneflow indicates preferred usage of the Levant Corridor over the Horn of Africa passageway. J Hum Genet. 2007;52(5):436–447.

[26] Saleheen D, Frossard PM. The cradle of the ΔF508 mutation. J Ayub Med Coll Abbottabad. 2008;20(4):157–160.

[27] Obeid T, Tadmouri GO. Initial results of a pilot Arab human variome project. In: Tadmouri GO, Taleb Al Ali M, Al Khaja N, eds. Genetic Disorders in the Arab World: Qatar. Dubai, United Arab Emirates: Centre for Arab Genomic Studies; 2012.

[28] Rose JI. New light on human prehistory in the Arabo-Persian Gulf Oasis. Curr Anthropol. 2010;51:849–883.

[29] Al-Zahery N, Pala M, Battaglia V, Grugni V, Hamod MA, Hooshiar Kashani B, Olivieri A, Torroni A, Santachiara-Benerecetti AS, Semino O. In search of the genetic footprints of Sumerians: A survey of Y-chromosome and mtDNA variation in the Marsh Arabs of Iraq. BMC Evol Biol. 2011;11:288.

[30] Grmek MD. Malaria in the eastern Mediterranean in prehistory and antiquity. Parassitologia. 1994;36:1–6.

[31] de Zulueta J. Malaria and ecosystems: From prehistory to posteradication. Parassitologia. 1994;36:7–15.

[32] Joy DA, Feng X, Mu J, Furuya T, Chotivanich K, Krettli AU, Ho M, Wang A, White NJ, Suh E, Beerli P, Su XZ. Early origin and recent expansion of Plasmodium falciparum. Science. 2003;300(5617):318–321.

[33] Angel JL. Porotic hyperostosis, anemias, malarias, and marshes in the prehistoric Eastern Mediterranean. Science. 1966;153:760–763.

[34] Carter R, Mendis KN. Evolutionary and historical aspects of the burden of malaria. Clin Microbiol Rev. 2002;15:564–594.

[35] Kwiatkowski DP. How malaria has affected the human genome and what human genetics can teach us about malaria. Am J Hum Genet. 2005;77:171–192.

[36] Ziskind B, Halioua B. La tuberculose en ancienne Egypte. Rev Mal Respir. 2007;24(10):1277–1283.

[37] Karlsson EK, Kwiatkowski DP, Sabeti PC. Natural selection and infectious disease in human populations. Nat Rev Genet. 2014;15(6):379–393.

[38] Poolman EM, Galvani AP. Evaluating candidate agents of selective pressure for cystic fibrosis. J R Soc Interface. 2007;4(12):91–98.

[39] Stiehm ER. Disease versus disease: How one disease may ameliorate another. Pediatrics. 2006;117(1):184–191.

[40] Abu-Amero KK, Larruga JM, Cabrera VM, González AM. Mitochondrial DNA structure in the Arabian Peninsula. BMC Evol Biol. 2008;8:45.

[41] Bar-Yosef O. The Natufian culture in the Levant, threshold to the origins of agriculture. Evol Anthropol. 1998;6:159–177.

[42] AlShamali F, Pereira L, Budowle B, Poloni ES, Currat M. Local population structure in Arabian Peninsula revealed by Y-STR diversity. Hum Hered. 2009;68(1):45–54.

[43] Fu W, O’Connor TD, Jun G, Kang HM, Abecasis G, Leal SM, Gabriel S, Rieder MJ, Altshuler D, Shendure J, Nickerson DA, Bamshad MJ, NHLBI Exome Sequencing Project, Akey JM. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature. 2013;493(7431):216–220.

[44] Beech CM, Liyanarachchi S, Shah NP, Sturm AC, Sadiq MF, de la Chapelle A, Tanner SM. Ancient founder mutation is responsible for Imerslund-Gräsbeck Syndrome among diverse ethnicities. Orphanet J Rare Dis. 2011;6:74.

[45] Terreros MC, Rowold DJ, Mirabal S, Herrera RJ. Mitochondrial DNA and Y-chromosomal stratification in Iran: Relationship between Iran and the Arabian Peninsula. J Hum Genet. 2011;56(3):235–246.

[46] Arredi B, Poloni ES, Paracchini S, Zerjal T, Fathallah DM, Makrelouf M, Pascali VL, Novelletto A, Tyler-Smith C. A predominantly neolithic origin for Y-chromosomal DNA variation in North Africa. Am J Hum Genet. 2004;75(2):338–345.

[47] Kujanová M, Pereira L, Fernandes V, Pereira JB, Cerný V. Near eastern neolithic genetic input in a small oasis of the Egyptian Western Desert. Am J Phys Anthropol. 2009;140(2):336–346.

[48] Lucotte G, Aouizérate A, Berriche S. Y-chromosome DNA haplotypes in north African populations. Hum Biol. 2000;72(3):473–480.

[49] Lucotte G, Mercier G. Brief communication: Y-chromosome haplotypes in Egypt. Am J Phys Anthropol. 2003;121(1):63–66.

[50] Irish JD. The Iberomaurusian enigma: North African progenitor or dead end? J Hum Evol. 2000;39:393–410.

[51] Beech M, Cuttler R, Moscrop D, Kallweit H, Martin J. New evidence for the Neolithic settlement of Marawah Island, Abu Dhabi, United Arab Emirates. PSAS. 2005;35:37–56.

[52] Bahri R, El Moncer W, Al-Batayneh K, Sadiq M, Esteban E, Moral P, Chaabani H. Genetic differentiation and origin of the Jordanian population: An analysis of Alu insertion polymorphisms. Genet Test Mol Biomarkers. 2012;16(5):324–329.

[53] Omberg L, Salit J, Hackett N, Fuller J, Matthew R, Chouchane L, Rodriguez-Flores JL, Bustamante C, Crystal RG, Mezey JG. Inferring genome-wide patterns of admixture in Qataris using fifty-five ancestral populations. BMC Genet. 2012;13:49.

[54] Garcia-Bertrand R, Simms TM, Cadenas AM, Herrera RJ. United Arab Emirates: Phylogenetic relationships and ancestral populations. Gene. 2014;533(1):411–419.

[55] Walter H, Matsumoto H, De Stefano GF. Gm and Km allotypes in four Sardinian population samples. Am J Phys Anthropol. 1991;86(1):45–50.

[56] Tadmouri GO, Garguier N, Demont J, Perrin P, Başak AN. History and origin of beta-thalassemia in Turkey: Sequence haplotype diversity of beta-globin genes. Hum Biol. 2001;73(5):661–674.

[57] Gérard N, Berriche S, Aouizérate A, Diéterlen F, Lucotte G. North African Berber and Arab influences in the western Mediterranean revealed by Y-chromosome DNA haplotypes. Hum Biol. 2006;78(3):307–316.

[58] Zalloua PA, Xue Y, Khalife J, Makhoul N, Debiane L, Platt DE, Royyuru AK, Herrera RJ, Hernanz DF, Blue-Smith J, Wells RS, Comas D, Bertranpetit J, Tyler-Smith C, Genographic Consortium. Y-chromosomal diversity in Lebanon is structured by recent historical events. Am J Hum Genet. 2008a;82:873–882.

[59] Loirat F, Hazout S, Lucotte G. G542X as a probable Phoenician cystic fibrosis mutation. Hum Biol. 1997;69(3):419–425.

[60] Zalloua PA, Platt DE, El Sibai M, Khalife J, Makhoul N, Haber M, Xue Y, Izaabel H, Bosch E, Adams SM, Arroyo E, López-Parra AM, Aler M, Picornell A, Ramon M, Jobling MA, Comas D, Bertranpetit J, Wells RS, Tyler-Smith C, Genographic Consortium. Identifying genetic traces of historical expansions: Phoenician footprints in the Mediterranean. Am J Hum Genet. 2008b;83:633–642.

[61] Zahed L, Demont J, Bouhass R, Trabuchet G, Hänni C, Zalloua P, Perrin P. Origin and history of the IVS-I-110 and codon 39 beta-thalassemia mutations in the Lebanese population. Hum Biol. 2002;74:837–847.

[62] Fattoum S, Abbes S. Some data on the epidemiology of hemoglobinopathies in Tunisia. Hemoglobin. 1985;9:423–429.

[63] Ben Abdeladhim A, Aïssaoui B, Boussen M. Homozygous O Arab hemoglobinopathy in a Tunisian family. Apropos of a case. Tunis Med. 1987;65:571–574.

[64] Labie D, Elion J, Beldjord C. On the diversity of beta-globin mutations, a reflection of recent historic events in Israel. Am J Hum Genet. 1994;55(6):1284–1285.

[65] Salem AH, Badr FM, Gaballah MF, Pääbo S. The genetics of traditional living: Y-chromosomal and mitochondrial lineages in the Sinai Peninsula. Am J Hum Genet. 1996;59(3):741–743.

[66] Gutala R, Carvalho-Silva DR, Jin L, Yngvadottir B, Avadhanula V, Nanne K, Singh L, Chakraborty R, Tyler-Smith C. A shared Y-chromosomal heritage between Muslims and Hindus in India. Hum Genet. 2006;120(4):543–551.

[67] Gomes MP, da Costa MG, Braga LB, Cordeiro-Ferreira NT, Loi A, Pirastu M, Cao A. Beta-thalassemia mutations in the Portuguese population. Hum Genet. 1988;78(1):13–15.

[68] Jassim N, Merghoub T, Pascaud O, al Mukharraq H, Ducrocq R, Labie D, Elion J, Krishnamoorthy R, Arrayed SA. Molecular basis of beta-thalassemia in Bahrain: An epicenter for a Middle East specific mutation. Ann N Y Acad Sci. 1998;850:407–409.

[69] Al-Ali AK, Al-Ateeq S, Imamwerdi BW, Al-Sowayan S, Al-Madan M, Al-Muhanna F, Bashaweri L, Qaw F. Molecular Bases of beta-thalassemia in the Eastern Province of Saudi Arabia. J Biomed Biotechnol. 2005a;2005(4):322–325.

[70] Purdey M. The pathogenesis of Machado Joseph Disease: A high manganese/low magnesium initiated CAG expansion mutation in susceptible genotypes? J Am Coll Nutr. 2004;23(6):715S–729S.

[71] Mittal U, Srivastava AK, Jain S, Jain S, Mukerji M. Founder haplotype for Machado-Joseph disease in the Indian population: Novel insights from history and polymorphism studies. Arch Neurol. 2005;62(4):637–640.

[72] Haj Khelil A, Laradi S, Miled A, Tadmouri GO, Ben Chibani J, Perrin P. Clinical and molecular aspects of haemoglobinopathies in Tunisia. Clin Chim Acta. 2004;340:127–137.

[73] Loueslati BY, Cherni L, Khodjet-Elkhil H, Ennafaa H, Pereira L, Amorim A, Ben Ayed F, Ben Ammar Elgaaied A. Islands inside an island: Reproductive isolates on Jerba island. Am J Hum Biol. 2006;18(1):149–153.

[74] González AM, Karadsheh N, Maca-Meyer N, Flores C, Cabrera VM, Larruga JM. Mitochondrial DNA variation in Jordanians and their genetic relationship to other Middle East populations. Ann Hum Biol. 2008;35(2):212–231.

[75] Jalal SD, Al-Allawi NA, Bayat N, Imanian H, Najmabadi H, Faraj A. β-Thalassemia mutations in the Kurdish population of northeastern Iraq. Hemoglobin. 2010;34(5):469–476.

[76] Tadmouri GO, Nair P, Obeid T, Al Ali MT, Al Khaja N, Hamamy HA. Consanguinity and reproductive health among Arabs. Reprod Health. 2009;6:17.

[77] AlKuraya FS. Autozygome decoded. Genet Med. 2010;12(12):765–771.

[78] Fadhlaoui-Zid K, Martinez-Cruz B, Khodjet-el-khil H, Mendizabal I, Benammar-Elgaaied A, Comas D. Genetic structure of Tunisian ethnic groups revealed by paternal lineages. Am J Phys Anthropol. 2011;146(2):271–280.

[79] Tadmouri GO. Genetic disorders in Arabs. In: Tadmouri GO, Taleb Al Ali M, Al Khaja N, eds. Genetic Disorders in the Arab World: Qatar. Dubai, United Arab Emirates: Centre for Arab Genomic Studies; 2012.

[80] Daoud BB, Mosbehi I, Préhu C, Chaouachi D, Hafsia R, Abbes S. Molecular characterization of erythrocyte glucose-6-phosphate dehydrogenase deficiency in Tunisia. Pathol Biol (Paris). 2008;56(5):260–267.

[81] Nafa K, Reghis A, Osmani N, Baghli L, Benabadji M, Kaplan JC, Vulliamy TJ, Luzzatto L. G6PD Aures: A new mutation (48 Ile-->Thr) causing mild G6PD deficiency is associated with favism. Hum Mol Genet. 1993;2(1):81–82.

[82] Niazi GA, Adeyokunnu A, Westwood B, Beutler E. Neonatal jaundice in Saudi newborns with G6PD Aures. Ann Trop Paediatr. 1996;16(1):33–37.

[83] Beutler E. Glucose-6-phosphate dehydrogenase deficiency. In: Williams WJ, Beutler E, Erslev AS, Lichtman MA, eds. Haematology. New York: McGraw-Hill; 1991.

[84] Bayoumi RA, Nur-E-Kamal MS, Tadayyon M, Mohamed KK, Mahboob BH, Qureshi MM, Lakhani MS, Awaad MO, Kaeda J, Vulliamy TJ, Luzzatto L. Molecular characterization of erythrocyte glucose-6-phosphate dehydrogenase deficiency in Al-Ain District, United Arab Emirates. Hum Hered. 1996;46(3):136–141.

[85] AlFadhli S, Kaaba S, Elshafey A, Salim M, AlAwadi A, Bastaki L. Molecular characterization of glucose-6-phosphate dehydrogenase gene defect in the Kuwaiti population. Arch Pathol Lab Med. 2005;129(9):1144–1147.

[86] El-Hazmi MA, Al-Swailem AR, Al-Faleh FZ, Warsy AS. Frequency of glucose-6-phosphate dehydrogenase, pyruvate kinase and hexokinase deficiency in the Saudi population. Hum Hered. 1986;36(1):45–49.

[87] El-Hazmi MA, Warsy AS. Frequency of glucose-6-phosphate dehydrogenase variants and deficiency in Arabia. Gene Geogr. 1990;4(1):15–19.

[88] Karadsheh NS, Moses L, Ismail SI, Devaney JM, Hoffman E. Molecular heterogeneity of glucose-6-phosphate dehydrogenase deficiency in Jordan. Haematologica. 2005;90(12):1693–1694.

[89] Al-Allawi N, Eissa AA, Jubrael JM, Jamal SA, Hamamy H. Prevalence and molecular characterization of Glucose-6-Phosphate dehydrogenase deficient variants among the Kurdish population of Northern Iraq. BMC Blood Disord. 2010;10:6.

[90] Kurdi-Haidar B, Mason PJ, Berrebi A, Ankra-Badu G, al-Ali A, Oppenheim A, Luzzatto L. Origin and spread of the glucose-6-phosphate dehydrogenase variant (G6PD-Mediterranean) in the Middle East. Am J Hum Genet. 1990;47(6):1013–1019.

[91] Aburawi EH. Call for multinational studies of the epidemiology of congenital heart disease in the Arab World. Ibnosina J Med BS. 2013.

[92] Yehya A, Souki R, Bitar F, Nemer G. Differential duplication of an intronic region in the NFATC1 gene in patients with congenital heart disease. Genome. 2006;49(9):1092–1098.

[93] Aqrabawi HE. Facial cleft and associated anomalies: Incidence among infants at a Jordanian medical centre. East Mediterr Health J. 2008;14(2):356–359.

[94] Tas S. Strong association of a single nucleotide substitution in the 3’-untranslated region of the apolipoprotein-CIII gene with common hypertriglyceridemia in Arabs. Clin Chem. 1989;35(2):256–259.

[95] Hussain SS, Buraiki J, Dzimiri N, Butt AI, Vencer L, Basco MC, Khan B. Polymorphism in apoprotein-CIII gene and coronary heart disease. Ann Saudi Med. 1999;19(3):201–205.

[96] Abu-Amero KK, Wyngaard CA, Dzimiri N. Association of the platelet glycoprotein receptor IIIa (PlA1/PlA1) genotype with coronary artery disease in Arabs. Blood Coagul Fibrinolysis. 2004;15(1):77–79.

[97] Al-Ali AK, Al-Muhana FA, Larbi EB, Abdulmohsen MF, Al-Sultan AI, Al-Maden MS, Al-Ateeq SA. Frequency of methylenetetrahydrofolate reductase C677T polymorphism in patients with cardiovascular disease in Eastern Saudi Arabia. Saudi Med J. 2005b;26(12):1886–1888.

[98] Abu-Amero KK, Al-Boudari OM, Mohamed GH, Dzimiri N. T null and M null genotypes of the glutathione S-transferase gene are risk factor for CAD independent of smoking. BMC Med Genet. 2006;7:38.

[99] Abu-Amero KK, Wyngaard CA, Al-Boudari OM, Kambouris M, Dzimiri N. Lack of association of lipoprotein lipase gene polymorphisms with coronary artery disease in the Saudi Arab population. Arch Pathol Lab Med. 2003a;127(5):597–600.

[100] Johansen K, Dunn B, Tan JC, Kwaasi AA, Skotnicki A, Skotnicki M. Coronary artery disease and apolipoprotein A-I/C-III gene polymorphism: A study of Saudi Arabians. Clin Genet. 1991;39(1):1–5.

[101] Dzimiri N, Basco C, Moorji A, Meyer BF. Angiotensin-converting enzyme polymorphism and the risk of coronary heart disease in the Saudi male population. Arch Pathol Lab Med. 2000;124(4):531–534.

[102] Abu-Amero KK, Al-Boudari OM, Mohamed GH, Dzimiri N. Beta 3 adrenergic receptor Trp64Arg polymorphism and manifestation of coronary artery disease in Arabs. Hum Biol. 2005;77(6):795–802.

[103] Cagatay P, Susleyici-Duman B, Ciftci C. Lipoprotein lipase gene PvuII polymorphism serum lipids and risk for coronary artery disease: Meta-analysis. Dis Markers. 2007;23(3):161–166.

[104] Abu-Amero KK, Wyngaard CA, Dzimiri N. Prevalence and role of methylenetetrahydrofolate reductase 677 C-->T and 1298 A-->C polymorphisms in coronary artery disease in Arabs. Arch Pathol Lab Med. 2003b;127(10):1349–1352.

[105] Malouf J, Alam S, Kanj H, Mufarrij A, Der Kaloustian VM. Hypergonadotropic hypogonadism with congestive cardiomyopathy: An autosomal-recessive disorder? Am J Med Genet. 1985;20(3):483–489.

[106] El-Menyar AA, Bener A, Numan MT, Morcos S, Taha RY, Al-Suwaidi J. Epidemiology of idiopathic cardiomyopathy in Qatar during 1996-2003. Med Princ Pract. 2006;15(1):56–61.

[107] Al Fadley F, Al Manea W, Nykanen DG, Al Fadley A, Bulbul Z, Al Halees Z. Severe tortuosity and stenosis of the systemic, pulmonary and coronary vessels in 12 patients with similar phenotypic features: A new syndrome? Cardiol Young. 2000;10(6):582–589.

[108] Coucke PJ, Wessels MW, Van Acker P, Gardella R, Barlati S, Willems PJ, Colombi M, De Paepe A. Homozygosity mapping of a gene for arterial tortuosity syndrome to chromosome 20q13. J Med Genet. 2003;40(10):747–751.

[109] Faiyaz-Ul-Haque M, Zaidi SH, Al-Sanna N, Alswaid A, Momenah T, Kaya N, Al-Dayel F, Bouhoaigah I, Saliem M, Tsui LC, Teebi AS. A novel missense and a recurrent mutation in SLC2A10 gene of patients affected with arterial tortuosity syndrome. Atherosclerosis. 2009;203(2):466–471.

[110] Bhuiyan ZA, Momenah TS, Amin AS, Al-Khadra AS, Alders M, Wilde AA, Mannens MM. An intronic mutation leading to incomplete skipping of exon-2 in KCNQ1 rescues hearing in Jervell and Lange-Nielsen syndrome. Prog Biophys Mol

Biol. 2008;98(2-3):319–327.

[111] Stuhrmann M, Bukhari IA, El-Harith el-HA. Naxos disease in an Arab family is not caused by the Pk2157del2 mutation. Evidence for exclusion of the plakoglobin gene. Saudi Med J. 2004;25(10):1449–1452.

[112] Tadmouri GO, Nair P. Cancers in Arab populations: Concise notes. Hamdan Medical J. 2012;5(1):79–82.

[113] Polyak K. Molecular alterations in ductal carcinoma in situ of the breast. Curr Opin Oncol. 2002;14(1):92–96.

[114] Ayad E, Francis I, Peston D, Shousha S. Triple negative, basal cell type and EGFR positive invasive breast carcinoma in Kuwaiti and British patients. Breast J. 2009;15(1):109–111.

Page 406 of 408

Tadmouri, Sastry & Chouchane. Global Cardiology Science and Practice 2014:54

[115] Rouba A, Kaisi N, Al-Chaty E, Badin R, Pals G, Young C, Worsham MJ. Patterns of allelic loss at the BRCA1 locus in Arabic women with breast cancer. Int J Mol Med. 2000;6(5):565–569.

[116] Tazzite A, Jouhadi H, Nadifi S, Aretini P, Falaschi E, Collavoli A, Benider A, Caligo MA. BRCA1 and BRCA2 germline mutations in Moroccan breast/ovarian cancer families: Novel mutations and unclassified variants. Gynecol Oncol.

2012;125(3):687–692.

[117] Uhrhammer N, Abdelouahab A, Lafarge L, Feillel V, Ben Dib A, Bignon YJ. BRCA1 mutations in Algerian breast cancer patients: High frequency in young, sporadic cases. Int J Med Sci. 2008;5(4):197–202.

[118] Troudi W, Uhrhammer N, Romdhane KB, Sibille C, Amor MB, Khodjet El Khil H, Jalabert T, Mahfoudh W, Chouchane L, Ayed FB, Bignon YJ, Elgaaied AB. Complete mutation screening and haplotype characterization of BRCA1 gene in

Tunisian patients with familial breast cancer. Cancer Biomark. 2008;4(1):11–18.

[119] Mahfoudh W, Bouaouina N, Ahmed SB, Gabbouj S, Shan J, Mathew R, Uhrhammer N, Bignon YJ, Troudi W, Elgaaied AB, Hassen E, Chouchane L. Hereditary breast cancer in Middle Eastern and North African (MENA) populations:

Identification of novel, recurrent and founder BRCA1 mutations in the Tunisian population. Mol Biol Rep.

2012;39(2):1037–1046.

[120] Chouchane L, Boussen H, Sastry KS. Breast cancer in Arab populations: Molecular characteristics and disease management implications. Lancet Oncol. 2013;14(10):e417–e424.

[121] Kadouri L, Bercovich D, Elimelech A, Lerer I, Sagi M, Glusman G, Shochat C, Korem S, Hamburger T, Nissan A, Abu-Halaf N, Badrriyah M, Abeliovich D, Peretz T. A novel BRCA-1 mutation in Arab kindred from east Jerusalem with

breast and ovarian cancer. BMC Cancer. 2007;7:14.

[122] Jalkh N, Nassar-Slaba J, Chouery E, Salem N, Uhrchammer N, Golmard L, Stoppa-Lyonnet D, Bignon YJ, Mégarbané A. Prevalance of BRCA1 and BRCA2 mutations in familial breast cancer patients in Lebanon. Hered Cancer Clin Pract.

2012;10(1):7.

[123] El-Harith el-HA, Abdel-Hadi MS, Steinmann D, Dork T. BRCA1 and BRCA2 mutations in breast cancer patients from Saudi Arabia. Saudi Med J. 2002;23(6):700–704.

[124] Riahi A, Kharrat M, Ghourabi ME, Khomsi F, Gamoudi A, Lariani I, May AE, Rahal K, Chaabouni-Bouhamed H. Mutation spectrum and prevalence of BRCA1 and BRCA2 genes in patients with familial and early-onset breast/ovarian cancer

from Tunisia. Clin Genet. 2013;, Dec 28.

[125] Shatavi SV, Dohany L, Chisti MM, Jaiyesimi IA, Zakalik D. Unique genetic characteristics of BRCA mutation carriers in a cohort of Arab American women. J Clin Oncol. 2013;31(suppl):abstr 1541.

[126] Al-Qasem AJ, Toulimat M, Eldali AM, Tulbah A, Al-Yousef N, Al-Daihan SK, Al-Tassan N, Al-Tweigeri T, Aboussekhra A. TP53 genetic alterations in Arab breast cancer patients: Novel mutations, pattern and distribution. Oncol Lett.

2011;2(2):363–369.

[127] Snoussi K, Mahfoudh W, Bouaouina N, Ahmed SB, Helal AN, Chouchane L. Genetic variation in IL-8 associated with increased risk and poor prognosis of breast carcinoma. Hum Immunol. 2006;67(1-2):13–21.

[128] AlShatwi AA, Hasan TN, Shafi G, Alsaif MA, Al-Hazzani AA, Alsaif AA. A single-nucleotide polymorphism in the TP53 and MDM-2 gene modifies breast cancer risk in an ethnic Arab population. Fundam Clin Pharmacol.

2012;26(3):438–443.

[129] Lajin B, Alhaj Sakur A, Ghabreau L, Alachkar A. Association of polymorphisms in one-carbon metabolizing genes with breast cancer risk in Syrian women. Tumour Biol. 2012;33(4):1133–1139.

[130] Levine DA, Bogomolniy F, Yee CJ, Lash A, Barakat RR, Borgen PI, Boyd J. Frequent mutation of the PIK3CA gene in ovarian and breast cancers. Clin Cancer Res. 2005;11(8):2875–2878.

[131] Wang Y, Helland A, Holm R, Kristensen GB, Børresen-Dale AL. PIK3CA mutations in advanced ovarian carcinomas. Hum Mutat. 2005;25(3):322.

[132] Abubaker J, Bavi P, Al-Haqawi W, Jehan Z, Munkarah A, Uddin S, Al-Kuraya KS. PIK3CA alterations in Middle Eastern ovarian cancers. Mol Cancer. 2009;8:51.

[133] Teebi AS. Genetic disorders among Arab populations. Second Edition. Berlin, Heidelberg, Germany: Springer-Verlag; 2010.

[134] Laarabi FZ, Cherkaoui Jaouad I, Baert-Desurmont S, Ouldim K, Ibrahimi A, Kanouni N, Frebourg T, Sefiani A. The first mutations in the MYH gene reported in Moroccan colon cancer patients. Gene. 2012;496(1):55–58.

[135] Aissi-Ben Moussa S, Moussa A, Lovecchio T, Kourda N, Najjar T, Ben Jilani S, El Gaaied A, Porchet N, Manai M, Buisine MP. Identification and characterization of a novel MLH1 genomic rearrangement as the cause of HNPCC in a Tunisian

family: Evidence for a homologous Alu-mediated recombination. Fam Cancer. 2009;8(2):119–126.

[136] Bougatef K, Marrakchi R, Ouerhani S, Sassi R, Moussa A, Kourda N, Blondeau Lahely Y, Najjar T, Ben Jilani S, Soubrier F, Ben Ammar Elgaaied A. No evidence of the APC D1822V missense variant’s pathogenicity in Tunisian

patients with sporadic colorectal cancer. Pathol Biol (Paris). 2009;57(3):e67–e71.

[137] Patael Y, Figer A, Gershoni-Baruch R, Papa MZ, Risel S, Shtoyerman-Chen R, Karasik A, Theodor L, Friedman E. Common origin of the I1307K APC polymorphism in Ashkenazi and non-Ashkenazi Jews. Eur J Hum Genet.

1999;7(5):555–559.

[138] Drucker L, Shpilberg O, Neumann A, Shapira J, Stackievicz R, Beyth Y, Yarkoni S. Adenomatous polyposis coli I1307K mutation in Jewish patients with different ethnicity: Prevalence and phenotype. Cancer. 2000;88(4):755–760.

[139] Niell BL, Long JC, Rennert G, Gruber SB. Genetic anthropology of the colorectal cancer-susceptibility allele APC I1307K: Evidence of genetic drift within the Ashkenazim. Am J Hum Genet. 2003;73(6):1250–1260.

[140] Chan AO, Soliman AS, Zhang Q, Rashid A, Bedeir A, Houlihan PS, Mokhtar N, Al-Masri N, Ozbek U, Yaghan R, Kandilci A, Omar S, Kapran Y, Dizdaroglu F, Bondy ML, Amos CI, Issa JP, Levin B, Hamilton SR. Differing DNA

methylation patterns and gene mutation frequencies in colorectal carcinomas from Middle Eastern countries. Clin

Cancer Res. 2005;11(23):8281–8287.

[141] Sfar S, Hassen E, Saad H, Mosbah F, Chouchane L. Association of VEGF genetic polymorphisms with prostate carcinoma risk and clinical outcome. Cytokine. 2006;35(1-2):21–28.

[142] Sfar S, Saad H, Mosbah F, Gabbouj S, Chouchane L. TSP1 and MMP9 genetic variants in sporadic prostate cancer. Cancer Genet Cytogenet. 2007;172(1):38–44.

Page 407 of 408

Tadmouri, Sastry & Chouchane. Global Cardiology Science and Practice 2014:54

[143] Al-Dayel F, Al-Rasheed M, Ibrahim M, Bu R, Bavi P, Abubaker J, Al-Jomah N, Mohamed GH, Moorji A, Uddin S, Siraj AK, Al-Kuraya K. Polymorphisms of drug-metabolizing enzymes CYP1A1, GSTT and GSTP contribute to the development of diffuse large B-cell lymphoma risk in the Saudi Arabian population. Leuk Lymphoma. 2008;49(1):122–129.

[144] Siraj AK, Ibrahim M, Al-Rasheed M, Abubaker J, Bu R, Siddiqui SU, Al-Dayel F, Al-Sanea O, Al-Nuaim A, Uddin S, Al- Kuraya K. Polymorphisms of selected xenobiotic genes contribute to the development of papillary thyroid cancer susceptibility in Middle Eastern population. BMC Med Genet. 2008;9:61.

[145] Ouerhani S, Tebourski F, Slama MR, Marrakchi R, Rabeh M, Hassine LB, Ayed M, Elgaaı̈ed AB. The role of glutathione transferases M1 and T1 in individual susceptibility to bladder cancer in a Tunisian population. Ann Hum Biol. 2006;33(5-6):529–535.

[146] Ouerhani S, Oliveira E, Marrakchi R, Ben Slama MR, Sfaxi M, Ayed M, Chebil M, Amorim A, El Gaaied AB, Prata MJ. Methylenetetrahydrofolate reductase and methionine synthase polymorphisms and risk of bladder cancer in a Tunisian population. Cancer Genet Cytogenet. 2007;176(1):48–53.

[147] Ozçelik T, Kanaan M, Avraham KB, Yannoukakos D, Mégarbané A, Tadmouri GO, Middleton L, Romeo G, King MC, Levy-Lahad E. Collaborative genomics for human health and cooperation in the Mediterranean region. Nat Genet. 2010;42(8):641–645.

Page 408 of 408

Tadmouri, Sastry & Chouchane. Global Cardiology Science and Practice 2014:54

Genetic heterogeneity of Arab pop-HLA gene-2018 (1).pdf

RESEARCH ARTICLE

The genetic heterogeneity of Arab populations as inferred from HLA genes

Abdelhafidh Hajjej 1*, Wassim Y. Almawi 2¤, Antonio Arnaiz-Villena 3, Lasmar Hattab 4, Slama Hmida 1

1 Department of Immunogenetics, National Blood Transfusion Center, Tunis, Tunisia, 2 Department of Medicine, Harvard Medical School, Boston, MA, United States of America, 3 Department of Immunology, University Complutense, School of Medicine, Madrid Regional Blood Center, Madrid, Spain, 4 Department of Medical Analysis, Hospital of Gabes (Ghannouch), Gabes, Tunisia

¤ Current address: School of Pharmacy, Lebanese American University, Byblos, Lebanon

* [email protected]

Abstract

This is the first genetic anthropology study on Arabs in the MENA (Middle East and North Africa) region. The present meta-analysis included 100 populations from 36 Arab and non-Arab communities, comprising 16,006 individuals, and evaluates the genetic profile of Arabs using HLA class I (A, B) and class II (DRB1, DQB1) genes. A total of 56 Arab populations comprising 10,283 individuals were selected from several databases, and were compared with 44 Mediterranean, Asian, and sub-Saharan populations. The most frequent alleles in Arabs are A*01, A*02, B*35, B*51, DRB1*03:01, DRB1*07:01, DQB1*02:01, and DQB1*03:01, while DRB1*03:01-DQB1*02:01 and DRB1*07:01-DQB1*02:02 are the most frequent class II haplotypes. Dendrograms, correspondence analyses, genetic distances, and haplotype analysis indicate that Arabs could be stratified into four groups. The first consists of North Africans (Algerians, Tunisians, Moroccans, and Libyans) and the first Arabian Peninsula cluster (Saudis, Kuwaitis, and Yemenis), who appear to be related to Western Mediterraneans, including Iberians; this might be explained by a massive migration into these areas when the Sahara underwent a relatively rapid desiccation, starting about 10,000 years BC. The second includes Levantine Arabs (Palestinians, Jordanians, Lebanese, and Syrians), along with Iraqis and Egyptians, who are related to Eastern Mediterraneans. The third comprises Sudanese and Comorians, who tend to cluster with Sub-Saharans. The fourth comprises the second Arabian Peninsula cluster, made up of Omanis, Emiratis, and Bahrainis. It is noteworthy that the two large minorities (Berbers and Kurds) are indigenous (autochthonous), and are not genetically different from “host” and neighboring populations. In conclusion, this study confirmed high genetic heterogeneity among present-day Arabs, especially those of the Arabian Peninsula.

OPEN ACCESS

Citation: Hajjej A, Almawi WY, Arnaiz-Villena A, Hattab L, Hmida S (2018) The genetic heterogeneity of Arab populations as inferred from HLA genes. PLoS ONE 13(3): e0192269. https://doi.org/10.1371/journal.pone.0192269

Editor: Amr H Sawalha, University of Michigan, UNITED STATES

Received: November 6, 2017

Accepted: January 19, 2018

Published: March 9, 2018

Copyright: © 2018 Hajjej et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: All relevant data are within the paper and its Supporting Information files.

Funding: The authors received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

PLOS ONE | https://doi.org/10.1371/journal.pone.0192269 March 9, 2018 1 / 24

Introduction

The human leukocyte antigen (HLA) system plays a key role in self-nonself recognition. It is divided into class I (HLA-A, -B, and -C) and class II (HLA-DP, -DQ, and -DR) loci, and comprises 220 genes in a 3.6 Mb region on the short arm of chromosome 6. The HLA system is highly polymorphic, with in excess of 17,000 alleles detected. For example, there are 4,828 B, 3,968 A, and 3,579 C class I alleles, compared with 2,103 DRB1 and 1,142 DQB1 class II alleles. Several HLA alleles have been associated with various autoimmune and infectious diseases [1]. HLA class I and class II loci are characterized by high (80–90%) heterozygosity, and thus constitute reliable genetic markers for phylogenetic and anthropological studies.

Population studies have confirmed that the frequencies of HLA alleles and haplotypes vary with ethnicity and geographic origin. Because HLA markers are expressed codominantly, heterozygotes can be distinguished from homozygotes, which allows genotypes to be assigned and allele frequencies to be estimated [2]. Linkage disequilibrium (LD) analysis between HLA alleles can be used to estimate the number of generations separating two closely related populations since their divergence. Diversity in haplotype distribution, allele frequency, and LD reflects the extent of variation between closely related populations. Allele frequency-based genetic distance analysis allows construction of phylogenetic trees (dendrograms), from which a relative estimate can be inferred of the time elapsed since the populations existed as single cohesive units [3–6].
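As an illustration of why codominance matters, the following minimal sketch estimates allele frequencies by direct gene counting and then derives the expected heterozygosity. The five genotypes are hypothetical and serve only to show the arithmetic; the function names are likewise illustrative and are not taken from the study.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies by direct gene counting.

    genotypes: list of (allele1, allele2) tuples, one per typed individual.
    With codominant markers both alleles of each individual are observed,
    so every allele can simply be counted.
    """
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * len(genotypes)  # each individual carries two alleles
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(freqs):
    """Expected heterozygosity under Hardy-Weinberg: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


# Hypothetical sample of five individuals typed at HLA-A (illustrative only).
sample = [
    ("A*02", "A*01"),
    ("A*02", "A*02"),
    ("A*03", "A*24"),
    ("A*01", "A*30"),
    ("A*02", "A*68"),
]

freqs = allele_frequencies(sample)
het = expected_heterozygosity(freqs)
print(freqs)
print(round(het, 2))
```

With serological or dominant markers this counting would not be possible without further assumptions, because heterozygotes could not be told apart from homozygotes.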

Arabs are a major panethnic group, and their union, the Arab League, is a cultural and ethnic union of 22 member states. As of 2013, the Arab League countries had 357 million nationals, populating an area of 13 million km² that straddles Africa and Asia [7]. Ethnic, religious, and linguistic diversity (triple heterogeneity) characterizes Arabs. Most Arabs follow Islam, and Christianity is the second largest religion, with over 15 million Christians. There are also smaller but significant religious minorities (such as Druze and Jews), and a number of non-Arab ethnic minorities (such as Berbers and Kurds) [7, 8].

The history of Arabs extends from circa 1200 BC, when the Southern Arabian Peninsula was ruled by three successive civilizations: the Mineans, who established their capital at Karna (1200–650 BC), the Sabeans in Marib (1000 BC–570 AD), and the Himyarites (2nd–6th centuries AD) in Dhafar (Oman) [9–11]. These civilizations were built by authentic Yemeni tribes. The kingdom of Kinda was established in Central Arabia in the 4th to early 6th century AD, while the Dilmun civilization was founded in Eastern Arabia. In the 3rd century AD, the East African Kingdom of Aksum extended into Yemen and Western Saudi Arabia [12]. In addition, the Lakhmids (of Yemeni origin) established a dynasty that ruled part of present-day Iraq and Syria in 300–602 AD [10, 13, 14]. The Arab Christian Ghassanids (220–638 AD), originating from Southern Arabia, migrated in the 3rd century to Jordan, where they established a kingdom that extended from Syria to Yathrib (Saudi Arabia) [12, 13]. Islam was introduced to the Arabian Peninsula in 610 AD. Shortly thereafter, the Arabian tribes were united as a single Islamic state in the Arabian Peninsula, spearheaded by the Islamic prophet Muhammad. This Islamic state progressively grew in area and in the types and numbers of its populations, and came to extend from Andalusia (Spain) in the west to the Indus in the east [14].

The subsequent spread of Islam involved the swift invasion of Persia (637–651 AD), Iraq, the Levant, and Egypt (639 AD), extending into North Africa (640–709) and to Spain, Portugal, and France (Poitiers) in the 8th century AD. Eastwards, Arab expansion into Central Asia, Bukhara (Uzbekistan), Afghanistan (637–709), and the Indus border (664–712) followed. Northwards, Arab invaders came into contact with the Byzantine Empire and with the Caspian and Caucasus regions [15, 16]. With the Islamic expansion from the 7th century, social and political groups were gradually Arabized. The spread of Arab-Muslim culture was at the expense of local languages (such as Berber and Kurdish), especially in the Middle East and North Africa, resulting in Arabized populations speaking variants of Arabic mixed with the original languages (dialects). The extent of Arab gene exchange with these autochthonous groups is undetermined, but is thought to be lower than the religious and cultural influence.

Genetic heterogeneity of Arabs

Given the large number of conquests, Arabs were in contact with different ethnicities residing in a vast area stretching from Mauritania (West Africa) to the western border of China (East Asia). This suggests that cultural and perhaps genetic relationships were established with these ethnic groups. This work aims to study the HLA distribution in North African and Oriental Arab populations, and to compare them with neighboring populations (Sub-Saharan Africans, Europeans, and Asians).

Populations and methods

Search strategy

Datasets of HLA allele frequencies were collected through a systematic review performed per the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) criteria [only criteria 1–10, 17, and 26 are applicable to this type of study (S1 Checklist)] [17]. The PubMed, ScienceDirect, AlleleFrequencies.net, and ResearchGate databases were searched for all papers on HLA polymorphism and HLA disease associations in Arabs. This systematic literature search, covering papers published up to May 31, 2017, was conducted by two investigators (H.A and H.L); the search terms used were ‘HLA Arabs’ or ‘Human Leukocyte Antigen Arabs’. A search per country followed (‘HLA Tunisians’, ‘HLA Saudis’, and so on) and was repeated for the remaining countries, resulting in more than 50 keywords. A database from the International Histocompatibility Workshops was also used. Some authors were also contacted by e-mail or through ResearchGate to request information and missing data. While most datasets were taken from studies with an explicit anthropological focus, control groups from case-control disease studies were also used. No language restriction was applied to this search.

Inclusion and exclusion criteria

All included studies met the following criteria: HLA allele frequencies had to be obtained by molecular typing, and subjects had to be typed for at least one of HLA-A, HLA-B, HLA-DRB1, and HLA-DQB1. Publications were excluded if they reported serological data, had a sample size of fewer than 35 individuals, typed individuals (or controls) who were related or not randomly selected, or presented duplicate datasets. Studies were also excluded if they presented incomplete or partial allele frequencies, or if there were significant ambiguities in the typing.
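The inclusion and exclusion rules above can be expressed as a simple filter. The sketch below is only an illustration of the stated criteria: the `StudyRecord` fields and the two example records are hypothetical and do not correspond to any dataset in the study.

```python
from dataclasses import dataclass

# Loci and sample-size threshold as stated in the criteria above.
ACCEPTED_LOCI = {"HLA-A", "HLA-B", "HLA-DRB1", "HLA-DQB1"}
MIN_SAMPLE_SIZE = 35


@dataclass
class StudyRecord:
    """Hypothetical record describing one candidate dataset."""
    name: str
    typing_method: str          # "molecular" or "serological"
    typed_loci: frozenset       # loci typed in the study
    sample_size: int
    unrelated_random: bool      # subjects unrelated and randomly selected
    duplicate: bool             # dataset already reported elsewhere
    complete_frequencies: bool  # full allele frequency table available
    ambiguous_typing: bool      # significant typing ambiguities


def is_eligible(rec):
    """Return True if the record passes every stated criterion."""
    return (
        rec.typing_method == "molecular"
        and bool(rec.typed_loci & ACCEPTED_LOCI)
        and rec.sample_size >= MIN_SAMPLE_SIZE
        and rec.unrelated_random
        and not rec.duplicate
        and rec.complete_frequencies
        and not rec.ambiguous_typing
    )


# Two made-up candidates: one acceptable, one serological and too small.
records = [
    StudyRecord("example-1", "molecular", frozenset({"HLA-DRB1", "HLA-DQB1"}),
                120, True, False, True, False),
    StudyRecord("example-2", "serological", frozenset({"HLA-A"}),
                30, True, False, True, False),
]
eligible = [r.name for r in records if is_eligible(r)]
print(eligible)
```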

Data extraction

Studies were independently selected by two authors (H.A and H.L). An external referee was invited to resolve any disagreement that the two reviewers could not settle. Data extracted from the selected papers included publication year, study type (anthropology, association), sample size, HLA-A, -B, -C, -DRB1, and -DQB1 allele frequencies, haplotype frequencies, region, country, and typed loci.

Statistical analysis

A three-dimensional correspondence analysis and its bi-dimensional representation were performed using VISTA V5.02 software [18]. Phylogenetic trees were constructed from allele frequencies using the Neighbor-Joining (NJ) method [19] and standard genetic distances (SGD) [20], using the DISPAN software, which contains the GNKDST and TREEVIEW programs [21, 22].
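To make the distance step concrete, here is a minimal sketch of a standard genetic distance computed from allele frequencies. It assumes that SGD denotes Nei's standard genetic distance, D = -ln(J_xy / sqrt(J_x J_y)), which the text does not spell out; the frequencies are hypothetical, and the function is an illustration rather than a reimplementation of DISPAN.

```python
import math


def nei_standard_distance(pop_x, pop_y):
    """Nei's standard genetic distance between two populations.

    pop_x, pop_y: dict mapping locus name -> dict of allele -> frequency.
    Only loci present in both populations are used (an assumption made
    here for simplicity).
    """
    loci = sorted(set(pop_x) & set(pop_y))
    if not loci:
        raise ValueError("the two populations share no typed locus")
    jx = jy = jxy = 0.0
    for locus in loci:
        fx, fy = pop_x[locus], pop_y[locus]
        alleles = set(fx) | set(fy)
        jx += sum(fx.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(fy.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(fx.get(a, 0.0) * fy.get(a, 0.0) for a in alleles)
    n = len(loci)
    # Normalised genetic identity, averaged over loci.
    identity = (jxy / n) / math.sqrt((jx / n) * (jy / n))
    return -math.log(identity)


# Hypothetical single-locus frequencies (illustrative only).
pop_a = {"DRB1": {"*03:01": 0.5, "*07:01": 0.5}}
pop_b = {"DRB1": {"*03:01": 0.5, "*07:01": 0.5}}
pop_c = {"DRB1": {"*03:01": 0.5, "*11:01": 0.5}}

d_ab = nei_standard_distance(pop_a, pop_b)  # identical frequencies
d_ac = nei_standard_distance(pop_a, pop_c)  # one shared allele out of two
print(d_ab, d_ac)
```

Identical frequency profiles give a distance of zero, and the distance grows as the two populations share fewer alleles or share them at more divergent frequencies.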

Results

Study flow

The use of more than fifty keywords allowed the identification of 5,456 papers and HLA datasets, of which 315 were deemed relevant to the study. Of these, 42 articles and 11 HLA datasets, containing information on 56 Arab populations and meeting the study criteria, were included. The study flow is illustrated in Fig 1. In addition, 20 articles and 18 HLA datasets meeting the criteria of this study and containing complete information on 44 other populations were selected, but without going through the systematic review. The populations used in the comparison were chosen mainly from countries neighboring the Arab countries. This study relied on a database of 100 populations (of which data for 11 populations were extracted from association studies) from 36 Arab and other countries in Asia, Europe, and Africa. The distribution of populations by region is illustrated in Fig 2A. These populations represent allele frequency data for 16,006 individuals (a mean of 160.06 individuals per population), drawn from 63 references.

Selected populations

Arab populations. The 42 articles and 11 HLA datasets (http://www.allelefrequencies.net) selected provided information on 56 populations (Table 1), comprising 10,283 individuals [23–67]. These 56 populations, of different ethnic and religious backgrounds, were selected from 18 Arab countries. There were no reliable HLA data for the remaining countries (Somalia, Djibouti, Mauritania, and Qatar) (Fig 2B). The studied populations are divided into 29 African (26 North African and 3 Sub-Saharan) and 27 Asian populations (13 Levantine and 14 from the Arabian Peninsula). With the exception of 8 populations [28, 38, 47, 48, 50, 52, 53, 55], for which HLA data were extracted from association studies, data for the remaining populations were extracted from anthropological studies.

Neighboring populations. Forty-four worldwide populations [23, 34, 39, 66, 68–85], comprising 5,723 individuals, were selected from 18 countries on three continents, using the same criteria previously described (Table 2). These comprised 22 European, 11 non-Arab Asian, and 11 Sub-Saharan African populations. Of the 11 Asian populations, two were Arab minorities living in Iran (Khuzestan and Famoori).

Data for only three populations [74, 75, 84] were extracted from association studies. These populations were typed for at least one of HLA-A, -B, -DRB1, or -DQB1.
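The population and sample counts reported in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated in the text; the variable names are illustrative.

```python
# Counts as reported in the Results text.
arab = {"populations": 56, "individuals": 10_283}
neighboring = {"populations": 44, "individuals": 5_723}
reported_total = {"populations": 100, "individuals": 16_006}

# Regional breakdowns as reported.
arab_by_region = {
    "North African": 26,
    "Sub-Saharan": 3,
    "Levantine": 13,
    "Arabian Peninsula": 14,
}
neighboring_by_region = {
    "European": 22,
    "Asian": 11,
    "Sub-Saharan African": 11,
}

total_populations = arab["populations"] + neighboring["populations"]
total_individuals = arab["individuals"] + neighboring["individuals"]
mean_per_population = total_individuals / total_populations

# Each check compares a derived sum with the figure stated in the text.
checks = {
    "total populations": total_populations == reported_total["populations"],
    "total individuals": total_individuals == reported_total["individuals"],
    "Arab regional split": sum(arab_by_region.values()) == arab["populations"],
    "neighboring regional split":
        sum(neighboring_by_region.values()) == neighboring["populations"],
}
print(checks)
print(round(mean_per_population, 2))
```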

HLA allele frequencies features of Arab populations

Table 3 shows the most frequent HLA-A and -B alleles in Arab populations. A*02 was the most prevalent allele, and its frequency exceeded 25% in some populations, such as Saudis (30.4%) [23], Tunisian Berbers of Zrawa (29.3%) [24], Moroccans (26.2%) [25], and Sudanese (25.9%) [23]. The A*01, *03, *24, *30, and *68 alleles were also common in most Arab populations. For example, the highest frequencies of A*01 were seen in Tunisians (15%) [26] and Moroccans (14.8%) [25], while A*03 was prevalent among Iraqi Kurds (15.1%) [23], and A*30 among Sudanese (17.6%) [23]. In addition, A*24 was common among Lebanese-Armenians (17.3%) [27], while A*68 was prevalent in Saudis (10.5%) [28]. In contrast, A*25, *28, *34, *36, *43, *66, *69, *74, and *80 are rare among Arabs. It is noteworthy that A*34, although rare among Arabs in general, is found at a high frequency (22.2%) in Tunisian Berbers from Zrawa [24], the highest reported for any population worldwide.

Results for the HLA-B locus are also presented in Table 3. B*35 was the most frequent B* allele in Palestinians (20.3%) [29] and Lebanese-Armenians (19.8%) [27], and was found at varied frequencies in Iraqi Kurds (15.6%) [23], Omanis (15.3%) [30], Jordanians (14.9%) [31], and Emiratis (11.1%) [23]. B*51 was the second most frequent allele, with high frequencies recorded for Saudis (19.3%) [23], Omanis (17.5%) [30], and Emiratis (15.6%) [23]. B*50 was also a frequent B* allele in most Arabs, including Saudis (18.8%) [23] and Libyans (16.1%) [31], along with B*08, and B*44 among the Tunisian Berbers of Zrawa (32.8%) [24], the latter being the highest frequency worldwide. Similarly, the frequency of B*27 is highest among Jordanians (27.1%) [31]. In contrast, the B*37, *42, *46, *47, *48, *54, *59, *67, and *78 alleles are extremely rare or virtually absent in all Arab populations.

The most common DRB1 and DQB1 alleles among Arabs are shown in Table 4. DRB1*07:01 was the most frequent allele among Tunisians from Ghannouch (28.6%) [33], Jordanians (26.9%) [31], and Saudis (26.6%) [23], while Egyptians (8.3%) and Sudanese had the lowest frequencies of DRB1*07:01. DRB1*03:01 was the second most frequent DRB1* allele in some Arabs, such as Tunisians of Tunis (21.9%) [34] and Moroccans of Metelsa (20.2%) [23], but rare in Jordanians (2.4%) [31]. DRB1*11:01 was also frequent among some Arabs, such as Lebanese (36.8%) [35], but rare among Saudis (4.8%) and Moroccans of Chaouya (2.5%) [23]. Furthermore, the DRB1*13:01, *13:02, and *15:01 alleles are relatively frequent among Arabs. A high frequency of DRB1*13:01 was recorded for Sudanese (23.3%), while DRB1*13:02 was virtually absent in Bahrainis [35] and Sudanese [23]. All DRB1*09, *12, and *14 subtypes are extremely rare among Arabs. In addition, DRB1*16 subtypes are rare in all Arab populations except for Bahrainis, in whom DRB1*16:01 is found at a high frequency (13.9%) [35].

Fig 1. Flow diagram of the study selection process.

https://doi.org/10.1371/journal.pone.0192269.g001

Fig 2. The distribution of studied populations by region (A) and country (B).

https://doi.org/10.1371/journal.pone.0192269.g002

The DQB1*02:0X and *03:01 alleles are the most frequent DQB1* alleles in Arabs. The highest frequencies of DQB1*02:0X were reported for Tunisians (Ghannouch; 40.01%) [33], Yemenite Jews (39.1%) [36], Moroccans (Agadir-Souss; 37.8%) [37], and Saudis (37.3%) [23], while the lowest frequency was found in Egyptians (6%) [38]. On the other hand, DQB1*03:01 is very common among Lebanese (45%) [39] and Algerians (Oran; 35.1%) [23], but not Saudis (7.6%) [23]. DQB1*03:02 and *05:01 are also frequent in most Arabs, such as Tunisians (Ghannouch; 20.7%) [33], Jordanians (17.8%) [31], Palestinians (17.6%) [29], and Lebanese (16.8%) [35]. DQB1*05:01 is frequent among Bahrainis (29.2%) [35], Tunisians (Berbers of Jerba; 22.7%) [40], and Lebanese (20.5%) [35]. Among DQB1*06 subtypes, DQB1*06:02 and *06:03 were the most frequent in most Arab populations, but were absent in Bahrainis, in whom DQB1*06:01 is very frequent (13.20%) [35]. Furthermore, all DQB1*04 subtypes are rare among Arabs, particularly DQB1*04:01, which is virtually absent except in Egyptians (10.17%) [38]. The most common DQB1*04 subtype in Arabs is DQB1*04:02.

Table 1. List of Arab populations used in the present work.

No. | Population | Symbol | Size | References | No. | Population | Symbol | Size | References
1 | Algiers | Alg | 102 | [67] | 29 | Comorians | Com | 117 | [43]
2 | Algerians-B | Alg-B | 97 | [23] | 30 | Jordanians | Jor | 146 | [31]
3 | Algerians-A | Alg-A | 132 | [48] | 31 | Jordanians-A | Jor-A | 1254 | [46]
4 | Algerians-Oran | Ora | 100 | [23] | 32 | Syrians | Syr | 200 | [47]
5 | Gabesians | Gab | 77 | [59] | 33 | Syrians-A | Syr-A | 225 | [58]
6 | Gabesians-A | Gab-A | 96 | [40] | 34 | Lebanese | Leb | 95 | [35]
7 | Ghannouchians | Gha | 82 | [33] | 35 | Lebanese-A | Leb-A | 1123 | [45]
8 | Berbers-Jerba | Ber-J | 55 | [40] | 36 | Lebanese-B | Leb-B | 191 | [44]
9 | Berbers-Matmata | Ber-M | 81 | [40] | 37 | Lebanese-Armen | Leb-Ar | 368 | [27]
10 | Berbers-Zrawa | Ber-Z | 70 | [24] | 38 | Lebanese-KZ | Leb-Kz | 93 | [39]
11 | Tunisians | Tun | 376 | [61] | 39 | Lebanese-NS | Leb-Ns | 59 | [39]
12 | Tunisians-A | Tun-A | 80 | [60] | 40 | Lebanese-Yohmor | Leb-Y | 75 | [39]
13 | Tunisians-B | Tun-B | 101 | [34] | 41 | Palestinians | Pal | 165 | [29]
14 | Tunisians-C | Tun-C | 100 | [63] | 42 | Palestinians-A | Pal-A | 109 | [36]
15 | Tunisians-M | Tun-M | 123 | [26] | 43 | Saudis | Sau | 105 | [28]
16 | Southern Tunisians | Tun-S | 250 | [62] | 44 | Saudis-A | Sau-A | 213 | [23]
17 | Libyans | Lib | 118 | [32] | 45 | Saudis-B | Sau-B | 158 | [49]
18 | Libyans-Jews | Lib-J | 119 | [36] | 46 | Saudis-C | Sau-C | 499 | [23]
19 | Berbers-Metelsa | Ber-Me | 99 | [64] | 47 | Saudis-D | Sau-D | 383 | [50]
20 | Moroccans | Mor | 96 | [25] | 48 | Omanis-A | Oma-A | 259 | [30, 51]
21 | Moroccans-A | Mor-A | 110 | [42] | 49 | Kuwaitis | Kuw | 212 | [52]
22 | Moroccans-Agadir | Mor-Ag | 98 | [37] | 50 | Kuwaitis-A | Kuw-A | 114 | [53]
23 | Moroccans-Chaouya | Mor-Ch | 98 | [65] | 51 | Bahrainis | Bah | 72 | [35]
24 | Moroccans-Jews | Mor-J | 94 | [66] | 52 | Emiratis | Emi | 373 | [23]
25 | Egyptians | Egy | 101 | [39] | 53 | Iraqi Kurds | Ira-K | 209 | [54]
26 | Egyptians-A | Egy-A | 121 | [38] | 54 | Yemenite-Jews | Yem-J | 76 | [36]
27 | Sudanese | Sud | 200 | [23] | 55 | Yemen-Sana’a | Yem | 50 | [55]
28 | Sudanese-Nuba | Sud-N | 46 | [23] | 56 | Omanis | Oma | 118 | [56, 57]

https://doi.org/10.1371/journal.pone.0192269.t001

Allelic comparison between Tunisians and other populations

Allelic comparisons were made using Neighbor-Joining dendrograms, correspondence analysis, and standard genetic distances. Analyses were performed with class I and class II markers, at both generic and high-resolution levels, to make the most of the available data, given that some of the populations included in these comparisons lack high-resolution data.
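For readers unfamiliar with the Neighbor-Joining step, the following is a minimal, self-contained sketch of the algorithm operating on a pairwise distance matrix. It is not the DISPAN implementation used in the study; the five taxa and their distances are a toy additive matrix chosen for illustration, and the function name and input layout are assumptions.

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree by neighbor-joining.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> distance between a and b.
    Returns a Newick string with branch lengths.
    """
    nodes = list(names)
    d = dict(dist)

    def get(a, b):
        return 0.0 if a == b else d[frozenset((a, b))]

    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all the others.
        r = {a: sum(get(a, b) for b in nodes) for a in nodes}
        # Select the pair that minimises the Q criterion.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * get(a, b) - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        dab = get(a, b)
        # Branch lengths from the new internal node to a and to b.
        la = 0.5 * dab + (r[a] - r[b]) / (2 * (n - 2))
        lb = dab - la
        new = f"({a}:{la:.3f},{b}:{lb:.3f})"
        # Distances from the new node to every remaining node.
        for c in nodes:
            if c not in (a, b):
                d[frozenset((new, c))] = 0.5 * (get(a, c) + get(b, c) - dab)
        nodes = [c for c in nodes if c not in (a, b)] + [new]

    a, b = nodes
    return f"({a},{b}:{get(a, b):.3f});"


# Toy additive distance matrix for five hypothetical taxa.
pairs = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
toy_dist = {frozenset(k): float(v) for k, v in pairs.items()}
tree = neighbor_joining(["a", "b", "c", "d", "e"], toy_dist)
print(tree)
```

In practice the distance matrix would hold pairwise genetic distances between populations rather than toy values, and a dedicated phylogenetics package would normally be preferred over a hand-written routine.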

Neighbor-joining dendrograms. Comparison at the generic level was made using genetic

distances based on DRB1� and DQB1� allelic frequencies. Four groups can be interpreted from Fig 3. The first group comprises North African Arabs (Tunisians, Algerians, Moroccans, Liby-

ans), Western Mediterranean Europeans (Iberians, French), Arabian Peninsula Arabs (Saudis,

Kuwaitis, Yemenis), and Arab minority of Iran (Khuzestani). The second group is formed by

Eastern Mediterranean Europeans (Greeks, Cretans, Albanians, Turks, Macedonians), Italians,

Levant Arabs (Palestinians, Lebanese, Syrians), Iraqi-Kurds, Tunisian Berbers (Djerba), and

Iranians. The third group comprises Sub-Saharan Africans (Fulani, Mossi, Rimaibe, Bubi,

Mandenka, and Senegalese). Omanis, Bahrainis, Egyptians, and Sudanese form a heteroge-

neous group containing Asians and Sub-Saharan Africans. Similar results but with notable dif-

ferences, were observed in dendrograms built with standards genetic distances (SGD) based

on generic DRB1(S1 Fig) and generic B loci (S2 Fig). Correspondence analysis. High-resolution DRB1 correspondence analysis (Fig 4) dem-

onstrated the clustering of the studied populations into three groups. The first containing

North Africans (Tunisians, Algerians, Moroccans, and Libyans), Iberians (Basques, Spaniards,

Table 2. Worldwide populations included in the meta-analysis.

N o

Populations Symbols Size References N o

Populations Symbols Size References

1 Spaniards Spa 176 [41] 23 Mossi Mos 42 [39]

2 Portuguese Por 118 [39] 24 Mandenka Mad 200 [39]

3 Murcians Mur 173 [80] 25 Amhara Amh 98 [39]

4 Italians Ita 284 [68] 26 Bubi Bub 101 [39]

5 Basques-A Bas-A 82 [41] 27 Congolese Con 85 [72]

6 Basques-Arratia Bas-Ar 83 [77] 28 Fulani Ful 38 [39]

7 Basques-B Bas-B 99 [70] 29 Gabonese Gab 167 [85]

8 French Fre 179 [68] 30 Nigerians Nig 258 [23]

9 French-Rennes Fre-R 200 [34] 31 Oromo Oro 83 [39]

10 Balearic Bal 90 [71] 32 Rimaibe Rim 39 [39]

11 Corsica Cor 100 [71] 33 Senegalese Sen 177 [39]

12 Sardinians Sar 91 [68] 34 Famoori Arabs Fam 84 [73]

13 Ashkenazi-Jews Ash-J 132 [66] 35 India-Northeast Ind-N 188 [83]

14 Greeks-A Gre-A 96 [39] 36 Indians-Delhi Ind-D 112 [84]

15 Greeks-B Gre-B 101 [39] 37 Iranian-Jews Ira-J 91 [73]

16 Greeks-C Gre-C 98 [39] 38 Iranians Ira 120 [74]

17 Greeks-D Gre-D 242 [23] 39 Iranians-A Ira-A 100 [75]

18 Macedonians Mac 172 [78] 40 Iranians-Azeri Ira-Az 100 [81]

19 Turks Tur 250 [23] 41 Iranians-Kurd Ira-k 100 [81]

20 Turks-A Tur-A 228 [79] 42 Khuzestani Arabs Khu 50 [73]

21 Albanians Alb 160 [76] 43 Pakistanis-Pathan Pak-P 100 [82]

22 Cretans Cre 135 [69] 44 Pakistanis-Sindh Pak-S 101 [82]

https://doi.org/10.1371/journal.pone.0192269.t002

Genetic heterogeneity of Arabs

PLOS ONE | https://doi.org/10.1371/journal.pone.0192269 March 9, 2018 7 / 24

Portuguese, Murcians), French, Saudis, Yemenite Jews, and Khuzestani Arabs. The second contains Eastern Mediterraneans (Greeks, Cretans, Lebanese, Palestinians, and Macedonians), Berbers of Djerba, Italians, Iraqi Kurds, Iranians, Egyptians, Ashkenazi Jews, and Moroccan Jews. The last cluster consists of Sub-Saharan populations. It should be noted that Jordanians, Bahrainis, and Sudanese were outside these main groups. Similarly, correspondence analysis using class I (A and B) loci identified three main clusters (Fig 5). The first cluster contains all Sub-Saharan Africans along with Sudanese. The second cluster contains Eastern Mediterranean populations (Albanians, Greeks, Cretans, Lebanese, Palestinians, and Macedonians), Italians, Iraqi Kurds, Ashkenazi Jews, and Jordanians-A. The last cluster includes North Africans (Tunisians, Algerians, Moroccans, and Libyans), Iberians (Basques, Spaniards), French, and Saudis.

Correspondence analysis based on generic DRB1 data and using only Arab populations shows that Arabs cluster into four groups (Fig 6). The first contains the North Africans (Tunisians, Algerians, Moroccans, and Libyans), Saudis, Yemenis, Kuwaitis, and Khuzestanis (Iranian Arabs). The second cluster includes the Arabs of the Levant (Palestinians, Jordanians, Lebanese, Syrians), Egyptians, Iraqi Kurds, and Moroccan Jews. The third group consists of

Table 3. Most frequent HLA-A* and -B* alleles in Arab populations.

HLA-A A*01 A*02 A*03 A*24 A*30 A*68

Population % Population % Population % Population % Population % Population %

Tun-M 15.0 Sau-D 30.4 Ira-k 15.1 Leb-Ar 17.3 Sud 17.6 Sau 10.5

Mor 14.8 Ber-Z 29.3 Leb-Ar 14.0 Gha 15.2 Mor-C 13.0 Tun-M 09.4

Jor-A 14.7 Mor 26.2 Pal 10.7 Ira-k 13.9 Tun-A 11.8 Mor 09.3

Ira-k 13.2 sud 25.9 Lib 10.3 Sau-B 13.3 Jor 11.5 Alg-K 08.6

Pal 12.5 Emi 25.2 Mor-A 10.0 Jor-A 10.7 Alg-K 10.2 sud 08.5

Leb-A 12.2 Oma 24.9 Alg-K 09.3 Pal 10.1 Sau-B 10.2 Emi 08.4

Sau-A 12.2 Alg 24.6 Jor-A 09.1 Alg 09.4 Pal 08.4 Lib 08.2

Alg 11.9 Lib 23.5 Emi 09.1 Lib 09.3 Oma-A 07.5 Jor 07.6

Lib 11.5 Jor-A 22.0 Sau-A 08.9 Mor 07.3 Leb-A 06.7 Oma-A 07.1

Oma 07.2 pal 20.5 Gab 07.7 Oma 06.3 Lib 06.4 Leb-A 05.1

Sud 06.5 Leb-A 18.7 Sud 07.1 Sud 06.1 Emi 05.0 Ira-k 03.8

Emi 06.2 Ira-k 17.0 Oma 06.4 Emi 05.2 Ira-k 03.8 Pal 03.6

HLA-B B*07 B*08 B*35 B*44 B*50 B*51

Population % Population % Population % Population % Population % Population %

Jor 27.1 Oma 11.0 Pal 20.3 Ber-Z 32.8 Sau-D 18.8 Sau-C 19.3

Sau-A 11.7 sau-B 10.1 Leb-Ar 19.8 Ira-k 10.3 Lib 16.1 Oma 17.5

Mor 09.0 Emi 08.6 Ira-k 15.6 Mor-C 10.2 Ber-Z 15.7 Emi 15.6

Lib 07.7 Gha 08.5 Oma-A 15.3 pal 09.6 Tun-S 14.2 Ira-K 15.6

Tun-A 07.5 Ira-k 07.2 Jor-A 14.9 Alg 08.8 Mor-C 12.5 Gha 12.2

Alg-k 07.1 Lib 06.4 Emir 11.1 Leb-Ar 08.4 Emi 09.4 Leb-Ar 12.1

Leb-Ar 04.5 Mor-C 06.2 Alg 10.3 Lib 07.6 Jor-A 06.4 Lib 11.1

Ira-k 04.1 Jor 04.7 Lib 10.1 Jor-A 05.6 Pal 05.8 Jor-A 10.3

Oma-A 03.1 Sud 04.0 Tun-M 09.8 Sau-D 03.5 Leb-Ar 05.2 Sud 07.8

Sud 02.8 Alg 03.5 Sau 08.6 Sud 02.3 Alg 05.1 Mor 07.4

Emi 02.4 Leb-Ar 03.0 Mor-C 06.9 Emi 02.3 Oma-A 04.2 Pal 06.4

Pal 01.8 Pal 02.7 sud 06.1 Oma-A 02.1 Sud 02.5 Alg-k 04.7

Only one population per country is illustrated; the frequencies are ranked from highest to lowest for each allele; to identify the population and country see Table 1

https://doi.org/10.1371/journal.pone.0192269.t003


Table 4. Most frequent HLA-DRB1* and -DQB1* alleles in Arab populations.

HLA-DRB1* 03:01 07:01 11:01 13:01 13:02 15:01

Population % Population % Population % Population % Population % Population %

Tun-B 21.9 Gha 28.6 Leb 36.8 Sud 23.3 Mor-Me 11.1 Alg-B 13.4

Mor-Me 20.2 Jor 26.9 Bah 16.0 Sau-A 10.6 Lib 09.3 Mor-c 12.6

Sau-B 16.5 Sau-B 26.6 Egy-A 13.2 Ber-M 08.0 Sau-A 08.9 Ber-Z 11.4

Ora 15.1 Yem-J 22.1 Gab-A 11.2 Leb-B 06.8 Egy-A 07.4 Jor 09.0

Bah 13.9 Mor-Ag 20.5 Pal 10.0 Alg-B 05.6 Tun-C 06.7 Sau-A 08.9

Sud 13.8 Lib-Y 19.6 Ora 08.6 Lib 05.5 Leb-N 05.0 Bah 07.6

Lib 13.6 Lib 17.0 Sud 08.3 Yem-J 05.4 Ora 04.5 Leb 04.7

Yem-J 12.0 Alg-B 15.9 Jor 08.3 Egy-A 04.6 Yem-J 04.0 Lib 04.2

Leb-B 09.6 Pal 12.7 Lib 05.1 Mor-Me 03.5 Pal 03.9 Pal 03.6

Pal 07.6 Bah 09.0 Sau-A 04.8 Jor 02.1 Jor 00.3 Sud 03.3

Egy-A 07.0 Egy-A 08.3 Yem-J 03.4 Bah 02.1 Sud 00.0 Egy-A 02.5

Jor 02.4 Sud 07.8 Mor-C 02.5 Pal 00.9 Bah 00.0 Yem-J 02.0

HLA-DQB1* 02:0X 03:01 03:02 05:01 06:02 06:03

Population % Population % Population % Population % Population % Population %

Gha 40.1 Leb-NS 45.0 Gha 20.7 Bah 29.2 Mor-C 12.9 Egy-A 10.2

Yem-J 39.1 Ora 35.1 Jor 17.8 Ber-J 22.7 Alg 12.8 Jor 08.3

Mor-Ag 37.8 Lib-J 29.6 Pal 17.6 Leb 20.5 Egy-A 12.7 Ber-J 07.8

Sau-B 37.3 Ber-J 27.4 Leb 16.8 Alg 13.9 Tun-A 12.6 Lib-J 07.4

Jor 35.9 Pal 26.7 Yem-J 14.2 Mor-C 12.3 Jor 10.7 Yem-J 06.1

Lib-J 33.3 Yem-J 19.1 Lib-J 13.0 Pal 11.8 Sau-B 05.1 Ora 04.3

Bah 25.7 Bah 16.0 Alg 12.3 Sau-B 10.1 Pal 04.2 Sau-B 04.1

ora 24.5 Mor-C 15.4 Mor-C 12.3 Jor 09.3 Leb-Y 03.7 Leb-Y 03.3

Pal 20.9 Egy 11.9 Bah 09.7 Egy-A 08.5 Yem-J 02.0 Mor-C 01.8

Leb-Y 20.0 Jor 10.0 Sau-B 08.9 Yem-J 06.1 Lib-J 00.8 Pal 01.2

Only one population per country is illustrated; the frequencies are ranked from highest to lowest for each allele; to identify the population and country see Table 1

https://doi.org/10.1371/journal.pone.0192269.t004

Fig 3. Neighbor-joining dendrograms, based on standard genetic distances (SGD), showing relatedness between Arabs and other populations using generic HLA-DRB1* and -DQB1* allele frequency data. Population data were taken from references detailed in Tables 1 and 2. Bootstrap values from 1,000 replicates are shown.

https://doi.org/10.1371/journal.pone.0192269.g003
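As an illustration of how dendrograms such as those in Fig 3 are built, the following minimal Python sketch implements the pairing step of the neighbor-joining algorithm of Saitou and Nei. It is not the software used in this study: the function name `neighbor_joining`, the population labels A to E, and all distance values are hypothetical, and branch lengths and bootstrap support are omitted.

```python
from itertools import combinations


def neighbor_joining(matrix):
    """Return the leaf clusters merged at each neighbor-joining step.

    matrix: dict of dicts with matrix[a][b] = distance between leaves a and b
            (symmetric, defined for every pair).
    Joining stops when three nodes remain, the end point for an unrooted tree.
    """
    # Each active node is keyed by the frozenset of leaves beneath it.
    d = {
        frozenset([a]): {frozenset([b]): matrix[a][b] for b in matrix if b != a}
        for a in matrix
    }
    merges = []
    while len(d) > 3:
        nodes = list(d)
        n = len(nodes)
        # Net divergence: total distance from each node to all the others.
        r = {a: sum(d[a].values()) for a in nodes}
        # Join the pair minimising Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        new = a | b
        # Distance from the new internal node to every remaining node.
        new_row = {
            c: 0.5 * (d[a][c] + d[b][c] - d[a][b])
            for c in nodes
            if c not in (a, b)
        }
        del d[a], d[b]
        for c, value in new_row.items():
            del d[c][a], d[c][b]
            d[c][new] = value
        d[new] = new_row
        merges.append(new)
    return merges


# Hypothetical pairwise distances between five populations labelled A to E.
pairs = {
    ("A", "B"): 0.03, ("A", "C"): 0.08, ("A", "D"): 0.10, ("A", "E"): 0.12,
    ("B", "C"): 0.09, ("B", "D"): 0.11, ("B", "E"): 0.13,
    ("C", "D"): 0.10, ("C", "E"): 0.12, ("D", "E"): 0.04,
}
matrix = {}
for (a, b), value in pairs.items():
    matrix.setdefault(a, {})[b] = value
    matrix.setdefault(b, {})[a] = value

merges = neighbor_joining(matrix)
print([sorted(cluster) for cluster in merges])
```

With four nodes left, a pair and its complement score the same Q value, so either may be reported; both describe the same unrooted topology.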


Bahrainis, Omanis, Emiratis and Famoori (Iranian Arabs). The fourth is composed of Sudanese, Sudanese from Nuba, and Comorians.

Genetic distances. Table 5 illustrates standard genetic distances (SGD) between Arabs and other populations, using generic DRB1* allele frequencies. North Africans and Iberians are the closest to Saudis. Moroccans (Agadir, 0.0024), Basques-Ar (0.0057), and Tunisians-S.

Fig 4. Correspondence analysis (bi-dimensional representation), based on the standard genetic distances, showing the relationship between Arabs and other populations according to high-resolution HLA-DRB1* allele frequency data. Only individuals with defined DRB1* subtypes are considered. Population data were taken from references detailed in Tables 1 and 2.

https://doi.org/10.1371/journal.pone.0192269.g004

Fig 5. Correspondence analysis (bi-dimensional representation), based on the standard genetic distances, showing a global view of the relationship among Arabs and other populations according to generic HLA-A* and -B* allele frequency data. Population data were taken from references detailed in Tables 1 and 2.

https://doi.org/10.1371/journal.pone.0192269.g005


Syrians are genetically close to Eastern Mediterranean populations, such as Cretans (-0.0001) and Lebanese Armenians (0.0050), while Tunisians are close to Western Mediterranean populations, such as North Africans and Iberians, and to Saudis. The populations most related to Tunisians are the other Tunisian populations (Gabesians, -0.0139), Moroccans (Agadir; -0.0080), and Algerians (-0.0055). Sub-Saharans such as Congolese (0.0519) and Nigerians (0.0828), together with Greeks (0.0836), showed the closest genetic distances to Comorians. It is noteworthy that the Arab minority in Khuzestan (Iran) displayed close relatedness with North Africans [such as Gabesians from Tunisia (-0.0086) and Orans from Algeria] and Saudis (0.0231).
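The distance formula is not restated in this section. Assuming that SGD refers to Nei's standard genetic distance, D = -ln(J_xy / sqrt(J_x J_y)), a minimal single-locus Python sketch is given below. The function name and the allele frequencies are hypothetical, and no sample-size correction is applied; this uncorrected form is never negative, so the slightly negative values in Table 5 presumably reflect such a correction.

```python
import math


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance for one locus (uncorrected form).

    freqs_x, freqs_y: dicts mapping allele name -> frequency in each population.
    """
    alleles = set(freqs_x) | set(freqs_y)
    # Probabilities that two randomly drawn alleles are identical,
    # within population x, within population y, and between x and y.
    j_x = sum(freqs_x.get(a, 0.0) ** 2 for a in alleles)
    j_y = sum(freqs_y.get(a, 0.0) ** 2 for a in alleles)
    j_xy = sum(freqs_x.get(a, 0.0) * freqs_y.get(a, 0.0) for a in alleles)
    if j_xy == 0.0:
        return math.inf  # the two populations share no allele
    return -math.log(j_xy / math.sqrt(j_x * j_y))


# Hypothetical generic DRB1 frequencies for three populations.
pop_a = {"DRB1*03": 0.20, "DRB1*07": 0.25, "DRB1*11": 0.15, "DRB1*15": 0.40}
pop_b = {"DRB1*03": 0.18, "DRB1*07": 0.22, "DRB1*11": 0.30, "DRB1*15": 0.30}
pop_c = {"DRB1*03": 0.05, "DRB1*07": 0.05, "DRB1*11": 0.10, "DRB1*13": 0.80}

d_ab = nei_standard_distance(pop_a, pop_b)
d_ac = nei_standard_distance(pop_a, pop_c)
print(f"A-B: {d_ab:.4f}  A-C: {d_ac:.4f}")
```

Ranking the distances from one reference population to all others, smallest first, gives lists of the kind shown in Table 5.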

HLA Class I and Class II haplotypes

HLA-A-B haplotypes. HLA A-B haplotypic data are extremely rare in Arabs. The most frequent A-B haplotypes in Arabs are shown in Table 6. A*02:01-B*50:01 (9.0%) and A*02:01-B*44:02/03 (7.5%) were the haplotypes with the highest frequencies in Berbers of Zrawa. A-B haplotypes are diverse among Arabs, although their frequencies are comparable across Arab populations; apart from the Berbers of Zrawa, they did not exceed 5.3% (Gabesians, Tunisia). For example, while A*34:02-B*08:01 and A*29:01-B*45:01 characterize Tunisians, A*01-B*57 (2.9%), A*30-B*18 (1.50%), and A*33:01-B*14:01 (2.50%) characterize Algerians. Several haplotypes identified in Arabs were also seen in other Mediterraneans. For example, A*32:01-B*40:02 was seen in Greeks (2%) [39] and Spaniards (0.5%) [41], while A*02:01-B*50:01 was seen in Italians (2%) [68], Portuguese (3%) [39], and Moroccan Jews (3%) [66]. A*24:02-B*08:01 (4.75%) and A*30:02-B*53:01 (3.48%) were only identified in Saudis.

HLA-DRB1-DQB1 haplotypes. The most frequent DRB1-DQB1 haplotypes with significant LD in Arabs are listed in Table 7. In general, class II haplotype frequencies are markedly higher than those of class I haplotypes. DRB1*03:01-DQB1*02:01 was the most frequent DRB1-DQB1 haplotype in Arabs (Table 7), with a frequency ranging from 3.2% in Lebanese to 16.60% in Tunisians. DRB1*03:01-DQB1*02:01 is a common class II haplotype in the

Fig 6. Correspondence analysis (bi-dimensional representation), based on the standard genetic distances, showing the relationship between different Arab populations according to generic HLA-DRB1* allele frequency data. Population data were taken from references detailed in Tables 1 and 2.

https://doi.org/10.1371/journal.pone.0192269.g006


Table 5. The closest populations to Arabs using standard genetic distances (SGD) based on HLA-DRB1* alleles.

Saudis-B Emiratis Omanis-A Sudanese

Population SGD Population SGD Population SGD Population SGD

Moroccans-Ag 0.0024 Omanis-A 0.0411 Emirates 0.0411 Nigerians 0.0497

Basques-Ar 0.0057 Bahrain 0.0429 Sardinians 0.0939 Egyptians-A 0.0556

Tunisians-S 0.0124 Sardinians 0.0593 Bahrain 0.1327 Congolese 0.0594

Saudis-C 0.0160 Kuwaitis 0.0688 Kuwait 0.2014 Egyptians 0.0620

Ghanouchians 0.0203 Tunisians-B 0.1169 Famoori Arabs 0.2377 Mandenka 0.0908

Saudis 0.0258 Khuzestanis 0.1213 Macedonians 0.2461 Moroccans 0.0984

Tunisians 0.0272 Tunisians-A 0.1276 Tunisians-B 0.3071 Senegalese 0.1044

Kuwaitis-A 0.0312 Algerians-Oran 0.1371 Khuzestanis 0.3192 Bubi 0.1078

Khuzestanis 0.0349 Algerians-A 0.1407 Greeks-B 0.3197 Palestinians-A 0.1111

Spaniards 0.0354 Algerians-B 0.1612 Tunisians-A 0.3261 Pakistanis-S 0.1122

Saudis-D 0.0374 Algiers 0.1639 Kuwaitis-A 0.3544 Tunisians-A 0.1133

Gabesians 0.0377 Saudis-C 0.1746 Algerians-Oran 0.3600 Libyans 0.1197

Gabesians-A 0.0394 Macedonians 0.1756 Algerians-A 0.3639 Sudanese-Nuba 0.1234

Jordanians 0.0428 Gabesians 0.1820 Greeks-D 0.3657 Algerians-B 0.1315

Algerians-B 0.0433 Saudis-D 0.1820 Algerians-B 0.3867 Berbers-Matmata 0.1317

Basques-B 0.0449 Moroccans-Agadir 0.1830 Greeks-C 0.3927 Algerians-A 0.1407

Saudis-A 0.0450 Kuwaitis-A 0.1837 Turks 0.3944 Berbers-Zrawa 0.1409

Algerians-A 0.0497 Famoori Arabs 0.1894 Saudis-C 0.3984 Gabesians 0.1413

Tunisians-C 0.0533 Moroccans-A 0.1900 Algiers 0.4027 Jordanians-A 0.1434

Yemenite-J 0.0536 Gabesians-A 0.1908 Albanians 0.4034 Gabesians-A 0.1442

Khuzestanis Tunisians Syrians-A Comorians

Population SGD Population SGD Population SGD Population SGD

Gabesians -0.0086 Gabesians -0.0139 Cretans -0.0001 Congolese 0.0519

Orans -0.0074 Gabesians-A -0.0081 Lebanese-Ar 0.0050 Nigerians 0.0828

Gabesians-A -0.0025 Moroccans-Agadir -0.0080 Syrians 0.0076 Greeks-A 0.0836

Algerians-A -0.0015 Southern Tunisians -0.0062 Iranians-Kurd 0.0100 Gabonese 0.0904

Moroccans-Ag 0.0106 Algerians-A -0.0055 Lebanese-A 0.0149 Iranians-A 0.0947

Tunisians-S 0.0140 Moroccans-A 0.0010 Lebanese-Y 0.0151 Egyptians-A 0.1090

Tunisians 0.0161 Algerians-B 0.0019 Iranians 0.0159 Iranians 0.1184

Tunisians-C 0.0195 Berbers-Zrawa 0.0027 Lebanese-B 0.0161 Italians 0.1222

Yemenite-J 0.0217 Libyans 0.0028 Iranians-Azeri 0.0185 Iranians-Azeri 0.1394

Tunisians-M 0.0225 Algerians-Oran 0.0033 Turks 0.0192 Iranians-Kurd 0.1418

Saudis-C 0.0231 Tunisians-M 0.0038 Iraq kurdistan 0.0198 Albanians 0.1426

Spaniards 0.0291 Saudis-C 0.0061 Ashkenazi-Jews 0.0222 Turks 0.1428

Saudis 0.0324 Tunisians-C 0.0083 Iranians-A 0.0223 Syrians 0.1470

Saudis-B 0.0349 Algiers 0.0103 Palestinians-A 0.0228 Cretans 0.1483

Algerians-B 0.0353 Berbers-Matmata 0.0106 Italians 0.0241 Egyptians 0.1483

Tunisians-B 0.0422 Moroccans-Chaouya 0.0111 Turks-A 0.0288 Greeks-C 0.1487

Indians-Delhi 0.0454 Spaniards 0.0126 Lebanese 0.0320 Palestinians-A 0.1559

Algiers 0.0461 Moroccans 0.0144 Jordanians-A 0.0355 Iraq Kurdistan 0.1564

Basques-Ar 0.0471 Saudis-D 0.0159 Lebanese-KZ 0.0368 Greeks-D 0.1594

Libyans 0.0485 Khuzestani Arabs 0.0161 Greeks-A 0.0407 Syrians-A 0.1617

(0.0124) had the closest genetic distances to Saudis, while Emiratis were closely related to Omanis (0.0411), Bahrainis (0.0429), Sardinians (0.0593), and Kuwaitis (0.0688). On the other hand, Sudanese are related to Sub-Saharans, including Nigerians (0.0497) and Congolese (0.0594), and to Egyptians (0.0556).

https://doi.org/10.1371/journal.pone.0192269.t005


Mediterranean basin, and is frequent among Basques (17.5%) [41], Moroccans (17.3%) [25], Algerians (11.3%) [67], and Cretans (7.4%) [69]. DRB1*07:01-DQB1*02:02 is also frequent in Arabs, such as Moroccans (16.70%), and is reportedly common in Spaniards (17.3%) [41] and Moroccans (12.6%) [25], but rare in Southern Tunisians (Gabesians, 2.10%). In addition, DRB1*07:01-DQB1*02:01 is a common DRB1-DQB1 haplotype, and its frequency exceeds 4% in several Arab populations.

Table 6. Most frequent (%) HLA Class I (A-B) two-locus haplotypes with significant linkage disequilibrium (P<0.05) in Arabs.

A-B haplotype   Tun     Saudi-B Alg     Mor-Ch  Mor-a   Ber-Z   Lib     Gab
01:01–50:01     -       -       -       04.10   -       -       -       -
01–57           -       -       02.90   -       -       -       -       -
02:01–07:02     -       -       -       -       -       -       02.97   -
02:01–44:02/03  03.86   -       -       02.10a  02.95c  07.50b  -       05.26
02:01–50:01     03.30   -       -       -       01.99d  09.01   -       -
02:01–51:01     -       04.66   -       03.40   01.62f  -       -       -
23:01–50:01     -       04.90   -       -       -       -       02.97   -
24:02–08:01     -       04.75   -       -       -       -       -       -
29:01–45:01     01.79   -       -       -       -       -       -       02.10
29:02–44:03     -       -       -       02.70   -       -       -       -
30–18           -       -       01.50   -       02.60   03.00   -       -
30:02–53:01     -       03.48   -       -       -       -       -       -
32:01–40:02     00.80   -       -       -       -       05.66   -       -
33:01–14:01     -       -       02.50   -       01.86e  01.41   -       -
34:02–08:01     02.12   -       -       -       -       06.11   -       02.10

a: 02:01–44:02. b: 02:01–44. c: 02–44. d: 02–50. e: 33–14. f: 02–51.

https://doi.org/10.1371/journal.pone.0192269.t006

Table 7. Most frequent (%) HLA Class II (DRB1-DQB1) two-locus haplotypes with significant linkage disequilibrium (P<0.05) in Arabs.

HLA-DRB1-DQB1 Tun     Sau-B   Mor-Ch  Bah     Leb     Alg     Lib-J   Yem-J   Ber-Z   Ber-J
01:02–05:01   02.40   02.85   -       -       -       08.00   02.10   0.70    09.85   04.50
07:01–02:02   14.80   12.32   16.70   -       -       -       24.70a  22.10a  16.03   -
03:01–02:01   16.60   13.56   12.30   12.02   03.21   11.30   05.60a  12.00a  11.26   -
10:01–05:01   03.80   03.80   -       01.35   04.90   00.30   00.80   04.00   01.41   03.30
07:01–02:01   -       -       -       09.38   04.20   09.90   -       -       -       11.00
15:01–06:02   07.80   03.80   08.90   -       -       09.90   -       -       11.26   02.00
04:02–03:02   02.60   -       06.20   -       -       04.20   03.00   07.50   05.15   -
13:01–06:03   02.40   -       -       -       -       03.30   07.70   05.40   05.63   01.80
16:01–05:01   -       -       -       13.18   03.79   -       -       -       -       -
04:01–03:02   -       -       -       02.78   14.16   -       -       -       -       -
11:01–03:01   07.20b  02.22   -       11.98   31.42   04.70   09.30   03.40   07.00b  03.20

a: DQB1*02. b: 11:01/04–03:01.

https://doi.org/10.1371/journal.pone.0192269.t007


In addition, DRB1*16:01-DQB1*05:01 and DRB1*04:01-DQB1*03:02, rare in neighboring populations and Mediterraneans, were identified only in Lebanese and Bahraini Arabs. The high frequency of the DRB1*11:01-DQB1*03:01 haplotype (31.42%) among Lebanese is noteworthy, since it is the highest in all populations studied, whereas the haplotype is rare in Saudis (2.2%). Furthermore, DRB1*11:01/04-DQB1*03:01, identified in Arabs, is also frequent in Cretans (18.5%) [69] and Basques (3.1%) [41], while DRB1*01:02-DQB1*05:01 was seen in Spaniards (6.30%) [41]. Varied frequencies of DRB1*13:01-DQB1*06:03 were also reported for Spaniards (13.23%) [86], Cretans (3.3%) [69], and Germans (10.8%) [87]. Likewise, DRB1*15:01-DQB1*06:02 was observed in Cretans (2.6%) [65], Germans (25.2%) [87], and in Southern Ireland (14.90%) [23].

HLA class I and class II extended haplotypes. Table 8 shows the most frequent extended haplotypes in Arab populations, and their likely origins. The systematic review did not reveal haplotypes shared by Arab populations because of partial presentation of haplotypic data, disparity in the level of typing resolution, variability of the studied loci, and lack of data. In addition, Arab populations share their frequent extended haplotypes with several European, especially Mediterranean, and Asian populations (Table 8). Furthermore, the possible origins of the most frequent extended haplotypes among Arabs are mainly European, Asian or autochthonous.

Table 8. The most frequent (%) HLA extended haplotypes in Arabs.

HLA extended haplotype | Arab populations [references] | Possible origin
A*02:01-B*50:01-DRB1*07:01-DQB1*02:02 (a) | Southern Tunisians (3.2%) [62], Berbers of Zrawa (8.12%) [24] | Euro-Asiatic
A*02:01-B*44-DRB1*04:02-DQB1*03:02 (b) | Berbers of Zrawa (6.5%) [24], Tunisians (0.6%) [61] | Western European
A*24:02-B*08:01-C*07:02-DRB1*03:01 (c) | Saudis (3.16%) [49] | Euro-Asiatic
A*23:01-B*50:01-C*06:02-DRB1*07:01 | Saudis (3.16%) [49] | Autochthonous
A*33-C*8-B*14-DRB1*01:02-DQA1*01:01-DQB1*05:01 (d) | Algerians (1.5%) [88] | Mediterranean
A*30-C*5-B*18-DRB1*03:01-DQA1*05:01-DQB1*02:01 (e) | Algerians (1.5%) [88] | Iberian-paleo-North African
A*02:01-C*06:02-B*50:01-DRB1*07:01-DQA1*02:01-DQB1*02:02 (f) | Moroccans (2.9%) [65] | Euro-Asiatic
A*01:01-C*06:02-B*50:01-DRB1*03:01-DQA1*05:01-DQB1*02:01 (g) | Moroccans (2.9%) [65] | Mediterranean
A*30-B*07-DRB1*03-DQA1*05:01-DQB1*02:01 (h) | Jordanians (1.38%) [31] | Euro-Asiatic
A*1-B*8-DRB1*03-DQA1*05:01-DQB1*02:01 (i) | Jordanians (1.03%) [31] | Pan-European
A*02:01-B*50:01-DRB1*07:01 (j) | Libyans (4.24%) [32], Tunisians (1.8%) [60], and Ghannouch (2.5%) [33] | North African
A*11:01-B*52:01-DRB1*15:02 (k) | Libyans (2.54%) [32]; Yemen Jews (0.93%) [23] | Mediterranean
A*69-B*49-DRB1*04:03-DQB1*03:02 | Palestinians (2.4%) [29] | Autochthonous
A*24-B*18-DRB1*11:04-DQB1*03:01 (l) | Palestinians (1.8%) [29] | Central-South-Eurasian

(a) present in Spaniards (1.2%) [41], Turks (1.3%) [79], Italians (0.5%) [68], and Moroccan Jews (2%) [66].
(b) also found in British (2.6%), Cornish (7.9%), Danes (2%) [39], Italians (0.9%) [68], Spaniards (0.6%) [41], Spanish Basques (1.9%), Pasiegos (3.3%), Cabuemigos (2.2%) [77], and Portuguese (3.1%) [39].
(c) present at low frequencies in the Euro-Asian minorities of Germany [23].
(d) found in Armenians (0.031), Sardinians (0.027), French (0.014), Greeks (0.011), and Italians (0.007) [68].
(e) also found in Sardinians (11.4%), and French-Basques (4.7%) [68].
(f) present also in Mongolians [68], Turks [79].
(g) found in Spaniards, Italians, and north Africans [65].
(h) present in Cornish (0.084), British (3.3%), and Danes (3.8%) [68].
(i) present in Basques (5%), Spaniards (3.4%) [41], Macedonians (4.9%) [78], Yugoslavians (7.7%), British (2.9%), and Germans (4.8%) [68].
(j) found in Poland Jews (1.15%); Ashkenazi Jews (0.92%) [23].
(k) present in Ashkenazi Jews (1.05%) [23].
(l) found in Armenians (2.1%) and Italians (0.7%) [23].

https://doi.org/10.1371/journal.pone.0192269.t008


Discussion

This meta-analysis is the first genetic anthropology study in the MENA region; it included 100 populations from 36 Arab and neighbouring countries, comprising in excess of 16,000 individuals. A main outcome of the study is the lack of striking differences in the distribution of HLA alleles and haplotypes between North Africans and Arabian Peninsula populations. In contrast, key differences were noted between Levant Arabs (Lebanese, Palestinians, Syrians) and other Arab populations, highlighted by high frequencies of A*24, B*35, DRB1*11:01, DQB1*03:01, and the DRB1*11:01-DQB1*03:01 haplotype in Levantine Arabs compared to other Arab populations. Class I haplotype frequencies are lower than those of Class II haplotypes because LD between the A and B loci is weaker than LD between the DRB1 and DQB1 loci, owing to the longer physical distance between A and B. The identification of shared haplotypes between Arabs and other Mediterranean and Asian populations is attributed to the higher admixture of Mediterraneans and Asians in Arab populations.
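The LD argument above can be made concrete with the usual two-locus coefficients. The Python sketch below computes D = p_AB - p_A p_B and Lewontin's normalized D' for a single haplotype. The function name and the frequencies are hypothetical and are not taken from Tables 6 or 7, and the significance test (P<0.05) applied for those tables is not reproduced.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, D') for one two-locus haplotype.

    p_ab: observed frequency of the haplotype carrying alleles A and B.
    p_a, p_b: frequencies of allele A (first locus) and allele B (second locus).
    """
    # D is the excess of the observed haplotype frequency over the
    # frequency expected if the two alleles were independent.
    d = p_ab - p_a * p_b
    # D' rescales D by the largest magnitude it could reach given p_a and p_b.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime


# Hypothetical example: a haplotype observed at 15% whose two alleles
# have frequencies of 20% and 30%.
d, d_prime = linkage_disequilibrium(0.15, 0.20, 0.30)
print(f"D = {d:.4f}, D' = {d_prime:.4f}")
```

Under independence the expected haplotype frequency here would be 0.20 × 0.30 = 0.06, so a haplotype observed well above that value indicates positive LD, whereas two weakly linked alleles would give D close to zero.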

Iberians, North Africans, and Arabian Peninsula inhabitants

The relatedness between North Africans and Iberians was previously discussed [29, 59–62, 69, 78, 79, 86, 88]. Using correspondence analysis, NJ trees and genetic distances, our results show that North Africans are genetically close to Iberians, which is supported by historical events. First, this relatedness is attributed to the Berber migration from the African Sahara northwards in 10000–4000 BC, because of hyper-arid conditions [69]. It may also be explained by the similar history of Iberians and North Africans, both of whom were invaded by Phoenicians, Romans, Germans, and Muslim Arabs [89]. The respective invading armies were genetically mixed; indeed, most of their soldiers were mercenaries recruited in recent conquests, as in the case of the Phoenicians [90], and the Muslims who invaded Iberia had troops that were mostly Berbers. The invasion of Iberia by Muslims in the 8th century AD may have had a role in the relatedness between North Africans and Iberians for two reasons: first, most recruits of the Muslim invaders were North African Berbers; second, the Muslims remained settled in Iberia for 8 centuries. However, more ancient and continuous gene exchange between Iberia and North Africa since prehistoric times may have accounted for most of the exchange [86]; massive mixed marriages and breeding across religious Iberian groups under Muslim rule are not documented.

The analyses performed showed that current North Africans are closely related to Tunisian (Zrawa and Matmata) and Moroccan (Sousse-Agadir and Eljadida) Berbers, suggesting that North Africans have a genetic Berber profile. In contrast, North Africans displayed a greater distance from the Arabs of the Levant (Palestinians, Syrians, Lebanese, and Jordanians), indicating a low genetic contribution of the Phoenician and Levant Arab invasions of North Africa. These observations based on HLA markers prompted the conclusion that all Berbers of North Africa constitute a homogeneous genetic unit, except for small isolates, such as the Berbers of Djerba, who cluster with Eastern Mediterranean populations rather than showing this Berber genetic profile.

Saudi populations used in this study originated from Eastern Saudi Arabia, especially from Riyadh province. There are no reliable HLA data on Eastern Saudi Arabia that shed light on pre-Islamic history; some ancient people may have originated from old Persians, but quantification is difficult and undetermined [91]. Genetic heterogeneity between Eastern and Western Saudi Arabia is quite possible, and should be taken into account in further interpretation. All analyses performed here, using HLA-A, -B, -DRB1, and -DQB1 markers, support the notion that Saudis, along with Kuwaitis and Yemenis, are closely related to North Africans.

The most plausible explanation for West Arabia and Yemen clustering with Iberians/North Africans is a possible massive migration in all directions that occurred when the Sahara underwent desiccation [92, 93]. The cultural and linguistic relatedness of many Mediterranean languages, including old Iberian and Basque [92], with the Berber language is concordant with our genetic findings and with the Saharan origin hypothesis; part of the Arabian Peninsula (including Yemen) may also have been reached by Saharan people. In fact, Malika Hachid, who has been studying Saharan and North African archaeology, culture and the rock painting/writing of the prehistoric Sahara, even suggests that the first known writing alphabet originated in the Sahara. Proto-Berber rock-writing characters, very similar to present-day Berber scripts, were used. This Proto-Berber language could have appeared 5,000 years BC [94, 95].

An explanation for the HLA genetic similarity of Kuwaitis to this group is more difficult to reach, but interaction between the Arabian Peninsula and Mesopotamia through this strategic Kuwait area is documented since 6,500 years BC (Ubaid Period) [96].

Arabs of Levant

Using genetic distances, correspondence analysis and NJ trees, we showed earlier [61, 62] and in this study that Palestinians, Syrians, Lebanese and Jordanians are closely related to each other and to Eastern Mediterranean Europeans (Turks, Cretans, Greeks), Egyptians and Iranians; this was confirmed by analysis of HLA class I (A, B) and class II (DRB1 and DQB1) markers. However, Levant Arabs are distant from North African Arabs (Tunisians, Algerians, Moroccans and Libyans) and Iberians (Basques, Spaniards). The strong relatedness between Levant Arab populations is explained by their common ancestry, the ancient Canaanites, who came either from Africa or from the Arabian Peninsula via Egypt in 3300 BC [97], and settled in the Levant lowlands after the collapse of the Ghassulian civilization in 3800–3350 BC [98]. The relatedness is also attributed to close geographical proximity, as these populations constituted one territory before 19th century British and French colonization.

The close relatedness of Levant Arabs to Egyptians, as confirmed by genetic distances using HLA markers, may be due to three reasons. First, Egypt is a neighbor of the Levant Arab countries, and was historically part of the Levant. Second, the Egyptians invaded the Levant several times throughout history; the most significant was the 1468 BC invasion, after which they settled for 12 centuries [99]. Third, the Canaanites, the likely ancestors of Levant Arabs, may have originated from Africa through Egypt, where they settled for a long period, suggesting likely admixture between Canaanites and Egyptians.

Historically, the Levant is a wider region that included countries along the Eastern Mediterranean with its islands, and extended from Greece to Cyrenaica [100]. Broadly, the Levant was historically characterized by high migratory flow between its sub-regions in all directions. For example, the present-day Levant, comprising Palestine, Lebanon, Syria, and Jordan, has undergone successive invasions by populations originating from the Great Levant, including Egyptians (1468 BC), Horites, Amorites, Hittites (Turks), Greeks (1200 BC), Assyrians (1090 BC) [99], and more recently the Ottomans. This has favored admixture, reduced genetic distances and homogenized Great Levant populations, thus explaining the close relatedness of Levant Arabs to Eastern Mediterranean populations. On the other hand, Levant Arabs are distant from Saudis, Kuwaitis, and Yemenis, an indication that the contribution of the Arabian Peninsula populations to the Levantine gene pool is low, probably because the 7th century invasion lacked a substantial demographic component.

Sudanese and Comorians

Sudanese are close to sub-Saharan Africans (Nigerians, Congolese, and Senegalese) and to North Africans, in particular Egyptians, suggesting that the genetic profile of Sudanese results from admixture between North Africans (especially Egyptians) and sub-Saharan Africans throughout history. The close relatedness of Sudanese to sub-Saharan Africans suggests a reduced genetic effect of Arabs on Sudanese. Also, the Comorians (the Comoros islands officially joined the League of Arab States in 1993) are close to sub-Saharan Africans (Congolese, Nigerians, and Gabonese) [43], Egyptians, Iranians, and Eastern Mediterraneans. This suggests high admixture between populations belonging to three continents in the Comoro Islands, which can be explained by their geographical position as a corridor for international trade.

Bahrainis, Emiratis, and Omanis

Bahrainis, Emiratis, and Omanis are geographically close populations, which explains their genetic relationship as demonstrated in this study. These three populations tend to form a heterogeneous group with Pakistanis, Indians, Iranian Arabs (Famoori), Sardinians (the latter probably close to Iberians/North Africans but behaving as an outlier group in analyses because they are a genetic island isolate), Egyptians, and some sub-Saharan Africans, such as Congolese. These populations appear close to certain Eastern Mediterranean populations, including Greeks and Macedonians, and to more distant ones, in particular North Africans, hence explaining their intermediate grouping and their distinction from the two main clusters. Collectively, this suggests high admixture in these populations brought about by their commercially important position. Sardinia is a relative genetic isolate “founded” by the Iberian Norax/Nora (first documented Sardinian capital, close to Cagliari), and Iberians/North Africans may be genetically related to Sardinians (the A*30-B*18-Cw*5 basic HLA haplotype is very frequent in Sardinia, Iberia, and North Africa) [93].

Minorities of Arab World

Ethnic minorities. The Kurds and Berbers are the two major ethnic minorities in the Arab world. Berbers are an indigenous North African ethnic group found over a vast area stretching from the Atlantic Ocean to the Siwa Oasis in Egypt, and from the Mediterranean Sea to the Niger River. Berbers number about 20 million people, and constitute 40–45% of Moroccans, 20–25% of Algerians, and 2–7% of the populations of both Libya and Tunisia. The Kurds live in the northern regions of Iraq (15–20%) and Syria (10%). They constitute an Indo-European ethnic group, and speak Kurdish. Smaller minorities include Armenians, Nubians, Assyrians, and Turkmen [99].

Berber populations used in this work are closely linked to each other, as well as to present-day North Africans, and to Western Mediterranean populations, especially Iberians. Indeed, the Moroccan Berbers are not genetically different from current Moroccans, nor from neighboring populations, such as Algerians and Tunisians. This also applies to Tunisian Berbers, except those of the island of Djerba, who appear to be related to Eastern Mediterranean populations, including Levant Arabs. This suggests that North African Berbers are genetically similar to the populations around them, and that differences between them are cultural rather than genetic, owing to the 7th century Arabization of the region.

Clustering and genetic distance analyses demonstrated that Iraqi and Iranian Kurds are not genetically different from Iranians or neighboring populations, including Levant Arabs, and are close to Turks and other Eastern Mediterranean populations. This suggests that Kurds originate from the region, and are in genetic harmony with neighboring populations, despite the clear cultural differences. It also suggests that Kurds, Syrians, Jordanians, Palestinians, Iraqis, Lebanese, and Iranians probably share the same genetic profile, with few differences. Accordingly, our findings confirm the results of an earlier study by Arnaiz-Villena on Iraqi Kurds [54].

Religious minorities. Sunni Muslims constitute the majority (80%) of Arab populations,

followed by Shi’a Muslims (10%) who are present in parts of Iraq, Lebanon, Saudi Arabia,

Kuwait, Yemen, and Bahrain. Non-Muslims make up about 10% of all Arabs, and Christianity

Genetic heterogeneity of Arabs

PLOS ONE | https://doi.org/10.1371/journal.pone.0192269 March 9, 2018 17 / 24

(6%) is the second largest religion among Arabs, with about 20 million Christians living in

Lebanon, Egypt, Iraq, Syria, and Jordan. Other minor religions (4%) such as Judaism, Druze

and others are practiced on a much smaller scale [99].

HLA data on Sunni and Shiite Arabs are not available, nor are data comparing Muslims with Christians. The only available data are those concerning Arab Jews. In this study, data are available for three Jewish populations, including two from North Africa (Moroccan and Libyan Jews) and one from the Arabian Peninsula (Yemenite Jews). While the genetic distances separating these three groups of Jews are small (S1 Table), genetic heterogeneity between these Jewish populations was noted. For example, Yemenite Jews are related to Western Mediterranean populations, including North Africans and Iberians, while Libyan Jews are related to Eastern Mediterraneans, including Levantine Arabs. The relatedness of Moroccan Jews to other communities depends on the HLA loci studied; they associate with Eastern Mediterraneans using DRB1, but group with Eastern Mediterraneans when the other markers are used.

Conclusion

This study supports the notion that Arabs are divided into four groups. The first consists of North Africans (Algerians, Tunisians, Moroccans, and Libyans), Saudis, Kuwaitis, and Yemenis, with relatedness to Western Mediterraneans, including Iberians. The second includes Levantine Arabs (Palestinians, Jordanians, Lebanese, and Syrians), Iraqis, and Egyptians, who appear to be related to Eastern Mediterraneans and Iranians, who in turn belonged to the historically described ’Great Levant’. The third consists of Sudanese and Comorians, who associate with Sub-Saharan Africans. Finally, the fourth group of Arabs comprises Omanis, Emiratis, and Bahrainis. This group associates with heterogeneous populations (Mediterranean, Asian and sub-Saharan). Lastly, the two main indigenous minorities, Berbers and Kurds, are not genetically different from the ‘host’ and neighboring populations.

Supporting information

S1 Checklist. PRISMA 2009 checklist.

(DOC)

S1 Fig. Neighbor-Joining dendrograms, based on standard genetic distances (SGD), showing relatedness between Arabs and other populations using generic HLA-DRB1* allele frequency data. Population data were taken from references detailed in Tables 1 and 2. Bootstrap values from 1,000 replicates are shown.

(TIF)

S2 Fig. Neighbor-Joining dendrograms, based on standard genetic distances (SGD), showing relatedness between Arabs and other populations using generic HLA-B* allele frequency data. Population data were taken from references detailed in Tables 1 and 2. Bootstrap values from 1,000 replicates are shown.

(TIF)

S1 Table. Genetic distances between three groups of Arab Jews based on HLA-DRB1 and -DQB1 allele frequencies.

(DOC)

Author Contributions

Conceptualization: Abdelhafidh Hajjej, Lasmar Hattab, Slama Hmida.

Formal analysis: Abdelhafidh Hajjej, Slama Hmida.

Investigation: Abdelhafidh Hajjej.

Methodology: Wassim Y. Almawi, Slama Hmida.

Software: Abdelhafidh Hajjej, Lasmar Hattab.

Supervision: Slama Hmida.

Validation: Abdelhafidh Hajjej, Wassim Y. Almawi, Antonio Arnaiz-Villena, Lasmar Hattab,

Slama Hmida.

Writing – original draft: Abdelhafidh Hajjej.

Writing – review & editing: Wassim Y. Almawi, Antonio Arnaiz-Villena.

References

1. HLA allele database: http://hla.alleles.org (last accessed on September 17, 2017)

2. Hudson RR. Analysis of population subdivision. In: Balding DJ, Bishop M, Cannings C (Eds). Handbook of statistical genetics. pp. 309–324. John Wiley & Sons, Chichester, UK, 2001

3. Takezaki N, Nei M. Empirical tests of the reliability of phylogenetic trees constructed with microsatellite

DNA. Genetics. 2008; 178(1): 385–92. https://doi.org/10.1534/genetics.107.081505 PMID: 18202381

4. Nei M. Phylogenetic analysis in molecular evolutionary genetics. Annual Review of Genetics. 1996;

30: 371–403. https://doi.org/10.1146/annurev.genet.30.1.371 PMID: 8982459

5. Tamura K, Nei M, Kumar S. Prospects for inferring very large phylogenies by using the neighbor-join-

ing method. Proceedings of the National Academy of Sciences USA. 2004; 101(30): 11030–5. https://

doi.org/10.1073/pnas.0404206101 PMID: 15258291

6. Excoffier L, Slatkin M. Maximum-likelihood estimation of molecular haplotype frequencies in a diploid

population. Molecular Biology and Evolution. 1995; 12(5): 921–7. https://doi.org/10.1093/

oxfordjournals.molbev.a040269 PMID: 7476138

7. The World Factbook: https://www.cia.gov/library/publications/the-world-factbook

8. Bengio O, Ben-Dor G. Minorities and the State in the Arab World. Lynne Rienner Publishers, 1999–

224 pages

9. Encyclopædia Britannica, Himyar: https://www.britannica.com/topic/Himyar

10. Korotayev A. Ancient Yemen. Oxford: Oxford University Press, 1995.

11. Korotayev A. Pre-Islamic Yemen. Wiesbaden: Harrassowitz Verlag, 1996.

12. Munro-Hay SC. Aksum: An African Civilization of Late Antiquity. Edinburgh: Edinburgh

University Press, 1991.

13. Robin CJ. Arabia and Ethiopia. In: Johnson Scott (ed.) The Oxford Handbook of Late Antiquity, Oxford

University Press 2012 pp. 247–333, p.279.

14. Hoyland R. Arabia and the Arabs: From the Bronze Age to the Coming of Islam, Routledge, 2001,

p.51.

15. Wikipedia, History of Islam: https://en.wikipedia.org/wiki/History_of_Islam

16. Hourani A. A History of the Arab Peoples. Harvard University Press 2002; pp. 15–19. ISBN

9780674010178.

17. Moher D, Liberati A, Tetzlaff J, Altman DG, and PRISMA Group, “Reprint—preferred reporting

items for systematic reviews and meta-analyses: the PRISMA statement”. Physical Therapy. 2009;

89(9): 873–80. https://doi.org/10.1093/ptj/89.9.873 PMID: 19723669

18. Young FW, Bann CM. A visual statistics system. In Stine RA, Fox J, eds. Statistical computing envi-

ronments for social researches. New York: Sage publications. 1996; 207–36.

19. Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees.

Molecular Biology and Evolution. 1987; 4(4): 406–425. https://doi.org/10.1093/oxfordjournals.molbev.

a040454 PMID: 3447015

20. Nei M. Genetic distances between populations. The American Naturalist. 1972; 106:283. http://jstor.

org/stable/2459777

21. Nei M. Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of

Sciences USA. 1973; 70(12): 3321–3. PMID: 4519626.

22. Nei M, Tajima F, Tateno Y. Accuracy of estimated phylogenetic trees from molecular data. II. Gene

frequency data. Journal of Molecular Evolution. 1983; 19(2): 153–70. https://doi.org/10.1007/

BF02300753 PMID: 6571220

23. Database of allele frequencies: http://www.allelefrequencies.net, 2017

24. Hajjej A, Sellami MH, Kaabi H, Hajjej G, El-Gaaied A, Boukef K, et al. HLA class I and class II polymor-

phisms in Tunisian Berbers. Annals of Human Biology. 2011; 38 (2): 156–64. https://doi.org/10.3109/

03014460.2010.504195 PMID: 20666704

25. Gomez-Casado E, del Moral P, Martinez-Laso J, Garcı́a-Gómez A, Allende L, Silvera-Redondo C,

et al. HLA gene in Arabic-Speaking Moroccans: close relatedness to Berbers and Iberians. Tissue

Antigens. 2000; 55(3): 239–49. https://doi.org/10.1034/j.1399-0039.2000.550307.x PMID: 10777099

26. Mahfoudh N, Ayadi I, Kamoun A, Ammar R, Mallek B, Maalej L, et al. Analysis of HLA-A, -B, -C, -DR,

-DQ polymorphisms in the South Tunisian population and a comparison with other populations. Annals

of Human Biology. 2013; 40(1): 41–7. https://doi.org/10.3109/03014460.2012.734334 PMID:

23095049

27. Matevosyan L, Chattopadhyay S, Madelian V, Avagyan S, Nazaretyan M, Hyussian A, et al. HLA-A,

HLA-B, and HLA-DRB1 allele distribution in a large Armenian population sample. Tissue Antigens.

2011; 78(1): 21–30. https://doi.org/10.1111/j.1399-0039.2011.01668.x PMID: 21501120

28. Hamdi NM, Al-Hababi FH, Eid AE. HLA class I and class II associations with ESRD in Saudi Arabian

population. PLoS One. 2014 Nov 7; 9(11): e111403. https://doi.org/10.1371/journal.pone.0111403

PMID: 25380295

29. Arnaiz-Villena A, Elaiwa N, Silvera C, Rostom A, Moscoso J, Gómez-Casado E, et al. The origin of

Palestinians and their genetic relatedness with other Mediterranean populations. Retraction in: Suciu-

Foca N, Lewis R. Human Immunology. 2001; 62(9): 889–900. (Accessed on https://commons.

wikimedia.org/wiki/File:Palestinians_hla.pdf) PMID: 11543891

30. Albalushi KR, Sellami MH, Alriyami H, varghese M, Boukef MK, Hmida S. The Investigation of the Evo-

lutionary History of the Omani Population by Analysis of HLA Class I Polymorphism. Anthropologist.

2014; 18(1): 205–210

31. Sánchez-Velasco P, Karadsheh NS, Garcı́a-Martı́n A, Ruı́z de Alegrı́a C, Leyva-Cobián F. Molecular

analysis of HLA allelic frequencies and haplotypes in Jordanians and comparison with other related

populations. Human Immunology. 2001; 62(9): 901–9. https://doi.org/10.1016/S0198-8859(01)

00289-0. PMID: 11543892.

32. Galgani A, Mancino G, Martı́nez-Labarga C, Cicconi R, Mattei M, Amicosante M, et al. HLA-A, -B and

-DRB1 allele frequencies in Cyrenaica population (Libya) and genetic relationships with other popula-

tions. Hum Immunol. 2013; 74(1): 52–9. https://doi.org/10.1016/j.humimm.2012.10.001 PMID:

23079236

33. Hajjej A, Hmida S, Kaabi H, Dridi A, Jridi A, El Gaaled A, et al. HLA genes in Southern Tunisians

(Ghannouch area) and their relationship with other Mediterraneans. European Journal Medical Genet-

ics. 2006; 49(1): 43–56. https://doi.org/10.1016/j.ejmg.2005.01.001 PMID: 16473309

34. Hmida S, Gauthier A, Dridi A, Quillivic F, Genetet B, Boukef K, et al. HLA class II gene polymorphism

in Tunisians. Tissue Antigens. 1995; 45(1): 63–8. https://doi.org/10.1111/j.1399-0039.1995.tb02416.

x PMID: 7725313

35. Almawi WY, Busson M, Tamim H, Al-Harbi EM, Finan RR, Wakim-Ghorayeb SF, et al. HLA class II

profile and distribution of HLA-DRB1 and HLA-DQB1 alleles and haplotypes among Lebanese and

Bahraini Arabs. Clinical and Diagnostic Laboratory Immunology. 2004; 11(4): 770–4. https://doi.org/

10.1128/CDLI.11.4.770-774.2004 PMID: 15242955

36. Amar A, Kwon OJ, Motro U, Witt CS, Bonne-Tamir B, Gabison R, et al. Molecular analysis of HLA

class II polymorphisms among different ethnic groups in Israel. Human Immunology. 1999; 60(8):

723–30. https://doi.org/10.1016/S0198-8859(99)00043-9 PMID: 10439318

37. Izaabel H, Garchon HJ, Caillat-Zucman S, Beaurain G, Akhayat O, Bach JF, et al. HLA class II DNA

polymorphism in a Moroccan population from the Souss, Agadir area. Tissue Antigens. 1998; 51(1):

106–10. https://doi.org/10.1111/j.1399-0039.1998.tb02954.x PMID: 9459511

38. Al-Tonbary Y, Abdel-Razek N, Zaghloul H, Metwaly S, El-Deek B, El-Shawaf R. HLA class II polymor-

phism in Egyptian children with lymphomas. Hematology. 2004; 9(2): 139–45. https://doi.org/10.

1080/1024533042000205487 PMID: 15203870

39. Clayton J, Lonjou C. Allele and Haplotype frequencies for HLA loci in various ethnic groups. In

Charron D, ed. Genetic diversity of HLA. Functional and medical implications. Vol 1. Paris: EDK.

1997; 665–820.

40. Abdennaji Guenounou B, Loueslati BY, Buhler S, Hmida S, Ennafaa H, Khodjet-Elkhil H, et al. HLA

class II genetic diversity in Southern Tunisia and the Mediterranean area. International Journal Immu-

nogenetics. 2006; 33(2): 93–103. https://doi.org/10.1111/j.1744-313X.2006.00577.x PMID:

16611253

41. Martinez-Laso J, De Juan D, Martinez-Quiles N, Gomez-Casado E, Cuadrado E, Arnaiz-Villena A.

The contribution of the HLA-A, -B, -C and -DR, -DQ DNA typing to the study of the origins of Spaniards

and Basques. Tissue Antigens. 1995; 45(4): 237–45. https://doi.org/10.1111/j.1399-0039.1995.

tb02446.x PMID: 7638859.

42. Brick C, Bennani N, Atouf O, Essakalli M. HLA-A, -B, -DR and -DQ allele and haplotype frequencies in

the Moroccan population: a general population study. Transfusion Clinique et Biologique. 2006; 13(6):

346–52. https://doi.org/10.1016/j.tracli.2006.12.003 PMID: 17306585

43. Gibert M, Touinssi M, Reviron D, Mercier P, Boëtsch G, Chiaroni J. HLA-DRB1 frequencies of the

Comorian population and their genetic affinities with Sub-Saharan African and Indian Oceanian popu-

lations. Annals of Human Biology. 2006; 33(3): 265–78. https://doi.org/10.1080/03014460600578599

PMID: 17092866

44. Samaha H, Rahal EA, Abou-Jaoude M, Younes M, Dacchache J, Hakime N. HLA class II allele fre-

quencies in the Lebanese population. Molecular Immunology. 2003; 39(17–18): 1079–81. https://doi.

org/10.1016/S0161-5890(03)00073-7 PMID: 12835080

45. Khansa S, Hoteit R, Shammaa D, Khalek RA, El Halas H, Greige L, et al. HLA class II allele frequen-

cies in the Lebanese population. Gene. 2012; 506(2): 396–9. https://doi.org/10.1016/j.gene.2012.06.

063 PMID: 22750800

46. Elbjeirami WM, Abdel-Rahman F, Hussein AA. Probability of finding an HLA-matched donor in imme-

diate and extended families: the Jordanian experience. Biology of Blood and Marrow Transplantation.

2013; 19(2): 221–6. https://doi.org/10.1016/j.bbmt.2012.09.009 PMID: 23025986

47. Mourad J, Monem F. HLA-DRB1 allele association with rheumatoid arthritis susceptibility and severity

in Syria. Revista Brasileira De Reumatologia. 2013; 53(1): 47–56. PMID: 23588515

48. Djidjik R, Allam I, Douaoui S, Meddour Y, Cherguelaı̂ne K, Tahiat A, et al. Association study of human

leukocyte antigen-DRB1 alleles with rheumatoid arthritis in Algerian patients. International Journal of

Rheumatic Diseases. 2014. https://doi.org/10.1111/1756-185X.12272 PMID: 24447879

49. Hajeer AH, Al Balwi MA, Aytül Uyar F, Alhaidan Y, Alabdulrahman A, Al Abdulkareem I, et al. HLA-A,

-B, -C, -DRB1 and -DQB1 allele and haplotype frequencies in Saudis using next generation sequenc-

ing technique. Tissue Antigens. 2013; 82(4): 252–8. https://doi.org/10.1111/tan.12200 PMID:

24461004

50. Hajeer AH, Sawidan FA, Bohlega S, Saleh S, Sutton P, Shubaili A, Tahan AA, Al Jumah M. HLA class

I and class II polymorphisms in Saudi patients with myasthenia gravis. International Journal of Immu-

nogenetics. 2009; 36(3): 169–72. https://doi.org/10.1111/j.1744-313X.2009.00843.x PMID:

19490212

51. Albalushi KR, Sellami MH, Alriyami H, varghese M, Boukef MK, Hmida S. HLA Class II (DRB1 and

DQB1) Polymorphism in Omanis. Journal of Transplantation Technologies and Research 2014; 4:

134. https://doi.org/10.4172/2161-0991.1000134

52. Haider MZ, Shaltout A, Alsaeid K, Qabazard M, Dorman J. Prevalence of human leukocyte antigen

DQA1 and DQB1 alleles in Kuwaiti Arab children with type 1 diabetes mellitus. Clinical Genetics.

1999; 56(6): 450–6. https://doi.org/10.1034/j.1399-0004.1999.560608.x PMID: 10665665

53. Haider MZ, Zahid MA, Dalal HN, Razik MA. Human leukocyte antigen (HLA) DRB1 alleles in Kuwaiti

Arabs with schizophrenia. American Journal of Medical Genetics. 2000; 96(6): 870–2. https://doi.org/

10.1002/1096-8628(20001204)96:6<870::AID-AJMG36>3.0.CO;2-L PMID: 11121200.

54. Arnaiz-Villena A, Palacio-Grüber J, Muñiz E, Campos C, Alonso-Rubio J, Gomez-Casado E, et al.

Genetic HLA Study of Kurds in Iraq, Iran and Tbilisi (Caucasus, Georgia): Relatedness and Medical

Implications. PLoS One. 2017 Jan 23; 12(1): e0169929. https://doi.org/10.1371/journal.pone.

0169929 PMID: 28114347

55. Nassar MY, Al-Shamahy HA, Masood HA. The Association between Human Leukocyte Antigens and

Hypertensive End-Stage Renal Failure among Yemeni Patients. Sultan Qaboos University Medical

Journal. 2015; 15(2): e241–249. PMID: 26052458

56. Middleton D, Williams F, Meenagh A, Daar AS, Gorodezky C, Hammond M, et al. Analysis of the

distribution of HLA-A alleles in populations from five continents. Human Immunology. 2000; 61

(10): 1048–52. https://doi.org/10.1016/S0198-8859(00)00178-6 PMID: 11082518

57. Williams F, Meenagh A, Darke C, Acosta A, Daar AS, Gorodezky C, et al. Analysis of the distribution

of HLA-B alleles in populations from five continents. Human Immunology. 2001; 62(6): 645–50.

https://doi.org/10.1016/S0198-8859(01)00247-6 PMID: 11390040

58. Jazairi B, Khansaa I, Ikhtiar A, Murad H. Frequency of HLA-DRB1 and HLA-DQB1 Alleles and Haplo-

type Association in Syrian Population. Immunological Investigation. 2016; 45(2): 172–9. https://doi.

org/10.3109/08820139.2015.1131293 PMID: 26853713

59. Hajjej A, Hajjej G, Almawi WY, Kaabi H, El-Gaaied A, Hmida S. HLA class I and class II polymorphism

in a population from south-eastern Tunisia (Gabes Area). International Journal of Immunogenetics.

2011; 38(3): 191–9. https://doi.org/10.1111/j.1744-313X.2011.01003.x PMID: 21385325

60. Hajjej A, Kâabi H, Sellami MH, Dridi A, Jeridi A, El borgi W, et al. The contribution of HLA class I and II

alleles and haplotypes to the investigation of the evolutionary history of Tunisians. Tissue Antigens.

2006; 68(2): 153–62. https://doi.org/10.1111/j.1399-0039.2006.00622.x PMID: 16866885

61. Hajjej A, Almawi WY, Hattab L, El-Gaaied A, Hmida S. HLA Class I and Class II Alleles and Haplo-

types Confirm the Berber Origin of the Present Day Tunisian Population. PLoS One. 2015; 10(8):

e0136909. https://doi.org/10.1371/journal.pone.0136909 PMID: 26317228

62. Hajjej A, Almawi WY, Hattab L, El-Gaaied A, Hmida S. The investigation of the origin of Southern Tuni-

sians using HLA genes. Journal of Human Genetics. 2017; 62(3): 419–429. https://doi.org/10.1038/

jhg.2016.146 PMID: 27881842

63. Ayed K, Ayed-Jendoubi S, Sfar I, Labonne MP, Gebuhrer L. HLA class-I and HLA class-II phenotypic,

gene and haplotypic frequencies in Tunisians by using molecular typing data. Tissue Antigens. 2004;

64(4): 520–32. https://doi.org/10.1111/j.1399-0039.2004.00313.x PMID: 15361135

64. Oumhani K, Canossi A, Piancatelli D, Di Rocco M, Del Beato T, Liberatore G, et al. Sequence-Based

analysis of the HLA-DRB1 polymorphism in Metalsa Berber and Chaouya Arabic-speaking groups

from Morocco. Human Immunology. 2002; 63(2): 129–38. https://doi.org/10.1016/S0198-8859(01)

00370-6 PMID: 11821160

65. Canossi A, Piancatelli D, Aureli A, Oumhani K, Ozzella G, Del Beato T, et al. Correlation between

genetic HLA class I and II polymorphisms and anthropological aspects in the Chaouya population

from Morocco (Arabic speaking). Tissue Antigens. 2010; 76(3): 177–193. https://doi.org/10.1111/j.

1399-0039.2010.01498.x PMID: 20492599

66. Roitberg-Tambur A, Witt CS, Friedmann A, Safirman C, Sherman L, Battat S, Nelken D, Brautbar C.

Comparative analysis of HLA polymorphism at the serologic and molecular level in Moroccan and Ash-

kenazi Jews. Tissue Antigens. 1995; 46(2): 104–10. https://doi.org/10.1111/j.1399-0039.1995.

tb02485.x PMID: 7482502

67. Arnaiz-Villena A, Benmamar D, Alvarez M, Diaz-Campos N, Varela P, Gomez-Casado E, et al. HLA

allele and haplotype frequencies in Algerians. Relatedness to Spaniards and Basques. Human Immu-

nology. 1995; 43(4): 259–68. https://doi.org/10.1016/0198-8859(95)00024-X PMID: 7499173

68. Imanishi T, Akaza T, Kimura A, Tokunaga K, Gjobori T. Allele and haplotype frequencies for HLA and

complement loci in various ethnic groups. In, eds. HLA 1991. VOL 1. Oxford: Oxford University

Press. 1992; 1065–220.

69. Arnaiz-Villena A, Iliakis P, González-Hevilla M, Longás J, Gómez-Casado E, Sfyridaki K, et al. The ori-

gin of Cretan populations as determined by characterization of HLA alleles. Tissue Antigens. 1999; 53

(3): 213–26. https://doi.org/10.1034/j.1399-0039.1999.530301.x PMID: 10203014

70. Comas D, Mateu E, Calafell F, Pérez-Lezaun A, Bosch E, Martı́nez-Arias R, et al. HLA class I and

class II DNA typing and the origin of Basques. Tissue Antigens. 1998; 51(1): 30–40. https://doi.org/

10.1111/j.1399-0039.1998.tb02944.x PMID: 9459501

71. Grimaldi MC, Crouau-Roy B, Amoros JP, Cambon-Thomsen A, Carcassi C, Orru S, et al. West Medi-

terranean islands (Corsica, Balearic Islands, Sardinia) and the Basque population: contribution of HLA

class I molecular markers to their evolutionary history. Tissue Antigens. 2001; 58(5): 281–92. https://

doi.org/10.1034/j.1399-0039.2001.580501.x PMID: 11844138

72. Renquin J, Sanchez-Mazas A, Halle L, Rivalland S, Jaeger G, Mbayo K, et al. HLA class II polymor-

phism in Aka Pygmies and Bantu Congolese and a reassessment of HLA-DRB1 African diversity. Tis-

sue Antigens. 2001; 58(4): 211–22. https://doi.org/10.1034/j.1399-0039.2001.580401.x PMID:

11782272

73. Farjadian S, Ghaderi A. HLA class II genetic diversity in Arabs and Jews of Iran. Iranian Journal of

Immunology. 2007; 4(2): 85–93. https://doi.org/IJIv4i2A3 PMID: 17652848

74. Kollaee A, Ghaffarpor M, Ghlichnia HA, Ghaffari SH, Zamani M. The influence of the HLA-DRB1 and

HLA-DQB1 allele heterogeneity on disease risk and severity in Iranian patients with multiple sclerosis.

International Journal of Immunogenetics. 2012; 39(5): 414–22. https://doi.org/10.1111/j.1744-313X.

2012.01104.x PMID: 22404765

75. Sayad A, Akbari MT, Pajouhi M, Mostafavi F, Zamani M. The influence of the HLA-DRB, HLA-DQB

and polymorphic positions of the HLA-DRβ1 and HLA-DQβ1 molecules on risk of Iranian type 1 diabetes mellitus patients. International Journal of Immunogenetics. 2012; 39(5): 429–36. https://doi.org/

10.1111/j.1744-313X.2012.01116.x PMID: 22494469

76. Sulcebe G, Sanchez-Mazas A, Tiercy JM, Shyti E, Mone I, Ylli Z, et al. HLA allele and haplotype fre-

quencies in the Albanian population and their relationship with the other European populations. Inter-

national Journal of Immunogenetics. 2009; 36(6): 337–43. https://doi.org/10.1111/j.1744-313X.2009.

00868.x PMID: 19703234

77. Sanchez-Velasco P, Gomez-Casado E, Martinez-Laso J, Moscoso J, Zamora J, Lowy E, et al. HLA

alleles in isolated populations from North Spain: origin of the Basques and the ancient Iberians. Tissue

Antigens. 2003; 61(5): 384–92. https://doi.org/10.1034/j.1399-0039.2003.00041.x PMID: 12753657

78. Arnaiz-Villena A, Dimitroski K, Pacho A, Moscoso J, Gómez-Casado E, Silvera-Redondo C, et al.

HLA genes in Macedonians and the sub-Saharan origin of the Greeks. Tissue Antigens. 2001; 57

(2): 118–27. https://doi.org/10.1034/j.1399-0039.2001.057002118.x PMID: 11260506

79. Arnaiz-Villena A, Karin M, Bendikuze N, Gomez-Casado E, Moscoso J, Silvera C, et al. HLA alleles

and haplotypes in the Turkish population: relatedness to Kurds, Armenians and other Mediterraneans.

Tissue Antigens. 2001; 57(4): 308–17. https://doi.org/10.1034/j.1399-0039.2001.057004308.x PMID:

11380939

80. Muro M, Marı́n L, Torı́o A, Moya-Quiles MR, Minguela A, Rosique-Roman J, et al. HLA polymorphism

in the Murcia population (Spain): in the cradle of the archaeologic Iberians. Human Immunology. 2001;

62(9): 910–21. https://doi.org/10.1016/S0198-8859(01)00290-7 PMID: 11543893

81. Farjadian S, Ghaderi A. HLA class II similarities in Iranian Kurds and Azeris. International Journal of

Immunogenetics. 2007; 34(6): 457–63. https://doi.org/10.1111/j.1744-313X.2007.00723.x PMID:

18001303

82. Mohyuddin A, Ayub Q, Khaliq S, Mansoor A, Mazhar K, Rehman S, et al. HLA polymorphism in six eth-

nic groups from Pakistan. Tissue Antigens. 2002; 59(6): 492–501. https://doi.org/10.1034/j.1399-

0039.2002.590606.x PMID: 12445319

83. Agrawal S, Srivastava SK, Borkar M, Chaudhuri TK. Genetic affinities of north and northeastern popu-

lations of India: inference from HLA-based study. Tissue Antigens. 2008; 72(2): 120–30. https://doi.

org/10.1111/j.1399-0039.2008.01083.x PMID: 18721272

84. Rani R, Sood A, Goswami R. Molecular basis of predisposition to develop type 1 diabetes mellitus in

North Indians. Tissue Antigens. 2004; 64(2): 145–55. https://doi.org/10.1111/j.1399-0039.2004.

00246.x PMID: 15245369

85. Migot-Nabias F, Fajardy I, Danze PM, Everaere S, Mayombo J, Minh TN, et al. HLA class II polymor-

phism in a Gabonese Banzabi population. Tissue Antigens. 1999; 53(6): 580–5. https://doi.org/10.

1034/j.1399-0039.1999.530610.x PMID: 10395110

86. Arnaiz-Villena A, Muñiz E, Campos C, Gomez-Casado E, Tomasi S, Martı́nez-Quiles N, et al. Origin of Ancient Canary Islanders (Guanches): presence of Atlantic/Iberian HLA and Y chromosome genes

and Ancient Iberian language. International Journal of Modern Anthropology. 2015; 8: 67–93. https://

doi.org/10.4314/ijma.v1i8.4

87. Reil A, Bein G, Machulla HK, Sternberg B, Seyfarth M. High-resolution DNA typing in immunoglobulin

A deficiency confirms a positive association with DRB1*0301, DQB1*02 haplotypes. Tissue Antigens. 1997; 50(5): 501–6. https://doi.org/10.1111/j.1399-0039.1997.tb02906.x PMID: 9389325

88. Arnaiz-Villena A, Martı́nez-Laso J, Gómez-Casado E, Dı́az-Campos N, Santos P, Martinho A, et al.

Relatedness among Basques, Portuguese, Spaniards, and Algerians studied by HLA allelic frequen-

cies and haplotypes. Immunogenetics. 1997; 47(1): 37–43. PMID: 9382919

89. Stearns PN. The Encyclopedia of World History: Ancient, Medieval, and Modern, Chronologically

Arranged, 6 ed., Houghton Mifflin Harcourt, 2001, 2017, pp. 129–131.

90. Mira-Guardiola MA (2000). Cartago contra Roma. Ed.: Alderaban. Madrid, Spain.

91. Sellier J, Sellier A. Atlas des Peuples d’Orient. Paris, France: Editions La Decouverte, 1993

92. Arnaiz-Villena A, Martinez-Laso J, Alonso-García J. The Correlation Between Languages and Genes:

The Usko-Mediterranean Peoples. Human Immunology. 2001; 62(9): 1051–1061. https://doi.org/10.

1016/S0198-8859(01)00300-7 PMID: 11543906

93. Arnaiz-Villena A, Gomez-Casado E, Martinez-Laso J. Population genetic relationships between Medi-

terranean populations determined by HLA allele distribution and a historic Perspective. Tissue Anti-

gens. 2002; 60(2): 111–21. https://doi.org/10.1034/j.1399-0039.2002.600201.x PMID: 12392505

94. Hachid M. Postface de l’ouvrage “aux origines de l’ecriture au Maroc. corpus des inscriptions ama-

zighes des sites d’art rupestre du maroc” edited by: Skounti A., Lemdjidi A. and Nami M. Publication

de l’institut royal de la culture amazighe. Cealpa, rabat, morocco, 2003.

95. Hachid M. Les premiers Berbères: entre Méditerranée, Tassili et Nil. Edisud, Aix-en-Provence, France, 2000

96. Carter RA. Boat remains and trade in Persian Gulf during the 6th and 5th millennia BC. Antiquity. 2006;

80(307): 52–63. https://doi.org/10.1017/S0003598X0009325X

97. Kuhrt A. The ancient Near East (3000–330 BC). Vol II. Barcelona, Editorial Critica, 2001.

98. Hitti PK. History of Syria: Including Lebanon and Palestine, 2004, p26

99. Encyclopaedia Britannica: https://www.britannica.com/

100. Sartre M, D’Alexandre à Zénobie: Histoire du Levant antique, IVe siècle avant Jésus-Christ-IIIe siècle après Jésus-Christ, Fayard, 2001.

Genetic heterogeneity of Arab pop-HLA gene-2018.pdf

RESEARCH ARTICLE

The genetic heterogeneity of Arab

populations as inferred from HLA genes

Abdelhafidh Hajjej 1*, Wassim Y. Almawi2¤, Antonio Arnaiz-Villena3, Lasmar Hattab4,

Slama Hmida 1

1 Department of Immunogenetics, National Blood Transfusion Center, Tunis, Tunisia, 2 Department of

Medicine, Harvard Medical School, Boston, MA, United States of America, 3 Department of Immunology,

University Complutense, School of Medicine, Madrid Regional Blood Center, Madrid, Spain, 4 Department of

Medical Analysis, Hospital of Gabes (Ghannouch), Gabes, Tunisia

¤ Current address: School of Pharmacy, Lebanese American University, Byblos, Lebanon * [email protected]

Abstract

This is the first genetic anthropology study on Arabs in the MENA (Middle East and North Africa) region. The present meta-analysis included 100 populations from 36 Arab and non-Arab communities, comprising 16,006 individuals, and evaluated the genetic profile of Arabs using HLA class I (A, B) and class II (DRB1, DQB1) genes. A total of 56 Arab populations comprising 10,283 individuals were selected from several databases, and were compared with 44 Mediterranean, Asian, and sub-Saharan populations. The most frequent alleles in Arabs are A*01, A*02, B*35, B*51, DRB1*03:01, DRB1*07:01, DQB1*02:01, and DQB1*03:01, while DRB1*03:01-DQB1*02:01 and DRB1*07:01-DQB1*02:02 are the most frequent class II haplotypes. Dendrograms, correspondence analyses, genetic distances, and haplotype analysis indicate that Arabs could be stratified into four groups. The first consists of North Africans (Algerians, Tunisians, Moroccans, and Libyans), and the first Arabian Peninsula cluster (Saudis, Kuwaitis, and Yemenis), who appear to be related to Western Mediterraneans, including Iberians; this might be explained by a massive migration into these areas when the Sahara underwent a relatively rapid desiccation, starting about 10,000 years BC. The second includes Levantine Arabs (Palestinians, Jordanians, Lebanese, and Syrians), along with Iraqis and Egyptians, who are related to Eastern Mediterraneans. The third comprises Sudanese and Comorians, who tend to cluster with Sub-Saharans. The fourth comprises the second Arabian Peninsula cluster, made up of Omanis, Emiratis, and Bahrainis. It is noteworthy that the two large minorities (Berbers and Kurds) are indigenous (autochthonous), and are not genetically different from “host” and neighboring populations. In conclusion, this study confirmed high genetic heterogeneity among present-day Arabs, and especially those of the Arabian Peninsula.

Introduction

The human leukocyte antigen (HLA) system plays a key role in self-nonself recognition and is divided into class I (HLA-A, -B, and -C) and class II (HLA-DP, -DQ, and -DR) loci; it comprises 220 genes in a 3.6 Mb region found on the short arm of chromosome 6. The HLA system is highly polymorphic, and in excess of 17,000 alleles have been detected. For example, there are 4,828

OPEN ACCESS

Citation: Hajjej A, Almawi WY, Arnaiz-Villena A,

Hattab L, Hmida S (2018) The genetic

heterogeneity of Arab populations as inferred from

HLA genes. PLoS ONE 13(3): e0192269. https://

doi.org/10.1371/journal.pone.0192269

Editor: Amr H Sawalha, University of Michigan,

UNITED STATES

Received: November 6, 2017

Accepted: January 19, 2018

Published: March 9, 2018

Copyright: © 2018 Hajjej et al. This is an open access article distributed under the terms of the

Creative Commons Attribution License, which

permits unrestricted use, distribution, and

reproduction in any medium, provided the original

author and source are credited.

Data Availability Statement: All relevant data are

within the paper and its Supporting Information

files.

Funding: The authors received no specific funding

for this work.

Competing interests: The authors have declared

that no competing interests exist.

B, 3,968 A, and 3,579 C class I alleles, compared with 2,103 DRB1 and 1,142 DQB1 class II alleles. Several HLA alleles have been associated with various autoimmune and infectious diseases [1]. HLA class I and class II loci are characterized by high (80–90%) heterozygosity, and thus constitute reliable genetic markers for phylogenetic and anthropological studies.
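As a rough illustration of why a highly polymorphic locus yields high heterozygosity, the sketch below computes the expected heterozygosity H = 1 − Σp² from a set of allele frequencies. The function name and the example frequencies are hypothetical and are not taken from the study; Hardy-Weinberg proportions are assumed.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2), assuming Hardy-Weinberg proportions."""
    total = sum(freqs)
    if abs(total - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


# Hypothetical locus with ten equally frequent alleles (illustrative only).
uniform_locus = [0.1] * 10
print(round(expected_heterozygosity(uniform_locus), 2))
```

With many alleles at moderate frequencies, the sum of squared frequencies is small, so H approaches 1; a locus with a single fixed allele gives H = 0.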

Population studies confirmed that the frequencies of HLA alleles and haplotypes vary according to ethnicity and geographic origin. The codominant expression of HLA markers makes it possible to distinguish heterozygotes from homozygotes, hence allowing assignment of genotypes and allele frequencies [2]. Linkage disequilibrium (LD) analysis between HLA alleles can be used to estimate the number of generations separating two closely related populations since the time of their separation. Diversity in haplotype distribution, allele frequency, and LD analysis reflect the extent of variation between closely related populations. Allele frequency-based genetic distance analysis allows for construction of phylogenetic trees (dendrograms), so as to infer a relative estimate of the time that has elapsed since the populations existed as single cohesive units [3–6].
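The two quantities mentioned above can be sketched in a few lines. Because HLA alleles are codominant, allele frequencies can be obtained by direct gene counting from genotypes, and the basic LD coefficient for a pair of alleles is D = p(AB) − p(A)p(B). The function names and the toy genotypes below are assumptions made for illustration; they are not the authors' data or software.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Gene counting: every typed individual contributes two alleles at the locus."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def linkage_disequilibrium(hap_freq, freq_a, freq_b):
    """D = observed haplotype frequency minus the product expected under independence."""
    return hap_freq - freq_a * freq_b


# Hypothetical genotypes at one locus (illustrative only).
sample = [("A*01", "A*02"), ("A*02", "A*02"), ("A*02", "A*24"), ("A*01", "A*24")]
freqs = allele_frequencies(sample)
print(freqs["A*02"])  # 4 of the 8 sampled alleles
```

A positive D means the two alleles occur together on the same haplotype more often than their individual frequencies would predict.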

Arabs are a major panethnic group, and their union, the Arab League, is a cultural and ethnic union of 22 member states. As of 2013, nationals of the Arab League countries numbered 357 million, populating an area of 13 million km², straddling Africa and Asia [7]. Ethnic, religious, and linguistic diversity (triple heterogeneity) characterizes Arabs. Most Arabs follow Islam, and Christianity is the second largest religion, with over 15 million Christians. There are also smaller but significant religious minorities (such as Druze and Jews), and a number of non-Arab ethnic minorities (such as Berbers and Kurds) [7, 8].

The history of Arabs extends from circa 1200 BC, when the Southern Arabian Peninsula was ruled by three successive civilizations: the Mineans, who established their capital at Karna (1200–650 BC), the Sabeans in Marib (1000 BC–570 AD), and the Himyarites (2nd–6th centuries AD) in Dhafar (Oman) [9–11]. These civilizations were built by authentic Yemeni tribes. The kingdom of Kinda was established in Central Arabia in the 4th to early 6th century AD, while the Dilmun civilization was founded in Eastern Arabia. In the 3rd century AD, the East African Kingdom of Aksum extended into Yemen and Western Saudi Arabia [12]. In addition, the Lakhmids (of Yemeni origin) established a dynasty which ruled part of present-day Iraq and Syria in 300–602 AD [10, 13, 14]. The Arab Christian Ghassanids (220–638 AD), originating from Southern Arabia, migrated in the 3rd century to Jordan, where they established their kingdom that extended from Syria to Yathrib (Saudi Arabia) [12, 13]. Islam was introduced to the Arabian Peninsula in 610 AD. Shortly thereafter, Arabian tribes were united as a single Islamic state in the Arabian Peninsula, which was spearheaded by the Islamic prophet Muhammad. This Islamic state progressively grew in area, and in types and numbers of populations, and extended from Andalusia (Spain) in the west to the Indus in the east [14].

Subsequent spread of Islam involved swift invasion of Persia (637–651 AD), Iraq, the Levant, and Egypt (639 AD), which extended into North Africa (640–709), and to Spain, Portugal, and France (Poitiers) in the 8th century AD. Eastwards, Arab expansion to Central Asia, Bukhara (Uzbekistan), Afghanistan (637–709), and the Indus border (664–712) followed. Northwards, Arab invaders were in contact with the Byzantine Empire, and with the Caspian and Caucasus regions [15, 16]. With the Islamic expansion from the 7th century, social and political groups were gradually Arabized. The spread of Arab-Muslim culture was at the expense of local languages (such as Berber and Kurdish), especially in the Middle East and North Africa, resulting in Arabized populations speaking variants of Arabic mixed with the original languages (dialects). The extent of Arab gene exchange with these autochthonous groups is undetermined, but is thought to be lower than the religious/cultural influence.

Genetic heterogeneity of Arabs


Given the large number of conquests, Arabs were in contact with different ethnicities residing in a vast area stretching from Mauritania (West Africa) to the western border of China (East Asia). This suggests that cultural and perhaps genetic relationships were established with these ethnic groups. This work aims to study the HLA distribution in North African and Oriental Arab populations, and to compare them to neighboring populations (Sub-Saharan Africans, Europeans, and Asians).

Populations and methods

Search strategy

Datasets of HLA allele frequencies were collected from a systematic review performed per Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) criteria [only criteria 1–10, 17, and 26 are applicable to this type of study (S1 Checklist)] [17]. PubMed, ScienceDirect, AlleleFrequencies.net, and ResearchGate databases were searched for all papers on HLA polymorphism and HLA disease associations in Arabs. This systematic literature search, covering papers published up to May 31, 2017, was conducted by two investigators (H.A and H.L); the search terms used were: ‘HLA Arabs’, or ‘Human Leukocyte Antigen Arabs’. A search per country followed: ‘HLA Tunisians’, ‘HLA Saudis’, and so on. This was repeated for the remaining countries, which resulted in more than 50 keywords being used. A database from the International Histocompatibility Workshops was also used. Some authors were also contacted by e-mail, or through ResearchGate, to request information and missing data. While most datasets were taken from studies with an explicit anthropological focus, control groups from case-control disease studies were also used. No language restriction was applied to this search.

Inclusion and exclusion criteria

All included studies met the following criteria: HLA allele frequencies had to be obtained by molecular typing, and subjects had to be typed for at least one of the following loci: HLA-A, HLA-B, HLA-DRB1, and HLA-DQB1. Publications were excluded if they reported serological data, had a sample size of less than 35 individuals, included typed individuals (or controls) who were related or not randomly selected, or presented duplicate data sets. Studies were also excluded if they presented incomplete/partial allele frequencies, or if there were significant ambiguities in the typing.
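The criteria above can be read as a simple eligibility filter. The sketch below is only one possible encoding: the `Dataset` record, its field names, and the example entries are hypothetical and do not reflect how the authors actually screened studies.

```python
from dataclasses import dataclass

REQUIRED_LOCI = frozenset({"HLA-A", "HLA-B", "HLA-DRB1", "HLA-DQB1"})
MIN_SAMPLE_SIZE = 35  # studies with fewer than 35 individuals were excluded


@dataclass(frozen=True)
class Dataset:  # hypothetical record layout, for illustration only
    name: str
    typing_method: str  # "molecular" or "serological"
    sample_size: int
    loci: frozenset
    unrelated_random_sample: bool = True
    duplicate: bool = False
    complete_frequencies: bool = True
    ambiguous_typing: bool = False


def is_eligible(d):
    """Return True when a dataset passes every stated inclusion/exclusion rule."""
    return (
        d.typing_method == "molecular"
        and d.sample_size >= MIN_SAMPLE_SIZE
        and bool(REQUIRED_LOCI & d.loci)  # typed for at least one required locus
        and d.unrelated_random_sample
        and not d.duplicate
        and d.complete_frequencies
        and not d.ambiguous_typing
    )


candidates = [
    Dataset("study_a", "molecular", 120, frozenset({"HLA-A", "HLA-B"})),
    Dataset("study_b", "serological", 300, frozenset({"HLA-A"})),
    Dataset("study_c", "molecular", 30, frozenset({"HLA-DRB1"})),
]
print([d.name for d in candidates if is_eligible(d)])
```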

Data extraction

Studies were independently selected by two authors (H.A and H.L). An external referee was

invited in case of disagreements not resolved by both reviewers. Data extracted from selected

papers included publication year, study type (anthropology, association), sample size, HLA-A, -B, -C, -DRB1, and -DQB1 allele frequencies, haplotype frequencies, region, country, and typed loci.

Statistical analysis

A three-dimensional correspondence analysis and bi-dimensional representation were performed using VISTA V5.02 software [18]. Phylogenetic trees were constructed based on allele frequencies using the Neighbor-Joining (NJ) method [19], and standard genetic distances (SGD) [20], using DISPAN software containing GNKDST and TREEVIEW software [21, 22].
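For readers unfamiliar with standard genetic distances, the following is a minimal single-locus sketch, assuming that SGD refers to Nei's standard genetic distance, D = −ln(J_xy / √(J_x·J_y)); the source only cites reference [20] and the DISPAN package, so this is an assumption rather than a description of the authors' exact computation. The function name and the toy frequencies are hypothetical.

```python
import math


def nei_standard_distance(x, y):
    """Uncorrected Nei standard distance between two populations at one locus.

    x and y map allele names to frequencies; alleles missing from a
    population are treated as having frequency 0.
    """
    alleles = set(x) | set(y)
    j_xy = sum(x.get(a, 0.0) * y.get(a, 0.0) for a in alleles)
    j_x = sum(p * p for p in x.values())
    j_y = sum(p * p for p in y.values())
    identity = j_xy / math.sqrt(j_x * j_y)
    return -math.log(identity)


# Toy allele frequencies (illustrative only, not taken from the study tables).
pop_1 = {"DRB1*03": 0.5, "DRB1*07": 0.5}
pop_2 = {"DRB1*03": 1.0}
print(round(nei_standard_distance(pop_1, pop_2), 4))
```

This uncorrected form cannot fall below zero. Small negative distances, such as some of those reported in Table 5, would typically arise from a sample-size-corrected estimator, which is not shown here.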


Results

Study flow

The use of more than fifty keywords allowed identification of 5,456 papers and HLA datasets, of which 315 were deemed relevant to the study. Of these, 42 articles and 11 HLA datasets containing information on 56 Arab populations, and meeting the study criteria, were included. The study flow is illustrated in Fig 1. In addition, 20 articles and 18 HLA datasets which met the criteria of this study, containing complete information on 44 other populations, were selected, but without going through systematic review. The populations used in the comparison were chosen mainly from countries neighboring the Arab countries. This study relied on a database consisting of 100 populations (of which data of 11 populations were extracted from association studies) from 36 Arab and worldwide countries in Asia, Europe, and Africa. The distribution of populations by region is illustrated in Fig 2A. These populations represent allele frequency data for 16,006 individuals (160.06 individuals/population), drawn from 63 references.

Selected populations

Arab populations. The 42 articles and 11 HLA datasets (http://www.allelefrequencies.net) selected provided information on 56 populations (Table 1), comprising 10,283 individuals [23–67]. The 56 different ethnic and religious populations were selected from 18 Arab countries. There were no reliable HLA data for the remaining countries (Somalia, Djibouti, Mauritania, and Qatar) (Fig 2B). The studied populations are divided into 29 African (26 North African and 3 Sub-Saharan), and 27 Asian populations (13 Levantine, and 14 from the Arabian Peninsula). With the exception of 8 populations [28, 38, 47, 48, 50, 52, 53, 55], for which HLA data were extracted from association studies, the remaining 50 studies were extracted from anthropological ones.

Neighboring populations. Forty-four worldwide populations [23, 34, 39, 66, 68–85], comprising 5,723 individuals, were selected from 18 countries in three continents, using the same criteria previously described (Table 2). These comprised 22 European, 11 non-Arab Asian, and 11 Sub-Saharan African populations. Of the 11 Asian populations, there were two Arab minorities living in Iran (Khuzestan and Famoori).

Data of only three populations [74, 75, 84] were extracted from association studies. These populations were typed for at least HLA-A, -B, -DRB1, or -DQB1.
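The counts reported in this section can be cross-checked with simple arithmetic. The short sketch below only re-adds the figures quoted in the text; the variable names are hypothetical.

```python
# Figures quoted in the text for the two sets of populations.
arab = {"populations": 56, "individuals": 10_283}
neighbors = {"populations": 44, "individuals": 5_723}

# Sub-totals quoted for each set.
assert 26 + 3 == 29          # North African + Sub-Saharan Arab populations
assert 13 + 14 == 27         # Levantine + Arabian Peninsula populations
assert 29 + 27 == arab["populations"]
assert 22 + 11 + 11 == neighbors["populations"]
assert 8 + 3 == 11           # populations taken from association studies

total_populations = arab["populations"] + neighbors["populations"]
total_individuals = arab["individuals"] + neighbors["individuals"]
mean_per_population = total_individuals / total_populations

print(total_populations, total_individuals, mean_per_population)
```

The totals agree with the 100 populations, 16,006 individuals, and 160.06 individuals per population stated in the study flow.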

HLA allele frequencies features of Arab populations

Table 3 shows the most frequent HLA-A and -B alleles in Arab populations. A*02 was the most prevalent allele, and its frequency exceeded 25% in some populations, such as Saudis (30.4%) [23], Tunisian Berbers of Zrawa (29.3%) [24], Moroccans (26.2%) [25], and Sudanese (25.9%) [23]. A*01, *03, *24, *30, and *68 alleles were also common in most Arab populations. For example, the highest frequency of A*01 was seen in Tunisians (15%) [26] and Moroccans (14.8%) [25], while A*03 was prevalent among Iraqi Kurds (15.1%) [23], and A*30 was prevalent among Sudanese (17.6%) [23]. In addition, A*24 was common among Lebanese-Armenians (17.3%) [27], while A*68 was prevalent in Saudis (10.5%) [28]. In contrast, A*25, *28, *34, *36, *43, *66, *69, *74, and *80 are rare among Arabs. It is noteworthy that A*34, described as a rare allele among Arabs, is found at a high frequency (22.2%) in Tunisian Berbers from Zrawa [24], the highest reported for any population worldwide.

Results for the HLA-B locus are presented in Table 3. B*35 was the most frequent B* allele in Palestinians (20.3%) [29] and Lebanese-Armenians (19.8%) [27]. B*35 was found at varied


frequencies in Iraqi Kurds (15.6%) [23], Omanis (15.3%) [30], Jordanians (14.9%) [31], and Arab Emirati (11.1%) [23] populations. B*51 was the second most frequent allele, and high frequencies were recorded for Saudis (19.3%) [23], Omanis (17.5%) [30], and Arab Emirati (15.6%) [23] populations. B*50 was also a frequent B* allele in most Arabs, including Saudis (18.8%) [23], and Libyans (16.1%) [31], along with B*08, and B*44 among the Tunisian Berbers of Zrawa (32.8%) [24], the latter being the highest frequency worldwide. Similarly, the frequency of B*27 is the highest among Jordanians (27.1%) [31]. In contrast, B*37, *42, *46, *47, *48, *54, *59, *67, and *78 alleles are extremely rare or virtually absent in all Arab populations.

The most common DRB1 and DQB1 alleles among Arabs are shown in Table 4. DRB1*07:01 was the most frequent allele among Tunisians from Ghannouch (28.6%) [33], Jordanians (26.9%) [31], and Saudis (26.6%) [23], while Egyptians (8.3%) and Sudanese had the lowest frequencies of DRB1*07:01. DRB1*03:01 was the second most frequent DRB1* allele in some Arabs, such as Tunisians of Tunis (21.9%) [34] and Moroccans of Metelsa (20.2%) [23],

Fig 1. Flow diagram of the study selection process.

https://doi.org/10.1371/journal.pone.0192269.g001

Fig 2. The distribution of studied populations by region (A) and country (B).

https://doi.org/10.1371/journal.pone.0192269.g002


but rare in Jordanians (2.4%) [31]. DRB1*11:01 was also frequent among some Arabs, such as Lebanese (36.8%) [35], but rare among Saudis (4.8%) and Moroccans of Chaouya (2.5%) [23].

Furthermore, DRB1*13:01, *13:02, and *15:01 alleles are relatively frequent among Arabs. A high frequency of DRB1*13:01 was recorded for Sudanese (23.3%), while DRB1*13:02 was virtually absent in Bahrainis [35] and Sudanese [23]. All DRB1*09, *12, and *14 subtypes are extremely rare among Arabs. In addition, DRB1*16 subtypes are rare in all Arab populations except for Bahrain, where DRB1*16:01 is found at a high frequency (13.9%) [35].

DQB1*02:0X and *03:01 alleles are the most frequent DQB1* alleles in Arabs. The highest frequencies of DQB1*02:0X were reported for Tunisians (Ghannouch; 40.01%) [33], Yemenite-Jews (39.1%) [36], Moroccans (Agadir-Souss; 37.8%) [37] and Saudis (37.3%) [23], while the lowest frequency was found in Egyptians (6%) [38]. On the other hand, DQB1*03:01 is very common among Lebanese (45%) [39] and Algerians (Oran; 35.1%) [23], but not Saudis (7.6%) [23]. DQB1*03:02 and *05:01 are also frequent in most Arabs, such as Tunisians (Ghannouch; 20.7%) [33], Jordanians (17.8%) [31], Palestinians (17.6%) [29] and Lebanese (16.8%) [35]. DQB1*05:01 is frequent among Bahrainis (29.2%) [35], Tunisians (Berbers of Jerba; 22.7%) [40], and Lebanese (20.5%) [35]. Among DQB1*06 subtypes, DQB1*06:02 and *06:03 were the most frequent in most Arab populations, but absent in Bahrainis where DQB1*06:01 is very frequent (13.20%) [35]. Furthermore, all DQB1*04 subtypes are rare among Arabs, particularly

Table 1. List of Arab populations used in the present work.

No. Populations Symbols Size References No. Populations Symbols Size References

1 Algiers Alg 102 [67] 29 Comorians Com 117 [43]

2 Algerians-B Alg-B 97 [23] 30 Jordanians Jor 146 [31]

3 Algerians-A Alg-A 132 [48] 31 Jordanians-A Jor-A 1254 [46]

4 Algerians-Oran Ora 100 [23] 32 Syrians Syr 200 [47]

5 Gabesians Gab 77 [59] 33 Syrians-A Syr-A 225 [58]

6 Gabesians-A Gab-A 96 [40] 34 Lebanese Leb 95 [35]

7 Ghannouchians Gha 82 [33] 35 Lebanese-A Leb-A 1123 [45]

8 Berbers-Jerba Ber-J 55 [40] 36 Lebanese-B Leb-B 191 [44]

9 Berbers-Matmata Ber-M 81 [40] 37 Lebanese-Armen Leb-Ar 368 [27]

10 Berbers-Zrawa Ber-Z 70 [24] 38 Lebanese-KZ Leb-Kz 93 [39]

11 Tunisians Tun 376 [61] 39 Lebanese-NS Leb-Ns 59 [39]

12 Tunisians-A Tun-A 80 [60] 40 Lebanese-Yohmor Leb-Y 75 [39]

13 Tunisians-B Tun-B 101 [34] 41 Palestinians Pal 165 [29]

14 Tunisians-C Tun-C 100 [63] 42 Palestinians-A Pal-A 109 [36]

15 Tunisians-M Tun-M 123 [26] 43 Saudis Sau 105 [28]

16 Southern Tunisians Tun-S 250 [62] 44 Saudis-A Sau-A 213 [23]

17 Libyans Lib 118 [32] 45 Saudis-B Sau-B 158 [49]

18 Libyans-Jews Lib-J 119 [36] 46 Saudis-C Sau-C 499 [23]

19 Berbers-Metelsa Ber-Me 99 [64] 47 Saudis-D Sau-D 383 [50]

20 Moroccans Mor 96 [25] 48 Omanis-A Oma-A 259 [30] [51]

21 Moroccans-A Mor-A 110 [42] 49 Kuwaitis Kuw 212 [52]

22 Moroccans-Agadir Mor-Ag 98 [37] 50 Kuwaitis-A Kuw-A 114 [53]

23 Moroccans-Chaouya Mor-Ch 98 [65] 51 Bahrainis Bah 72 [35]

24 Moroccans-Jews Mor-J 94 [66] 52 Emiratis Emi 373 [23]

25 Egyptians Egy 101 [39] 53 Iraq kurds Ira-K 209 [54]

26 Egyptians-A Egy-A 121 [38] 54 Yemenite-Jews Yem-J 76 [36]

27 Sudanese Sud 200 [23] 55 Yemen-sana’a Yem 50 [55]

28 Sudanese-Nuba Sud-N 46 [23] 56 Omanis Oma 118 [56] [57]

https://doi.org/10.1371/journal.pone.0192269.t001


DQB1*04:01 which is virtually absent, except in Egyptians (10.17%) [38]. The most common DQB1*04 subtype in Arabs is DQB1*04:02.

Allelic comparison between Tunisians and other populations

Allelic comparisons were made using Neighbor-Joining dendrograms, correspondence analysis, and standard genetic distances. Analyses were performed with Class I and Class II markers, at both generic and high-resolution levels, to make the most of the available data, since some of the populations included in these comparisons lack high-resolution data.
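To give a sense of how Neighbor-Joining turns a distance matrix into a tree, the sketch below implements only its first step: computing the Q criterion, Q(i, j) = (n − 2)·d(i, j) − Σd(i, ·) − Σd(j, ·), and selecting the pair with the smallest Q to join. The five-taxon matrix is a generic textbook-style example, not data from this study, and the function name is hypothetical.

```python
def nj_pair_to_join(dist):
    """Return the index pair with the minimum Neighbor-Joining Q value, and that value."""
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q


# Symmetric toy distance matrix for five taxa (illustrative only).
example = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, q_value = nj_pair_to_join(example)
print(pair, q_value)
```

In this example the pair chosen is not the one with the smallest raw distance (taxa 3 and 4, distance 3), because the Q criterion also accounts for how far each taxon is from all the others. A full implementation would then replace the joined pair with a new node, update the distances, and repeat.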

Neighbor-joining dendrograms. Comparison at the generic level was made using genetic distances based on DRB1* and DQB1* allelic frequencies. Four groups can be interpreted from Fig 3. The first group comprises North African Arabs (Tunisians, Algerians, Moroccans, Libyans), Western Mediterranean Europeans (Iberians, French), Arabian Peninsula Arabs (Saudis, Kuwaitis, Yemenis), and the Arab minority of Iran (Khuzestani). The second group is formed by Eastern Mediterranean Europeans (Greeks, Cretans, Albanians, Turks, Macedonians), Italians, Levant Arabs (Palestinians, Lebanese, Syrians), Iraqi-Kurds, Tunisian Berbers (Djerba), and Iranians. The third group comprises Sub-Saharan Africans (Fulani, Mossi, Rimaibe, Bubi, Mandenka, and Senegalese). Omanis, Bahrainis, Egyptians, and Sudanese form a heterogeneous group containing Asians and Sub-Saharan Africans. Similar results, but with notable differences, were observed in dendrograms built with standard genetic distances (SGD) based on generic DRB1 (S1 Fig) and generic B loci (S2 Fig).

Correspondence analysis. High-resolution DRB1 correspondence analysis (Fig 4) demonstrated the clustering of the studied populations into three groups. The first contains North Africans (Tunisians, Algerians, Moroccans, and Libyans), Iberians (Basques, Spaniards,

Table 2. Worldwide populations included in the meta-analysis.

No. Populations Symbols Size References No. Populations Symbols Size References

1 Spaniards Spa 176 [41] 23 Mossi Mos 42 [39]

2 Portuguese Por 118 [39] 24 Mandenka Mad 200 [39]

3 Murcians Mur 173 [80] 25 Amhara Amh 98 [39]

4 Italians Ita 284 [68] 26 Bubi Bub 101 [39]

5 Basques-A Bas-A 82 [41] 27 Congolese Con 85 [72]

6 Basques-Arratia Bas-Ar 83 [77] 28 Fulani Ful 38 [39]

7 Basques-B Bas-B 99 [70] 29 Gabonese Gab 167 [85]

8 French Fre 179 [68] 30 Nigerians Nig 258 [23]

9 French-Rennes Fre-R 200 [34] 31 Oromo Oro 83 [39]

10 Balearic Bal 90 [71] 32 Rimaibe Rim 39 [39]

11 Corsica Cor 100 [71] 33 Senegalese Sen 177 [39]

12 Sardinians Sar 91 [68] 34 Famoori Arabs Fam 84 [73]

13 Ashkenazi-Jews Ash-J 132 [66] 35 India-Northeast Ind-N 188 [83]

14 Greeks-A Gre-A 96 [39] 36 Indians-Delhi Ind-D 112 [84]

15 Greeks-B Gre-B 101 [39] 37 Iranian-Jews Ira-J 91 [73]

16 Greeks-C Gre-C 98 [39] 38 Iranians Ira 120 [74]

17 Greeks-D Gre-D 242 [23] 39 Iranians-A Ira-A 100 [75]

18 Macedonians Mac 172 [78] 40 Iranians-Azeri Ira-Az 100 [81]

19 Turks Tur 250 [23] 41 Iranians-Kurd Ira-k 100 [81]

20 Turks-A Tur-A 228 [79] 42 Khuzestani Arabs Khu 50 [73]

21 Albanians Alb 160 [76] 43 Pakistanis-Pathan Pak-P 100 [82]

22 Cretans Cre 135 [69] 44 Pakistanis-Sindh Pak-S 101 [82]

https://doi.org/10.1371/journal.pone.0192269.t002


Portuguese, Murcians), French, Saudis, Yemenite-Jews, and Khuzestani Arabs. The second contains Eastern Mediterraneans (Greeks, Cretans, Lebanese, Palestinians, and Macedonians), Berbers of Djerba, Italians, Iraqi-Kurds, Iranians, Egyptians, Ashkenazi-Jews, and Moroccan-Jews. The last cluster consists of Sub-Saharan populations. It should be noted that Jordanians, Bahrainis, and Sudanese were outside these main groups. Similarly, correspondence analysis using class I (A and B) identified three main clusters (Fig 5). The first cluster contained all Sub-Saharan Africans along with Sudanese. The second cluster contains Eastern Mediterranean populations (Albanians, Greeks, Cretans, Lebanese, Palestinians, and Macedonians), Italians, Iraqi-Kurds, Ashkenazi-Jews, and Jordanians-A. The last cluster includes North Africans (Tunisians, Algerians, Moroccans, and Libyans), Iberians (Basques, Spaniards), French, and Saudis.
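Correspondence analysis places populations on a map so that distances between points approximate chi-square distances between their allele-frequency profiles. The sketch below computes only that underlying chi-square distance, not the full decomposition performed by the VISTA software, whose exact settings are not described in the text. The function name and the toy table are hypothetical.

```python
import math


def chi_square_distance(table, i, k):
    """Chi-square distance between the profiles of rows i and k.

    Each row is one population, each column one allele. Squared profile
    differences are weighted by the inverse of the average column mass,
    so differences at rare alleles count for more.
    """
    grand_total = sum(sum(row) for row in table)
    n_cols = len(table[0])
    col_mass = [sum(row[j] for row in table) / grand_total for j in range(n_cols)]
    profile_i = [v / sum(table[i]) for v in table[i]]
    profile_k = [v / sum(table[k]) for v in table[k]]
    return math.sqrt(
        sum(
            (a - b) ** 2 / c
            for a, b, c in zip(profile_i, profile_k, col_mass)
            if c > 0
        )
    )


# Toy allele-frequency table: three populations, two alleles (illustrative only).
toy = [
    [0.5, 0.5],
    [0.5, 0.5],
    [1.0, 0.0],
]
print(chi_square_distance(toy, 0, 1), round(chi_square_distance(toy, 0, 2), 4))
```

Populations with identical profiles sit at distance zero, which is why closely related groups appear as tight clusters in figures of this kind.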

Correspondence analysis based on generic DRB1 data, and using only Arab populations, shows that Arabs cluster into four groups (Fig 6). The first contains the North Africans (Tunisians, Algerians, Moroccans, and Libyans), Saudis, Yemenis, Kuwaitis, and Khuzestanis (Iranian Arabs). The second cluster includes the Arabs of the Levant (Palestinians, Jordanians, Lebanese, Syrians), Egyptians, Iraqi Kurds, and Moroccan Jews. The third group consists of

Table 3. Most frequent HLA-A* and -B* alleles in Arab populations.

HLA-A A*01 A*02 A*03 A*24 A*30 A*68

Population % Population % Population % Population % Population % Population %

Tun-M 15.0 Sau-D 30.4 Ira-k 15.1 Leb-Ar 17.3 Sud 17.6 Sau 10.5

Mor 14.8 Ber-Z 29.3 Leb-Ar 14.0 Gha 15.2 Mor-C 13.0 Tun-M 09.4

Jor-A 14.7 Mor 26.2 Pal 10.7 Ira-k 13.9 Tun-A 11.8 Mor 09.3

Ira-k 13.2 sud 25.9 Lib 10.3 Sau-B 13.3 Jor 11.5 Alg-K 08.6

Pal 12.5 Emi 25.2 Mor-A 10.0 Jor-A 10.7 Alg-K 10.2 sud 08.5

Leb-A 12.2 Oma 24.9 Alg-K 09.3 Pal 10.1 Sau-B 10.2 Emi 08.4

Sau-A 12.2 Alg 24.6 Jor-A 09.1 Alg 09.4 Pal 08.4 Lib 08.2

Alg 11.9 Lib 23.5 Emi 09.1 Lib 09.3 Oma-A 07.5 Jor 07.6

Lib 11.5 Jor-A 22.0 Sau-A 08.9 Mor 07.3 Leb-A 06.7 Oma-A 07.1

Oma 07.2 pal 20.5 Gab 07.7 Oma 06.3 Lib 06.4 Leb-A 05.1

Sud 06.5 Leb-A 18.7 Sud 07.1 Sud 06.1 Emi 05.0 Ira-k 03.8

Emi 06.2 Ira-k 17.0 Oma 06.4 Emi 05.2 Ira-k 03.8 Pal 03.6

HLA-B B*07 B*08 B*35 B*44 B*50 B*51

Population % Population % Population % Population % Population % Population %

Jor 27.1 Oma 11.0 Pal 20.3 Ber-Z 32.8 Sau-D 18.8 Sau-C 19.3

Sau-A 11.7 sau-B 10.1 Leb-Ar 19.8 Ira-k 10.3 Lib 16.1 Oma 17.5

Mor 09.0 Emi 08.6 Ira-k 15.6 Mor-C 10.2 Ber-Z 15.7 Emi 15.6

Lib 07.7 Gha 08.5 Oma-A 15.3 pal 09.6 Tun-S 14.2 Ira-K 15.6

Tun-A 07.5 Ira-k 07.2 Jor-A 14.9 Alg 08.8 Mor-C 12.5 Gha 12.2

Alg-k 07.1 Lib 06.4 Emir 11.1 Leb-Ar 08.4 Emi 09.4 Leb-Ar 12.1

Leb-Ar 04.5 Mor-C 06.2 Alg 10.3 Lib 07.6 Jor-A 06.4 Lib 11.1

Ira-k 04.1 Jor 04.7 Lib 10.1 Jor-A 05.6 Pal 05.8 Jor-A 10.3

Oma-A 03.1 Sud 04.0 Tun-M 09.8 Sau-D 03.5 Leb-Ar 05.2 Sud 07.8

Sud 02.8 Alg 03.5 Sau 08.6 Sud 02.3 Alg 05.1 Mor 07.4

Emi 02.4 Leb-Ar 03.0 Mor-C 06.9 Emi 02.3 Oma-A 04.2 Pal 06.4

Pal 01.8 Pal 02.7 sud 06.1 Oma-A 02.1 Sud 02.5 Alg-k 04.7

Only one population per country is illustrated; the frequencies are ranked from highest to lowest for each allele; to identify the population and country see Table 1

https://doi.org/10.1371/journal.pone.0192269.t003


Table 4. Most frequent HLA-DRB1* and -DQB1* alleles in Arab populations.

HLA-DRB1* 03:01 07:01 11:01 13:01 13:02 15:01

Population % Population % Population % Population % Population % Population %

Tun-B 21.9 Gha 28.6 Leb 36.8 Sud 23.3 Mor-Me 11.1 Alg-B 13.4

Mor-Me 20.2 Jor 26.9 Bah 16.0 Sau-A 10.6 Lib 09.3 Mor-c 12.6

Sau-B 16.5 Sau-B 26.6 Egy-A 13.2 Ber-M 08.0 Sau-A 08.9 Ber-Z 11.4

Ora 15.1 Yem-J 22.1 Gab-A 11.2 Leb-B 06.8 Egy-A 07.4 Jor 09.0

Bah 13.9 Mor-Ag 20.5 Pal 10.0 Alg-B 05.6 Tun-C 06.7 Sau-A 08.9

Sud 13.8 Lib-Y 19.6 Ora 08.6 Lib 05.5 Leb-N 05.0 Bah 07.6

Lib 13.6 Lib 17.0 Sud 08.3 Yem-J 05.4 Ora 04.5 Leb 04.7

Yem-J 12.0 Alg-B 15.9 Jor 08.3 Egy-A 04.6 Yem-J 04.0 Lib 04.2

Leb-B 09.6 Pal 12.7 Lib 05.1 Mor-Me 03.5 Pal 03.9 Pal 03.6

Pal 07.6 Bah 09.0 Sau-A 04.8 Jor 02.1 Jor 00.3 Sud 03.3

Egy-A 07.0 Egy-A 08.3 Yem-J 03.4 Bah 02.1 Sud 00.0 Egy-A 02.5

Jor 02.4 Sud 07.8 Mor-C 02.5 Pal 00.9 Bah 00.0 Yem-J 02.0

HLA-DQB1* 02:0X 03:01 03:02 05:01 06:02 06:03

Population % Population % Population % Population % Population % Population %

Gha 40.1 Leb-NS 45.0 Gha 20.7 Bah 29.2 Mor-C 12.9 Egy-A 10.2

Yem-J 39.1 Ora 35.1 Jor 17.8 Ber-J 22.7 Alg 12.8 Jor 08.3

Mor-Ag 37.8 Lib-J 29.6 Pal 17.6 Leb 20.5 Egy-A 12.7 Ber-J 07.8

Sau-B 37.3 Ber-J 27.4 Leb 16.8 Alg 13.9 Tun-A 12.6 Lib-J 07.4

Jor 35.9 Pal 26.7 Yem-J 14.2 Mor-C 12.3 Jor 10.7 Yem-J 06.1

Lib-J 33.3 Yem-J 19.1 Lib-J 13.0 Pal 11.8 Sau-B 05.1 Ora 04.3

Bah 25.7 Bah 16.0 Alg 12.3 Sau-B 10.1 Pal 04.2 Sau-B 04.1

ora 24.5 Mor-C 15.4 Mor-C 12.3 Jor 09.3 Leb-Y 03.7 Leb-Y 03.3

Pal 20.9 Egy 11.9 Bah 09.7 Egy-A 08.5 Yem-J 02.0 Mor-C 01.8

Leb-Y 20.0 Jor 10.0 Sau-B 08.9 Yem-J 06.1 Lib-J 00.8 Pal 01.2

Only one population per country is illustrated; the frequencies are ranked from highest to lowest for each allele; to identify the population and country see Table 1

https://doi.org/10.1371/journal.pone.0192269.t004

Fig 3. Neighbor-Joining dendrograms, based on standard genetic distances (SGD), showing relatedness between Arabs and other populations using generic HLA-DRB1* and -DQB1* allele frequency data. Population data were taken from references detailed in Tables 1 and 2. Bootstrap values from 1,000 replicates are shown.

https://doi.org/10.1371/journal.pone.0192269.g003


Bahrainis, Omanis, Emiratis and Famoori (Iranian Arabs). The fourth is composed of Sudanese, Sudanese from Nuba, and Comorians.

Genetic distances. Table 5 illustrates standard genetic distances (SGD) between Arabs and other populations, using generic DRB1* allele frequencies. North Africans and Iberians are the closest to Saudis. Moroccans (Agadir, 0.0024), Basques-Ar (0.0057), and Tunisians-S.

Fig 4. Correspondence analysis (bi-dimensional representation), based on the standard genetic distances, showing the relationship between Arabs and other populations according to high-resolution HLA-DRB1* allele frequency data. Only individuals with defined DRB1* subtypes are considered. Population data were taken from references detailed in Tables 1 and 2.

https://doi.org/10.1371/journal.pone.0192269.g004

Fig 5. Correspondence analysis (bi-dimensional representation), based on the standard genetic distances, showing a global view of the relationship among Arabs and other populations according to generic HLA-A* and -B* allele frequency data. Population data were taken from references detailed in Tables 1 and 2.

https://doi.org/10.1371/journal.pone.0192269.g005


Syrians are genetically close to Eastern Mediterraneans, such as Cretans (-0.0001) and Lebanese Armenians (0.0050), while Tunisians are close to Western Mediterraneans, such as North Africans and Iberians, and to Saudis. The populations most related to Tunisians are the other Tunisian populations (Gabesians, -0.0139), Moroccans (Agadir; -0.0080), and Algerians (-0.0055). Sub-Saharans such as Congolese (0.0519) and Nigerians (0.0828), and Greeks (0.0836), showed the closest genetic distances to Comorians. It is noteworthy that the Arab minority in Khuzestan (Iran) displayed close relatedness with North Africans [such as Gabesians from Tunisia (-0.0086) and Orans from Algeria], and with Saudis (0.0231).

HLA Class I and Class II haplotype

HLA-A-B haplotypes. HLA A-B haplotypic data are extremely rare in Arabs. The most frequent A-B haplotypes in Arabs are shown in Table 6. A*02:01-B*50:01 (9.0%) and A*02:01-B*44:02/03 (7.5%) were the haplotypes with the highest frequencies in Berbers of Zrawa. Diversity in A-B haplotype frequencies is found among Arabs, with comparable frequencies of A-B haplotypes across Arab populations, which did not exceed 5.3% in Gabesians (Tunisia). For example, while A*34:02-B*08:01 and A*29:01-B*45:01 characterize Tunisians, A*01-B*57 (2.9%), A*30-B*18 (1.50%), and A*33:01-B*14:01 (2.50%) characterize Algerians. Several haplotypes identified in Arabs were also seen in other Mediterraneans. For example, A*32:01-B*40:02 was seen in Greeks (2%) [39] and Spaniards (0.5%) [41], while A*02:01-B*50:01 was seen in Italians (2%) [68], Portuguese (3%) [39], and Moroccan Jews (3%) [66]. A*24:02-B*08:01 (4.75%) and A*30:02-B*53:01 (3.48%) were only identified in Saudis.

HLA-DRB1-DQB1 haplotypes. The most frequent DRB1-DQB1 haplotypes with significant LD in Arabs are listed in Table 7. In general, class II haplotype frequencies are markedly higher than those of class I haplotypes. DRB1*03:01-DQB1*02:01 was the most frequent DRB1-DQB1 haplotype in Arabs (Table 7), with its frequency ranging from 3.2% in Lebanese to 16.60% in Tunisians. DRB1*03:01-DQB1*02:01 is a common class II haplotype in the

Fig 6. Correspondence analysis (bi-dimensional representation), based on the standard genetic distances, showing the relationship between different Arab populations according to generic HLA-DRB1* allele frequency data. Population data were taken from references detailed in Tables 1 and 2.

https://doi.org/10.1371/journal.pone.0192269.g006


Table 5. The closest populations to Arabs using standard genetic distances (SGD) based on HLA-DRB1* alleles.

Saudis-B Emiratis Omanis-A Sudanese

Population SGD Population SGD Population SGD Population SGD

Moroccans-Ag 0.0024 Omanis-A 0.0411 Emirates 0.0411 Nigerians 0.0497

Basques-Ar 0.0057 Bahrain 0.0429 Sardinians 0.0939 Egyptians-A 0.0556

Tunisians-S 0.0124 Sardinians 0.0593 Bahrain 0.1327 Congolese 0.0594

Saudis-C 0.0160 Kuwaitis 0.0688 Kuwait 0.2014 Egyptians 0.0620

Ghanouchians 0.0203 Tunisians-B 0.1169 Famoori Arabs 0.2377 Mandenka 0.0908

Saudis 0.0258 Khuzestanis 0.1213 Macedonians 0.2461 Moroccans 0.0984

Tunisians 0.0272 Tunisians-A 0.1276 Tunisians-B 0.3071 Senegalese 0.1044

Kuwaitis-A 0.0312 Algerians-Oran 0.1371 Khuzestanis 0.3192 Bubi 0.1078

Khuzestanis 0.0349 Algerians-A 0.1407 Greeks-B 0.3197 Palestinians-A 0.1111

Spaniards 0.0354 Algerians-B 0.1612 Tunisians-A 0.3261 Pakistanis-S 0.1122

Saudis-D 0.0374 Algiers 0.1639 Kuwaitis-A 0.3544 Tunisians-A 0.1133

Gabesians 0.0377 Saudis-C 0.1746 Algerians-Oran 0.3600 Libyans 0.1197

Gabesians-A 0.0394 Macedonians 0.1756 Algerians-A 0.3639 Sudanese-Nuba 0.1234

Jordanians 0.0428 Gabesians 0.1820 Greeks-D 0.3657 Algerians-B 0.1315

Algerians-B 0.0433 Saudis-D 0.1820 Algerians-B 0.3867 Berbers-Matmata 0.1317

Basques-B 0.0449 Moroccans-Agadir 0.1830 Greeks-C 0.3927 Algerians-A 0.1407

Saudis-A 0.0450 Kuwaitis-A 0.1837 Turks 0.3944 Berbers-Zrawa 0.1409

Algerians-A 0.0497 Famoori Arabs 0.1894 Saudis-C 0.3984 Gabesians 0.1413

Tunisians-C 0.0533 Moroccans-A 0.1900 Algiers 0.4027 Jordanians-A 0.1434

Yemenite-J 0.0536 Gabesians-A 0.1908 Albanians 0.4034 Gabesians-A 0.1442

Khuzestanis Tunisians Syrians-A Comorians

Population SGD Population SGD Population SGD Population SGD

Gabesians -0.0086 Gabesians -0.0139 Cretans -0.0001 Congolese 0.0519

Orans -0.0074 Gabesians-A -0.0081 Lebanese-Ar 0.0050 Nigerians 0.0828

Gabesians-A -0.0025 Moroccans-Agadir -0.0080 Syrians 0.0076 Greeks-A 0.0836

Algerians-A -0.0015 Southern Tunisians -0.0062 Iranians-Kurd 0.0100 Gabonese 0.0904

Moroccans-Ag 0.0106 Algerians-A -0.0055 Lebanese-A 0.0149 Iranians-A 0.0947

Tunisians-S 0.0140 Moroccans-A 0.0010 Lebanese-Y 0.0151 Egyptians-A 0.1090

Tunisians 0.0161 Algerians-B 0.0019 Iranians 0.0159 Iranians 0.1184

Tunisians-C 0.0195 Berbers-Zrawa 0.0027 Lebanese-B 0.0161 Italians 0.1222

Yemenite-J 0.0217 Libyans 0.0028 Iranians-Azeri 0.0185 Iranians-Azeri 0.1394

Tunisians-M 0.0225 Algerians-Oran 0.0033 Turks 0.0192 Iranians-Kurd 0.1418

Saudis-C 0.0231 Tunisians-M 0.0038 Iraq kurdistan 0.0198 Albanians 0.1426

Spaniards 0.0291 Saudis-C 0.0061 Ashkenazi-Jews 0.0222 Turks 0.1428

Saudis 0.0324 Tunisians-C 0.0083 Iranians-A 0.0223 Syrians 0.1470

Saudis-B 0.0349 Algiers 0.0103 Palestinians-A 0.0228 Cretans 0.1483

Algerians-B 0.0353 Berbers-Matmata 0.0106 Italians 0.0241 Egyptians 0.1483

Tunisians-B 0.0422 Moroccans-Chaouya 0.0111 Turks-A 0.0288 Greeks-C 0.1487

Indians-Delhi 0.0454 Spaniards 0.0126 Lebanese 0.0320 Palestinians-A 0.1559

Algiers 0.0461 Moroccans 0.0144 Jordanians-A 0.0355 Iraq Kurdistan 0.1564

Basques-Ar 0.0471 Saudis-D 0.0159 Lebanese-KZ 0.0368 Greeks-D 0.1594

Libyans 0.0485 Khuzestani Arabs 0.0161 Greeks-A 0.0407 Syrians-A 0.1617

Moroccans-Ag (0.0024), Basques-Ar (0.0057), and Tunisians-S (0.0124) had the closest genetic distances from Saudis, while Emiratis were closely related to Omanis (0.0411), Bahrainis (0.0429), Sardinians (0.0593), and Kuwaitis (0.0688). On the other hand, Sudanese are related to Sub-Saharans, including Nigerians (0.0497) and Congolese (0.0594), and to Egyptians (0.0556).

https://doi.org/10.1371/journal.pone.0192269.t005
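
The distances in Table 5 are standard genetic distances in the sense of Nei [20]. As a minimal sketch of how such a value is obtained from allele frequencies at one locus, the function below computes D = -ln(Jxy / sqrt(Jx Jy)). The function name and the three populations are hypothetical illustrations, not data from this study, and the software actually used by the authors is not described in this section. The plain estimator cannot be negative, so the small negative entries in Table 5 presumably come from a sample-size bias correction, which this sketch does not apply.

```python
import math


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance for a single locus.

    freqs_x, freqs_y: dicts mapping allele name -> frequency in each
    population (each dict should sum to about 1).
    """
    alleles = set(freqs_x) | set(freqs_y)
    # Probability that two alleles drawn within a population are identical
    j_x = sum(freqs_x.get(a, 0.0) ** 2 for a in alleles)
    j_y = sum(freqs_y.get(a, 0.0) ** 2 for a in alleles)
    # Probability that one allele from each population is identical
    j_xy = sum(freqs_x.get(a, 0.0) * freqs_y.get(a, 0.0) for a in alleles)
    if j_xy == 0.0:
        return math.inf  # no shared alleles: the distance is unbounded
    return -math.log(j_xy / math.sqrt(j_x * j_y))


# Hypothetical generic HLA-DRB1 frequencies (illustration only)
pop_a = {"DRB1*03": 0.20, "DRB1*07": 0.30, "DRB1*11": 0.50}
pop_b = {"DRB1*03": 0.25, "DRB1*07": 0.25, "DRB1*11": 0.50}
pop_c = {"DRB1*03": 0.60, "DRB1*07": 0.10, "DRB1*15": 0.30}

print(nei_standard_distance(pop_a, pop_b))  # similar profiles, small distance
print(nei_standard_distance(pop_a, pop_c))  # dissimilar profiles, larger distance
```

Ranking all comparison populations by this value, in ascending order, would give a "closest populations" list of the kind shown in each column of Table 5.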

Mediterranean basin, and is frequent among Basques (17.5%) [41], Moroccans (17.3%) [25], Algerians (11.3%) [67], and Cretans (7.4%) [69]. In addition, DRB1*07:01-DQB1*02:02 is frequent in Arabs, such as Moroccans (16.70%), and is reportedly common in Spaniards (17.3%) [41] and Moroccans (12.6%) [25], but rare in Southern Tunisians (Gabesians; 2.10%). DRB1*07:01-DQB1*02:01 is also a common DRB1-DQB1 haplotype, and its frequency exceeds 4% in several Arab populations.

Table 6. Most frequent (%) HLA Class I (A-B) two-locus haplotypes with significant linkage disequilibrium (P<0.05) in Arabs.

A-B haplotype   Tun      Saudi-B  Alg      Mor-Ch   Mor-a    Ber-Z    Lib      Gab
01:01–50:01     -        -        -        04.10    -        -        -        -
01–57           -        -        02.90    -        -        -        -        -
02:01–07:02     -        -        -        -        -        -        02.97    -
02:01–44:02/03  03.86    -        -        02.10^a  02.95^c  07.50^b  -        05.26
02:01–50:01     03.30    -        -        -        01.99^d  09.01    -        -
02:01–51:01     -        04.66    -        03.40    01.62^f  -        -        -
23:01–50:01     -        04.90    -        -        -        -        02.97    -
24:02–08:01     -        04.75    -        -        -        -        -        -
29:01–45:01     01.79    -        -        -        -        -        -        02.10
29:02–44:03     -        -        -        02.70    -        -        -        -
30–18           -        -        01.50    -        02.60    03.00    -        -
30:02–53:01     -        03.48    -        -        -        -        -        -
32:01–40:02     00.80    -        -        -        -        05.66    -        -
33:01–14:01     -        -        02.50    -        01.86^e  01.41    -        -
34:02–08:01     02.12    -        -        -        -        06.11    -        02.10

a 02:01–44:02. b 02:01–44. c 02-44. d 02-50. e 33-14. f 02-51.

https://doi.org/10.1371/journal.pone.0192269.t006

Table 7. Most frequent (%) HLA Class II (DRB1-DQB1) two-locus haplotypes with significant linkage disequilibrium (P<0.05) in Arabs.

HLA-DRB1-DQB1  Tun      Sau-B    Mor-Ch   Bah      Leb      Alg      Lib-J    Yem-J    Ber-Z    Ber-J
01:02–05:01    02.40    02.85    -        -        -        08.00    02.10    0.70     09.85    04.50
07:01–02:02    14.80    12.32    16.70    -        -        -        24.70^a  22.10^a  16.03    -
03:01–02:01    16.60    13.56    12.30    12.02    03.21    11.30    05.60^a  12.00^a  11.26    -
10:01–05:01    03.80    03.80    -        01.35    04.90    00.30    00.80    04.00    01.41    03.30
07:01–02:01    -        -        -        09.38    04.20    09.90    -        -        -        11.00
15:01–06:02    07.80    03.80    08.90    -        -        09.90    -        -        11.26    02.00
04:02–03:02    02.60    -        06.20    -        -        04.20    03.00    07.50    05.15    -
13:01–06:03    02.40    -        -        -        -        03.30    07.70    05.40    05.63    01.80
16:01–05:01    -        -        -        13.18    03.79    -        -        -        -        -
04:01–03:02    -        -        -        02.78    14.16    -        -        -        -        -
11:01–03:01    07.20^b  02.22    -        11.98    31.42    04.70    09.30    03.40    07.00^b  03.20

a DQB1*02. b 11:01/04-03:01.

https://doi.org/10.1371/journal.pone.0192269.t007

In addition, DRB1*16:01-DQB1*05:01 and DRB1*04:01-DQB1*03:02, rare in neighboring populations and Mediterraneans, were identified only in Lebanese and Bahraini Arabs. The high frequency of the DRB1*11:01-DQB1*03:01 haplotype (31.42%) among Lebanese is noteworthy, since it is the highest in all populations studied, whereas the haplotype is rare in Saudis (2.2%). Furthermore, DRB1*11:01/04-DQB1*03:01, identified in Arabs, is also frequent in Cretans (18.5%) [69] and Basques (3.1%) [41], while DRB1*01:02-DQB1*05:01 was seen in Spaniards (6.30%) [41]. Varied frequencies of DRB1*13:01-DQB1*06:03 were also reported for Spaniards (13.23%) [86], Cretans (3.3%) [69], and Germans (10.8%) [87]. Likewise, DRB1*15:01-DQB1*06:02 was observed in Cretans (2.6%) [65], Germans (25.2%) [87], and the population of Southern Ireland (14.90%) [23].

HLA class I and class II extended haplotypes. Table 8 shows the most frequent extended haplotypes in Arab populations and their likely origins. The systematic review did not reveal haplotypes shared by Arab populations because of partial presentation of haplotypic data, disparity in the level of typing resolution, variability of the studied loci, and lack of data. In addition, Arab populations share their frequent extended haplotypes with several European (especially Mediterranean) and Asian populations (Table 8). Furthermore, the possible origins of the most frequent extended haplotypes among Arabs are mainly European, Asian, or autochthonous.

Table 8. The most frequent (%) HLA extended haplotypes in Arabs.

HLA extended haplotype | Arab populations [references] | Possible origin
A*02:01-B*50:01-DRB1*07:01-DQB1*02:02^a | Southern Tunisians (3.2%) [62], Berbers of Zrawa (8.12%) [24] | Euro-Asiatic
A*02:01-B*44-DRB1*04:02-DQB1*03:02^b | Berbers of Zrawa (6.5%) [24], Tunisians (0.6%) [61] | Western European
A*24:02-B*08:01-C*07:02-DRB1*03:01^c | Saudis (3.16%) [49] | Euro-Asiatic
A*23:01-B*50:01-C*06:02-DRB1*07:01 | Saudis (3.16%) [49] | Autochthonous
A*33-C*8-B*14-DRB1*01:02-DQA1*01:01-DQB1*05:01^d | Algerians (1.5%) [88] | Mediterranean
A*30-C*5-B*18-DRB1*03:01-DQA1*05:01-DQB1*02:01^e | Algerians (1.5%) [88] | Iberian-paleo-North African
A*02:01-C*06:02-B*50:01-DRB1*07:01-DQA1*02:01-DQB1*02:02^f | Moroccans (2.9%) [65] | Euro-Asiatic
A*01:01-C*06:02-B*50:01-DRB1*03:01-DQA1*05:01-DQB1*02:01^g | Moroccans (2.9%) [65] | Mediterranean
A*30-B*07-DRB1*03-DQA1*05:01-DQB1*02:01^h | Jordanians (1.38%) [31] | Euro-Asiatic
A*1-B*8-DRB1*03-DQA1*05:01-DQB1*02:01^i | Jordanians (1.03%) [31] | Pan-European
A*02:01-B*50:01-DRB1*07:01^j | Libyans (4.24%) [32], Tunisians (1.8%) [60], and Ghannouch (2.5%) [33] | North African
A*11:01-B*52:01-DRB1*15:02^k | Libyans (2.54%) [32]; Yemen Jews (0.93%) [23] | Mediterranean
A*69-B*49-DRB1*04:03-DQB1*03:02 | Palestinians (2.4%) [29] | Autochthonous
A*24-B*18-DRB1*11:04-DQB1*03:01^l | Palestinians (1.8%) [29] | Central-South-Eurasian

a Present in Spaniards (1.2%) [41], Turks (1.3%) [79], Italians (0.5%) [68], and Moroccan Jews (2%) [66].

b Also found in British (2.6%), Cornish (7.9%), Danes (2%) [39], Italians (0.9%) [68], Spaniards (0.6%) [41], Spanish Basques (1.9%), Pasiegos (3.3%), Cabuemigos (2.2%) [77], and Portuguese (3.1%) [39].

c Present at low frequencies in the Euro-Asian minorities of Germany [23].

d Found in Armenians (0.031), Sardinians (0.027), French (0.014), Greeks (0.011), and Italians (0.007) [68].

e Also found in Sardinians (11.4%) and French-Basques (4.7%) [68].

f Present also in Mongolians [68] and Turks [79].

g Found in Spaniards, Italians, and North Africans [65].

h Present in Cornish (0.084), British (3.3%), and Danes (3.8%) [68].

i Present in Basques (5%), Spaniards (3.4%) [41], Macedonians (4.9%) [78], Yugoslavians (7.7%), British (2.9%), and Germans (4.8%) [68].

j Found in Poland Jews (1.15%) and Ashkenazi Jews (0.92%) [23].

k Present in Ashkenazi Jews (1.05%) [23].

l Found in Armenians (2.1%) and Italians (0.7%) [23].

https://doi.org/10.1371/journal.pone.0192269.t008

Discussion

This meta-analysis is the first genetic anthropology study in the MENA region; it included 100 populations from 36 Arab and neighbouring countries, comprising in excess of 16,000 individuals. A main outcome of the study is the lack of striking differences in the distribution of HLA alleles and haplotypes between North African and Arabian Peninsula populations. In contrast, key differences were noted between Levant Arabs (Lebanese, Palestinians, Syrians) and other Arab populations, highlighted by high frequencies of A*24, B*35, DRB1*11:01, DQB1*03:01, and the DRB1*11:01-DQB1*03:01 haplotype in Levantine Arabs. Class I haplotype frequencies are lower than Class II haplotype frequencies because LD between the A and B loci is weak, owing to the long physical distance between them, compared with LD between the DRB1 and DQB1 loci. The identification of shared haplotypes between Arabs and other Mediterranean and Asian populations is attributed to the higher admixture of Mediterraneans and Asians in Arab populations.
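
The link between LD strength and haplotype frequency can be made concrete with a short sketch. It computes the disequilibrium coefficient D = p(AB) - p(A)p(B) and its normalized form D'. When D is near zero, the haplotype frequency stays close to the product of the two allele frequencies, which is small; strong LD lets a haplotype reach a frequency well above that product. The function name and all frequencies below are hypothetical and are not taken from Tables 6 or 7.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, D') for a two-locus haplotype.

    p_ab: observed frequency of the haplotype carrying alleles A and B
    p_a, p_b: frequencies of allele A (locus 1) and allele B (locus 2)
    """
    d = p_ab - p_a * p_b
    # D' rescales D by the largest magnitude it could reach
    # given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime


# Hypothetical tightly linked pair (class II-like): haplotype frequency
# is far above the product of allele frequencies (0.35 * 0.40 = 0.14)
print(linkage_disequilibrium(p_ab=0.30, p_a=0.35, p_b=0.40))

# Hypothetical weakly linked pair (class I-like): haplotype frequency
# is barely above the product of allele frequencies (0.20 * 0.15 = 0.03)
print(linkage_disequilibrium(p_ab=0.035, p_a=0.20, p_b=0.15))
```

Testing whether D differs significantly from zero (the P<0.05 criterion in Tables 6 and 7) needs sample sizes, which are not given in this section, so that step is left out of the sketch.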

Iberians, North Africans, and Arabian Peninsula inhabitants

The relatedness between North Africans and Iberians was previously discussed [29, 59–62, 69, 78, 79, 86, 88]. Using correspondence analysis, NJ trees, and genetic distances, our results show that North Africans are genetically close to Iberians, which is supported by historical events. First, this relatedness is attributed to the Berber migration northwards from the African Sahara in 10000–4000 BC, driven by hyper-arid conditions [69]. It may also be explained by the similar history of Iberians and North Africans, both of whom were invaded by Phoenicians, Romans, Germans, and Muslim Arabs [89]. The respective invading armies were genetically mixed; indeed, most of their soldiers were mercenaries recruited in recent conquests, as in the case of the Phoenicians [90], and the Muslims who invaded Iberia had troops that were mostly Berbers. The invasion of Iberia by Muslims in the 8th century AD may have contributed to the relatedness between North Africans and Iberians for two reasons: first, most recruits of the Muslim invaders were North African Berbers; second, the Muslims remained settled in Iberia for 8 centuries. However, a more ancient and continuous gene exchange between Iberia and North Africa since prehistoric times may have accounted for most of the exchange [86]; massive mixed marriage and breeding across religious groups in Iberia under Muslim rule is not documented.

The analyses performed showed that current North Africans are closely related to Tunisian (Zrawa and Matmata) and Moroccan (Souss-Agadir and Eljadida) Berbers, suggesting that North Africans have a genetic Berber profile. In contrast, North Africans displayed a greater distance from the Arabs of the Levant (Palestinians, Syrians, Lebanese, and Jordanians), indicating a low genetic contribution of the Phoenician and Levant Arab invasions of North Africa. These observations based on HLA markers prompted the conclusion that all Berbers of North Africa constitute a homogeneous genetic unit, except for small isolates, such as the Berbers of Djerba, who display a distinct genetic profile related to Eastern Mediterranean populations.

Saudi populations used in this study originated from Eastern Saudi Arabia, especially from Riyadh province. There are no reliable HLA data on Eastern Saudi Arabia that shed light on pre-Islamic history; some ancient people may have originated from old Persians, but quantification is difficult and undetermined [91]. Genetic heterogeneity between Eastern and Western Saudi Arabia is quite possible, and should be taken into account in further interpretation. All analyses performed here, using HLA-A, -B, -DRB1, and -DQB1 markers, support the notion that Saudis, along with Kuwaitis and Yemenis, are closely related to North Africans.

The most plausible explanation for the clustering of West Arabia and Yemen with Iberians/North Africans is a possible massive migration in all directions that occurred when the Sahara underwent desiccation [92, 93]. The cultural and linguistic relatedness of many Mediterranean languages, including old Iberian and Basque [92], to the Berber language is concordant with our genetic findings and the Saharan origin hypothesis; part of the Arabian Peninsula (including Yemen) may also have been reached by Saharan people. In fact, Malika Hachid, who has been studying Saharan and North African archaeology, culture, and the rock painting/writing of the prehistoric Sahara, even suggests that the first known writing alphabet originated in the Sahara. Proto-Berber characters written on rock, very similar to present-day Berber scripts, were in use. This Proto-Berber language could have appeared 5,000 years BC [94, 95].

An explanation for the HLA genetic similarity of Kuwait to this group seems more difficult to achieve, but interaction between the Arabian Peninsula and Mesopotamia through the strategic Kuwait area is documented since 6,500 years BC (Ubaid period) [96].

Arabs of Levant

Using genetic distances, correspondence analysis, and NJ trees, we showed earlier [61, 62] and in this study that Palestinians, Syrians, Lebanese, and Jordanians are closely related to each other and to Eastern Mediterranean Europeans (Turks, Cretans, Greeks), Egyptians, and Iranians; this was confirmed by analysis of HLA class I (A, B) and class II (DRB1 and DQB1) markers. However, Levant Arabs are distant from North African Arabs (Tunisians, Algerians, Moroccans, and Libyans) and Iberians (Basques, Spaniards). The strong relatedness between Levant Arab populations is explained by their common ancestry, the ancient Canaanites, who came either from Africa or from the Arabian Peninsula via Egypt in 3300 BC [97], and settled in the Levant lowlands after the collapse of the Ghassulian civilization in 3800–3350 BC [98]. The relatedness is also attributed to close geographical proximity: the region constituted one territory before 19th century British and French colonization.
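
Because the clustering statements in this discussion rest on NJ trees built from pairwise genetic distances, a minimal sketch of the first neighbor-joining step [19] may help. For n populations, the method joins the pair (i, j) that minimizes Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k), and then repeats on the reduced matrix. The population labels and the distance matrix below are hypothetical, and the function name is an assumption; the values are not distances from this study.

```python
def nj_first_join(names, dist):
    """Return (name_i, name_j, q) for the first pair joined by neighbor-joining.

    names: list of population labels
    dist: symmetric matrix (list of lists) of pairwise distances
    """
    n = len(names)
    row_sums = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q criterion: pairwise distance corrected by how far
            # each member of the pair is from all other populations
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[2]:
                best = (names[i], names[j], q)
    return best


# Hypothetical populations and distances (illustration only)
populations = ["P1", "P2", "P3", "P4", "P5"]
distances = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

print(nj_first_join(populations, distances))
```

In this toy matrix the smallest raw distance is between P4 and P5, yet the Q criterion selects P1 and P2 first. This is why a population's position in an NJ tree can differ from what a simple closest-distance ranking would suggest.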

The close relatedness of Levant Arabs to Egyptians, as confirmed by genetic distances using HLA markers, may be due to three reasons. First, Egypt is a neighbor of the Levant Arab countries, and was historically part of the Levant. Second, the Egyptians invaded the Levant several times throughout history; the most significant was the 1468 BC invasion, after which they settled for 12 centuries [99]. Third, the Canaanites, the likely ancestors of Levant Arabs, may have originated from Africa through Egypt, where they settled for a long period, suggesting likely admixture between Canaanites and Egyptians.

Historically, the Levant was a wider region that included the countries along the Eastern Mediterranean with its islands, and extended from Greece to Cyrenaica [100]. Broadly, the Levant was historically characterized by high migratory flow between its sub-regions in all directions. For example, the present-day Levant, comprising Palestine, Lebanon, Syria, and Jordan, has undergone successive invasions by populations originating from the Great Levant, including Egyptians (1468 BC), Horites, Amorites, Hittites (Turks), Greeks (1200 BC), Assyrians (1090 BC) [99], and more recently the Ottomans. This has favored admixture, reduced genetic distances, and homogenized Great Levant populations, thus explaining the close relatedness of Levant Arabs to Eastern Mediterranean populations. On the other hand, Levant Arabs are distant from Saudis, Kuwaitis, and Yemenis, an indication that the contribution of Arabian Peninsula populations to the Levantine gene pool is low, probably because the 7th century invasion lacked a demographic component.

Sudanese and Comorians

Sudanese are close to sub-Saharan Africans (Nigerians, Congolese, and Senegalese) and to North Africans, in particular Egyptians, suggesting that the genetic profile of Sudanese results from admixture between North Africans (especially Egyptians) and sub-Saharan Africans throughout history. The close relatedness of Sudanese to sub-Saharan Africans suggests a reduced genetic effect of Arabs on Sudanese. Also, the Comorians (the Comoros Islands officially joined the League of Arab States in 1993) are close to sub-Saharan Africans (Congolese, Nigerians, and Gabonese) [43], Egyptians, Iranians, and Eastern Mediterraneans. This suggests high admixture in the Comoro Islands between populations belonging to three continents, which can be explained by the islands' geographical position as a corridor for international trade.

Bahrainis, Emiratis, and Omanis

Bahrainis, Emiratis, and Omanis are geographically close populations, which explains their genetic relationship as demonstrated in this study. These three populations tend to form a heterogeneous group with Pakistanis, Indians, Iranian Arabs (Famoori), Sardinians (the latter probably close to Iberians/North Africans but behaving as an outlier group in analyses because they are a genetic island isolate), Egyptians, and some sub-Saharan Africans, such as Congolese. These populations appear close to certain Eastern Mediterranean populations, including Greeks and Macedonians, and to those further away, in particular North Africans, hence explaining their intermediate grouping and distinction from the two main clusters. Collectively, this suggests high admixture in these populations brought about by their commercially important position. Sardinia is a relative genetic isolate “founded” by the Iberian Norax/Nora (the first documented Sardinian capital, close to Cagliari), and Iberians/North Africans may be genetically related to Sardinians (the A*30-B*18-Cw*5 basic HLA haplotype is very frequent in Sardinia, Iberia, and North Africa) [93].

Minorities of Arab World

Ethnic minorities. The Kurds and Berbers are the two major ethnic minorities in the Arab world. Berbers are an indigenous North African ethnic group found over a vast area stretching from the Atlantic Ocean to the Siwa Oasis in Egypt, and from the Mediterranean Sea to the Niger River. Berbers number about 20 million people, and constitute 40–45% of Moroccans, 20–25% of Algerians, and 2–7% of the populations of both Libya and Tunisia. The Kurds live in the northern regions of Iraq (15–20%) and Syria (10%). They constitute an Indo-European ethnic group, and speak Kurdish. Smaller minorities include Armenians, Nubians, Assyrians, and Turkmen [99].

The Berber populations used in this work are closely linked to each other, as well as to present-day North Africans and to Western Mediterranean populations, especially Iberians. Indeed, the Moroccan Berbers are not genetically different from current Moroccans, nor from neighboring populations such as Algerians and Tunisians. This also applies to Tunisian Berbers, except those of the island of Djerba, who appear to be related to Eastern Mediterranean populations, including Levant Arabs. This suggests that North African Berbers are genetically indistinguishable from the surrounding populations, and that the differences between them are cultural rather than genetic, a result of the 7th century Arabization of the region.

Clustering and genetic distance analyses demonstrated that Iraqi and Iranian Kurds are not genetically different from Iranians or neighboring populations, including Levant Arabs, and are close to Turks and other Eastern Mediterranean populations. This suggests that Kurds originate from the region and are genetically similar to neighboring populations, despite the clear cultural differences. It also suggests that Kurds, Syrians, Jordanians, Palestinians, Iraqis, Lebanese, and Iranians probably share the same genetic profile, with few differences. Accordingly, our findings confirm the results of an earlier study by Arnaiz-Villena on Iraqi Kurds [54].

Religious minorities. Sunni Muslims constitute the majority (80%) of Arab populations, followed by Shi’a Muslims (10%), who are present in parts of Iraq, Lebanon, Saudi Arabia, Kuwait, Yemen, and Bahrain. Non-Muslims make up about 10% of all Arabs; Christianity (6%) is the second largest religion among Arabs, with about 20 million Christians living in Lebanon, Egypt, Iraq, Syria, and Jordan. Other minority religions (4%), such as Judaism and the Druze faith, are practiced on a much smaller scale [99].

HLA data on Sunni and Shiite Arabs are not available, nor are data comparing Muslims with Christians. The only available data concern Arab Jews. In this study, data are available for three Jewish populations, including two from North Africa (Moroccan and Libyan Jews) and one from the Arabian Peninsula (Yemenite Jews). While the genetic distances separating these three groups of Jews are small (S1 Table), genetic heterogeneity between these Jewish populations was noted. For example, Yemenite Jews are related to Western Mediterranean populations, including North Africans and Iberians, while Libyan Jews are related to Eastern Mediterraneans, including Levantine Arabs. The relatedness of Moroccan Jews to other communities depends on the HLA loci studied; they associate with Eastern Mediterraneans using DRB1, but group with Eastern Mediterraneans when the other markers are used.

Conclusion

This study supports the notion that Arabs are divided into four groups. The first consists of North Africans (Algerians, Tunisians, Moroccans, and Libyans), Saudis, Kuwaitis, and Yemenis, with relatedness to Western Mediterraneans, including Iberians. The second includes Levantine Arabs (Palestinians, Jordanians, Lebanese, and Syrians), Iraqis, and Egyptians, who appear to be related to Eastern Mediterraneans and Iranians, who in turn belonged to the historically described ‘Great Levant’. The third consists of Sudanese and Comorians, who associate with sub-Saharan Africans. Finally, the fourth group comprises Omanis, Emiratis, and Bahrainis; this group associates with heterogeneous populations (Mediterranean, Asian, and sub-Saharan). Lastly, the two main indigenous minorities, Berbers and Kurds, are not genetically different from the ‘host’ and neighboring populations.

Supporting information

S1 Checklist. PRISMA 2009 checklist.

(DOC)

S1 Fig. Neighbor-Joining dendrograms, based on standard genetic distances (SGD), showing relatedness between Arabs and other populations using generic HLA-DRB1* allele frequency data. Population data were taken from references detailed in Tables 1 and 2. Bootstrap values from 1,000 replicates are shown.

(TIF)

S2 Fig. Neighbor-Joining dendrograms, based on standard genetic distances (SGD), showing relatedness between Arabs and other populations using generic HLA-B* allele frequency data. Population data were taken from references detailed in Tables 1 and 2. Bootstrap values from 1,000 replicates are shown.

(TIF)

S1 Table. Genetic distances between three groups of Arab Jews based on HLA-DRB1 and -DQB1 allele frequencies.

(DOC)

Author Contributions

Conceptualization: Abdelhafidh Hajjej, Lasmar Hattab, Slama Hmida.

Formal analysis: Abdelhafidh Hajjej, Slama Hmida.

Investigation: Abdelhafidh Hajjej.

Methodology: Wassim Y. Almawi, Slama Hmida.

Software: Abdelhafidh Hajjej, Lasmar Hattab.

Supervision: Slama Hmida.

Validation: Abdelhafidh Hajjej, Wassim Y. Almawi, Antonio Arnaiz-Villena, Lasmar Hattab, Slama Hmida.

Writing – original draft: Abdelhafidh Hajjej.

Writing – review & editing: Wassim Y. Almawi, Antonio Arnaiz-Villena.

References

1. HLA allele database: http://hla.alleles.org (last accessed on September 17, 2017)

2. Hudson RR. Analysis of population subdivision. In: Balding DJ, Bishop M, Cannings C (Eds). Handbook of statistical genetics. pp. 309–324. Chichester, UK: John Wiley & Sons, 2001.

3. Takezaki N, Nei M. Empirical tests of the reliability of phylogenetic trees constructed with microsatellite DNA. Genetics. 2008; 178(1): 385–92. https://doi.org/10.1534/genetics.107.081505 PMID: 18202381

4. Nei M. Phylogenetic analysis in molecular evolutionary genetics. Annual Review of Genetics. 1996; 30: 371–403. https://doi.org/10.1146/annurev.genet.30.1.371 PMID: 8982459

5. Tamura K, Nei M, Kumar S. Prospects for inferring very large phylogenies by using the neighbor-joining method. Proceedings of the National Academy of Sciences USA. 2004; 101(30): 11030–5. https://doi.org/10.1073/pnas.0404206101 PMID: 15258291

6. Excoffier L, Slatkin M. Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Molecular Biology and Evolution. 1995; 12(5): 921–7. https://doi.org/10.1093/oxfordjournals.molbev.a040269 PMID: 7476138

7. The World Factbook: https://www.cia.gov/library/publications/the-world-factbook

8. Bengio O, Ben-Dor G. Minorities and the State in the Arab World. Lynne Rienner Publishers, 1999; 224 pages.

9. Encyclopædia Britannica, Himyar: https://www.britannica.com/topic/Himyar

10. Korotayev A. Ancient Yemen. Oxford: Oxford University Press, 1995.

11. Korotayev A. Pre-Islamic Yemen. Wiesbaden: Harrassowitz Verlag, 1996.

12. Munro-Hay SC. Aksum: An African Civilization of Late Antiquity. Edinburgh: Edinburgh University Press, 1991.

13. Robin CJ. Arabia and Ethiopia. In: Johnson S (ed.) The Oxford Handbook of Late Antiquity. Oxford University Press, 2012; pp. 247–333, p. 279.

14. Hoyland R. Arabia and the Arabs: From the Bronze Age to the Coming of Islam. Routledge, 2001; p. 51.

15. Wikipedia, History of Islam: https://en.wikipedia.org/wiki/History_of_Islam

16. Hourani A. A History of the Arab Peoples. Harvard University Press, 2002; pp. 15–19. ISBN 9780674010178.

17. Moher D, Liberati A, Tetzlaff J, Altman DG, and PRISMA Group, “Reprint—preferred reporting

items for systematic reviews and meta-analyses: the PRISMA statement”. Physical Therapy. 2009;

89(9): 873–80. https://doi.org/10.1093/ptj/89.9.873 PMID: 19723669

18. Young FW, Bann CM. A visual statistics system. In: Stine RA, Fox J, eds. Statistical computing environments for social researches. New York: Sage Publications. 1996; 207–36.

19. Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution. 1987; 4(4): 406–425. https://doi.org/10.1093/oxfordjournals.molbev.a040454 PMID: 3447015

20. Nei M. Genetic distances between populations. The American Naturalist. 1972; 106: 283. http://jstor.org/stable/2459777

21. Nei M. Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences USA. 1973; 70(12): 3321–3. PMID: 4519626.

22. Nei M, Tajima F, Tateno Y. Accuracy of estimated phylogenetic trees from molecular data. II. Gene frequency data. Journal of Molecular Evolution. 1983; 19(2): 153–70. https://doi.org/10.1007/BF02300753 PMID: 6571220

23. Database of allele frequencies: http://www.allelefrequencies.net, 2017

24. Hajjej A, Sellami MH, Kaabi H, Hajjej G, El-Gaaied A, Boukef K, et al. HLA class I and class II polymorphisms in Tunisian Berbers. Annals of Human Biology. 2011; 38(2): 156–64. https://doi.org/10.3109/03014460.2010.504195 PMID: 20666704

25. Gomez-Casado E, del Moral P, Martinez-Laso J, García-Gómez A, Allende L, Silvera-Redondo C, et al. HLA gene in Arabic-Speaking Moroccans: close relatedness to Berbers and Iberians. Tissue Antigens. 2000; 55(3): 239–49. https://doi.org/10.1034/j.1399-0039.2000.550307.x PMID: 10777099

26. Mahfoudh N, Ayadi I, Kamoun A, Ammar R, Mallek B, Maalej L, et al. Analysis of HLA-A, -B, -C, -DR, -DQ polymorphisms in the South Tunisian population and a comparison with other populations. Annals of Human Biology. 2013; 40(1): 41–7. https://doi.org/10.3109/03014460.2012.734334 PMID: 23095049

27. Matevosyan L, Chattopadhyay S, Madelian V, Avagyan S, Nazaretyan M, Hyussian A, et al. HLA-A, HLA-B, and HLA-DRB1 allele distribution in a large Armenian population sample. Tissue Antigens. 2011; 78(1): 21–30. https://doi.org/10.1111/j.1399-0039.2011.01668.x PMID: 21501120

28. Hamdi NM, Al-Hababi FH, Eid AE. HLA class I and class II associations with ESRD in Saudi Arabian population. PLoS One. 2014 Nov 7; 9(11): e111403. https://doi.org/10.1371/journal.pone.0111403 PMID: 25380295

29. Arnaiz-Villena A, Elaiwa N, Silvera C, Rostom A, Moscoso J, Gómez-Casado E, et al. The origin of Palestinians and their genetic relatedness with other Mediterranean populations. Retraction in: Suciu-Foca N, Lewis R. Human Immunology. 2001; 62(9): 889–900. (Accessed on https://commons.wikimedia.org/wiki/File:Palestinians_hla.pdf) PMID: 11543891

30. Albalushi KR, Sellami MH, Alriyami H, Varghese M, Boukef MK, Hmida S. The Investigation of the Evolutionary History of the Omani Population by Analysis of HLA Class I Polymorphism. Anthropologist. 2014; 18(1): 205–210

31. Sánchez-Velasco P, Karadsheh NS, García-Martín A, Ruíz de Alegría C, Leyva-Cobián F. Molecular analysis of HLA allelic frequencies and haplotypes in Jordanians and comparison with other related populations. Human Immunology. 2001; 62(9): 901–9. https://doi.org/10.1016/S0198-8859(01)00289-0 PMID: 11543892

32. Galgani A, Mancino G, Martínez-Labarga C, Cicconi R, Mattei M, Amicosante M, et al. HLA-A, -B and -DRB1 allele frequencies in Cyrenaica population (Libya) and genetic relationships with other populations. Human Immunology. 2013; 74(1): 52–9. https://doi.org/10.1016/j.humimm.2012.10.001 PMID: 23079236

33. Hajjej A, Hmida S, Kaabi H, Dridi A, Jridi A, El Gaaled A, et al. HLA genes in Southern Tunisians (Ghannouch area) and their relationship with other Mediterraneans. European Journal of Medical Genetics. 2006; 49(1): 43–56. https://doi.org/10.1016/j.ejmg.2005.01.001 PMID: 16473309

34. Hmida S, Gauthier A, Dridi A, Quillivic F, Genetet B, Boukef K, et al. HLA class II gene polymorphism in Tunisians. Tissue Antigens. 1995; 45(1): 63–8. https://doi.org/10.1111/j.1399-0039.1995.tb02416.x PMID: 7725313

35. Almawi WY, Busson M, Tamim H, Al-Harbi EM, Finan RR, Wakim-Ghorayeb SF, et al. HLA class II profile and distribution of HLA-DRB1 and HLA-DQB1 alleles and haplotypes among Lebanese and Bahraini Arabs. Clinical and Diagnostic Laboratory Immunology. 2004; 11(4): 770–4. https://doi.org/10.1128/CDLI.11.4.770-774.2004 PMID: 15242955

36. Amar A, Kwon OJ, Motro U, Witt CS, Bonne-Tamir B, Gabison R, et al. Molecular analysis of HLA class II polymorphisms among different ethnic groups in Israel. Human Immunology. 1999; 60(8): 723–30. https://doi.org/10.1016/S0198-8859(99)00043-9 PMID: 10439318

37. Izaabel H, Garchon HJ, Caillat-Zucman S, Beaurain G, Akhayat O, Bach JF, et al. HLA class II DNA polymorphism in a Moroccan population from the Souss, Agadir area. Tissue Antigens. 1998; 51(1): 106–10. https://doi.org/10.1111/j.1399-0039.1998.tb02954.x PMID: 9459511

38. Al-Tonbary Y, Abdel-Razek N, Zaghloul H, Metwaly S, El-Deek B, El-Shawaf R. HLA class II polymorphism in Egyptian children with lymphomas. Hematology. 2004; 9(2): 139–45. https://doi.org/10.1080/1024533042000205487 PMID: 15203870

39. Clayton J, Lonjou C. Allele and Haplotype frequencies for HLA loci in various ethnic groups. In: Charron D, ed. Genetic diversity of HLA. Functional and medical implications. Vol 1. Paris: EDK. 1997; 665–820.

40. Abdennaji Guenounou B, Loueslati BY, Buhler S, Hmida S, Ennafaa H, Khodjet-Elkhil H, et al. HLA

class II genetic diversity in Southern Tunisia and the Mediterranean area. International Journal Immu-

nogenetics. 2006; 33(2): 93–103. https://doi.org/10.1111/j.1744-313X.2006.00577.x PMID:

16611253

41. Martinez-Laso J, De Juan D, Martinez-Quiles N, Gomez-Casado E, Cuadrado E, Arnaiz-Villena A.

The contribution of the HLA-A, -B, -C and -DR, -DQ DNA typing to the study of the origins of Spaniards

and Basques. Tissue Antigens. 1995; 45(4): 237–45. https://doi.org/10.1111/j.1399-0039.1995.

tb02446.x PMID: 7638859.

42. Brick C, Bennani N, Atouf O, Essakalli M. HLA-A, -B, -DR and -DQ allele and haplotype frequencies in

the Moroccan population: a general population study. Transfusion Clinique et Biologique. 2006; 13(6):

346–52. https://doi.org/10.1016/j.tracli.2006.12.003 PMID: 17306585

43. Gibert M, Touinssi M, Reviron D, Mercier P, Boëtsch G, Chiaroni J. HLA-DRB1 frequencies of the

Comorian population and their genetic affinities with Sub-Saharan African and Indian Oceanian popu-

lations. Annals of Human Biology. 2006; 33(3): 265–78. https://doi.org/10.1080/03014460600578599

PMID: 17092866

44. Samaha H, Rahal EA, Abou-Jaoude M, Younes M, Dacchache J, Hakime N. HLA class II allele fre-

quencies in the Lebanese population. Molecular Immunology. 2003; 39(17–18): 1079–81. https://doi.

org/10.1016/S0161-5890(03)00073-7 PMID: 12835080

45. Khansa S, Hoteit R, Shammaa D, Khalek RA, El Halas H, Greige L, et al. HLA class II allele frequen-

cies in the Lebanese population. Gene. 2012; 506(2): 396–9. https://doi.org/10.1016/j.gene.2012.06.

063 PMID: 22750800

46. Elbjeirami WM, Abdel-Rahman F, Hussein AA. Probability of finding an HLA-matched donor in imme-

diate and extended families: the Jordanian experience. Biology of Blood and Marrow Transplantation.

2013; 19(2): 221–6. https://doi.org/10.1016/j.bbmt.2012.09.009 PMID: 23025986

47. Mourad J, Monem F. HLA-DRB1 allele association with rheumatoid arthritis susceptibility and severity

in Syria. Revista Brasileira De Reumatologia. 2013; 53(1): 47–56. PMID: 23588515

48. Djidjik R, Allam I, Douaoui S, Meddour Y, Cherguelaı̂ne K, Tahiat A, et al. Association study of human

leukocyte antigen-DRB1 alleles with rheumatoid arthritis in Algerian patients. International Journal of

Rheumatic Diseases. 2014. https://doi.org/10.1111/1756-185X.12272 PMID: 24447879

49. Hajeer AH, Al Balwi MA, AytülUyar F, Alhaidan Y, Alabdulrahman A, Al Abdulkareem I, et al. HLA-A,

-B, -C, -DRB1 and -DQB1 allele and haplotype frequencies in Saudis using next generation sequenc-

ing technique. Tissue Antigens. 2013; 82(4): 252–8. https://doi.org/10.1111/tan.12200 PMID:

24461004

50. Hajeer AH, Sawidan FA, Bohlega S, Saleh S, Sutton P, Shubaili A, Tahan AA, Al Jumah M. HLA class

I and class II polymorphisms in Saudi patients with myasthenia gravis. International Journal of Immu-

nogenetics. 2009; 36(3): 169–72. https://doi.org/10.1111/j.1744-313X.2009.00843.x PMID:

19490212

51. Albalushi KR, Sellami MH, Alriyami H, varghese M, Boukef MK, Hmida S. HLA Class II (DRB1 and

DQB1) Polymorphism in Omanis. Journal of Transplantation Technologies and Research 2014; 4:

134. https://doi.org/10.4172/2161-0991.1000134

52. Haider MZ, Shaltout A, Alsaeid K, Qabazard M, Dorman J. Prevalence of human leukocyte antigen

DQA1 and DQB1 alleles in Kuwaiti Arab children with type 1 diabetes mellitus. Clinical Genetics.

1999; 56(6): 450–6. https://doi.org/10.1034/j.1399-0004.1999.560608.x PMID: 10665665

53. Haider MZ, Zahid MA, Dalal HN, Razik MA. Human leukocyte antigen (HLA) DRB1 alleles in Kuwaiti

Arabs with schizophrenia.American Journal of Medical Genetics. 2000; 96(6): 870–2. https://doi.org/

10.1002/1096-8628(20001204)96:6<870::AID-AJMG36>3.0.CO;2-L PMID: 11121200. 54. Arnaiz-Villena A, Palacio-Grüber J, Muñiz E, Campos C, Alonso-Rubio J, Gomez-Casado E, et al.

Genetic HLA Study of Kurds in Iraq, Iran and Tbilisi (Caucasus, Georgia): Relatedness and Medical

Implications. PLoS One. 2017 Jan 23; 12(1): e0169929. https://doi.org/10.1371/journal.pone.

0169929 PMID: 28114347

55. Nassar MY, Al-Shamahy HA, Masood HA. The Association between Human Leukocyte Antigens and

Hypertensive End-Stage Renal Failure among Yemeni Patients. Sultan Qaboos University Medical

Journal. 2015; 15(2): e241–249. PMID: 26052458

56. Middleton D, Williams F, Meenagh A, Daar AS, Gorodezky C, Hammond M, et al. Analysis of the

distribution of HLA-A alleles in populations from five continents. Human Immunology. 2000; 61

(10): 1048–52. https://doi.org/10.1016/S0198-8859(00)00178-6 PMID: 11082518

57. Williams F, Meenagh A, Darke C, Acosta A, Daar AS, Gorodezky C, et al. Analysis of the distribution

of HLA-B alleles in populations from five continents. Human Immunology. 2001; 62(6): 645–50.

https://doi.org/10.1016/S0198-8859(01)00247-6 PMID: 11390040

Genetic heterogeneity of Arabs

PLOS ONE | https://doi.org/10.1371/journal.pone.0192269 March 9, 2018 21 / 24

58. Jazairi B, Khansaa I, Ikhtiar A, Murad H. Frequency of HLA-DRB1 and HLA-DQB1 Alleles and Haplo-

type Association in Syrian Population. Immunological Investigation. 2016; 45(2): 172–9. https://doi.

org/10.3109/08820139.2015.1131293 PMID: 26853713

59. Hajjej A, Hajjej G, Almawi WY, Kaabi H, El-Gaaied A, Hmida S. HLA class I and class II polymorphism

in a population from south-eastern Tunisia (Gabes Area). International Journal of Immunogenetics.

2011; 38(3): 191–9. https://doi.org/10.1111/j.1744-313X.2011.01003.x PMID: 21385325

60. Hajjej A, Kâabi H, Sellami MH, Dridi A, Jeridi A, El borgi W, et al. The contribution of HLA class I and II

alleles and haplotypes to the investigation of the evolutionary history of Tunisians. Tissue Antigens.

2006; 68(2): 153–62. https://doi.org/10.1111/j.1399-0039.2006.00622.x PMID: 16866885

61. Hajjej A, Almawi WY, Hattab L, El-Gaaied A, Hmida S. HLA Class I and Class II Alleles and Haplo-

types Confirm the Berber Origin of the Present Day Tunisian Population. PLoS One. 2015; 10(8):

e0136909. https://doi.org/10.1371/journal.pone.0136909 PMID: 26317228

62. Hajjej A, Almawi WY, Hattab L, El-Gaaied A, Hmida S. The investigation of the origin of Southern Tuni-

sians using HLA genes. Journal of Human Genetics. 2017; 62(3): 419–429. https://doi.org/10.1038/

jhg.2016.146 PMID: 27881842

63. Ayed K, Ayed-Jendoubi S, Sfar I, Labonne MP, Gebuhrer L. HLA class-I and HLA class-II phenotypic,

gene and haplotypic frequencies in Tunisians by using molecular typing data. Tissue Antigens. 2004;

64(4): 520–32. https://doi.org/10.1111/j.1399-0039.2004.00313.x PMID: 15361135

64. Oumhani K, Canossi A, Piancatelli D, Di Rocco M, Del Beato T, Liberatore G, et al. Sequence-Based

analysis of the HLA-DRB1 polymorphism in Metalsa Berber and Chaouya Arabic-speaking groups

from Morocco. Human Immunology. 2002; 63(2): 129–38. https://doi.org/10.1016/S0198-8859(01)

00370-6 PMID: 11821160

65. Canossi A, Piancatelli D, Aureli A, Oumhani K, Ozzella G, Del Beato T, et al. Correlation between

genetic HLA class I and II polymorphisms and anthropological aspects in the Chaouya population

from Morocco (Arabic speaking). Tissue Antigens. 2010; 76(3): 177–193. https://doi.org/10.1111/j.

1399-0039.2010.01498.x PMID: 20492599

66. Roitberg-Tambur A, Witt CS, Friedmann A, Safirman C, Sherman L, Battat S, Nelken D, Brautbar C.

Comparative analysis of HLA polymorphism at the serologic and molecular level in Moroccan and Ash-

kenazi Jews. Tissue Antigens. 1995; 46(2): 104–10. https://doi.org/10.1111/j.1399-0039.1995.

tb02485.x PMID: 7482502

67. Arnaiz-Villena A, Benmamar D, Alvarez M, Diaz-Campos N, Varela P, Gomez-Casado E, et al. HLA

allele and haplotype frequencies in Algerians. Relatedness to Spaniards and Basques. Human Immu-

nology. 1995; 43(4): 259–68. https://doi.org/10.1016/0198-8859(95)00024-X PMID: 7499173

68. Imanishi T, Akaza T, Kimura A, Tokunaga K, Gjobori T. Allele and haplotype frequencies for HLA and

complement loci in various ethnic groups. In, eds. HLA 1991. VOL 1. Oxford: Oxford University

Press. 1992; 1065–220.

69. Arnaiz-Villena A, Iliakis P, González-Hevilla M, Longás J, Gómez-Casado E, Sfyridaki K, et al. The ori-

gin of Cretan populations as determined by characterization of HLA alleles. Tissue Antigens. 1999; 53

(3): 213–26. https://doi.org/10.1034/j.1399-0039.1999.530301.x PMID: 10203014

70. Comas D, Mateu E, Calafell F, Pérez-Lezaun A, Bosch E, Martı́nez-Arias R, et al. HLA class I and

class II DNA typing and the origin of Basques. Tissue Antigens. 1998; 51(1): 30–40. https://doi.org/

10.1111/j.1399-0039.1998.tb02944.x PMID: 9459501

71. Grimaldi MC, Crouau-Roy B, Amoros JP, Cambon-Thomsen A, Carcassi C, Orru S, et al. West Medi-

terranean islands (Corsica, Balearic Islands, Sardinia) and the Basque population: contribution of HLA

class I molecular markers to their evolutionary history. Tissue Antigens. 2001; 58(5): 281–92. https://

doi.org/10.1034/j.1399-0039.2001.580501.x PMID: 11844138

72. Renquin J, Sanchez-Mazas A, Halle L, Rivalland S, Jaeger G, Mbayo K, et al. HLA class II polymor-

phism in Aka Pygmies and Bantu Congolese and a reassessment of HLA-DRB1 African diversity. Tis-

sue Antigens. 2001; 58(4): 211–22. https://doi.org/10.1034/j.1399-0039.2001.580401.x PMID:

11782272

73. Farjadian S, Ghaderi A. HLA class II genetic diversity in Arabs and Jews of Iran. Iranian Journal of

Immunology. 2007; 4(2): 85–93. https://doi.org/IJIv4i2A3 PMID: 17652848

74. Kollaee A, Ghaffarpor M, Ghlichnia HA, Ghaffari SH, Zamani M. The influence of the HLA-DRB1 and

HLA-DQB1 allele heterogeneity on disease risk and severity in Iranian patients with multiple sclerosis.

International Journal of Immunogenetics. 2012; 39(5): 414–22. https://doi.org/10.1111/j.1744-313X.

2012.01104.x PMID: 22404765

75. Sayad A, Akbari MT, Pajouhi M, Mostafavi F, Zamani M. The influence of the HLA-DRB, HLA-DQB

and polymorphic positions of the HLA-DRβ1 and HLA-DQβ1 molecules on risk of Iranian type 1 diabe- tes mellitus patients. International Journal of Immunogenetics. 2012; 39(5): 429–36. https://doi.org/

10.1111/j.1744-313X.2012.01116.x PMID: 22494469

Genetic heterogeneity of Arabs

PLOS ONE | https://doi.org/10.1371/journal.pone.0192269 March 9, 2018 22 / 24

76. Sulcebe G, Sanchez-Mazas A, Tiercy JM, Shyti E, Mone I, Ylli Z, et al. HLA allele and haplotype fre-

quencies in the Albanian population and their relationship with the other European populations. Inter-

national Journal of Immunogenetics. 2009; 36(6): 337–43. https://doi.org/10.1111/j.1744-313X.2009.

00868.x PMID: 19703234

77. Sanchez-Velasco P, Gomez-Casado E, Martinez-Laso J, Moscoso J, Zamora J, Lowy E, et al. HLA

alleles in isolated populations from North Spain: origin of the Basques and the ancient Iberians. Tissue

Antigens. 2003; 61(5): 384–92. https://doi.org/10.1034/j.1399-0039.2003.00041.x PMID: 12753657

78. Arnaiz-Villena A, Dimitroski K, Pacho A, Moscoso J, Gómez-Casado E, Silvera-Redondo C, et al.

HLA genes in Macedonians and the sub-Saharan origin of the Greeks. Tissue Antigens. 2001; 57

(2): 118–27. https://doi.org/10.1034/j.1399-0039.2001.057002118.x PMID: 11260506

79. Arnaiz-Villena A, Karin M, Bendikuze N, Gomez-Casado E, Moscoso J, Silvera C, et al. HLA alleles

and haplotypes in the Turkish population: relatedness to Kurds, Armenians and other Mediterraneans.

Tissue Antigens. 2001; 57(4): 308–17. https://doi.org/10.1034/j.1399-0039.2001.057004308.x PMID:

11380939

80. Muro M, Marı́n L, Torı́o A, Moya-Quiles MR, Minguela A, Rosique-Roman J, et al. HLA polymorphism

in the Murcia population (Spain): in the cradle of the archaeologic Iberians. Human Immunology. 2001;

62(9): 910–21. https://doi.org/10.1016/S0198-8859(01)00290-7 PMID: 11543893

81. Farjadian S, Ghaderi A. HLA class II similarities in Iranian Kurds and Azeris. International Journal of

Immunogenetics. 2007; 34(6): 457–63. https://doi.org/10.1111/j.1744-313X.2007.00723.x PMID:

18001303

82. Mohyuddin A, Ayub Q, Khaliq S, Mansoor A, Mazhar K, Rehman S, et al. HLA polymorphism in six eth-

nic groups from Pakistan. Tissue Antigens. 2002; 59(6): 492–501. https://doi.org/10.1034/j.1399-

0039.2002.590606.x PMID: 12445319

83. Agrawal S, Srivastava SK, Borkar M, Chaudhuri TK. Genetic affinities of north and northeastern popu-

lations of India: inference from HLA-based study. Tissue Antigens. 2008; 72(2): 120–30. https://doi.

org/10.1111/j.1399-0039.2008.01083.x PMID: 18721272

84. Rani R, Sood A, Goswami R. Molecular basis of predisposition to develop type 1 diabetes mellitus in

North Indians. Tissue Antigens. 2004; 64(2): 145–55. https://doi.org/10.1111/j.1399-0039.2004.

00246.x PMID: 15245369

85. Migot-Nabias F, Fajardy I, Danze PM, Everaere S, Mayombo J, Minh TN, et al. HLA class II polymor-

phism in a Gabonese Banzabi population. Tissue Antigens. 1999; 53(6): 580–5. https://doi.org/10.

1034/j.1399-0039.1999.530610.x PMID: 10395110

86. Arnaiz-Villena A, Muñiz E, Campos C, Gomez-Casado E, Tomasi S, Martı́nez-Quiles N, et al. Origin of Ancient Canary Islanders (Guanches): presence of Atlantic/Iberian HLA and Y chromosome genes

and Ancient Iberian language. International Journal of Modern Anthropology. 2015; 8: 67–93. https://

doi.org/10.4314/ijma.v1i8.4

87. Reil A, Bein G, Machulla HK, Sternberg B, Seyfarth M. High-resolution DNA typing in immunoglobulin

A deficiency confirms a positive association with DRB1*0301, DQB1*02 haplotypes. Tissue Antigens. 1997; 50(5): 501–6. https://doi.org/10.1111/j.1399-0039.1997.tb02906.x PMID: 9389325

88. Arnaiz-Villena A, Martı́nez-Laso J, Gómez-Casado E, Dı́az-Campos N, Santos P, Martinho A, et al.

Relatedness among Basques, Portuguese, Spaniards, and Algerians studied by HLA allelic frequen-

cies and haplotypes. Immunogenetics. 1997; 47(1): 37–43. PMID: 9382919

89. Stearns PN. The Encyclopedia of World History: Ancient, Medieval, and Modern, Chronologically

Arranged, 6 ed., Houghton Mifflin Harcourt, 2001, 2017, pp. 129–131.

90. Mira-Guardiola MA (2000). Cartago contra Roma. Ed.: Alderaban. Madrid, Spain.

91. Sellier J, Sellier A. Atlas des Peuples d’Orient. Paris, France: Editions La Decouverte, 1993

92. Arnaiz-Villena A, Martinez-Laso J, Alonso-Garciá J. The Correlation Between Languages and Genes:

The Usko-Mediterranean Peoples. Human Immunology. 2001; 62(9): 1051–1061. https://doi.org/10.

1016/S0198-8859(01)00300-7 PMID: 11543906

93. Arnaiz-Villena A, Gomez-Casado E, Martinez-Laso J. Population genetic relationships between Medi-

terranean populations determined by HLA allele distribution and a historic Perspective. Tissue Anti-

gens. 2002; 60(2): 111–21. https://doi.org/10.1034/j.1399-0039.2002.600201.x PMID: 12392505

94. Hachid M. Postface de l’ouvrage “aux origines de l’ecriture au Maroc. corpus des inscriptions ama-

zighes des sites d’art rupestre du maroc” edited by: Skounti A., Lemdjidi A. and Nami M. Publication

de l’institut royal de la culture amazighe. Cealpa, rabat, morocco, 2003.

95. Malika H. Les premier berebers entre mediterranee, tassili et nil. Edited by edisud. aix-en-provence,

France 2000

96. Carter RA. Boat remains and trade in Persian Gulf during the 6th and 5th millenia BC. Antiquity. 2006;

80(307): 52–63. https://doi.org/10.1017/S0003598X0009325X

Genetic heterogeneity of Arabs

PLOS ONE | https://doi.org/10.1371/journal.pone.0192269 March 9, 2018 23 / 24

97. Kuhrt A. The ancient Near East (3000–330 BC). Vol II. Barcelona, Editorial Critica, 2001.

98. Hitti PK. History of Syria: Including Lebanon and Palestine, 2004, p26

99. Encyclopaedia Britannica: https://www.britannica.com/

100. Sartre M, D’Alexandre à Zénobie: Histoire du Levant antique, IVe siècle avant Jésus-Christ-IIIe siècle après Jésus-Christ, Fayard, 2001.

Genetic heterogeneity of Arabs

PLOS ONE | https://doi.org/10.1371/journal.pone.0192269 March 9, 2018 24 / 24

Genetics, health, urban Dubai-2018.pdf

City and cosmology: genetics, health, and urban living in Dubai

Aaron Parkhurst

Department of Anthropology, University College London (UCL), London, United Kingdom

ARTICLE HISTORY Received 28 September 2017 Accepted 9 October 2017

ABSTRACT In light of increasingly high rates of diabetes, heart disease, and obesity among citizens of the Arabian Gulf, popular health discourse in the region has emphasised the emergent Arab genome as the primary etiological basis of major health conditions. However, after many years of public dissemination of genomic knowledge in the region, and widespread acceptance of this knowledge among Gulf Arab citizens, the rates of chronic illness continue to increase. This paper briefly explores the clash between indigenous Islamic knowledge systems and biomedical knowledge systems imported into the United Arab Emirates. It presents vignettes collected from interviews and participant observation in Dubai as part of nearly four years of ethnographic research, completed as part of the author’s doctoral work on ‘Anxiety and Identity in Southeast Arabia’. Rather than radically informing health seeking behaviours among many UAE citizens, the emphasis on the ‘Arab Genome’ has instead reconfirmed the authority of Bedouin cosmological understandings of disease, reshaping the language that people use to engage with their bodies and their health. Local cosmology remains a powerful discursive element that often operates in contention, in sometimes powerfully subtle ways, with novel health initiative regimes. For many people in the region, genomic information, as it is often discussed and propagated in the UAE, shares an intimate relationship with ideas of fate and national identity, and sometimes serves to mitigate the increasingly uncertain terms of engagement that people share between the body, their health, and rapidly changing urban landscapes.

KEYWORDS Genetics; medical anthropology; chronic illness; fate; urban anthropology

Introduction

The underlying premise of this article extends from a simple, but profound anthropological critique in the practice of biomedicine in different societies. That is, when policy planners and health professionals try to think through ideas of behaviour change that accompany much of the discourse on obesity, diabetes, heart disease, and global health in general, they need to take into account people’s perceptions or ideas of their ability to create bodily change for themselves in general. Medical anthropology has long emphasised the role of cultural landscapes and idiosyncrasies in producing powerful regimes of both illness and health, and alarming rates of chronic illness across the globe re-illuminate the systematic neglect of culture in policy planning and debate (Napier et al. 2014). How is agency constructed in ‘health seeking behaviour’, and what are the wider social factors that inform ‘health seeking behaviour’?

CONTACT Aaron Parkhurst [email protected]

© 2018 Informa UK Limited, trading as Taylor & Francis Group

ANTHROPOLOGY & MEDICINE, 2018 VOL. 25, NO. 1, 68–84 https://doi.org/10.1080/13648470.2017.1398815

This paper is informed by long-term fieldwork in Dubai that focused on these questions of health seeking behaviours and how they relate to local ideas of fate, agency, and genes. Further to these ideas, however, Dubai provides a unique context in which to think through many forms of chronic illness that are propagated through individual habits and behaviours. Drawing on questions that emerge in my recent inquiries on the human body and urban environments, this paper explores an anthropological problem presented by the body in the city, namely, the disruption of the stable relationship between the human body and the environment. Genetics, as a concept, becomes an explanatory model that men and women in Southeast Arabia utilise to speak to this disruption.

The ethnographic data used in this paper was collected as part of nearly four years of anthropological fieldwork in Dubai and Abu Dhabi, in which I lived and worked as an anthropologist (February 2007–October 2010). It forms part of a larger body of work on the relationship between globalisation, chronic illness, and tradition within Southeast Arabia, undertaken as my doctoral research. The research was conducted in many social and medical spaces, but primarily in participants’ homes, cafés, and other intimate social spaces. Part of this ethnography was also conducted in clinical settings, involving participant observation in three mental health institutions (one in Abu Dhabi and two in Dubai), and two nutrition clinics in Dubai. My anthropological research began as a project studying mental health and the stigmatisation of mental illness in the Emirates, as well as men’s health issues in the country in general. The current focus on diabetes and genetics emerged from concerns raised by both local health authorities and Emirati lay persons. During my time in the Emirates researching chronic illness, Emiratis in general spoke often and openly about their engagement with genetics, and about both their deep love of the city and their anxiety about it. These themes comprise the focus of this paper.

The research methodology consisted primarily of participant observation and interviews conducted in both Arabic and English. Unless otherwise stated, the dialogue presented in this paper was conducted in English. Most of the discussions between my participants and myself were qualitative, open-ended engagements, though many interviews directed participants to discuss their understandings of genetics, the city, or both. Participants were recruited in a wide number of contexts: some participated in discussions as part of formal discussions in clinics; others were recruited through participant observation in Dubai, and we met in their homes, cafés, or places of work in which I had access and permission to conduct fieldwork. Still others were part of a support network in my Arabic education. Most of the participants who inform the ethnography of this paper, and with whom I became close, were men. This is partly due to the nature of the overarching research questions on men’s mental and physical health issues in the Emirates, but it is also due to the social structures of the country. While women participated in general interviews in public health spaces, I only had ethnographic access to men in more personal and private social spaces. The participants with whom this paper is concerned are almost all Emirati citizens living in Dubai, with the exception of some perspectives from Euro-American health professionals working in the city. Citizenship in the UAE is still informed by tribal affiliation. Many Emiratis in Dubai and Abu Dhabi are members of different branches of the Bani-Yas tribe, a large and powerful kinship group that enjoys a long history in the Arabian Peninsula. However, there are also many who trace their lineage through other large tribes. Emirati tribal leaders (sheikhs) often draw upon Bedouin identity in public discourse in Dubai, though the label of Bedouin is rather fluid. While different families in the Emirates have diverse historical backgrounds and histories that shape their experience of the developing Emirati cities, this paper draws upon shared understandings of the body and cosmology that unify the citizens of the Emirates.

Diabetes in the Emirates

The predominant blood sugar disorder discussed in this paper is Diabetes Mellitus Type 2. This condition is characterised by the inability of the body to respond to insulin properly, and usually develops in adulthood. There are many risk factors that are known to contribute to Diabetes Mellitus Type 2, henceforth often referred to in this paper as simply ‘diabetes’, but most salient in public health narratives are those risk factors that correlate diabetes to obesity (Body Mass Index of 30 and higher), personal diets, behaviours, and habits. Diabetes is well understood as contributing profoundly to a wide range of co-morbidities. Because of its relationship with obesity, the two conditions are often discussed in unison by health officials in Dubai.
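The obesity threshold mentioned above can be made concrete with a small sketch. It assumes the standard definition of Body Mass Index (weight in kilograms divided by the square of height in metres); the function names and the sample figures are hypothetical and do not come from the fieldwork.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return BMI, defined as weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def is_obese(weight_kg: float, height_m: float, threshold: float = 30.0) -> bool:
    """Apply the threshold quoted in the text: a BMI of 30 or higher."""
    return body_mass_index(weight_kg, height_m) >= threshold


# Hypothetical adult, 1.75 m tall, at two different weights.
for weight in (90.0, 95.0):
    bmi = body_mass_index(weight, 1.75)
    print(f"{weight:.0f} kg -> BMI {bmi:.1f}, obese: {is_obese(weight, 1.75)}")
```

The threshold is kept as a parameter because the figure of 30 is simply the cut-off named in the text, not a claim about how local clinics classify patients.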

The experience of diabetes in Dubai is often explained through narratives of ‘energy’. Those who have the condition complain of not having any energy to go shopping, or go to work, and sometimes complain that they do not have the ‘energy’ to go outside, as the heat of Dubai’s oppressive climate stifles them. This is especially frustrating for those who are told their condition is tied to inactivity. The experience of diabetes, however, is highly variable in Dubai, especially as the condition presents itself in increasingly younger individuals. It often first presents itself as a major problem when people have other ailments or are treated for other conditions. The experience of the condition remains confusing for many of the people with whom I spoke, especially for younger individuals (in their late 20s or early 30s). They were aware, and even fearful, of the cardiovascular risks that the condition informs, and they all had personally known others whose death at an early age due to cardiovascular disease was informed by diabetes. While they felt the physical effects of the chronic illness, and indeed, some had been diagnosed after an initial diabetic attack, their social lives, in their own terms, had yet to be grossly impacted by the disease. As a result, it was difficult for many people to narrate their current suffering beyond physical sensation. As I will discuss later, for many the condition was considered with some ambivalence. In this regard, when I spoke with people about the experience of living with diabetes, they often turned the discussion away from their own lives, and instead borrowed pathology as an opportunity to think through other aspects of their society.

Diabetes, and even obesity in general, is often seen by Emiratis in the UAE as a condition brought about by modernity. The Arabic term for diabetes in the Emirates is ‘da3 al-suker’ and translates literally as ‘disease of sugar’. However, the Latin term ‘diabetes’ is used ubiquitously in both Arabic and English discourse. In this regard, its immediate relationship to food and drink consumption is disrupted, allowing for more fluid and complex understandings of the origins of the condition. Long-term medical professionals in the UAE remember and recognise the historical development of blood sugar discourse in the country. For example, a German physician who had practiced in the country for 20 years explained, ‘There was an idea, and I still come across this, that we [here he refers to himself, and other Euro-American expatriates] brought some of these conditions with us. Sometimes people might say “you made this problem so you fix it”, and I had no idea what they were talking about’. The physician later came to understand that his patients were referring to the idea of Euro-American immigrants as perceived agents of disease, or at least associating these expatriates with the conditions of change and foreign influence that bring sickness. ‘My father thinks these things’, a friend explained to me. ‘He thinks diabetes is a conspiracy from Israel or something like this’. I asked why. ‘Well, people didn’t have this problem, … nobody used to have diabetes. Or maybe they had it, I think, but nobody knew about these things. So they blamed everyone else. And now we know it is genetic, but even now some people don’t believe that’.

There is great complexity embedded in these ideas. Israel, here, is understood to be in partnership with American and European governments to subvert Arab society, though these ideas are not shared by everybody. There is also an attempt to understand how diabetes developed so quickly in the rapidly growing city. Other logics concern immigration as a direct process of pathology. In this regard, diabetes is seen less as something that develops from habits, and instead is partially socially constructed as something caused by ambiguous pathogens that accompany immigration. Others see Euro-American expansion as an agent of corruption, if not a direct agent of disease. The complex consequence of these commercial and social infiltrations on the human body is a trend seen in many areas of the world, and has been given the moniker ‘cocacolonisation’ (see Leatherman and Goodman 2005). In the past, diabetes was not known to be a problem, and suddenly, one day it was. According to the International Diabetes Federation (IDF), during the culmination of my fieldwork, the Emirates had the second-highest rate of diabetes in the world, behind the small Pacific island nation of Nauru (IDF 2010). This trend remains strong. Current data from the IDF holds that nearly 1 in 5 adults in the UAE is currently afflicted with diabetes, and the country’s rates of diabetes are rising faster than those of its neighbours in the Arabian Peninsula and of the world at large (IDF 2015). If these rates continue, the prevalence of diabetes is expected to double within a generation.
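The projection of a doubling within a generation can be unpacked with some simple arithmetic. The sketch below is not drawn from the IDF data: it assumes steady compound growth, takes a generation to be 25 years, and rounds ‘nearly 1 in 5’ to a starting prevalence of 0.20. All three are illustrative assumptions, as is the function name.

```python
def implied_annual_growth(factor: float, years: float) -> float:
    """Constant yearly growth rate r such that (1 + r) ** years == factor."""
    if factor <= 0 or years <= 0:
        raise ValueError("factor and years must be positive")
    return factor ** (1.0 / years) - 1.0


# Assumption: one generation is taken as 25 years (not stated in the article).
GENERATION_YEARS = 25
rate = implied_annual_growth(2.0, GENERATION_YEARS)

# Assumption: "nearly 1 in 5 adults" rounded to a prevalence of 0.20.
start_prevalence = 0.20
projected = start_prevalence * (1.0 + rate) ** GENERATION_YEARS

print(f"implied annual growth: {rate:.2%}")
print(f"projected prevalence after {GENERATION_YEARS} years: {projected:.0%}")
```

Under these assumptions, a doubling over 25 years corresponds to growth of a little under 3% per year. A shorter assumed generation would imply a faster yearly rise.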

My participants do not use the term ‘cocacolonisation’, but they are aware of these forces of commercial and social intrusions, and they see these processes centred in the city, namely, Dubai. My friend Ali, for example, spoke often about the problems that the city posed and the dilemmas it caused for him and his peers. Ali explains, ‘There are some people who just think it would be better if everyone (foreign) left, and there are other people who are afraid of what will happen if everybody leaves’. ‘What do you think’, I ask him. ‘I think like most people we love people to come here and we love to share our country. But maybe some people are meant to come live here, maybe some people should only come visit. Smaller is ok too, all these towers… It will be good to slow down, or else people (locals) will never leave their homes, and the people coming here will be bored, and they will stop coming… people are becoming very selfish… [We] do not have to do much. We need to be better’. At other times, he and his peers would complain about the fast food that they and their children consumed, or the amount of TV their family watched, always wildly gesturing to the streets. The city then becomes tied to indigenous understandings of modernity and disease, and is understood to be mapped upon the human body. The body and the city is, in many ways, still a developing subject of analysis in social science, though it has an emerging collection of thought in a range of disciplines from geography and anthropology to psychoanalysis. While architectural planning has throughout centuries borrowed upon human corporeality to understand the form of streets, buildings, townships and cities (see Vitruvius and Da Vinci, for example), philosophers and artists near the beginning of the last century began to recognise the metropolis as a new grounding for human culture and corporeality (e.g. Mumford 1934; Metropolis 1927). In a different vein, other thinkers in anthropology and geography conceived of the body and society as mirrors for each other (Douglas 1966), and the city, specifically, as a metaphor for the human body in which stable urban landscapes inform cultural understandings of the body and identity (Sennett 1994). In this way, space, place and the body become concretely joined. What Sennett identified is how urban spaces become normative, seemingly stable lived experiences for those who live within them. Yet, he also shows how this normative experience of urban-ness belies the reality of the city as a highly unstable, and profoundly fluid and dynamic space. It is a transformational entity in its own right that shares an anthropologically reciprocal relationship with the human body: the city-cum-body is constructed by the body, much as people embody the dynamic forces of the city (ibid).

In discourse on diabetes, obesity, and heart disease, social scientists have long argued for a more holistic view of the body in relation to society to think through health-seeking behaviour (see Edwards 2012; Paul 2005; Mendenhall et al. 2010). Specifically, in order to create changes and shifts in health delivery and demographics, especially in a context such as London or Dubai, policy planners need to think beyond what a health authority might be able to issue, and think additionally about the pragmatics and lived experience of people as they try to move through their daily life. In terms of diet and exercise, this has implications for public transport, daily commutes, housing prices, and a wide range of socio-economic policy and practice. In this regard, city politics and urban management in the US and UK, for example, have informed urban neighbourhood demographics, the distances between an individual’s work and residence, the pragmatics of daily travel, and opportunities to create and utilise time for activities beyond income production and household maintenance. These aspects of quotidian city life are mapped onto the human body in the form of chronic illness (Church et al. 2011; Cetateanu and Jones 2014; Burgoine et al. 2014; Bourgois 2011). The structural limitations of urban living often provide daunting hurdles to prevention of chronic illness, but there is a psychological aspect to health behaviour and practice that is sometimes ignored. That is the sense of futility many people express and experience in thinking through how they might work upon their bodies.

Diabetes and fate

Obesity and diabetes are made complex in Dubai, as they are medical categories that are often fraught with ambivalence, and they are not always seen as unhealthy body categories in the city and country at large. This is certainly not unique to this region of the world (see Randall 2011, or Popenoe 2003, for example). One of the issues that contributes to high Body Mass Index and high rates of blood-sugar disorders in Southeast Arabia that is not discussed in this paper is the perception of these conditions as normative or healthy, and in the case of obesity, sometimes desired. However, as discussed in the section above, diabetes, specifically, is often understood as a condition of modernity, a sudden product of ‘modernisation’. This is evidenced by my participants in a number of ways. One concern from locals is the idea of Western imperialism as an agent of disease. The widespread idea of diabetes in the region grew in step with the influx of foreign immigration, products, and ideas. This type of modernity also brought more robust systems of medicalization into the country. Very few in the Emirates were diagnosed with diabetes before the invitation towards foreign development, and so it is rather reasonable to deduce that it is a ‘Western’ illness category that expatriates brought (and continue to bring) into the country. This perception is made complicated by discourse that links Western material and social imports to cultural pollutants, if not direct agents of disease. American-designed fast-food industries, expensive villas, sport-utility vehicles, mass media, and even increased longevity become objects vacillating between desire and danger. All these vacillating objects were tied to urbanising processes, and the city is perceived to be the locus of these goods. In this regard, the desert was often looked upon as a safe haven. As one of my participants proudly advertised, ‘I make my family go camping to the desert every month usually because it is the best thing to grow up right… It is like a medicine’. Though, even then, my friend’s ‘tent’ was fitted with modern amenities. Vacillation, as theorised by Ghassan Hage (2010),

occurs because we do not always know what we want and we often want contradictory things… we can say that vacillation is when there are many incompatible things giving meaning to our lives and we find ourselves pursuing them despite their incompatibility. What is important, though, is that vacillation is not just a movement between various states of being; rather, it is a state of being in itself. (Hage 2010, 152)

My participants often describe themselves in this way, torn between desires for conflicting interests and identities. Some defined the city as ‘a place where people don’t know how to not want things’. The desire for both modernity and tradition, and the perceived futility of pursuing both, creates conditions of uncertainty that my participants expressed often. The city becomes a vessel for this uncertainty, and becomes tied to other categories of ambiguity more closely associated with the body; namely, genetics.

As Kilshaw has demonstrated in her ethnography in Qatar (Kilshaw 2015), the Qatari state’s dedicated mission to become ‘modern’ draws significantly on the role of genetics, but this is often in contention with the way that local Qataris ‘themselves understand and incorporate genetic knowledge into their lives’ (Kilshaw, this issue). Institutionalised genetic sequencing and testing programmes speak towards a local desire to bring Qatar forward as a global leader in healthcare, and they become representative of a ‘modernity’ of which Qatari citizens are very proud. Yet, balancing these desires with traditional emphasis on inheritance makes genetic dissemination very complex, and in some ways, ironic (Kilshaw 2015, this issue). In the context of Dubai, the imports described above bring both comfort and ‘corruption’, and are problematically, though not necessarily falsely, tied to conditions that are often ethnographically also attributed to genetics, such as ‘misbehaving children’ (in terms of autism spectrum), depression, and, saliently, diabetes. All these categories are, then, often understood as diseases brought by the West. Some speak of diabetes as a result of a loss of traditional value and culture or religion. For example, I met a participant who insisted that soft drinks, and specifically Coca-Cola, were ruining the health of the city (indirectly invoking the idea of coca-colonisation discussed above), which is something he and I agreed on to a degree. He asserted, however, that if locals drank more coffee, as was considered traditional, then the diabetes epidemic could be annihilated. There may be some medical truth to this, depending on the ways and the amounts coffee is consumed. However, my participant’s concern was not with the physical and chemical properties of the drink. The harmful long-term effects of soft drink consumption are not always perceived to stem from the ingredients of the products: sugar, corn syrup, or, perhaps, colouring compounds. Rather, it is the nationalism of the product, and its cultural disruption that is understood to be poison for the human body. ‘Coca-colonisation’, then, is a useful but limited concept in the region as it directs analysis of health-seeking behaviour away from the individual and places it within wider systems of structural imbalance. My participants do often recognise that Coca-Cola, as a ‘material’, leads to diabetes, but this ‘material’ takes on different meaning depending on its source. In this regard, sugar is good when it is used to make local products, and bad when it is imposed upon those who fall within Euro-American patterns of consumption.

Parallel to local understandings of foreign influence is an increasingly prevalent public discourse on genetics. Within popular imagination, there is a widely held perception of genetics as diabetic aetiology; that is, genes are largely, if not wholly, responsible for diabetes. For example, while I was discussing aetiology with one of my participants, I was speaking about genetic susceptibility for type 2 diabetes, a ‘gene’ for diabetes, and he was speaking of ‘Al Djinn’, those ambiguous agents of the desert, usually frustratingly amoral, that are known to influence the world of humans and disrupt human agency. I am careful to note that he probably does not mean this literally, that genes and Djinn are one and the same. Or, if he does, it remains speculative. However, in many regions of Southeast Arabia, genes and Djinn, as ambiguous categories of nature and fate, do borrow each other’s language, if not further synonymy. It is a recognition that the sands and vastness of the Rub al Khali, the vast desert that lies across the Southeastern Arabian peninsula, and the human body are both their own cosmologies, populated by cosmological agents that can affect one’s life and well-being.

In this way, genes have been incorporated into indigenous cosmology. The language and rhetoric that my participants apply to discourses of fate are often re-appropriated to help them think through genetics and other biomedical body knowledge. While I do not have the space in this paper to unpack the complex construction of ‘fate’ itself in Dubai, my larger ethnography has shown that fate is a language of uncertainty in Dubai, but is often incommensurable and sometimes even congruous with deep personal agency (Parkhurst 2014). In thinking through the body in the city, and the body of the future, fate becomes a rhetoric that is helpful to situate oneself in the conditions of vacillation I have described above. In relationship to disease, other anthropologists have shown how Islamic conceptions of fate are better understood as languages for structural imbalance. Sherine Hamdy’s work in Egypt, for example, shows how fate is invoked by some as a mechanism to take action and meaning within systems of political failure and structural violence (Hamdy 2008, 2009). In contrast to traditional perceptions of ‘Islamic fate’ by colonialist thinkers, my participants often invoked strong sentiments of personal cultivation and cosmological futility simultaneously. Because of its place in religion and other systems of social relations, fate, locally defined as submission to God, is proudly locally owned as a marker of identity, yet is practiced with ambivalence. Processes of modernity and urbanisation as understood by my participants, because of their own ambiguity, and because of their association with bringing both success and disease, are then placed within this language of fate. As genes become increasingly understood as carriers of both identity and disease, they become tied to these languages as well.

The development and dissemination of molecular biological science in laboratory cultures over the last five decades informs the social understandings of genes as the science is imported into new contexts. Outside the Middle East, this trend has provoked wide philosophical and bioethical debate. In discussing genes with patients, or the public, an often-overlooked consequence is a lay understanding of genes as destiny. Within the scientific community, this problem has been discussed for decades, asking, in a broader sense, what it means to say ‘x-gene determines y’. Richard Dawkins has fought against this type of genetic deterministic understanding, asking, ‘Why are genetic determinants thought to be any more ineluctable, or blame-absolving, than ‘environmental’ ones?’ (1999, 10–11). There is, arguably, a cultural miscommunication here between the cultures of laboratories and the general public. For many philosophers, and laypeople, the question is somewhat teleological; for biologists, the question is statistical (ibid.). Nonetheless, the human body, as it ambiguously weaves through all systems of social relations, blurring biology and culture, remains a steadfast anthropological problem (Csordas 1994; Scheper-Hughes and Lock 1987), and genetics, understood as synecdoche for the body, have only complicated long-running debates on what it means to ‘be in the world’ (Franklin 1995). Ethnographically speaking, genetic understandings can be strikingly and profoundly meaningful, and have the potential to elicit powerful change in individual and social identity (Rabinow 1996). Anthropologists, recognising the need to create new theoretical tools to think through the ramifications of genomic information in society, have taken on board this concept of ‘biosociality’ to help understand the role of genes within ethnography (see Gibbon and Novas 2008). However, they are also critical of instilling too much power within the gene as a definitive instrument of change and control (Rabinow 2008). Within the clinic, semantics of genes can radically inform patient behaviour, in both informing aetiology (see Senior et al. 1999 for example), and, in new ways, avoiding aetiology (Franklin and Roberts 2006). Within larger debates in anthropology, these disruptions of nature and culture perhaps provide evidence for the post-modern viewpoint that social science itself has ill-constructed binaries which it debates and refutes (Latour 1993). Molecular biology may have a role here as well. New genetic sciences and epigenetic influences on the body contribute to the development of radically new debates within anthropology on nature and culture (Lock 2013, 2015). However, it is worth noting that, for the people within the context of this study, the line between the biological and the social has always been very weakly drawn. The people of Dubai do recognise genes are biological agents, but they are simultaneously social ones, as I will discuss.

How people construct the notion of fate, or destiny, in relation to genes, is just as delicate as social and biological binaries. The language of genetics, premised on imaginations of the inevitability of nature, remains an instrument that can invoke a sense of fate, or a prescription of behaviour. This is further complicated by deeply held values of genetics as specifications of race, and by extension, ethnicity (Fullwiley 2007). As I have argued elsewhere (2014), one implication here is that many geneticists still wantonly operate under the same formulas for ‘national character’ that social sciences have accused the Orientalists of perpetrating, and that scholars have attempted to weed out of anthropology. In this way, the semantics of genes are translated outside of the laboratory to the public to give chemical and organic evidence towards national identity.

In the Emirates, the relationship between genes and national identity often takes complex forms. While there exists a robust local knowledge of the mechanisms of inheritance and kinship in Southern Arabia that I have not the space to discuss in depth here, genes as biological entities are not necessarily part of, and not always associated with, this inheritance and kinship. Genes are widely known as identity markers independent of kinship. They are widely known to be carriers of disease, but are not generally understood to contain the essence of, or the benign traits of, a person. The following brief conversation between myself and two of my participants, a debate on the genetic influences on, say, hair colour vs. diabetes, illustrates local incommensurability between genes and inheritance. It began from a popular discussion among my participants – what makes a person beautiful.

(Ali) ‘A woman’s hair comes from her mother, and that is why they are keeping it like this [silky, and pitch black]’

(Myself) ‘What about diabetes’, I asked, ‘is this something that comes from the mother or from the father?’

(Ali) ‘No, this one is genetic I think.’

(I continued) ‘Sure, but do you get it from your mother’s side of the family, or does it come from your father’s side of the family?’

(Ali) ‘No, these ones, these diseases they are genetic.’

(Myself) ‘Fine, but where does it come from?’

(Ali) ‘No, yaani, they do not come from anywhere. I am trying to tell you that. They are not coming from anywhere.’

(Myself) ‘But if they are genetic, they are inherited from someone!’

(Ali) ‘Yes, but no it does not come from anywhere, yaani, this is why it means it is genetic’

(Myself) ‘Well, what does genetic mean?’

(Ali) ‘It means that you have genes… that it is because you are Arab or maybe like these people’, he points to a group of Filipinos who were working at the café in which we met.

(Myself) ‘[The Filipinos] are genetic?’, I asked. The two men at the table could see that I was confused.

(Rahman) ‘Don’t you know that Arabs have these genes and that British people have these genes and all these peoples have these genes?’, one of them yells at me.

(Ali) ‘He means different genes’, his colleague explains.

(Rahman) ‘Yes, yaani, different genes all these people,’ he clarifies.

(Myself) ‘Yes, I understand that, but where do these genes come from?’

(Ali) ‘But they are not coming from anywhere is what I am telling you. They are because they are these people… .’ His friend interrupts,

(Rahman) ‘We are Arab so we have some of these ones [genes].’

(Myself) ‘Is being Arab genetic then?’, I asked.

(Rahman) ‘Yes, of course, and like you are coming here from England.’

(Myself) ‘Is being English genetic?’

(Rahman) ‘Yes that is what we are trying to tell you.’

(Myself) ‘Ok, so is being Emirati genetic?’ This question seemed to provoke some thinking. After a short time, they answered.

(Both) ‘No, this one is not genetic, it is coming from who your father is.’

The debate continued for some time. I asked about skin (from the mother), height (from the father), obesity (genetic), cancer (genetic), eyes (mother), gender (father), and so forth. I continued these questions with many people throughout my fieldwork, with more or less the same responses. In terms of pathology, diabetes, cancer, obesity, and both psychotic and non-psychotic mental illness: these conditions and behaviours were perceived to be genetic. However, certain types of nationality and general behaviour, and the phenotypic attributes of appearance were said to originate with parents, in the home, and in the womb. Ethnicity, as a concept, and as a broad signifier, is often slippery. Being ‘Arab’ or Chinese, or White-European, in local terms, was discussed as evidenced through genes. Being Emirati, for example, is inherited, but not genetic, while being Arab, and more specifically, deriving from Southeastern Arabia at large (Bahrain, Qatar, Emirates, Oman, possibly Saudi, but not Yemen) is said to be informed through genetics. Beyond biology, many factors contribute to these designations: Bani-Yas tribal affiliations, ties to desert and coastal landscapes, concepts of wealth, constructs of purity, and language practices – to name but a few. While the limits of genetic influence in popular imagination provide further ethnographic evidence on the nature of agency in kinship and reproductive practices, the ambiguous coupling between pathology and ethnicity speaks to the constructs of genes in this paper.

John Avise, in his monograph on the Genetic Gods (2001), extends genetic determinism to the structural realm of cosmology, attempting to ask and answer questions that are, for many people, religious. The link between genes and gods can be, Avise argues, a rather rapid one. Certainly, as invoked in the anecdote above regarding Al Djinn and diabetes, there is evidence for this in my field-site as well. Here, of course, the connection is not made with ‘Gods’, but it is still made with religious cosmological entities. This synonymy and parallelism presents an anthropological question: If genes conjure up their own cosmology within the imagination, then is it reasonable to suggest that an already present and strong cosmology might inform genetics? In the Arabian Gulf, genetics has found an audience with which it was unfamiliar. The intentions behind its language are especially vulnerable. The men and women of the Emirates already have a very robust and complex language of their own with which they can engage fate. Genetic dissemination was bound, in some way, to be reworked under these powerful Arabic articulations. There is not space here to do justice to the diverse and encompassing language of fate in the Emirates, let alone the Arabic-speaking world at large, and despite the complexity of fatalistic discourse in the region, modern ethnography conducted in the Arabian Peninsula remains sparse. This paper in many ways takes the presence of fatalistic language as an ethnographic given, even if the link between behaviour and discourse is often nebulous and even sometimes careless (Chaves 2010).

In terms of how genetics and fate are interwoven in Southeast Arabia in general, other research has provided insight in contexts outside chronic illness. Kilshaw (2015) has analysed how maternal prospects, marriage and consanguinity highlight genetics and the management of risk in Qatari communities in and around Doha. Similarly, inherited blood disorders and genomic testing not only encroach upon marriage practice in Oman, but become novel signifiers of nationalism, history and identity in a context in which normative concepts of time and history are politically prescribed (Beaudevin 2013).

The research presented here complements these works. Rather than simply replace the cultural models of the world that the Bedouin and coastal tribes of the UAE know to be true, foreign medical and scientific concepts are re-shaped and interpreted through the languages of the desert, themselves becoming common discursive elements of public knowledge. In thinking through ‘genes’ as agents of disease, and Djinn as ambiguous spirits of the desert, my participants see congruences. The slippage between Djinn and genes becomes a powerful metaphor to depict the fallacies inherent in the designs of globalization and in the assumptions embedded in Western scientific empiricism and dissemination. The direct association between these terms is not as important here. What I argue is that the failure to recognise genetics as its own cosmology can indeed perpetuate suffering. I have argued that Emirati conceptions of the self and body in relation to nature, spirits and foreigners are challenged by the promises of globalization and modernity. As people move through the desert, the coast and the rapidly growing cities, their quest for an elusive notion of modernity ricochets into local systems of destiny, cosmology, agency, body practices, and kinship, and the languages one uses to articulate the ‘self’ and world are transformed.

The language of fate is a language in which genetics is often fully embedded. As discussed, while the epistemology of ‘genetic determinism’ has been a trope borrowed in both social and biological landscapes in the West, fate is far more culturally owned in the Gulf, and in much wider social ways than in, say, the UK. Ideas of Islamic fatalism are often a proudly culturally owned category in the region. As Bourdieu has told us of the Kabyle, ‘Submission to nature is inseparable from submission to the passage of time scanned in the rhythms of nature’ (Bourdieu 1963, 57). Bourdieu was attempting to understand fate, fatalism, or determinism among his Islamic informants in Algeria. In his writings, his informants understood fate as scanned in the rhythms of nature. This is not foreign to my participants who invoke a similar symbolic association with nature, and specifically the moon, the tides, and even genes. However, these terms are variable in meaning for my participants depending on the context in which fate is invoked. I have argued elsewhere (2014) that ‘Islamic fatalism’ in the Gulf is often a poor concept. Rather than finding that people saw themselves and their fate as inescapable from the moon or the tide (as James Fraser (1990) poetically described a century ago), I struggled to find notions of Islamic fatalism in common practice, and in the reality of people moving through their day. Rather, nature was a stable stage in which people took comfort. The coast, tides and waves, and above all, the desert, provided a language of empowerment, that the individual could effect change in the world, and should indeed do so. Oil provided an index of power that was granted from nature, and Bedouin reliance on the ever-stable Earth reinforced these motivations for planning, hurrying, scheming and creating – at least among the Gulf’s elite. Bourdieu’s Algerian notions of fate and hubris do exist in conversation and song, but they usually contradict practice.

Chronic illness in Dubai, and specifically diabetes, complicates applications of fate. Many of my participants, and especially young and middle-aged men, understood diabetes as something within the body that can make one sick, but not as a constant condition in and of itself. In times of diabetic distress, patients would eagerly seek immediate medical attention, and then participate in health planning in the few days and weeks following their distress, though these behaviours would generally revert to old habits. It was difficult for many participants to imagine themselves being ill in those times in which they did not feel ill, and indeed felt normal. It is in these contexts that the languages of fate and genes were simultaneously invoked.

In these ways, I have briefly outlined how genes in the Emirates become simultaneously tied to pathology, race, ethnicity, and fate. These relationships are made evident in local discourse in complex ways. Consanguinity, in the Emirates, for example, is increasing. Studies indicate that slightly over half of Emirati marriages are consanguineous (Al-Gazali et al. 1997). However, as the local population increases, and as the Emirati population has more access to education and health services, the rates of consanguinity have increased. Contrary to patterns in many other parts of the world, in the span of one generation, research indicates that rates of consanguineous relations have risen another 10%, and the preferred marriage is between first cousins (ibid.). Studies in the Emirates have attempted to examine the effects of the trends in these marriage practices on health patterns (Tadmouri 2009; Abdulrazzaq et al. 1997; Al Gazali 1995). However, new local understandings of genetics give novel meanings to inheritance. Paired with traditional ideas of fate, and tied, again, to anxieties related to an ever-increasingly heterogeneous city, genetic information, for many, may ironically help inform higher rates of consanguinity. Similarly, Kilshaw has collected narratives of women in a similar context in Qatar in which dialogue between genes, responsibilities towards health, arranged marriages, and familial obligations are constantly contested and negotiated (2015, this issue).

Chronic illness presents similar challenges. Chronic illness is, by its nature, confusing as a negatively constructed pathology. If disease in general can be discussed through fatalistic terms, chronic illness, for which patients may not recognise or anticipate future symptoms, becomes even more of a logical consequence of destiny. Race is constructed in Dubai as a profoundly positive form of cultural capital, and genes as markers of race are proudly owned. In other aspects of social engagement, many are very protective of what is acceptable as informed by genetics. Mental illnesses are often said to be genetic, but sexual behaviour is not, and my participants become deeply offended at suggestions that sexual behaviour might be informed by biology. When race, seen as a profoundly positive social capital, is made parallel to pathology in terms of genetic dissemination, an individual’s natural approach to their chronic illness often becomes marked by indifference, and on some occasions, might even be embraced as a socially owned form of cultural capital, regardless of the health consequences. As a result, emergent public genetic education on the ‘Arab’ genome, designed by health authorities to curb those habits that encourage and spread chronic illness, is locally embraced as authoritative knowledge, mirroring the language of fate that local residents have long used to articulate their world. However, as authoritative as the concept of the gene in the Emirates is, it fails to produce the behaviour change for which health planners have hoped. Indeed, the opposite effect has occurred, as the rates of diabetes and obesity continue to climb.

The body and the city

The ‘city’, however, creates a new and very real dilemma for those who inhabit it, and it has ramifications for the body. In my previous fieldwork, I set off to answer a very broad question in the Emirates: What happens to identity within indigenous culture when faced with globalization and modernization on such a rapid course? Dubai, perhaps more than anywhere else in the world, is well suited to afford opportunities to explore this question. The city itself became a protagonist, and a type of anti-heroine. In the years I lived in the city, I was able to watch megaliths rise from the sand. Countless workers from South Asia spun webs of steel and scaffolding from dawn until after dusk. Every evening the towers were half a metre taller. One can drive somewhere in the morning, only to be lost when the road is wiped away by evening. The city is a fortress against nature, a place that – even for my participants – could not be, should not be. For many of the residents of Dubai, the city is an impossible landscape, save for the vision of the sheikhs, and the blessings of God. Dubai, for many local people, is itself an articulation of their sub-conscious, arising from the dreams of their leaders who imagined the wealth of the city as they stared across what was once a tiny creek babbling along sand and rock. Because of this perception of Dubai as a materiality of local dream-scape, her betrayal is especially harsh. Many of the men and women who watched the first cargo come to Jebel Ali Port, and who remember the first hotels and towers, now feel that the city is designed for everyone except them. Some people act as if the city has its own agency, and there is a sense of amorality in its development, but for my participants, who are fiercely loyal to each other, to the sheikhs, and to Dubai, there is a sense that Dubai has not reciprocated, that at some point the city began to be disloyal.

The sand, the coast, and even the oil, previously dependable wells of wealth, gifted by the desert and the sea, that are worthy of their own ethnography (see Limbert 2010), are no longer the stable entities against which local people can pivot themselves to enact identity. In this way, the relationship between people and their environment, as I have witnessed it in Dubai, becomes profoundly disrupted. Local Bedouin and Beni Yas tribal cosmology has long seen actors subject to the permanence of land and the inevitability of predictable – if sometimes oppressive – nature, the extreme reality of the desert, and the moon and tides in which they see the natural symbols of fate. The city, in this sense, is profoundly disruptive. Cosmology which has long depended upon a relationship between moving bodies and stable Earth fails to cohere in a landscape of rising monoliths, 14-lane highways, and an influx of cultures and languages from abroad, and, of course, genetic heterogeneity. Emiratis now compose less than 10 per cent of Dubai’s population. No longer the flexible bodies against the rock, many people develop a deep anxiety which limits their ability for action across the gamut of individual and social enterprise. In other words, the uncertainty of ‘modernity’, whatever ‘modernity’ is, makes thoroughly intolerable the complexity of life’s choices.

I have studied how this frustration over uncertainty becomes enacted in local cosmology, among the Djinn of the Emirates, who lash out against both the past and the future (2014). Here, though, there are repercussions for the body, which is, in the Emirates at least, one of the casualties in the conflict between local identity and the changing urban landscape. Fate becomes enacted uniquely here. Rapid change radically disrupts people’s ability to see themselves in the future, and so ‘health seeking behaviour’ becomes desirable, but highly directionless. Genes, too, take on further meaning in light of the shaky ground. As a biological category of both fate and ethnicity, they are relied upon to provide an anchor to identity when identity is under threat from a newly uncertain world. They become a cosmology in and of themselves, synonymous with tradition, and their association with pathology is forgiven, and even valued as a consequence of fate. Health education directed towards managing and preventing chronic illness asks the individual to imagine one’s body in the future. However, the body, as outlined at the beginning of this paper, is inextricably intertwined with the urban cosmos, and for many, the unstable, uncertain city makes this request for vision cognitively exasperating and disheartening. The city, as I have described, is a site of vacillation for my participants. They have called it, poetically, the ‘inescapable place of desire’, highlighting their deep frustrations. My participants are not usually resentful towards the city. Indeed, they often express deep love for it along with their exasperation. They do not want the city to collapse, but they are simultaneously overwhelmed by it. Diabetes and genes become enmeshed in this exasperation, and many turn to concepts of fate to cope with their precariousness. Genes help concretise this fate within the human body.

Conclusion

I suggest as a final thought that both chronic illness and anxiety in the Emirates are partly the result of the ways in which many local people define what modernity means to them. Perhaps Dubai’s betrayal is that it grew too quickly. Foreigners come to the desert and sift in and out of memory and landscapes, but it is the local Emirati who are left to make sense of the shadows of all this movement. Genes and Djinn, germs and fate, SUVs and oil, skyscrapers in the city, and the sands of the empty quarter all must be constantly reimagined, and it can be very arduous work. Emirati citizens value tradition and preservation, and they do want to preserve the new city. The task at hand is how, paradoxically, to create tradition and sustainability against a backdrop of something entirely new, but not just new, from something that has no firm foundation. Emirati locals do by and large know the steps they need to take for healthier lives, and they are educated on what health-seeking behaviours will drive communal health, but with both local imagination and local health structures, they lack novel frameworks in which these behaviours carry deeper meaning. For my participants, whilst their futures and their city sit upon volatile terrain, they hold steadfast to cosmologies that help anchor them to the world that they value, and they are fiercely proud of constructed Arab identifiers that help index their lives as both desert and urban people. Genes become valued as these identifiers, and are tied to conceptions of fate. Local people understand pathology when it is presented through genetic discourse, but in terms of the uncertainty of the city, and the threats the city presents to local identity, pathology becomes equally tied to fate.

The systems which inform rising rates of obesity and diabetes around the world are massively complex, and there is a host of social and biological factors that inform these body categories. My simple existential point is that when suddenly faced with the intensely myriad choices of the modern world, many people (regardless of nationality, religion, gender or race) simply, and ironically, cannot make any. This includes choices on health and habits. It becomes profoundly difficult to consider the future body in a landscape that wantonly clouds future vision. In the context of Dubai’s rapid urban growth, residents rely upon structures of cosmology that they hold self-evident to cope with radical uncertainty, and they apply these cosmologies to the body and to emergent biomedical categories. In addition to health care education, and policy that addresses structural violence, in all its many forms, I argue that health-care planning and policy can still be profoundly informed by local cosmology, and it must take into account how the human figure pivots itself against a world that is, for many, no longer as sturdy and dependable as they once had known.

Ethical approval

This paper is derived from research that was conducted with ethics approval from UCL.

Acknowledgments

This paper would not be possible without the participation and help from my informants in the United Arab Emirates, and I am grateful for the time they have given me. The author would like to thank the editors of this special edition, Susie Kilshaw, Sahra Gibbon, and Margaret Sleeboom-Faulkner for their reviews and suggestions that helped develop this paper. The author also thanks the anonymous reviewers for their helpful comments and edits.

Disclosure statement

No potential conflict of interest was reported by the author.

ORCID

Aaron Parkhurst http://orcid.org/0000-0002-0762-0929

References

Abdulrazzaq, Y. M., A. Bener, L. I. Al-Gazali, A. I. Al-Khayat, R. Micallef, and T. Gaber. 1997. “A Study of Possible Deleterious Effects of Consanguinity.” Clinical Genetics 51: 167–173.

Al Gazali, L. I., A. Bener, Y. M. Abdulrazzaq, R. Micallef, A. I. Al-Khayat, and T. Gaber. 1997. “Consanguineous Marriages in the United Arab Emirates.” Journal of Biosocial Science 29 (4): 491–497.

Al-Gazali, L. I., A. H. Dawodu, K. Sabarinathan, and M. Varghese. 1995. “The Profile of Major Congenital Abnormalities in the United Arab Emirates (UAE) Population.” Journal of Medical Genetics 32: 7–13.


Avise, John C. 2001. The Genetic Gods: Evolution and Belief in Human Affairs. Boston, MA: Harvard University Press. (first published 1998).

Beaudevin, Claire. 2013. “Old Diseases & Contemporary Crisis. Inherited Blood Disorders in Oman.” Anthropology & Medicine 20 (2): 175–189.

Bourdieu, Pierre. 1963. “The Attitude of the Algerian Peasant Toward Time.” In Mediterranean Countrymen: Essays in the Social Anthropology of the Mediterranean, edited by J. Pitt-Rivers, 55–72. Paris: Mouton.

Bourgois, P. 2011. “Lumpen Abuse: The Human Rights Cost of Righteous Neoliberalism.” City and Society 23 (1): 2–12.

Burgoine, T., N. G. Forouhi, S. J. Griffin, N. J. Wareham, and P. Monsivais. 2014. “Associations Between Exposure to Takeaway Food Outlets, Takeaway Food Consumption, and Body Weight in Cambridgeshire, UK: Population Based, Cross Sectional Study.” British Medical Journal 348: g1464. doi:10.1136/bmj.g1464.

Cetateanua, A., and A. Jones. 2014. “Understanding the Relationship Between Food Environments, Deprivation and Childhood Overweight and Obesity: Evidence from a Cross Sectional England-Wide Study.” Health & Place 27: 68–76.

Chaves, Mark. 2010. “SSSR Presidential Address: Rain Dances in the Dry Season: Overcoming the Religious Congruence Fallacy.” Journal for the Scientific Study of Religion 49 (1): 1–14.

Church, T. S., D. M. Thomas, C. Tudor-Locke, P. T. Katzmarzyk, C. P. Earnest, R. Q. Rodarte, C. K. Martin, S. N. Blair, and C. Bouchard. 2011. “Trends Over 5 Decades in U.S. Occupation-Related Physical Activity and Their Associations with Obesity.” PLoS ONE 6(5): e19657.

Csordas, Thomas J. 1994. Embodiment and Experience: The Existential Ground of Culture and Self. Cambridge, UK: Cambridge University Press.

Dawkins, Richard. 1999. The Extended Phenotype: The Long Reach of the Gene. Oxford, UK: Oxford University Press.

Douglas, Mary. 1966. Purity and Danger. London: Routledge.

Edwards, N. 2012. “Taking Action on Health Inequities: Essential Contributions by Qualitative Researchers.” International Journal of Qualitative Methods 11: 61–63.

Farmer, Paul. 2005. Pathologies of Power: Health, Human Rights, and The New War on The Poor. Berkeley: University of California Press.

Franklin, Sarah. 1995. “Science as Culture, Cultures of Science.” Annual Review of Anthropology 24: 163–184.

Franklin, Sarah, and Roberts, Celia. 2006. An Ethnography of Preimplantation Genetic Diagnosis. Princeton, NJ: Princeton University Press.

Frazer, J. G. 1990. “The Golden Bough.” In The Golden Bough, 701–711. London, UK: Palgrave Macmillan.

Fullwiley, Duana. 2007. “Race and Genetics: Attempts to Define the Relationship.” Biosocieties 2 (02): 221–237.

Gibbon, Sahra, and Novas, Carlos. 2008. Biosocialities, Genetics and the Social Sciences: Making Biologies and Identities. London: Routledge.

Hage, Ghassan. 2010. “Hating Israel in the Field.” In Emotions in the Field, edited by James Davies and Dimitrina Spencer, 129–154. Palo Alto, CA: Stanford University Press.

Hamdy, Sherine F. 2008. “When the State and Your Kidneys Fail: Political Etiologies in an Egyptian Dialysis Ward.” American Ethnologist 35 (4): 553–569.

Hamdy, Sherine F. 2009. “Islam, Fatalism, and Medical Intervention: Lessons from Egypt on the Cultivation of Forbearance (Sabr) and Reliance on God (Tawakkul).” Anthropological Quarterly 82 (1): 173–196.

International Diabetes Federation. 2010. IDF Diabetes Atlas. 5th ed. Accessed 04 April 2017. http://www.diabetesatlas.org/resources/previous-editions.html, http://www.allcountries.org/ranks/diabetes_prevalence_country_ranks.html

International Diabetes Federation. 2015. IDF Diabetes Atlas. 7th ed. Accessed 04 April 2017. http://www.diabetesatlas.org/resources/previous-editions.html

Kilshaw, S., T. Al Raisi, and F. Alshaban. 2015. “Arranging Marriage; Negotiating Risk: Genetics and Society in Qatar.” Anthropology & Medicine 22 (2): 98–113.


Latour, Bruno. 1993. We Have Never Been Modern. Translated by C. Porter. London: Harvester Wheatsheaf.

Leatherman, Thomas L., and Alan Goodman. 2005. “Coca-Colonization of Diets in the Yucatan.” Social Science & Medicine (The Social Production of Health: Critical Contributions from Evolutionary, Biological and Cultural Anthropology: Papers in Memory of Arthur J. Rubel) 61 (4): 833–846. doi:10.1016/j.socscimed.2004.08.047.

Limbert, Mandana. 2010. In the Time of Oil: Piety, Memory, and Social Life in an Omani Town. Palo Alto, CA: Stanford University Press.

Lock, Margaret. 2013. “The Epigenome and Nature/Nurture Reunification: A Challenge for Anthropology.” Medical Anthropology 32 (4): 291–308.

Lock, Margaret. 2015. “Comprehending the Body in the Era of the Epigenome.” Current Anthropology 56 (2): 151–177.

Mendenhall, E., R. A. Seligman, A. Fernandez, and E. A. Jacobs. 2010. “Speaking Through Diabetes: Rethinking the Significance of Lay Discourses on Diabetes.” Medical Anthropology Quarterly 24 (2): 220–239.

“Metropolis”. 1927. Dir. Fritz Lang [Film]. Germany: Universum Film AG.

Mumford, Lewis. 1934. Technics and Civilization. New York: Harcourt, Brace & Company.

Napier, A. David, Clyde Ancarno, Beverley Butler, Joseph Calabrese, Angel Chater, Helen Chatterjee, François Guesnet, et al. 2014. “Culture and Health.” The Lancet 384 (9954): 1607–1639.

Parkhurst, A. L. 2014. “Genes and Djinn: Identity and Anxiety in Southeast Arabia.” PhD diss, UCL (University College London).

Popenoe, Rebecca. 2003. Feeding Desire: Fatness, Beauty and Sexuality Among a Saharan People: Fatness and Beauty in the Sahara. London: Routledge.

Rabinow, P. 1996. Artificiality and Enlightenment: From Sociobiology to Biosociality. Essays on the Anthropology of Reason. Princeton, NJ: Princeton University Press.

Rabinow, Paul. 2008. “Afterword. Concept Work.” In Biosocialities, Genomics and the Social Sciences; Making Biologies and Identities, edited by S. Gibbon and C. Novas, 188–193. London: Routledge.

Randall, S. C. 2011. “Fat and Fertility, Mobility and Slaves: Long-Term Perspectives on Tuareg Obesity and Reproduction.” In Fatness and the Maternal Body: Women’s Experiences of Corporeality and the Shaping of Social Policy, edited by M. Unnithan-Kumar and S. Tremayne, 43–70. Oxford: Berghahn.

Scheper-Hughes, Nancy, and Margaret M. Lock. 1987. “The Mindful Body: A Prolegomenon to Future Work in Medical Anthropology.” Medical Anthropology Quarterly 1 (1): 6–41.

Senior, V., T. M. Marteau, and T. J. Peters. 1999. “Will Genetic Testing for Predisposition for Disease Result in Fatalism? A Qualitative Study of Parents Responses to Neonatal Screening for Familial Hypercholesterolaemia.” Social Science & Medicine 48 (12): 1857–1860.

Sennett, Richard. 1994. Flesh and Stone: The Body and the City in Western Civilization. New York: W.W. Norton.

Tadmouri, Ghazi O., Pratibha Nair, Tasneem Obeid, Mahmoud T. Al Ali, Najib Al Khaja, and Hanan A. Hamamy. 2009. “Consanguinity and Reproductive Health Among Arabs.” Reproductive Health 6: 17. doi:10.1186/1742-4755-6-17.


Copyright of Anthropology & Medicine is the property of Routledge and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.

  • Abstract
  • Introduction
  • Diabetes in the Emirates
    • Diabetes and fate
  • The body and the city
  • Conclusion
  • Ethical approval
  • Acknowledgments
  • Disclosure statement
  • References

journal.pone.0203644.pdf

RESEARCH ARTICLE

Population structure and gene flow of the tropical seagrass, Syringodium filiforme, in the Florida Keys and subtropical Atlantic region

Alexandra L. Bijak1*, Kor-jent van Dijk2, Michelle Waycott2,3

1 Department of Environmental Sciences, University of Virginia, Charlottesville, Virginia, United States of America, 2 School of Biological Sciences, Environment Institute, Australian Centre for Evolutionary Biology and Biodiversity, University of Adelaide, Adelaide, South Australia, Australia, 3 State Herbarium of South Australia, Department of Environment, Water and Natural Resources, Adelaide, South Australia, Australia

* [email protected]

Abstract

Evaluating genetic diversity of seagrasses provides insight into reproductive mode and adaptation potential, and is therefore integral to broader conservation strategies for coastal ecosystems. In this study, we assessed genetic diversity, population structure and gene flow in an opportunistic seagrass, Syringodium filiforme, in the Florida Keys and subtropical Atlantic region. We used microsatellite markers to analyze 20 populations throughout the Florida Keys, South Florida, Bermuda and the Bahamas primarily to understand how genetic diversity of S. filiforme partitions across the Florida Keys archipelago. We found low allelic diversity within populations, detecting 35–106 alleles across all populations, and in some instances moderately high clonal diversity (R = 0.04–0.62). There was significant genetic differentiation between Atlantic and Gulf of Mexico (Gulf) populations (FST = 0.109 ± 0.027, p-value = 0.001) and evidence of population structure based on cluster assignment, dividing the region into two major genetic demes. We observed asymmetric patterns in gene flow, with a few instances in which there was higher than expected gene flow from Atlantic to Gulf populations. In South Florida, clustering into Gulf and Atlantic groups indicates dispersal in S. filiforme may be limited by historical or contemporary geographic and hydrologic barriers, though genetic admixture between populations suggests exchange may occur between narrow channels in the Florida Keys, or has occurred through other mechanisms in recent evolutionary history, maintaining regional connectivity. The variable genotypic diversity, low genetic diversity and evidence of population structure observed in populations of S. filiforme resemble the population genetics expected for a colonizer species.

PLOS ONE | https://doi.org/10.1371/journal.pone.0203644 September 5, 2018 1 / 18

OPEN ACCESS

Citation: Bijak AL, van Dijk K-j, Waycott M (2018) Population structure and gene flow of the tropical seagrass, Syringodium filiforme, in the Florida Keys and subtropical Atlantic region. PLoS ONE 13(9): e0203644. https://doi.org/10.1371/journal.pone.0203644

Editor: Heather M. Patterson, Department of Agriculture and Water Resources, AUSTRALIA

Received: April 7, 2018

Accepted: August 26, 2018

Published: September 5, 2018

Copyright: © 2018 Bijak et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: Complete microsatellite genotype data are available from the Dryad database (accession number doi:10.5061/dryad.pp0q255).

Funding: Financial support for this study was provided by the Jones Environmental Research Endowment to the Department of Environmental Sciences at the University of Virginia. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Introduction

Genetic diversity is paramount to the long-term survival of populations, as genetic variation provides the basis for adaptation to environmental change via natural selection and confers short-term fitness advantages at the population level. Population structure and gene flow, which describe the level of genetic differentiation and connectivity between populations, are important components and drivers of genetic diversity. Quantifying genetic diversity within and among natural populations enhances conservation efforts because genetic patterns are difficult to predict given the complex suite of environmental and biological factors that contribute to genetic diversity and population structure [1]. In seagrass ecosystems, genetic diversity is of particular concern because within-species diversity may replace the functional role of species diversity due to the limited number of species present in seagrass communities [2]. Within-species diversity is also important to the short-term population persistence of seagrasses as genetically diverse assemblages of multiple unique genotypes (or clones) promote greater resistance and faster recovery following disturbance [3,4]. Successional stage, an aggregate category based on multiple traits, provides an ecological lens to assess broad patterns in plant genetic diversity and population structure. Examining population genetics within the context of ecological succession aids in identifying traits that foster resilience in populations of foundational taxa such as seagrasses, and thereby the ecosystems they support.

Theory suggests early successional species are expected to have diminished genetic diversity due to founder effects and to develop strong population structure due to limited gene flow, while later successional species are typified by greater standing genetic diversity and weaker population structure [5]. In terrestrial ecosystems, long-lived woody species tend to have more genetic diversity within populations and less variation between populations based on allozyme studies [6] in congruence with expectations, though patterns in terrestrial pioneering species are less clear. Populations of an early successional species, Silene dioica, in the Gulf of Bothnia show strong differentiation when the supply of colonists is limited [7], while other European early colonizing plant species have higher than expected genetic diversity within populations and low genetic differentiation between populations [8,9]. The relationship between successional status and population genetics in seagrasses, however, has not been thoroughly explored.

Seagrasses present an opportunity to study environmental and ecological determinants of genetic diversity because they comprise a globally distributed, paraphyletic taxon that has evolved from up to four independent lineages [10] and represents a spectrum of life history strategies. Analyses of diversity, population structure and gene flow can reveal biological and physical phenomena that promote or deter the exchange of genetic material across populations. Species biological traits such as breeding system, pollination mechanisms and dispersal capability strongly influence genetic diversity as measured by genotypic diversity, gene copy (or allele) diversity and heterozygosity [11]. The capability to propagate through horizontal rhizome expansion and reproduction by seed has led to early notions that seagrasses are predominantly clonal and therefore lack genetic diversity [12,13]. The development of high-resolution markers prompted studies that have countered this expectation by detecting higher genetic diversity than initially reported for several seagrass species [14], generating questions regarding the role of dispersal and sexual reproduction in shaping seagrass population genetics. Environmental conditions, such as water quality, prevailing winds and local water movement, contribute to fine-scale population genetic structure in seagrasses [15,16], while geographic history, including glaciation and continental drift, in conjunction with modern gene flow patterns influenced by oceanic hydrology, determine genetic connectivity at broader spatial scales [17]. In this study, we described the genetic diversity, population structure and gene flow of the opportunistic seagrass, Syringodium filiforme, in the Florida Keys and subtropical Atlantic region.

S. filiforme is widely distributed throughout the western tropical and subtropical Atlantic Ocean in shallow coastal and back reef environments [18,19], and is a common species in seagrass meadows that cover as many as 17,629 km² of South Florida coastline [20]. These meadows support marine food webs from epiphytic algae to large grazers, deliver ecosystem services by stabilizing coastal sediments and improving local water quality, and have recently been recognized for their role in carbon storage [21,22]. Seagrasses in this region are threatened by local impacts related to water quality such as sedimentation and nutrient over-enrichment [23–25] and have experienced substantial die-offs within the past several decades, notably in Florida Bay [26–29] and Tampa Bay [30]. In order to understand the full impact of environmental decline and perturbations in seagrass ecosystems, evaluating existing levels of genetic diversity, population structure and gene flow is essential. Sampling locations for this study spanned across tens of kilometers in the focal area of the Florida Keys and South Florida, but also included remote populations in Bermuda and the Bahamas at distances of hundreds to thousands of kilometers apart in order to compare diversity in South Florida populations to diversity across more distant populations.

Competing interests: The authors have declared that no competing interests exist.

S. filiforme generally dominates early successional meadows because it has relatively high horizontal rhizome elongation rates [31] and tolerates sediment conditions that are less favorable to other dominant seagrasses, but is also present in the climax state [32]. These traits enable S. filiforme to quickly colonize bare areas through clonal propagation, but also through seed and vegetative fragment dispersal, especially following disturbance [33]. Based on the ability to colonize, reproduce by seed, and generate and maintain substantial biomass, Kilminster et al. [34] categorized Syringodium species as opportunistic, exhibiting a mixture of life history traits found in both colonizing and persistent species. Previous studies have characterized the genetic diversity of other common South Florida seagrasses, Thalassia testudinum and Halodule wrightii. In line with theory, T. testudinum, a late successional seagrass species, exhibited high genetic diversity within populations and weak genetic structure in Florida Bay and the Lower Keys (regions within the Florida Keys are typically described on a north-south basis as Upper, Middle and Lower Keys) [35–37]. As expected for an early colonizer and opportunistic species, most of the genetic variation in H. wrightii partitioned among populations rather than within populations in a study focused on the Gulf of Mexico and Florida Bay [38]. We hypothesized S. filiforme would exhibit genetic diversity and population structure patterns similar to those expected for colonizer species, and would therefore reveal high clonality, low genetic diversity and strong differentiation between populations throughout the Florida Keys and wider study area.

South Florida coastal waters exhibit particularly complex hydrology, especially around the Florida Keys, an archipelago that spans 350 km from the South Florida mainland to Key West, separating the Gulf of Mexico and Florida Bay from the Atlantic Ocean [39]. The distribution of S. filiforme across the Florida Keys ranges from marginal and patchy in northeastern Florida Bay, to sparse in offshore intermixed beds on the Atlantic Ocean (hereafter referred to as Atlantic) side, to dense, monospecific stands along the Middle and Lower Keys on the Gulf of Mexico (hereafter referred to as Gulf) side [40,41]. The division created by the Florida Keys archipelago separates geographically proximal S. filiforme populations in the Atlantic and Gulf basins, leading us to predict these basins host genetically distinct populations due to limited opportunity for propagule exchange and gene flow across a physical barrier. We expected the Bahamas population to have high genetic connectivity with the Florida populations because of its relative proximity, but the Bermuda population to be genetically distinct from the Florida populations due to its geographic isolation.

In this study, we used species-specific microsatellite loci to assess genetic diversity, population genetic structure and connectivity via gene flow in S. filiforme across the Florida Keys and subtropical Atlantic region. We examined 1) whether clonality varies within 100s of m² and 1000s of m² spatial scales; 2) the relative amounts of genetic diversity present within individuals and populations of S. filiforme; 3) the degree to which genetic differentiation among populations results in population structure; and 4) whether there are patterns in the magnitude and direction of gene flow between populations.

Methods

Sample collection

We sampled a total of 20 meadows, hereafter termed populations, in South Florida, the Bahamas and Bermuda (Fig 1; see S1 Table for site GPS coordinates) following three sampling designs over the summers of 2014 and 2015. In 2014, we sampled within a ~ 2,500 m² area to estimate clonal extent. After detecting unique genotypes within meters of each other, we reduced the sampling area in 2015 and modified the sampling area for Florida Bay and Bermuda in order to accommodate greater patchiness of S. filiforme meadows. Genetic data collected with uneven but similar sampling schemes are comparable when using unique genotypes for regional analyses of genetic diversity and population structure [42], assuming the alleles detected are representative of the areas sampled [43,44]. The use of three sampling approaches did not allow for direct comparison of genotypic diversity across sampling designs; however, the primary goal of this study was to determine allelic diversity in order to evaluate population structure and regional connectivity, not to determine fine-scale population structure or the spatial distribution of clones.

Fig 1. Map of study area and sampling locations. The inset map shows the relative positions of the Florida Keys, Tampa Bay, the Bahamas and Bermuda. The main map shows the positions of the Florida Keys sampling locations. Site numbers are displayed to minimize text in the figure (corresponding site names are available in Table 1). Sampling methodology is represented by shape in both the main and inset maps (sampling area of ~ 2,500 m²: circle; sampling area of ~ 500 m²: square; Florida Bay and Bermuda–composite sampling areas of ~ 70 m²: triangle).

https://doi.org/10.1371/journal.pone.0203644.g001

In 2014, we sampled eight populations in the Upper and Middle Keys on the Atlantic side, two populations on the Gulf side (Sprigger and Sluiceway) and a single population in Tampa Bay. In 2015, we sampled six populations in the Middle and Lower Keys on the Gulf side and one population in the northeastern portion of Florida Bay. Additionally in 2015, we sampled single populations in San Salvador, the Bahamas and Bailey’s Bay, Bermuda. Leaves of at least 50 individual S. filiforme ramets were randomly collected within a ~ 2,500 m² sampling area, spaced 5 m apart, for the 2014 collection, and within a ~ 500 m² sampling area, spaced 1.5 m apart, for most of the 2015 collection. At the Florida Bay and Bermuda sites where the distribution of S. filiforme was limited, six and five smaller areas (spaced < 1 km) were sampled, respectively. Within each area, 24 leaves were collected from ramets (spaced 1.5 m apart) in a ~ 70 m² sampling area.

Ethics statement

Permits were required for sample collection in the Lower Florida Keys (Florida Keys National Marine Sanctuary) and Florida Bay (Everglades National Park) in 2015 because sediment was collected in addition to seagrass plant tissue for supplemental analyses; sampling in these areas was conducted under FKNMS-2015-085 and EVER-2013-SCI-0058, respectively. Sampling in Bermuda was conducted under the Bermuda Dept. of Conservation Services License no. 15-04-16-22.

Genotyping

Total genomic DNA was extracted from the samples collected in 2014 using a DNeasy™ Plant Kit (QIAGEN) according to the manufacturer’s instructions. Extracted DNA was quantified on a Qubit® 2.0 Fluorometer (Invitrogen). Samples collected in 2015 were sent to the University of Wisconsin Biotechnology (University of Wisconsin, Wisconsin, USA) for extraction and quantification. DNA was extracted from 40–50 mg of dried leaf tissue using the CTAB method as described in Saghai-Maroof et al. with minimal modification [45]. Following elution, a final DNA cleaning step was performed using a 1.5:1 by volume ratio of Axygen Clean-Seq beads (Corning Life Sciences, Corning, NY, USA) to extracted DNA to remove any remaining inhibitory compounds in the sample. DNA was quantified using Quant-IT PicoGreen fluorescent dye (Thermo Fisher, Waltham, MA, USA). All extracted DNA was diluted to a concentration of ~5 ng μL⁻¹. For some samples, DNA extraction was unsuccessful due to the poor tissue quality of senescing seagrass leaves, reducing the sample size for several sites.
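The dilution to a common working concentration follows the standard relation C1 × V1 = C2 × V2. The Python sketch below is illustrative only and is not part of the published protocol: the function name `dilution_volumes` and the 50 μL final volume are assumptions, since the text states only the ~5 ng μL⁻¹ target.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=5.0, final_volume_ul=50.0):
    """Return (stock volume, diluent volume) in microlitres.

    Uses C1 * V1 = C2 * V2. The 5 ng/uL target comes from the text;
    the 50 uL final volume is a hypothetical value for illustration.
    """
    if stock_ng_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Example: an extract quantified at 25 ng/uL.
stock_ul, diluent_ul = dilution_volumes(25.0)
print(f"{stock_ul:.1f} uL extract + {diluent_ul:.1f} uL diluent")
```

Extracts that quantify below the target raise an error here rather than being silently passed through; how such samples were actually handled is not stated in the text.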

A total of 17 microsatellite loci were amplified using fluorescently labeled primers [46]. PCR was conducted in three PCR multiplex panels using a Type-it® Microsatellite Multiplex PCR Kit (QIAGEN) in 10 μL reactions with 0.5 μL of 2 μM primer mix and 1 μL of diluted template DNA. PCR conditions were set to the manufacturer’s optimized cycling conditions (QIAGEN). PCR products were sequenced on a capillary-based 3730xl DNA Analyzer (Applied Biosystems) with an internal ET-ROX 500 size standard at the Georgia Genomics Facility (University of Georgia, Georgia, USA). Fragment lengths for each locus were determined using Geneious v7.1.9 (Biomatters Ltd.) and the microsatellite plugin [v1.4.0]. Verification samples from 2014 were included in 2015 PCR and sequencing steps to assess the reproducibility of our methods. Approximately 32% of the verification sample loci either did not successfully amplify during PCR or did not produce microsatellite peaks when sequenced, likely due to pipetting error. When verification samples were successfully amplified and sequenced, discrepancies between microsatellite peaks in 2014 and 2015 occurred for less than 3% of samples.

Within-population genetic diversity

The number of unique multi-locus genotypes (MLGs), G, the probability individuals sharing the same genotype were derived via separate sexual events, (Psex), and the probability of clonal identity, (Pgen), were estimated for each population using GENCLONE version 2.0 [47,48]. Unique MLGs were identified under the assumption that scoring error and somatic mutation

rates were negligible (genotypes with a single allele difference were considered distinct). We

tested for the presence of null alleles across all populations using ML-Null Freq with 100,000

randomizations [49]. Genotypic richness, R, the proportion of genetically distinct individuals (or genets) in the population, was calculated as R = (G-1)/(N-1) [50].
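The genotypic richness formula above can be sketched in a few lines of Python. This is only an illustration of R = (G-1)/(N-1); the function name and the dictionary are hypothetical, and the N and G values are copied from Table 1 for three populations.

```python
def genotypic_richness(n_genotypes: int, n_samples: int) -> float:
    """Return R = (G - 1) / (N - 1) for one population."""
    if n_samples < 2:
        raise ValueError("R is undefined for fewer than two sampled shoots")
    if not 1 <= n_genotypes <= n_samples:
        raise ValueError("G must lie between 1 and N")
    return (n_genotypes - 1) / (n_samples - 1)


# (N, G) pairs as reported in Table 1 for three of the sampled populations.
table1 = {"Carysfort": (48, 19), "Key West": (23, 2), "Florida Bay": (123, 6)}

# Round to two decimals, the precision used in Table 1.
richness = {
    site: round(genotypic_richness(g, n), 2) for site, (n, g) in table1.items()
}
```

R equals 0 when every sampled shoot belongs to a single clone (G = 1) and 1 when every shoot is a distinct genet (G = N).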

For the remainder of population genetic analyses, replicate MLGs were removed from the

dataset to avoid allele frequency bias due to the presence of clones. The total number of alleles,

A, average number of alleles per locus, NA, and average allelic richness per locus standardized by smallest sample size, AR, were calculated using the ‘diveRsity’ package [51] in R [52]. For each population, observed heterozygosity (Ho), expected heterozygosity (He), and deviation from Hardy-Weinberg equilibrium as measured by the inbreeding coefficient, FIS, were calculated in GENALEX version 6.5 [53,54]. We calculated linkage disequilibrium for each population

using log-likelihood tests in GENEPOP version 4.2 [55,56] and determined significance using a

sequential Bonferroni correction to account for multiple comparisons.
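As a rough illustration of the multiple-comparison step, the sketch below assumes that "sequential Bonferroni" refers to Holm's step-down procedure, which is how the term is commonly used in population genetics. The function name and the p-values are hypothetical and are not taken from the GENEPOP output.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Holm's step-down procedure; returns one True/False flag per test."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, idx in enumerate(order):
        # Smallest p-value is compared with alpha/m, the next with alpha/(m-1), ...
        if p_values[idx] <= alpha / (m - rank):
            significant[idx] = True
        else:
            break  # once one test fails, every larger p-value also fails
    return significant


example_p = [0.001, 0.04, 0.03, 0.2]  # hypothetical p-values for four locus pairs
flags = sequential_bonferroni(example_p)
```

In this toy input only the first test survives: 0.001 is below 0.05/4, but 0.03 exceeds 0.05/3, so the procedure stops there.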

Genetic differentiation between populations

An analysis of molecular variance (AMOVA) was performed first on all populations to assess

overall genetic differentiation, and again with only populations bordering the archipelago

(populations with numeric codes 1–16 in Fig 1) in a nested design to evaluate differentiation

between the Gulf and Atlantic populations, following the assumptions of the Infinite Allele

Model in GENODIVE [57]. Standard deviations for AMOVA F-statistics were calculated by jackknife resampling over loci, and 999 permutations were used to assess significance. The fixation indices FST (Weir and Cockerham [58]) and Jost’s D [59] were calculated for all possible pairwise population combinations using the ‘diveRsity’ package in R. Statistical significance was

determined by 95% confidence intervals derived from bias corrected bootstrapping. Principal

components analysis (PCA) was performed in GENODIVE using a covariance matrix based on

individual allele frequencies to determine whether geographically proximal samples exhibit

similar allele frequencies, but without the assumption of hierarchical genetic structure.

Population structure and gene flow

To determine the most likely number of population clusters, K, Bayesian population assignment was performed in the program STRUCTURE [60].

Admixture was specified in the model, allowing genotypes to show membership to more than

one cluster. The correlated allele frequency model was selected and sampling locations were

not used as priors in the analysis. Model parameters were set to K = 1–20, with 10 independent runs for each K, and an initial burn-in period of 100,000 iterations (sufficient for α and FST to converge) followed by 1,000,000 Markov Chain Monte Carlo repetitions. The most likely number of population clusters was determined by the ad hoc quantity, ΔK [61]. The complementary program CLUMPAK [62] was used for downstream processing, and DISTRUCT was used for

visual representation of the results [63].
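The ΔK criterion can be illustrated with a short sketch. It assumes one common reading of the statistic, |mean L(K+1) − 2 mean L(K) + mean L(K−1)| divided by the standard deviation of L(K) across replicate runs. The function name and the log-likelihood values are hypothetical and are not taken from the STRUCTURE runs described here.

```python
from statistics import mean, stdev


def delta_k(ln_prob):
    """Map each interior K to its delta-K, given {K: [ln P(D) per replicate run]}."""
    means = {k: mean(runs) for k, runs in ln_prob.items()}
    result = {}
    for k in sorted(ln_prob):
        # Delta-K needs both neighbours, so the smallest and largest K are skipped.
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            # Assumes replicate runs differ; identical runs would give sd = 0.
            result[k] = second_diff / stdev(ln_prob[k])
    return result


# Hypothetical log-likelihoods for three replicate runs at K = 1..4.
toy_runs = {
    1: [-1000, -1002, -998],
    2: [-800, -801, -799],
    3: [-790, -795, -785],
    4: [-788, -780, -796],
}
dk = delta_k(toy_runs)
best_k = max(dk, key=dk.get)
```

Because the statistic relies on second differences, it cannot be evaluated at K = 1 or at the largest K tested, which is one reason a wide range of K is usually explored.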


Average total migration, Nm, was estimated using FST [64] and rare alleles methods [65] in GENALEX and GENEPOP, respectively. Additionally, pairwise relative migration rates were esti-

mated using Alcala’s Nm [66] and directionality of differentiation was estimated according to methods developed by Sundqvist et al. [67] using ‘diveRsity’ in R.
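For orientation, the FST-based migration estimate is usually derived from Wright's island-model relation Nm = (1 − FST) / (4 FST). The sketch below assumes that relation; the exact FST estimator that GENALEX feeds into it is not restated here, and the function name is hypothetical.

```python
def nm_from_fst(fst: float) -> float:
    """Island-model estimate of migrants per generation, Nm = (1 - FST) / (4 * FST)."""
    if not 0 < fst < 1:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1 - fst) / (4 * fst)


# FST = 0.2 corresponds to one migrant per generation under this relation.
one_migrant = nm_from_fst(0.2)
```

Under this relation Nm falls below one migrant per generation once FST exceeds 0.2, a threshold often used as a rule of thumb for whether gene flow can counteract drift.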

Results

Within-population genetic diversity

For most populations, Pgen ranged from 3.3 × 10⁻⁸ to 7.0 × 10⁻³, indicating there was a low probability of generating the observed genotypes under Hardy-Weinberg Equilibrium conditions. Psex ranged from 1.7 × 10⁻⁷ to 2.0 × 10⁻³, though there were higher values for Psex in the following Florida populations: Crane (0.094), Key West (0.15), Tampa Bay (0.063), and Florida

Bay (0.19). The few instances in which Psex exceeded 0.05 occurred in populations dominated by few clones, thereby inflating Psex, and were unlikely to have greatly impacted the accuracy of heterozygosity estimates and other statistical analyses performed. The 37 instances (of the total

2,720 pairwise comparisons) in which linkage disequilibrium was significant after a Bonferroni

correction was applied (p-value < 0.003) were also unlikely to affect subsequent population genetic analyses.

Genotypic richness was highly variable among populations, ranging from 0.04 to 0.62

(Table 1). Genotypic richness for Florida Keys populations sampled in 2014 and 2015 ranged

from 0.37 to 0.62 and from 0.05 to 0.43, respectively. Genotypic richness values may have been

Table 1. Summary genetic statistics for all populations.

Population        N    G   R     A    NA    AR    Ho           He           FIS
1 Carysfort       48   19  0.38  98   5.76  2.53  0.51 ± 0.08  0.43 ± 0.07  -0.16 ± 0.04
2 Elbow           45   28  0.61  106  6.24  2.53  0.47 ± 0.08  0.43 ± 0.07  -0.08 ± 0.02
3 Dixie           50   20  0.39  88   5.18  2.48  0.44 ± 0.08  0.39 ± 0.07  -0.10 ± 0.05
4 Conch           47   18  0.37  85   5.00  2.47  0.51 ± 0.09  0.40 ± 0.06  -0.23 ± 0.05
5 Davis           47   22  0.46  102  6.00  2.56  0.48 ± 0.07  0.44 ± 0.07  -0.09 ± 0.02
6 Molasses        48   22  0.45  93   5.47  2.55  0.49 ± 0.07  0.43 ± 0.07  -0.13 ± 0.01
7 Alligator       45   19  0.41  90   5.29  2.55  0.45 ± 0.07  0.42 ± 0.06  -0.08 ± 0.05
8 Tennessee       46   29  0.62  102  6.00  2.51  0.41 ± 0.07  0.40 ± 0.07  -0.04 ± 0.04
9 Sprigger        32   18  0.55  73   4.29  2.49  0.44 ± 0.07  0.38 ± 0.06  -0.17 ± 0.03
10 Sluiceway      48   22  0.45  66   3.88  2.43  0.42 ± 0.07  0.34 ± 0.06  -0.20 ± 0.04
11 Marathon       22   10  0.43  67   3.94  2.45  0.44 ± 0.09  0.42 ± 0.05  0.08 ± 0.14
12 Pigeon         43   17  0.38  75   4.41  2.44  0.38 ± 0.06  0.33 ± 0.05  -0.12 ± 0.05
13 Bahia Honda    47   15  0.30  67   3.94  2.43  0.39 ± 0.08  0.35 ± 0.06  -0.09 ± 0.07
14 Water          39   12  0.29  67   3.94  2.47  0.42 ± 0.07  0.38 ± 0.05  -0.10 ± 0.08
15 Crane          31   12  0.37  59   3.47  2.37  0.29 ± 0.06  0.28 ± 0.05  -0.05 ± 0.08
16 Key West       23   2   0.05  35   2.06  2.03  0.41 ± 0.12  0.21 ± 0.06  -0.94 ± 0.04
17 Tampa Bay      33   6   0.16  47   2.76  2.24  0.21 ± 0.07  0.18 ± 0.05  -0.09 ± 0.08
18 Florida Bay    123  6   0.04  54   3.18  2.36  0.41 ± 0.08  0.34 ± 0.05  -0.19 ± 0.10
19 Bahamas        44   19  0.42  69   4.06  2.37  0.32 ± 0.08  0.29 ± 0.07  -0.08 ± 0.05
20 Bermuda        107  20  0.18  67   3.94  2.39  0.26 ± 0.06  0.29 ± 0.06  0.12 ± 0.07

Numeric codes are provided alongside location name for each population. Sample size (N), number of unique multilocus genotypes (G), genotypic richness (R), total number of alleles (A), average number of alleles per locus (NA), allelic richness per locus (AR), observed heterozygosity (Ho), expected heterozygosity (He) and inbreeding coefficient (FIS) are reported for each population. Standard error is included for Ho, He and FIS. Values in bold indicate significant deviation from Hardy-Weinberg equilibrium at p-value < 0.05.

https://doi.org/10.1371/journal.pone.0203644.t001


overestimated because we did not account for scoring error or somatic mutation when identi-

fying unique MLGs. The total number of alleles ranged from 35 to 106 and the average number

of alleles per locus ranged from 2.06 to 6.24. Once adjusted for sample size, allelic richness was

similar across all populations, ranging from 2.03 to 2.56. Observed heterozygosity ranged from

0.21 to 0.51, and expected heterozygosity ranged from 0.18 to 0.44. Deviation from Hardy-Weinberg conditions was detected in nine populations (p-value < 0.05), most of which exhibited negative inbreeding coefficients. We found no significant effect of null alleles, except in populations with few genets (G ≤ 10) and loci for which all samples were homozygous for the

same allele, or fixed. Excluding populations with few genets and loci with fixed alleles, the

mean per locus significance of heterozygote deficiency due to null alleles across populations

ranged from 0.276 to 0.827.
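The link between the heterozygosity values and the negative inbreeding coefficients reported above can be sketched for a single locus. This is a minimal illustration that uses the uncorrected expected heterozygosity, He = 1 − Σp², and F = 1 − Ho/He. The function name and the genotypes are hypothetical, and GENALEX averages over loci, so the per-population values in Table 1 will not be reproduced by this shortcut.

```python
from collections import Counter


def locus_summary(genotypes):
    """Return (Ho, He, F) for one locus, given a list of (allele, allele) tuples."""
    n = len(genotypes)
    # Observed heterozygosity: share of individuals carrying two different alleles.
    ho = sum(a != b for a, b in genotypes) / n
    # Allele frequencies are taken over the 2n gene copies in the sample.
    counts = Counter(allele for pair in genotypes for allele in pair)
    he = 1 - sum((c / (2 * n)) ** 2 for c in counts.values())
    # F is undefined when the locus is fixed for a single allele (He = 0).
    f = 1 - ho / he if he > 0 else float("nan")
    return ho, he, f


# Hypothetical locus: four genets, three of them heterozygous for alleles 150/154.
genets = [(150, 154), (150, 154), (150, 154), (150, 150)]
ho, he, f = locus_summary(genets)
```

Whenever observed heterozygosity exceeds the Hardy-Weinberg expectation, F is negative, which is the heterozygote excess pattern seen in most populations here.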

Genetic differentiation between populations

AMOVA revealed significant genetic differentiation between all populations (FST = 0.149 ± 0.017, p-value = 0.001) and significant genetic differentiation between the Gulf and Atlantic populations (FST = 0.109 ± 0.027, p-value = 0.001). The results of pairwise population differentiation were consistent across both statistics, FST and Jost’s D (S2 Table), with maximum values calculated as 0.531 and 0.295, respectively. Similar patterns in relative differentiation between

populations were observed for both statistics, and differences were primarily in the magnitude

of pairwise values, thus only FST will be described in detail. Pairwise differentiation values were low to moderate within Atlantic populations (FST = 0.000–0.092), and low to high within Gulf populations (FST = 0.012–0.237). Pairwise differentiation values between Atlantic and Gulf populations ranged from 0.041 between Davis in the Upper Keys and Marathon near the Middle Keys, to 0.330 between Conch in the Upper Keys and Crane in the Lower Keys. Florida Bay

exhibited similar levels of differentiation between Gulf (FST = 0.144–0.273) and Atlantic (FST = 0.177–0.259) sites. Tampa Bay, the westernmost site sampled, exhibited high differentiation

from Atlantic populations (FST = 0.236–0.37) and moderate to high differentiation from Gulf populations (FST = 0.101–0.279). The highest overall pairwise differentiation was found between the Bahamas and Tampa Bay, where FST = 0.531. The next greatest values were found between the Bahamas and Gulf populations (FST = 0.261–0.473), and values were moderate between the Bahamas and Atlantic populations (FST = 0.192–0.259). Bermuda pairwise differentiation with Atlantic and Gulf sites was moderate to high, with FST values ranging from 0.142 to 0.224 and 0.176 to 0.280, respectively.

In the PCA, the first two principal component axes explained 18.3% and 7.7% of total variance, respectively (S1 Fig). The Atlantic and Gulf sites clustered separately, with some overlap, mostly between Atlantic sites and Gulf sites proximal to breaks in the Middle Keys (Marathon, Pigeon and Sprigger). Tampa Bay clustered with the Gulf sites, while Bermuda clustered between Gulf and Atlantic sites. The Bahamas clustered separately from all other sites.

Population structure and gene flow

Population structure was present, with greatest statistical support for K = 2 population clusters (ΔK = 297.26), followed by K = 4 (ΔK = 20.84). For K = 2, Atlantic and Gulf populations clustered separately, and Tampa Bay, Florida Bay and the Bahamas were assigned to

the Gulf cluster (Fig 2). The genotypes in the Bermuda population show mixed membership to

both the Atlantic and Gulf clusters. For K = 4, the Atlantic and Gulf populations still clustered

separately, and the Bahamas and Bermuda were assigned to distinct clusters. For both K = 2

and K = 4, Gulf sites proximal to breaks in the Middle Keys (Sprigger, Marathon and Pigeon)

exhibit admixture with Atlantic populations.


Average migration between all populations was 1.7 and 2.6 migrants per generation, follow-

ing the FST method and private alleles method, respectively. Relative pairwise migration was highest among Atlantic populations, ranging from 0.174 to 1, on a scale from 0 to 1 (Fig 3; S3

Table). Within the Atlantic group, lowest genetic exchange occurred from Conch to Elbow,

and the highest from Davis to Carysfort. Exchange within the Gulf populations (excluding Key

West) ranged from 0.029 to 0.792, and the greatest exchange occurred between Sluiceway and

Sprigger, both located on the western edge of Florida Bay. Exchange to and from Key West

was particularly low and did not exceed 0.084. There was greater relative migration from

Atlantic sites to Gulf sites proximal to a break in the Middle Keys (Marathon, Pigeon and

Sprigger) than there was from within the Gulf. Florida Bay exhibited relative migration rates

lower than 0.125 with greatest outgoing migration to the Atlantic site Davis. Tampa Bay exhib-

ited migration rates lower than 0.148 with highest migration coming from Gulf sites. The

Bahamas exhibited negligible migration rates, not exceeding 0.085. Incoming relative migra-

tion to Bermuda was always less than 0.067, while outgoing migration ranged from 0.01 to

0.23, with the highest rates of exchange occurring with the Atlantic populations.

Discussion

Within the Florida Keys and subtropical Atlantic region, S. filiforme exhibits low genetic diversity when compared with other temperate and tropical seagrass species. We found 1) the level

of clonality in S. filiforme, as measured by shared multilocus genotypes, to be highly variable among populations; 2) low allelic diversity and heterozygote excess in almost every population;

3) evidence of genetic differentiation and population structure, in which the sampled popula-

tions were assigned to two major demes separated by the Florida Keys archipelago; and 4)

asymmetric gene flow patterns, though average migration rates across all populations exceeded

one migrant per generation.

Fig 2. Diagrams of STRUCTURE cluster assignment. (A) K = 2 cluster assignment and (B) K = 4 cluster assignment. Population names are on the x-axis, separated by black vertical bands. Individual genotypes are represented as vertical bars and cluster assignment is depicted by color.

https://doi.org/10.1371/journal.pone.0203644.g002


Genotypic richness of S. filiforme varied widely across sampling sites, though this cannot be completely explained by disparities in sampling area. Sample collections in 2014 were from

within an area of ~2,500 m², in which genotypic richness ranged from 0.37 to 0.62. Sample collections in 2015 were from within an area of ~500 m², in which genotypic richness ranged

from 0.05 to 0.43. Therefore, within each sampling scheme, we observed a wide range in clon-

ality. In the larger areas sampled in 2014, we detected one genet present in two adjacent popu-

lations (Sprigger and Sluiceway), extending over hundreds of meters. For the 2015 collection

sites, sampling in a smaller total area with shorter distances between each shoot sampled may

have led to a decrease in detection efficiency of total genets and number of alleles present in

the population [68]. Allelic richness standardized by smallest sample size was consistent across

all sites, suggesting the observed higher number of alleles and unique MLGs in 2014 collection

populations were related to the spatial scale of sampling and do not necessarily indicate greater

diversity in the Atlantic populations. Low genotypic richness in some populations and low

allelic diversity in all populations of S. filiforme across the subtropical Atlantic region underscore the advantage of clonal reproduction in this environment. It is also possible that we

observed an edge-of-range effect [69] in which populations of a species closer toward their

range limit express lower genetic diversity than populations in the center of the species’ range.

Though our study did not span across the center of distribution for S. filiforme, we would expect to find greater levels of diversity in Caribbean populations.

The strongest population structure clearly develops in South Florida, where Tampa Bay,

though hundreds of kilometers away from the Florida Keys, groups with Gulf populations,

suggesting the Florida Keys archipelago presented historical barriers to gene flow between

Fig 3. Diagram of relative magnitude and direction of gene flow. Nodes represent populations (refer to Table 1 to match numeric

codes with location names). Arrows are weighted according to Alcala’s Nm values (S3 Table), which range from 0.004 to 1.000, and

arrowheads show the estimated direction of gene flow.

https://doi.org/10.1371/journal.pone.0203644.g003


Gulf and Atlantic demes, and perhaps continues to impede gene flow with contemporary land

configurations and sea levels. This is also supported by the relative migration and direction of

gene flow calculations, which revealed asymmetric patterns in the magnitude and direction of

genetic exchange. The Atlantic populations are strongly connected to one another, as are the

Gulf populations, though to a lesser extent. The genetic disjunction between Atlantic and Gulf

S. filiforme populations in Florida may provide evidence of a phylogeographic break, which has been observed for a number of warm-temperate marine and intertidal organisms, related

to increases in seawater temperature (and thereby northward shifts in temperate species’ range

limits) associated with glacial retreats that occurred throughout the Pleistocene [70–73].

Historical changes in sea level (and not necessarily temperature) may have been a primary

factor contributing to the development of the genetic break for S. filiforme, a tropical species tolerant of warm seawater temperatures. During the Pleistocene, glacial advances exposed

more of the Florida peninsula and may have restricted estuarine habitat to a small area within

the western Gulf of Mexico [74], while glacial retreats increased sea level and promoted the

expansion of estuarine habitat, likely causing increased contact between eurythermal species

along the southern tip of the peninsula [75]. Depending on Pliocene distributions of S. filiforme, changes in sea level that resulted in the final emergence of the Florida peninsula may have instigated the genetic break. McCommas [76] attributed the genetic discontinuity between the Gulf of Mexico and the Atlantic populations of the sea anemone, Bunodosoma cavernata, to this vicariant event based on estimated time since divergence. Without fossil evidence or molecular clock calculations, we can merely suggest the break we found in Florida

was similarly initiated by prior fluctuations in sea level and maintained by contemporary

ocean currents.

Interestingly, there are exceptions to the Atlantic-Gulf divide for S. filiforme, in which we detected relatively high gene flow from Atlantic populations to Gulf populations proximal to a

break in the Middle Keys (at sites Marathon, Pigeon and Sprigger). Additionally, the Marathon

population appeared more genetically similar to the Atlantic populations than to those in the Gulf.

This finding could reflect shared ancestry and relatively recent divergence between the Atlantic

and Gulf populations, but does not exclude the possibility of genetic exchange occurring

between Atlantic and Gulf populations across the archipelago via propagules or rafting

vegetation.

We found relative gene flow between proximal Florida Bay, Tampa Bay and Key West pop-

ulations and other South Florida populations to be comparable to (and in some instances less

than) gene flow levels observed between Florida and more distant non-Florida populations.

These populations were also highly clonal, exhibiting the lowest genotypic richness values

measured in this study. We sampled in the northeastern-most extent of Florida Bay in Black-

water Sound, an enclosed area with few hydrological connections to the greater Florida Bay or

the Atlantic; we suspect the low gene flow and low genotypic richness are related to this isola-

tion. Gene flow between the Key West population and other South Florida populations may be

limited by hydrologic rather than topographic isolation: strong reversing tidal currents flowing

between the Gulf of Mexico and the Florida Straits may prevent mixing between the Key West

population and the further eastward Lower Keys populations. The Tampa Bay population,

located roughly halfway up the Florida peninsula, approaches the northern range limit for S. filiforme in the Gulf of Mexico. In the latter half of the 20th century, Tampa Bay experienced a major decline (~ 70%) in historical seagrass coverage due to rapid population expansion and

development along the coast [77]. Since adopting policies to prevent pollution and dredging

activities, seagrasses in Tampa Bay have been on a recovery trajectory and now exceed histori-

cal extent [78]. It is unclear whether the low gene flow and high clonality in this population

reflects its northern position, past seagrass decline or contemporary dispersal limitations. The


Tampa Bay population is not representative of the entire estuary as the samples were

collected near the mouth of the bay, disregarding the meadows within the interior of the

bay. Further research on the population genetics of all seagrass species throughout Tampa Bay

is warranted, particularly given its tumultuous history of environmental decline, restoration

and recovery.

Though Bermuda is the furthest distance from all other populations, the greatest differenti-

ation occurred between the Bahamas and Florida sites. High relatedness between Florida and

Bermuda populations of H. wrightii [79] indicates similar mechanisms may be responsible for this pattern. Population structure and gene flow patterns in the Bahamas and Bermuda popu-

lations were somewhat counter to our expectations, but must be interpreted with caution

because we only sampled one site from each location. In the K = 2 population cluster scenario, the Bermuda population contains genotypes with near equal membership to the Atlantic

and Gulf clusters, while the Bahamas population shows complete membership to the Atlantic

cluster. These results imply the Bahamas population groups with the Atlantic populations,

though our previous analyses suggest relatively strong genetic differentiation and limited gene

flow between the Bahamas and all other sites. Contemporary surface ocean currents directing

the movement of propagules, and therefore genetic exchange between populations, may be

responsible for these patterns [80]. Based on the mixed-membership genotypes in the Ber-

muda population, it is plausible the Bermuda population developed from an initial source pop-

ulation in recent evolutionary history that later diverged to create the two major clusters

identified here, and now receives propagules from Florida via the Gulf Stream at a frequency

sufficient to prevent strong genetic differentiation. The moderate degree of gene flow from

Bermuda to the Florida populations estimated here (Fig 3) is interesting, as propagule dispersal

via surface currents in the opposite direction (South to North) along the Gulf Stream seems

more likely. And despite the westward flow of the Antilles current, the topography of the

islands of the Bahamas might restrict gene flow between Florida populations and the remote

sampling location in San Salvador, the easternmost island of the Bahamas. We believe further

sampling across the subtropical Atlantic, especially along the western Bahamian islands, will

clarify unexpected gene flow patterns.

The high genetic exchange within Atlantic populations as evidenced by high migration

rates may be explained by hydrologic connections created by surface currents and eddies that

form along the Florida Keys Atlantic coastline. The gene flow patterns observed here roughly

agree with the modeled and observed movement of spiny lobster (Panulirus argus) larvae along a ‘recruitment conveyor’ in the Florida Keys, in which spawning larvae near the Yucatán

Peninsula have been identified as source populations [81]. The net eastward and northward

movement of the Florida current along the Florida Shelf and the intermittent formation of

small eddies could facilitate local movement and entrainment of seagrass propagules [82]. Less

genetic exchange within the Gulf populations is perhaps related to the isolating topography of

the Lower Keys, in which several small key islands and narrow channels separate the seagrass

meadows, potentially hindering the movement of propagules. Though mean hydrological

transport occurs from the Gulf to the Atlantic, westward tidal flow sometimes pushes Atlantic

waters through channels in the Keys [83,84] and could promote movement of propagules of

Atlantic origin through to Gulf side populations Marathon, Pigeon and Sprigger, facilitating

the admixture of genotypes detected between clusters.

The population genetics of S. filiforme in the subtropical Atlantic appear to match theoretical predictions for a colonizer species. The S. filiforme meadows we sampled contained variable genotypic diversity, likely a result of site-specific properties influencing the growth and reproductive strategies in this species as well as propagule supply [85]. The low allelic diversity

within S. filiforme meadows and evidence for population structure along the possible


phylogeographic boundary in the Florida Keys are typical of colonizers. These findings are

consistent with a study on the only congener of S. filiforme, Syringodium isoetifolium, which also exhibited variable genotypic diversity and population structure defined by bioregions in

the western North Pacific [86]. The climax species of the tropical and subtropical Atlantic, T. testudinum, exhibited high genotypic and allelic diversity, and no evidence of population structure in Florida Bay [37], and similarly high allelic diversity and little evidence of population structure across ~1,000 km of coastline along the Yucatán Peninsula in Mexico [87]. In contrast, the colonizer species H. wrightii showed high clonality and strong differentiation among edge-of-range populations in Florida, North Carolina and Bermuda [79], and generally high

clonality and weak population structure along the western Gulf of Mexico coast [88]. The pop-

ulation genetics of S. filiforme conform to expectations for colonizer species, with genotypic diversity mediated by local conditions and meadow demographics.

Successional status is derived from environmental tolerances and growth and reproductive

strategies that in turn impact population genetics, while modern oceanic hydrology ultimately

controls dispersal trajectories and therefore genetic exchange. It is likely that evolutionarily

historical population dynamics under past continent arrangements and sea levels are the dom-

inant forces driving the population structure in S. filiforme in the subtropical Atlantic Ocean. The higher genotypic diversity found in S. filiforme in certain populations suggests that some meadows may be more resilient to disturbances than others, and these more resilient meadows

may enhance recovery of depauperate meadows by sustaining a supply of propagules and gene

flow, but only where ocean currents and land barriers do not impede connectivity. Whether

overall low genetic diversity and strong population structure in subtropical Atlantic popula-

tions of S. filiforme equates to limited capability for adaptation to selective pressures has yet to be tested.

Supporting information

S1 Table. Sample site GPS coordinates. GPS coordinates mark the exact location of each

sample site. Latitude and longitude are in decimal degrees.

(DOCX)

S2 Table. Pairwise genetic differentiation. FST values are provided to the left of the diagonal and Jost’s D values are provided to the right of the diagonal. Bold text indicates significance

based on non-overlapping confidence intervals.

(DOCX)

S3 Table. Values for relative magnitude and direction of gene flow. Values represent the relative amount of gene flow from populations in the first column to receiving populations identified in the first row. For example, the highest amount of gene flow (1.000) occurs from Carysfort to Davis, while the lowest amount of gene flow occurs from the Bahamas to Key West (0.004). Bold text indicates significance based on non-overlapping 95% confidence intervals.

(DOCX)

S1 Fig. Principal components analysis (PCA) plot. Axis loading values are depicted for the two principal component axes containing the greatest amount of variation, PC1 (18.3% variance) and PC2 (7.7% variance). Genotypes from each population group are distinguished by

color and shape (Atlantic: blue circles, Gulf: orange triangles, Bermuda: yellow diamonds,

Bahamas: magenta squares).

(EPS)


Acknowledgments

The authors thank Thomas Frankovich for assistance in the field, Margot Miller and Ainsley Calladine for technical and laboratory support at the University of Virginia and University of Adelaide, Laura K. Reynolds for thoughtful feedback on early versions of the manuscript and anonymous reviewers whose suggestions also greatly improved the manuscript. Sampling for this study was conducted under permits FKNMS-2015-085, EVER-2013-SCI-0058, and Bermuda Dept. of Conservation Services License no. 15-04-16-22. The Jones Environmental Research Endowment to the Department of Environmental Sciences at the University of Virginia funded this research.

Author Contributions

Conceptualization: Alexandra L. Bijak, Michelle Waycott.

Formal analysis: Alexandra L. Bijak, Kor-jent van Dijk.

Investigation: Alexandra L. Bijak.

Methodology: Kor-jent van Dijk.

Supervision: Michelle Waycott.

Visualization: Alexandra L. Bijak, Kor-jent van Dijk.

Writing – original draft: Alexandra L. Bijak.

Writing – review & editing: Alexandra L. Bijak, Kor-jent van Dijk, Michelle Waycott.

References

1. Gray A. Genetic diversity and its conservation in natural populations of plants. Biodivers Lett. 1996; 3: 71–80.

2. Duffy JE. Biodiversity and the functioning of seagrass ecosystems. Mar Ecol Prog Ser. 2006; 311: 233–250.

3. Hughes AR, Stachowicz JJ. Genetic diversity enhances the resistance of a seagrass ecosystem to disturbance. Proc Natl Acad Sci. 2004; 101: 8998–9002. https://doi.org/10.1073/pnas.0402642101 PMID: 15184681

4. Randall Hughes A, Stachowicz JJ. Seagrass genotypic diversity increases disturbance response via complementarity and dominance. J Ecol. 2011; 99: 445–453. https://doi.org/10.1111/j.1365-2745.2010.01767.x

5. Loveless MD, Hamrick JL. Ecological determinants of genetic structure in plant populations. Annu Rev Ecol Syst. 1984; 15: 65–95.

6. Hamrick JL, Godt MJW, Sherman-Broyles SL. Factors influencing levels of genetic diversity in woody plant species. New For. 1992; 6: 95–124. https://doi.org/10.1007/978-94-011-2815-5_7

7. Giles BE, Goudet J. Genetic differentiation in Silene dioica metapopulations: Estimation of spatiotemporal effects in successional plant species. Am Nat. 1997; 149: 507–526. https://doi.org/10.1086/286002

8. Raffl C, Schönswetter P, Erschbamer B. “Sax-sess”—genetics of primary succession in a pioneer species on two parallel glacier forelands. Mol Ecol. 2006; 15: 2433–2440. https://doi.org/10.1111/j.1365-294X.2006.02964.x PMID: 16842417

9. Raffl C, Holderegger R, Parson W, Erschbamer B. Patterns in genetic diversity of Trifolium pallescens populations do not reflect chronosequence on alpine glacier forelands. Heredity. 2008; 100: 526–532. https://doi.org/10.1038/hdy.2008.8 PMID: 18270530

10. Les DH, Cleland MA, Waycott M. Phylogenetic studies in Alismatidae, II: Evolution of marine angiosperms (seagrasses) and hydrophily. Syst Bot. 1997; 22: 443–463.

11. Kendrick GA, Orth RJ, Statton J, Hovey R, Ruiz-Montoya L, Lowe RJ, et al. Demographic and genetic connectivity: The role and consequences of reproduction, dispersal and recruitment in seagrasses. Biol Rev Camb Philos Soc. 2016. https://doi.org/10.1111/brv.12261 PMID: 27010433

Genetic diversity of the seagrass, Syringodium filiforme, in the subtropical Atlantic

PLOS ONE | https://doi.org/10.1371/journal.pone.0203644 September 5, 2018 14 / 18

12. Barrett SCH, Eckert CG, Husband BC. Evolutionary processes in aquatic plant populations. Aquat Bot.

1993; 44: 105–145. https://doi.org/10.1016/0304-3770(93)90068-8

13. Kendrick GA, Duarte CM, Marbà N. Clonality in seagrasses, emergent properties and seagrass land- scapes. Mar Ecol Prog Ser. 2005; 290: 291–296. https://doi.org/10.3354/meps290291

14. Arnaud-Haond S, Alberto F, Teixeira S, Procaccini G, Serrão EA, Duarte CM. Assessing genetic diver- sity in clonal organisms: Low diversity or low resolution? Combining power and cost efficiency in select-

ing markers. J Hered. 2005; 96: 434–440. https://doi.org/10.1093/jhered/esi043 PMID: 15743902

15. Oliva S, Romero J, Marta PE. Reproductive strategies and isolation-by-demography in a marine clonal

plant along an eutrophication gradient. Mol Ecol. 2014; 23: 5698–5711. https://doi.org/10.1111/mec.

12973 PMID: 25331192

16. Sinclair EA, Krauss SL, Anthony J, Hovey R, Kendrick GA. The interaction of environment and genetic

diversity within meadows of the seagrass Posidonia australis (Posidoniaceae). Mar Ecol Prog Ser.

2014; 506: 87–98. https://doi.org/10.3354/meps10812

17. Serra IA, Innocenti AM, Di Maida G, Calvo S, Migliaccio M, Zambianchi E, et al. Genetic structure in the

Mediterranean seagrass Posidonia oceanica: Disentangling past vicariance events from contemporary

patterns of gene flow. Mol Ecol. 2010; 19: 557–568. https://doi.org/10.1111/j.1365-294X.2009.04462.x

PMID: 20051010

18. Short FT, Carruthers TJB, Dennison WC, Waycott M. Global seagrass distribution and diversity: A bio-

regional model. J Exp Mar Bio Ecol. 2007; 350: 3–20. https://doi.org/10.1016/j.jembe.2007.06.012

19. Creed JC, Phillips RC, van Tussenbroek. Seagrasses of the Caribbean. In: Green EP, Short FT editors.

World atlas of seagrasses. Berkeley, USA: University of California Press; 2003. pp. 234–240.

20. Fourqurean JW, Durako MJ, Hall MO, Hefty LN. Seagrass distribution in South Florida: A multi-agency

coordinated monitoring program. In: Porter JW, Porter KG editors. The Everglades, Florida Bay, and

coral reefs of the Florida Keys: An ecosystem sourcebook. 2002. pp. 497–522.

21. Costanza R, d’Arge R, de Groot R, Farber S, Grasso M, Hannon B, et al. The value of the world’s eco-

system services and natural capital. Nature. 1997; 387: 253–260. https://doi.org/10.1038/387253a0

22. Fourqurean JW, Duarte CM, Kennedy H, Marbà N, Holmer M, Mateo MA, et al. Seagrass ecosystems as a globally significant carbon stock. Nat Geosci. 2012; 5: 505–509. https://doi.org/10.1038/ngeo1477

23. Sargent FJ, Leary TJ, Crewz DW, Kruer CR. Scarring of Florida’s seagrasses: Assessment and man-

agement options. St. Petersburg (FL): Florida Marine Research Institute; 1995. Report No.: FMRI

Tech. Rep. TR-1.

24. Short FT, Wyllie-Echeverria S. Natural and human-induced disturbance of seagrasses. Environ Con-

serv. 1996; 23: 17. https://doi.org/10.1017/S0376892900038212

25. Orth RJ, Carruthers TJB, Dennison WC, Duarte CM, James W, Heck KL, et al. Global crisis for sea-

grass ecosystems. BioScience. 2006; 56: 987–996.

26. Hall MO, Furman BT, Merello M, Durako MJ. Recurrence of Thalassia testudinum seagrass die-off in

Florida Bay, USA: Initial observations. Mar Ecol Prog Ser. 2016; 560: 243–249. https://doi.org/10.3354/

meps11923

27. Hall MO, Durako MJ, Fourqurean JW, Zieman JC. Decadal changes in seagrass distribution and abun-

dance in Florida Bay. Estuaries. 1999; 22: 445–459. https://doi.org/10.2307/1353210

28. Roblee MB, Barber TR, Carlson PR, Durako MJ, Fourqurean JW, Muehkstein LK, et al. Mass mortality of

the tropical seagrass Thalassia testudinum in Florida Bay (USA). Mar Ecol Prog Ser. 1991; 71: 297–299.

29. Zieman JC, Fourqurean JW, Frankovich TA. Seagrass Die-Off in Florida Bay: Long-term trends in

abundance and growth of turtle grass, Thalassia testudinum. Estuaries. 1999; 22: 460–470. https://doi.

org/10.2307/1353211

30. Johansson JOR. Historical overview of Tampa Bay water quality and seagrass issues and trends. In:

Greening HS, editor. Seagrass Management, It’s Not Just Nutrients! Symposium. St. Petersburg, Flor-

ida; 2002. p. 246.

31. Marbà N, Duarte CM. Rhizome elongation and seagrass clonal growth. Mar Ecol Prog Ser. 1998; 174: 269–280. https://doi.org/10.3354/meps174269

32. Williams SL. Experimental studies of Caribbean seagrass bed development. Ecol Monogr. 1990; 60:

449–469. https://doi.org/10.2307/1943015

33. Kendall MS, Battista T, Hillis-Starr Z. Long term expansion of a deep Syringodium filiforme meadow in

St. Croix, US Virgin Islands: The potential role of hurricanes in the dispersal of seeds. Aquat Bot. 2004;

78: 15–25. https://doi.org/10.1016/j.aquabot.2003.09.004

34. Kilminster K, McMahon K, Waycott M, Kendrick GA, Scanes P, McKenzie L, et al. Unravelling complex-

ity in seagrass systems for management: Australia as a microcosm. Sci Total Environ. Elsevier B.V.;

2015; 534: 97–109. https://doi.org/10.1016/j.scitotenv.2015.04.061 PMID: 25917445

Genetic diversity of the seagrass, Syringodium filiforme, in the subtropical Atlantic

PLOS ONE | https://doi.org/10.1371/journal.pone.0203644 September 5, 2018 15 / 18

35. Schlueter MA, Guttman SI. Gene flow and genetic diversity of turtle grass, Thalassia testudinum, banks

ex könig, in the lower Florida Keys. Aquat Bot. Elsevier; 1998; 61: 147–164. https://doi.org/10.1016/

S0304-3770(98)00063-1

36. Kirsten JH, Dawes CJ, Cochrane BJ. Randomly amplified polymorphism detection (RAPD) reveals high

genetic diversity in Thalassia testudinum banks ex König (Turtlegrass). Aquat Bot. 1998; 61: 269–287.

https://doi.org/10.1016/S0304-3770(98)00070-9

37. Bricker E, Waycott M, Calladine A, Zieman JC. High connectivity across environmental gradients and

implications for phenotypic plasticity in a marine plant. Mar Ecol Prog Ser. 2011; 423: 57–67. https://doi.

org/10.3354/meps08962

38. Angel R. Genetic diversity of Halodule wrightii using random amplified polymorphic DNA. Aquat Bot.

2002; 74: 165–174. https://doi.org/10.1016/S0304-3770(02)00079-7

39. Briceño HO, Boyer JN, Castro J, Harlem P. Biogeochemical classification of South Florida’s estuarine and coastal waters. Mar Pollut Bull. 2013; 75: 187–204. https://doi.org/10.1016/j.marpolbul.2013.07.

034 PMID: 23968989

40. Fourqurean JW, Boyer JN, Durako MJ, Hefty LN, Peterson BJ. Forecasting responses of seagrass dis-

tributions to changing water quality using monitoring data. Ecol Appl. 2003; 13: 474–489. https://doi.org/

10.1890/1051-0761(2003)013[0474:FROSDT]2.0.CO;2

41. Fourqurean JW, Willsie A, Rose CD, Rutten LM. Spatial and temporal pattern in seagrass community

composition and productivity in South Florida. Mar Biol. 2001; 138: 341–354. https://doi.org/10.1007/

s002270000448

42. Diekmann OE, Serrão EA. Range-edge genetic diversity: Locally poor extant southern patches maintain a regionally diverse hotspot in the seagrass Zostera marina. Mol Ecol. 2012; 21: 1647–1657. https://doi.

org/10.1111/j.1365-294X.2012.05500.x PMID: 22369278

43. Arnaud-Haond S, Duarte CM, Alberto F, Serrão EA. Standardizing methods to address clonality in pop- ulation studies. Mol Ecol. 2007; 16: 5115–5139. https://doi.org/10.1111/j.1365-294X.2007.03535.x

PMID: 17944846

44. Balloux F, Lugon-Moulin N. The estimation of population differentiation with microsatellite markers. Mol

Ecol. 2002; 11: 155–165. https://doi.org/10.1046/j.0962-1083.2001.01436.x PMID: 11856418

45. Saghai-Maroof MA, Soliman KM, Jorgensen RA, Allard RW. Ribosomal DNA spacer-length polymor-

phisms in barley: Mendelian inheritance, chromosomal location, and population dynamics. Proc Natl

Acad Sci. 1984; 81: 8014–8018. https://doi.org/10.1073/pnas.81.24.8014 PMID: 6096873

46. Bijak AL, van Dijk K-J, Waycott M. Development of microsatellite markers for a tropical seagrass, Syrin-

godium filiforme (Cymodoceaceae). Appl Plant Sci. 2014; 2: 1–4. https://doi.org/10.3732/apps.1400082

PMID: 25309842

47. Parks JC, Werth CR. A study of spatial features of clones in a population of bracken fern, Pteridium

aquilinum (Dennstaedtiaceae). Am J Bot. 1993; 80: 537–544. https://doi.org/10.1002/j.1537-2197.

1993.tb13837.x PMID: 30139148

48. Arnaud-Haond S, Belkhir K. GENCLONE: A computer program to analyse genotypic data, test for clon-

ality and describe spatial clonal organization. Mol Ecol Notes. 2007; 7: 15–17. https://doi.org/10.1111/j.

1471-8286.2006.01522.x

49. Kalinowski ST, Taper ML. Maximum likelihood estimation of the frequency of null alleles at microsatellite

loci. Conserv Genet. 2006; 7: 991–995. https://doi.org/10.1007/s10592-006-9134-9

50. Dorken ME, Eckert CG. Severely reduced sexual reproduction in northern populations of a clonal plant,

Decodon verticillatus (Lythraceae). J Ecol. 2011; 89: 339–350. https://doi.org/10.1046/j.1365-2745.

2001.00558.x

51. Keenan K, Mcginnity P, Cross TF, Crozier WW, Prodöhl PA. DiveRsity: An R package for the estimation

and exploration of population genetics parameters and their associated errors. Methods Ecol Evol.

2013; 4: 782–788. https://doi.org/10.1111/2041-210X.12067

52. R Core Team. R: A language and environment for statistical computing [Internet]. Vienna, Austria: R

Foundation for Statistical Computing; 2014. Available: http://www.r-project.org/

53. Peakall R, Smouse PE. GENALEX 6: Genetic analysis in Excel. Population genetic software for teach-

ing and research. Mol Ecol Notes. 2006; 6: 288–295. https://doi.org/10.1111/j.1471-8286.2005.01155.x

54. Peakall R, Smouse PE. GenALEx 6.5: Genetic analysis in Excel. Population genetic software for teach-

ing and research-an update. Bioinformatics. 2012; 28: 2537–2539. https://doi.org/10.1093/

bioinformatics/bts460 PMID: 22820204

55. Raymond M, Rousset F. Genepop (Version-1.2): Population genetics software for exact tests and ecu-

menicism. J Hered. 1995; 86: 248–249.

Genetic diversity of the seagrass, Syringodium filiforme, in the subtropical Atlantic

PLOS ONE | https://doi.org/10.1371/journal.pone.0203644 September 5, 2018 16 / 18

56. Rousset F. GENEPOP’007: A complete re-implementation of the GENEPOP software for Windows and

Linux. Mol Ecol Resour. 2008; 8: 103–106. https://doi.org/10.1111/j.1471-8286.2007.01931.x PMID:

21585727

57. Meirmans PG, Van Tienderen PH. GENOTYPE and GENODIVE: Two programs for the analysis of

genetic diversity of asexual organisms. Mol Ecol Notes. 2004; 4: 792–794. https://doi.org/10.1111/j.

1471-8286.2004.00770.x

58. Weir BS, Cockerham CC. Estimating F-statistics for the analysis of population structure. Evolution.

1984; 38: 1358–1370. https://doi.org/10.1111/j.1558-5646.1984.tb05657.x PMID: 28563791

59. Jost L. GST and its relatives do not measure differentiation. Mol Ecol. 2008; 17: 4015–4026. https://doi.

org/10.1111/j.1365-294X.2008.03887.x PMID: 19238703

60. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype

data. Genetics. 2000; 155: 945–959. https://doi.org/10.1111/j.1471-8286.2007.01758.x PMID:

10835412

61. Evanno G, Regnaut S, Goudet J. Detecting the number of clusters of individuals using the software

STRUCTURE: A simulation study. Mol Ecol. 2005; 14: 2611–2620. https://doi.org/10.1111/j.1365-

294X.2005.02553.x PMID: 15969739

62. Kopelman NM, Mayzel J, Jakobsson M, Rosenberg NA, Mayrose I. CLUMPAK: A program for identify-

ing clustering modes and packaging population structure inferences across K. Mol Ecol Resour. 2015;

https://doi.org/10.1111/1755-0998.12387 PMID: 25684545

63. Rosenberg N a. DISTRUCT: A program for the graphical display of population structure. Mol Ecol

Notes. 2004; 4: 137–138. https://doi.org/10.1046/j.1471-8286.2003.00566.x

64. Wright S. Genetical structure of populations. Ann Eugen. 1951; 15: 323–354. https://doi.org/10.1111/j.

1469-1809.1949.tb02451.x PMID: 24540312

65. Barton NH, Slatkin M. A quasi-equilibrium theory of the distribution of rare alleles in a subdivided popula-

tion. Heredity. 1986; 56: 409–415. https://doi.org/10.1038/hdy.1986.63 PMID: 3733460

66. Alcala N, Goudet J, Vuilleumier S. On the transition of genetic differentiation from isolation to panmixia:

What we can learn from Gst and D. Theor Popul Biol. 2014; 93: 75–84. https://doi.org/10.1016/j.tpb.

2014.02.003 PMID: 24560956

67. Sundqvist L, Keenan K, Zackrisson M, Prodöhl P, Kleinhans D. Directional genetic differentiation and

relative migration. Ecol Evol. 2016; 6: 3461–3475. https://doi.org/10.1002/ece3.2096 PMID: 27127613

68. Leberg PL. Estimating allelic richnes: Effects of sample size and bottlenecks. Mol Ecol. 2002; 11: 2445–

2449. PMID: 12406254

69. Billingham MR, Reusch TBH, Alberto F, Serrão EA. Is asexual reproduction more important at geo- graphical limits? A genetic study of the seagrass Zostera marina in the Ria Formosa, Portugal. Mar Ecol

Prog Ser. 2003; 265: 77–83. https://doi.org/10.3354/meps265077

70. Felder DL, Staton JL. Genetic differentiation in trans-Floridian species complexes of Sesarma and Uca

(Decapoda: Brachyura). J Crustac Biol. 1994; 14: 191–209.

71. Young AM, Torres C, Mack JE, Cunningham CW. Morphological and genetic evidence for vicariance

and refugium in Atlantic and Gulf of Mexico populations of the hermit crab Pagurus longicarpus. Mar

Biol. 2002; 140: 1059–1066. https://doi.org/10.1007/s00227-002-0780-2

72. Lee TN, Foighil DÓ. Hidden Floridian biodiversity: Mitochondrial and nuclear gene trees reveal four

cryptic species within the scorched mussel, Brachidontes exustus, species complex. Mol Ecol. 2004;

13: 3527–3542. https://doi.org/10.1111/j.1365-294X.2004.02337.x PMID: 15488009

73. Mathews LM. Cryptic biodiversity and phylogeographical patterns in a snapping shrimp species complex.

Mol Ecol. 2006; 15: 4049–4063. https://doi.org/10.1111/j.1365-294X.2006.03077.x PMID: 17054502

74. Petuch EJ. Geographical heterochrony: Comtemporaneous coexistence of neogene and recent mollus-

can faunas in the Americas. Palaeogeogr Palaeoclimatol Palaeoecol. Elsevier; 1982; 37: 277–312.

https://doi.org/10.1016/0031-0182(82)90041-4

75. Avise JC. Molecular population structure and the biogeographic history of a regional fauna: A case his-

tory with lessons for conservation biology. Oikos. 1992; 63: 62–76.

76. Mccommas SA. Biochemical genetics of the sea anemone Bunodosoma cavernata and the zoogeogra-

phy of the Gulf of Mexico. Mar Biol. 1982; 68: 169–173.

77. Lewis RR, Durako MJ, Moffler MD, Phillips RC. Seagrass meadows of Tampa Bay—a review. In: Treat

SF, Simon JL, Lewis RR, Whitman RL Jr., editors. Tampa Bay Area Scientific Information Symposium.

Minneapolis, Minnesota: Burgess Publishing Company; 1985. pp. 210–246.

78. Sherwood ET, Greening HS, Johansson JOR, Kaufman K, Raulerson GE. Tampa Bay (Florida, USA):

Documenting seagrass recovery since the 1980’s and reviewing the benefits. Southeast Geogr. 2017;

57: 294–319. https://doi.org/10.1353/sgo.2017.0026

Genetic diversity of the seagrass, Syringodium filiforme, in the subtropical Atlantic

PLOS ONE | https://doi.org/10.1371/journal.pone.0203644 September 5, 2018 17 / 18

79. Digiantonio GB. The genetic diversity of two contrasting seagrass species using microsatellite analysis.

M.Sc. Thesis, University of Virginia. 2017. Available from: https://libraetd.lib.virginia.edu/public_view/

nk322d45k.

80. McMahon K, van Dijk K-J, Ruiz-Montoya L, Kendrick GA, Krauss SL, Waycott M, et al. The movement

ecology of seagrasses. Proc R Soc B. 2014; 281: 20140878. https://doi.org/10.1098/rspb.2014.0878

PMID: 25297859

81. Yeung C, Lee TN. Larval transport and retention of the spiney lobster, Panulirus argus, in the coastal

zone of the Florida Keys, USA. Fish Oceanogr. 2002; 11: 286–309.

82. Lee TN, Williams E. Mean distribution and seasonal variability of coastal currents and temperature in

the Florida Keys with implications for larval recruitment. Bull Mar Sci. 1999; 64: 35–56.

83. Smith NP. Long-term Gulf-to-Atlantic transport through tidal channels in the Florida Keys. Bull Mar Sci.

1994; 54: 602–609.

84. Lee TN, Smith NP. Volume transport variability through the Florida Keys tidal channels. Cont Shelf Res.

2002; 22: 1361–1377. http://dx.doi.org/10.1016/S0278-4343(02)00003-1

85. Kendrick GA, Waycott M, Carruthers TJB, Cambridge ML, Hovey R, Krauss SL, et al. The central role

of dispersal in the maintenance and persistence of seagrass populations. Bioscience. 2012; 62: 56–65.

https://doi.org/10.1525/bio.2012.62.1.10

86. Kurokochi H, Matsuki Y, Nakajima Y, Fortes MD, Uy WH, Campos WL, et al. A baseline for the genetic

conservation of tropical seagrasses in the western North Pacific under the influence of the Kuroshio

Current: the case of Syringodium isoetifolium. Conserv Genet. 2015; https://doi.org/10.1007/s10592-

015-0764-7

87. van Dijk K- J, van Tussenbroek B, Jiménez-Durán K, Márquez-Guzmán G, Ouborg J. High levels of

gene flow and low population genetic structure related to high dispersal potential of a tropical marine

angiosperm. Mar Ecol Prog Ser. 2009; 390: 67–77. https://doi.org/10.3354/meps08190

88. Larkin PD, Maloney TJ, Rubiano-rincon S, Barrett MM. A map-based approach to assessing genetic

diversity, structure, and connectivity in the seagrass Halodule wrightii. Mar Ecol Prog Ser. 2017; 567:

95–107.

Genetic diversity of the seagrass, Syringodium filiforme, in the subtropical Atlantic

PLOS ONE | https://doi.org/10.1371/journal.pone.0203644 September 5, 2018 18 / 18

MEPopulation2014.pdf




ORIGINAL ARTICLE

The influence of admixture and consanguinity on population genetic diversity in Middle East

Xiong Yang1, Suzanne Al-Bustan2, Qidi Feng1, Wei Guo3, Zhiming Ma3, Makia Marafie4, Sindhu Jacob5, Fahd Al-Mulla5 and Shuhua Xu1

The Middle East (ME) is an important crossroad where modern humans migrated ‘out of Africa’ and spread into Europe and Asia.

After the initial peopling and long-term isolation leading to well-differentiated populations, the ME also had a crucial role in

subsequent human migrations among Africa, Europe and Asia; thus, recent population admixture has been common in the ME.

On the other hand, consanguinity, a well-known practice in the ME, often reduces genetic diversity and works in opposition to

admixture. Here, we explored the degree to which admixture and consanguinity jointly affected genetic diversity in ME

populations. Genome-wide single-nucleotide polymorphism data were generated in two representative ME populations (Arabian

and Iranian), with comparisons made with populations worldwide. Our results revealed an overall higher genetic diversity in both

ME populations relative to other non-African populations. We identified a much larger number of long runs of homozygosity in

ME populations than in any other populations, which was most likely attributed to high levels of consanguineous marriages that

significantly decreased both individual and population heterozygosity. Additionally, we were able to distinguish African, European

and Asian ancestries in ME populations and quantify the impact of admixture and consanguinity with statistical approaches.

Interestingly, genomic regions with significantly excessive ancestry from individual source populations are functionally enriched

in olfactory pathways, which were suspected to be under natural selection. Our findings suggest that genetic admixture,

consanguinity and natural selection have collectively shaped the genetic diversity of ME populations, which has important

implications in both evolutionary studies and medical practices.

Journal of Human Genetics advance online publication, 25 September 2014; doi:10.1038/jhg.2014.81

INTRODUCTION

Studies of both mitochondrial DNA (mtDNA) and Y-chromosome lineages indicate that modern humans, after migrating out of Africa tens of thousands of years ago, arrived in the Middle East (ME) and then dispersed into Europe and Asia.1,2 Over thousands of years, most human populations have been relatively isolated, have evolved independently and have developed the distinct genomic characteristics that can be observed today. However, Africa, Asia and Europe are geographically connected by the ME, which provides opportunities for population contact and thus population admixture, an effect that became more pronounced with the growth of trade and the establishment of the Silk Road. Previous studies have identified admixture events in ME populations when examining both uniparental markers and genome-wide single-nucleotide polymorphisms (SNPs).3–6 A previous study of Uyghurs and African Americans reported that admixture could increase the genetic diversity of the admixed populations.7 Moreover, ME populations generally have large family sizes, and marriages between relatives are very common;8 thus, consanguinity is highly prevalent in this region. A similar situation is encountered in Central Asia, South Asia and the Americas,9,10 especially in Islamic-influenced areas. Consanguinity is generally considered to decrease genetic diversity and to result in many recessive diseases such as neuromuscular disorders, metabolic disorders, osteopetrosis syndromes and chondrodystrophia.8,11 However, to our knowledge, few studies have examined the influences of admixture and consanguinity on population genetic diversity simultaneously. In the present study, we attempt to qualify and quantify the influence of these two forces in two representative ME populations residing in Kuwait, for which there is evidence that both populations experienced admixture and consanguinity.3–5,11 Ultimately, mathematical modeling was used to elucidate the degree to which admixture and consanguinity shaped the genetic diversity and structure of the two ME human populations.

1Chinese Academy of Sciences (CAS) Key Laboratory of Computational Biology, Max-Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology (PICB), Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China; 2Department of Biological Sciences, Kuwait University, Safat, Kuwait; 3Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; 4Kuwait Medical Genetic Center, Maternity Hospital, Sulaibikhat, Kuwait and 5Department of Pathology, Faculty of Medicine, Health Sciences Center, Kuwait University, Safat, Kuwait Correspondence: Professor S Xu, Chinese Academy of Sciences (CAS) Key Laboratory of Computational Biology, Max-Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology (PICB), Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai 200031, China. E-mail: [email protected] Received 8 July 2014; revised 25 August 2014; accepted 27 August 2014

Journal of Human Genetics (2014), 1–8 © 2014 The Japan Society of Human Genetics. All rights reserved 1434-5161/14 www.nature.com/jhg

MATERIALS AND METHODS

Samples and quality controls

DNA samples were collected from 42 Kuwaitis whose ancestry had been traced back at least four generations to the Arabian Peninsula (ARB) and 22 Kuwaitis whose ancestry had been traced back at least four generations to Persia (IRN) via pedigree analysis. These samples were genotyped with the Affymetrix Genome-Wide Human SNP Array 6.0 (Santa Clara, CA, USA) according to standard protocols. Additionally, the raw data of 200 unrelated samples from four populations in the International HapMap Project phase III12 were downloaded: 50 CHB (Han Chinese in Beijing, China), 50 JPT (Japanese in Tokyo, Japan), 50 YRI (Yoruba in Ibadan, Nigeria) and 50 CEU (Utah residents with Northern and Western European ancestries), all of which were also genotyped with the Affymetrix Genome-Wide Human SNP Array 6.0. All the raw array data were called with Affymetrix Power Tools (APT, Version 1.15.0) using the Affymetrix platform annotation file (Genome-Wide SNP 6 annotation na32, with genome references to UCSC hg19 or NCBI GRCh37). One sample from the ARB sample group was removed from subsequent analyses owing to a calling rate <85%. Data filtering was performed within each population: samples with a missing rate >5% per individual, SNPs with a missing rate >5% and SNPs failing the Hardy–Weinberg equilibrium test (P-value <1.0E−6) were excluded from the analysis. Data from the six populations were merged on the intersecting SNPs (721 989 autosomal SNPs), and SNPs with a minor allele frequency (MAF) <0.01 were excluded (11 362 autosomal SNPs). Finally, 40 ARB samples, 22 IRN samples, all 200 selected HapMap Phase III samples and 710 627 autosomal SNPs were used for subsequent analyses.
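The per-SNP filters described above can be illustrated with a minimal sketch. It assumes genotypes are coded as 0, 1 or 2 copies of one allele with `None` for a missing call; the function names and this encoding are hypothetical and are not taken from the study's pipeline. It also applies both thresholds to a single genotype list, whereas the study applied the missing-rate filter within each population and the MAF filter after merging.

```python
def missing_rate(genotypes):
    """Fraction of missing calls (None) at one SNP."""
    if not genotypes:
        raise ValueError("genotype list is empty")
    return sum(g is None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from genotypes coded 0/1/2; missing calls are ignored."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(freq, 1 - freq)


def passes_qc(genotypes, max_missing=0.05, min_maf=0.01):
    """Keep a SNP if its missing rate is at most 5% and its MAF is at least 0.01."""
    return (missing_rate(genotypes) <= max_missing
            and minor_allele_frequency(genotypes) >= min_maf)
```

The Hardy–Weinberg test is omitted here; in practice all three filters would normally be delegated to a tool such as PLINK.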

Population structure analysis

Markers with r² > 0.5, calculated in a 50-SNP sliding window shifted at 5-SNP intervals, were removed to reduce strong linkage disequilibrium (LD). Principal component analysis (PCA) was performed on the six populations with the thinned autosomal SNPs (334 705 markers) using smartpca in the EIGENSOFT package (version 4.2).13,14 The population structure was also inferred with ADMIXTURE (version 1.2.3),15 which implements a maximum-likelihood method to estimate individual ancestries. The analysis was performed with LD-pruned SNPs, with tenfold cross-validation (--cv=10), K from 2 to 6 and other parameters set to default.

Assessing genetic diversity

To reduce the effect that ascertainment bias in the genotyping data might have on measurements of genetic diversity, we merged the SNPs of the four reference populations (CEU, CHB, JPT and YRI) called from the next-generation sequencing database.16,17 Next, we calculated the MAF and divided the range MAF ≥ 0.05 into intervals of width 0.01; that is, [0.05, 0.06), [0.06, 0.07), …, and [0.49, 0.50]. We then randomly sampled SNPs from the genotyping data of the same four merged populations according to the proportion of sequencing-called SNPs in each interval, and repeated the sampling process 10 times. The expected heterozygosity for each SNP (HSe) was used to measure the SNP-based diversity of each population by the formula

$$H_{Se} = \frac{1}{n}\sum_{i=1}^{n}\left(1 - \sum_{j} p_{j}^{2}\right),$$

where $n$ denotes the number of SNPs in a sliding window and $p_j$ denotes the frequency of the $j$th allele at the $i$th SNP. We chose a 100 kb sliding window, with at least five SNPs per 100 kb. Individual-based diversity was measured by dividing the number of heterozygous SNPs by the total number of non-missing SNPs per individual. The data were then phased using Beagle (version 3.3.2)18,19 with default parameters, and the expected haplotype heterozygosity (HHe) was calculated for windows of 10, 20, 30, 40, 50, 100, 200 and 500 kb as previously described.7 All measurements were performed independently on the 10 sampling repeats.
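The two heterozygosity measures defined above are simple enough to sketch directly. The following is an illustration only, with hypothetical function names and an assumed input layout: one list of allele frequencies per SNP for the window-level measure, and 0/1/2 genotype codes with `None` for missing calls for the individual-level measure.

```python
def expected_heterozygosity(allele_freqs_per_snp):
    """Mean over the SNPs in a window of (1 - sum_j p_j^2)."""
    n = len(allele_freqs_per_snp)
    if n == 0:
        raise ValueError("window contains no SNPs")
    return sum(1 - sum(p * p for p in freqs)
               for freqs in allele_freqs_per_snp) / n


def individual_heterozygosity(genotypes):
    """Heterozygous calls divided by non-missing calls for one individual."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("individual has no called genotypes")
    return sum(g == 1 for g in called) / len(called)
```

For a biallelic SNP with frequencies p and 1 − p, the per-SNP term reduces to 2p(1 − p), which peaks at 0.5 when both alleles are equally common.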

Assessing consanguinity

If a child receives two copies of the same segment, one from the father and one from the mother, the result is a run of homozygosity (ROH).20 The probability that a child of a consanguineous marriage receives the same segment from both parents is significantly elevated; thus, ROH can be used to measure the level of consanguinity. Here, ROH was calculated with PLINK (version 1.07)21 using a sliding window of 500 kb and at least 50 SNPs, with one heterozygote and no more than five missing SNPs allowed per window. Accepted ROHs spanned at least 500 kb, with a minimum density of one SNP per 50 kb and a maximum distance of 100 kb between two adjacent SNPs. Two adjacent ROH segments were merged if the proportion of overlap was >0.05. ROH fragments were then clustered into three classes with Mclust (an R package) using the methods described by Pemberton et al.9
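To make the ROH criteria concrete, the sketch below scans one individual's genotypes for maximal homozygous runs. It is a deliberately simplified stand-in, not PLINK's sliding-window algorithm: it tolerates no heterozygotes or missing calls inside a run, ignores the SNP-density criterion and does not merge segments. The function name and the 0/1/2 genotype coding are assumptions; the default thresholds (500 kb span, 50 SNPs, 100 kb maximum gap) follow the text.

```python
def find_roh(positions, genotypes, min_length=500_000, min_snps=50, max_gap=100_000):
    """Return (start_bp, end_bp) for each maximal homozygous run that passes the thresholds.

    positions: sorted base-pair coordinates; genotypes: 0/1/2 codes (1 = heterozygous).
    """
    runs = []
    first = None  # index of the first SNP in the current run
    last = None   # index of the most recent SNP in the current run

    def close_run(start_idx, end_idx):
        if start_idx is None:
            return
        span = positions[end_idx] - positions[start_idx]
        if span >= min_length and end_idx - start_idx + 1 >= min_snps:
            runs.append((positions[start_idx], positions[end_idx]))

    for i, g in enumerate(genotypes):
        homozygous = g in (0, 2)
        gap_ok = last is None or positions[i] - positions[last] <= max_gap
        if homozygous and gap_ok:
            if first is None:
                first = i
            last = i
        else:
            close_run(first, last)
            # A homozygous SNP after a large gap starts a new run; anything else resets.
            first, last = (i, i) if homozygous else (None, None)
    close_run(first, last)
    return runs
```

With SNPs spaced every 10 kb, a single heterozygote at the 61st SNP splits the chromosome into a 590 kb run that is kept and a 380 kb run that is discarded as too short.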

Testing population admixture

To test whether ARB and IRN were admixed populations, we used the three-population test (F3 test), which can provide strong evidence of population admixture by modeling genetic drift paths.22 The F3 test has the general form F3 (C; A, B), in which C denotes the target population and A and B denote two reference populations; here, ARB and IRN were the target populations. We selected CEU, CHB and YRI as the reference populations (surrogate ancestral populations) because they are less admixed and high-density SNP data are available for them. Significantly negative F3 scores support population admixture, with gene flow occurring between the two reference populations.
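The intuition behind a negative F3 score can be shown with a small sketch. It uses the basic moment form of the statistic, the mean over SNPs of (c − a)(c − b) computed from allele frequencies; this formula is the standard definition rather than something spelled out in the text above, and the sketch omits the finite-sample bias correction and the block-jackknife significance test that a real analysis would need. The function name is hypothetical.

```python
def f3(target, ref_a, ref_b):
    """Naive F3(C; A, B): mean over SNPs of (c - a) * (c - b).

    Each argument is a list of allele frequencies for the same SNPs in the same order.
    """
    if not target or not (len(target) == len(ref_a) == len(ref_b)):
        raise ValueError("frequency lists must be non-empty and of equal length")
    return sum((c - a) * (c - b)
               for c, a, b in zip(target, ref_a, ref_b)) / len(target)
```

When the target's frequencies lie between those of the two references, as expected for a mixture of the two, each product is negative and so is the statistic.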

Inferring local ancestry

To determine the local ancestry of each SNP in each individual, we used ELAI, a two-layer hidden Markov model that models the LD within and among groups.23 YRI, CEU and CHB were set as surrogate ancestral reference populations, with 50 EM steps, 3 upper clusters and 30 lower clusters. Previous studies of admixture events in other ME populations reported that the events occurred about 100 generations ago;24,25 thus, this time estimate was used as a prior in our local ancestry inference.

Linear regression analysis

To determine how admixture and consanguinity jointly influenced genetic diversity, linear regression analysis was performed at both the SNP and individual levels as follows:

(1) For SNP diversity, ROH scores were defined as

$$X_{s1} = \frac{\text{number of times that SNP occurs in an ROH region}}{\text{number of individuals in the admixed population}}$$

and the admixture effect was defined as

$$X_{s2} = \sum_{i=1}^{3} \alpha_i \times 2 f_i (1 - f_i),$$

where $\alpha_i$ is the ancestral contribution to that SNP, $f_i$ is the MAF of that SNP and $i$ indexes the ancestral population. Then, the diversity of an SNP ($Y_s$) was modeled according to

$$Y_s = \beta_{s0} + \beta_{s1} X_{s1} + \beta_{s2} X_{s2} + \varepsilon_1, \quad \varepsilon_1 \sim N(0, \sigma_1^2).$$

(2) For the individual diversity, ROH scores were defined as

$$X_{i1} = \frac{\text{number of SNPs in that individual in ROH regions}}{\text{total number of SNPs in that individual}}$$

and the admixture effect was defined as

$$X_{i2} = \sum_{i=1}^{3} \alpha_i H_i,$$

where $\alpha_i$ is the ancestral contribution to that individual, $H_i$ is the mean individual diversity and $i$ indexes the ancestral population. Similarly, the diversity of an individual ($Y_i$) was modeled according to

$$Y_i = \beta_{i0} + \beta_{i1} X_{i1} + \beta_{i2} X_{i2} + \varepsilon_2, \quad \varepsilon_2 \sim N(0, \sigma_2^2).$$

The proposed null hypothesis (H0) for these models assumes the ROH score and admixture effect to have no impact on the observed SNP and individual diversities. Based on this hypothesis, linear regression analysis was performed separately on the two ME populations, ARB and IRN, for the 10 sampling repeats.
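Both regression models above have the same shape: an intercept plus two predictors, the ROH score and the admixture effect. As a minimal sketch of how such a model could be fitted, the function below solves the least-squares normal equations in pure Python. The function name is hypothetical, and the sketch returns coefficient estimates only; the standard errors and P-values needed to test H0 are left out.

```python
def fit_two_predictor_ols(x1, x2, y):
    """Least-squares estimates (b0, b1, b2) for y = b0 + b1*x1 + b2*x2 + error."""
    if not (len(x1) == len(x2) == len(y)) or len(y) < 3:
        raise ValueError("need at least three observations of equal-length inputs")
    rows = [(1.0, a, b) for a, b in zip(x1, x2)]
    # Normal equations (X^T X) beta = X^T y, stored as an augmented 3x4 matrix.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; the model is not identifiable")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))
```

Here `x1` would hold the ROH scores, `x2` the admixture effects and `y` the observed diversities, one entry per SNP or per individual. For real data a statistics package would be the more practical choice, since it also reports the uncertainty of each coefficient.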

Admixture and consanguinity in the Middle East X Yang et al

2

Journal of Human Genetics

RESULTS

Population structure of ME populations

PCA was performed at the individual level to investigate the population structure. A plot of the two most significant principal components (PCs) (Figure 1a) showed that individuals from Africa, Europe and Asia clustered tightly within their groups. PC1 clearly separated Africans from non-Africans, whereas PC2 separated Asians from Europeans. However, individuals from the two ME populations (ARB and IRN) clustered loosely, with ARB samples located along the edge between YRI and CEU, while the IRN samples shifted slightly towards the Eastern Asian populations (CHB and JPT) (Figure 1a). The long tails exhibited by the two ME populations in the PCA plot imply possible admixture events, or gene flow from other populations. In the ADMIXTURE analysis, the lowest cross-validation error was found at K = 3 (Supplementary Figure S1). These results show that the two ME populations carry mainly European (blue) and African (gray) ancestry, as well as a slight Eastern Asian (red) ancestry (Figure 1b), which is consistent with the PCA results and supports the occurrence of admixture events.

ME populations show higher genetic diversity than the other non-African populations

To compare the genetic diversity of the ME populations with that of the other populations, SNP-based, haplotype-based and individual-based heterozygosity were measured. All diversities were calculated from ascertainment bias-corrected SNP subsets, with independent sampling repeated 10 times. The mean SNP-based diversity (HSe) in the two ME populations was higher than that in the CEU, CHB and JPT populations, but slightly lower than that in the YRI population (Figure 2a), and the same pattern was noted for individual-based heterozygosity (Figure 2b). Remarkably, when SNPs in ROH regions were excluded for each individual to control for potential consanguinity, the two ME populations exhibited even higher individual heterozygosity than the other non-African populations and showed levels comparable to those of the African population (Figure 2b). For haplotype-based heterozygosity (HHe), similar patterns were observed regardless of window size (Figure 2c); HHe increased towards 1 as the window size increased, and a value of 1 was almost reached with a window size exceeding 500 kb. One possible interpretation of these results is that the two ME populations are admixed populations with ancestral contributions from African, European and Asian populations, and that the increase in genetic diversity due to admixture has been counteracted by the widespread practice of consanguineous marriage, which is consistent with previous findings.3,5
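To make the two simplest diversity measures concrete, the Python sketch below computes an expected heterozygosity at one SNP and an individual heterozygosity with and without SNPs that fall in ROH regions. It assumes that SNP-based diversity is the usual expected heterozygosity 2p(1 − p) and that individual heterozygosity is the fraction of heterozygous calls among non-missing SNPs; the exact estimators used in the study may differ, and the genotype coding and toy data are hypothetical.

```python
def snp_expected_heterozygosity(allele_freq):
    """Expected heterozygosity 2p(1 - p) at one biallelic SNP."""
    return 2.0 * allele_freq * (1.0 - allele_freq)


def individual_heterozygosity(genotypes, in_roh=None):
    """Fraction of heterozygous calls among non-missing SNPs.

    genotypes: list of 0/1/2 allele counts, with None for a missing call.
    in_roh: optional list of booleans; SNPs flagged True are excluded.
    """
    kept = [g for idx, g in enumerate(genotypes)
            if g is not None and not (in_roh and in_roh[idx])]
    if not kept:
        raise ValueError("no usable genotypes")
    return sum(1 for g in kept if g == 1) / len(kept)


# Toy individual: three consecutive homozygous calls sit inside an ROH.
genotypes = [0, 1, 2, 2, 2, 1, None, 0]
in_roh = [False, False, True, True, True, False, False, False]

with_roh = individual_heterozygosity(genotypes)             # 2 of 7 calls
without_roh = individual_heterozygosity(genotypes, in_roh)  # 2 of 4 calls
print(with_roh, without_roh)
```

Excluding the ROH segment raises the toy individual's heterozygosity, which mirrors the direction of the change reported when ROH regions were removed.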

ME populations show higher consanguinity

To compare consanguinity among populations, we measured it using ROH and grouped the ROH fragments into three classes (short, intermediate and long). Both the total number and the total length of ROHs per individual gradually increased with geographical distance from Africa for the short (Figure 3a) and intermediate (Figure 3b) ROH classes. These patterns were consistent with a previous study based on the HGDP data set.9 However, the two ME populations presented large variations in both the total number and total length of ROHs per individual (Figure 3c). When the total ROH length was plotted against the total ROH number per individual, a strong correlation was noted for both the short and intermediate classes, with the distance along the fitted line proportional to the geographical distance from Africa (Figures 3d and e). For the long ROH class, the two ME populations showed a greater total ROH number and a longer total ROH length per individual than the other four populations (Figure 3f), with the long ROH fragments most likely arising from recent background relatedness, that is, consanguinity.

Thus, it is plausible that consanguinity has reduced the genetic diversity of the two originally admixed ME populations, which now exhibit a lower genetic diversity than their surrogate ancestral YRI population. However, the two ME populations still showed higher genetic diversity than the other non-African populations. This may be explained by European and Asian populations having possibly experienced a bottleneck event since their divergence from the ME populations, and by the duration and strength of consanguinity having been insufficient to completely counteract the diversity introduced by admixture in the two ME populations.

Evidence of admixture in the ME populations

ADMIXTURE analysis revealed that the two ME populations had the highest genetic similarity to Europeans, followed by Africans and Asians. To formally test for admixture in these populations, we first calculated F3 (ARB or IRN; YRI, CEU) and observed significantly negative values for both ARB and IRN; we then calculated F3 (ARB or IRN; YRI, CHB or JPT), but none of the values was negative; finally, we calculated F3 (ARB or IRN; CEU, CHB or JPT), and only the value for IRN was significantly negative, regardless of whether the Asian reference population was CHB or JPT (Table 1). These results indicate that both populations are admixed. In summary, the ARB population received ancestral contributions from European and African populations, whereas the IRN population received ancestral contributions from European, African and Asian populations. These results were in accordance with the PCA and ADMIXTURE analyses. Moreover, some individuals showed excessive African ancestry (Figure 1b), suggesting recent gene flow from African populations, which was consistent with previous mtDNA studies.3 For the ARB population, negative values were not obtained in the F3 tests that used Asian reference populations, possibly because gene flow was too low to be detected. Furthermore, negative F3 values were not obtained for either of the ME populations when YRI was paired with CHB or JPT as the reference populations, which could be attributed to the admixture events having occurred mainly between European and African populations, with only low-level gene flow from Asian populations.

Figure 1 Population structure analysis. (a) Principal component analysis (PCA) with samples from the two Middle East (ME) populations, Arabian (ARB) and Iranian (IRN), and 200 samples of four reference populations (CEU, CHB, JPT and YRI) from the International HapMap Project III. (b) ADMIXTURE analysis with data pruned based on linkage disequilibrium (LD); the lowest cross-validation error was observed at K = 3.
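The significance criterion stated with Table 1 (Z-score below −1.64, one-tailed P-value of 0.05) can be checked with a short Python sketch. It assumes that the Z-score is simply the F3 score divided by its standard error; because the table values are rounded, the recomputed Z-scores differ slightly from the printed ones. The function names are hypothetical.

```python
import math


def f3_z_score(f3, std_err):
    """Z-score of an F3 estimate, taken as estimate / standard error."""
    return f3 / std_err


def one_tailed_p(z):
    """Lower-tail probability P(Z <= z) for a standard normal variable."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def supports_admixture(f3, std_err, z_threshold=-1.64):
    """True when F3 is significantly negative at the stated threshold."""
    return f3_z_score(f3, std_err) < z_threshold


# (A, B, C, F3-score, S.e.) for three rows copied from Table 1.
rows = [
    ("YRI", "CEU", "ARB", -0.01079, 0.000294),
    ("CEU", "CHB", "IRN", -0.00126, 0.00026),
    ("CEU", "CHB", "ARB", 0.008413, 0.000285),
]
for a, b, c, f3, se in rows:
    z = f3_z_score(f3, se)
    print(f"F3({c}; {a}, {b}): Z = {z:.2f}, admixed = {supports_admixture(f3, se)}")
```

Only the first two rows pass the test, matching the conclusion that ARB shows an African and European signal while the Asian signal is detected for IRN only.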

The direction and magnitude of influences of admixture and consanguinity on genetic diversity

To investigate the direction and magnitude of influences that admixture and consanguinity had on genetic diversity, a linear model was proposed, with the ROH score and admixture effect fitted to the observed diversity. Regression analysis was performed on the two ME populations separately with the 10 independent samplings to investigate relationships at both the SNP and individual levels. At the SNP level, the results for both ME populations were highly concordant among the 10 independent samplings, with the intercept (βs0), ROH score (Xs1) coefficient (βs1) and admixture effect (Xs2) coefficient (βs2) all statistically significant (Supplementary Table S1). Owing to the level of consistency among the 10 independent samplings, the regression model parameters βs0, βs1 and βs2 were averaged to generate the final regression models for the SNP diversity:

Ys(ARB) = 0.06722 − 0.05680 * Xs1 + 0.80069 * Xs2 (mean adjusted R² = 0.66885)
Ys(IRN) = 0.05560 − 0.03963 * Xs1 + 0.82681 * Xs2 (mean adjusted R² = 0.70289)

Similar results were obtained for the individual diversity. Both the regression models for ARB and IRN showed statistical significance and the 10 independent samplings were highly concordant (Supplementary Table S2). Again, the regression model parameters βi0, βi1 and βi2 were averaged to obtain the final individual diversity models:

Yi(ARB) = −1.05647 − 0.36337 * Xi1 + 4.71255 * Xi2 (mean adjusted R² = 0.90692)
Yi(IRN) = −1.76090 − 0.28837 * Xi1 + 7.16242 * Xi2 (mean adjusted R² = 0.97046)

The positive coefficients for the admixture effect confirmed an increased genetic diversity owing to admixture, which was consistent with the previous study,7 whereas the negative ROH score coefficients confirmed a decrease in genetic diversity owing to consanguinity. Overall, linear modeling enabled the quantification of the influences of admixture and consanguinity on the genetic diversity in the two ME populations.
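As a worked example of how the averaged SNP-level models are read, the Python sketch below evaluates them for a given ROH score and admixture effect. The coefficients are the ones reported above; the input values and the function name are hypothetical and chosen only for illustration.

```python
# Averaged SNP-level coefficients (intercept, ROH score, admixture effect)
# as reported in the text for the two ME populations.
SNP_MODELS = {
    "ARB": (0.06722, -0.05680, 0.80069),
    "IRN": (0.05560, -0.03963, 0.82681),
}


def predict_snp_diversity(population, roh_score, admixture_effect):
    """Evaluate Ys = b0 + b1 * Xs1 + b2 * Xs2 for one population."""
    b0, b1, b2 = SNP_MODELS[population]
    return b0 + b1 * roh_score + b2 * admixture_effect


# Hypothetical SNP: found in an ROH in 10% of individuals, admixture effect 0.4.
baseline = predict_snp_diversity("ARB", 0.1, 0.4)
more_roh = predict_snp_diversity("ARB", 0.3, 0.4)
more_admixture = predict_snp_diversity("ARB", 0.1, 0.45)
print(baseline, more_roh, more_admixture)
```

Raising the ROH score lowers the predicted diversity and raising the admixture effect increases it, which is the opposing behavior described in the text.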

Genome-wide distribution of local ancestry in ME populations

The local ancestry at each SNP for each individual was estimated by ELAI. The local ancestry contributions from different ancestries were not uniformly distributed across the genome, with some genomic regions showing an excessive ancestry contribution from a given parental population (Figure 4). For the ARB and IRN populations, the loci showing excessive or scarce ancestry contributions beyond the 0.5% quantile were collected, and the collected ARB and IRN SNPs were found to overlap substantially (Supplementary Figure S2). These overlaps could reflect adaptation of the two populations to the same local environment. Functional annotation of the overlapping SNPs was performed using the DAVID database26,27 and showed that all of the top 10 categories were related to the olfactory perception pathway (Benjamini-corrected P-value < 6.50 × 10^−15; false discovery rate P-value < 6.60 × 10^−14) (Table 2). The genes enriched among the top 10 functional categories were mostly from olfactory receptor families (2, 4, 5, 8, 9, 10, 11 and 12), with the exception of GABBR1 and MAS1L (Supplementary Table S3). GABBR1 encodes a subunit of the γ-aminobutyric acid (GABA) type B receptor, GABA being the main inhibitory neurotransmitter in the mammalian central nervous system, whereas MAS1L encodes a G-protein-coupled receptor and is associated with the G-protein-coupled receptor protein signaling pathway. It is likely that these two genes are also indirectly associated with the olfactory perception pathways.

Figure 2 Single-nucleotide polymorphism (SNP) level, individual level and haplotype level of genetic diversity obtained from 10 independent random samplings. (a) Mean SNP heterozygosity of a 100 kb sliding window. (b) Mean individual heterozygosity calculated from non-missing SNPs with and without runs of homozygosity (ROH) regions considered. (c) Mean haplotype heterozygosity, calculated in sliding windows of 10, 20, 30, 40, 50, 100, 200 and 500 kb.
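The selection of loci with extreme ancestry contributions can be sketched as a quantile filter followed by a set intersection. The Python example below is a minimal illustration, assuming one mean ancestry value per SNP and a simple linear-interpolation quantile; the study's exact quantile definition is not given, and the toy data and function names are hypothetical.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def outlier_loci(mean_ancestry, tail=0.005):
    """Indices of SNPs whose mean ancestry lies beyond either tail quantile."""
    ordered = sorted(mean_ancestry)
    low = quantile(ordered, tail)
    high = quantile(ordered, 1.0 - tail)
    return {idx for idx, value in enumerate(mean_ancestry)
            if value < low or value > high}


# Toy mean ancestry contributions for 1000 SNPs in two populations;
# the second population is a shifted copy of the first.
arb_ancestry = [i / 1000 for i in range(1000)]
irn_ancestry = [((i + 3) % 1000) / 1000 for i in range(1000)]

arb_outliers = outlier_loci(arb_ancestry)
irn_outliers = outlier_loci(irn_ancestry)
shared = arb_outliers & irn_outliers  # loci that are extreme in both
print(sorted(shared))
```

The shared set plays the role of the overlapping SNPs that were passed on to functional annotation.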

DISCUSSION

In this study, we attempted to explore the combined effect of genetic admixture and consanguinity on human genetic diversity. We analyzed genome-wide SNP data of two ME populations, ARB and IRN, and our results showed that the genetic diversity of the two ME populations was higher than that of the other non-African populations, which was consistent with an admixture scenario. At the same time, long ROH fragments were identified in a vast number of genomic regions in the two ME populations, which was consistent with the expected consequence of consanguineous marriage. These results suggest that the demographic history of the two ME populations is very complex.

Considering the geographical location of the ME, the observed higher genetic diversity in the two ME populations could simply be explained by a scenario in which these populations were surrogate ancestral populations of the European and Asian populations. However, signatures of population admixture were also very pronounced based on PCA, ADMIXTURE analysis and F3 testing. Therefore, a more likely, yet more complex, scenario is that the two ME populations are admixed populations with gene flow contributions from European, Asian and African populations. These admixture events increased the genetic diversity of the ME populations to levels comparable to or higher than those of African populations, and this diversity then gradually decreased because of the prevalent cultural practice of consanguinity. Our results support the second scenario and are consistent with previous findings.3–5 More broadly, the genetic architectures of modern ME populations could result from ancient migration and subsequent gene flow (or admixture) between well-differentiated populations, entangled with recent consanguineous marriages. Social and historical documentation in conjunction with other genetic findings all support this interpretation.8,28

Figure 3 Runs of homozygous fragments (ROHs). (a–c) Total length (top) and total number (bottom) of short (a), intermediate (b) and long (c) ROHs per individual, respectively. (d–f) Scatterplot of total length against the total number of short (d), intermediate (e) and long (f) ROHs per individual, respectively. Legends in (e) and (f) are the same as those in (d).

A challenge when analyzing genetic diversity based on genotyping data is potential ascertainment bias. The availability of public sequencing data for worldwide populations (e.g., CEU, CHB, JPT and YRI) made it possible to correct for this bias by referencing the MAF spectrum of the sequence data (Supplementary Figure S3). The bias was corrected by randomly sampling a subset of SNPs from the genotyping data according to the distribution of MAFs in the sequencing data, and this procedure was repeated 10 times. Interestingly, even after correction, the CEU diversity was still slightly higher than that of CHB and JPT. One possible cause of this difference between European and Asian populations is a difference in the strength and duration of a population bottleneck. To examine this, the individual diversities of the four reference populations were calculated from sequencing data, and the same pattern was observed as when randomly sampling genotyping data (Supplementary Figure S4). Therefore, it seemed that the higher genetic diversity in the CEU population was intrinsic, suggesting that recent gene flow into Europeans could be an important factor, with genetic contributions from other sources, such as African populations or possibly even some archaic humans,29,30 having significantly influenced the European gene pool.

Admixture has been a common phenomenon throughout the history of modern humans, with previously isolated populations often coming into contact through colonization and migration. It is especially common in the ME, which has been a melting pot of cultures, languages and people. Both prehistoric and recent genetic admixture have greatly influenced the genetic makeup of regional ME populations. On the other hand, consanguineous marriage is prevalent in many ME countries, which is expected to decrease ME genetic diversity. In a retrospective study based on modern human genomic data, it is difficult to fully distinguish the influences of admixture and consanguinity on genetic diversity, as each has a confounding effect on the other. However, we were able to confirm the theoretical predictions and roughly estimate the magnitudes of the influence of admixture and consanguinity using the statistical approaches of this study. Our analyses revealed that the current genetic architectures of the two ME populations were shaped by the joint effect of these two forces, which arose for historical, cultural and potentially also religious reasons.

Additionally, we explored the possible influence of a third type of force on regional genetic diversity: natural selection. Our approach to searching for footprints of natural selection in both ME populations was based on admixture analysis, seeking to identify genomic regions in which local ancestry deviated significantly from the mean genome-wide distribution. While this approach could only detect natural selection signatures that arose after population admixture, it is notable that the top candidate genes under natural selection in the two ME populations were associated with olfactory pathways. While we could not provide a convincing interpretation for these selection signatures, the noted statistical signals could not be explained by a stochastic process. This suggests the presence of environmental pressures on these genes in the history of the two ME populations. Taken together, genetic admixture, consanguinity and natural selection have jointly shaped the genetic diversity of the two ME populations, with admixture and consanguinity having opposing effects on diversity, while natural selection exhibits a more regional effect relative to the genome-wide influences of the former two factors.

Table 1 F3 test results

A    B    C    F3-score   S.e.      Z-score
YRI  CEU  ARB  −0.01079   0.000294  −36.699
YRI  CEU  IRN  −0.0086    0.000293  −29.382
YRI  CHB  ARB  0.017876   0.000618  28.941
YRI  CHB  IRN  0.019322   0.000653  29.61
YRI  JPT  ARB  0.018061   0.000619  29.193
YRI  JPT  IRN  0.019288   0.000651  29.635
CEU  CHB  ARB  0.008413   0.000285  29.486
CEU  CHB  IRN  −0.00126   0.00026   −4.854
CEU  JPT  ARB  0.008449   0.000289  29.265
CEU  JPT  IRN  −0.00145   0.000264  −5.491

Abbreviations: ARB, Arabian; CEU, Utah residents with Northern and Western European ancestries; CHB, Han Chinese in Beijing, China; JPT, Japanese in Tokyo, Japan; IRN, Iranian; YRI, Yoruba in Ibadan, Nigeria. A and B denote the two proxy parental populations and C is the target population tested. A significantly negative value in the F3 test indicates population admixture in target population C. A Z-score < −1.64 (corresponding to a P-value of 0.05 in a one-tailed test) indicates statistical significance.

Figure 4 Mean ancestry contributions. (a) Mean ancestry contributions for each single-nucleotide polymorphism (SNP) in Arabian (ARB). Top: mean European ancestry contribution to ARB; bottom: mean Asian ancestry contribution to ARB. (b) Mean ancestry contribution for each SNP in Iranian (IRN). Top: mean European ancestry contribution to IRN; bottom: mean Asian ancestry contribution to IRN. Black solid line denotes average mean ancestry contribution across the genome; blue solid line denotes the 99.5% quantile; and red solid line denotes the 0.5% quantile.
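The ascertainment-bias correction described in the Discussion, sampling array SNPs so that their MAF distribution follows the spectrum seen in sequencing data, can be sketched as stratified random sampling. The Python example below is only an illustration under stated assumptions: the equal-width MAF bins, the target proportions, the subset size and all names are hypothetical, since the paper does not report these details.

```python
import random


def maf_bin(maf, n_bins):
    """Index of the equal-width bin on [0, 0.5] that contains this MAF."""
    return min(int(maf / 0.5 * n_bins), n_bins - 1)


def sample_matching_spectrum(array_mafs, target_proportions, n_snps, seed=0):
    """Pick SNP indices so that per-bin counts follow target_proportions."""
    rng = random.Random(seed)
    n_bins = len(target_proportions)
    bins = [[] for _ in range(n_bins)]
    for idx, maf in enumerate(array_mafs):
        bins[maf_bin(maf, n_bins)].append(idx)
    chosen = []
    for members, proportion in zip(bins, target_proportions):
        wanted = round(proportion * n_snps)
        if wanted > len(members):
            raise ValueError("not enough array SNPs in a MAF bin")
        chosen.extend(rng.sample(members, wanted))
    return sorted(chosen)


# Toy array with a flat MAF spectrum: 20 SNPs in each of five bins.
array_mafs = [0.05 + 0.1 * (i % 5) for i in range(100)]
# Hypothetical sequencing-based spectrum, enriched for rarer variants.
target = [0.40, 0.25, 0.15, 0.10, 0.10]

subset = sample_matching_spectrum(array_mafs, target, n_snps=40, seed=0)
# Ten independent subsets, mirroring the 10 sampling repeats in the text.
repeats = [sample_matching_spectrum(array_mafs, target, 40, seed=s)
           for s in range(10)]
print(len(subset), len(repeats))
```

Each repeat uses a different random seed, so downstream diversity estimates can be compared across the subsets to judge how stable they are.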

CONFLICT OF INTEREST

The authors declare no conflict of interest.

ACKNOWLEDGEMENTS

These studies were supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (CAS) (XDB13040100) and by the National Science Foundation of China (NSFC) grants (91331204; 31171218). This research was supported in part by the Ministry of Science and Technology (MoST) International Cooperation Base of China and by National Center for Mathematics and Interdisciplinary Sciences (NCMIS), Academy of Mathematics and Systems Science, CAS. SX is Max-Planck Independent Research Group Leader and member of CAS Youth Innovation Promotion Association. SX also gratefully acknowledges the support of the National Program for Top-notch Young Innovative Talents of The ‘Ten-Thousand-Talents’ Project and the support of KC Wong Education Foundation, Hong Kong. Fahd Al-Mulla was supported by the Kuwait Foundation for Advancement of Sciences (No. 2011-1302-06). WG was supported by the Fundamental Research Funds for the Central Universities (2011JBZ019). We thank LetPub (http://www.letpub.com) for its linguistic assistance during the preparation of this manuscript. All funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author contributions: SX conceived and designed the study; FA-M, SA-B and MM collected and genotyped the samples; XY, QF and WG analyzed the data; XY and SX wrote the paper.

1 Oppenheimer, S. Out-of-Africa, the peopling of continents and islands: tracing uniparental gene trees across the map. Philos. Trans. R. Soc. Lond. Ser. B 367, 770–784 (2012).

2 Wilder, J. A., Kingan, S. B., Mobasher, Z., Pilkington, M. M. & Hammer, M. F. Global patterns of human mitochondrial DNA and Y-chromosome structure are not influenced by higher migration rates of females versus males. Nat. Genet. 36, 1122–1125 (2004).

3 Theyab, J. B., Al-Bustan, S. & Crawford, M. H. The genetic structure of the Kuwaiti population: mtDNA inter- and intra-population variation. Hum. Biol. 84, 379–403 (2012).

4 Triki-Fendri, S., Alfadhli, S., Ayadi, I., Kharrat, N., Ayadi, H. & Rebai, A. Genetic structure of Kuwaiti population revealed by Y-STR diversity. Ann. Hum. Biol. 37, 827–835 (2010).

5 Alsmadi, O., Thareja, G., Alkayal, F., Rajagopalan, R., John, S. E., Hebbar, P. et al. Genetic substructure of Kuwaiti population reveals migration history. PLoS ONE 8, e74913 (2013).

6 Hellenthal, G., Busby, G. B., Band, G., Wilson, J. F., Capelli, C., Falush, D. et al. A genetic atlas of human admixture history. Science 343, 747–751 (2014).

7 Xu, S., Jin, W. & Jin, L. Haplotype-sharing analysis showing Uyghurs are unlikely genetic donors. Mol. Biol. Evol. 26, 2197–2206 (2009).

8 Teebi, A. S. & Teebi, S. A. Genetic diversity among the Arabs. Community Genet. 8, 21–26 (2005).

9 Pemberton, T. J., Absher, D., Feldman, M. W., Myers, R. M., Rosenberg, N. A. & Li, J. Z. Genomic patterns of homozygosity in worldwide human populations. Am. J. Hum. Genet. 91, 275–292 (2012).

Table 2 Functional annotation of the overlapped SNPs beyond 0.5% quantile in European or Asian ancestries within ARB and IRN populations

Category         Term                                     P-value   Benjamini  FDR
GOTERM_BP_FAT    Sensory perception of smell              1.50E−29  8.50E−27   2.10E−26
SP_PIR_KEYWORDS  Olfaction                                3.50E−29  6.10E−27   4.30E−26
GOTERM_MF_FAT    Olfactory receptor activity              4.70E−28  8.30E−26   5.80E−25
GOTERM_BP_FAT    Sensory perception of chemical stimulus  9.70E−28  2.80E−25   1.40E−24
SP_PIR_KEYWORDS  Sensory transduction                     5.20E−23  4.50E−21   6.30E−20
KEGG_PATHWAY     Olfactory transduction                   1.60E−22  8.20E−21   1.60E−19
SP_PIR_KEYWORDS  G-protein-coupled receptor               1.20E−20  7.00E−19   1.50E−17
SP_PIR_KEYWORDS  Transducer                               1.60E−19  7.00E−18   2.00E−16
GOTERM_BP_FAT    Sensory perception                       6.80E−19  1.30E−16   1.00E−15
GOTERM_BP_FAT    Cognition                                4.50E−17  6.50E−15   6.60E−14

Abbreviations: ARB, Arabian; FDR, false discovery rate; IRN, Iranian; SNP, single-nucleotide polymorphism.


10 Leutenegger, A. L., Sahbatou, M., Gazal, S., Cann, H. & Genin, E. Consanguinity around the world: what do the genomic data of the HGDP–CEPH diversity panel tell us? Eur. J. Hum. Genet. 19, 583–587 (2011).

11 Al-Kandari, Y. Y. & Crews, D. E. The effect of consanguinity on congenital disabilities in the Kuwaiti population. J. Biosoc. Sci. 43, 65–73 (2011).

12 International HapMap 3 Consortium, Altshuler, D. M., Gibbs, R. A., Peltonen, L. et al. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).

13 Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).

14 Price, A. L., Patterson, N. J., Plenge, R. M., Weinblatt, M. E., Shadick, N. A. & Reich, D. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).

15 Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).

16 1000 Genomes Project Consortium, Abecasis, G. R., Auton, A., Brooks, L. D., DePristo, M. A., Durbin, R. M. et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).

17 1000 Genomes Project Consortium, Abecasis, G. R., Altshuler, D., Auton, A., Brooks, L. D., Durbin, R. M. et al. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).

18 Browning, B. L. & Browning, S. R. A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am. J. Hum. Genet. 84, 210–223 (2009).

19 Browning, S. R. & Browning, B. L. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81, 1084–1097 (2007).

20 Kirin, M., McQuillan, R., Franklin, C. S., Campbell, H., McKeigue, P. M. & Wilson, J. F. Genomic runs of homozygosity record population history and consanguinity. PLoS ONE 5, e13996 (2010).

21 Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M. A., Bender, D. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).

22 Patterson, N., Moorjani, P., Luo, Y., Mallick, S., Rohland, N., Zhan, Y. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012).

23 Guan, Y. Detecting structure of haplotypes and local ancestry. Genetics 196, 625–642 (2014).

24 Jin, W., Wang, S., Wang, H., Jin, L. & Xu, S. Exploring population admixture dynamics via empirical and simulated genome-wide distribution of ancestral chromosomal segments. Am. J. Hum. Genet. 91, 849–862 (2012).

25 Price, A. L., Tandon, A., Patterson, N., Barnes, K. C., Rafaels, N., Ruczinski, I. et al. Sensitive detection of chromosomal segments of distinct ancestry in admixed populations. PLoS Genet. 5, e1000519 (2009).

26 Huang, D. W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57 (2009).

27 Huang, D. W., Sherman, B. T. & Lempicki, R. A. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37, 1–13 (2009).

28 Al-Bustan, M., Majeed, S., Bitar, M. S. & Al-Asousi, A. Socio-demographic features and knowledge of diabetes mellitus among diabetic patients in Kuwait. Int. Q. Community Health Educ. 17, 65–76 (1997).

29 Der Sarkissian, C., Balanovsky, O., Brandt, G., Khartanovich, V., Buzhilova, A., Koshel, S. et al. Ancient DNA reveals prehistoric gene-flow from Siberia in the complex human population history of North East Europe. PLoS Genet. 9, e1003296 (2013).

30 Botigue, L. R., Henn, B. M., Gravel, S., Maples, B. K., Gignoux, C. R., Corona, E. et al. Gene flow from North Africa contributes to differential human genetic diversity in southern Europe. Proc. Natl Acad. Sci. USA 110, 11791–11796 (2013).

Supplementary Information accompanies the paper on Journal of Human Genetics website (http://www.nature.com/jhg)


nihms788970.pdf

Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery

Eric M. Scott1,2,3, Anason Halees4,5,6, Yuval Itan1,7, Emily G. Spencer1,2,3, Yupeng He1,2,3, Mostafa Abdellateef Azab1,2,3, Stacey B. Gabriel8, Aziz Belkadi9,10, Bertrand Boisson8,9,10, Laurent Abel6,9,10, Andrew G. Clark11, Greater Middle East Variome Consortium1,2,3, Fowzan S. Alkuraya12,13, Jean-Laurent Casanova1,7,9,10,14, and Joseph G. Gleeson1,2,3

1Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA

2Department of Neurosciences, University of California, San Diego, La Jolla, CA 92093, USA

3Laboratory for Pediatric Brain Disease, The Rockefeller University, New York, NY 10065, USA

4Department of Biostatistics, King Faisal Specialist Hospital & Research Center, Riyadh, 11211, Saudi Arabia

5Department of Epidemiology, King Faisal Specialist Hospital & Research Center, Riyadh, 11211, Saudi Arabia

6Scientific Computing, King Faisal Specialist Hospital & Research Center, Riyadh, 11211, Saudi Arabia

7St. Giles Laboratory of Human Genetics of Infectious Diseases, The Rockefeller University, New York, NY, 10065, USA

8The Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA

Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms

Correspondence: [email protected]. #Full list of Consortium contributors provided in Acknowledgements

URLs
ANNOVAR, http://annovar.openbioinformatics.org
Kinship-based INference for Gwas (KING), http://people.virginia.edu/~wc9c/KING/
Plink, http://pngu.mgh.harvard.edu/~purcell/plink/
PolyPhen-2, http://genetics.bwh.harvard.edu/pph2/
PSEQ, http://atgu.mgh.harvard.edu/plinkseq/pseq.shtml
SnpEff, http://snpeff.sourceforge.net/SnpEff_manual.html
UCSC Genome Browser, http://genome.ucsc.edu
1,000 Genomes Browser, http://browser.1000genomes.org
Consang.net, http://consang.net/index.php/Global_prevalence
Denisovan to Human alignment (FTP), http://www.eva.mpg.de/denisova
Neanderthal to Human alignment (FTP), http://cdna.eva.mpg.de/neandertal
GME Variome, http://gme.igm.ucsd.edu

Author contributions: E.M.S. performed analysis and generated all figures. A.H., Y.I., Y.H. and M.A.A. consulted on analysis. E.G.S., A.B., B.B., A.A., F.S.A., J.-L.C. and J.G.G. contributed subjects and jointly wrote and edited the manuscript. S.B.G. oversaw sequencing. A.G.C. consulted on population studies. GME Consortium identified subjects for study.

Competing financial interests: The authors declare no competing financial interests.

Published as: Nat Genet. 2016 September ; 48(9): 1071–1076.


9Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM U1163, Necker Hospital for Sick Children, INSERM, Paris, France, EU

10Paris Descartes University, Imagine Institute, Paris, France, EU

11Department of Molecular Biology and Genetics, Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA

12Department of Genetics, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia

13Department of Anatomy and Cell Biology, College of Medicine, Alfaisal University, Riyadh, Saudi Arabia

14Pediatric Hematology-Immunology Unit, Necker Hospital for Sick Children, Paris, France, EU

Abstract

The Greater Middle East (GME) has been a central hub of human migration and population

admixture. The tradition of consanguinity, variably practiced in the Gulf region, North Africa, and

Central Asia 1–3, has resulted in an elevated burden of recessive disease4. Here we generated a

whole exome GME variome from 1,111 unrelated subjects. We detected substantial diversity from

sub-geographies, continental and subregional admixture, and several ancient founder populations with

little evidence of bottlenecks. Measured consanguinity was an order-of-magnitude above that of

other sampled populations, and included an increased burden of runs of homozygosity (ROH), but

no evidence for reduced burden of deleterious variation due to classically theorized ‘genetic

purging’. Applying this database to unsolved GME recessive conditions reduced the number of

potential disease-causing variants by 4–7-fold. These results reveal the variegated GME genetic

architecture and support future human genetic discoveries in Mendelian and population genetics.

Keywords

Mutational load; whole exome sequencing; introgression; admixture; inbreeding coefficient; homozygous; derived allele frequency; consanguineous; selective pressure; runs of homozygosity

The Greater Middle East (GME), loosely defined as a large swath of Arab and non-Arab

Muslim countries from Morocco in the west to as far east as Pakistan 5, is home to

approximately 10% of the world’s population. Despite its invaluable contribution to our

understanding of the genetic causes of inherited conditions, especially recessive conditions,

and its critical role as a crossroads of early civilizations, its genetic architecture and extent of

rare genetic variation remain poorly defined 6–8.

To address this shortcoming, the GME Variome Consortium collected whole-exome data on

1,794 self-reported nationals from GME regions participating in on-going genetics studies.

In order to minimize selection bias or overrepresentation of disease alleles, we selected

primarily healthy individuals from families, and wherever possible, removed from datasets

the allele that brought the family to medical attention. Samples were jointly processed, and

filtered for quality and familial relation, leaving 1,111 high-quality unrelated individuals.

Scott et al. Page 2

Nat Genet. Author manuscript; available in PMC 2017 March 01.

We grouped the 1,111 GME exomes into six different GME subregions: Northwest Africa

(NWA, 85 samples), Northeast Africa (NEA, 423 samples), Turkish Peninsula (TP, 140

samples), Syrian Desert (SD, 81 samples), Arabian Peninsula (AP, 214 samples), and Persia

and Pakistan (PP, 168 samples) (Fig. S1, Table S1), which represent historic groupings, then

compared with exomic data of nine established continental populations from 1000 Genomes

(1000G) 9. Unbiased identity-by-state clustering showed that samples largely grouped

according to the location of ascertainment, validating grouping criteria (Fig. S2).

To evaluate GME genetic substructure, we ran the unsupervised algorithm ADMIXTURE 10,

where K=6 clusters minimized cross-validation error (Fig. S3). We found some overlap with

the primary admixture components from Africa, Europe and East Asia at the edges of

geography, but also a large proportion not found in previous reference samples (Figs. 1a,

S4). The admixture results also aligned with publications reporting common variation 11–13.

The least admixed samples were found in NWA, AP, and PP, suggesting these were founder

populations, but showed inter-regional variation of GME-specific components suggesting

local admixture (Fig. 1b), and potentially supporting historic events. The NWA component

was found from west to east across North Africa, likely representing the presence of Berber

genetic background 14. The AP component likely represented ancestral Arab populations

and was observed in nearly all regions, possibly a result of the Arab conquests of the 7th

century coincident with the expansion of the Arabic language now spoken over much of the

region. Similarly, the Persian expansion into TP, SD, and parts of NEA in the 5th century

was the most likely contributor of PP signal.

Additional sources of human heterogeneity derive from ancient introgression. We found

similar patterns of Neanderthal introgression across all GME populations with the exception

of NWA, which clustered closer to Sub-Saharan Africans (Fig. S5) 15–17. These data

support the reduced Neanderthal introgression observed in native African populations.

Patterns of human migration and drift were recapitulated using TreeMix among GME

subregions, based upon 1000G control populations (Fig. 1c) 18. The inferred tree with no

migration showed tight clusters of European and Asian populations, but much larger

apparent divergence among GME regions. The ordering of GME subregions from the root

corroborated much of the ‘out-of-Africa’ ordering of subsequent founder populations 13.

Within the GME, the distance from the root emulated the west-to-east organization of GME

samples, with PP showing the largest inferred drift parameter, supporting a west-to-east

trajectory of human migrations.

Assessment of Wright’s fixation index (Fst) demonstrated that the GME grouped with

European populations, agreeing with TreeMix results. This resulted in three distinct clusters

with a low degree of differentiation (Figs. 1d, S6). PP and NWA represented the extremes of

the identified subregions, and showed the highest degree of differentiation (Fst = 0.026) (>2x

compared to the distance between Finnish (FIN) and Toscani (TSI) but smaller than

intercontinental comparisons). Of the four measured 1000G European populations, GME Fst measurements were closest to TSI, especially SD and TP, consistent with higher levels of

European admixture in these populations. Despite the contribution of admixture, these

values suggested extended periods of isolation relative to 1000G populations within each

subregion.

Inter-subregion relationships were tested using principal component analysis (PCA). As

expected, the first two PCs separated along well-established geographic axes: PC1 separated

Sub-Saharan Africans from all other populations, and PC2 separated Eurasian populations

(Fig. S7). GME sub-regions fell between the 1000G African, East Asian, and European,

supporting recent admixture. PP and TP were closer to East Asian, while NEA, NWA, and

SD were closer to Sub-Saharan Africans. PC3 and PC4 separated samples along

topographical north-south and east-west gradients, while exhibiting largely distinct but

overlapping groups with a high-degree of inter-region diversity (Fig. 2a).

To test if these populations were subject to bottlenecks, we calculated the mean linkage

disequilibrium (LD) decay, as haplotypes should decay more slowly as a function of size

after a stronger bottleneck (Fig. 2b). LD for each GME population decayed faster than

European and East Asian but slower than African populations. LD decayed faster in NWA

and NEA compared with other GME regions, in agreement with our TreeMix results.

Diverse patterns of admixture across these regions suggested these trends were not

predominantly due to intermixing, but instead argued for a historic common ancient

bottleneck.

Between 20% and 50% of all GME marriages are consanguineous (compared with < 0.2% in the

Americas and Western Europe) 1–3, with the majority being between first cousins. This roughly 100X

higher rate of consanguinity has correlated with roughly a doubling of the rate of recessive

Mendelian disease 19,20. European, African and East Asian 1000G populations all had

distributions of estimated inbreeding coefficients (F) ~0.005, whereas GME F values ranged

from 0.059 to 0.098, but with high variance within each population (Fig. 2c). Thus,

measured F was ~10–20X higher, reflecting the shared blocks common to all human

populations. F values were dominated by immediate family structure rather than historic or

population-wide data trends (Fig. S8) 21. Examining the larger set of 1,794 exomes that

included many parent-child trios also showed an overwhelming influence of immediate

family structure, in which offspring from first-cousin marriages displayed higher F values

compared with non-consanguineous marriages (Fig. 2d).

We expected that higher F values would correlate with an increased burden and length of

‘runs of homozygosity’ (ROH), defined as homozygous haplotypes as a function of

length 22. 1000G sub-Saharan Africa displayed the smallest total ROH as expected 23,

whereas the two other 1000G assessed populations were relatively similar to GME (Figs. 3a,

S9), probably reflecting similar lengths of short (<0.515 Mb) and medium (0.516–1.606 Mb)

ROH. Most striking was the increase in long ROH (>1.607 Mb), found nearly exclusively in

GME samples, especially for those over 4 Mb (Fig. 3b). In the GME, there was an

enrichment of rare and very rare variants (AF <.05, and AF <.01) in longer ROH, and of

common variants (AF >= .05) in shorter ROH (Fig. 3c), suggesting that the longer ROH

result from recent consanguinity 24.

This increased ROH provided an opportunity to identify homozygous loss-of-function

variants (LOF) in healthy humans. While these variants are only putatively LOF until

experimentally verified, they exhibit the strongest signs of selective pressure and are the

first checked as disease candidates 25. Recently, among 2,636 sequenced and 101,584 chip-imputed

Icelanders, 1,171 genes were predicted to be inactivated 26. From our 354 exomes

on verified healthy adults, we found 301 genes with rare homozygous putative LOF variants

(Table S2, S3), with only 50 genes overlapping with the Icelandic gene list. Similarly the

ExAC dataset on 60,706 sequenced individuals identified 2068 genes inactivated, of which

only 94 genes overlapped with our 301 genes. This suggests that the set of non-clinically

relevant LOF variants is far from being exhausted. The GME represents an optimal

population from which to identify homozygous variants due to the elevated consanguinity

rates.

Darwin observed that rare self-fertilized orchid strains exhibited surprisingly higher fitness

than founder strains, which he termed ‘hero strains’ 27. This led to the concept of ‘purging of

recessive alleles’ by Haldane 28, referring to increased loss of deleterious alleles due to

increased selective pressure in inbred populations. Purging was hypothesized to impact the

GME genome due to the higher rates of birth defects incompatible with future

reproduction 29, but has yet to be documented in humans. We compared the distribution of

derived allele frequencies (DAF) in GME and 1000G populations 30. Variants were divided

into 7 functional and PolyPhen-2 deleterious classes. We calculated mean DAFs using

chimpanzee (PanTro2) as the common ancestor (Figs. S10, S11) 31. Neither autosomal nor

X-linked variants showed significant differences (Fig. S12), arguing against a measurable

effect on overall variant burden resulting from consanguinity.

Numerous studies have relied on the increased power of GME-resident consanguineous

families to identify causes of recessive disease, but the lack of an accessible variome has

hindered progress. Efforts like the NHLBI GO Exome Sequencing Project (ESP) produced

variomes for European American (EA) and African American (AA) populations, but poor

correlation of DAFs between population pairs indicated that neither was a good estimator

for GME DAFs (Pearson’s r 0.7979 GME vs. EA, 0.385 GME vs. AA, 0.1447 EA vs. AA,

Figs. 4a, S13). Moreover, we found much of the GME variation to be poorly represented

outside the GME (Fig. 4b), with the majority of variants in the rarest DAF bin found only in

the GME.

In order to assess how well the GME Variome captured extant exome variation, we subsampled

the cohort for 100 iterations from 5 to 700 individuals, and for 8 variant classes (see

methods, Fig. 4c). There was decay in the number of unique variants and accumulation of

rare variants as sample size increased, due to a scaled ability to estimate prevalence. When

sampled near 1,000 individuals, the change in mean of these values was negligible as new

samples were added. Thus the GME Variome should allow accurate determination of

population-level DAFs for all but the rarest alleles.

In order to investigate the potential of the Variome to expedite the discovery of new disease

genes, we compared causal variant sets from GME families displaying recessive hereditary

spastic paraplegia (HSP), where we recently established 17 new genetic forms of disease 32.

For a disease like recessive HSP with a prevalence of 3–10 per 100,000 33 and where there

are more than 40 genetic forms and hundreds of individual genetic mutations known, the

expected allele frequency for any causative mutation should be <1:1000 (see methods).

Select individuals from 20 representative families underwent whole exome sequencing. For

each family, we calculated the number of alleles that passed standard filtering (i.e. LOF or

otherwise potential ‘high impact’) 34 and were unique, both without and with the DAFs of

the Variome (see website for Variome below). Using only exome data from the 20 families

and public sources, there were on average 56, 20, and 11 unique variants passing filters from

families with one, two or three sequenced affected members, respectively. In contrast, by

accessing the Variome there were on average 13, 5, and 4 unique variants (Fig. 4d, Table

S4), yielding a 4–7-fold reduction of the number of variants requiring further consideration.

Loosening the allowable AFs to <1:500, <1:333 or <1:250 also showed substantial reduction

in the number of variants for consideration.

Here we have interrogated the fine-scale genomic structure across the GME, shaped by

prehistoric as well as historic migrations, conquests, and cultural traditions. The degree of

unique genetic variation represented in the GME was surprising given previous efforts to

capture diversity, and speaks to the value of sampling of understudied populations. The data

support records of migrations and conquests, but also suggest a previously unstudied GME

contribution.

Despite millennia of elevated consanguinity in the GME, we detected no evidence for

purging of recessive alleles. Instead, we detected large rare homozygous blocks, distinct

from the small homozygous blocks found in other populations, supporting recent

consanguineous matings, and allowing identification of genes harboring putatively high

impact homozygous variants in healthy humans from this population. Applying the Variome

to future sequencing projects for GME-originating subjects could aid in recessive gene

identification across all classes of disease. GME Variome is a publicly accessible resource

that will facilitate a broad range of genomic studies in the GME and globally.

Online Methods

1. Definition of the Greater Middle East

The term “greater Middle East” has been used to refer to a large swath of Arab and non-

Arab Muslim countries, stretching from Morocco in the west to as far east as Pakistan in

South Asia. However, no precise listing of designated countries has yet emerged. See “U.S.

Working Paper for G8 Sherpas,” Al-Hayat, February 13, 2004, available online at

[http://english.daralhayat.com/Spec/02-2004/Article-20040213-ac40bdaf-c0a8-01ed-004e-5e7ac897d678/story.html]

and [https://www.fas.org/sgp/crs/mideast/RS22053.pdf]. An editable map of the Middle East was

downloaded from [http://www.presentationmagazine.com].

2. Exome Resequencing

2.1 Study sample—The 2,497 individuals used in the analysis were selected from samples ascertained across three labs and recruited with the help of clinicians that

constituted the GME Consortium. Although these individuals were not a random sample,

they were ascertained within a wide variety of distinct phenotypes such that cohort-specific

effects were not expected to bias patterns of variation. All study participants in each of the

component studies provided written informed consent for the use of their DNA in studies

aimed at identifying genetic risk variants for disease, and for broad data sharing.

Institutional certification was obtained for each sample to allow deposition of genotype data

in dbGaP and other purposes.

2.2 Exome resequencing, variant calling, and filtering—Blood DNA was extracted using Qiagen reagents, subjected to exome capture with the Agilent SureSelect Human All

Exome 50 Megabase (Mb) kit, and sequenced on an Illumina HiSeq2000 instrument, resulting in

~94% target coverage at > 30X depth 35–37. FASTQ files were reprocessed and jointly called

to minimize batch effects and ensure consistent variant calling, using the GATK pipeline

(version 3.1–1) adhering to best practices 38, eliminating duplicate reads. Paired-end reads

were aligned to the human reference genome NCBI Build 37, using BWA (version 0.7.5) 39.

Principal component analysis (PCA) was run on the resultant set of variants to identify

potential batch effects between labs, sequencing centers, or collectively run groups of

samples, and samples were then eliminated until no batch effects were observed.

We calculated four quality control (QC) metrics for each sample using PSEQ and identified

statistical outliers. Metrics included: total number of variants, transition/transversion ratio,

number of sequenced positions, and number of singletons. Due to possible reference

distance bias, we considered samples grouped by geographic region independently. Samples

were identified as outliers using a cutoff of >5 standard deviations from the mean threshold

for each QC metric, removing 314 samples. The PCA based outlier analysis algorithm from

the EIGENSOFT software library was also run, but failed to find any additional samples

violating a standard deviation threshold of 5.0 40.
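
The per-region outlier rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PSEQ-based procedure used in the study: the function name `flag_outliers`, the dictionary layout of each sample, and the metric keys are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def flag_outliers(samples, metrics, n_sd=5.0):
    """Return IDs of samples lying more than n_sd standard deviations from
    the mean of their own region on any QC metric.

    samples: list of dicts with keys 'id', 'region', and one key per metric
             (hypothetical layout chosen for this sketch).
    """
    by_region = defaultdict(list)
    for s in samples:
        by_region[s["region"]].append(s)

    flagged = set()
    for group in by_region.values():
        if len(group) < 2:
            continue  # standard deviation is undefined for a single sample
        for m in metrics:
            values = [s[m] for s in group]
            mu, sd = mean(values), stdev(values)
            if sd == 0:
                continue  # all samples identical on this metric
            for s in group:
                if abs(s[m] - mu) > n_sd * sd:
                    flagged.add(s["id"])
    return flagged
```

Grouping by region before computing the mean and standard deviation mirrors the stated concern about reference distance bias: a sample is compared only with samples from the same geographic region.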

To ensure unbiased population structure statistics and allele frequency estimates, we

removed close and cryptic relationships from the dataset. Kinship estimation was generated

using KING, which calculated relatedness between all pairs of individuals and was robust to

population structure 41. Using the 182,967 LD filtered SNPs, we ran KING following

standard guidelines for a 3rd degree relationship (i.e. first cousins), using a kinship

coefficient of 0.04419. When a cluster of related individuals was identified, we preferentially

removed those individuals whose removal left the largest number of samples. Of the remaining 2183 samples after

outlier filtering, 667 samples were removed to reduce dataset relatedness, leaving a final

cohort of 1516 non-related individuals. Remaining samples were rerun through KING,

which identified no additional kinships. Final continental sample counts after filtering: Sub-

Saharan Africa: 19, America: 33, Europe: 378, Oceania: 1, and Middle East: 1111.
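
One way to picture the relatedness filter is as a greedy pruning of a graph whose edges join pairs with a kinship coefficient above the threshold. The sketch below is an assumption about how such pruning could be done, not the authors' actual procedure; `prune_related` and the `pairs` dictionary are hypothetical names, and a greedy heuristic does not guarantee the largest possible unrelated set.

```python
def prune_related(pairs, threshold=0.04419):
    """Greedy sketch: repeatedly drop the sample involved in the most
    related pairs until no related pair remains.

    pairs: dict mapping (id_a, id_b) -> kinship coefficient
    Returns the set of sample IDs to remove.
    """
    neighbours = {}
    for (a, b), phi in pairs.items():
        if phi > threshold:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)

    removed = set()
    while neighbours:
        # sample with the most remaining relatives; ties broken by ID
        worst = max(neighbours, key=lambda s: (len(neighbours[s]), s))
        for other in neighbours.pop(worst):
            neighbours[other].discard(worst)
            if not neighbours[other]:
                del neighbours[other]
        removed.add(worst)
    return removed
```

Removing the most connected individual first tends to break up a family cluster with the fewest deletions, which matches the stated goal of retaining as many samples as possible.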

Coverage statistics were generated across all internal exome data sets using BEDTools, to

calculate the average coverage across each exon 42. Exons were filtered from the analysis if

greater than 5% of samples had less than 10x average coverage. Out of the initial 192,056

exons targeted by the Agilent SureSelect II capture kit, 170,032 exons were well covered in

at least 95% of samples. Variants were filtered if identified outside of these genomic regions,

leaving 32,967,859 bases under consideration (~1% of the human genome) within 17,800

genes.
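
The exon-level coverage filter can be illustrated as follows. This is a minimal sketch that assumes per-exon mean depths have already been computed for each sample (for example with BEDTools); the function name and input layout are hypothetical.

```python
def well_covered_exons(coverage, min_depth=10.0, max_fail_fraction=0.05):
    """Keep exons where at most max_fail_fraction of samples have a mean
    depth below min_depth.

    coverage: dict mapping exon ID -> list of per-sample mean depths
    """
    kept = []
    for exon, depths in coverage.items():
        low = sum(1 for d in depths if d < min_depth)
        if low / len(depths) <= max_fail_fraction:
            kept.append(exon)
    return kept
```

With the counts reported above, 170,032 of 192,056 targeted exons (about 88.5%) would pass a filter of this kind.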

Standard filters retained variants that were called with posterior probability >99% (glfMultiples

SNP quality > 20), were at least 5 bp away from an indel detected in the 1000G Pilot

Project, were targeted in at least 95% of individuals, and had a total depth across samples

between 6,823 and 6,823,000 (~1–1000 reads per sample on average) 9. Variant positions

were filtered based on population statistics including a ‘missingness’ rate (referring to the

percent of samples where information was missing) of less than 5%, and Hardy-Weinberg

equilibrium (HWE) deviation p-value < 0.00005 43.

We generated a subset of variants in minimal linkage disequilibrium (LD) by pruning

variants exhibiting pairwise linkage disequilibrium (r²). Variants were filtered to exclude SNPs

with minor allele frequency (MAF) <5%, and all indels. Remaining SNPs were pruned

adhering to a maximum r² threshold of 0.5 using PLINK’s ‘--indep-pairwise’ command 43. Of

the initial 578,231 variants, 182,967 SNPs passed filters. This LD pruned dataset was used

for population structure characterization including principal component analysis (PCA),

Wright’s fixation index (Fst), admixture analysis, KING relationship testing, and estimation

of inbreeding coefficient.
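
The idea behind the MAF filter and LD pruning can be sketched as below. This is a simplified greedy illustration and not a reimplementation of PLINK: the window size of 50 SNPs is a hypothetical value, r² is taken here to be the squared Pearson correlation of allele counts, and all function names are invented for the example.

```python
from statistics import mean


def maf(genotypes):
    """Minor allele frequency from allele counts (0, 1 or 2 per sample)."""
    p = sum(genotypes) / (2 * len(genotypes))
    return min(p, 1 - p)


def r_squared(x, y):
    """Squared Pearson correlation between two lists of allele counts."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic site: treat as uncorrelated
    return sxy * sxy / (sxx * syy)


def ld_prune(snps, min_maf=0.05, max_r2=0.5, window=50):
    """Greedy sketch: walk SNPs in genomic order and keep one only if its
    r^2 with each of the last `window` kept SNPs is at most max_r2.

    snps: list of (snp_id, genotypes) tuples in genomic order
    """
    kept = []
    for snp_id, g in snps:
        if maf(g) < min_maf:
            continue  # drop rare and monomorphic SNPs first
        if all(r_squared(g, h) <= max_r2 for _, h in kept[-window:]):
            kept.append((snp_id, g))
    return [snp_id for snp_id, _ in kept]
```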

2.3 Geographic region assignment—Samples were recruited from 20 countries and territories across the GME and grouped into a set of six geographic regions: Northwest

Africa (85 Samples), Northeast Africa (423 Samples), Arabian Peninsula (214 Samples),

Syrian Desert (81 Samples), Turkish Peninsula (140 Samples), and Persia and Pakistan (168

Samples). Country boundaries were not used to group samples for two reasons: 1]

Inconsistent sampling left several countries with too few samples to accurately represent the

diversity of the population. Syria and Yemen, for example, were only represented by a few

samples, due to ongoing conflicts. 2] Current country borders frequently fail to accurately

separate ethnicities, due to a combination of recent migrations and recent political history.

For example, south-eastern Arabian Peninsula Bedouin tribes do not distinguish between the

relatively recently defined borders of Oman and the UAE.

Self-identified ethnicities were available for some samples, but incompleteness of this

annotation, and the great diversity of populations affiliating as “Arab”, prompted use of

geography for groupings. As much as possible we assigned location to the current residence,

rather than ancestral residence or location where samples were drawn. While some reference

GME ethnicities exist in public resources, such as the Human Genome Diversity Project

(HGDP) 44, we found both the breadth of ascertained ethnicities and sample size insufficient

to impute ethnicities where absent.

The original cohort was largely composed of samples from GME countries, but also

included samples of African, European, and East Asian descent. To ensure consistency in our

geographic designations we performed linkage clustering, based on pairwise distances

between samples using Plink’s ‘--distance-matrix’ command 43. We performed hierarchical

clustering on all samples using Ward’s hierarchical clustering method (“ward.D2” option for

the “hclust” algorithm in R) 45.

3. Population Structure of GME

3.1 Data integration—Population structure was analyzed in the context of continental populations from the 1000 Genomes Phase I (1000G) dataset 9. As 1000G samples were

generated from a combination of whole genome and exome sequencing, variants falling

outside of RefSeq exonic regions +/− 30 base pairs (bp) were filtered using BedTools and

merged with the GME cohort 46,47. Nine populations from 1000G data were used in

comparative analyses: African populations YRI and LWK; East Asian populations CHB,

CHS, and JPT; and European populations GBR, TSI, IBS, and FIN. Related 1000G samples

were filtered by a KING analysis as previously described. A total of 1821 samples remained

after filtering representing 15 geographic regions, 6 from the GME and 9 from 1000G.

3.2 Substructure analysis—To investigate the influence of admixture on the GME samples, we used the block relaxation algorithm implemented in ADMIXTURE to estimate

individual ancestry proportions given K ancestral populations 48. Unsupervised

ADMIXTURE was run using default settings (folds=5) on merged GME and 1000G samples

and iterations of K values from 2 to 14. The minimum error from ADMIXTURE’s cross-validation

procedure, which evaluates the fit of different values of K, indicated an optimum of K = 6

for GME samples alone, and K = 7 when 1000G control data were included.

3.3 PCA and Wright’s Fixation Index (Fst)—Principal component analysis (PCA) was used to investigate the affinities within human populations and the relationships between

them. We performed PCA on GME and 1000G samples using the SmartPCA tool from the

EIGENSOFT software library and compared the first four principal components

graphically 40,49.

Wright’s fixation index (Fst) was used to explore the degree of differentiation between

populations. Fst values and standard error for all pairs of populations were calculated using

the estimator of Weir & Cockerham, also included in the EIGENSOFT software library. All

plots were generated using ggplot2 50.

3.4 LD decay—Pairwise linkage disequilibrium among pairs of SNPs is an indicator of the past history of recombination and genetic drift. To calculate LD, we tallied pairwise r² for SNP pairs for all GME and control populations using the Plink “r2” option 43. Correlations

between all SNPs falling within each sliding window of 70 kilobases (kb) were calculated

with no lower limit on r² values. Pairwise correlations were binned by genomic distance between SNPs (up to 70 kb), and averages were calculated for each bin. Control samples followed

expected patterns of LD decay.
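
The binning step can be illustrated with a short Python sketch. It assumes that r² is the squared Pearson correlation of allele counts and uses a hypothetical bin width of 10 kb, since the bin size is not stated above; the function names are likewise invented for the example.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two lists of allele counts
    (0, 1 or 2 per sample); None when either site is monomorphic."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy * sxy / (sxx * syy)


def ld_decay(positions, genotypes, max_dist=70_000, bin_size=10_000):
    """Mean r^2 per distance bin for all SNP pairs at most max_dist bp apart.

    positions: sorted base-pair coordinates, one per SNP
    genotypes: list of allele-count lists, in the same order as positions
    Returns a dict mapping bin start (bp) -> mean r^2.
    """
    sums, counts = {}, {}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dist = positions[j] - positions[i]
            if dist > max_dist:
                break  # positions are sorted, so later SNPs are farther away
            r2 = r_squared(genotypes[i], genotypes[j])
            if r2 is None:
                continue
            b = dist // bin_size
            sums[b] = sums.get(b, 0.0) + r2
            counts[b] = counts.get(b, 0) + 1
    return {b * bin_size: sums[b] / counts[b] for b in sorted(sums)}
```

Plotting the returned means against bin start gives a decay curve of the kind compared between populations.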

3.5 Estimation of inbreeding—The inbreeding coefficient of an individual (F) was used to represent the probability that two randomly chosen alleles at a homologous locus within

an individual were identical by descent (IBD) with respect to a base reference population in

which all alleles were independent. While the true inbreeding coefficient of an individual is

often unknown, several estimation methods have been shown to give a reasonable estimate.

F estimates were calculated using the Plink “het” algorithm on LD pruned variants following

the authors’ guidelines 43. We compared results to the HMM algorithm Festim 51 and found the

two estimates were very similar (Pearson’s r: 0.874) but frequently Festim failed to return

results for samples with missing data. Negative F values were most likely the result of

biased variant sampling, a high degree of interracial marriage, or recent intermixing of

previously disparate populations 8.
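
A homozygosity-based estimate of F, and the way it can become negative, can be sketched as follows. The code uses the method-of-moments form F = (O − E) / (N − E), where O is the observed number of homozygous genotypes, E the number expected under Hardy-Weinberg equilibrium, and N the number of genotyped sites. It is a simplified illustration: it omits any small-sample correction a production tool may apply, and the function name and inputs are hypothetical.

```python
def inbreeding_f(genotypes, alt_freqs):
    """Method-of-moments inbreeding estimate F = (O - E) / (N - E).

    genotypes: one individual's allele counts per site (0, 1, 2, or None
               when the genotype is missing)
    alt_freqs: population alternate-allele frequency at each site
    """
    observed = expected = n = 0.0
    for g, p in zip(genotypes, alt_freqs):
        if g is None:
            continue  # skip missing genotypes
        n += 1
        if g in (0, 2):
            observed += 1
        expected += 1 - 2 * p * (1 - p)  # expected homozygosity under HWE
    if n == expected:
        return None  # no polymorphic sites: estimate undefined
    return (observed - expected) / (n - expected)
```

An individual with fewer homozygous genotypes than expected from the population allele frequencies yields a negative value, which is the situation discussed above.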

3.6 Runs of homozygosity (ROH) estimation—To infer estimates of the autozygosity and relative recent population size, we estimated runs of homozygosity using the HMM

algorithm H3M2 52. H3M2 was run directly on aligned BAM files, following the authors’

recommendations for all parameters. The proportion of genome and exome falling within ROH

was calculated for each sample using BedTools. ROH length classes were based on

published ranges 23, where the authors used machine learning to identify three ROH classes

including: Short (<0.515 Mb), Medium (0.516–1.606 Mb), and Long (>1.607 Mb). We

compared densities of ROH lengths from internal data and found a near identical distribution

as the published values used to identify these classes.
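
Assigning ROH segments to length classes is straightforward once segment coordinates are available. The sketch below uses the boundary values given in the main text (0.515/0.516 Mb and 1.606/1.607 Mb); how lengths falling in the small gaps between stated boundaries are handled is an assumption of this example, as are the function names and the input format.

```python
def classify_roh(length_mb):
    """Assign a run of homozygosity to a length class (length in Mb)."""
    if length_mb < 0.516:
        return "short"
    if length_mb <= 1.606:
        return "medium"
    return "long"


def roh_totals(segments):
    """Total ROH length (Mb) per class for one sample.

    segments: list of (start_bp, end_bp) tuples
    """
    totals = {"short": 0.0, "medium": 0.0, "long": 0.0}
    for start, end in segments:
        length_mb = (end - start) / 1e6
        totals[classify_roh(length_mb)] += length_mb
    return totals
```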

4. Variant Annotation and Classification

4.1 Variant annotation—Functional annotation was performed for genetic purging and loss of function analyses. Variants were annotated using the ANNOVAR suite of scripts

(version 2014Nov12) 53. ANNOVAR classified variants into eight coding region functional

groups including: “frameshift_deletion”, “frameshift_insertion”, “nonframeshift_deletion”,

“nonframeshift_insertion”, “nonsynonymous_SNV”, “stopgain”, “stoploss”, and

“synonymous_SNV”. Non-coding variants are classified as “unknown”. Splicing defects

were identified as variants within 2 base pairs of the splice junction, on either the intronic

or exonic side. A predicted deleteriousness classification was generated for each missense

variant using PolyPhen-2 54. The functional designations for PolyPhen-2 include: B

(Benign), P (Possibly Damaging), D (Probably Damaging). We compared these annotations

to those generated by SnpEff 55, and while there were some differences, found distributions

of calls from each sample to be consistent.

4.2 Ancestral allele identification—We used the Chimpanzee genome as the closest assembled out-group genome. Ancestral allele estimates were obtained by UCSC pairwise

alignments between human reference hg19 and chimp references PanTro2 and PanTro4.

Systematic lookups for all GME and 1000G variants were performed using UCSC Genome

Browser tools and custom scripts to identify associated chimpanzee alleles. We compared

PanTro2 and PanTro4 to assess the difference in correcting the apparent reference bias, but

found both worked equally well.

Estimated ancestral alleles were used as the reference allele to calculate derived allele

frequencies (DAF). DAFs were not calculated for variants where the ancestral allele was not

present in the human germline.
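
Polarizing a variant by its ancestral allele can be sketched as follows. The function is a hypothetical illustration of the rule described above, not the custom scripts used in the study: when the chimpanzee allele matches the human reference allele the alternate allele is derived, when it matches the alternate allele the reference allele is derived, and otherwise no DAF is reported.

```python
def derived_allele_frequency(ref, alt, ancestral, alt_count, total_alleles):
    """Return the derived allele frequency at a biallelic site, or None when
    the ancestral allele matches neither human allele.

    ref, alt:       human reference and alternate alleles
    ancestral:      allele inferred from the out-group alignment
    alt_count:      number of alternate alleles observed
    total_alleles:  number of alleles genotyped (2 x samples for autosomes)
    """
    alt_freq = alt_count / total_alleles
    if ancestral == ref:
        return alt_freq        # the alternate allele is derived
    if ancestral == alt:
        return 1.0 - alt_freq  # the reference allele is derived
    return None                # ancestral allele not seen in humans
```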

4.3 Identity-by-state (IBS) distance to reference—To interrogate the potential biases that might result from reference selection, we calculated the IBS distance between samples

and multiple different references, including hg19 and chimpanzee. The distance represents

the proportion of positions that diverge from reference, and was calculated between all pairs

of samples and references.

The IBS distance, d, represented the number of differing alleles between the two samples divided by the total number of alleles compared. More formally, d was defined between two n-length vectors p and q (in our case, p is the reference sample and q is the sample being compared) whose entries take values in v = {0,1,2}, encoding the homozygote for the human reference allele, the heterozygote, and the homozygote for the alternate allele, respectively.

For any two samples, we calculate d as:

$$d(p, q) = \frac{1}{2n} \sum_{i=1}^{n} \left| p_i - q_i \right|$$

where (p,q) are vectors such that p = (p_1, p_2, …, p_n) and q = (q_1, q_2, …, q_n).

Each vector represented all genotype calls between the two samples, excluding filtered sites

or missing positions.

The IBS distance was calculated for all GME and 1000G samples against the hg19 and chimpanzee reference genomes. All genotypes from the merged VCF file were coded based on a comparison to the hg19 reference. Variant positions were filtered to remove indels, due to the possibility of alignment errors, and non-biallelic sites. When comparing to hg19, vector p was represented by a vector of zeros.
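A minimal Python sketch of this calculation follows. It reads the verbal definition as the sum of per-site absolute genotype differences divided by twice the number of compared sites; that reading, the function name `ibs_distance`, and the use of `None` for missing or filtered calls are assumptions made for illustration.

```python
from typing import Optional, Sequence


def ibs_distance(p: Sequence[Optional[int]], q: Sequence[Optional[int]]) -> float:
    """IBS distance between two genotype vectors coded 0/1/2 (None = missing)."""
    if len(p) != len(q):
        raise ValueError("genotype vectors must have the same length")
    differing_alleles = 0
    compared_alleles = 0
    for p_i, q_i in zip(p, q):
        if p_i is None or q_i is None:
            continue  # skip filtered or missing positions
        differing_alleles += abs(p_i - q_i)
        compared_alleles += 2  # two alleles per diploid genotype
    if compared_alleles == 0:
        raise ValueError("no comparable positions")
    return differing_alleles / compared_alleles


if __name__ == "__main__":
    sample = [0, 1, 2, None, 0]
    hg19 = [0] * len(sample)  # the reference is a vector of zeros
    print(ibs_distance(hg19, sample))
```

Because genotypes are coded relative to hg19, the distance to hg19 reduces to the sample's alternate-allele count divided by the number of alleles compared.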

4.4 Hereditary Spastic Paraplegia (HSP) candidate variant analysis

Samples from 20 consanguineous families displaying an autosomal recessive inheritance pattern of HSP were selected from a previously analyzed cohort of 55 families 32, because a single genetic cause had been identified in each of these 20. Families were analyzed in adherence to published methods 32. Briefly, homozygous variants were filtered based on family structure to ensure that variants segregated with the disease phenotype. We performed deleteriousness filtering using functional classes and GERP++ scores 56. All candidate variants were either potentially LOF (frameshift, stop, or perturbing splicing) or coding variants with a GERP score >4.

The maximum allele frequency for candidate variants was based on established rates of disease prevalence, estimated at 1:10,000 for clinical presentations classified as HSP 57. Approximately 50% of HSP is autosomal dominant, and of the remainder, about 50% is explained by mutations in SPG11 58, leaving only 1:40,000 with recessive HSP caused by other genes. At least 35 other genes are reported to cause recessive HSP. Thus, the contribution to HSP disease prevalence for any given gene is unlikely to be more than 1:1,000,000. While the prevalence of HSP mutations is not expected to be uniform, we expect the maximum carrier frequency for any new causal variant to be no more than 1:1000, assuming full penetrance and classic recessive inheritance, and in actuality it is likely to be much rarer given allelic diversity.
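The arithmetic behind these bounds can be retraced with a short Python sketch. The prevalence figures are those quoted above; the Hardy-Weinberg step (allele frequency q taken as the square root of the per-gene prevalence) and the function names are assumptions made for illustration, not a description of the authors' calculation.

```python
import math


def other_gene_recessive_prevalence(hsp_prevalence: float = 1 / 10_000,
                                    dominant_fraction: float = 0.5,
                                    spg11_fraction: float = 0.5) -> float:
    """Prevalence of recessive HSP that is not explained by SPG11."""
    recessive = hsp_prevalence * (1 - dominant_fraction)  # about 1:20,000
    return recessive * (1 - spg11_fraction)               # about 1:40,000


def max_allele_frequency(per_gene_prevalence: float) -> float:
    """Allele frequency q such that q**2 equals the prevalence (assumes HWE)."""
    return math.sqrt(per_gene_prevalence)


if __name__ == "__main__":
    other = other_gene_recessive_prevalence()
    per_gene = other / 35           # spread evenly over at least 35 genes
    bound = 1 / 1_000_000           # rounded per-gene bound used in the text
    q = max_allele_frequency(bound)
    print(f"other-gene recessive HSP: 1:{1 / other:,.0f}")
    print(f"even split over 35 genes: 1:{1 / per_gene:,.0f}")
    print(f"allele frequency bound q: 1:{1 / q:,.0f}")
    print(f"heterozygote frequency 2q(1-q): 1:{1 / (2 * q * (1 - q)):,.0f}")
```

Under these assumptions the 1:1000 figure corresponds to the allele frequency q, while the heterozygote frequency 2q(1−q) would be roughly twice that. Which of the two the text intends by "carrier frequency" is not stated, so this is only one possible reading.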

With roughly 1000 individuals in our cohort, we calculated that variants with DAFs <1:1000 should not be observed commonly in our dataset, AFs <1:500 should not be observed in more than 1 individual, AFs <1:333 in not more than 2 individuals, and AFs <1:250 in not more than 3 individuals. Variants passing the deleteriousness and allele frequency thresholds were treated as candidates in order to assess how useful the Variome is in limiting the number of deleterious variants considered as candidates.
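One way to picture this count-based filter is the Python sketch below. It pairs each quoted allele frequency ceiling with the maximum number of cohort individuals allowed to carry the allele; treating the <1:1000 tier as "no carriers", as well as the names `AF_CEILING_BY_MAX_CARRIERS` and `filter_candidates` and the toy variant labels, are assumptions made for illustration.

```python
from typing import Dict, List

# Allele frequency ceilings quoted in the text for a cohort of roughly 1000
# individuals, keyed by the maximum number of individuals sharing the allele.
AF_CEILING_BY_MAX_CARRIERS = {0: 1 / 1000, 1: 1 / 500, 2: 1 / 333, 3: 1 / 250}


def passes_frequency_filter(carriers: int, max_carriers: int) -> bool:
    """True if the variant is seen in no more than max_carriers individuals."""
    if max_carriers not in AF_CEILING_BY_MAX_CARRIERS:
        raise ValueError("max_carriers must be 0, 1, 2 or 3")
    return carriers <= max_carriers


def filter_candidates(carrier_counts: Dict[str, int], max_carriers: int) -> List[str]:
    """Keep variants whose carrier count in the cohort is within the threshold."""
    return [variant for variant, n in carrier_counts.items()
            if passes_frequency_filter(n, max_carriers)]


if __name__ == "__main__":
    counts = {"varA": 0, "varB": 1, "varC": 4}  # hypothetical carrier counts
    for k in sorted(AF_CEILING_BY_MAX_CARRIERS):
        print(k, filter_candidates(counts, k))
```

Filtering on carrier counts rather than on estimated frequencies avoids dividing by a cohort size that is only approximately 1000.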

5. Testing for the Influence of Genetic Purging

Consanguinity has been practiced in the GME for at least several centuries 59. Simulations of GME-like populations have found that sufficient time has passed for purging to have been effective in reducing genetic load 29. Clinical studies comparing rates of birth defects, premature births, or miscarriages between communities that practice consanguinity and largely out-breeding populations have found that all metrics fall within the range expected for the immediate form of consanguinity 21,60,61. More recent genetic studies investigating differential selective pressure across human populations focused on the role of population bottlenecks, neglecting the potential influence of consanguinity, and lacked representation from the GME 31,62. For these reasons, we sought to investigate the possibility that genetic purging has influenced variant burden in the GME.

To approach the question of variable selective pressure across human populations, we implemented a variation of the DAF comparison method 31. We assumed that any change in the efficacy of natural selection should be evident across populations in the mean DAF within each variant class.

For all variants described across the GME and 1000G populations, we filtered for high quality calls, identified ancestral alleles (described in “Ancestral Allele” section), annotated variants for predicted function and PolyPhen-2 classes using ANNOVAR, down-sampled to achieve an equivalent number of chromosomes across populations, and calculated DAFs for all positions. Variants were grouped by class, and the DAF means were calculated for each population. Standard errors were calculated by bootstrapping DAF means for 1000 iterations.
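As a rough illustration of the bootstrap step, the Python sketch below resamples a list of DAFs with replacement and reports the standard deviation of the resampled means. The function name `bootstrap_se_of_mean`, the fixed seed, and the toy DAF values are hypothetical; the unit that the authors resampled (variants or individuals) is not stated here, and resampling variants is an assumption.

```python
import random
import statistics
from typing import Sequence


def bootstrap_se_of_mean(dafs: Sequence[float], iterations: int = 1000,
                         seed: int = 0) -> float:
    """Bootstrap standard error of the mean DAF for one variant class."""
    if len(dafs) == 0:
        raise ValueError("need at least one DAF value")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(dafs)
    means = []
    for _ in range(iterations):
        # Resample n values with replacement and record their mean.
        resample = [dafs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    return statistics.stdev(means)


if __name__ == "__main__":
    toy_dafs = [0.01, 0.02, 0.05, 0.10, 0.30]  # hypothetical DAFs for one class
    print(sum(toy_dafs) / len(toy_dafs), bootstrap_se_of_mean(toy_dafs))
```

Two populations can then be compared by checking whether their class means differ by more than these standard errors.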

Recent studies using PolyPhen-2 demonstrated a deflation of deleteriousness scores for derived variants found in the hg19 reference, likely due to a training artifact 31,62. Before PolyPhen-2 classes were used, this bias was corrected for all derived reference positions. Bias correction was implemented by grouping variants into DAF bins and calculating the proportions of each PolyPhen-2 class per bin for ancestral reference positions. Using these proportions as expectations, all derived reference positions were randomly reassigned a new PolyPhen-2 class based on a hypergeometric distribution within each DAF bin. DAF means across classes for all included 1000G and GME populations showed no deviation outside the standard error for any two populations 61.
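The reassignment step might be sketched in Python as follows. Here the hypergeometric draw is approximated by sampling class labels without replacement from the ancestral-reference variants in the same DAF bin; this choice, the ten equal-width bins, the fallback to sampling with replacement when the pool is too small, and all names are assumptions rather than the authors' implementation.

```python
import random
from collections import defaultdict
from typing import List, Optional, Sequence, Tuple


def daf_bin(daf: float, n_bins: int = 10) -> int:
    """Index of the equal-width DAF bin that contains daf."""
    return min(int(daf * n_bins), n_bins - 1)


def reassign_classes(ancestral_ref: Sequence[Tuple[float, str]],
                     derived_ref_dafs: Sequence[float],
                     n_bins: int = 10, seed: int = 0) -> List[Optional[str]]:
    """Draw a new class for each derived-reference site from its DAF bin."""
    rng = random.Random(seed)
    pools = defaultdict(list)   # bin -> class labels at ancestral-ref sites
    for daf, cls in ancestral_ref:
        pools[daf_bin(daf, n_bins)].append(cls)
    sites = defaultdict(list)   # bin -> indices of derived-ref sites
    for idx, daf in enumerate(derived_ref_dafs):
        sites[daf_bin(daf, n_bins)].append(idx)
    new_classes: List[Optional[str]] = [None] * len(derived_ref_dafs)
    for b, idxs in sites.items():
        pool = pools.get(b)
        if not pool:
            continue  # no expectation for this bin; leave unassigned
        if len(idxs) <= len(pool):
            draws = rng.sample(pool, len(idxs))      # without replacement
        else:
            draws = rng.choices(pool, k=len(idxs))   # fallback: with replacement
        for i, cls in zip(idxs, draws):
            new_classes[i] = cls
    return new_classes


if __name__ == "__main__":
    ancestral = [(0.05, "benign"), (0.07, "benign"), (0.55, "probably damaging")]
    print(reassign_classes(ancestral, [0.02, 0.58, 0.95]))
```

Drawing without replacement from the observed labels makes the reassigned class counts in each bin follow a (multivariate) hypergeometric distribution with the ancestral-reference proportions as expectations.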

6. Neanderthal and Denisovan Introgression Analysis

Neanderthal-derived variants are often subjected to strong negative selection, thereby making exome analysis inadequate for estimating age of introgression. Thus, we calculated the proportion observed between extant populations 15,63.

To estimate introgression in exome samples, we identified aligned consensus calls for all human variant positions from the chimpanzee, Neanderthal and Denisovan reference genomes. Alignments of Neanderthal and Denisovan genomes to 1000G variant positions were downloaded from the Max Planck Institute for Evolutionary Anthropology FTP 15,64. Neanderthal and Denisovan alleles were identified from the hg19-ancestor alignment files. Chimpanzee alleles were identified as described in the “ancestral allele” section of these methods.

We projected GME and 1000G control populations on the principal components calculated using representative samples from Neanderthal, Denisova, and chimpanzee 16,65,66, and aligned the human samples to these ancestral populations. Principal components were computed using R’s “prcomp” function (see web resources), and projected vectors were calculated for all 1000G and GME samples. Distance from the re-adjusted origin to each species reflected the proportion of introgression observed in each sample. The limited number of SNPs examined in this analysis, compared to similar genotype-based analyses, likely inflated the sampling variance within populations and limited the sensitivity of our analysis to smaller introgression proportions. Centroids for all populations were labeled with their abbreviated names. Similar to previous work, Europeans, East Asians, and GME populations overlapped, and demonstrated larger proportions of Neanderthal ancestry than African populations 65–67.

Supplementary Material

Refer to Web version on PubMed Central for supplementary material.

Acknowledgments

The authors thank Shamil Sunyaev and David Reich for help with PolyPhen-2 and DAF corrections, Michael Turchin for help with purging analysis, Joseph Pickrell for help with TreeMix, and Vineet Bafna, Nicholas Schork, and Stefano Bonissone for suggestions. Work was supported by grants from the National Institutes of Health (P01HD070494, R01NS048453), Qatari National Research Foundation (NPRP6-1463), Simons Foundation Autism Research Initiative (175303 and 275275) to JGG, the Yale Center for Mendelian Disorders (U54HG006504), the Broad Institute (U54HG003067), The Rockefeller University CTSA (5UL1RR024143-04), the Howard Hughes Medical Institute (to JGG and J-LC), Institut National de la Santé et de la Recherche Médicale, the St. Giles Foundation, and the Candidoser Association, R01AI088364, R37AI095983, P01AI061093, U01AI109697 (to J-LC), U01AI088685 to J-LC and LA, R21AI107508 (to E. Jouanguy), DHFMR Collaborative Research Grant and KACST 13-BIO1113-20 (to FSA).

Greater Middle Eastern Variome Consortium

Sohair Abdel Rahim, Sawsan Abdel-Hadi, Ghada Abdel-Salam, Ekram Abdel-Salam, Mohammed Abdou, Avinash Abhytankar, Parisa Adimi, Jamil Ahmad, Mustafa Akcakus, Guside Aksu, Sami Al Hajjar, Suliman Al Juamaah, Saleh Al Muhsen, Nouriya Al Sannaa, Salem Al Tameni, Jumana Al-Aama, Nasir Al-Allawi, Raidah Al-Baradie, Lihadh Al-Gazali, Amal Al-Hashem, Waleed Al-Herz, Deema Al-Jeaid, Asma Al-Tawari, Abdullah Alangari, Alexandre Alcais, Tariq S AlFawaz, Zobaida Alsum, Aomar Ammar-Khodja, Sepideh Amouian, Cigdem Arikan, Omid Aryani, Ayca Aslanger, Cigdem Aydogmus, Caner Aytekin, Matloob Azam, Boglarka Bansagi, Mohamed-Rhida Barbouche, Laila Bastaki, Tawfeg Ben-Omran, PS Bindu, Lizbeth Blancas, Stéphanie Boisson-Dupuis, Damien Bonnet, Omar Boudghene Stambouli, Aziz Bousfiha, Lobna Boussafara, Jeannette Boutros, Jacinta Bustamante, Huseyin Caksen, Yildiz Camcioglu, Emilie Catherinot, Fatma C Celik, Michael Ciancanelli, Funda E Cipe, Gary Clark, Aurélie Cobat, Sinan Comu, Angela Condie, Antonio Condino-Neto, Mukesh Desai, William Dobyns, Figen Dogu, Mohamed Domaia, Meltem Dorum, Odul Egritas, Safa El Azbaoui, Jamila El Baghdadi, Mona El Ruby, Ashraf El-Harouni, Reem A Elfeky, Gehad Elghazali, Eissa Faqeih, Elif Fenerci, Claire Fieschi, Cipe Funda, Iman Gamal, Umit Gelik, Fetah Genel, Alper Gezdirici, KM Girisha, Amy Goldstein, Padraic Grattan-Smith, Neerja Gupta, Jin Hahn, Nevin Hatipoglu, Raoul Hennekam, Massoud Houshmand, Philippe Ichai, Aydan Ikinciogullari, Samira Ismail, Chaim Jalas, Emmanuelle Jouanguy, Madhulika Kabra, Göknur Kalkan, Majdi Kara,

Neslihan Karaca, Kadri Karaer, Ariana Kariminejad, Hulya Kayserili, Melike Keser-Emiroglu, Sara S Kilic, Najib Kissani, Cristina Kokron, Roshan Koul, Necil Kutukculer, Fanny Lanternier, Alireza Mahdaviani, Nizar Malhaoui, Lobna Mansour, Davood Mansouri, Lucia Margari, Enza Maria Valente, Naima Marzouki, Amira Masri, Amina Megahed, Hisham Megahed, Najla Mekki, Mehrnaz Mesdaghi, Mohd Mikati, Faezeh Mojahedi, John Mulley, Sheela Nampoothiri, Carmen Navarrete, Tarek Omar, Azza Oraby, Ayse Pandaluz, Nima Parvaneh, Turkan Patiroglu, Zeynep Peker Koc, Isabelle Pellier, Capucine Picard, Anne Puel, Annick Raas-Rothschild, Anna Rajab, Didier Raoult, Ismail Reisli, Nima Rezaei, Ayoub Sabri, Yasin Sahin, Laila Saleem, Fadia Salem, Najla Sameer AlSediq, Ozden Sanal, Terry Sanger, Hanan Shakankiry, Lei Shang, Nabil Shehata, Nuri Shembesh, Vared Shkalim, Ameen Softah, Sameera Sogaty, Neveen Soliman, Fatma Sonmez-Aunaci, Laszlo Sztriha, Lynda Taibi-Berrah, Samia Temtamy, Hasan Tonekaboni, Doris Trauner, Beyhan Tuysuz, Ali Varan, Guillaume Vogt, Christopher Walsh, Geoffrey Woods, Gozde Yesil, Alisan Yildiran, Basak Yildiz, Adnan Yuksel, Maha Zaki, Shen-Ying Zhang

References

1. Anwar WA, Khyatti M, Hemminki K. Consanguinity and genetic diseases in North Africa and immigrants to Europe. Eur J Public Health. 2014; 24(Suppl 1):57–63. [PubMed: 25107999]

2. Al-Gazali L, Hamamy H, Al-Arrayad S. Genetic disorders in the Arab world. British Med J. 2006; 333:831–4.

3. Hussain R, Bittles AH. The prevalence and demographic characteristics of consanguineous marriages in Pakistan. J Biosoc Sci. 1998; 30:261–75. [PubMed: 9746828]

4. Sheffield VC, Stone EM, Carmi R. Use of isolated inbred human populations for identification of disease genes. Trends Genet. 1998; 14:391–6. [PubMed: 9820027]

5. Sharp JM. The Broader Middle East and North Africa Initiative: An overview. CRS Report for Congress; 2005.

6. Hellenthal G, et al. A genetic atlas of human admixture history. Science. 2014; 343:747–51. [PubMed: 24531965]

7. Ravindranath V, et al. Regional research priorities in brain and nervous system disorders. Nature. 2015; 527:S198–206. [PubMed: 26580328]

8. Hunter-Zinck H, et al. Population genetic structure of the people of Qatar. Am J Hum Genet. 2010; 87:17–25. [PubMed: 20579625]

9. Consortium GP, et al. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012; 491:56–65. [PubMed: 23128226]

10. Moreno-Estrada A, et al. Reconstructing the population genetic history of the Caribbean. PLoS Genet. 2013; 9:e1003925. [PubMed: 24244192]

11. Botigue LR, et al. Gene flow from North Africa contributes to differential human genetic diversity in southern Europe. Proc Natl Acad Sci U S A. 2013; 110:11791–6. [PubMed: 23733930]

12. Li JZ, et al. Worldwide human relationships inferred from genome-wide patterns of variation. Science. 2008; 319:1100–4. [PubMed: 18292342]

13. Henn BM, et al. Genomic ancestry of North Africans supports back-to-Africa migrations. PLoS Genet. 2012; 8:e1002397. [PubMed: 22253600]

14. Gerard N, Berriche S, Aouizerate A, Dieterlen F, Lucotte G. North African Berber and Arab influences in the western Mediterranean revealed by Y-chromosome DNA haplotypes. Hum Biol. 2006; 78:307–16. [PubMed: 17216803]

15. Green RE, et al. A draft sequence of the Neandertal genome. Science. 2010; 328:710–22. [PubMed: 20448178]

16. Sankararaman S, et al. The genomic landscape of Neanderthal ancestry in present-day humans. Nature. 2014; 507:354–7. [PubMed: 24476815]

17. Consortium STD, et al. Sequence variants in SLC16A11 are a common risk factor for type 2 diabetes in Mexico. Nature. 2014; 506:97–101. [PubMed: 24390345]

18. Pickrell JK, Pritchard JK. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 2012; 8:e1002967. [PubMed: 23166502]

19. Tadmouri GO, et al. Consanguinity and reproductive health among Arabs. Reprod Health. 2009; 6:17. [PubMed: 19811666]

20. Leutenegger AL, Sahbatou M, Gazal S, Cann H, Genin E. Consanguinity around the world: what do the genomic data of the HGDP-CEPH diversity panel tell us? Eur J Hum Genet. 2011; 19:583–7. [PubMed: 21364699]

21. Bittles AH, Black ML. Global patterns and tables of consanguinity. 2014. <http://consang.net>

22. Pippucci T, Magi A, Gialluisi A, Romeo G. Detection of runs of homozygosity from whole exome sequencing data: state of the art and perspectives for clinical, population and epidemiological studies. Hum Hered. 2014; 77:63–72. [PubMed: 25060270]

23. Pemberton TJ, et al. Genomic patterns of homozygosity in worldwide human populations. Am J Hum Genet. 2012; 91:275–92. [PubMed: 22883143]

24. Szpiech ZA, et al. Long runs of homozygosity are enriched for deleterious variation. Am J Hum Genet. 2013; 93:90–102. [PubMed: 23746547]

25. MacArthur DG, et al. A systematic survey of loss-of-function variants in human protein-coding genes. Science. 2012; 335:823–8. [PubMed: 22344438]

26. Sulem P, et al. Identification of a large set of rare complete human knockouts. Nat Genet. 2015; 47:448–52. [PubMed: 25807282]

27. Jones S. The Darwin Archipelago. Yale University Press; New Haven: 2011.

28. Haldane JBS. The effect of variation of fitness. Am Nat. 1937; 71:337–349.

29. Overall AD, Ahmad M, Nichols RA. The effect of reproductive compensation on recessive disorders within consanguineous human populations. Heredity. 2002; 88:474–9. [PubMed: 12180090]

30. Neale BM, et al. Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature. 2012; 485:242–5. [PubMed: 22495311]

31. Simons YB, Turchin MC, Pritchard JK, Sella G. The deleterious mutation load is insensitive to recent population history. Nat Genet. 2014; 46:220–4. [PubMed: 24509481]

32. Novarino G, et al. Exome sequencing links corticospinal motor neuron disease to common neurodegenerative disorders. Science. 2014; 343:506–11. [PubMed: 24482476]

33. Blackstone C, O’Kane CJ, Reid E. Hereditary spastic paraplegias: membrane traffic and the motor pathway. Nat Rev Neurosci. 2011; 12:31–42. [PubMed: 21139634]

34. MacArthur DG, et al. Guidelines for investigating causality of sequence variants in human disease. Nature. 2014; 508:469–76. [PubMed: 24759409]

35. Dixon-Salazar TJ, et al. Exome sequencing can improve diagnosis and alter patient management. Sci Transl Med. 2012; 4:138ra78.

36. Okada S, et al. IMMUNODEFICIENCIES. Impairment of immunity to Candida and Mycobacterium in humans with bi-allelic RORC mutations. Science. 2015; 349:606–13. [PubMed: 26160376]

37. Alsalem AB, Halees AS, Anazi S, Alshamekh S, Alkuraya FS. Autozygome sequencing expands the horizon of human knockout research and provides novel insights into human phenotypic variation. PLoS Genet. 2013; 9:e1004030. [PubMed: 24367280]

38. DePristo MA, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011; 43:491–8. [PubMed: 21478889]

39. Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010; 26:589–95. [PubMed: 20080505]

40. Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006; 2:e190. [PubMed: 17194218]

41. Manichaikul A, et al. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010; 26:2867–73. [PubMed: 20926424]

42. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010; 26:841–2. [PubMed: 20110278]

43. Purcell S, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007; 81:559–75. [PubMed: 17701901]

44. Cann HM, et al. A human genome diversity cell line panel. Science. 2002; 296:261–2. [PubMed: 11954565]

45. Behar DM, et al. The genome-wide structure of the Jewish people. Nature. 2010; 466:238–42. [PubMed: 20531471]

46. Danecek P, et al. The variant call format and VCFtools. Bioinformatics. 2011; 27:2156–8. [PubMed: 21653522]

47. Pruitt KD, et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014; 42:D756–63. [PubMed: 24259432]

48. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009; 19:1655–64. [PubMed: 19648217]

49. Price AL, et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006; 38:904–9. [PubMed: 16862161]

50. Wickham H. ggplot2: Elegant graphics for data analysis. Springer Science & Business Media; 2009.

51. Polasek O, et al. Comparative assessment of methods for estimating individual genome-wide homozygosity-by-descent from human genomic data. BMC Genomics. 2010; 11:139. [PubMed: 20184767]

52. Magi A, et al. H3M2: detection of runs of homozygosity from whole-exome sequencing data. Bioinformatics. 2014; 30:2852–9. [PubMed: 24966365]

53. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010; 38:e164. [PubMed: 20601685]

54. Adzhubei IA, et al. A method and server for predicting damaging missense mutations. Nat Methods. 2010; 7:248–9. [PubMed: 20354512]

55. Cingolani P, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin). 2012; 6:80–92. [PubMed: 22728672]

56. Davydov EV, et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput Biol. 2010; 6:e1001025. [PubMed: 21152010]

57. Erichsen AK, Koht J, Stray-Pedersen A, Abdelnoor M, Tallaksen CM. Prevalence of hereditary ataxia and spastic paraplegia in southeast Norway: a population-based study. Brain. 2009; 132:1577–88. [PubMed: 19339254]

58. Stevanin G, et al. Mutations in SPG11 are frequent in autosomal recessive spastic paraplegia with thin corpus callosum, cognitive decline and lower motor neuron degeneration. Brain. 2008; 131:772–84. [PubMed: 18079167]

59. Vardi-Saliternik R, Friedlander Y, Cohen T. Consanguinity in a population sample of Israeli Muslim Arabs, Christian Arabs and Druze. Ann Hum Biol. 2002; 29:422–31. [PubMed: 12160475]

60. Shami SA, Qaisar R, Bittles AH. Consanguinity and adult morbidity in Pakistan. Lancet. 1991; 338:954. [PubMed: 1681304]

61. Stoltenberg C, Magnus P, Lie RT, Daltveit AK, Irgens LM. Birth defects and parental consanguinity in Norway. Am J Epidemiol. 1997; 145:439–48. [PubMed: 9048518]

62. Do R, et al. No evidence that selection has been less effective at removing deleterious mutations in Europeans than in Africans. Nat Genet. 2015; 47:126–31. [PubMed: 25581429]

63. Consortium STD, et al. Association of a low-frequency variant in HNF1A with type 2 diabetes in a Latino population. JAMA. 2014; 311:2305–14. [PubMed: 24915262]

64. Meyer M, et al. A high-coverage genome sequence from an archaic Denisovan individual. Science. 2012; 338:222–6. [PubMed: 22936568]

65. Huerta-Sanchez E, et al. Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. Nature. 2014; 512:194–7. [PubMed: 25043035]

66. Wang S, Lachance J, Tishkoff SA, Hey J, Xing J. Apparent variation in Neanderthal admixture among African populations is consistent with gene flow from Non-African populations. Genome Biol Evol. 2013; 5:2075–81. [PubMed: 24162011]

67. Lowery RK, et al. Neanderthal and Denisova genetic affinities with contemporary humans: introgression versus common ancestral polymorphisms. Gene. 2013; 530:83–94. [PubMed: 23872234]

Figure 1. Greater Middle East Variome as a hub of human genetics

a. Map of GME sub-regions. Lines define borders for admixture analysis from East Asia, Europe, Sub-Saharan Africa and the novel GME contribution (NWA: Northwest Africa, NEA: Northeast Africa, TP: Turkish Peninsula, SD: Syrian Desert, AP: Arabian Peninsula, PP: Persia and Pakistan). Pie charts: admixture proportions of 1000 Genomes Project (1000G) continental populations according to K=6 clusters.

b. Global ancestry proportions (K=6) for 1000G control populations with three distinct sources of contribution. 1000G population contributions: Africa (red), Europe (green) and East Asia (blue). GME populations from west to east: NWA (purple), AP (orange), and PP (yellow) derived from the GME.

c. TreeMix phylogeny of GME along with 1000G controls representing population divergence patterns. Length of the branch proportional to population drift. GME populations grouped around the African branch, but showed a substantial divergence. YRI: Yoruba in Ibadan, LWK: Luhya in Webuye Kenya, FIN: Finnish, GBR: Great Britain, TSI: Toscani, CHS: Southern Han Chinese, CHB: Han Chinese in Beijing, JPT: Japanese in Tokyo.

d. Wright’s Fixation Index (Fst) values for all pairs of GME and 1000G European populations, showing a smaller distance between GME and European populations compared with Sub-Saharan African populations. Greatest Fst value between any two GME populations was 0.026 (i.e. a quarter of the distance between FIN and JPT).

Figure 2. Wide diversity and high inbreeding coefficients in GME substructure

a. Principal component analysis (PCA) for individuals from GME and 1000G populations. Individuals projected along PC3 and PC4 axes. Persia and Pakistan (PP), Northwest Africa (NWA) and Europe defined the limits from right, left, and top, as coinciding with geography. Arabian Peninsula (AP) defined the bottom limit, and was closest to Northeast Africa (NEA) and Syrian Desert (SD).

b. GME populations had increased rates of linkage disequilibrium decay compared to 1000G European and East Asian populations. Mean variant correlations (r²) shown for each 1,000 basepair (bp) bin from 1,000–70,000 bp.

c. Inbreeding coefficient (F) distributions for GME and 1000G populations. GME populations (purple) showed elevated F values, consistent with increased rates of consanguineous marriages. Box plots show median (horizontal line), 25%ile (45° angle), 75%ile (90° angle), minimum and maximum observations (whiskers).

d. F distributions for family structures for GME and European American (EA) trios. Mean F values correlated with expected for consanguineous offspring. Unk=unknown.

Figure 3. Distributions of short and long Runs of Homozygosity (ROH) correlate with patterns of bottlenecks and recent consanguinity

a. Sample burdens of ROH grouped by length (Short: <0.155 Mb, Medium: 0.156–1.606 Mb, Long: >1.607 Mb). GME samples (purple) showed a unique contribution of long ROH compared with other populations (*), with less in short and medium bins compared to Europe and East Asia. Total ROH in GME sub-regions overlapped with those of European and East Asian populations, likely due to greater bottlenecks in these populations.

b. Histograms of long ROH for GME, Africa, Europe, and East Asia. GME samples more frequently harbored runs >4 Mb compared to other populations. ROH >15 Mb are binned together (* peak unique to Middle East).

c. Longer GME ROH spans were enriched for rare variation, while shorter runs were enriched for more common variation. Proportion of variants binned by allele frequency for different sized ROH, binned by 0.5 Mb intervals. Probability density function calculated for each allele frequency class. Note that AFs for common alleles declined whereas AFs for rare and very rare alleles rose as ROH increased in size (Common: AF > 0.05, Rare: AF 0.05–0.01, Very Rare: AF < 0.01).

Figure 4. GME Variome facilitates the discovery of Mendelian disease genes

a–b. Comparison of rare derived allele frequencies (DAF) between GME and Exome Sequencing Project (ESP). AA: African American, EA: European-American. Hexagonal bins shaded by log number of variants within each bin. Pearson’s r suggests GME DAFs were not accurately estimated by AA or EA populations.

b. The majority of variants in the rarest DAF bins were unique to the GME. AA: found only in GME and AA. EA: found only in GME and EA. All: found in GME, EA and AA. GME Unique: found only in GME.

c. Change in per-individual burden of eight variant classes as a function of increasing the number of individuals incorporated into the GME Variome cohort. As sample size increased there was a drop in the number of unique variants, along with more accurate estimation of DAFs for rare variants. Bootstraps were sampled with replacement for 100 iterations to calculate standard errors. “High impact”: variants meeting predicted deleteriousness thresholds (see Methods).

d. Number of candidate variants for 20 families, meeting segregation and deleteriousness filtering criteria, using DAFs derived from Hereditary Spastic Paraplegia (HSP)-only families (top) or also incorporating the GME Variome (bottom). Single, Duo, Trio: families with one, two or three affected members. Colors: number of individuals sharing the variant. “0”: no other individuals carried the allele, etc. Analysis was performed using this threshold for the number of individuals sharing alleles (0,1,2,3). Note drop in number of segregating variants for any given family after the GME Variome was applied.

  • Abstract
  • Online Methods
    • 1. Definition of the Greater Middle East
    • 2. Exome Resequencing
      • 2.1 Study sample
      • 2.2 Exome resequencing, variant calling, and filtering
      • 2.3 Geographic region assignment
    • 3. Population Structure of GME
      • 3.1 Data integration
      • 3.2 Substructure analysis
      • 3.3 PCA and Wright’s Fixation Index (Fst)
      • 3.4 LD decay
      • 3.5 Estimation of inbreeding
      • 3.6 Runs of homozygosity (ROH) estimation
    • 4. Variant Annotation and Classification
      • 4.1 Variant annotation
      • 4.2 Ancestral allele identification
      • 4.3 Identity-by-state (IBS) distance to reference
      • 4.4 Hereditary Spastic Paraplegia (HSP) candidate variant analysis
    • 5. Testing for the Influence of Genetic Purging
    • 6. Neanderthal and Denisovan Introgression Analysis
  • References
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4

40246_2018_Article_152.pdf

PRIMARY RESEARCH Open Access

Molecular characterization of exonic rearrangements and frame shifts in the dystrophin gene in Duchenne muscular dystrophy patients in a Saudi community

Nasser A. Elhawary1,2*, Essam H. Jiffri3, Samira Jambi4, Ahmad H. Mufti1, Anas Dannoun1, Hassan Kordi1, Asim Khogeer5, Osama H. Jiffri3, Abdelrahman N. Elhawary6 and Mohammed T. Tayeb1

Abstract

Background: In individuals with Duchenne muscular dystrophy (DMD), exon skipping treatments to restore a wild-type phenotype or correct the frame shift of the mRNA transcript of the dystrophin (DMD) gene are mutation-specific. To explore the molecular characterization of DMD rearrangements and predict the reading frame, we simultaneously screened all 79 DMD gene exons of 45 unrelated male DMD patients using a multiplex ligation-dependent probe amplification (MLPA) assay for deletion/duplication patterns. Multiplex PCR was used to confirm single deletions detected by the MLPA.

Results: There was an obvious diagnostic delay, with a highly statistically significant difference between the age at initial symptoms and the age of clinical evaluation of DMD cases (t value, 10.3; 95% confidence interval 5.95–8.80, P < 0.0001); the mean difference between the two groups was 7.4 years. Overall, we identified 147 intragenic rearrangements: 46.3% deletions and 53.7% duplications. Most of the deletions (92.5%) were between exons 44 and 56, with exon 50 being the most frequently involved (19.1%). Eight new rearrangements, including a mixed deletion/duplication and double duplications, were linked to seven cases with DMD. Of all the cases, 17.8% had duplications with no hot spots. In addition, confirmation of the reading frame hypothesis helped account for new DMD rearrangements in this study. We found that 81% of our Saudi patients would potentially benefit from exon skipping, of which 42.9% had a mutation amenable to skipping of exon 51.

Conclusions: Our study could generate considerable data on mutational rearrangements that may promote future experimental therapies in Saudi Arabia.

Keywords: Duchenne muscular dystrophy, Dystrophin gene, Large rearrangements, Frame shift, MLPA, Saudi community

* Correspondence: [email protected]
1Department of Medical Genetics, Medicine College, Umm Al-Qura University, P.O. Box 57543, Mecca 21955, Saudi Arabia
2Department of Molecular Genetics, Faculty of Medicine, Ain Shams University, Cairo 11566, Egypt
Full list of author information is available at the end of the article

© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Elhawary et al. Human Genomics (2018) 12:18 https://doi.org/10.1186/s40246-018-0152-8

Background

Dystrophinopathies are the most common form of muscular dystrophy in childhood. They are caused by mutations in the dystrophin gene (DMD; OMIM #300377) [1, 2]. Duchenne muscular dystrophy (DMD; OMIM #310200) is a severe form of muscular dystrophy, with an incidence of 1 in 3600–5000 male births [3]. Becker muscular dystrophy (BMD; OMIM #300376) is a milder form of the disease, with an incidence of 1 in 20,000 male births [4].

DMD is characterized by rapidly progressive degeneration and necrosis of the proximal muscles and calf pseudo-hypertrophy. Most DMD patients show muscle weakness at age 2 or 3, but it may be seen as early as infancy. Patients commonly lose independent ambulation by the age of 12 and die of dilated cardiomyopathy around the second or third decade. In comparison, patients with BMD exhibit relatively minor pathological symptoms, slower progression, later onset, and longer survival. Patients with an intermediate form of the disease, intermediate muscular dystrophy (IMD), may continue to walk until they are 16 years of age [4, 5].

The DMD gene is one of the largest known genes in humans, with 79 exons (approximately 2.4 Mb of genomic DNA) [1] expressing a 427-kDa muscular protein that plays a fundamental role in stabilizing the sarcolemma. It does so by using a complex of glycoproteins associated with dystrophin to link actin filaments within the cytoskeleton and the extracellular matrix. Lack of dystrophin breaks these connections, altering the plasma membrane and finally producing myofiber degeneration and necrosis [6]. According to the reading frame hypothesis [7], DMD mutations that destroy the reading frame result in a truncated, non-functional dystrophin protein associated with a “DMD” phenotype. These mutations frequently generate a premature stop codon that activates nonsense-mediated mRNA decay [8]. On the other hand, mutations that maintain the reading frame can permit a semi-functional dystrophin protein and thus give rise to a “BMD” phenotype [4, 9]. Together, these two phenotype-genotype correlations explain more than 92% of all cases [7].

Different types of mutations have been reported in patients with DMD and BMD. These are mainly large rearrangements (deletions in approximately 60–70% of patients and duplications in approximately 7–10%), with the remainder being point mutations (mainly nonsense mutations) and small deletions or insertions [10, 11]. Most gross deletions can be detected by multiplex PCR (mPCR) [12, 13] and are clustered in the proximal and central hot spot regions [14, 15]. Although a large proportion of the duplications were reported many years ago [16, 17], most laboratories do not systematically screen for these rearrangements. Duplication analysis and the determination of at-risk carrier status of the DMD gene require quantitative investigation, which is laborious and technically demanding [18, 19]. Previous studies have applied Southern blotting [16, 20], pulsed-field gel electrophoresis, quantitative mPCR [21–23], multiplex amplifiable probe hybridization [24], and comparative genomic hybridization microarray [25].

Given that deletions and duplications of one or more exons are found in the majority (70%) of patients, it is most cost-efficient and labor-efficient to check for these mutations first. A reliable and rapid technique, multiplex ligation-dependent probe amplification (MLPA), has been applied to cover the whole DMD gene to detect deletions and duplications and to identify exactly which exons are involved [18, 26–29]. This approach reveals whether a given exon is present and allows the copy number of each exon to be calculated by comparing relative peak heights. MLPA can detect both deletions and duplications in patients as well as in female carriers. Compared to array comparative genomic hybridization, MLPA is a low-cost and technically uncomplicated method.

Although several studies have investigated exonic deletions in different populations [11, 30–36], it is unknown where these deletions occur in the Saudi population. Although a few studies have described the molecular diagnosis of DMD in Saudi patients, the large deletions associated with disease were examined in only some exons, and the studies were limited by small sample sizes [37, 38]. An MLPA strategy covering all exons of the DMD gene has been proposed to characterize the large intragenic rearrangements, which account for nearly 75% of all mutations in the gene. Accurate molecular diagnosis may thus provide information on eligibility for mutation-specific treatments. A plausible frame shift hypothesis suggests how one might reduce disease severity via exon skipping, for example, by correcting the fidelity of the translational reading frame with large DMD deletions or restoring the wild type with large DMD duplications.

To our knowledge, the present study is the first to use the MLPA strategy to identify genotype-phenotype correlations in DMD patients in a Saudi community. Our results will add valuable data on de novo mutations in this population and to the databases of different DMD web pages as well.

Methods

Ethics statement and participants

All participants were enrolled under a protocol approved by the Institutional Biomedical Ethics Committee at Umm Al-Qura University (ref. #HAPO-02-K-012). Parents of all participants gave written consent after being informed about the aim of the study.

The study included 45 unrelated male patients with DMD selected from 65 families from the western region of the Kingdom of Saudi Arabia (KSA), including Jeddah, Mecca, Taif, and Hada. Twenty additional eligible male patients did not enroll because their parents refused to share their clinical data, their clinical profiles were incomplete, or their creatine phosphokinase assessments were missing. For each patient included in the study, a clinical data sheet was recorded in the database of the Molecular Genetics Laboratory in the Department of Medical Genetics at Umm Al-Qura University. Clinical information was independent of any molecular DNA data for the DMD gene or its protein. Dystrophin probands were diagnosed by clinical geneticists or pediatricians based on strict criteria including a clinical presentation expected for DMD, family history of X-linked muscular dystrophy, or muscle biopsy with a dystrophin analysis performed using immunohistochemistry. The clinical diagnosis of dystrophin probands took into account age at onset, age at clinical evaluation, calf pseudohypertrophy, age at wheelchair confinement, cardiac function, and motor function. A histopathological study was performed before molecular DNA analysis if muscle biopsies were available. To avoid bias, we included only one case for each family. We categorized patients according to age at loss of ambulation: DMD ≤ 12 years, IMD 12–16 years, and BMD > 16 years. Cases with a family history of autosomal recessive inheritance or with normal dystrophin protein were excluded.

DNA isolation

Genomic DNA was isolated from buccal cells using the Oragene DNA-OGR-575 kit (DNA Genotek Inc., Ottawa, ON, Canada) according to the manufacturer’s protocol with some modifications. Briefly, the full buccal cells were collected within 30 min, and the Oragene tube was capped immediately. The cells were incubated with the OGR-lysis buffer in a water bath at 53 °C to release the DNA, which was then precipitated by ethanol and dissolved in elution buffer [39].

Multiplex polymerase chain reaction

The genomic DNA of all DMD patients was subjected to multiplex PCR (mPCR) to screen for DMD deletions using 15 primer sets (Additional file 1: Table S1). The oligonucleotides included flanking sequences of exons 4, 8, 12, 17, 19, 44, 45, 48, and 51 [12] and of exons 6, 13, 47, 50, 52, and 60 [13]. We modified Chamberlain’s mPCR set by omitting dimethylsulfoxide, which could result in a lower PCR yield. PCR cycling was programmed as an initial denaturation at 95 °C for 6 min (1 round); 23 rounds of denaturation at 94 °C for 30 s, annealing at 53 °C for 30 s, and elongation at 65 °C for 4 min; and a final elongation at 65 °C for 7 min [12]. Hot-start mPCR was performed using Beggs’ PCR program [13]: 95 °C for 6 min (1 round) and 25 subsequent cycles of DNA denaturation at 95 °C for 30 s, annealing at 56 °C for 1 min, and elongation at 68 °C for 4 min. Amplification reactions were carried out on a DNA Engine Dyad thermal cycler (Bio-Rad Laboratories Inc., Hercules, CA). PCR products (10–15 μl) were separated on 3% NuSieve agarose (BMA Bioproducts, Rockland, ME). The gels were viewed using the Gel Documentation and Analysis System (G-Box, SynGene, Frederick, MD, USA).

Multiplex ligation-dependent probe amplification

We analyzed all DMD cases for large deletions and large duplications using MLPA SALSA P034/P035 DMD kits (http://www.mrc-holland.com) following the manufacturer’s instructions. In brief, denaturation, hybridization, ligation, and amplification steps were performed on a DNA Engine Dyad thermal cycler (Bio-Rad Laboratories Inc., Hercules, CA). Finally, PCR amplification was performed using SALSA MLPA PCR primers labeled with the FAM dye. A mixture of 0.7 μl of PCR product, 0.2 μl of 600 LIZ GS size-standard, and 9.0 μl of Hi-Di formamide was incubated for 3 min at 86 °C and cooled at 4 °C for 2 min. The MLPA product mix was separated on a POP7 polymer (Applied Biosystems Inc., Life Technologies, Foster City, CA) at 60 °C with settings of 1.6 kV for injection voltage, 18 s for injection time, 15 kV for run voltage, and 1800 s for run time.

Data analysis

The raw data were analyzed using GeneMapper Software 5 (Applied Biosystems Inc., Life Technologies, Foster City, CA). Initial analysis was performed by visual inspection to look for missing exon-specific peaks. For the remaining samples, the peak height of each exon was divided by the two nearest control peaks. The median ratio across all samples for each peak was calculated and used as a reference for one copy. For the sake of accuracy, any normalized ratio below 0.3 was considered a possible deletion, and a normalized ratio of 1.8–2.0 was considered a duplication. The DNA of cases with single-exon deletions was re-examined by conventional PCR to validate the deletion, using primer sets and PCR conditions given in the Leiden Muscular Dystrophy pages (http://www.dmd.nl).
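The normalization and calling rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the GeneMapper workflow used in the study: the function names and peak heights are hypothetical, and combining the two nearest control peaks by their mean is an assumption, since the text does not say how the two control peaks enter the division.

```python
from statistics import median

DELETION_BELOW = 0.3            # normalized ratio below this: possible deletion
DUPLICATION_RANGE = (1.8, 2.0)  # normalized ratio in this range: duplication


def raw_ratio(exon_peak, control_peaks):
    """Exon peak height relative to its two nearest control peaks.

    Assumption: the two control peaks are combined by taking their mean.
    """
    return exon_peak / (sum(control_peaks) / len(control_peaks))


def call_exon(raw_ratios):
    """Classify one exon probe across samples.

    raw_ratios maps a sample ID to that sample's raw ratio for the probe.
    The median raw ratio across samples is the one-copy reference.
    """
    reference = median(raw_ratios.values())
    calls = {}
    for sample, ratio in raw_ratios.items():
        normalized = ratio / reference
        if normalized < DELETION_BELOW:
            calls[sample] = "possible deletion"
        elif DUPLICATION_RANGE[0] <= normalized <= DUPLICATION_RANGE[1]:
            calls[sample] = "duplication"
        else:
            calls[sample] = "no rearrangement called"
    return calls


# Hypothetical peak heights for one exon probe in five male samples.
ratios = {
    "S1": raw_ratio(1000, (990, 1010)),
    "S2": raw_ratio(50, (1000, 1000)),
    "S3": raw_ratio(1900, (1000, 1000)),
    "S4": raw_ratio(1020, (1000, 1000)),
    "S5": raw_ratio(980, (1000, 1000)),
}
calls = call_exon(ratios)
```

Ratios that fall between the two thresholds are left uncalled here because the text defines only the deletion and duplication cut-offs; a real pipeline would need an explicit rule for intermediate values.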

Databases and confirming mutations for the DMD gene

We checked all mutations recorded in this study against the available databases established by the Leiden Muscular Dystrophy pages (http://www.dmd.nl) [40], the Leiden Open Variation Database 3.0 (http://www.lovd.nl/3.0/home) [41], and UMD-DMD (http://www.umd.be/DMD/) [31, 32]. Databases for exon skipping to restore the DMD reading frame were found on sites developed by Leiden University Medical Center (http://www.exonskipping.nl/?s=exon+skipping&submit=Go) and CureDuchenne (https://www.cureduchenne.org/cure/edystrophin/).

Statistical analysis

Hardy-Weinberg equilibrium (HWE) deviation was examined for X-linked DMD cases in this study using the Online Encyclopedia for Genetic Epidemiology studies software (http://www.oege.org/software/hwe-mr-calc.shtml). We used the G*Power Software (http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/download-and-register/) for power analysis, to determine the sample size needed to achieve 80% power for t testing under a point biserial model. A priori sample size and post hoc power estimations were made using our DMD sample size, a probability of α = 0.05, and the effect size index “r” (the absolute value of the correlation coefficient in the population, 0 < “r” < 1). We used a paired t test to compare the age at onset with the age at clinical evaluation for each DMD case. A two-sided P value less than 0.05 was considered to indicate statistical significance, and 95% confidence intervals (CI) were reported for all analyses.
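For readers who want to see what the paired comparison computes, the sketch below implements the paired t statistic with the Python standard library. The ages are invented for illustration and are not study data, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t statistic, degrees of freedom, mean difference) for paired samples."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    # Per-patient difference: later measurement minus earlier measurement.
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = mean(diffs)
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean_diff / standard_error, len(diffs) - 1, mean_diff


# Invented ages (years) for three patients; not taken from the study.
age_at_onset = [2, 3, 4]
age_at_evaluation = [9, 11, 10]
t_value, dof, delay = paired_t(age_at_onset, age_at_evaluation)
```

Turning the t statistic into a P value or confidence interval additionally requires the t distribution, which the standard library does not provide; a statistics package would be needed for that step.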

Results

Clinical profile

Among 45 unrelated patients, 21 were diagnosed with DMD, 10 with IMD, and 5 with BMD. The unassigned patients were defined as not determined (ND), as they were too young to permit a definitive diagnosis (n = 9). The median age at onset was 3.5 years (range 1.0–7.0 years), while the median presenting age was 11.5 years (1.5–20 years) (Fig. 1). Most of the patients reported initial symptoms between 1 and 3 years of age (71.1%, 32/45), followed by those reporting symptoms at 4–5 years (24.4%, 11/45). The age at clinical evaluation was most frequently between 10 and 12 years (35.6%, 16/45). We found a highly significant difference between the age at initial symptoms and the age at clinical evaluation of DMD cases (t value, 10.3; 95% CI 5.95–8.80, P < 0.0001). The mean difference between the two ages was 7.4 years.

Hardy-Weinberg equilibrium

All affected males were in HWE for the DMD gene deletions/duplications (χ2 = 1.00, P = 0.317); heterozygotes are absent under this X-linked recessive mode of inheritance.

Large-scale rearrangements

Using mPCR, we identified 55 large deletions in the 45 unrelated DMD patients. MLPA detected 147 intragenic rearrangements, 68 (46.3%) of which were large deletions and 79 (53.7%) of which were large duplications. All deletions identified by mPCR were confirmed by the MLPA-based screening. The utility of the MLPA assay for all exons is clear, as 13 (19%) of 68 deletions were detected using MLPA but not by conventional mPCR analysis. The percentages of cases with deletions and duplications were 46.7% (21/45) and 17.8% (8/45), respectively. Table 1 lists the previously described large rearrangements that were identified in the present study.

New mutations in the DMD gene

We also identified seven previously undescribed large DMD rearrangements from eight Saudi cases. These new mutations, based on the DMD databases [31, 32, 40, 41], included one mixed rearrangement (del 45–52 + dup 21–23), one large deletion (del 45–56), two large duplications (dup 8–30 and dup 17–24), and three double duplications (dup 2–4 + dup 18–19, dup 13 + dup 21–24, and dup 56–58 + dup 62–64). These large mutations were from eight (17.8%) of the Saudi patients (Table 2).

Distribution of rearrangements

In this study, deletions were not randomly distributed. We found that 92.5% (63/68) of deletions fell in the hot spot spanning exons 44–56 (central region), whereas 7.5% (5/68) involved exons 10–20. Exon 50 was most frequently involved in deletions (19.1%, 13/68), followed by exons 48 and 49 (each 11.8%, 8/68). The rate of deletions increased from a minimum in exon 44 to a maximum in exon 50 and then decreased toward the 3′ end of the DMD gene, with no deletion in exons 57–79 (distal region) (Fig. 2). Moreover, the number of cases with a deletion of only one exon was lower than the number with deletions of more than one exon (9/21, 42.9% versus 12/21, 57.1%). About half of the deletions (44.4%) were detected only once, in agreement with the high allelic heterogeneity of the DMD gene.

Fig. 1 The age at onset and the age of clinical evaluation of DMD patients in this study. The analysis of DMD cases showed an apparent diagnostic delay

Duplications were distributed in the proximal (68/79, 86.1%), central (5/79, 6.3%), and distal regions (6/79, 7.6%) (Fig. 2). Unlike deletions, duplicated exons were more frequent in the proximal region (26 duplications) than in the central and distal regions (13 duplications). The most frequent duplications were of exons 21, 22, and 23 (7.6%, 6/79 each), followed by exons 18 and 19 (5.1%, 4/79 each). We did not find any duplications in exons 31–49 (central region) or exons 69–79 (distal region) within our cases (Fig. 2). Similar to deletions, 42.5% of exonic duplications (17/40) were observed only once, revealing a considerable heterogeneity of duplications.
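The per-exon deletion frequencies reported above can be recomputed from the deletion ranges listed in Tables 1 and 2. The sketch below is an illustration rather than the authors' analysis: the list is transcribed from the MLPA column of those tables, and the variable names are hypothetical.

```python
from collections import Counter

# (first exon, last exon) of the MLPA-detected deletion in each of the 21 cases,
# transcribed from Tables 1 and 2.
DELETIONS = [
    (10, 11), (18, 20), (44, 44), (44, 48), (45, 45), (45, 50), (45, 52),
    (47, 50), (47, 50), (48, 48), (49, 50), (49, 50), (50, 50), (50, 50),
    (50, 50), (50, 50), (50, 52), (51, 51), (55, 55),  # Table 1
    (45, 52), (45, 56),                                 # Table 2
]

# Count every deleted exon once per case in which it is deleted.
exon_counts = Counter(
    exon for first, last in DELETIONS for exon in range(first, last + 1)
)
total = sum(exon_counts.values())
central = sum(n for exon, n in exon_counts.items() if 44 <= exon <= 56)
single_exon_cases = sum(1 for first, last in DELETIONS if first == last)

for exon, n in exon_counts.most_common(3):
    print(f"exon {exon}: {n}/{total} ({100 * n / total:.1f}%)")
```

With this transcription, the totals agree with the text when each deleted exon is counted once per case: 68 deleted exons across 21 cases, 63 of them in exons 44–56, 13 involving exon 50, and 9 single-exon cases.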

Table 1 Previously described large rearrangements identified in this study and their reading frame shifts

| Family no. | Phenotype | Multiplex PCR | MLPA del/dup | Exon(s) del/dup (nt) | Codons del/dup | Frame shift | Amino acid change (a) | cDNA (a) |
|---|---|---|---|---|---|---|---|---|
| DS-23 | DMD | No del | Del 10–11 | 371 | 123 2/3 | Stop at 323 | p.His321PhefsX3 | c.961_1331del |
| DS-1 | DMD | Del 19 | Del 18–20 | 454 | 151 1/3 | −1 | p.Arg723Lys874 | c.2169_2622del |
| DS-37 | IMD | Del 44 | Del 44 | 148 | 49 1/3 | Stop at 2113 | p.Arg2098AsnfsX16 | c.6291_6438del |
| DS-34 | ND | Del 44–48 | Del 44–48 | 808 | 269 1/3 | −1 | p.Arg2098Gln2366del | c.6291_7098del |
| DS-24 | DMD | Del 45 | Del 45 | 176 | 58 2/3 | Stop at 2163 | p.Glu2147AlafsX17 | c.6439_6614del |
| DS-8 | IMD | Del 45–50 | Del 45–50 | 871 | 290 1/3 | Stop at 2155 | p.Glu2147LeufsX9 | c.6439_7309del |
| DS-38 | IMD | Del 45–52 | Del 45–52 | 1222 | 407 1/3 | Stop at 2168 | p.Glu2147LeufsX22 | c.6439_7660del |
| DS-29 | DMD | Del 47–50 | Del 47–50 | 547 | 182 1/3 | Stop at 2263 | p.Val2257LeufsX7 | c.6763_7309del |
| DS-20 | ND | Del 47–50 | Del 47–50 | 547 | 182 1/3 | Stop at 2263 | p.Val2257LeufsX7 | c.6763_7309del |
| DS-48 | BMD | Del 48 | Del 48 | 186 | 62 | In-frame | p.Val2305Gln2366del | c.6913_7098del |
| DS-30 | DMD | Del 50 | Del 49–50 | 211 | 70 1/3 | Stop at 2375 | p.Glu2367LeufsX9 | c.7099_7309del |
| DS-31 | IMD | Del 50 | Del 49–50 | 211 | 70 1/3 | Stop at 2375 | p.Glu2367LeufsX9 | c.7099_7309del |
| DS-36 | DMD | Del 50 | Del 50 | 109 | 36 1/3 | Stop at 2409 | p.Arg2401LeufsX9 | c.7201_7309del |
| DS-32 | IMD | Del 50 | Del 50 | 109 | 36 1/3 | Stop at 2409 | p.Arg2401LeufsX9 | c.7201_7309del |
| DS-33 | ND | Del 50 | Del 50 | 109 | 36 1/3 | Stop at 2409 | p.Arg2401LeufsX9 | c.7201_7309del |
| DS-35 | ND | Del 50 | Del 50 | 109 | 36 1/3 | Stop at 2409 | p.Arg2401LeufsX9 | c.7201_7309del |
| DS-27 | ND | Del 50–52 | Del 50–52 | 460 | 153 1/3 | Stop at 2422 | p.Arg2401LeufsX22 | c.7201_7660del |
| DS-12 | DMD | Del 51 | Del 51 | 233 | 77 2/3 | Stop at 2469 | p.Ser2437CysfsX33 | c.7310_7542del |
| DS-18 | DMD | No del | Del 55 | 190 | 63 1/3 | Stop at 2700 | p.Val2677ThrfsX24 | c.8028_8217del |
| DS-11 | ND | No del | Dup 50–51 | 343 | 114 | In-frame | p.Arg2401Lys2514dup | c.7201-?_7542+?dup |

DMD Duchenne muscular dystrophy, IMD intermediate muscular dystrophy, BMD Becker muscular dystrophy, ND not determined
(a) These data are based on the Leiden Muscular Dystrophy Pages (http://www.dmd.nl/) and the UMD-DMD (http://www.umd.be/DMD/)

Reading frame shift and phenotype correlation

Gene rearrangements (deletions and duplications) were correlated with clinical phenotypes in 28 unrelated cases: 11 (39.3%) with DMD, 7 (25%) with IMD, 3 (10.7%) with BMD, and 7 (25%) with ND. We also predicted the translational reading frame in the 28 cases with rearrangements identified in this study, using the Leiden Muscular Dystrophy pages (http://www.dmd.nl). The predictions were consistent with the reading frame rule for 90.9% (10/11) of the individuals with DMD phenotypes and 100% (7/7) of the individuals with IMD phenotypes. Likewise, the rearrangements in all cases with BMD phenotypes had in-frame effects on the dystrophin protein (cases #DS-48, #DS-50, and #DS-52) (Tables 1 and 2). All previously described rearrangements we detected gave rise to a stop codon and thus a truncated protein, except for case #DS-48 with a BMD phenotype and case #DS-11 with ND, which gave rise to in-frame predictions (Table 1). Two cases identified in this study (#DS-53 and #DS-22) reflected both in-frame and reading frame shift predictions (Fig. 3). The complex rearrangement of case #DS-53 (del 45–52 + dup 21–23) was associated with an IMD phenotype, with translational reading frame predictions showing both in-frame and frame shift patterns (Fig. 3). This phenotype may have arisen because the addition of exons 21–23 to the mRNA transcript lessened the damaging effect of del 45–52 on the functional protein. In contrast, in case #DS-22 the in-frame dup 62–64 could not compensate for the harmful dup 56–58, giving rise to a DMD phenotype (Table 2).
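The reading frame rule applied above reduces to simple arithmetic: a rearrangement whose size is a multiple of three nucleotides is predicted to keep the reading frame, and any other size is predicted to shift it. The sketch below applies this to cDNA descriptions of the form used in Table 1. It is a simplified illustration with hypothetical function names; it handles only plain c.START_ENDdel or c.START_ENDdup descriptions and does not replace the Leiden reading-frame checker used in the study.

```python
import re

# Matches simple descriptions such as "c.6913_7098del" or "c.32_265dup".
_CDNA = re.compile(r"^c\.(\d+)_(\d+)(del|dup)$")


def rearrangement_size(cdna):
    """Number of nucleotides spanned by a simple cDNA deletion or duplication."""
    match = _CDNA.match(cdna)
    if match is None:
        raise ValueError(f"unsupported cDNA description: {cdna!r}")
    start, end = int(match.group(1)), int(match.group(2))
    return end - start + 1


def predicted_frame(cdna):
    """Apply the reading frame rule: sizes divisible by 3 keep the frame."""
    return "in-frame" if rearrangement_size(cdna) % 3 == 0 else "frame shift"


# Examples taken from Table 1.
examples = [
    ("DS-48", "c.6913_7098del"),
    ("DS-37", "c.6291_6438del"),
    ("DS-12", "c.7310_7542del"),
]
for case, cdna in examples:
    print(case, rearrangement_size(cdna), predicted_frame(cdna))
```

The rule gives a prediction only; as the results above show, one DMD case in this cohort did not follow it.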

Discussion The present study used a facile, reliable, and time- consuming MLPA strategy to identify large rearrange- ments covering all 79 exons of the DMD gene. Our re- sults showed the prevalence of 46.7 and 17.8%, respectively, for large deletions and large duplications in 45 Saudi patients with DMD. Unlike the hot spot dele- tions in exons 44–56 (92.5%), the hot spot deletions near the 5′ end of the gene were not distinctive, and no large hot spot duplications were found anywhere along the DMD gene. The presence of an unusual MLPA pattern in our Saudi sample, including non-contiguous duplica- tions as well as contiguous deletions combined with

Table 2 New DMD mutational rearrangements identified in this study and their predicted reading frame shifts

Case no. Phenotype Multiplex PCR MLPA del/dup Exon(s) del/dup Codons del/dup Frame shift Amino acid changea

DS-2 DMD No del dup 2–4 and dup 18–19 233; 212

77 2/3

70 2/3 + 2 + 2

p.Val89MetfsX15 b

p.Lys724Gly795fsX1

DS-14 DMD No del dup 8–30 3679 1226 1/3 + 1 p.Val218K1412fsX

DS-15 ND No del dup 8–30 3679 1226 1/3 + 1 p.Val218K1412fsX

DS-52 BMD No del dup 13 and dup 21–24 120 654

40 218

0 0

p.Val495Val535dup p.Asp875Lys1093dup

DS-50 BMD No del dup 17–24 1248 428 0 p.Ile665Lys1093dup

DS-53 IMD del 45–51 del 45–52 and dup 21–23 1222; 540

407 1/3

180 − 1c

0 p.Glu2147LeufsX22 C

p.Asp875Asn1055dup

DS-25 IMD del 45–51 del 45–56 1952 650 2/3 − 2 p.Glu2147Ser2798

DS-22 DMD No del dup 56–58 and dup 62–64 451 198

150 1/3

66 + 1 0

p.Ser2798Lys2891fsX4 p.Ser3056Asp3122

DMD Duchenne muscular dystrophy, IMD intermediate muscular dystrophy, BMD Becker muscular dystrophy, ND not determined aTheoretical amino acid change based on the database of the Leiden Muscular Dystrophy Pages (http://www.dmd.nl/) bPreviously described duplication at cDNA (c.6439_7660del) showing amino acid change (p.Glu2147LeufsX22) resulting in a termination transcript at codon 2168 cPreviously described deletion at cDNA (c.32_265dup) showing amino acid change (p.Val89MetfsX15) resulting in a termination transcript at codon 103

Fig. 2 The frequency of large mutational rearrangements for each exon of the DMD gene. A region with a high frequency of deletions was found in exons 44–56. No such region of frequency was detected for large duplications

Elhawary et al. Human Genomics (2018) 12:18 Page 6 of 11

non-contiguous duplications, suggests complex rear- rangements. Our findings regarding double, separate du- plications and complex rearrangements are consistent with some previous reports in Serbian and South African patients [18, 42]. Results from MLPA-related studies among different

ethnic populations are conflicting in terms of rates of large rearrangements within the DMD gene. Studies have found rates of deletions (and duplications) of 71.8– 79.0% (16.4–19.8%) in Chinese [29], 79.5% (6.5%) in In- dian [43], 60% (10.0%) in Japanese [44], 45.5–71.8% (16. 7%) in Korean [35], and 28.2% (20.5%) in Taiwanese [45] populations. When compared with our sample, a Turk- ish sample has also been shown to have a relatively higher rate of deletions within the DMD gene (63.7%)

[46], likely because of admixture with other European ethnicities. Rates of DMD deletions and duplications in some other Middle Eastern populations are more similar to what we found: Egyptian (51.3% deletions) [47], Iran- ian (51% deletions) [48], Moroccan (51% deletions) [49], and Syrian (49.0% deletions; 9.8% duplications) [50]. The majority of the reported DMD gene mutations in our Saudi data showed translational reading frame shifts (94. 4%), while 5.6% of the mutations did not follow the reading frame rule. This latter outcome is relatively con- sistent with the corresponding values in the TREAT- NMD DMD Global database (7%) [11], the UMD-DMD database (4%) [32], and the Leiden database (9%) [41]. The overall rate of consanguinity in KSA is 57.7%, ran-

ging from 34 to 80.6% [51], with lower rates in Mecca

Fig. 3 A schematic overview of new complex large rearrangements in the DMD gene. a The case #DS-53 with an unusual mixed rearrangement (dup 21–23 + del 45–52) leads to an out-of-frame shift giving rise to a severe DMD phenotype. b The case #DS2 with a double duplication (dup 2–4 + dup 18–19) results in out-of-frame shifts with a DMD phenotype. c The case #DS-52 with two in-frame shift due to double duplications (dup 13 + dup 21–24) giving rise to a BMD phenotype. d The case #DS-22 showed two double duplications within the mature mRNA giving an out-frame (dup 56–58), in-frame (dup 62–64) mutations giving rise to a DMD phenotype

Elhawary et al. Human Genomics (2018) 12:18 Page 7 of 11

(North Western region) than in Riyadh (Central region) (44.1% versus 62.8%) [51]. This may account for the in- creased deletions in patients of Riyadh (21/27, 77.8%) when compared with those in our study [37]. During Muslim immigration from the Levant, Africa, in Ancient Islamic times, much intermarriage reinforced gene flow of the DMD gene to the Saudi people. This has likely in- fluenced the prevalence of different Mendelian patterns, particularly X-linked types, exemplified by the consistency of data for DMD rearrangements between our study and a recent Spanish cohort study (46.1%, 131/284 for deletions and 56/284, 19.7% for duplica- tions) [52]. It is noteworthy that some populations have inherent

reproductive barriers that prevent interbreeding, which keeps them at native levels without merging (i.e., cryptic taxa). Other populations may lack inherent reproductive isolation. Therefore, admixture among different geo- graphical populations might increase genetic variations and perhaps create new genotypic combinations within non-isolated (or non-native) populations [53]. Thus, genetic variations among Gulf Arabs and some Middle Eastern individuals (e.g., Barbarians in North Africa, Kurdish, Upper Egyptian) [54, 55] should be handled with caution, as increased consanguinity, extensive re- productive isolation, and admixture with native source populations (e.g., Black Africans, South Eastern Asians, Caucasians) have had substantial roles in gene flow or founder effects in these populations. In our study, the analysis of DMD cases showed an ap-

parent diagnostic delay, as 69.8% of our patients showed their first symptoms at an early age (1–3 years), but 44% of these patients were 9–12 years old at first clinical examination. Other countries have also reported long delays in diagnosis of the disease, with a mean delay be- tween 1.6 and 2.5 years [56, 57]. In south China, the first symptoms occurred by 3 years of age, but the age at clinical evaluation was 6–8 years [36]. Numerous studies have advocated raising public awareness to identify early symptoms in DMD patients [47, 57, 58], as parents are usually the first to notice symptoms, which prompt them to visit a health professional. To further reduce diagnos- tic delay, creatine phosphokinase (CPK) testing should be emphasized in primary care and performed as a rou- tine test in children’s physical examinations. Earlier clinical trials reported the safety and biochem-

ical efficacy of intravenous or intramuscular administra- tion of antisense oligonucleotides (20-30 mer) to bring hope to DMD patients with large deletions [59]. There- fore, inducing exon 51 skipping to restore the open reading frame is an attractive therapeutic strategy that can be achieved with splice-switching oligomers. After the US Food and Drug Administration (FDA) acceler- ated approval of AVI-4658/eteplirsen (Exondys 51;

Sarepta Therapeutics Inc., Cambridge, MA, USA), targeting DMD exon 51 skipping, eteplirsen was approved and introduced in some countries [60–63]. Eteplirsen is useful for patients with amenable DMD deletions, ending at exon 50 and starting at exon 52 [64]. To date, eteplirsen has not been approved by the Saudi FDA (https://www.sfda.gov.sa). Hence, numerous efforts have used antisense oligomers to target exon skipping of exon 53 (SRP4053, PRO053), exon 45 (DS-514b, SRP4045), and exon 44 (PRO044) (https://www.clinicaltrials.gov/beta/home) [62, 65]. Based on our data for deletions, exon skipping could eventually apply to 81% (17/21) of DMD cases with large deletions. Among our Saudi patients with DMD gene deletions, the exons most frequently amenable to skipping were exons 51 (42.9%, 9/21), 53 (14.3%, 3/21), 44 (9.5%, 2/21), 45 (4.8%, 1/21), 43 (4.8%, 1/21), and 50 (4.8%, 1/21). Wein et al. recently reported efficient skipping of single duplicated exons in the DMD gene, with skipping of each duplicated exon yielding a wild-type, full-length mRNA [66]. For more than one duplicated exon, several antisense oligomers can be delivered as a cocktail of drugs to skip larger regions of the transcript. Thus, for duplications in exons 45–55, therapeutic skipping can be applied to more than 60% of all DMD patients [67].

Although statistical power is conventionally considered for polygenic disorders, power under different monogenic models of inheritance has not been systematically considered. This may be explained by a wide range of issues, for example, the rate of background variation in disease-associated genes, mode of inheritance, extent of penetrance, and locus heterogeneity.

In contrast to the dominant model of inheritance, incomplete penetrance does not hold for the recessive model, and in consequence, much smaller sample sizes are needed under a recessive model, even in the presence of high locus heterogeneity [68]. According to our a priori sample size estimations at an effect size of r = 0.3 (medium effect) or r = 0.5 (strong effect), we would need 64 or 21 samples, respectively, to ensure a detection power of 80%. Thus, post hoc analysis using our DMD sample size in this study (n = 45 cases) could achieve a power of 66.7% (r = 0.3) and 98.5% (r = 0.5).

Pinning down the spectrum of mutations for DMD has been difficult because of poor replication of studies. First, when compared with our study, some studies have had populations with admixed ethnicities, conflicting outcomes, or small sample sizes, which lessen the strength of the overall results. Second, various molecular technologies have been utilized to examine DMD patients, resulting in a broad range of false-positive or false-negative results regarding rearrangements. Our study mainly used the DMD MLPA test, providing a cheap and straightforward DNA-based test that can screen for deletions and duplications and be performed in any DNA laboratory. Third, insufficient communication between clinicians and geneticists, because of difficulty accessing hospitals of interest, may result in underdiagnosis of critical cases. However, precise coordination between clinicians and geneticists may help promote and improve the genetic diagnosis of dystrophinopathies and ameliorate potential therapies in these cases.

Elhawary et al. Human Genomics (2018) 12:18 Page 8 of 11
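
The a priori and post hoc power figures quoted in the Discussion can be approximated with a short calculation. The sketch below is only an illustration: the text does not state which test or software was used, so it assumes a one-sided test of a correlation coefficient at alpha = 0.05 and uses the Fisher z approximation. The function names are hypothetical. Under these assumptions the approximation gives somewhat larger sample sizes and slightly lower power than the exact values reported above.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def required_n(r, alpha=0.05, power=0.80):
    """Smallest n for a one-sided test of rho = 0 (Fisher z approximation).

    Assumption: one-sided alpha; the source does not state the test used.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = _STD_NORMAL.inv_cdf(power)
    fisher_z = math.atanh(r)  # Fisher transform of the effect size r
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


def achieved_power(r, n, alpha=0.05):
    """Approximate post hoc power for effect size r with n samples."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha)
    return _STD_NORMAL.cdf(math.atanh(r) * math.sqrt(n - 3) - z_alpha)


for r in (0.3, 0.5):
    print(
        f"r = {r}: n for 80% power = {required_n(r)}, "
        f"power at n = 45 = {achieved_power(r, 45):.3f}"
    )
```

An exact calculation based on the bivariate normal model, rather than this approximation, would be needed to reproduce the published figures precisely.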

Conclusions

We detected nine previously undescribed exonic rearrangements within the DMD gene, including one unusual mixed rearrangement. MLPA or mPCR can be used to define the molecular characteristics of DMD rearrangements and hence the effects of the frame shifts on genotype-phenotype correlations in Saudi patients. This information will also be important for future gene therapy targeting exon skipping of the DMD gene. Our clinical characteristics revealed a diagnostic delay, suggesting the need for more public awareness about early symptoms of disease. In addition, CPK testing should be performed as a routine test in children’s hospitals and in primary care settings. In KSA, molecular testing of DMD patients should be covered by medical insurance, at least once in a lifetime. This single test could lead to genetic diagnosis of more patients. The large deletions and duplications we identified are predictive and intriguing, but the study needs to be replicated in different ethnic populations of the Middle East, as well as in other Saudi governorates. Because the sample size for this study might not have been large enough to explore the DMD mutational mechanisms, extensive sequencing analyses will be needed to discover the DMD breakpoints at the nucleotide level. Ongoing analyses of whole-exome sequences for Saudi patients with DMD are being carried out to identify the small breakpoints within the DMD gene.

Additional file

Additional file 1: Table S1. Oligonucleotide Sequences of 15 multiplex PCR sets and amplification size fragments. (DOCX 17 kb)

Abbreviations

BMD: Becker muscular dystrophy; CPK: Creatine phosphokinase; DMD: Duchenne muscular dystrophy; FDA: Food and Drug Administration; IMD: Intermediate muscular dystrophy; KSA: Kingdom of Saudi Arabia; MLPA: Multiplex ligation-dependent probe amplification; ND: Not determined

Acknowledgements

The authors would like to thank the parents of the cases for their participation in this study. The authors also thank the Institute of Scientific Research at Umm Al-Qura University (Project #43309030) for financial support and the Faculty of Medicine, Cairo University-Giza, Egypt, for allowing ANE to assist in following up the clinical phenotypes of the DMD cases.

Funding

This work was funded through grants from the Institute of Scientific Research at Umm Al-Qura University (Project #43309030).

Availability of data and materials

The data sets analyzed during the current study are available from the corresponding author.

Authors’ contributions

NAE and MTT designed the research; NAE, MTT, SJ, AD, EHJ, ANE, HK, and AK performed the research; NAE, MTT, and AHM analyzed the data; NAE, MTT, NB, KFA, and MR wrote the paper. Also, NAE and MTT initiated the grant funding through a contract with the Institute of Scientific Research at Umm Al-Qura University. All authors read and approved the final manuscript.

Ethics approval and consent to participate

Written informed consent was obtained from the parents of all the participants enrolled in this project (#43309030), which was approved by the Institutional Biomedical Ethics Committee of Umm Al-Qura University. The study was performed in accordance with the declaration of the National Committee of Biomedical Ethics at King Abdulaziz City for Sciences and Technology (KACST) (http://bioethics.kacst.edu.sa/About.aspx?lang=en-US).

Consent for publication

Written informed consent was obtained from the parents of all study participants to publish the results.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author details

1 Department of Medical Genetics, Medicine College, Umm Al-Qura University, P.O. Box 57543, Mecca 21955, Saudi Arabia.
2 Department of Molecular Genetics, Faculty of Medicine, Ain Shams University, Cairo 11566, Egypt.
3 Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences, King Abdul-Aziz University, Jeddah, Saudi Arabia.
4 Department of Pediatrics, Al Hada Military Hospital, Al Hada, Saudi Arabia.
5 Department of Plan and Research, General Directorate of Health Affairs, Mecca Region, Ministry of Health, Mecca, Saudi Arabia.
6 Department of Pediatrics, Faculty of Medicine, Cairo University, Giza, Egypt.

Received: 27 January 2018 Accepted: 2 April 2018

References

1. Hoffman EP, Brown RH Jr, Kunkel LM. Dystrophin: the protein product of the Duchenne muscular dystrophy locus. Cell. 1987;51(6):919–28.

2. Koenig M, Monaco AP, Kunkel LM. The complete sequence of dystrophin predicts a rod-shaped cytoskeletal protein. Cell. 1988;53(2):219–28.

3. Emery AE. Population frequencies of inherited neuromuscular diseases--a world survey. Neuromuscul Disord. 1991;1(1):19–29.

4. Bushby K, Finkel R, Birnkrant DJ, Case LE, Clemens PR, Cripe L, Kaul A, Kinnett K, McDonald C, Pandya S, et al. Diagnosis and management of Duchenne muscular dystrophy, part 1: diagnosis, and pharmacological and psychosocial management. Lancet Neurol. 2010;9(1):77–93.

5. Jarmin S, Kymalainen H, Popplewell L, Dickson G. New developments in the use of gene therapy to treat Duchenne muscular dystrophy. Expert Opin Biol Ther. 2014;14(2):209–30.

6. Durbeej M, Campbell KP. Muscular dystrophies involving the dystrophin-glycoprotein complex: an overview of current mouse models. Curr Opin Genet Dev. 2002;12(3):349–61.

7. Monaco AP, Bertelson CJ, Liechti-Gallati S, Moser H, Kunkel LM. An explanation for the phenotypic differences between patients bearing partial deletions of the DMD locus. Genomics. 1988;2(1):90–5.

8. Hentze MW, Kulozik AE. A perfect message: RNA surveillance and nonsense-mediated decay. Cell. 1999;96(3):307–10.

9. Muntoni F, Torelli S, Ferlini A. Dystrophin and mutations: one gene, several proteins, multiple phenotypes. Lancet Neurol. 2003;2(12):731–40.

10. Flanigan KM, Dunn DM, von Niederhausern A, Howard MT, Mendell J, Connolly A, Saunders C, Modrcin A, Dasouki M, Comi GP, et al. DMD Trp3X nonsense mutation associated with a founder effect in North American families with mild Becker muscular dystrophy. Neuromuscul Disord. 2009;19(11):743–8.

11. Bladen CL, Salgado D, Monges S, Foncuberta ME, Kekou K, Kosma K, Dawkins H, Lamont L, Roy AJ, Chamova T, et al. The TREAT-NMD DMD Global Database: analysis of more than 7,000 Duchenne muscular dystrophy mutations. Hum Mutat. 2015;36(4):395–402.

12. Chamberlain JS, Gibbs RA, Ranier JE, Nguyen PN, Caskey CT. Deletion screening of the Duchenne muscular dystrophy locus via multiplex DNA amplification. Nucleic Acids Res. 1988;16(23):11141–56.

13. Beggs AH, Koenig M, Boyce FM, Kunkel LM. Detection of 98% of DMD/BMD gene deletions by polymerase chain reaction. Hum Genet. 1990;86(1):45–8.

14. Forrest SM, Cross GS, Speer A, Gardner-Medwin D, Burn J, Davies KE. Preferential deletion of exons in Duchenne and Becker muscular dystrophies. Nature. 1987;329(6140):638–40.

15. Oudet C, Hanauer A, Clemens P, Caskey T, Mandel JL. Two hot spots of recombination in the DMD gene correlate with the deletion prone regions. Hum Mol Genet. 1992;1(8):599–603.

16. Den Dunnen JT, Grootscholten PM, Bakker E, Blonden LA, Ginjaar HB, Wapenaar MC, van Paassen HM, van Broeckhoven C, Pearson PL, van Ommen GJ. Topography of the Duchenne muscular dystrophy (DMD) gene: FIGE and cDNA analysis of 194 cases reveals 115 deletions and 13 duplications. Am J Hum Genet. 1989;45(6):835–47.

17. Hu XY, Ray PN, Murphy EG, Thompson MW, Worton RG. Duplicational mutation at the Duchenne muscular dystrophy locus: its frequency, distribution, origin, and phenotype-genotype correlation. Am J Hum Genet. 1990;46(4):682–95.

18. Lalic T, Vossen RH, Coffa J, Schouten JP, Guc-Scekic M, Radivojevic D, Djurisic M, Breuning MH, White SJ, den Dunnen JT. Deletion and duplication screening in the DMD gene using MLPA. Eur J Hum Genet. 2005;13(11):1231–4.

19. Elhawary NA, Shawky RM, Elsayed N. High-precision DNA microsatellite genotyping in Duchenne muscular dystrophy families using ion-pair reversed-phase high performance liquid chromatography. Clin Biochem. 2006;39(7):758–61.

20. Koenig M, Hoffman EP, Bertelson CJ, Monaco AP, Feener C, Kunkel LM. Complete cloning of the Duchenne muscular dystrophy (DMD) cDNA and preliminary genomic organization of the DMD gene in normal and affected individuals. Cell. 1987;50(3):509–17.

21. Ioannou P, Christopoulos G, Panayides K, Kleanthous M, Middleton L. Detection of Duchenne and Becker muscular dystrophy carriers by quantitative multiplex polymerase chain reaction analysis. Neurol. 1992;42(9):1783–90.

22. Kodaira M, Hiyama K, Karakawa T, Kameo H, Satoh C. Duplication detection in Japanese Duchenne muscular dystrophy patients and identification of carriers with partial gene deletions using pulsed-field gel electrophoresis. Hum Genet. 1993;92(3):237–43.

23. Yau SC, Bobrow M, Mathew CG, Abbs SJ. Accurate diagnosis of carriers of deletions and duplications in Duchenne/Becker muscular dystrophy by fluorescent dosage analysis. J Med Genet. 1996;33(7):550–8.

24. White S, Kalf M, Liu Q, Villerius M, Engelsma D, Kriek M, Vollebregt E, Bakker B, van Ommen GJ, Breuning MH, et al. Comprehensive detection of genomic duplications and deletions in the DMD gene, by use of multiplex amplifiable probe hybridization. Am J Hum Genet. 2002;71(2):365–74.

25. del Gaudio D, Yang Y, Boggs BA, Schmitt ES, Lee JA, Sahoo T, Pham HT, Wiszniewska J, Chinault AC, Beaudet AL, et al. Molecular diagnosis of Duchenne/Becker muscular dystrophy: enhanced detection of dystrophin gene rearrangements by oligonucleotide array-comparative genomic hybridization. Hum Mutat. 2008;29(9):1100–7.

26. Schouten JP, McElgunn CJ, Waaijer R, Zwijnenburg D, Diepvens F, Pals G. Relative quantification of 40 nucleic acid sequences by multiplex ligation-dependent probe amplification. Nucleic Acids Res. 2002;30(12):e57.

27. Schwartz M, Duno M. Improved molecular diagnosis of dystrophin gene mutations using the multiplex ligation-dependent probe amplification method. Genet Test. 2004;8(4):361–7.

28. Janssen B, Hartmann C, Scholz V, Jauch A, Zschocke J. MLPA analysis for the detection of deletions, duplications and complex rearrangements in the dystrophin gene: potential and pitfalls. Neurogenetics. 2005;6(1):29–35.

29. Chen C, Ma H, Zhang F, Chen L, Xing X, Wang S, Zhang X, Luo Y. Screening of Duchenne muscular dystrophy (DMD) mutations and investigating its mutational mechanism in Chinese patients. PLoS One. 2014;9(9):e108038.

30. Nobile C, Toffolatti L, Rizzi F, Simionati B, Nigro V, Cardazzo B, Patarnello T, Valle G, Danieli GA. Analysis of 22 deletion breakpoints in dystrophin intron 49. Hum Genet. 2002;110(5):418–21.

31. Cotton RG, Auerbach AD, Beckmann JS, Blumenfeld OO, Brookes AJ, Brown AF, Carrera P, Cox DW, Gottlieb B, Greenblatt MS, et al. Recommendations for locus-specific databases and their curation. Hum Mutat. 2008;29(1):2–5.

32. Tuffery-Giraud S, Beroud C, Leturcq F, Yaou RB, Hamroun D, Michel-Calemard L, Moizard MP, Bernard R, Cossee M, Boisseau P, et al. Genotype-phenotype analysis in 2,405 patients with a dystrophinopathy using the UMD-DMD database: a model of nationwide knowledgebase. Hum Mutat. 2009;30(6):934–45.

33. Mitsui J, Takahashi Y, Goto J, Tomiyama H, Ishikawa S, Yoshino H, Minami N, Smith DI, Lesage S, Aburatani H, et al. Mechanisms of genomic instabilities underlying two common fragile-site-associated loci, PARK2 and DMD, in germ cell and cancer cell lines. Am J Hum Genet. 2010;87(1):75–89.

34. Ankala A, Kohn JN, Hegde A, Meka A, Ephrem CL, Askree SH, Bhide S, Hegde MR. Aberrant firing of replication origins potentially explains intragenic nonrecurrent rearrangements within genes, including the human DMD gene. Genome Res. 2012;22(1):25–34.

35. Suh MR, Lee KA, Kim EY, Jung J, Choi WA, Kang SW. Multiplex ligation-dependent probe amplification in X-linked recessive muscular dystrophy in Korean subjects. Yonsei Med J. 2017;58(3):613–8.

36. Wang DN, Wang ZQ, Yan L, He J, Lin MT, Chen WJ, Wang N. Clinical and mutational characteristics of Duchenne muscular dystrophy patients based on a comprehensive database in South China. Neuromuscul Disord. 2017;27(8):715–22.

37. Al-Jumah M, Majumdar R, Al-Rajeh S, Chaves-Carballo E, Salih MM, Awada A, Al-Shahwan S, Al-Uthaim S. Deletion mutations in the dystrophin gene of Saudi patients with Duchenne and Becker muscular dystrophy. Saudi Med J. 2002;23(12):1478–82.

38. Tayeb MT. Deletion mutations in Duchenne muscular dystrophy (DMD) in Western Saudi children. Saudi J Biol Sci. 2010;17(3):237–40.

39. Elhawary NA, Nassir A, Saada H, Dannoun A, Qoqandi O, Alsharif A, Tayeb MT. Combined genetic biomarkers confer susceptibility to risk of urothelial bladder carcinoma in a Saudi population. Dis Markers. 2017;2017:1474560.

40. Aartsma-Rus A, Van Deutekom JC, Fokkema IF, Van Ommen GJ, Den Dunnen JT. Entries in the Leiden Duchenne muscular dystrophy mutation database: an overview of mutation types and paradoxical cases that confirm the reading-frame rule. Muscle Nerve. 2006;34(2):135–44.

41. White SJ, den Dunnen JT. Copy number variation in the genome; the human DMD gene as an example. Cytogenet Genome Res. 2006;115(3–4):240–6.

42. Kerr R, Robinson C, Essop FB, Krause A. Genetic testing for Duchenne/Becker muscular dystrophy in Johannesburg, South Africa. S Afr Med J. 2013;103(12 Suppl 1):999–1004.

43. Manjunath M, Kiran P, Preethish-Kumar V, Nalini A, Singh RJ, Gayathri N. A comparative study of mPCR, MLPA, and muscle biopsy results in a cohort of children with Duchenne muscular dystrophy: a first study. Neurol India. 2015;63(1):58–62.

44. Okubo M, Minami N, Goto K, Goto Y, Noguchi S, Mitsuhashi S, Nishino I. Genetic diagnosis of Duchenne/Becker muscular dystrophy using next-generation sequencing: validation analysis of DMD mutations. J Hum Genet. 2016;61(6):483–9.

45. Liang WC, Wang CH, Chou PC, Chen WZ, Jong YJ. The natural history of the patients with Duchenne muscular dystrophy in Taiwan: a medical center experience. Pediatr Neonatol. 2017.

46. Ulgenalp A, Giray O, Bora E, Hizli T, Kurul S, Sagin-Saylam G, Karasoy H, Uran N, Dizdarer G, Tutuncuoglu S, et al. Deletion analysis and clinical correlations in patients with Xp21 linked muscular dystrophy. Turk J Pediatr. 2004;46(4):333–8.

47. Elhawary NA, Shawky RM, Hashem N. Frameshift deletion mechanisms in Egyptian Duchenne and Becker muscular dystrophy families. Mol Cells. 2004;18(2):141–9.

48. Nouri N, Fazel-Najafabadi E, Salehi M, Hosseinzadeh M, Behnam M, Ghazavi MR, Sedghi M. Evaluation of multiplex ligation-dependent probe amplification analysis versus multiplex polymerase chain reaction assays in the detection of dystrophin gene rearrangements in an Iranian population subset. Adv Biomed Res. 2014;3:72.

49. Sbiti A, El Kerch F, Sefiani A. Analysis of dystrophin gene deletions by multiplex PCR in Moroccan patients. J Biomed Biotechnol. 2002;2(3):158–60.

50. Madania A, Zarzour H, Jarjour RA, Ghoury I. Combination of conventional multiplex PCR and quantitative real-time PCR detects large rearrangements in the dystrophin gene in 59% of Syrian DMD/BMD patients. Clin Biochem. 2010;43(10–11):836–42.

51. el-Hazmi MA, al-Swailem AR, Warsy AS, al-Swailem AM, Sulaimani R, al-Meshari AA. Consanguinity among the Saudi Arabian population. J Med Genet. 1995;32(8):623–6.

52. Vieitez I, Gallano P, Gonzalez-Quereda L, Borrego S, Marcos I, Millan JM, Jairo T, Prior C, Molano J, Trujillo-Tiebas MJ, et al. Mutational spectrum of Duchenne muscular dystrophy in Spain: study of 284 cases. Neurologia. 2017;32(6):377–85.

53. Lavergne S, Molofsky J. Increased genetic variation and evolutionary potential drive the success of an invasive grass. Proc Natl Acad Sci U S A. 2007;104:3883–8.

54. Rund D, Cohen T, Filon D, Dowling CE, Warren TC, Barak I, Rachmilewitz E, Kazazian HH Jr, Oppenheim A. Evolution of a genetic disease in an ethnic isolate: beta-thalassemia in the Jews of Kurdistan. Proc Natl Acad Sci U S A. 1991;88(1):310–4.

55. Jiffri EH, Bogari N, Zidan KH, Teama S, Elhawary NA. Molecular updating of β-thalassemia mutations in the upper Egyptian population. Hemoglobin. 2010;34(6):538–47.

56. Ciafaloni E, Fox DJ, Pandya S, Westfield CP, Puzhankara S, Romitti PA, Mathews KD, Miller TM, Matthews DJ, Miller LA, et al. Delayed diagnosis in Duchenne muscular dystrophy: data from the Muscular Dystrophy Surveillance, Tracking, and Research Network (MD STARnet). J Pediatr. 2009;155(3):380–5.

57. van Ruiten HJ, Straub V, Bushby K, Guglieri M. Improving recognition of Duchenne muscular dystrophy: a retrospective case note review. Arch Dis Child. 2014;99(12):1074–7.

58. Li X, Zhao L, Zhou S, Hu C, Shi Y, Shi W, Li H, Liu F, Wu B, Wang Y. A comprehensive database of Duchenne and Becker muscular dystrophy patients (0−18 years old) in East China. Orphanet J Rare Dis. 2015;10:5.

59. Cirak S, Arechavala-Gomeza V, Guglieri M, Feng L, Torelli S, Anthony K, Abbs S, Garralda ME, Bourke J, Wells DJ, et al. Exon skipping and dystrophin restoration in patients with Duchenne muscular dystrophy after systemic phosphorodiamidate morpholino oligomer treatment: an open-label, phase 2, dose-escalation study. Lancet. 2011;378(9791):595–605.

60. Mendell JR, Goemans N, Lowes LP, Alfano LN, Berry K, Shao J, Kaye EM, Mercuri E, Eteplirsen Study G, Telethon Foundation DMDIN. Longitudinal effect of eteplirsen versus historical control on ambulation in Duchenne muscular dystrophy. Ann Neurol. 2016;79(2):257–71.

61. Shimizu-Motohashi Y, Miyatake S, Komaki H, Takeda S, Aoki Y. Recent advances in innovative therapeutic approaches for Duchenne muscular dystrophy: from discovery to clinical trials. Am J Transl Res. 2016;8(6):2471–89.

62. Lee BL, Nam SH, Lee JH, Ki CS, Lee M, Lee J. Genetic analysis of dystrophin gene for affected male and female carriers with Duchenne/Becker muscular dystrophy in Korea. J Korean Med Sci. 2012;27(3):274–80.

63. Lim KR, Maruyama R, Yokota T. Eteplirsen in the treatment of Duchenne muscular dystrophy. Drug Des Devel Ther. 2017;11:533–45.

64. van Deutekom JC, van Ommen GJ. Advances in Duchenne muscular dystrophy gene therapy. Nat Rev Genet. 2003;4(10):774–83.

65. Mah JK. Current and emerging treatment strategies for Duchenne muscular dystrophy. Neuropsychiatr Dis Treat. 2016;12:1795–807.

66. Wein N, Vulin A, Findlay AR, Gumienny F, Huang N, Wilton SD, Flanigan KM. Efficient skipping of single exon duplications in DMD patient-derived cell lines using an antisense oligonucleotide approach. J Neuromuscul Dis. 2017;4(3):199–207.

67. Aoki Y, Yokota T, Nagata T, Nakamura A, Tanihata J, Saito T, Duguez SM, Nagaraju K, Hoffman EP, Partridge T, et al. Bodywide skipping of exons 45-55 in dystrophic mdx52 mice by systemic antisense delivery. Proc Natl Acad Sci U S A. 2012;109(34):13763–8.

68. Guo MH, Dauber A, Lippincott MF, Chan YM, Salem RM, Hirschhorn JN. Determinants of power in gene-based burden testing for monogenic disorders. Am J Hum Genet. 2016;99(3):527–39.


Arab genome-Health and Wealth-2016.pdf

Gene 592 (2016) 239–243

Review

The Arab genome: Health and wealth

Hatem Zayed

College of Health and Sciences, Biomedical Sciences Department, Qatar University, PO Box 2713, Doha, Qatar

E-mail address: [email protected].

http://dx.doi.org/10.1016/j.gene.2016.07.007 0378-1119/© 2016 Published by Elsevier B.V.

Article info

Article history: Received 21 June 2016; Accepted 3 July 2016; Available online 5 July 2016

Keywords: Arab countries; Human genome sequencing; Whole exome sequencing; Consanguinity; Endogamous marriage; Novel genes; Novel variants

Abstract

The 22 Arab nations have a unique genetic structure, which reflects both conserved and diverse gene pools due to the prevalent endogamous and consanguineous marriage culture and the long history of admixture among different ethnic subcultures descended from the Asian, European, and African continents. Human genome sequencing has enabled large-scale genomic studies of different populations and has become a powerful tool for studying disease predictions and diagnosis. Despite the importance of the Arab genome for better understanding the dynamics of the human genome, discovering rare genetic variations, and studying early human migration out of Africa, it is poorly represented in human genome databases, such as HapMap and the 1000 Genomes Project. In this review, I demonstrate the significance of sequencing the Arab genome and setting an Arab genome reference(s) for better understanding the molecular pathogenesis of genetic diseases, discovering novel/rare variants, and identifying a meaningful genotype-phenotype correlation for complex diseases.

© 2016 Published by Elsevier B.V.

Contents

1. Introduction
2. The Arab world
   2.1. Inbred Arab communities and rare variants discovery
3. The Arab genome
   3.1. Discovery of novel disease-causing genes and the Arab genome
   3.2. Arab efforts in genome sequencing
   3.3. The Arab genome and the “Out of Africa” theory
   3.4. Benefits of sequencing the Arab genome
4. Conclusion
Disclosure declaration
References

1. Introduction

The completion of the Human Genome Project (HGP) in April 2003 provided a wealth of information to scientists and clinicians. Subsequently, the world has witnessed rapid evolution in the field of human genetics and genomics (Lander et al., 2001; Venter et al., 2001). Initially, the focus of the HGP was to catalog the protein-expressing genes, which are now estimated to include approximately 20,000 to 25,000 coding genes (International Human Genome Sequencing Consortium, 2004). However, the hard work of decoding the function of many genes and their precise genotype-phenotype correlation in disease development remains.

From the publication of the first draft of the human genome, there has been fierce competition to develop sequencing technologies that are faster, more efficient and cheaper and to make the price of human genome sequencing more affordable. Thus far, whole genome/exome sequencing has provided outstanding insights into the frequency and incidence of novel variants in the human genome that are associated with disease phenotypes. This information gives different populations around the world the opportunity to map the sequence variants that might be unique to their own individuals and that might be responsible for genetic disorders in their specific populations. For this purpose, the HapMap (human haplotype mapping) Project was launched in 2002 (International HapMap Consortium, 2003); this project has identified a considerable number of genetic variants, providing extensive catalogs for genetic variation. The HapMap Project has also served as the basis for genome-wide association studies (GWAS). In particular, the HapMap Project has contributed to the successful mapping of more than 100 genomic regions that are associated with genetic diseases (International HapMap Consortium, 2003).

As an extension of the HapMap Project, the 1000 Genomes Project was launched in 2008 through international concerted efforts (Buchanan et al., 2012). This project aims to sequence the whole genomes of 1000 unidentified individuals from Europe, America, Africa, and Asia, and will add information to the single-nucleotide polymorphism (SNP) database already cataloged by the HapMap Project and provide a rich resource for both SNPs and structural variant haplotypes. Although this information will allow researchers to learn more about many genetic variants and genetic diseases, the Arab genome is unfortunately greatly under-represented in such international genomic studies; specifically, it is not included in the HGP, HapMap Project, or 1000 Genomes Project. Sequencing the Arab genome is undoubtedly important, and this genome should not be omitted from the diverse collections of genomes that have already been sequenced. Therefore, I am focusing this review on elaborating upon the importance of the Arab genome and its potential contribution to the genomic sciences.

2. The Arab world

The Arab world includes 22 Arabic-speaking countries (Fig. 1). According to the latest World Bank classification for 2015 (http://data.worldbank.org), the Arab countries include high-income countries (HICs) such as Bahrain, Kuwait, Oman, Saudi Arabia, Qatar, and the United Arab Emirates; middle-income countries (MICs) such as Algeria, Egypt, Iraq, Jordan, Lebanon, Libya, Morocco, Palestine, Sudan, Syria, and Tunisia; and low-income countries (LICs) such as Comoros, Djibouti, Mauritania, Somalia, and Yemen. These countries occupy a large area that extends from the Atlantic Ocean in the west to the Arabian Sea in the east, and the Arab population is approaching 0.5 billion. This region has been extensively exposed to many successive invaders from Turkey, Rome, and Europe as well as to traders and immigrants, thus contributing to mixing of the ethnic demographics of the population. However, the HICs, which include countries with the highest Gross Domestic Product (GDP) per capita worldwide (http://data.worldbank.org), spend less than 0.2% of their GDP on scientific development (Giles, 2006). This phenomenon has led to the emigration of many Arab scientists to the West to look for better opportunities. Recently, however, biomedical disease-based research has received special attention from Arab governments, with the aim of improving the understanding and treatment of common diseases afflicting the Arab population. Various attempts have been made by Saudi Arabia and Qatar in particular to establish a research infrastructure, but progress has been slow relative to the amount of capital infused into such programs, and the benefits of such investments might take significant time to yield results. In this manuscript I will refer to the “Arab genome” as the genome of the 22 Arab countries.

Fig. 1. Arabic speaking countries according to the latest WHO classification. (Source: http://www.arabic-keyboard.org/arabic).

2.1. Inbred Arab communities and rare variants discovery

There are 955 genetic diseases that have been identified in Arabs, of which 586 (60%) are reported to be recessive diseases (http://www.cags.org.ae). Arabs have one of the highest rates of consanguineous marriage worldwide, reaching up to ~70%, with an extreme prevalence of first-cousin marriage (Tadmouri et al., 2009). These factors, together with the endogamous marriage culture and large family sizes, are responsible for the spread of genetic diseases in Arab countries, with a high prevalence of rare diseases (Teebi and Teebi, 2005). Endogamous marriages approach 100% in many Arab countries, especially the Gulf States (i.e., Bahrain, Kuwait, Oman, Qatar, Saudi Arabia and the United Arab Emirates). For example, women in Saudi Arabia are prohibited from marrying men other than Arab men from the Gulf countries without special dispensation from the king (http://web.archive.org/web/20120614045804/http://travel.state.gov/travel/cis_pa_tw/tw/tw_931.html), and men must acquire a government permit to marry a foreign woman. This law is applicable to the six Gulf States and is due to deeply entrenched, centuries-old traditions that strongly favor marriage within the same Arab subcultures. In addition, this marriage culture is still on the rise; for example, consanguineous marriage rates in Qatar increased from 41.8% to 54.5% in just one generation (Bener and Alali, 2006).

A large number of rare variants still have unknown clinical significance because of the limitations of current technologies: establishing their significance requires large numbers of individuals harboring these variants, which are largely untested by high-density SNP arrays. Therefore, studying inbred communities such as Arab communities is an ideal scenario to understand the effect of genetic variants on the human genome. In this regard, genetic analysis of the Arab genome is considered to be a goldmine for genomic scientists who are looking for a more discernible correlation between the genotype and the phenotype of genetic diseases, particularly complex disorders and rare genetic disorders. The inbreeding nature of many Arab communities and the commonness of the conservative marriage culture might predict a wide class of complex disorders, especially if the causative variants are rare and most of the identified genetic variants causing complex diseases in humans are partially recessive (Bittles and Black, 2010; Rudan et al., 2003). In this regard, Arabs represent an ideal population for better understanding the pathogenesis and prognosis of recessive diseases, which are yet to be elucidated. Although the consanguineous, endogamous Arab culture seems to predict a conserved pool of genes among Arabs, the structure of the Arab genome became diversified over time, mainly due to admixing of the genome with those of different ethnic groups descended from Africa, Asia, and Europe (Teebi and Teebi, 2005), which provides another opportunity for understanding the dynamics of the Arab genome and the “out of Africa” migration theory.
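
The link drawn in this section between consanguinity and recessive disease can be illustrated with the standard population-genetics expression for the expected frequency of affected homozygotes under inbreeding, q² + Fq(1 − q), where q is the disease-allele frequency and F is the inbreeding coefficient (1/16 for the offspring of first cousins). Neither the formula nor the allele frequencies below come from this review; they are assumptions chosen only for illustration, and the function name is hypothetical.

```python
def affected_frequency(q, f=0.0):
    """Expected frequency of recessive homozygotes: q^2 + F*q*(1 - q)."""
    if not (0.0 <= q <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("q and f must lie between 0 and 1")
    return q * q + f * q * (1.0 - q)


F_FIRST_COUSINS = 1 / 16  # inbreeding coefficient for offspring of first cousins

for q in (0.01, 0.001):  # hypothetical disease-allele frequencies
    random_mating = affected_frequency(q)
    first_cousins = affected_frequency(q, F_FIRST_COUSINS)
    print(
        f"q = {q}: random mating {random_mating:.2e}, "
        f"first cousins {first_cousins:.2e}, "
        f"ratio {first_cousins / random_mating:.1f}"
    )
```

Under these assumptions the relative increase in affected offspring grows as the allele becomes rarer, which is consistent with the emphasis placed here on rare recessive variants in inbred communities.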

3. The Arab genome

Although the Arab region is considered to be a hot spot for medical and clinical genetic studies (Nat. Genet., 2006), Arabs have been slow to explore their own genome. This reticence might be due to the following reasons: (1) in most Arab countries, it is not yet affordable to sequence a genome, even for clinical diagnostic reasons, despite the continually diminishing costs of next-generation sequencing technologies; (2) research is not considered to be a necessity in most Arab countries, mainly due to economic reasons; and (3) there is a dearth of well-trained scientists in genomics. As a consequence, there is a lack of information related to molecular pathogenesis and poor knowledge of both the genotype-phenotype correlation of genetic diseases and the gene variants segregating in the Arab genome that are responsible for the spread of these diseases. This is the case even for the most devastating diseases, such as diabetes and cardiovascular disorders, which compromises the level of health care provided to the Arab population. Therefore, Arab governments must prioritize seeking the means to understand the complexity and dynamics of the Arab genome, especially in countries that are able to afford the costs of genome sequencing. Consistent with this concept, a genomic revolution has been ignited in the Arabian Peninsula, especially in the Gulf States of Saudi Arabia, Kuwait, and Qatar. Like the US Encyclopedia of DNA Elements (ENCODE) project, the Arab genome initiatives, represented by the Saudi Human Genome Project (SHGP) (http://shgp.kacst.edu.sa/site), the Qatar Genome Project (QGP) (Al-Mulla, 2014), and the Kuwaiti Genome Project (KGP) (Thareja et al., 2015), aim to systematically and comprehensively analyze and catalog the genetic variants and haplotypes that are associated with health and disease. These efforts are expected to help in the identification of novel disease-associated gene variants. The initiatives also aim to derive reference genome sequence(s) for subpopulations of different ancestries in Kuwait. Although Arab scientists are a decade late in sequencing the Arab genome, this sequencing is expected to contribute to knowledge related to migration, genome ancestry, genome evolution, genome dynamics, mapping of rare disease-associated variants, and novel disease-associated gene discovery.

3.1. Discovery of novel disease-causing genes and the Arab genome

Inbreeding is associated with an increased disease risk based on increased homozygosity at many genetic loci (Rudan et al., 2003). It also leads to a high probability of shared ancestry between randomly selected Arab individuals and to longer runs of homozygosity, which makes highly consanguineous families in inbred Arab communities ideal for mapping rare disease susceptibility loci. A representative example was provided by Verge et al. (1998), who analyzed an inbred Bedouin Arab community with a long history of first-cousin marriage. They studied a large Arab family of 248 individuals living in Israel, in which 19 relatives were affected with type 1 diabetes and carried rare predisposing haplotypes that were not found in other families. Interestingly, the researchers discovered a novel susceptibility locus (IDDM17; MIM#603266) for type 1 diabetes, which was mapped to chromosome 10 (10q25.1). Another example is the identification of a novel locus defined by the TMEM107 mutation through sequencing 25 families with the rare ciliopathy Meckel-Gruber syndrome (Shaheen et al., 2015). A further study led to the discovery of six novel candidate genes that were found to be associated with embryonic lethality in Saudi Arabian consanguineous families (Shamseldin et al., 2015).
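The role of runs of homozygosity in locus mapping can be sketched with a toy scan over one individual's genotype calls. This is a simplified illustration under stated assumptions, not the method used in the studies cited above: genotypes are assumed to be coded as the count of alternate alleles per marker (0 or 2 = homozygous, 1 = heterozygous), run length is measured in markers rather than physical distance, and no genotyping errors or missing calls are tolerated. The function name and the threshold are hypothetical.

```python
from typing import List, Tuple


def runs_of_homozygosity(genotypes: List[int], min_len: int = 3) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) marker indices of maximal homozygous runs.

    genotypes: alternate-allele count per marker along one chromosome
               (0 or 2 = homozygous, 1 = heterozygous); assumed coding.
    min_len:   minimum number of consecutive homozygous markers to report.
    """
    runs: List[Tuple[int, int]] = []
    start = None  # index where the current homozygous run began, if any
    for i, g in enumerate(genotypes):
        if g in (0, 2):
            if start is None:
                start = i
        else:
            # A heterozygous call ends the current run.
            if start is not None and i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last marker.
    if start is not None and len(genotypes) - start >= min_len:
        runs.append((start, len(genotypes) - 1))
    return runs


if __name__ == "__main__":
    calls = [0, 2, 2, 1, 0, 0, 0, 0, 1, 2]  # made-up genotype calls
    print(runs_of_homozygosity(calls, min_len=3))
```

In a family-based design, segments of this kind that are shared by all affected relatives but not by unaffected ones would be the candidate regions for a recessive disease locus.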

Whole-exome sequencing (WES) has also revealed a long list of novel candidate genes among consanguineous Arab families, including, but not limited to, 69 genes linked to recessive diseases in 143 multiplex Saudi families, which were not previously associated with genetic diseases (Alazami et al., 2015). Diagnostic WES has also identified several novel disease-associated genes among 149 probands belonging to the highly consanguineous population in Qatar, with various Mendelian phenotypes that were mainly neurocognitive (Yavarna et al., 2015). In a study of 18 consanguineous Arab families with Meckel–Gruber syndrome (MKS), WES revealed likely pathogenic mutations in three novel candidate MKS disease-causing genes (C5orf42, EVC2, and SEC8) (Shaheen et al., 2013). The ARL6IP6 gene was identified as a novel candidate gene for a syndromic form of cutis marmorata telangiectatica congenita (CMTC) in a Saudi consanguineous family (Abumansour et al., 2015). Therefore, the Arab genome carries significant potential for advancing the fields of clinical and medical genetics.

3.2. Arab efforts in genome sequencing

The SHGP is a 5-year project launched in December 2013 that involves a partnership between the SHGP and Life Technologies (http://shgp.kacst.edu.sa/site). The aim of the project is to sequence 100,000 Saudi genomes that represent both normal and disease conditions to identify Saudi-specific genetic variants that are linked to high-incidence genetic diseases in Saudi Arabia, such as diabetes, deafness, cardiovascular disorders, cancer, and neurodegenerative diseases (Abu-Elmagd et al., 2015). The SHGP's specific mission is to establish a genotype-phenotype correlation for genetic disease and to create a foundation for personalized medicine, in which treatment will be developed based on the DNA blueprint of each Saudi individual. This approach is expected to reduce the cost of health care, as the health care expenses related to human genetic disease are greater than $30 billion annually in Saudi Arabia (http://shgp.kacst.edu.sa/site).

A few days after the SHGP announcement, Qatar announced its intention to launch the QGP and a plan to sequence the genomes of all Qatari citizens (~300,000) (Al-Mulla, 2014). Similarly to the SHGP, the QGP seeks to protect Qatari citizens in the future from the spread of genetic diseases due to the deeply entrenched culture of endogamous and consanguineous marriage by understanding the genomic make-up of the Qatari population and integrating the sequencing information into clinical care for Qatari individuals. The data collected from the genome sequencing will be used as a platform for developing molecular diagnostic approaches customized to Arabs (Zayed and Ouhtit, 2016), will help to create the foundation of personalized medicine in the Arabian Peninsula, and are expected to advance prenatal screening and genetic counseling for disease-carrying individuals in Qatar. The QGP has already started its pilot phase by sequencing 3000 Qatari citizens (http://www.qatar-tribune.com/viewnews.aspx?d=20151214&cat=nation2&pge=5). Computational analyses that aim to decode the Qatari genome and map the genetic variants that are unique to Qatari individuals are supported by generous competitive funding from Qatar Foundation (https://www.qf.org.qa). These sequencing data are kept in electronic medical records, which will be an integral part of the Qatari National Health Service.

The KGP is an initiative to determine the genetic diversity of the main ethnic groups that constitute the Kuwaiti population, namely, Saudi Arabians, Bedouins, and Persians, who ascribe their origins to different regions of the Arabian Peninsula and West Asia (modern Iran). This project is the first to report a reference genome resource for the population of Persian ancestry in Kuwait (Thareja et al., 2015).

3.3. The Arab genome and the “Out of Africa” theory

The modern Arab gene pool exhibits a very interesting genetic structure: it has numerous pockets of inbred communities due to the prevalence of consanguineous unions, conserved pools of genomes due to widespread endogamous marriage, and a mixed gene pool due to the history of Arab nations and the admixture of their genomes with those of people from Europe, Africa, and Asia. This diversity is important in terms of understanding genome evolution and dynamics, answering the “Out of Africa” human migration question, and providing insights into the migration routes of early modern humans from Africa to Eurasia. The primary African origin of all modern human populations is well known, but the routes of human migration out of Africa are still uncertain. One potential route is through the Levant. Although the North African genomic background stems mainly from the Near East/Arabian Peninsula, the genomic ancestry of the Arabs of North Africa also shows an African component due to historical mixing with sub-Saharan African genomes (Henn et al., 2012). Another potential route is to the south, across the Arabian Peninsula, which is a nexus of Asia, Africa, and Europe (Kopp et al., 2014). Interestingly, Fernandes et al. (2012) focused on disentangling the impact of several waves of migration into the Arabian Peninsula in terms of African input and provided evidence that the Arabian Peninsula could have been the first staging post in the spread of modern humans from Africa to the rest of the world.

Interestingly, sequencing of just 13 exomes and 2 full genomes in Kuwait revealed ancestral genomic signatures stemming from Asia, Europe, and Africa (Alsmadi et al., 2014; Alsmadi et al., 2013). Egypt is an Afro-Asian Arab country that shares the Mediterranean Sea with European countries (Fig. 1), and it has been proposed as a potential source of the exodus of the African genome to Eurasia (Pagani et al., 2015) according to geographical, archaeological, and genetic evidence. African genomic components have been mapped in Egyptian genomes (Pagani et al., 2015); however, most of the analyzed Egyptian haplotypes were genetically similar to those of modern non-Africans. The study concluded that Egypt was a potential gateway for the migration of the African genome to the rest of the world. Therefore, comparing Egyptian genomes with European ones addresses the exit route through Egypt, whereas comparing Ethiopian genomes with Arab genomes addresses the southern route of the out-of-Africa migration.

3.4. Benefits of sequencing the Arab genome

Given the frequent spread of genetic diseases in Arab countries, establishing reference genome(s) that reflect the diversity and population structure of Arab countries will serve as an example for other communities with comparable population structures and will have many benefits, including, but not limited to, (1) serving as a vital tool for the identification of novel variants; (2) serving as a baseline for further genomic epidemiological studies in Arab nations; (3) serving as a useful foundation for cohort and case-control genetic studies that aim to characterize the genetic etiology of genetic diseases; (4) improving genetic counseling for individuals with genetic disorders; (5) serving as a platform for future GWAS; (6) advancing translational medicine in the fields of personalized medicine and pharmacogenomics, allowing medications to be individualized to Arab patients and Arab responses to drugs to become well understood; (7) allowing the study of inbred Arab communities, and specifically the Bedouin population, thus serving as a valuable tool to facilitate the discovery of rare and novel gene variants and novel genes; this information is very important to better understand the molecular pathology of complex diseases/traits and is expected to shed light on gene-environment interactions and epistasis as well as many other genetic risk factors with major importance in genetic disease development; and (8) serving as a historical tracing tool for population migration.

The ultimate goal of sequencing the Arab genome is to create a database of the DNA variation in the Arab population and to make it available to clinicians and researchers in Arab countries who seek to increase the power of disease prediction, to understand gene-drug interactions, to study the Arab population substructures, to improve understanding of the nature of Arab genetic diversity, and to trace population migration. All of these endeavors will contribute to one major aim, which is to improve patients' quality of life by improving overall health care and saving lives. However, translating the results of the Arab genome into effective clinical practice is a challenging task that will require concerted efforts by both policymakers and scientists to implement effective strategies in the health care sector and to make funding available to allow such programs to continue.

4. Conclusion

Arabs are an ideal population for genetic studies, with a diverse genetic structure, ranging from inbred communities to a diverse gene pool that includes elements from Europe, Asia, and Africa. This feature renders the Arab population a rich source of information that would be of global benefit and emphasizes the value of consensus Arab genome reference(s), which will positively impact the future directions of personalized medicine. Using genomic sequencing technologies, numerous rare variants and novel genes have been identified in Arab families, mainly those with a history of consanguineous marriage. The outcomes of the SHGP and QGP are soon to be released, which will pave the way for future consensus Arab genome reference(s). Therefore, there is an urgent need for data sharing, both locally and internationally, which dictates the need for the development of mechanisms and standards to facilitate this sharing.

Disclosure declaration

Hatem Zayed declares no conflict of interest.

References

Abu-Elmagd, M., Assidi, M., Schulten, H.J., Dallol, A., Pushparaj, P., Ahmed, F., Scherer, S.W., Al-Qahtani, M., 2015. Individualized medicine enabled by genomics in Saudi Arabia. BMC Med. Genet. 8 (Suppl. 1), S3.

Abumansour, I.S., Hijazi, H., Alazmi, A., Alzahrani, F., Bashiri, F.A., Hassan, H., Alhaddab, M., Alkuraya, F.S., 2015. ARL6IP6, a susceptibility locus for ischemic stroke, is mutated in a patient with syndromic Cutis Marmorata Telangiectatica Congenita. Hum. Genet. 134, 815–822.


Alazami, A.M., Patel, N., Shamseldin, H.E., Anazi, S., Al-Dosari, M.S., Alzahrani, F., Hijazi, H., Alshammari, M., Aldahmesh, M.A., Salih, M.A., Faqeih, E., Alhashem, A., Bashiri, F.A., Al-Owain, M., Kentab, A.Y., Sogaty, S., Al Tala, S., Temsah, M.-H., Tulbah, M., Aljelaify, R.F., Alshahwan, S.A., Seidahmed, M.Z., Alhadid, A.A., Aldhalaan, H., AlQallaf, F., Kurdi, W., Alfadhel, M., Babay, Z., Alsogheer, M., Kaya, N., Al-Hassnan, Z.N., Abdel-Salam, G.M.H., Al-Sannaa, N., Al Mutairi, F., El Khashab, H.Y., Bohlega, S., Jia, X., Nguyen, H.C., Hammami, R., Adly, N., Mohamed, J.Y., Abdulwahab, F., Ibrahim, N., Naim, E.A., Al-Younes, B., Meyer, B.F., Hashem, M., Shaheen, R., Xiong, Y., Abouelhoda, M., Aldeeri, A.A., Monies, D.M., Alkuraya, F.S., 2015. Accelerating novel candidate gene discovery in neurogenetic disorders via whole-exome sequencing of prescreened multiplex consanguineous families. Cell Rep. 10, 148–161.

Al-Mulla, F., 2014. The locked genomes: a perspective from Arabia. Applied & Translational Genomics 3, 132–133.

Alsmadi, O., Thareja, G., Alkayal, F., Rajagopalan, R., John, S.E., Hebbar, P., Behbehani, K., Thanaraj, T.A., 2013. Genetic substructure of Kuwaiti population reveals migration history. PLoS One 8, e74913.

Alsmadi, O., John, S.E., Thareja, G., Hebbar, P., Antony, D., Behbehani, K., Thanaraj, T.A., 2014. Genome at juncture of early human migration: a systematic analysis of two whole genomes and thirteen exomes from Kuwaiti population subgroup of inferred Saudi Arabian tribe ancestry. PLoS One 9, e99069.

Bener, A., Alali, K.A., 2006. Consanguineous marriage in a newly developed country: the Qatari population. J. Biosoc. Sci. 38, 239–246.

Bittles, A.H., Black, M.L., 2010. Evolution in health and medicine Sackler colloquium: consanguinity, human evolution, and complex diseases. Proc. Natl. Acad. Sci. U. S. A. 107 (Suppl. 1), 1779–1786.

Buchanan, C.C., Torstenson, E.S., Bush, W.S., Ritchie, M.D., 2012. A comparison of cataloged variation between International HapMap Consortium and 1000 Genomes Project data. J. Am. Med. Inform. Assoc. 19, 289–294.

Fernandes, V., Alshamali, F., Alves, M., Costa, M.D., Pereira, J.B., Silva, N.M., Cherni, L., Harich, N., Cerny, V., Soares, P., Richards, M.B., Pereira, L., 2012. The Arabian cradle: mitochondrial relicts of the first steps along the southern route out of Africa. Am. J. Hum. Genet. 90, 347–355.

Giles, J., 2006. Islam and science: oil rich, science poor. Nature 444, 28.

Henn, B.M., Botigue, L.R., Gravel, S., Wang, W., Brisbin, A., Byrnes, J.K., Fadhlaoui-Zid, K., Zalloua, P.A., Moreno-Estrada, A., Bertranpetit, J., Bustamante, C.D., Comas, D., 2012. Genomic ancestry of North Africans supports back-to-Africa migrations. PLoS Genet. 8, e1002397.

International HapMap Consortium, 2003. The International HapMap Project. Nature 426, 789–796.

International Human Genome Sequencing Consortium, 2004. Finishing the euchromatic sequence of the human genome. Nature 431, 931–945.

Kopp, G.H., Roos, C., Butynski, T.M., Wildman, D.E., Alagaili, A.N., Groeneveld, L.F., Zinner, D., 2014. Out of Africa, but how and when? The case of hamadryas baboons (Papio hamadryas). J. Hum. Evol. 76, 154–164.

Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., Funke, R., Gage, D., et al., 2001. Initial sequencing and analysis of the human genome. Nature 409, 860–921.

Editorial, 2006. The germinating seed of Arab genomics. Nat. Genet. 38, 851.

Pagani, L., Schiffels, S., Gurdasani, D., Danecek, P., Scally, A., Chen, Y., Xue, Y., Haber, M., Ekong, R., Oljira, T., Mekonnen, E., Luiselli, D., Bradman, N., Bekele, E., Zalloua, P., Durbin, R., Kivisild, T., Tyler-Smith, C., 2015. Tracing the route of modern humans out of Africa by using 225 human genome sequences from Ethiopians and Egyptians. Am. J. Hum. Genet. 96, 986–991.

Rudan, I., Rudan, D., Campbell, H., Carothers, A., Wright, A., Smolej-Narancic, N., Janicijevic, B., Jin, L., Chakraborty, R., Deka, R., Rudan, P., 2003. Inbreeding and risk of late onset complex disease. J. Med. Genet. 40, 925–932.

Shaheen, R., Faqeih, E., Alshammari, M.J., Swaid, A., Al-Gazali, L., Mardawi, E., Ansari, S., Sogaty, S., Seidahmed, M.Z., AlMotairi, M.I., Farra, C., Kurdi, W., Al-Rasheed, S., Alkuraya, F.S., 2013. Genomic analysis of Meckel-Gruber syndrome in Arabs reveals marked genetic heterogeneity and novel candidate genes. Eur. J. Hum. Genet. 21, 762–768.

Shaheen, R., Almoisheer, A., Faqeih, E., Babay, Z., Monies, D., Tassan, N., Abouelhoda, M., Kurdi, W., Al Mardawi, E., Khalil, M.M., Seidahmed, M.Z., Alnemer, M., Alsahan, N., Sogaty, S., Alhashem, A., Singh, A., Goyal, M., Kapoor, S., Alomar, R., Ibrahim, N., Alkuraya, F.S., 2015. Identification of a novel MKS locus defined by TMEM107 mutation. Hum. Mol. Genet. 24, 5211–5218.

Shamseldin, H.E., Tulbah, M., Kurdi, W., Nemer, M., Alsahan, N., Al Mardawi, E., Khalifa, O., Hashem, A., Kurdi, A., Babay, Z., Bubshait, D.K., Ibrahim, N., Abdulwahab, F., Rahbeeni, Z., Hashem, M., Alkuraya, F.S., 2015. Identification of embryonic lethal genes in humans by autozygosity mapping and exome sequencing in consanguineous families. Genome Biol. 16, 116.

Tadmouri, G.O., Nair, P., Obeid, T., Al Ali, M.T., Al Khaja, N., Hamamy, H.A., 2009. Consanguinity and reproductive health among Arabs. Reprod. Health 6, 17.

Teebi, A.S., Teebi, S.A., 2005. Genetic diversity among the Arabs. Community Genet. 8, 21–26.

Thareja, G., John, S.E., Hebbar, P., Behbehani, K., Thanaraj, T.A., Alsmadi, O., 2015. Sequence and analysis of a whole genome from Kuwaiti population subgroup of Persian ancestry. BMC Genomics 16, 92.

Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A., Gocayne, J.D., Amanatides, P., Ballew, R.M., Huson, D.H., Wortman, J.R., Zhang, Q., Kodira, C.D., Zheng, X.H., Chen, L., Skupski, M., Subramanian, G., Thomas, P.D., Zhang, J., Gabor Miklos, G.L., Nelson, C., Broder, S., Clark, A.G., Nadeau, J., McKusick, V.A., Zinder, N., Levine, A.J., Roberts, R.J., Simon, M., Slayman, C., Hunkapiller, M., Bolanos, R., Delcher, A., Dew, I., Fasulo, D., Flanigan, M., Florea, L., Halpern, A., Hannenhalli, S., Kravitz, S., Levy, S., Mobarry, C., Reinert, K., Remington, K., Abu-Threideh, J., Beasley, E., Biddick, K., Bonazzi, V., Brandon, R., Cargill, M., Chandramouliswaran, I., Charlab, R., Chaturvedi, K., Deng, Z., Di Francesco, V., Dunn, P., Eilbeck, K., Evangelista, C., Gabrielian, A.E., Gan, W., Ge, W., Gong, F., Gu, Z., Guan, P., Heiman, T.J., Higgins, M.E., Ji, R.R., Ke, Z., Ketchum, K.A., Lai, Z., Lei, Y., Li, Z., Li, J., Liang, Y., Lin, X., Lu, F., Merkulov, G.V., Milshina, N., Moore, H.M., Naik, A.K., Narayan, V.A., Neelam, B., Nusskern, D., Rusch, D.B., Salzberg, S., Shao, W., Shue, B., Sun, J., Wang, Z., Wang, A., Wang, X., Wang, J., Wei, M., Wides, R., Xiao, C., Yan, C., Yao, A., Ye, J., Zhan, M., Zhang, W., Zhang, H., Zhao, Q., Zheng, L., Zhong, F., Zhong, W., Zhu, S., Zhao, S., Gilbert, D., Baumhueter, S., Spier, G., Carter, C., Cravchik, A., Woodage, T., Ali, F., An, H., Awe, A., Baldwin, D., Baden, H., Barnstead, M., Barrow, I., Beeson, K., Busam, D., Carver, A., Center, A., Cheng, M.L., Curry, L., Danaher, S., Davenport, L., Desilets, R., Dietz, S., Dodson, K., Doup, L., Ferriera, S., Garg, N., Gluecksmann, A., Hart, B., Haynes, J., Haynes, C., Heiner, C., Hladun, S., Hostin, D., Houck, J., Howland, T., Ibegwam, C., Johnson, J., Kalush, F., Kline, L., Koduru, S., Love, A., Mann, F., May, D., McCawley, S., McIntosh, T., McMullen, I., Moy, M., Moy, L., Murphy, B., Nelson, K., Pfannkoch, C., Pratts, E., Puri, V., Qureshi, H., Reardon, M., Rodriguez, R., Rogers, Y.H., Romblad, D., Ruhfel, B., Scott, R., Sitter, C., Smallwood, M., Stewart, E., Strong, R., Suh, E., Thomas, R., Tint, N.N., Tse, S., Vech, C., Wang, G., Wetter, J., Williams, S., Williams, M., Windsor, S., Winn-Deen, E., Wolfe, K., Zaveri, J., Zaveri, K., Abril, J.F., Guigo, R., Campbell, M.J., Sjolander, K.V., Karlak, B., Kejariwal, A., Mi, H., Lazareva, B., Hatton, T., Narechania, A., Diemer, K., Muruganujan, A., Guo, N., Sato, S., Bafna, V., Istrail, S., Lippert, R., Schwartz, R., Walenz, B., Yooseph, S., Allen, D., Basu, A., Baxendale, J., Blick, L., Caminha, M., Carnes-Stine, J., Caulk, P., Chiang, Y.H., Coyne, M., Dahlke, C., Mays, A., Dombroski, M., Donnelly, M., Ely, D., Esparham, S., Fosler, C., Gire, H., Glanowski, S., Glasser, K., Glodek, A., Gorokhov, M., Graham, K., Gropman, B., Harris, M., Heil, J., Henderson, S., Hoover, J., Jennings, D., Jordan, C., Jordan, J., Kasha, J., Kagan, L., Kraft, C., Levitsky, A., Lewis, M., Liu, X., Lopez, J., Ma, D., Majoros, W., McDaniel, J., Murphy, S., Newman, M., Nguyen, T., Nguyen, N., Nodell, M., Pan, S., Peck, J., Peterson, M., Rowe, W., Sanders, R., Scott, J., Simpson, M., Smith, T., Sprague, A., Stockwell, T., Turner, R., Venter, E., Wang, M., Wen, M., Wu, D., Wu, M., Xia, A., Zandieh, A., Zhu, X., 2001. The sequence of the human genome. Science 291, 1304–1351.

Verge, C.F., Vardi, P., Babu, S., Bao, F., Erlich, H.A., Bugawan, T., Tiosano, D., Yu, L., Eisenbarth, G.S., Fain, P.R., 1998. Evidence for oligogenic inheritance of type 1 diabetes in a large Bedouin Arab family. J. Clin. Invest. 102, 1569–1575.

Yavarna, T., Al-Dewik, N., Al-Mureikhi, M., Ali, R., Al-Mesaifri, F., Mahmoud, L., Shahbeck, N., Lakhani, S., AlMulla, M., Nawaz, Z., Vitazka, P., Alkuraya, F.S., Ben-Omran, T., 2015. High diagnostic yield of clinical exome sequencing in Middle Eastern patients with Mendelian disorders. Hum. Genet. 134, 967–980.

Zayed, H., Ouhtit, A., 2016. Accredited genetic testing in the Arab Gulf region: reinventing the wheel. J. Hum. Genet. http://dx.doi.org/10.1038/jhg.2016.22 (Epub ahead of print).
