advanced algorithm (python)

pablo90
thebarabasialbertmodel.pdf

Advanced Algorithms and Computational Models (module A)

The Barabasi-Albert Model

Giacomo Fiumara

giacomo.fiumara@unime.it

2015-2016

1 / 67

The Barabasi-Albert Model

Hubs represent the most striking di�erence between a random

and a scale-free network

The existence of hubs such as google.com or molecules like ATP (involved in a number of chemical reactions) raises two

questions:

� Why do so di�erent system as the WWW or the cell converge

to a similar scale-free architecture?

� Why does the random network model fail to reproduce the

hubs and the power laws observed in real networks?

2 / 67

Growth and Preferential Attachment

Why are hubs and power laws absent in random networks?

The answer was found in 1999 (Barabasi and Albert, 23944

citations)

Author highlighted two hidden assumptions of the Erdös-Rényi

model

� Networks expand through the addition of new nodes

� Nodes prefer to link to the more connected nodes

3 / 67

Growth and Preferential Attachment Networks expand through the addition of new nodes

� In 1991 the WWW had a single node, the �rst webpage built

by TBL, the creator of the Web

� Today the Web has over a trillion (1012) documents

� This number was reached through the continuous addition of

new documents by millions of individuals and institutions

4 / 67

Growth and Preferential Attachment Networks expand through the addition of new nodes

5 / 67

Growth and Preferential Attachment Nodes prefer to link to the more connected nodes

The random network model assumes that we randomly choose

the interaction partner of a node

Most real network nodes prefer to link to the more connected

nodes (preferential attachment)

6 / 67

Growth and Preferential Attachment Nodes prefer to link to the more connected nodes

� We are familiar with only a tiny fraction of the trillion web

documents. We are more likely yo link to a high-degree node

than to a node with only few links

7 / 67

Growth and Preferential Attachment Nodes prefer to link to the more connected nodes

� No scientist can attempt to read the more than a million

scienti�c papers published each year. The more cited is a

paper, the more likely that we hear about it and eventually

read it. We cite what we read, therefore our citations are

biased towards the more cited publications (the high-degree

nodes of the citation network)

8 / 67

Growth and Preferential Attachment Nodes prefer to link to the more connected nodes

� The more movies an actor has played, the more familiar is a

casting director with her skills. Hence, the higher the degree of

an actor in the actor network, the higher are the chances that

she will be considered for a new role

9 / 67

Growth and Preferential Attachment Nodes prefer to link to the more connected nodes

10 / 67

Growth and Preferential Attachment

The random network model di�ers from real networks in two

important characteristics:

� Growth

Real networks are the result of a growth process that

continuously increases N. In contrast, the random network model assumes that the number of nodes N is �xed

� Preferential Attachment

In real networks new nodes tend to link to the more connected

nodes. In contrast, nodes in random networks randomly

choose their interaction partners

11 / 67

The Barabasi-Albert Model

The Barabasi-Albert (BA) model is de�ned as follows:

We start with m0 nodes, the links between which are chosen arbitrarily, as long as each node has at least one link. The

network develops following two steps:

� Growth � At each timestep we add a new node with m (≤ m0) links that connect the new node to m nodes already in the network

� Preferential attachment � The probability Π(k) that a link of the new node connects to node i depends con the degree ki as

Π(ki ) = ki∑ j kj

12 / 67

The Barabasi-Albert Model

13 / 67

The Barabasi-Albert Model

� Preferential attachment is a probabilistic mechanism

� A new node is free to connect to any node in the network,

whether it is a hub or has a single link

� However, it is highly probable that a new node connects to a

high degree node

� After t timesteps, the BA model generates a network with N = t + m0 nodes and m0 + mt links

� The obtained network has a power-law degree distribution with

degree exponent γ = 3

14 / 67

The Barabasi-Albert Model

15 / 67

Degree Dynamics

To understand the emergence of the scale-free property it is

necessary to focus on the time evolution of the BA model

� In the model, an existing node can increase its degree each

time a new node enters the network

� This node will link to m of the N(t) nodes already present in the system

� The probability that one of these links connects to node i is

Π(ki ) = ki∑ j kj

16 / 67

Degree Dynamics

� Let us approximate the degree ki with a continuous real variable

� The rate at which an existing node i acquires links as a result of new nodes connecting to it is

dki dt

= mΠ(ki ) = m ki

N−1∑ j=1

kj

� The coe�cient m describes that each node arrives with m links

� Hence, node i has m chances to be chosen

� Moreover N−1∑ j=1

kj = 2mt −m = m(2t −1)

17 / 67

Degree Dynamics

Therefore dki dt

= m ki

m(2t −1) =

ki 2t −1

For large t the −1 term can be neglected in the denominator, obtaining

dki ki

= 1

2

dt

t

which can be integrated using the fact that ki (ti ) = m (node i joins the network at time ti with m links). We obtain:

ki (t) = m

( t

ti

)β Where β = 1

2 is called dynamical exponent

18 / 67

Degree Dynamics Some considerations

ki (t) = m

( t

ti

� The degree of each node increases following a power-law with

the same dynamical exponent β. Hence all nodes follow the same dynamical law

� The growth in the degrees is sublinear (β < 1). This is a consequence of the growing nature of the BA model: each

node has more nodes to link to than the previous node.

Hence, with time the existing nodes compete for links with an

increasing pool of other nodes

19 / 67

Degree Dynamics Some considerations

ki (t) = m

( t

ti

� The earlier node i was added, the higher is its degree ki (t). Hence, hubs are large because they arrived earlier, a

phenomenon called �rst-mover advantage in marketing and

business

� The rate at which the node i acquires new links is given by

dki (t)

dt =

m

2

1 √ tit

indicating that in each time step older nodes acquire more

links (as they have smaller ti). The rate at which a node acquires links decreases with time as t−1/2

20 / 67

Degree Dynamics Some considerations

21 / 67

Degree Distribution

� To calculate the degree distribution of the BA model in the

continuum approximation we �rst calculate the number of

nodes with degree smaller than k, i.e. ki (t) < k

� Using

ki (t) = m

( t

ti

)β we can obtain

ti > t

( m

k

) 1/β

� In the BA model, a node is added at equal time step.

Therefore, the number of nodes with degree smaller than k is

t

( m

k

) 1/β

22 / 67

Degree Distribution

� Altogether there are N = m0 + t nodes, which becomes N ≈ t in the large t limit

� Therefore the probability that a randomly chosen node has

degree k or smaller, which is the cumulative degree distribution, follows

P(k) = 1− ( m

k

) 1/β

� By taking the derivative we obtain the degree distribution

pk = ∂P(k)

∂k =

1

β

m1/β

k1/β+1 = 2m2k−γ

with γ = 1 β

+ 1 = 3

23 / 67

Degree Distribution

The continuum theory predicts the correct degree exponent,

but it fails to accurately predict the pre-factors. The exact

degree distribution of the BA model can be obtained using

other approaches and is

pk = 2m(m + 1)

k(k + 1)(k + 2)

24 / 67

Degree Distribution

pk = 2m(m + 1)

k(k + 1)(k + 2)

The degree exponent γ is independent of m, a prediction that agrees with the numerical results

m0 = m = 1 (blue), 3 (green), 5 (grey), 7 (orange)

25 / 67

Degree Distribution

pk = 2m(m + 1)

k(k + 1)(k + 2)

The degree distribution of the BA model is independent of

both t and N. This is in agreement with real networks that di�er in age and size

N = 50000 (blue), N = 100000 (green), N = 200000 (grey) 26 / 67

Absence of Growth or Preferential Attachment

The coexistence of growth and preferential attachment in the

BA model leads to a question: Are they both necessary for the

emergence of the scale-free property?

Is it possible to generate a scale-free network with only one of

the two ingredients?

27 / 67

Absence of Growth or Preferential Attachment Model A: absence of preferential attachment

Model A starts with m0 nodes and follows these steps:

� Growth � At each time step we add a new node with m (≤ m0) links that connect to m nodes added earlier

� Preferential Attachment � The probability that a new node

links to a node with degree ki is

Π(ki ) = 1

m0 + t −1

� This means that Π(ki ) is independent of ki, indicating that new nodes choose randmly the nodes they link to

28 / 67

Absence of Growth or Preferential Attachment Model A: absence of preferential attachment

� For Model A the continuum theory predicts that ki (t) increases logarithmically with time

ki (t) = mln

( e m0 + t −1 m0 + ti −1

) � Consequently, the degree distribution follows an exponential

p(k) = e

m exp

( −

k

m

) � An exponential function decays faster than a power law, hence

it does not support hubs

� The lack of preferential attachment eliminates the scale-free

character of hubs from network. Indeed, as all nodes acquire

links with equal probability, we lack a rich-get-richer process

29 / 67

Absence of Growth or Preferential Attachment Model A: absence of preferential attachment

m0 = m = 1 (circles), 3 (squares), 5 (diamonds), 7 (triangles)

30 / 67

Absence of Growth or Preferential Attachment Model B: absence of growth

Model B starts with N nodes and evolves following this step

� Preferential Attachment � At each time step a node is

selected randomly and connected to node i with degree ki already present in the network, where i is chosen with probability Π(k). Π(0) = 0, therefore nodes with k = 0 are assumed to have k = 1, otherwise they can not acquire links

� In Model B the number of nodes remains constant during the

evolution of the network, while the number of links increases

linearly with time. As a result:

ki (t) = 2

N t

31 / 67

Absence of Growth or Preferential Attachment Model B: absence of growth

� At early time, when there are only a few links in the network

(L � N) each new link connects previously unconnected nodes. In this stage, the evolution is indistinguishable from the

BA model with m = 1

� After a transient period, the node degree converge to the

average degree and the degree develops a peak. For

t → N(N −1)/2 the network becomes a complete graph in which all nodes have degree kmax = N −1, hence pk = δ(N −1)

32 / 67

Absence of Growth or Preferential Attachment Model B: absence of growth

t = N (circles), t = 5N (squares), t = 40N (diamonds)

33 / 67

Absence of Growth or Preferential Attachment

� The absence of preferential attachment leads to a growing

network with a stationary but exponential degree distribution

� The absence of growth leads to the loss of stationarity, forcing

the network to converge to a complete graph

� This failure of models A and B to reproduce the empirically

observed scale-free distribution indicates that growth and

preferential attachment are simultaneously needed for the

emergence of the scale-free property

34 / 67

Measuring Preferential Attachment

Growth and preferential attachment are jointly responsible for

the scale-free property

� Growth is easily detectable: all large networks have reached

their size by adding new nodes

� Preferential attachment is also present in real networks, and

can be detected experimentally

35 / 67

Measuring Preferential Attachment

Preferential attachment relies on two distinct hypotheses:

� The likelihood to connect to a node depends on that node's

degree k. This is in contrast with the random network model, for which Π(k) is independent of k

� The functional form of Π(k) is linear in k

Both hypotheses can be tested by measuring Π(k), which can be determined for systems for which we known the time at

which each node joined the network

36 / 67

Measuring Preferential Attachment

Consider a network for which we have two di�erent maps,

taken at time t and t + ∆t

37 / 67

Measuring Preferential Attachment

Π(ki ) = ki∑ j

kj

For nodes that changed their degree during the ∆t time frame we measure

∆ki = ki (t + ∆t) −ki (t)

The relative change is

∆ki ∆t ∼ Π(ki )

This approximation holds if ∆t is small, so that the changes in ∆k are modest. But ∆t must not be too small so that there are still detectable di�erences between the two networks

38 / 67

Measuring Preferential Attachment

The curve ∆ki/∆t can be very noisy

To reduce this noise the cumulative preferential attachment

function is used instead

π(k) = k∑

ki =0

Π(ki )

� In the absence of preferential attachment we have

Π(ki ) = constant, hence π(k) ∼ k � If preferential attachment is present, i.e. if Π(ki ) = ki, it follows π(k) ∼ k2

39 / 67

Measuring Preferential Attachment

40 / 67

Measuring Preferential Attachment

� In the previous Figure are shown the measured π(k) for four real networks

� For each system we observe a faster than linear increase in

π(k), indicating the presence of preferential attachment

� Π(k) can be approximated with

Π(k) ∼ kα

� For the Internet and the citation networks we have α ≈ 1, indicating that Π(k) depends linearly on k

� For the co-authorship and the actor networks the best �t

provides α ≈ 0.9±0.1, indicating the presence of a sublinear preferential attachment

41 / 67

Measuring Preferential Attachment

The theoretical results predict the existence of four scaling

regimes:

� No preferential attachment (α = 0) � The network has a simple exponential degree distribution. Hubs are absent and

the resulting network is similar to a random network

� Sublinear regime (0 < α < 1) � In this region fewer and smaller hubs are present than in a scale-free network. As

α → 1 pk follows a power law over an increasing range of degrees

� Linear regime (α = 1) �This corresponds to the BA model, hence the degree distribution follows a power law

� Superlinear regime (α > 1) � The high-degree nodes are very attractive. In this con�guration the earliest nodes become

super hubs and all subsequent nodes link to them.

42 / 67

The Origins of Preferential Attachment

The key role of preferential attachment poses another

question: where does it come from? From this question two

more detailed questions derive:

� Why does Π(k) depend on k?

� Why is the dependence of Π(k) linear in k?

43 / 67

The Origins of Preferential Attachment

Two philosophically di�erent answers emerged to these

questions:

� The �rst view preferential attachment as the interplay between

random events and some structural property of a network.

These mechanisms rely on random events and are therefore

called local or random mechanisms

� The second assumes that each new node or link balances

con�icting needs, hence they are preceeded by a cost-bene�t

analysis. These models assume familiarity with the whole

network and rely on optimization principles and are therefore

called global or optimized mechanisms

44 / 67

The Origins of Preferential Attachment Local mechanisms

The BA model postulates the presence of preferential

attachment

Yet, it is possible to build models that generate scale-free

networks apparently without preferential attachment. They

work by generating preferential attachment

Two models have been developed:

� Link Selection Model

� Copying Model

45 / 67

The Origins of Preferential Attachment Local mechanisms: Link Selection Model

The link selection model is a simple example of a local

mechanism that generates a scale-free network without

preferential attachment. It is de�ned as follows:

� Growth � At each time step we add a new node to the

network

� Link Selection � We select a link at random and connect the

new node to one of the two nodes at the two ends of the

selected link

This model requires no knowledge about the overall network

topology, hence is inherently local and random

Yet, it generates preferential attachment

46 / 67

The Origins of Preferential Attachment Local mechanisms: Link Selection Model

47 / 67

The Origins of Preferential Attachment Local mechanisms: Link Selection Model

The link selection model is a simple example of a local

mechanism that generates a scale-free network without

preferential attachment. It is de�ned as follows:

� Growth � At each time step we add a new node to the

network

� Link Selection � We select a link at random and connect the

new node to one of the two nodes at the two ends of the

selected link

This model requires no knowledge about the overall network

topology, hence is inherently local and random

Yet, it generates preferential attachment

48 / 67

The Origins of Preferential Attachment Local mechanisms: Link Selection Model

The probability qk that the node at the end of a randomly chosen link has degree k is

qk = Ckpk

� The higher the degree of the node, the higher the chance that

it is located at the end of the chosen link

� The more degree-k nodes are in the network, the more likely that a degree k node is at the end of the link

49 / 67

The Origins of Preferential Attachment Local mechanisms: Link Selection Model

� C can be calculated using the normalization condition∑ qk = 1

from which one get∑ qk = 1 =⇒ C

∑ kpk = 1

C〈k〉 = 1 =⇒ C = 1

〈k〉 and therefore

qk = kpk 〈k〉

which represents the probability that a new node connects to a

node with degree k

50 / 67

The Origins of Preferential Attachment Local mechanisms: Copying Model

The copying model mimics a simple phenomenon: the authors

of a new webpage tend to borrow links from other pages on

related topics

In each time step a new node is added to the network. To

decide where it connects we randomly select a node u

Then a two-step procedure is followed:

51 / 67

The Origins of Preferential Attachment Local mechanisms: Copying Model

� Random connection � With probability p the new node links to u, which means that we link to the randomly selected node

� Copying � With probability 1−p we randomly choose an outgoing link of node u and link the new node to the target of the link. In other words, the new node copies a link of node u and connects it to its target rather than connecting to node u directly

52 / 67

The Origins of Preferential Attachment Local mechanisms: Copying Model

53 / 67

The Origins of Preferential Attachment Local mechanisms: Copying Model

The probability of selecting a particular node in the �rst step is

1/N

The probability of selecting a node linked to a degree-k node through this copying step is

k

2L

for undirected networks. Therefore:

Π(k) = p

N +

1−p 2L

k

which is linear in k and therefore predicts a linear preferential attachment

54 / 67

The Origins of Preferential Attachment Local mechanisms: Copying Model

The copying model is popular due to its relevance to real

systems:

� Social Networks � The more acquaintances an individual has,

the higher is the chance that she will be introduced to new

individuals by her existing acquaintances. We copy the friends

of our friends

� Citation Networks � Authors decide what to read and cite by

copying references from the papers they have read. Papers

with more citations are more likely to be studied and studied

again

� Protein Interactions � Gene duplication, responsible for the

emergence of new genes in a cell, can be mapped into the

copying model, explaining the scale-free nature of protein

interaction networks

55 / 67

The Origins of Preferential Attachment Global mechanisms: Optimization

� According to an assumption of economics, humans make

rational decisions, balancing cost against bene�ts

� In other words, each individual aims to maximize its personal

advantage. Such rational decisions can lead to preferential

attachment

� Consider Internet, whose nodes are routers connected via

cables. Each new router will choose its link to balance access

to good network performance with the cost of laying down a

new cable

56 / 67

The Origins of Preferential Attachment Global mechanisms: Optimization

Let us consider a network. At each time step we add a new

node i and calculate the cost function

Ci = minj [δdij + hj ]

which compares the cost of connecting to each node j already in the network

� dij is the Euclidean distance between the new node i and the potential target j

� hj is the network-based distance of node j to the �center� of the network

57 / 67

The Origins of Preferential Attachment Global mechanisms: Optimization

58 / 67

The Origins of Preferential Attachment Global mechanisms: Optimization

Ci = minj [δdij + hj ]

Three distinct network topologies emerge, depending on the

value of the parameter δ and N

� Star Network (δ < (1/2)1/2)

� Random Network (δ ≥ N1/2) � Scale-free Network (4 ≥ δ ≥ N1/2)

59 / 67

The Origins of Preferential Attachment Global mechanisms: Optimization

Ci = minj [δdij + hj ]

Star Network (δ < (1/2)1/2)

For δ = 0 the Euclidean distances are irrelevant, hence each node links to the central node, turning the network into a star

We have a star con�guration whenever the term hj dominates over δdij

60 / 67

The Origins of Preferential Attachment Global mechanisms: Optimization

61 / 67

The Origins of Preferential Attachment Global mechanisms: Optimization

Ci = minj [δdij + hj ]

Random Network (δ ≥ N1/2)

For very large δ the contribution provided by the distance term δdij overwhelms hj

In this case each new node connects to the node closest to it

The resulting network will have a bounded degree distribution,

like a random network

62 / 67

The Origins of Preferential Attachment Global mechanisms: Optimization

63 / 67

The Origins of Preferential Attachment Global mechanisms: Optimization

Ci = minj [δdij + hj ]

Scale-free Network (4 ≥ δ ≥ N1/2)

For intermediate values of δ the network develops a scale-free topology

64 / 67

The Origins of Preferential Attachment Global mechanisms: Optimization

The origin of the power law distribution in this regime is due

to:

� Optimization � Each node has a basin of attraction, so that

nodes landing in this basin will always link to it. The size of

each basin correlates with hj of node j at its center, which in turn correlates with the degree kj of the node

� Randomness � We choose randomly the location of the new

node, ending in one of the N basins of attraction. The node with the largest degree has largest basin of attraction, hence

gains the most new nodes and links. This leads to preferential

attachment

65 / 67

The Origins of Preferential Attachment Global mechanisms: Optimization

66 / 67

The Origins of Preferential Attachment Conclusions

The mechanism responsible for preferential attachment can

have two fundamentally di�erent origins

� Random processes, like link selection or copying

� Optimization, when new nodes balance con�icting criteria as

they decide where to connect

Each of these mechanisms lead to linear preferential

attachment, as assumed in the BA model

Linear preferential attachment is present in so many and so

di�erent systems because it can come from both rational

choice and random actions

67 / 67