Graph Analysis

profilemaqi0912
302Transcript.docx

Slide 1: Greetings and Check-in

Hello everyone!

What a bummer you have to self-quarantine yourself for the next two weeks when the new semester just began. However, this decision is made for you and your safety most of all. So, please try to think of it positively, and let’s focus on our work in hand—doing well in your coursework. Because my family and I live in East Lansing, this self-quarantine affects us as well indirectly in many ways—like restaurants closing down, childcare centers adjusting their operation hours. I have to admit that it is cumbersome and inconvenient, but I also believe that this decision is for all of us who are in this area together. Let’s build a sense of community, accountability, and just concentrate on “one thing at a time.”

We are here to provide you with any support you need. Please feel free to contact the two co-instructors as well as me if you need any help. We have office hours three days a week—Monday by Olivia, Tuesday by me, and Thursday by Chris. Please take advantage of this large amount of virtual “face” time.

You had the first weekly assignments due yesterday at 5pm. This will be repeated each week. You will have quizzes from my lecture, exercise questions from your recitation on Thursday, and a discussion question each week. All of these are due by 5pm the following Monday. These assignments are assigned to ensure that each one of you actually follow through our recorded sessions, doing work on a regular basis determines your success in this asynchronous course, or any other course for that matter. We want to facilitate your learning by giving you weekly homework assignments which will encourage you to read the textbook, do the exercise questions to increase your comprehension and application of the learned concepts, and then finally build a sense of community, using the D2L Discussion Forum, amongst yourselves. It is difficult to have that kind of “interactions” among students in an asynchronous course. We want to give you that sense of belonging and interpersonal interactions by providing you with a discussion forum where you find a community of peers.

Slide 2: Recap of the last week’s lecture

Let’s start this week’s lecture by summarizing the key concepts discussed in the last class.

Last week, we talked about why it is important to understand a network, especially as a media and information major.

Slide 3: What is a social network?

A simple and common-sense-based answer is: “Because everything is connected,” as you can see in this example. Understanding how important objects and people are connected, exerting influence on each other, can enable you to grasp the real impact of your or anyone’s actions, not in isolation but as a whole. Now is a great time to understand this. My child goes to a preschool. What if another child who has just a mild fever were sent to school anyway? For this particular child, it is an individual family’s decision. Perhaps the dad was busy; the mom didn’t check the child’s temperature. But this small individual decision can have monumental impacts on other kids and families in the preschool. This is why we need to learn about “networks” that surround us, networks which we are part of, whether we perceive it or not.

Slide 4: What is a network?

By understanding networks, you will also be able to make your surroundings favorable to you. By understanding you are a part of a network, and a network exerts influences on you, and how it works to maintain its equilibrium, you will start to find ways in which you can make the network “work” for you. Exactly how? How about finding job opportunities earlier than other peers? How about getting connected to someone who is a powerful constituent of an institution where you want to establish yourself? How about understanding how institutions (both materialized and not materialized) arise and fall? Perhaps you can build these institutions along with other members of the network. Perhaps you can create social influence and exert it to others, institutions (which are made of groups of people), and the entire world (which is the entity of connected institutions).

This ability to understand the underlying structure and behaviors of a network becomes essential as our society is increasingly connected through digital media. These skills were always useful, but nowadays becoming even more important. In other words, “opportunities are everywhere to be noticed and exploited, by you.” Good examples are social networking sites, Google search engines, emails, enterprise solutions (e.g., organizational email such as MSU email, MSU employee system, registration system or perhaps student systems, which you use on a daily basis), websites, and so on. All these digital networks make your connections explicit, observable, and quantifiable, making it an essential asset to create your competitive edge.

Back go Slide 2:

Last week, we also discussed that we can understand a network from two different levels: structural level and behavioral level. First at the level of structure: How a network is structured and organized. What its components are. How the components are connected to one another. How distant each components are from each other? What is being disseminated through these connections. For this level of understanding, we resort to Graph Theory.

Second at the level of behaviors: How components exert influence on one another, defining the ways in which individuals within the network behave. How does the network reach the state of equilibrium where all the members of a network commit to a strategy and reach a point where everyone in the network gets the best outcome given the network? This is a fascinating topic. If one member gets the most out of a network, other members will be unhappy, disrupting the stability of the network. Thus, everyone should commit to a course of action that will satisfy everyone in the network, even though that solution is not ideal. For this level of understanding, we resort to Game Theory.

From today for the next two weeks, we will learn the first level of understanding a network—at the structural level. Thus, we will Graph theory for three weeks. [Animation] To do so, in this week, we first learn the basic concepts of Graph Theory and their applications to real-world problems. We will discuss basic concepts in today’s class, and then apply the advanced concepts of them in Thursday’s recitation. Are we ready? If so, let’s move onto the next slide.

Slide 5: Basic Concepts of the graph theory

· A graph is a way of specifying relationships among a collection of items. For instance, look at this example on the right. This is the depiction of a relationship.

· A node is an object that is connected in the graph. In this example, A, B, C, D are objects. These nodes can be computers and telephones, as well as people. They can be institutions, such as companies and business units, and they can be units of information, such as articles and research projects.

· We call the two connected nodes as “neighbors.”

Slide 6: Edge

· An edge is a link that connects a pair of nodes. This also called as ties, arcs, lines, and links. It denotes different types of social relations, such as friendship, kindship, advice (giving and receiving), hindrance (blockage of relationships), and sexual relationships. We will see later on an example of romantic relationships depicted in a graph.

· An edge allows for flows of a wide variety of things, including messages, money, and diseases (of course, novel coronavirus, as well as any communicable diseases).

· An edge can contain a direction when a direction is asymmetric as shown in this example. In this example of a directed graph, there are 4 nodes with directions. Compare this graph against the one before. The prior one does not have directions.

Slide 7: Paths and Cycles

· Path is a sequence of nodes with the property that each consecutive pair in the sequence is connected by an edge.

· containing not just the nodes but also the sequence of edges linking these nodes

Slide 8: Roles of Graph

· Look at this Figure 2.2 and 2.3 as examples. Figure 2.2 is the physical network structure of the Internet in 1950’s in the US. Figure 2.3 is a graphical depiction of this physical network. In other words, this graph is a mathematical model of a physical network structure, and represents how things are either physically (in this case) or logically (in other cases) linked to one another in a network structure.

· What is a path in this example? Recall that a path is a sequence of nodes that each consecutive pair in the sequence is connected by an edge. By this definition, the sequence of nodes mit, bbn, rand, ucla is a path in the Internet graph, as is the sequence case, lincoln, mit, utah, sri, ucsb.

Back to Slide 7: Paths and Cycles

· A cycle is a “ring” structured path as shown in the example below. A path in this example is from Node 1, Node 2, Node 3, and Node 4. In comparison, the example on the top has many distinct paths: (1) Node 1, HUB, Node 3, (2) Node 2, HUB, Node 3, and so on.

In this ring structure, if Node 1 wants to reach Node 4, it should go through 2, 3, and finally 4. But in the star network in the top example, Node 1 can go to Node 4 via Hub. Thus, a star network is more efficient than a ring structure, which has some redundancy.

In fact, this redundancy was designed intentionally and deliberately.

· Why do you think this redundancy was necessary? Please watch this video clip about how Internet was developed and what purpose it was built for. By understanding the historic background of the Internet development, you’ll notice why engineers intentionally built in “redundancy” of the Internet structure. In communication and transportation networks are often present to allow for redundancy — they provide for alternate routings that go the “other way” around the cycle.

· Play the video

· As you saw in this video, the Internet was first developed to enhance national defense during the cold war. Thus, engineers needed to design the network in such a way that it would withstand the atomic attacks from the Soviet Union. A star network would collapse, thus jeopardizing the national security, if the HUB is attacked and demolished. However, a ring structure would hold even though one or two nodes are destroyed due to its redundancy. In the example here, even though Node 1 is destroyed, Node 2, 3, and 4 are still connected and function.

· Now please answer the question, “What is the advantage of a ring structure as opposed to a star structure?” Answer this question in the context of Internet development in the US in 1950. What purpose did the Internet serve when it was first developed? Why is this redundancy particularly important to serve this purpose?

· This is your first quiz question of the day. Please answer your Quiz Question 1. I will give you 1 minute. You can either pause this video to answer this question NOW or jot down your answer on the piece of paper and enter it later on D2L.

Slide 9: Connectivity

· Connectivity refers to the state of a graph in which every pair of nodes is connected via a path. The example on the right (Figure 2.6) is built from the collaboration graph at a biological research center.

Here, nodes denote researchers

Edge denote co-authored publications

Do you want to see my network of co-authors? This is just a snapshot of my current co-authors, and do not include everyone. However, this gives you a sense of this co-authors’ network.

https://wwwprofargyris.net

Go to, Research Team, to see a small set of my current faculty collaborators

Then, go to Current Projects to see how these collaborators are allocated to each project that I am currently carrying out. Just like this example of my real-world collaborator network, this figure 2.6 shows collaborators’ networks in a research center.

· In this example, what is the most prominent node? Obviously the one in the center. This shows the degree of centrality, which we will discuss next week.

· But how do you know which node is most central to the network? What criteria did you apply. I am sure that almost everyone who’s watching this video easily identified the central node to this network. But let’s step back and ask yourself this question to define the centrality more formally. How did you figure it out?

Perhaps you thought about, who knows the most actors? Who has the shortest distance to the other actors? Who controls knowledge flows? What happens if this most prominent central node is removed?

I believe the answer is that if this central node is removed, the component will break into three separate components, which will be detrimental to the knowledge and information flow within this research center. Thus, a central node is critical for the utility and functions of a network. I just said the word, “a component.” Let’s discuss components in the next slide.

Slide 10: Connected components (or simply called “components)

· What are connected components? A subset of the nodes such that: (i) every node in the subset has a path to every other; and (ii) the subset is not part of some larger set with the property that every node can reach every other.

· (1) means that components are internally connected, even though there are no direct path between each and every node.

· (ii) means that a component is a free-standing on its own right. Not a part of a larger network.

Go back to Slide 9:

· Now, time for your second quiz. In this example in Figure 2.6 in Slide 9, not the one on the right in slide 10, How many connected components do you see?

· Demonstration Slide 9

Go back to Slide 10:

· Demonstration Slide 10

· Please answer your Quiz Question 2. I will give you 1 minute. You can either pause this video to answer this question NOW on D2L or jot down your answer on the piece of paper and enter it later on D2L.

Slide 11: Giant Components

· A giant component refers to a connected component that contains a significant fraction of all the nodes. In Figure 2.6, what is a giant component? Each of the two examples below shows Figure 2.6. Ignore the one on the right, and focus on the one on the left. In this graph, which one is a giant component? Demonstration. Obviously, the one in the middle.

· When a network contains a giant component, it almost always contains only one. In other words, you rarely see two separate giant components in one graph. Why do you think it is the case?

· To see why, let’s consider an example of the global friendship network. Let’s say that the following example shows such a global friendship network. I know it is an example of collaborators in a research center, but let’s assume that it is a network of global friendship for now. This network depicts all the friendships in the whole world if it is possible to depict such a large network. Then, try imagining that there were two giant components, each with hundreds of millions of friends. All it would take is a single edge from someone in the first of these components to someone in the second, and the two giant components would merge into a single component. Demonstration

· Just a single edge — in most cases, it’s essentially inconceivable that some such edge wouldn’t form, and hence two co-existing giant components are something one almost never sees in real networks.

Slide 12: Two giant components’ coexistence in history

· There were a few instances in human history where two giant components co-existed, but that state did not last long and collapsed in such a way where one dominates the other.

· This is excellently described in the book, titled, Guns, Germs, and Steel, the Fate of Human Societies by the author, Jared Diamond.

· According to Diamond, two such giant components exist: One in Americas and the other in Europe. Then, Europeans “discovered” the Americas, and then quickly dominated the natives. How did they do that quickly? The author explained that the European component separately developed technologies and armors (hence the title, Guns and Steel) while native Americans were not able to do so (given the separation from the rest of the world). Were they all? No. Eurasians also had been exposed to germs through interactions with each other for thounsands of years, and hence developed their immunity. However, native Americans had not been exposed to these germs. As a result, the landing of Europeans in the Americas led to the deaths of millions of people.

· Let’s watch the video shortly. It’s rather a long video, so we will play only a part of it. When you watch this video, try to answer this question: When the two giant components clashed, what happened? (Play the video)

· Now, answer your Quiz Question 3. When the two giant components clashed, what happened? Did the two giant components continue to co-exist? Or did one dominate the other? Based on this video, explain why it is very rare, if not impossible, to see two giant components coexist. I’ll give you one minute. You can either pause this video to answer this question NOW on D2L or jot down your answer on the piece of paper and enter it later on D2L.

Slide 13: Giant component and its influence

· Now let’s talk about the influence and implications of a giant component. Figure 2.7 in your textbook shows the romantic relationships in an American high school over an 18-month period. It is not a snapshot of relationships at one point, but over 18 months.

· As you can see, a giant component exists with some small components. There may be what we call, “popular kids” circles. Or mean girls. (Animation) This is VERY common in graphs. You see one giant component in a graph.

· The fact that this graph contains such a large component is significant when one thinks about the spread of sexually transmitted diseases

· One high-school student may have had a single partner over this time period and nevertheless — without realizing it — be part of this large component and hence part of many paths of potential transmission.

Slide 14: Distance

· Now let’s discuss the concept of distance: Distance denotes the length of a path. That is the number of steps it contains from beginning to end

In order to understand this concept, a technique called “Breadth-First Search,” is helpful.

1. You first declare all of your actual friends to be at distance 1.

2. (2)  You then find all of their friends (not counting people who are already friends of yours), and declare these to be at distance 2.

3. (3)  Then you find all of their friends (again, not counting people who you’ve already found at distances 1 and 2) and declare these to be at distance 3.

(...) Continuing in this way, you search in successive layers, each representing the next distance out. Each new layer is built from all those nodes that (i) have not already been discovered in earlier layers, and that (ii) have an edge to some node in the previous layer.

This technique is called breadth-first search, since it searches the graph outward from a starting node, reaching the closest nodes first. In this example, you can clearly see the concept of distance.

· Distance between the two nodes is defined as the length of the shortest path between them. Why is it important? Because there can be more than one path from a node to another. (Demonstrate) Distance denotes the shortest path between the two nodes, not any path.

Slide 15: Small World Phenomenon

· Why do you think defining the distance as the shorted path between the two nodes matters to us? What implications does distance have to us, and our daily lives?

· It shows how many layers of relationships we should go through to reach our target—whatever it may be. A long distance implies that reaching the target may be impossible or takes too much time and effort, while a short distance implies the goal attainment is feasible, because it is easier for you to get to the goal or the target.

· In fact, many prior studies have shown that our world is small. The average distance is <7 (about .66 to be exact). To reach someone in any part of the world, you need to go through only fewer than 7 nodes. Isn’t it fascinating?

· How about Hollywood movie stars? It seems impossible to become popular movie stars, but in fact the average distance from Kevin Bacon to any other actors was less than 3.

· Using cast lists from the Internet Movie Database (IMDB), it is possible to compute Bacon numbers for all performers via breadth-first search. The average Bacon number, over all performers in the IMDB, is approximately 2.9, and it’s a challenge to find one that’s larger than 5.

· This is unfortunately very clear with the spread of the novel corona virus. The virus travels from a part of the world we have never been to, because we are all connected only through less than 7 degrees of separation.

Slide 16: How do we measure “real” social networks?

· This short world phenomenon should make you feel excited about graphs. How nice would it be if we know exactly who to connect to in order to get that job we want? If we can tell the exact path from us to the target, we can perhaps strategize ourselves better. However, it is awfully difficult, if not impossible, to construct the entire social graph. Let’s discuss thelimitations and challenges of constructing graphs.

· First of all, interpersonal relationships, intensity of relationships, subtle acquaintance – impossible to measure, because we need to define a link in an observable way (e.g., email exchanged, business transactions made, ”friends’ lists” on FB, Instant Messages exchanged). And these quantifications of a relationship often comes down to oversimplification of human’s complicated relationships.

· What are some other limitations?

· Lack of participation: What if no one wants to disclose their social networks?

· Dishonest answers: What if someone pretends that they knew more people than they actually know?

· The size and scope of network: It becomes huge, multidimensional, and complicated as shown in this hand-written graph on the left, and becomes impossible to depict.

· Therefore, we need help from digital tracing and computational method to measure the networks. And these technologies are growing these days. This explains why graph theory has grown very fast in the past few years

Slide 17: Growth of social network analyses since late 1990.

· As you can see social network analyses went through the roof especially around 1990. What happened in 1990? WWW was introduced in 1980, and social networks were introduced in 1990.

Slide 18: Main sources of large-scale network data.

· In order to generate a graph we need to collect large amounts of data. Where can we get this large amount of data without jeopardizing participants’ privacy as well as intruding their daily functions?

· Collaborative graphs are an option because we can record who works with whom in a specific setting. As I explained briefly above, we can read their published articles to figure out co-authorships among scientists. The Wikipedia collaboration graph (connecting two Wikipedia editors if they’ve ever edited the same article) [122, 246] and the World- of-Warcraft collaboration graph (connecting two W-o-W users if they’ve ever taken part together in the same raid or other activity) are just two examples.

· Who talks to Whom graphs is another option. The Microsoft IM graph is a snapshot of a large community engaged in several billion conversations over the course of a month. In this way, it captures the “who-talks-to-whom” structure of the community. Similar datasets have been constructed from the e-mail logs within a company [6] or a university [259], as well as from records of phone calls: researchers have studied the structure of call.

· How about Information linkage graphs? Snapshots of the Web are central examples of network datasets; nodes are Web pages and directed edges represent links from one page to another.

· Lastly, think about technological network. Although the Web is built on a lot of sophisticated technology, it would be a mistake to think of it primarily as a technological network: it is really a projection onto a technological backdrop of ideas, information, and social and economic structure created by humans. But as we noted in the opening chapter, there has clearly been a convergence of social and technological networks over recent years, and much interesting network data comes from the more overtly technological end of the spectrum — with nodes representing physical devices and edges representing physical connections between them. Examples include the interconnections among computers on the Internet [155] or among generating stations in a power grid [411].

· Now the last quiz question of the day: Pick one of the examples above and describe what problems, challenges, and limitations you would encounter if you were to generate a graph. For instance, if you were to draw a collaboration graph among all MSU faculty members, what problems would you encounter? I’ll give you one minute. You can either pause this video to answer this question NOW on D2L or jot down your answer on the piece of paper and enter it later on D2L.

Slide 19: Exercises—Recitation Session on Thursday

We are going to explain further concepts, such as these.

So, I’d recommend that you look into the exercise questions in your textbook before the recitation session and be prepared to create some of these using NetLogo.

Thank you and hope you guys stay safe and healthy. And always…stay positive!