The two model types that are commonly used for performance are task graph and product-form queuing network.
Graph models are commonly used to study the behavior of programs or processes that contain concurrency and/or
probabilistic branching. The models assumed no contention for resources. The queueing network model is useful
for examining performance in systems where limited resources must be shared. As the size of a queueing network
increases, the state spaces of the associated Markov chain grows much faster. However, there is a useful class of
queueing networks that have product form, which means that the joint probability of the queue sizes in the network
is a product of the probabilities for the individual service centers. Product-form networks can be analyzed for
steady-state measures without having to generate the underlying state space for the whole network.

-----------------------------------------------------------
Product-form queuing networks:
----------------------------------------------------------


we looked at graph models, which are useful for examining performance from a program's point of view.  
The models assumed no contention for resources. Now we will look at a kind of model that is useful for 
examining performance in systems where limited resources must be shared; this model is the queueing 
network.
 
In this chapter, we introduce queueing networks definitions and terminology, discuss analysis methods, 
particularly for networks having product form.
 
*******************************
Queueing Terminology: 
*******************************
 
A queueing network consists of service centers and customers (often called jobs). A service center 
consists of of one or more servers and one or more queues to hold customers waiting for service.

When a customer arrives at a service center, it enters one of the servers or one of the queues. If there
 are customers waiting when a busy server becomes idle, the next customer to be served is selected 
according to a scheduling discipline (also called a queueing discipline.) The queueing delay is the 
amount of time a customer waits in a service center before it enters one of the servers.  The response 
time is the amount of time a customer spends at the service center, including the queueing delay and 
the service time.
 
A service center is described by
  . the sequence of arrival times of customers, 
  . the service times, the number of servers,
  . the storage (or buffer) size of the service center, 
  . the number of possible customers,
  . and the order in which customers are served.
 
The time between successive customer arrivals is called the {\em interarrival time}. Usually, we make 
the simplifying hypothesis that interarrival times are statistically independent random variables with 
the same distribution function. The process that determines the interarrival times is called the 
arrival process. It is also common to assume that that interarrival times are exponentially distributed
 random variables; in that case the arrival process is a Poisson process. Empirical studies have shown 
that the Poisson arrival process model agrees well with observed data. 
  
The service time for a customer depends on how much work the customer needs and how fast the server is 
able to do the work. We usually make the assumption that all of the customers need the same amount of 
work or that the customers are divided into classes with all customers in the same class needing the 
same amount of work. This means that when we talk about service times, we usually think of them in 
terms of just the speed of the server. Service times are commonly assumed to be independent, identi-
cally distributed random variables. We usually assume that all of the servers at a service center have 
the same service time distribution for each class of customers. An assumption of exponentially distri-
buted service times agrees fairly well with observed data and its memoryless property means that the 
state of a service center is completely described by the number of customers in it; the amount of 
service received by jobs in process does not matter.
 
It is possible to consider more general service time distributions. The most common method involves 
decomposing the general distribution into a series-parallel cascade of exponential stages. For a 
treatment of non-homogeneous servers (not all having the same service time distribution). 
 
The buffer size of a service center is the total number of customers allowed inside the service center.
 The buffer size includes customers waiting for service and those receiving service. It is possible to 
assume an infinite buffer size as well as a limited one.
 
The population size is the total number of potential customers that may want to enter the service center
. When this value is large enough, it is usually assumed to be infinite. 
 
A queueing discipline is the algorithm that determines the order in which customers are served. We 
assume that a server is never idle when there are customers in the service center waiting to be served.
We define a queueing discipline as preemptive when the service being provided to a customer can be 
suspended if some higher priority customer arrives, and nonpreemptive otherwise. In a nonpreemptive 
discipline (or a low priority customer in the preemptive case), any customer that arrives when all 
servers are full enter a queue.

Some commonly recognized queueing disciplines are:

 - FCFS (first-come-first-served): 
     a nonpreemptive discipline. Waiting customers are served according to the order in which they 
     entered the queue.

 - priority: 
     customers are assigned fixed priorities that determine the order in which they are served.  
     The discipline can be preemptive or non-preemptive.

 - RR (round robin): 
     each customer in turn receives service for a small time quantum. If the job is not completed by 
     the end of this time quantum, it is placed at the rear of the queue, eventually to be allotted 
     another time quantum. This is perhaps the most common CPU scheduling discipline in large 
     time-sharing systems. 

 - PS (processor sharing): 
     all customers at a service center are considered to be moving through the server simultaneously 
     with the server speed being equally divided among all jobs, and hence there is no queue. This is 
     the limiting case of RR as the specified time quantum approaches 0.
     Though PS does not represent an actual scheduling discipline, it can be extremely useful as an 
     analytically tractable approximation of RR. 

 - LCFSPR (last-come, first-served, preemptive resume): 
    if one customer is being served when another arrives, the customer being served is preempted and 
    pushed onto a stack of waiting customers. Upon completion of a customer, the top customer on the 
    waiting stack is returned to service. 
 

Service centers are often described using the notation X/Y/Z/K/L/D. X and Y identify the arrival and 
service processes, respectively.  A value of M for X or Y denotes the exponential (memoryless) 
distribution.  Other common values are
	. E for Erlang,
  	. H for hyperexponential, 
	. D for deterministic,  
  	. G for general.

  . Z represents the number of servers at the service center,
  . K the buffer size,
  . L the population size, and
  . D the queueing discipline. 

Often the last two letters are omitted; in this case L is assumed to be infty and D is FCFS. 
 
A queueing network in which customers arrive from an external source, spend time in the network, and 
depart is said to be open; a queueing network in which there is no external source of customers and no 
departures is said to be closed. In a closed queueing network it is implicitly assumed that the number 
of customers circulating indefinitely among the service centers is constant. If the behavior of all 
customers is the same regarding workload and the order in which service centers is visited, the network
 is said to be a single-chain queueing network. Otherwise customers that behave alike are considered 
to be in the same chain and the network is a multi-chain queueing network.  

It is possible that a queueing network can be open for some chains and closed for others.
