Discrete Events Simulation Methodology and Learning Notes

23 Feb 2020

Reading time ~30 minutes

This notes and learnings are collected during Georgia Tech’s CSE6730 Sim&Modeling Class (Professor Richard Fujimoto). Most of the documented masterial below is notes taken from the class slides, I rearranged partially the lecture slides structures with my understanding, in combination with my class works/projects notes. The collection is for my self learning documentation purpose only.

Chapter 01
Chapter 02
Chapter 03
Chapter 04
Chapter 05
Chapter 06
Chapter 07
Chapter 08
Chapter 09
Chapter 10
Chapter 11
Chapter 12
Chapter 13
Chapter 14
Chapter 15 Additional Resources
Traffic Simulation DES project

01. Intro

What is Model?

A model is a simplified representation of system. It can be conceptual, verbal, diagrammtic, physical or formal (mathematical).

we can only observe that a model is consistent with experiment; But this does not “prove” it is correct.
We can disprove a model.

Terminologies:

System Under Investigation (SUI) or Pnysical System: The real or imagined system that is to be modeled.

Dynamic Systems: systems that exhibit behavior over time, it is a collection of interacting entities that produces some form of behavior that can be observed over an interval of time.

Computer Simulation: is a computer program that uses computation to construct a representation of the behavior of a model of a particular system over time. Computer simulations build on purely mathematical models in science, engineering and social sciences.

Complex (Adaptive) Systems: Systems composed of many interacting components, exhibit nontrivial emergent and self-organizing behaviors.

Why Simulate?

Observing an operational system may be:

expensive, dangerous, disruptive, time consuming, not possible, morally or ethically unacceptable, parts of the system may not be observable.

Major use of Simulations:

system analysis
virtual enviroments
test and evaluation of physical devices

Simulation Taxonomy

	Real Machines	Simulated Machines
Real People	Live	Virtual
Simulated People	? (hardware in the loop)	Constructive

Types of Simulation Models

System Model

Deterministic
- continuous
- discrete (Discrete event simulation)
Stochastic
- continuous
- discrete (Discrete event simulation)

Stochastic VS. Deterministic

Stochastic simulation: contains random (probabilistic) elements, eg. inter-arrival time, serving time … etc.

output is random quantity (multiple runs required to analyze output) Deterministic simulation: contains no random elements. eg. Simulation of digital circuit, chemical reaction based on equations… etc.
output is deterministic for a given set of inputs.

Continuous VS. Discrete

Discrete Time:

State of the system viewed as changing at discrete points in time
An event is associated with each state transition, contain timestamp.
Time stepped vs. Event driven time advance.

Continuous Time:

State of the system viewed as changing continuously across time
System typically described by a set of differential equations.

However, Disinction is conceptual: in practice, continuous simulations discretize time.

DES (Discrete Event Simulation)

State changes can occur at irregular points in time
In practice, one can always discretize the time scale and map a discrete event simulation to a time stepped model.

Simulation VS. Machine Learning

Correlation is not causation:
- Machine learning focuses on using observational data to infer properties or behavior of a system
- Does nnot include rules concerning how the system operates(e.g. causal relationships)
When simulation models are needed:
- Sufficient data not available
- “What-If” analysis
- Deeper understanding of system behavior needed. (causal relationship)
Big data and Modeling & Simulation are synergistic:
- Data analysis ussed to characterize inpuits to the simulation (input analysis)
- Data analysis used to make inferences concerning simulation results (output analysis)

Back to Table of Content

02. Modeling and Simulation Process

Understanding of the system under investigation, build the simulation program

Step 1. Project Description

Statement of goals of the project
Description of behavioral features of SUI relevant to project goals

Step 2. Conceptual Model

An abstract representation of the system
Specifies what included and what left out
- level of detail
- Often a variation on standard abstraction
- Languages
Inputs ? Outputs? Metrics produced by the model?
Verify conceptual model is consistent with the problem description.

It is a representation of the SUI using some type of formalism (mathematic/symbolic in nature)

Step 3. Simulation Model/Program

Representation of the model as a computer program that can be executed. (DES, continuous, agent-based, parallel/distributed sim …)

Step 4. Verification and Validation

Verification (build the model right?)

Does the simulation model/program match the conceptual model?
debugging
model validation

Validation (build the right model?)

Does the simulation model/program match the actual system (problem description)?
Typically compare against:
- Measurements of actual system
- Analytic (mathematical) model of the system
- Another simulation model
Always an incomplete activity:
- Often can only validate portions of the model
- If can validating 100% why build the simulation?
- Goal to improve credibility and confidence in model

Validation Method:

Face Validation:
- Observed behavior appears correct by means of animation, graphical displays or the values of designated variables.
- More qualitative than quantitative
- Useful when the judgment of domain experts are included
Behavior Validation:
- Use a collection of anticipated known behaviors to validate
- E.g. Enginer failure of aircraft leafs to decrease in altitude
Replicative Validation:
- Collect input-output behaviour of SUI in a database
- Compare input-output behaviour of model to observed SUI behavior
Validation of the data model as important as validation of the simulation.

Step 5. Experimentation and Presentation of Results

Design of experiments
- Clear questions
- Develop experiment plan
Output analysis
- Statistical analysis of results
- Sensitivity analysis
- etc.
Presentation
- keep customer informed throughout modeling process to build confidence
- effective visualization may trump model quality

Challenges to Modelling and Simulation

Inappropriate statement of goals
Inappropriate granularirty of model
Ignoring unexpected behavior
Inappropriate mix of essential skills
Inadequate flow of information to the client

Back to Table of Content

03. Conceptual Models

Conceptual Model

A non-software specific description of the computer simulation model (that will be, is, or has been developed), describing the objectives, inputs, outputs, content, assumptions and simplifications of the model. (Robinson, “Conceptual Modeling for Simulation,” Winter Simulation Conference, 2013.Birta and Arbez, ch 4.)

Objectives: questions to be answered
Inputs: independent variables varied during experimentation
Outputs: metrics obtained from the model
Assumptions: based on beliefs concerning the model, necessitated by lack of data or other information
Content: describes model entities and their behaviors (activities)
Simplifications: decisions to simplify model, to reduce development time

Distinct from simulation software
Addresses the question of what to include in the model and what to omit.

Issues

The decision of what to leave out impact the model along several dimensions:

validuty: produce sufficiently accurate results
Credibility: believed by clients using the results of the simulation study
Feasibility: able to be built within the time and const constraints of the study
Usefulness: easy to use, flexible, reasonable data requirements, fast execution

Queueing Models

Queueing models are a natural abstraction for modeling systems that include:

Customers competing to use limited resources
Customers must wait (queue) to use the resource
Service discipline (e.g., first-come-first-serve) Typical applications:
Manufacturing, supply chains, transportation systems, service centers, communication networks, computer systems … etc.

Elements of a Queueing Model:

Server: Serial (one at a time) reusable resource
- Numebr of servers
- Assume service time for successive customers are independent and identically distributed (iid), specified by a probability distribution (service time distribution)
Customer: use resource, one at a time
- often assume inter-arrival time of customers are iid, specidfied by a probability distribution (interarraival time distribution)
Queue:
- Holds customers waiting to use server
- Finite vs. unlimited capacity
- Service discipline (First come first serve, last in first out, priority … etc. )

Conceptual Model Elements:

Stucture (Entitities, Attributes): Describes the componenets making up the model
- Entities: model components
  - resources : provide service
  - consumer : seek service
  - queue : an ordered collection of entities
  - group : an unordered collection of entities
    
    Exogenous(external to model) vs. Endogenous entities
- Attributes:
  - characteristics of model entities: state information of entity
Behavior (Activities, Events) Describes what the model components do over time
- Activity: a signle atomic unit of behavior spanning some amount of (simulation time)
  - Each activity begins and ends with an event.
  - Activities characterize behavior of entity over a time period
  - Scheduled activity: starts when an event occurs
  - Conditional activity: starts when an condition becomes satisfied, usually based on state variables.
    - State varibales can only be changed by an event
    - State variables must be initiated indirectly by some other event
- Event: an action occuring in the model at one instant of time
  - contains a timestamp indicating when the event occurs
  - the state of the model can only change when an event occurs

ABCMod: Activity Based Conceptual Modeling Framework

Tabular format documenting the conceptual model.
Structure:

Constants and parameters
Entities and their attributes Behavior:
Activities
Actions

Back to Table of Content

04. Discrete Event Simulation

Discrete Event Simulation:
Computer model for a system where changes in the state of the system occur at discrete points in simulation time.
Fundamental Concepts:

System State (state variables)
State transitions (events)

Event-oriented:
Focus on events and event computations, view the simulation as a sequence of event computations where each event has a (simulation time) time stamp indicating when it occurs.

Each event computation can:

Modify state variables
Schedule new events

Characteristics:

Execution is a sequence of event computations,
Events are processed in time stamp order
Unprocessed events are stored in a future event list
Simulation time viewed as continuous (this is different from a time stepped simulation where simulation time discretized into integer time steps)
Simulation time advances from one event to the next (irregular time advances)

Discrete Event Simulation Software

Simulation Application
- State variables
- code modeling system behavior (event procedures)
  - modify state variables
  - schedule new events
- I/O and user interface software Component 1 is model of the physical system.
  It contains the event handler procedures.
Simulation Executive (Engine)
- future event list data structure
- event processing loop
- manage advances in simulation time Component 2 is independent of the simulation application.
  It contains the event processing loop.

Event Scheduling

Simulation Engine includes a priority queue data structure (FEL) that holds all unprocessed events.
Main Scheduling loop removes the smallest timestamped event from FEL on each iteration.
An event computation may schedule new events that are added to the FEL.

A good example is Airport Simulation.

Future Event List (FEL)

A data Structure is needed to store events that have been scheduled, but not yet processed(FEL)
Required data structure is called a priority queue.
Usually, the amount of computation within in the event handler function is small, the time to access the FEL will dominate the execution time if the FEL holds many event.

Priority Queue Operations:

Required operations:

Insert(FEL,event,timestamps) : add event with time to event list FEL
Event = Delete(FEL) : remove smallest timestamps event and return pointer to it
DeleteArbitrary(FEL,event) : Un schedule an event, delete an arbitrary event (not neccessarily smallest timestamped event)

Constraints for a general purpose simulation engine:

maximum number of events required at one time unknown –> memory allocation issues
Sequence of Insert and Delete operations unknown
Distribution of timestamps on successive Insert operations unknown

Data Structures:

linked lists
- Single linked list
- Double linked list (DeleteArbitrary Fast)
- Sorted vs. unsorted list
Tree based
- Heap
- Splay tree
Hashing schemes
- Calendar queue
- Ladder queue
Hybrid
- Henricksen’s algorithm

Performance Comparasion

Linear linked list fastest for small sized list
Tree-based data structures (e.g., heap, splay tree) give predictable (O(log N)) performance, though often not the fastest available approach
Hashing-based priority queues can yield constant time (O(1)) insertion and deletion operations
- Calendar Queue data structure offer best performance (constant time insert/delete) for many applications, but can yield surprisingly poor performance in some cases
- Ladder Queue addresses unstable property of calendar queue
More efficient data structure needed for large list
Results vary for different machine architectures

Back to Table of Content

05. World Views for Discrete Event Simulation

Simulation Model World Views:

The world view provides a perspective and development approach for the simulation model over the same set of discrete events

Event-oriented –> simple, efficient
- Simulation program written as a set of event handlers
- Expressed in terms of future events (events schedule new events)
- Uses Future Event List to advance simulation time
- Somewhat “lower level” abstraction than other world views
Process-oriented –> simplifies model development, maintainence and modification of simulation code –> requires threading, additional complexity and computation overhead to suspend and resume simulation processes
- Simulation program written as set of processes (usually consumer entities)
- Each process a series of activities
- Uses Future Event List to advance simulation time
- Used in many simulation languages, especially in U.S.•
Activity scanning –> can be inefficient; used but not as common
- Simulation program is written as a set of activities
- Simplest implementation involves incrementing time in small steps, testing the pre-conditions of activities at each time step
- More complex implementation uses Future Event List (FEL)
- Popular in Europe

Back to Table of Content

06. Creating Random Numbers

Random vs. Pseudo Random Numbers

Truly random numbers can be generated by hardware devices, however, it is not repeatable.
For most simulations, pseudo random number generators are used.
- Deterministic procedure to generate streams of random numbers
- Output looks like random
- Stream satisfies statistical tests for randomness

Properties (random number used in simulation)

Randomness: uniformly distributed over (0,1) without correlation among the samples. Produce output satisfying statistical tests of randomness.
Repeatability: ability to reproduce
Portability: easy to move RNG to new machines
Efficiency: time and memory requirements
Multiple streams: can be used to generate multiple, uncorrelated streams of random numbers
Documentation: theoretical analysis and extensively tested

Lehmer’s algorithm, a special case of Linear Congruential Generator (LCG)

\(x_{i+1} = (a*xi + c) mod m\)
when c is zero –> Lehmer’s Algorithm –> known as multiplicative LCG

Lehmer can not produce value 0, if it did, all subsequent numbers in sequence will be zero.
Quality of the RNG is dependent on the good choice for a and m.

A good explanation of pseudo rng LCG can be found here

Check the quality of RNG

Output should be independent and uniformly distributed over (0,1), select parameters to improve quality, can pass statistical test (e.g Chi-square), can pass software suites tests, (e.g battery pf tests)

However, passing statitical tests do not prove the RNG is valid, it only gives confidence of validity.
Generators such as the Mersenne Twister can produce very long periods, but require more memory
Many streams needed for some applications, e.g., when using parallel processing

Back to Table of Content

07. Random Variates

Random Variate:

an algorithmically generated realization of a random variable with pdf p().

Generation Method:

Inverse distribution method applies in general for any distribution where a closed form equation for the CDF can be determined.
- discrete random variates:
  - E.g. Equilikely(a,b), Bernoulli(p), Geometric(p) …
- Continuous random variates:
  - Concepts for discrete random variate carry over to continuous
- Generation process:
  1. u = random(u)
  2. return \(F^{-1}(u)\), \(F^{-1}\) is the inverse distribution function.
In many cases the inverse cdf is not known, Rejection-Acceptance Method may be used if probability density function is known.
- Generate two samples, u1 and u2from the uniform distribution on [0,1].
- Compute x’ = a + u1 (b – a)
- Compute y’ = c u2
- If y’ ≤ f(x’) then accept x’ as a valid sample, otherwise repeat the process with the generation of two new samples
  - How many rejected trials: Dependent on the relative portion of the abc rectangle filled in by the density function.
  - Probability a draw will be accepted is 1 / [c (b – a)] since the area under f(x) is 1
Empirical distributions useful if data does not fit a known distribution

Rejection-Acceptance Method

For some distributions, inverse distribution functions may not be readily available, but the probability density function (f(x)) is known.
Rejection Acceptance Method offers an alterative approach.
Shares similar features to the Monte-Carlo methods, e.g., for evaluating integrals
Assumption:

Probability density function f(x) is available
Density function bounded so a ≤ x ≤ b and f(x) ≤ c

Back to Table of Content

08. Verification and Validation

Definitions:

Verification :
- Ensuring that the computer program of the computerized model and its implementation are correct.
Validation:
- Substantiation that a model within its domain of applicability possesses a satisfactory range of accuracy consistent with the intended application of the model.
Credibility:
- Ensure sufficient confidennce in the model to use it and act based on predictions made by the model.
- Developing in the interested parties sufficient confidence that they believe in the model and information derived from the model.
Usability:
- Determining that the model and its user instructions are easy to use.

V&V in Model Development:

Compared with problem entity (System) or involving data: Validation Process
- Conceptual Model VS. Problem Entity : Conceputual Moel Validation (Analysis and Modeling)
- Computerized Model VS. Problem Entity: Operational Validation (Experimentations)
- Data Validity is a process links all three systems. (Problem entity, Computerized Model, Conceptual Model)
Computerized Model VS. Conceptual Model: Verification Process (Computerized model verification)

Verification:

Did I build the model right?
Concerned with the “correctness” of the simulation program
Compares the simulation model with the conceptual model
Involves software engineering, software debugging

Simulation Languages Greatly affects the time and effor for verification (including development time, model execution speed)

Some types and examples:

Special purpose simulation languages:
- VHLD(circuits), AutoMod(Manufacturing), Opnet,ns3(telecommunications), …
General purpose simulation languages:
- Simula, Arena, Modelica, …
Genral purpose programming languages:
- C, FORTAN, …

Verification Techniques:

Static techniques:
- Structured walk through, by development team
- Program verification, formal methods, proof
Dynamic techniques:
- Simulation languages: verification of model components, RNG
- GP langugage: structured programming, module testing
- Consistensy tests within software (e.g. counters)
- Test/Validate simple configurations (e.g. queueing network with known analytic solutons)
- Trace analysis
- Model animation

Validation

highlights:

within its domain of applicability
satisfactory range of accuracy
intended application of the model

Thus,
Did I build the right model? –> depends on the goals of the simulation study.

1. Data Validation

Data needs:
support three validation processes (mentioned earlier, linked all three systems, conceptual, computerized(simulation model), problem entity)

Support Conceptual model validation (e.g. test assumptions)
Support Simulationn model validation
Experimentation (e.g. defining typical scenarios)

Issues:

Collecting data is time consuming, expensive
Data always has errors
Are they valid?
Check for consistency among data
Missing data is a major problem in simulation studies.

2. Conceptual Model Validation (Conceptual VS. Problem entity)

Determine that the theories and assumptions underlying the conceptual model are correct
- Independence assumptions, stationary assumptions, test parameters, test input distribution , Statistical results
Determine that the model’s representation of the problem entity and the model’s structure, logic, and mathematical and causal relationships are reasonable for the intended purpose of the model
- Face validation (reviewed by subject experts), Strutured walk through, Traces( trace entities (e.g. customers) throughout the simulaiton)

3. Operational Validation (Computerized vs Problem entity)

Determining that the model’s output behavior has a satisfactory range of accuracy for the model’s intended purpose over the domain of the model’s intended applicability.
Decision Approach:

Subjective: (Behavior Explore)
1. Observable System: Explore model behavior, Comparison using graphical displays
2. Non-Observable System: Explore model behavior, Comparison with other models
Objective: (Statistical tests)
1. Observable System: Comparison with physical system using statistical tests
2. Non-Observable System: Comparison with other models using statistical tests Here Observable or Non-Observable refer the problem entity’s type.

How to explore the model behavior?

Qualitative: Directions of outputs are examined and also possibility whether the magnitudes are ‘reasonable.’
Quantitative: Both the directions and the precise magnitudes of the outputs are examined.
SUBJECT MATTER EXPERTS: Usually know directions and often the ‘general values’ of the magnitudes of the outputs.
- they predict outputs and examine the models outputs.
PARAMETER VARIABILITY – SENSITIVITY ANALYSIS:
- Use graphs to display data
Use Animation and Graphics of system operation
Use numerous sets of experimental conditions.

Back to Table of Content

09. Output Analysis

Stochastic Simulations (random variables):

Result of each simulation run is one sample of a random variable.
Multiple runs are needed to get meaning ful information from the simulation.
- Statistics: mean of output metric
- Confidence Interval Bounded Horizon Studies (end time pre-specified)
  Steady State Studies (end time is a parameter)

Output Variables

Point set output variables (PSOV)
- Time variables (e.g. Individual wait times)
- Sample variable (entity attributes)
Derived scalar output variables (DSOV)
- Scalar derived from the PSOV’s (e.g. average wait time)
- Output variables are random variables, thus a simulation run provides a sample value

Output Data from Multiple Runs

The output statistic (DSOV) is a random variable with some probability distribution
Perform n simulations runs; each produces a sample of the DSOV

Bounded Horizon Study:

Characteristics:

Observation interval is well defined (implicit and explicit)
Transients in the stochastic processes are common
The simulation is run n times to produce n independent values for the DSOV

Objectives: Determine

Point Estimate : Mean of the DSOV output values
Interval Estimate: Confidence Interval for the point estimate or margin of error for the point estimate

Adjust n to get desired “confidence” of estimate statistics.

Point Estimate:

output sample mean for n runs of simulation.
Observation: The point estimate is itself a random variable provided the simulation runs are independent (different initial seeds for the random number generators used for each simulation run)

Confidence Interval:

a statement on the relationship between the sample mean and the true mean. More samples yield a smaller CIMost values lie outside the CI !

Central Limit Theorem:

The sample mean random variable \(\bar{y}(n)\) is normally distributed with mean μ and variance \(σ^2/n\) as \(n -> ∞\) independent of the distribution of \(y_i\).
The goal is to determine an interval such that the probability the true mean μ lies within that interval is C .

Don’t know μ and \(σ^2\) (could use the sample mean/variance to approximate)
n must be large; depends on underlying distribution of Y.

Student’s T-Distribution

“T statistic” Works better than sample mean method when n is small.
\(T(n) = \frac{\bar{y}(n)-\mu}{\sigma/\sqrt{n}}\)

Use sample mean for \(\mu\)
Use sample variance for \(\sigma^{2}\)
Random variable \(T(n)\) will have the students t-distribution with n-1 degree of freedom.

T distribution less peaked and has longer tails than normal distribution.
Yields wider confidence intervals than normal.

Steady State Studies:

Warm-up Period : transient period from initialization until simulated system reaches steady state (Typically simulation doesn’t start collecting output statistics until the warm up period is over)
Can use Welch’s Method one approach to define the warm-up period, to determine when output stochastic process has reached steady state.

Welch’s Moving Average Method - Windows:

Rather than simply plotting \(\bar{a_i}\):
plot average of:

\(\bar{a_i}\),
the previous \(\bar{a_i}\) values,
the next window \(\bar{a_i}\) values

Can avoid choppy plots, and visually observe the warm-up period ends.

CI for steady State Studies

Can increase the number of replications to try to reduce the size of the confidence interval
- Delete warm up period for each run
Alternatively, can increase the length of each run
Define a right-hand boundary to collect sufficient data for providing meaningful DSOV
Apply techniques used in bounded horizon study, except increase run length rather than number of replications to control CI size

More Efficient Approach: Batch Means Method:

Replication approach: warm up period required for each replication
Use one single long run instead (reduces compute time)
Divide the long run into time cells sufficiently long so that values for output variable are IID

Each time cell referred to as a batch

Back to Table of Content

10. Parallel and Distribution Simulation - CMB Algorithm

Motivation:

requires too much time to complete a single simulation run
too much time to complete a single simulation run
Requires more memory than available on a single computer
Requires geographically distributed people or resources
Need to interconnect separately developed simulators, e.g., to model systems-of-systems

Time terminology

physical time: time in the physical system
- Noon, December 31, 1999 to noon January 1, 2000
simulation time: representation of physical time within the simulation
- floating point values in interval [0.0, 24.0]
wallclock time: time during the execution of the simulation, usually output from a hardware clock
- 9:00 to 9:15 AM on September 10, 1999

Synchronization Algorithms

Conservative synchronization: avoid violating the local causality constraint (wait until it’s safe) but prone to deadlock
- deadlock avoidance using null messages (Chandy/Misra/Bryant)
- deadlock detection and recovery
- synchronous algorithms (global synchronization points)
Optimistic synchronization: allow violations of local causality to occur, but detect them at runtime and recover using a rollback mechanism
- Time Warp (Jefferson)
- numerous other approaches

Chandy/Misra/Bryant CMB algorithm (avoid deadlock) by using “NUll Message”

Assumptions:

logical processes (LPs) exchanging time stamped events (messages)
static network topology, no dynamic creation of LPs
messages sent on each link are sent in time stamp order
network provides reliable delivery, preserves order

Observations:
The above assumptions imply the time stamp of the last message received on a link is a lower bound on the time stamp (LBTS) of subsequent messages that will later be received on that link

lookahead

is a constraint on the timestamp of new messages the LP can send, Lookahead is necessary to enable conservative algorithms to achieve good performance.
Has large effect on performance: essential for concurrent processing of events for conservative algorithms

Avoid deadlock:

each LP send “null” messages indicating a lower bound on the timestamp of future messages.

Livelock will be caused if lookahead = 0

un-ending cycle of null messages where no LP can advance its simulation time

How to avoid time creep?

Each LP guarantees that if no additional events are delivered to the LP with TS < T, all subsequent messages that LP produces have a time stamp at least T + L (L = lookahead) where T is the timestamp of the LP’s next local event.(Use time of next event to avoid lookahead creep )

Back to Table of Content

11. Performance Bounds and Synchronous Algorithms -YAWNS Algorithm

YAWNS Assumption:

Unlimited number of processors are available
Sending and receiving a message requires no time, and communication latency is zero (instantaneous communications) “Zero overhead” assumption
Each event computation requires one unit of wallclock time (not to be confused with simulation time!) to complete
A single event computation cannot be distributed across multiple processors Why we do this: Useful to determine if it is worth trying to parallelize

Critical path | minimum execution time

The longest path(s) is referred to as the critical path, and the length of this path is called the critical path length Minimum execution time = critical path length

YAWNS is a Conservative Synchronization with the cons:

The dependence graph is not known in advance We do not know what new events will be created We do not know what events will be later received from other LPs
Conservative algorithms devise ways to constrain what new events can be created (lookahead) in order to determine which events have no predecessors
In general, conservative algorithms cannot fully exploit the parallelism available in the computation because one cannot immediately determine that an event has no predecessor events; they block LPs until we are sure

Implementing Barriers - LBTS

Barrier: Using a Centralized Controller: O(n), Controller must send and receive N messages
Broadcast Barrier O(n): One step approach; no central controller Each process: Broadcast message when barrier primitive is invoked Wait until a message is received from each other process N (N-1) messages O(n)
Tree Barrier Performance: Requires 2 * log2 N steps to complete the barrier
Requires 2 * (N-1) messages
Butterfly Barrier:logN steps including notification barrier achieved N (logN) messages

General Approach with YAWNS

WHILE (unprocessed events remain)   
    store messages received in previous iteration in FEL   
    LBTS = mini (Ni + LAi)      // global minimum
    process events with time stamp ≤ LBTS   
    barrier synchronization   
endDO

LBTS = a lower bound on the time stamp of any messages an LP might receive in the future

Back to Table of Content

12. Optimistic Synchronization : Time Warp (Part I) :Local Control (Jefferson)

Assumption:

Logical processes (LPs) exchanging time stamped events (messages);
Dynamic network topology, dynamic creation of LPs OK;
Messages sent on each link need not be sent in time stamp order;
Network provides reliable delivery, but need not preserve order;
Assume each event has a unique timestamp (implies the timestamp of a new event is strictly greater than that of the event scheduling it)

Basic idea:

detect out-of-order event execution and use a rollback mechanism to recover

Process events w/o worrying about messages that will arrive later;
Detect out of order execution, recover using rollback.

Local Control:

Do not discard processed events
add rollback When rollback initiated
Restore to prior state(2)send anti-message

About anti-message:

Used to cancel a previously sent message
Each positive message sent by an LP has a corresponding anti-message•Anti-message is identical to positive message, except for a sign bit
When an anti-message and its matching positive message meet in the same queue, the two annihilate each other (analogous to matter and anti-matter)
To undo the effects of a previously sent (positive) message, the LP need only send the corresponding anti-message
Message send: in addition to sending the message, leave a copy of the corresponding anti-message in a data structure in the sending LP called the output queue

Processing incoming messages (Three cases):

Case I: Corresponding message has not yet been processed
–>: Annihilate message/anti-message pair Case II: Corresponding message has already been processed
–>: Roll back to time prior to processing message (secondary rollback) –>: Annihilate message/anti-message pair Case III: Corresponding message has not yet been received
–>: Queue anti-message –>: Annihilate message/anti-message pair when message is received

Property 1 Rollback Propagation:

Rollback always forward, as A–>B, T(B)>T(A), if A canceled or rollback, B canceled or rollback. Secondary rollback always cancel future events.

Property 2 Forward Progress:

Unprocessed events at smallest TS will not be later rolled back. –>progress forward, but does not guarantee termination

Rollback thrashing:

a behavior where Time Warp spends most of its time executing and then rolling back computations

Back to Table of Content

13. Time Warp (Part II): Global Control GVT - Enhancement in addition to the Local Control

Why GVT needed?

Besides rollback mechanism in local control, also want to realize (1) Reclaim memory for old states/events (2) Perform irrevocable operations.
Provide the lower bound time stamp of future rollback can solve these.

GVT Definition:

Global Virtual Time (GVT) is defined as the minimum time stamp of any unprocessed (or partially processed) message or anti-message in the system.
GVT provides a lower bound on the time stamp of any future rollback.

Storage for events and state vectors older than(less than timestamp) GVT can be reclaimed
I/O operations with time stamp less than GVT can be performed.

Synchronous vs. Asynchronous GVT Algorithms：

Synchronous GVT algorithms : LPs stop processing events while a GVT computation is taking place LPs resume once GVT computation complete (e.g., new GVT value is distributed)
Asynchronous GVT algorithms: LPs can continue processing events and schedule new events while the GVT computation proceeds “in background” Asynchronous is preferable from performance standpoint.

Alg1: Difficulties, and Samadi Alg with “acknowledgment”:

Difficulties:

Transient message problem: messages sent but not yet received must be considered in computing GVT.
Simultaneous reporting problem: different processors report their local minima at different points in wallclock times, leading to an incorrect GVT value.

Samadi Algorithm:

Send an ack for each event messages & anti-messages received
“Mark” acks sent after the processor has reported its local minimum until a new GVT value is received

Algorithm:

Controller broadcasts “start GVT” message
Each processor reports minimum time stamp among (1) local messages, (2) unacknowledged sent messages, (3) marked acks that were received
Controller computes global minimum as GVT value, broadcasts new GVT

Characteristics:

Correctly computes GVT values
Requires a central controller, Limits scalability to large number of processors
Requires acknowledgment messages–In practice, communications is expensive (time consuming)

Alg2: Distributed Snapshot (A better way):

Computing GVT would be trivial if we could capture an instantaneous global snapshot of the computation.

consistent cut:

cut point: divides an LP computation into past and future parts
cut: set of cut points, one per LP
cut message: a message that was sent in the past, and received in the future part of the computation
consistent cut: a cut where all messages crossing the cut are cut messages

cut value and GVT:

cut value: minimum among (1) local minimum of each LP at its cut point and (2) time stamp of cut messagesThe cut value is a valid value for GVT

GVT Algorithm Overview：

Step 1: Each LP sets a cut point - Compute local minimum of the timestamp of unprocessed messages within that LP
Step 2: Compute a global minimum
Step 3: If a cut message is later received, recompute global minimum
Step 4: Done when all cut messages have been received

GVT never goes back:

Proof:
Recall the definition of GVT(W): the minimum timestamp unprocessed or partially processed message or anti-message in the system at wallclock time W. To prove this, show that if GVT has a value of T, no new messages will be later created with timestamp less than T. GVT is based on the smallest timestamped event. This event could be a (1) a positive message with timestamp T in an LP’s input queue, (2) a transient positive message with timestamp T, or (3) a transient (not yet processed) anti-message with timestamp T. (1) Processing a positive message with timestamp T can only result in the generation of another positive messages with timestamp > T. (2) A transient positive message with timestamp T could cause a rollback to time T, but the antimessages sent, and the positive messages that now become unprocessed events due to the rollback must have timestamp larger than T, so again no new messages with timestamp < T are created. (3) An anti-message at time T will cancel an event with timestamp T, and could cause a rollback to a time greater than T. But again, just as in the previous case no new positive message or anti-message that is created will have a timestamp less than T. In all three cases, no new event can be created with timestamp < T, so any subsequent GVT values must be greater than or equal to T.

Back to Table of Content

14. A comparision with conservative and optimistic synchronization methods

Conservative Algorithms:

Pro:

Good performance reported for many applications containing good lookahead (queueing networks, communication networks, wargaming)
Relatively easy to implement 3.Well suited for “federating” autonomous simulations, provided there is good lookahead
Con:
Cannot fully exploit available parallelism in the simulation because one must protect against a “worst case scenario”
Lookahead is essential to achieve good performance
Writing simulation programs to have good lookahead can be very difficult or impossible; changes in model may greatly reduce lookahead, leading to code that is “fragile” (difficult to maintain)

Optimistic Algorithms:

Pros:

Good performance reported for a variety of application;
Potential for a general, application independent parallel simulation engine; Cons:
In practice, application development cumbersome;
Overheads (state saving, rollback) may degrade performance (many solutions exist, but no guarantees);
When federating existing simulations, rollback capability must be added to the simulator

Back to Table of Content

15. System-of-Systems Simulationand the High Level Architecture

High Level Architecture (HLA)

The HLA consists of:

Rules thatsimulations (federates) must follow to achieve proper interaction during a federation execution
Object Model Template (OMT) defines the format for specifying the set of common objects used by a federation (federation object model), their attributes, and relationships among them
Interface Specification (IFSpec) provides interface to the Run-Time Infrastructure (RTI), that ties together federates during model execution

HLA Time Management

Federates have different local time advance mechanisms, A federate should not need not know what time advance mechanism other federates are using (time stepped, event driven, parallel – optimistic or conservative).
Approach: