This notes and learnings are collected during Georgia Tech’s CSE6730 Sim&Modeling Class (Professor Richard Fujimoto). Most of the documented masterial below is notes taken from the class slides, I rearranged partially the lecture slides structures with my understanding, in combination with my class works/projects notes. The collection is for my self learning documentation purpose only.
Table of Contents
- Chapter 01
- Chapter 02
- Chapter 03
- Chapter 04
- Chapter 05
- Chapter 06
- Chapter 07
- Chapter 08
- Chapter 09
- Chapter 10
- Chapter 11
- Chapter 12
- Chapter 13
- Chapter 14
- Chapter 15 Additional Resources
- Traffic Simulation DES project
01. Intro
What is Model?
A model is a simplified representation of system. It can be conceptual, verbal, diagrammtic, physical or formal (mathematical).
-
we can only observe that a model is consistent with experiment; But this does not “prove” it is correct.
-
We can disprove a model.
Terminologies:
System Under Investigation (SUI) or Pnysical System: The real or imagined system that is to be modeled.
Dynamic Systems: systems that exhibit behavior over time, it is a collection of interacting entities that produces some form of behavior that can be observed over an interval of time.
Computer Simulation: is a computer program that uses computation to construct a representation of the behavior of a model of a particular system over time. Computer simulations build on purely mathematical models in science, engineering and social sciences.
Complex (Adaptive) Systems: Systems composed of many interacting components, exhibit nontrivial emergent and self-organizing behaviors.
Why Simulate?
Observing an operational system may be:
- expensive, dangerous, disruptive, time consuming, not possible, morally or ethically unacceptable, parts of the system may not be observable.
Major use of Simulations:
- system analysis
- virtual enviroments
- test and evaluation of physical devices
Simulation Taxonomy
Real Machines | Simulated Machines | |
---|---|---|
Real People | Live | Virtual |
Simulated People | ? (hardware in the loop) | Constructive |
Types of Simulation Models
System Model
- Deterministic
- continuous
- discrete (Discrete event simulation)
- Stochastic
- continuous
- discrete (Discrete event simulation)
Stochastic VS. Deterministic
Stochastic simulation: contains random (probabilistic) elements, eg. inter-arrival time, serving time … etc.
- output is random quantity (multiple runs required to analyze output) Deterministic simulation: contains no random elements. eg. Simulation of digital circuit, chemical reaction based on equations… etc.
- output is deterministic for a given set of inputs.
Continuous VS. Discrete
Discrete Time:
- State of the system viewed as changing at discrete points in time
- An event is associated with each state transition, contain timestamp.
- Time stepped vs. Event driven time advance.
Continuous Time:
- State of the system viewed as changing continuously across time
- System typically described by a set of differential equations.
However, Disinction is conceptual: in practice, continuous simulations discretize time.
DES (Discrete Event Simulation)
- State changes can occur at irregular points in time
- In practice, one can always discretize the time scale and map a discrete event simulation to a time stepped model.
Simulation VS. Machine Learning
- Correlation is not causation:
- Machine learning focuses on using observational data to infer properties or behavior of a system
- Does nnot include rules concerning how the system operates(e.g. causal relationships)
- When simulation models are needed:
- Sufficient data not available
- “What-If” analysis
- Deeper understanding of system behavior needed. (causal relationship)
- Big data and Modeling & Simulation are synergistic:
- Data analysis ussed to characterize inpuits to the simulation (input analysis)
- Data analysis used to make inferences concerning simulation results (output analysis)
02. Modeling and Simulation Process
Understanding of the system under investigation, build the simulation program
Step 1. Project Description
- Statement of goals of the project
- Description of behavioral features of SUI relevant to project goals
Step 2. Conceptual Model
- An abstract representation of the system
- Specifies what included and what left out
- level of detail
- Often a variation on standard abstraction
- Languages
- Inputs ? Outputs? Metrics produced by the model?
- Verify conceptual model is consistent with the problem description.
It is a representation of the SUI using some type of formalism (mathematic/symbolic in nature)
Step 3. Simulation Model/Program
Representation of the model as a computer program that can be executed. (DES, continuous, agent-based, parallel/distributed sim …)
Step 4. Verification and Validation
Verification (build the model right?)
- Does the simulation model/program match the conceptual model?
- debugging
- model validation
Validation (build the right model?)
- Does the simulation model/program match the actual system (problem description)?
- Typically compare against:
- Measurements of actual system
- Analytic (mathematical) model of the system
- Another simulation model
- Always an incomplete activity:
- Often can only validate portions of the model
- If can validating 100% why build the simulation?
- Goal to improve credibility and confidence in model
Validation Method:
- Face Validation:
- Observed behavior appears correct by means of animation, graphical displays or the values of designated variables.
- More qualitative than quantitative
- Useful when the judgment of domain experts are included
- Behavior Validation:
- Use a collection of anticipated known behaviors to validate
- E.g. Enginer failure of aircraft leafs to decrease in altitude
- Replicative Validation:
- Collect input-output behaviour of SUI in a database
- Compare input-output behaviour of model to observed SUI behavior
- Validation of the data model as important as validation of the simulation.
Step 5. Experimentation and Presentation of Results
- Design of experiments
- Clear questions
- Develop experiment plan
- Output analysis
- Statistical analysis of results
- Sensitivity analysis
- etc.
- Presentation
- keep customer informed throughout modeling process to build confidence
- effective visualization may trump model quality
Challenges to Modelling and Simulation
- Inappropriate statement of goals
- Inappropriate granularirty of model
- Ignoring unexpected behavior
- Inappropriate mix of essential skills
- Inadequate flow of information to the client
03. Conceptual Models
Conceptual Model
A non-software specific description of the computer simulation model (that will be, is, or has been developed), describing the objectives, inputs, outputs, content, assumptions and simplifications of the model. (Robinson, “Conceptual Modeling for Simulation,” Winter Simulation Conference, 2013.Birta and Arbez, ch 4.)
- Objectives: questions to be answered
- Inputs: independent variables varied during experimentation
- Outputs: metrics obtained from the model
- Assumptions: based on beliefs concerning the model, necessitated by lack of data or other information
- Content: describes model entities and their behaviors (activities)
- Simplifications: decisions to simplify model, to reduce development time
Distinct from simulation software
Addresses the question of what to include in the model and what to omit.
Issues
The decision of what to leave out impact the model along several dimensions:
- validuty: produce sufficiently accurate results
- Credibility: believed by clients using the results of the simulation study
- Feasibility: able to be built within the time and const constraints of the study
- Usefulness: easy to use, flexible, reasonable data requirements, fast execution
Queueing Models
Queueing models are a natural abstraction for modeling systems that include:
- Customers competing to use limited resources
- Customers must wait (queue) to use the resource
- Service discipline (e.g., first-come-first-serve) Typical applications:
- Manufacturing, supply chains, transportation systems, service centers, communication networks, computer systems … etc.
Elements of a Queueing Model:
- Server: Serial (one at a time) reusable resource
- Numebr of servers
- Assume service time for successive customers are independent and identically distributed (iid), specified by a probability distribution (service time distribution)
- Customer: use resource, one at a time
- often assume inter-arrival time of customers are iid, specidfied by a probability distribution (interarraival time distribution)
- Queue:
- Holds customers waiting to use server
- Finite vs. unlimited capacity
- Service discipline (First come first serve, last in first out, priority … etc. )
Conceptual Model Elements:
- Stucture (Entitities, Attributes): Describes the componenets making up the model
- Entities: model components
- resources : provide service
- consumer : seek service
- queue : an ordered collection of entities
- group : an unordered collection of entities
Exogenous(external to model) vs. Endogenous entities
- Attributes:
- characteristics of model entities: state information of entity
- Entities: model components
- Behavior (Activities, Events) Describes what the model components do over time
- Activity: a signle atomic unit of behavior spanning some amount of (simulation time)
- Each activity begins and ends with an event.
- Activities characterize behavior of entity over a time period
- Scheduled activity: starts when an event occurs
- Conditional activity: starts when an condition becomes satisfied, usually based on state variables.
- State varibales can only be changed by an event
- State variables must be initiated indirectly by some other event
- Event: an action occuring in the model at one instant of time
- contains a timestamp indicating when the event occurs
- the state of the model can only change when an event occurs
- Activity: a signle atomic unit of behavior spanning some amount of (simulation time)
ABCMod: Activity Based Conceptual Modeling Framework
Tabular format documenting the conceptual model.
Structure:
- Constants and parameters
- Entities and their attributes Behavior:
- Activities
- Actions
04. Discrete Event Simulation
Discrete Event Simulation:
Computer model for a system where changes in the state of the system occur at discrete points in simulation time.
Fundamental Concepts:
- System State (state variables)
- State transitions (events)
Event-oriented:
Focus on events and event computations, view the simulation as a sequence of event computations where each event has a (simulation time) time stamp indicating when it occurs.
Each event computation can:
- Modify state variables
- Schedule new events
Characteristics:
- Execution is a sequence of event computations,
- Events are processed in time stamp order
- Unprocessed events are stored in a future event list
- Simulation time viewed as continuous (this is different from a time stepped simulation where simulation time discretized into integer time steps)
- Simulation time advances from one event to the next (irregular time advances)
Discrete Event Simulation Software
- Simulation Application
- State variables
- code modeling system behavior (event procedures)
- modify state variables
- schedule new events
- I/O and user interface software Component 1 is model of the physical system.
It contains the event handler procedures.
- Simulation Executive (Engine)
- future event list data structure
- event processing loop
- manage advances in simulation time Component 2 is independent of the simulation application.
It contains the event processing loop.
Event Scheduling
Simulation Engine includes a priority queue data structure (FEL) that holds all unprocessed events.
Main Scheduling loop removes the smallest timestamped event from FEL on each iteration.
An event computation may schedule new events that are added to the FEL.
A good example is Airport Simulation.
Future Event List (FEL)
A data Structure is needed to store events that have been scheduled, but not yet processed(FEL)
Required data structure is called a priority queue.
Usually, the amount of computation within in the event handler function is small, the time to access the FEL will dominate the execution time if the FEL holds many event.
Priority Queue Operations:
Required operations:
- Insert(FEL,event,timestamps) : add event with time to event list FEL
- Event = Delete(FEL) : remove smallest timestamps event and return pointer to it
- DeleteArbitrary(FEL,event) : Un schedule an event, delete an arbitrary event (not neccessarily smallest timestamped event)
Constraints for a general purpose simulation engine:
- maximum number of events required at one time unknown –> memory allocation issues
- Sequence of Insert and Delete operations unknown
- Distribution of timestamps on successive Insert operations unknown
Data Structures:
- linked lists
- Single linked list
- Double linked list (DeleteArbitrary Fast)
- Sorted vs. unsorted list
- Tree based
- Heap
- Splay tree
- Hashing schemes
- Calendar queue
- Ladder queue
- Hybrid
- Henricksen’s algorithm
Performance Comparasion
- Linear linked list fastest for small sized list
- Tree-based data structures (e.g., heap, splay tree) give predictable (O(log N)) performance, though often not the fastest available approach
- Hashing-based priority queues can yield constant time (O(1)) insertion and deletion operations
- Calendar Queue data structure offer best performance (constant time insert/delete) for many applications, but can yield surprisingly poor performance in some cases
- Ladder Queue addresses unstable property of calendar queue
- More efficient data structure needed for large list
- Results vary for different machine architectures
05. World Views for Discrete Event Simulation
Simulation Model World Views:
The world view provides a perspective and development approach for the simulation model over the same set of discrete events
- Event-oriented –> simple, efficient
- Simulation program written as a set of event handlers
- Expressed in terms of future events (events schedule new events)
- Uses Future Event List to advance simulation time
- Somewhat “lower level” abstraction than other world views
- Process-oriented –> simplifies model development, maintainence and modification of simulation code –> requires threading, additional complexity and computation overhead to suspend and resume simulation processes
- Simulation program written as set of processes (usually consumer entities)
- Each process a series of activities
- Uses Future Event List to advance simulation time
- Used in many simulation languages, especially in U.S.•
- Activity scanning –> can be inefficient; used but not as common
- Simulation program is written as a set of activities
- Simplest implementation involves incrementing time in small steps, testing the pre-conditions of activities at each time step
- More complex implementation uses Future Event List (FEL)
- Popular in Europe
06. Creating Random Numbers
Random vs. Pseudo Random Numbers
- Truly random numbers can be generated by hardware devices, however, it is not repeatable.
- For most simulations, pseudo random number generators are used.
- Deterministic procedure to generate streams of random numbers
- Output looks like random
- Stream satisfies statistical tests for randomness
Properties (random number used in simulation)
- Randomness: uniformly distributed over (0,1) without correlation among the samples. Produce output satisfying statistical tests of randomness.
- Repeatability: ability to reproduce
- Portability: easy to move RNG to new machines
- Efficiency: time and memory requirements
- Multiple streams: can be used to generate multiple, uncorrelated streams of random numbers
- Documentation: theoretical analysis and extensively tested
Lehmer’s algorithm, a special case of Linear Congruential Generator (LCG)
\(x_{i+1} = (a*xi + c) mod m\)
when c is zero –> Lehmer’s Algorithm –> known as multiplicative LCG
- Lehmer can not produce value 0, if it did, all subsequent numbers in sequence will be zero.
- Quality of the RNG is dependent on the good choice for a and m.
A good explanation of pseudo rng LCG can be found here
Check the quality of RNG
Output should be independent and uniformly distributed over (0,1), select parameters to improve quality, can pass statistical test (e.g Chi-square), can pass software suites tests, (e.g battery pf tests)
- However, passing statitical tests do not prove the RNG is valid, it only gives confidence of validity.
- Generators such as the Mersenne Twister can produce very long periods, but require more memory
- Many streams needed for some applications, e.g., when using parallel processing
07. Random Variates
Random Variate:
an algorithmically generated realization of a random variable with pdf p().
Generation Method:
- Inverse distribution method applies in general for any distribution where a closed form equation for the CDF can be determined.
- discrete random variates:
- E.g. Equilikely(a,b), Bernoulli(p), Geometric(p) …
- Continuous random variates:
- Concepts for discrete random variate carry over to continuous
- Generation process:
- u = random(u)
- return \(F^{-1}(u)\), \(F^{-1}\) is the inverse distribution function.
- discrete random variates:
- In many cases the inverse cdf is not known, Rejection-Acceptance Method may be used if probability density function is known.
- Generate two samples, u1 and u2from the uniform distribution on [0,1].
- Compute x’ = a + u1 (b – a)
- Compute y’ = c u2
- If y’ ≤ f(x’) then accept x’ as a valid sample, otherwise repeat the process with the generation of two new samples
- How many rejected trials: Dependent on the relative portion of the abc rectangle filled in by the density function.
- Probability a draw will be accepted is 1 / [c (b – a)] since the area under f(x) is 1
- Empirical distributions useful if data does not fit a known distribution
Rejection-Acceptance Method
For some distributions, inverse distribution functions may not be readily available, but the probability density function (f(x)) is known.
Rejection Acceptance Method offers an alterative approach.
Shares similar features to the Monte-Carlo methods, e.g., for evaluating integrals
Assumption:
- Probability density function f(x) is available
- Density function bounded so a ≤ x ≤ b and f(x) ≤ c
08. Verification and Validation
Definitions:
- Verification :
- Ensuring that the computer program of the computerized model and its implementation are correct.
- Validation:
- Substantiation that a model within its domain of applicability possesses a satisfactory range of accuracy consistent with the intended application of the model.
- Credibility:
- Ensure sufficient confidennce in the model to use it and act based on predictions made by the model.
- Developing in the interested parties sufficient confidence that they believe in the model and information derived from the model.
- Usability:
- Determining that the model and its user instructions are easy to use.
V&V in Model Development:
- Compared with problem entity (System) or involving data: Validation Process
- Conceptual Model VS. Problem Entity : Conceputual Moel Validation (Analysis and Modeling)
- Computerized Model VS. Problem Entity: Operational Validation (Experimentations)
- Data Validity is a process links all three systems. (Problem entity, Computerized Model, Conceptual Model)
- Computerized Model VS. Conceptual Model: Verification Process (Computerized model verification)
Verification:
Did I build the model right?
Concerned with the “correctness” of the simulation program
Compares the simulation model with the conceptual model
Involves software engineering, software debugging
Simulation Languages Greatly affects the time and effor for verification (including development time, model execution speed)
Some types and examples:
- Special purpose simulation languages:
- VHLD(circuits), AutoMod(Manufacturing), Opnet,ns3(telecommunications), …
- General purpose simulation languages:
- Simula, Arena, Modelica, …
- Genral purpose programming languages:
- C, FORTAN, …
Verification Techniques:
- Static techniques:
- Structured walk through, by development team
- Program verification, formal methods, proof
- Dynamic techniques:
- Simulation languages: verification of model components, RNG
- GP langugage: structured programming, module testing
- Consistensy tests within software (e.g. counters)
- Test/Validate simple configurations (e.g. queueing network with known analytic solutons)
- Trace analysis
- Model animation
Validation
highlights:
- within its domain of applicability
- satisfactory range of accuracy
- intended application of the model
Thus,
Did I build the right model? –> depends on the goals of the simulation study.
1. Data Validation
Data needs:
support three validation processes (mentioned earlier, linked all three systems, conceptual, computerized(simulation model), problem entity)
- Support Conceptual model validation (e.g. test assumptions)
- Support Simulationn model validation
- Experimentation (e.g. defining typical scenarios)
Issues:
- Collecting data is time consuming, expensive
- Data always has errors
- Are they valid?
- Check for consistency among data
- Missing data is a major problem in simulation studies.
2. Conceptual Model Validation (Conceptual VS. Problem entity)
- Determine that the theories and assumptions underlying the conceptual model are correct
- Independence assumptions, stationary assumptions, test parameters, test input distribution , Statistical results
- Determine that the model’s representation of the problem entity and the model’s structure, logic, and mathematical and causal relationships are reasonable for the intended purpose of the model
- Face validation (reviewed by subject experts), Strutured walk through, Traces( trace entities (e.g. customers) throughout the simulaiton)
3. Operational Validation (Computerized vs Problem entity)
Determining that the model’s output behavior has a satisfactory range of accuracy for the model’s intended purpose over the domain of the model’s intended applicability.
Decision Approach:
- Subjective: (Behavior Explore)
- Observable System: Explore model behavior, Comparison using graphical displays
- Non-Observable System: Explore model behavior, Comparison with other models
- Objective: (Statistical tests)
- Observable System: Comparison with physical system using statistical tests
- Non-Observable System: Comparison with other models using statistical tests Here Observable or Non-Observable refer the problem entity’s type.
How to explore the model behavior?
- Qualitative: Directions of outputs are examined and also possibility whether the magnitudes are ‘reasonable.’
- Quantitative: Both the directions and the precise magnitudes of the outputs are examined.
- SUBJECT MATTER EXPERTS: Usually know directions and often the ‘general values’ of the magnitudes of the outputs.
- they predict outputs and examine the models outputs.
- PARAMETER VARIABILITY – SENSITIVITY ANALYSIS:
- Use graphs to display data
- Use Animation and Graphics of system operation
- Use numerous sets of experimental conditions.
09. Output Analysis
Stochastic Simulations (random variables):
- Result of each simulation run is one sample of a random variable.
- Multiple runs are needed to get meaning ful information from the simulation.
- Statistics: mean of output metric
- Confidence Interval Bounded Horizon Studies (end time pre-specified)
Steady State Studies (end time is a parameter)
Output Variables
- Point set output variables (PSOV)
- Time variables (e.g. Individual wait times)
- Sample variable (entity attributes)
- Derived scalar output variables (DSOV)
- Scalar derived from the PSOV’s (e.g. average wait time)
- Output variables are random variables, thus a simulation run provides a sample value
Output Data from Multiple Runs
- The output statistic (DSOV) is a random variable with some probability distribution
- Perform n simulations runs; each produces a sample of the DSOV
Bounded Horizon Study:
Characteristics:
- Observation interval is well defined (implicit and explicit)
- Transients in the stochastic processes are common
- The simulation is run n times to produce n independent values for the DSOV
Objectives: Determine
- Point Estimate : Mean of the DSOV output values
- Interval Estimate: Confidence Interval for the point estimate or margin of error for the point estimate
Adjust n to get desired “confidence” of estimate statistics.
Point Estimate:
output sample mean for n runs of simulation.
Observation: The point estimate is itself a random variable provided the simulation runs are independent (different initial seeds for the random number generators used for each simulation run)
Confidence Interval:
a statement on the relationship between the sample mean and the true mean. More samples yield a smaller CIMost values lie outside the CI !
Central Limit Theorem:
The sample mean random variable \(\bar{y}(n)\) is normally distributed with mean μ and variance \(σ^2/n\) as \(n -> ∞\) independent of the distribution of \(y_i\).
The goal is to determine an interval such that the probability the true mean μ lies within that interval is C .
- Don’t know μ and \(σ^2\) (could use the sample mean/variance to approximate)
- n must be large; depends on underlying distribution of Y.
Student’s T-Distribution
“T statistic” Works better than sample mean method when n is small.
\(T(n) = \frac{\bar{y}(n)-\mu}{\sigma/\sqrt{n}}\)
- Use sample mean for \(\mu\)
- Use sample variance for \(\sigma^{2}\)
- Random variable \(T(n)\) will have the students t-distribution with n-1 degree of freedom.
T distribution less peaked and has longer tails than normal distribution.
Yields wider confidence intervals than normal.
Steady State Studies:
Warm-up Period : transient period from initialization until simulated system reaches steady state (Typically simulation doesn’t start collecting output statistics until the warm up period is over)
Can use Welch’s Method one approach to define the warm-up period, to determine when output stochastic process has reached steady state.
Welch’s Moving Average Method - Windows:
Rather than simply plotting \(\bar{a_i}\):
plot average of:
- \(\bar{a_i}\),
- the previous \(\bar{a_i}\) values,
- the next window \(\bar{a_i}\) values
Can avoid choppy plots, and visually observe the warm-up period ends.
CI for steady State Studies
- Can increase the number of replications to try to reduce the size of the confidence interval
- Delete warm up period for each run
- Alternatively, can increase the length of each run
- Define a right-hand boundary to collect sufficient data for providing meaningful DSOV
- Apply techniques used in bounded horizon study, except increase run length rather than number of replications to control CI size
More Efficient Approach: Batch Means Method:
Replication approach: warm up period required for each replication
Use one single long run instead (reduces compute time)
Divide the long run into time cells sufficiently long so that values for output variable are IID
- Each time cell referred to as a batch
10. Parallel and Distribution Simulation - CMB Algorithm
Motivation:
- requires too much time to complete a single simulation run
- too much time to complete a single simulation run
- Requires more memory than available on a single computer
- Requires geographically distributed people or resources
- Need to interconnect separately developed simulators, e.g., to model systems-of-systems
Time terminology
- physical time: time in the physical system
- Noon, December 31, 1999 to noon January 1, 2000
- simulation time: representation of physical time within the simulation
- floating point values in interval [0.0, 24.0]
- wallclock time: time during the execution of the simulation, usually output from a hardware clock
- 9:00 to 9:15 AM on September 10, 1999
Synchronization Algorithms
- Conservative synchronization: avoid violating the local causality constraint (wait until it’s safe) but prone to deadlock
- deadlock avoidance using null messages (Chandy/Misra/Bryant)
- deadlock detection and recovery
- synchronous algorithms (global synchronization points)
- Optimistic synchronization: allow violations of local causality to occur, but detect them at runtime and recover using a rollback mechanism
- Time Warp (Jefferson)
- numerous other approaches
Chandy/Misra/Bryant CMB algorithm (avoid deadlock) by using “NUll Message”
Assumptions:
- logical processes (LPs) exchanging time stamped events (messages)
- static network topology, no dynamic creation of LPs
- messages sent on each link are sent in time stamp order
- network provides reliable delivery, preserves order
Observations:
The above assumptions imply the time stamp of the last message received on a link is a lower bound on the time stamp (LBTS) of subsequent messages that will later be received on that link
lookahead
is a constraint on the timestamp of new messages the LP can send, Lookahead is necessary to enable conservative algorithms to achieve good performance.
Has large effect on performance: essential for concurrent processing of events for conservative algorithms
Avoid deadlock:
each LP send “null” messages indicating a lower bound on the timestamp of future messages.
Livelock will be caused if lookahead = 0
un-ending cycle of null messages where no LP can advance its simulation time
How to avoid time creep?
Each LP guarantees that if no additional events are delivered to the LP with TS < T, all subsequent messages that LP produces have a time stamp at least T + L (L = lookahead) where T is the timestamp of the LP’s next local event.(Use time of next event to avoid lookahead creep )
11. Performance Bounds and Synchronous Algorithms -YAWNS Algorithm
YAWNS Assumption:
- Unlimited number of processors are available
- Sending and receiving a message requires no time, and communication latency is zero (instantaneous communications) “Zero overhead” assumption
- Each event computation requires one unit of wallclock time (not to be confused with simulation time!) to complete
- A single event computation cannot be distributed across multiple processors Why we do this: Useful to determine if it is worth trying to parallelize
Critical path | minimum execution time
The longest path(s) is referred to as the critical path, and the length of this path is called the critical path length Minimum execution time = critical path length
YAWNS is a Conservative Synchronization with the cons:
- The dependence graph is not known in advance We do not know what new events will be created We do not know what events will be later received from other LPs
- Conservative algorithms devise ways to constrain what new events can be created (lookahead) in order to determine which events have no predecessors
- In general, conservative algorithms cannot fully exploit the parallelism available in the computation because one cannot immediately determine that an event has no predecessor events; they block LPs until we are sure
Implementing Barriers - LBTS
- Barrier: Using a Centralized Controller: O(n), Controller must send and receive N messages
- Broadcast Barrier O(n): One step approach; no central controller Each process: Broadcast message when barrier primitive is invoked Wait until a message is received from each other process N (N-1) messages O(n)
- Tree Barrier Performance: Requires 2 * log2 N steps to complete the barrier
Requires 2 * (N-1) messages - Butterfly Barrier:logN steps including notification barrier achieved N (logN) messages
General Approach with YAWNS
WHILE (unprocessed events remain)
store messages received in previous iteration in FEL
LBTS = mini (Ni + LAi) // global minimum
process events with time stamp ≤ LBTS
barrier synchronization
endDO
LBTS = a lower bound on the time stamp of any messages an LP might receive in the future
12. Optimistic Synchronization : Time Warp (Part I) :Local Control (Jefferson)
Assumption:
- Logical processes (LPs) exchanging time stamped events (messages);
- Dynamic network topology, dynamic creation of LPs OK;
- Messages sent on each link need not be sent in time stamp order;
- Network provides reliable delivery, but need not preserve order;
- Assume each event has a unique timestamp (implies the timestamp of a new event is strictly greater than that of the event scheduling it)
Basic idea:
detect out-of-order event execution and use a rollback mechanism to recover
- Process events w/o worrying about messages that will arrive later;
- Detect out of order execution, recover using rollback.
Local Control:
- Do not discard processed events
- add rollback When rollback initiated
- Restore to prior state(2)send anti-message
About anti-message:
- Used to cancel a previously sent message
- Each positive message sent by an LP has a corresponding anti-message•Anti-message is identical to positive message, except for a sign bit
- When an anti-message and its matching positive message meet in the same queue, the two annihilate each other (analogous to matter and anti-matter)
- To undo the effects of a previously sent (positive) message, the LP need only send the corresponding anti-message
- Message send: in addition to sending the message, leave a copy of the corresponding anti-message in a data structure in the sending LP called the output queue
Processing incoming messages (Three cases):
Case I: Corresponding message has not yet been processed
–>: Annihilate message/anti-message pair Case II: Corresponding message has already been processed
–>: Roll back to time prior to processing message (secondary rollback) –>: Annihilate message/anti-message pair Case III: Corresponding message has not yet been received
–>: Queue anti-message –>: Annihilate message/anti-message pair when message is received
Property 1 Rollback Propagation:
Rollback always forward, as A–>B, T(B)>T(A), if A canceled or rollback, B canceled or rollback. Secondary rollback always cancel future events.
Property 2 Forward Progress:
Unprocessed events at smallest TS will not be later rolled back. –>progress forward, but does not guarantee termination
Rollback thrashing:
a behavior where Time Warp spends most of its time executing and then rolling back computations
13. Time Warp (Part II): Global Control GVT - Enhancement in addition to the Local Control
Why GVT needed?
Besides rollback mechanism in local control, also want to realize (1) Reclaim memory for old states/events (2) Perform irrevocable operations.
Provide the lower bound time stamp of future rollback can solve these.
GVT Definition:
Global Virtual Time (GVT) is defined as the minimum time stamp of any unprocessed (or partially processed) message or anti-message in the system.
GVT provides a lower bound on the time stamp of any future rollback.
- Storage for events and state vectors older than(less than timestamp) GVT can be reclaimed
- I/O operations with time stamp less than GVT can be performed.
Synchronous vs. Asynchronous GVT Algorithms:
Synchronous GVT algorithms : LPs stop processing events while a GVT computation is taking place LPs resume once GVT computation complete (e.g., new GVT value is distributed)
Asynchronous GVT algorithms: LPs can continue processing events and schedule new events while the GVT computation proceeds “in background” Asynchronous is preferable from performance standpoint.
Alg1: Difficulties, and Samadi Alg with “acknowledgment”:
Difficulties:
- Transient message problem: messages sent but not yet received must be considered in computing GVT.
- Simultaneous reporting problem: different processors report their local minima at different points in wallclock times, leading to an incorrect GVT value.
Samadi Algorithm:
- Send an ack for each event messages & anti-messages received
- “Mark” acks sent after the processor has reported its local minimum until a new GVT value is received
Algorithm:
- Controller broadcasts “start GVT” message
- Each processor reports minimum time stamp among (1) local messages, (2) unacknowledged sent messages, (3) marked acks that were received
- Controller computes global minimum as GVT value, broadcasts new GVT
Characteristics:
- Correctly computes GVT values
- Requires a central controller, Limits scalability to large number of processors
- Requires acknowledgment messages–In practice, communications is expensive (time consuming)
Alg2: Distributed Snapshot (A better way):
Computing GVT would be trivial if we could capture an instantaneous global snapshot of the computation.
consistent cut:
cut point: divides an LP computation into past and future parts
cut: set of cut points, one per LP
cut message: a message that was sent in the past, and received in the future part of the computation
consistent cut: a cut where all messages crossing the cut are cut messages
cut value and GVT:
cut value: minimum among (1) local minimum of each LP at its cut point and (2) time stamp of cut messagesThe cut value is a valid value for GVT
GVT Algorithm Overview:
Step 1: Each LP sets a cut point - Compute local minimum of the timestamp of unprocessed messages within that LP
Step 2: Compute a global minimum
Step 3: If a cut message is later received, recompute global minimum
Step 4: Done when all cut messages have been received
GVT never goes back:
Proof:
Recall the definition of GVT(W): the minimum timestamp unprocessed or partially processed message or anti-message in the system at wallclock time W. To prove this, show that if GVT has a value of T, no new messages will be later created with timestamp less than T. GVT is based on the smallest timestamped event. This event could be a (1) a positive message with timestamp T in an LP’s input queue, (2) a transient positive message with timestamp T, or (3) a transient (not yet processed) anti-message with timestamp T. (1) Processing a positive message with timestamp T can only result in the generation of another positive messages with timestamp > T. (2) A transient positive message with timestamp T could cause a rollback to time T, but the antimessages sent, and the positive messages that now become unprocessed events due to the rollback must have timestamp larger than T, so again no new messages with timestamp < T are created. (3) An anti-message at time T will cancel an event with timestamp T, and could cause a rollback to a time greater than T. But again, just as in the previous case no new positive message or anti-message that is created will have a timestamp less than T. In all three cases, no new event can be created with timestamp < T, so any subsequent GVT values must be greater than or equal to T.
14. A comparision with conservative and optimistic synchronization methods
Conservative Algorithms:
Pro:
- Good performance reported for many applications containing good lookahead (queueing networks, communication networks, wargaming)
- Relatively easy to implement 3.Well suited for “federating” autonomous simulations, provided there is good lookahead
Con: - Cannot fully exploit available parallelism in the simulation because one must protect against a “worst case scenario”
- Lookahead is essential to achieve good performance
- Writing simulation programs to have good lookahead can be very difficult or impossible; changes in model may greatly reduce lookahead, leading to code that is “fragile” (difficult to maintain)
Optimistic Algorithms:
Pros:
- Good performance reported for a variety of application;
- Potential for a general, application independent parallel simulation engine; Cons:
- In practice, application development cumbersome;
- Overheads (state saving, rollback) may degrade performance (many solutions exist, but no guarantees);
- When federating existing simulations, rollback capability must be added to the simulator
15. System-of-Systems Simulationand the High Level Architecture
High Level Architecture (HLA)
The HLA consists of:
- Rules thatsimulations (federates) must follow to achieve proper interaction during a federation execution
- Object Model Template (OMT) defines the format for specifying the set of common objects used by a federation (federation object model), their attributes, and relationships among them
- Interface Specification (IFSpec) provides interface to the Run-Time Infrastructure (RTI), that ties together federates during model execution
HLA Time Management
Federates have different local time advance mechanisms, A federate should not need not know what time advance mechanism other federates are using (time stepped, event driven, parallel – optimistic or conservative).
Approach:
- Each federate is an LP (could be multiple LPs within a federate)
- Each federate could be a parallel simulator
- Synchronization algorithm implemented (mostly) within the RTI
HLA Time Management Summary:
- Interoperability between time stepped and event driven simulations
- Support for conservative and optimistic federates