Performance Evaluation of Database Systems Using Colored Petri Nets

ABSTRACT: We propose Colored Petri Net (CPN) models for replicated and centralized database systems and conduct a comparative study of their performance. The designed CPN models capture the dynamics of the studied database systems and estimate their expected performance at an appropriate level of abstraction. A number of simulation experiments were performed under various load conditions while varying parameters such as server speed, network speed, and read/write ratio. The simulation results show that, under similar operating conditions, replicated systems exhibit higher performance than centralized systems in terms of query response time and system saturation levels. However, centralized systems become more competitive when their network and server speeds are much higher than those of the replicated systems.


Introduction
The traditional setting of database systems is based on the centralized client-server architecture. In such systems, a central server handles requests coming from a number of remote clients. A more recent development consists of deploying multiple service points, keeping a copy of the data at each service point (replica). Database queries are submitted to replicas independently. The service replicas cooperate in servicing the queries and maintaining data consistency. Database queries are either read queries, processed locally, or write queries, propagated to all replicas.
The replicated setting can be a viable alternative for many reasons. Firstly, it can offer a distributed base for data that can be scaled up and down to meet varying needs and demands. Secondly, it increases data availability, system robustness, and fault tolerance. Finally, it allows deployment of clusters of workstations that offer aggregate computing power and storage capacity.
On the negative side, replicated database systems may impose substantial communication overhead, especially with a high ratio of write queries, resulting in poor query response time. The aim of this study is to investigate the conditions under which replication is a more attractive solution than centralization in terms of performance.
The type of consistency control method used affects the performance of the replicated system. Two broad models of consistency control are known in the literature. The first one is the asynchronous model (also known as the lazy update model), where changes introduced by a transaction are propagated only after the transaction has been committed (Wiesmann et al., 2000). The second one is the synchronous model (also known as the eager model), where a transaction should not commit until full synchronization with all replicas is completed (Wiesmann et al., 2000).
It is argued that synchronous models are hardly feasible in practice due to problems related to synchronization delays, deadlock avoidance, and scalability (Gray et al., 1996). However, it is asserted in Wiesmann et al. (2000) that group communication primitives may be a solution for building good-performance synchronous replication models (Kemme and Alonso, 2000; Kemme and Alonso, 1998; Pedone et al., 1997; Pedone et al., 1998). A number of consistency control methods have also been studied in the literature (Beeri et al., 1989; Guerraoui and Schiper, 1997; Wiesmann et al., 1999; Day et al., 2001).
In this study we assume asynchronous replication with minimum communication overhead, with only one update message broadcast to all replicas for each write query. This assumes a fault-free system in order for the updates to be delivered to all replicas. Modeling fault-tolerant consistency control protocols is beyond the scope of this paper. The targeted performance evaluation is conducted using Colored Petri Nets (CPN). CPNs represent powerful means for modeling and prototyping complex, distributed, and parallel systems (Jensen, 1992, 1994). CPNs have been used successfully for modeling various systems such as VLSI chip design (Shapiro, 1991), communication protocols (Mnaouer et al., 1999; Mnaouer et al., 2002; Morera and Gonzalez, 1999), and algorithm analysis (Jorgensen and Kristensen, 1999).
We propose two CPN models for centralized and replicated database systems. The CPN models are used for simulating the two systems and comparing their query response times and system saturation levels as functions of the load under various conditions of network latency, query service time, and query read/write ratio.
The paper is organized as follows. In the next section, a description of the evaluated centralized and replicated system models is given. Section 3 gives an informal overview of Colored Petri Nets followed by a detailed description of the designed CPN models used in the simulation. The simulation results are presented and discussed in Section 4. Concluding remarks on the work are provided in Section 5.

Overview of the centralized and replicated database system models
In a typical centralized database system, a database server is connected to a number of clients via a switched network. Database queries sent from clients to the server are subject to two communication delay components: a client-switch delay (τ_cs) and a switch-server delay (τ_ss). In real situations, the switch-server link can be 1, 10, or even 100 times faster than the client-switch link. In a fully replicated client-server database system, each site runs a replica of the database server, hence acting as a server as well as a client. That is the case when all the sites are included in the replication scheme. Read database queries are processed locally, while write queries are processed both locally and remotely at all replicas. Propagation of write queries is performed via a switched network. Figure 1 shows the components of the centralized and replicated database systems.
In Figure 1.a, the arrival of transactions to the server in the centralized system is assumed to form a Poisson process with an average arrival rate of Nλ transactions per second (where N represents the number of clients involved in the system). The service time for a transaction at the server is assumed to be an exponentially distributed random variable with an average service rate of µ_c transactions per second. Let τ_cs be the average communication delay between a client and the switch, and τ_ss the average communication delay between the switch and the server. The message switching time at the switch is assumed negligible.
Figure 1.b shows the components of the replicated database system. We do not consider any particular replication control method; however, we assume asynchronous replication with minimum inter-replica communication, with only one update message broadcast to all the replicas for every write transaction request. The assumption is realistic in the sense that many commercial database replication systems opt for the asynchronous replication model and therefore allow transient inconsistent states of the database system to exist (Kemme and Alonso, 2000). We also wanted to analyze the performance of replicated versus centralized database systems subject to the variation of the server speed, network speed, and read/write ratio parameters, while isolating the performance issues related to the application of more complex consistency control mechanisms. In this system, each of the N processors (replicas) holds a copy of the database and serves both as a client and as a server. An average of λ transactions per second is assumed to be generated at each replica. A portion wλ of this rate corresponds to write transactions that have to be broadcast to all processors. The remaining (1-w)λ rate corresponds to read transactions that can be processed locally. Since each write transaction is propagated to all processors, the total arrival rate of write transactions to each processor is Nwλ. The overall arrival of transactions (read or write) to each replica is assumed to follow a Poisson process. The service time for a transaction at each processor is assumed to be exponentially distributed with an average service rate of µ_r transactions per second. The average replica-to-switch communication delay is assumed to be the same as τ_cs of the centralized model.
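The arrival rates stated above can be collected in one place. The first two lines restate the model description; the stability conditions in the last line are an added observation, not stated in the original text, which assumes each server behaves as a single queue that saturates once its arrival rate reaches its service rate (communication delays ignored). The symbols \Lambda_c and \Lambda_r are introduced here for illustration only.

```latex
\begin{align}
\Lambda_c &= N\lambda, \\
\Lambda_r &= (1-w)\lambda + N w \lambda = \lambda\,(1 - w + N w), \\
\Lambda_c < \mu_c &\iff \lambda < \frac{\mu_c}{N},
\qquad
\Lambda_r < \mu_r \iff \lambda < \frac{\mu_r}{1 - w + N w}.
\end{align}
```

Under this reading, replication helps most when w is small: for w close to 0 each replica sees little more than its own load λ, whereas for w = 1 each replica sees the full system load Nλ.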

Colored Petri nets
We start by presenting an informal overview of CPNs before describing the details of the proposed CPN models for the centralized and replicated database systems.

Overview of colored Petri nets
A Petri net is a network of interconnected locations and activities, with rules that determine when an activity can occur and specify how its occurrence changes the states of the associated locations. Petri nets can be used to model systems of any type. They are particularly useful in facilitating the design and analysis of complex distributed systems that handle discrete flows of objects and information (Jensen, 1992, 1994).
CPNs represent an extension of Petri nets. They are graphical models that use the concept of colored tokens to represent data structures and state conditions. The presence of data or state conditions is marked by colored tokens in locations. Each token has an associated token color that specifies the type of data it represents (usually arbitrarily complex data values).
The locations are represented graphically by ellipses and are called places. Each place is associated with a color set that specifies the type of tokens (i.e., data) that may reside in the location. Activities are represented by rectangles called transitions, which govern the occurrence of events in the system. Places can be either input or output places for a transition. Places and transitions are linked through directed arcs modeling the flow of data. Each arc has an associated arc expression that controls the transition's occurrence. This expression specifies the number of tokens consumed by the transition, or the number of tokens produced after its occurrence.
When the number of tokens in each input place of a transition satisfies the corresponding arc expression, the transition is said to be enabled. An enabled transition can fire (i.e., occur) at any time. When it fires, it consumes as many tokens from its input places as specified by their corresponding arc expressions, and produces as many tokens in its output places as specified by their corresponding arc expressions. Additional conditions for the enabling of the transition can be specified through the guard of the transition. All the Boolean conditions specified in the guard must evaluate to true for the transition to be enabled.
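The enabling and firing rule can be sketched in a few lines of Python. This is a simplified illustration, not the Design/CPN semantics: arc expressions are reduced to fixed multisets of token colors (variable binding is not modeled), and the class name `Transition` and its fields are hypothetical.

```python
from collections import Counter


class Transition:
    """A transition with weighted input/output arcs and an optional guard."""

    def __init__(self, inputs, outputs, guard=None):
        self.inputs = inputs      # {place name: Counter of colors to consume}
        self.outputs = outputs    # {place name: Counter of colors to produce}
        self.guard = guard if guard is not None else (lambda marking: True)

    def enabled(self, marking):
        # Every input place must hold the tokens its arc demands,
        # and the guard must evaluate to true.
        enough = all(
            marking[place][color] >= count
            for place, need in self.inputs.items()
            for color, count in need.items()
        )
        return enough and self.guard(marking)

    def fire(self, marking):
        if not self.enabled(marking):
            raise ValueError("transition is not enabled")
        for place, need in self.inputs.items():
            marking[place] -= need    # consume tokens from input places
        for place, made in self.outputs.items():
            marking[place] += made    # produce tokens in output places


# A marking maps each place to the multiset of tokens it currently holds.
marking = {"In": Counter({"red": 2}), "Out": Counter()}
move = Transition(inputs={"In": Counter({"red": 1})},
                  outputs={"Out": Counter({"red": 2})})
move.fire(marking)
print(marking)
```

The marking is mutated in place, mirroring how a simulator advances from one marking to the next with each transition occurrence.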
CPN modeling and simulation is supported by various simulation packages such as the Design/CPN tool (Jensen et al., 1996) used in this study. In this tool, the different parts of a CPN model are constructed in different CPN pages. This helps in making use of the CPN hierarchy constructs, which enable the designer to break down the complexity of the modeled system into different layers with different abstraction levels. Design/CPN also provides a declaration node that can be used to record the color sets, constants, variables, and function definitions. Figure 2 shows a small example CPN.

In Figure 2, the guard of the transition specifies that Ordent should be bound only to tokens having the value Big. The binding means that the variable Ordent of the arc expression from the place Order In is substituted by any token that is part of the current marking of the place, subject to the constraint of the transition guard. An arc expression is evaluated against the current marking of (i.e., the current distribution of tokens on) the input places of the transition. In the displayed state (the initial marking), the transition may be enabled if the CPN is simulated, since at the start of the simulation the initial marking of the places becomes the current marking of the CPN. When the transition occurs, it consumes the token Big (of which only one instance exists in Order In) and produces two instances of Big in the output place.
Place fusion is one hierarchy construct that enables a place to be physically present on different CPN pages while representing a single conceptual place. All such physical places are said to belong to the same fusion set. Two types of places may belong to the same fusion set and help construct hierarchies: socket places and port places. Socket places are found at one hierarchy level and are mapped to port places belonging to a CPN described at a lower level of the hierarchy. This mapping of socket places to port places ensures the connection between CPNs constructed at different hierarchical levels. The place Order Out is declared as a socket place that will be used to connect to another CPN page in which a more complete shipping process with additional operations is described.
Substitution transition (ST) is the second hierarchy construct; it allows lower-level design details to be hidden at a lower abstraction level. The ST construct allows the designer to relate a transition (and its surrounding arcs) at an upper level to a more complex activity (modeled by a detailed CPN) hidden at a lower level of the hierarchy. The transition Process Order can be designed to represent an ST that hides more elaborate details of product shipping that are not shown at the current level of the model.
Finally, the concept of time stamps associated with timed tokens is used for the purpose of performance evaluation. Each timed token bears a time stamp, assigned at its creation in accordance with a global clock maintained by the simulator. In Figure 2, the transition Process Order has an associated time region (denoted by @+5) that specifies that the occurrence of the transition takes 5 time units. The time stamps of the produced Big tokens will therefore be the current time plus 5 units. See Jensen (1992, 1994, 1996) for more details.
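As a rough illustration of timed tokens, the following Python sketch mimics one occurrence of the Process Order transition of Figure 2: it consumes one Big token that is ready at the current clock value and produces two Big tokens stamped 5 time units later. The function name, the list-of-pairs token representation, and the readiness test are assumptions made for this sketch, not the Design/CPN implementation.

```python
from typing import List, Tuple

Token = Tuple[str, float]  # (value, time stamp)


def process_order(order_in: List[Token], clock: float,
                  delay: float = 5.0) -> List[Token]:
    """Fire once at global time `clock`, or return [] if not enabled."""
    for i, (value, stamp) in enumerate(order_in):
        # Guard: only Big orders. Timing: the token must be ready by `clock`.
        if value == "Big" and stamp <= clock:
            del order_in[i]                      # consume the input token
            stamped = ("Big", clock + delay)     # time region @+5
            return [stamped, stamped]            # two instances of Big
    return []


orders = [("Small", 0.0), ("Big", 0.0)]
print(process_order(orders, clock=10.0))
print(orders)
```

Carrying the time stamp on the token is what later allows a response delay to be measured as the difference between two clock readings.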

CPN models for the database systems
A top-down approach was adopted in the modeling. A top page that models the overall CPN layout is constructed. In this top page, a single ST transition represents the server, and a set of ST transitions represents the different clients connected to the server. Figure 3a and Figure 3b depict the top pages for the centralized model and the replicated model, respectively. Notice that in Figure 3a, the communication goes from the clients to the server through the switch node. Clients usually get replies to their queries back from the server. However, to simplify the model, computation of the delay is done at the server, considering a round-trip delay. In Figure 3b, the communication goes from the switch to the clients. This abstracts the fact that clients broadcast their write queries to all the other clients to maintain consistency.
Figure 4 shows the declaration node used for both models where color sets, variables, function declarations and definitions are given.
Figure 5 shows the CPN page modeling a generic client operation in the centralized model. Transition Gen_Req represents query request generation. The place GEN is used as a time trigger that initiates query generation. Its token color is TR, representing a timed token color. A token t is generated after each firing of Gen_Req, producing a time stamp calculated using the texp(1.0/iar) function, which generates a random variable exponentially distributed with mean iar (i.e., the inter-arrival time). The generated token is specified by the expression of the arc connecting the transition Gen_Req to the place Generated. Notice that the size of the query (field sz) is also exponentially distributed with mean M (M is set to 100 KB, considering big-size transactions). The transition Send_Req is used to send the queries to the Switch place. It is at this transition that the transmission delays are applied (considering both directions) in the time stamp denoted by the mark @. The transmission delay depends on the query size.
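The query generation described above can be approximated outside the CPN tool with Python's standard library. The sketch below is only an analogue of Gen_Req: `random.expovariate` takes a rate, so 1.0/iar yields a mean inter-arrival time of iar, mirroring texp(1.0/iar). The function name `generate_queries` and its parameters are hypothetical.

```python
import random


def generate_queries(iar, mean_size_kb=100.0, count=5, seed=None):
    """Return `count` (generation_time, size_kb) pairs.

    iar          -- mean inter-arrival time in seconds
    mean_size_kb -- mean query size M in KB (100 KB in the text)
    """
    rng = random.Random(seed)
    clock = 0.0
    queries = []
    for _ in range(count):
        clock += rng.expovariate(1.0 / iar)            # next arrival instant
        size = rng.expovariate(1.0 / mean_size_kb)     # exponential size
        queries.append((clock, size))
    return queries


for gen_time, size_kb in generate_queries(iar=0.5, seed=42):
    print(f"t = {gen_time:.3f} s, size = {size_kb:.1f} KB")
```

Exponentially distributed inter-arrival times are what make the generated stream a Poisson process, which is the arrival assumption stated for both models.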
The place Switch is a fusion place that links to the CPN page modeling the server, depicted in Figure 6. In Figure 6, the place Idle and the transition Process represent the availability of the server. The transition Recv-ReQ is used to model reception of queries from clients, which are treated according to a FIFO policy enforced as follows: any incoming query is assigned a request Id (i.e., field rid) extracted from the token on the place GID2. The FIFO enforcement is done at the transition Process using its guard (i.e., [#rid q = i]).
The place GID4 acts as a semaphore, with its single token consumed through the variable i by the transition Process when the processing of a new query is started. A new token pointing to the next query in sequence is produced by the transition Comp-del when the processing and delay computation of the previous query is finished. Then, the next query is processed.
Figure 7 describes the CPN model of the switch in the replicated model. Read queries are processed locally by clients. Write requests are propagated through the switch to all other nodes. In order to simplify the model, the write queries are issued by the switch and sent to all nodes.
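The FIFO discipline of the server page (request identifiers taken from GID2, the guard #rid q = i, and the single token on GID4) resembles a ticket scheme. The Python sketch below illustrates that idea only; the class `FifoServer` and its attribute names are hypothetical, and it merges into one step what the CPN splits between the transitions Process and Comp-del.

```python
class FifoServer:
    """Ticket-based FIFO: queries are served strictly in order of their rid."""

    def __init__(self):
        self.next_rid = 0    # counterpart of the token on GID2
        self.turn = 0        # counterpart of the single token on GID4
        self.waiting = {}    # received queries keyed by rid

    def receive(self, query):
        """Tag an incoming query with the next request id (Recv-ReQ)."""
        rid = self.next_rid
        self.next_rid += 1
        self.waiting[rid] = query
        return rid

    def process(self):
        """Serve the query whose rid equals the current turn, if any."""
        if self.turn not in self.waiting:    # guard rid == i not satisfied
            return None
        query = self.waiting.pop(self.turn)
        self.turn += 1    # in the CPN, Comp-del releases this token later
        return query


server = FifoServer()
for name in ("q-a", "q-b", "q-c"):
    server.receive(name)
print(server.process(), server.process(), server.process())
```

Because only one turn value exists at any time, at most one query can be selected for processing, which is the semaphore role played by GID4.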
The place Idle and the transition ReQ-Gen represent the state where the switch is available. The transition ReQ-Gen generates requests according to a Poisson process regulated using the place GEN with a timed token t. After each firing of ReQ-Gen, a new request is generated into the place WR-Req that includes a request identifier (i.e., field rid), a generation time (i.e., field GT, loaded with the current time), and a query size. The transition also produces a token t with a time stamp computed using the function texp(10.0/iaw). The inter-arrival time of write requests is denoted iaw, and 10.0/iaw represents the arrival rate from 10 different clients.
The transition Broadcast sends the write query requests to all clients through the places ToCl1 through ToCl10 (connecting to other CPN pages modeling clients). This allows clients to update their replicas. The delay component applied to the transition Broadcast represents twice the client-switch communication delay (i.e., tcs * #sz q).

Simulation results
A set of simulation experiments was conducted. Measures of query response time as a function of query arrival rate were obtained. In the first set, we varied the processing speed of the central server (µ_c). In the second set, different network connection speeds (τ_ss) between the server and the switch were studied. In the third set, we investigated the effect of the read/write ratio on the performance of the replicated system. Along with the plots, standard deviations of the means and 95% confidence intervals were computed for three selected points of each plot.
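The text does not say how the confidence intervals were obtained. One common approach, shown here purely as an assumption, is to take the response times measured in independent simulation runs at a given load and apply a normal approximation (z = 1.96 for 95%). The function name and the sample values are hypothetical.

```python
import math
import statistics


def mean_with_ci(samples, z=1.96):
    """Return (mean, standard deviation of the mean, (low, high)).

    Uses a normal approximation; with few runs a Student-t quantile
    would give a wider and more appropriate interval.
    """
    n = len(samples)
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(n)   # std. dev. of the mean
    half_width = z * sem
    return mean, sem, (mean - half_width, mean + half_width)


# Hypothetical response times (seconds) from repeated runs at one load point.
runs = [0.42, 0.39, 0.45, 0.41, 0.44]
mean, sem, (low, high) = mean_with_ci(runs)
print(f"mean = {mean:.3f}, sem = {sem:.4f}, 95% CI = [{low:.3f}, {high:.3f}]")
```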

Simulation set I
In this first experiment, we plot the query response time in the centralized system (T_c) and in the replicated system (T_r) as functions of the query arrival rate (λ) for different server speeds µ_c.
The query response time T_r of the replicated model (Figure 10.1) is lower than the query response time T_c of the centralized model for the three considered server speeds (µ_c = µ_r, µ_c = 5µ_r, µ_c = 10µ_r).
The parameters for this experiment are: τ_cs = 0.1 sec, τ_ss = 0.01 sec, µ_r = 10 queries/sec, w = 0.2, N = 10 (number of clients). The load saturation level (the value of λ at which T becomes very large due to system congestion) of the replicated model is substantially higher than that of the centralized model in the first case (µ_c = µ_r), and the two are comparable when µ_c is set to five times µ_r. Only when µ_c was set to ten times µ_r (Figure 10.4) did the load saturation level of the centralized model become substantially higher than that of the replicated model. We conclude from this experiment that the replicated model performs better than the centralized model unless a very powerful server is used in the centralized model (ten or more times faster than a replica).
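A back-of-the-envelope check of these saturation levels can be made from the arrival rates in the model description: the central server receives Nλ, and each replica receives (1-w)λ + Nwλ. The sketch below assumes, as a simplification that is not part of the CPN simulation, that a server saturates when its arrival rate reaches its service rate and that communication delays play no role. The function name is hypothetical.

```python
def saturation_loads(mu_c, mu_r, w, n):
    """Per-client load λ at which each system's server saturates."""
    lam_centralized = mu_c / n                   # N·λ = µ_c
    lam_replicated = mu_r / (1.0 - w + n * w)    # λ·(1 - w + N·w) = µ_r
    return lam_centralized, lam_replicated


# Parameters of simulation set I: µ_r = 10 queries/sec, w = 0.2, N = 10.
mu_r, w, n = 10.0, 0.2, 10
for factor in (1, 5, 10):
    lam_c, lam_r = saturation_loads(factor * mu_r, mu_r, w, n)
    print(f"µ_c = {factor:>2} µ_r: centralized λ_sat = {lam_c:.2f}, "
          f"replicated λ_sat = {lam_r:.2f}")
```

Under these assumptions the replicated bound stays fixed while the centralized bound grows linearly with µ_c, which agrees qualitatively with the ordering reported for the three server speeds.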

Simulation set II
In this second experiment, we plot T_c and T_r as functions of λ for different server-switch transmission delays τ_ss. We use the following assumptions for this experiment: τ_cs = 0.1 sec, µ_c = 100 queries/sec, µ_r = 100 queries/sec, w = 0.2, N = 10. The client-switch transmission delay is computed based on a 10 Mbps client-switch connection. The server-switch connection bandwidth is tested with the values 10 Mbps, 100 Mbps, and 1 Gbps.
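The relation between link bandwidth and transmission delay can be illustrated with simple arithmetic. The text does not give the exact conversion used; the sketch below assumes a query of the mean size M = 100 KB, 1 KB = 1000 bytes, and no protocol overhead, so its results only indicate the order of magnitude of the τ values. The function name is hypothetical.

```python
def transmission_delay(size_kb, bandwidth_mbps):
    """Seconds needed to push `size_kb` kilobytes through the link."""
    bits = size_kb * 1000 * 8                    # assumes 1 KB = 1000 bytes
    return bits / (bandwidth_mbps * 1_000_000)   # Mbps -> bits per second


for mbps in (10, 100, 1000):
    delay = transmission_delay(100, mbps)
    print(f"{mbps:>4} Mbps: {delay:.4f} s per 100 KB query")
```

Each tenfold increase in bandwidth divides the delay by ten, which matches the factor-of-ten steps between the client-switch and server-switch delays used in the experiments.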
Since we focus here on studying the impact of the communication speeds, we fix the centralized server and all replicas' processing speeds at 100 queries per second.
The response time T_r of the replicated model (Figure 11.1) is lower than the response time T_c of the centralized model for the various server-switch transmission delays τ_ss. In all cases, the load saturation level of the replicated model is almost three times higher than that of the centralized model. It can be seen in Figures 11.2, 11.3, and 11.4 that the communication delay has little effect on the centralized system, since each query (or query reply) is transmitted only once. The effect of the communication delay is, however, greater on the replicated system, since write queries are broadcast to all replicas, especially for high ratios of write queries (w). The effect of w is studied in the next simulation set.

Simulation set III
In this last experiment, T_c and T_r are plotted as functions of λ for different values of the ratio (w) of write queries. The following is assumed for this experiment: τ_cs = 0.01 sec, τ_ss = 0.001 sec, µ_c = 100 queries/sec, µ_r = 10 queries/sec, N = 10. Even though we use here for the centralized model a ten times faster server (as compared to the replica speed) and a ten times faster server-switch connection (as compared to the replica-switch connection), the replicated model was able to perform better in terms of delay and load saturation level, especially for small ratios of write queries (w) (Figures 12.1, 12.2, 12.3, and 12.4). (Note that Figures 12.1 and 12.2 are the same as Figures 11.3 and 10.1, reproduced here for the sake of comparison.) The gain in response time and saturation load of the centralized model over the replicated model is not as substantial as the invested resources (a ten times more powerful server and a ten times faster switch-server communication link).
Figure 12.4. T_r vs. load (replicated: w = 0.8).

Conclusion
We have proposed a Colored Petri Net-based performance evaluation of the relative merits of the centralized approach and the replicated approach for network-based database systems. A number of simulation experiments have been carried out. In addition to the well-known benefits of higher reliability and data availability of replicated solutions (as compared to centralized ones), our simulation results have revealed that the replicated solutions may offer smaller query response times and higher load saturation levels. It can also be concluded that, in order to compete with replicated systems, the centralized solution requires much more powerful database servers.
It is to be noted that we have assumed replicated systems with minimum synchronization overhead involving a single broadcast message per write transaction. For more complex synchronization protocols, further investigation of the effect of the consistency control method on the performance of the replicated system is needed.

Figure 1a. The centralized model.
Figure 1b. The replicated model.

Figure 3a. Centralized model CPN top page.
Figure 3b. Replicated model CPN top page.

Figure 4. The declaration node used for both models.

Figure 6. A CPN page modeling the server operation in the centralized model.
The transition Comp-del and the two places Tot-Delay and Received-FIN are used for computing the accumulated two-way response delay. A snapshot of the simulation is shown in the figure, displaying token values.

Figure 7. A CPN page modeling the switch operation in the replicated model.

Figures 8 and 9 represent the CPN page modeling the clients in the replicated model (part I and part II). In Figure 8, write requests are received through the port place ToCl1. The transitions Merge 1 and Merge 2 model the queuing of local read requests and external write requests into the same queue. The transition Gen_ReR represents the generation of local read requests. The place GID1 serves as a semaphore, used for sequencing requests into the Queue place. The transition Process is used to process requests according to the order of their arrival.
The second part of the client model is depicted in Figure 9. Firing the transition Process produces a query token in the place ToCl1_2 and captures the synchronization token on the place Cnt. In addition, the transition Process has an associated time stamp calculated based on the query size. When the time stamp associated with the current token (i.e., query) expires, the token is consumed by the transition Recv_WrR, modeling the end of processing. The transition Recv_WrR is used to compute the delay incurred by the query (the difference between generation time and arrival time), which is added to the token residing in the place Acc-Delay. Firing the transition Recv_WrR releases the synchronization token into the place Cnt with a value equal to the number of the next query. The places Acc Delay and Recved Req belong to two global fusion sets present in all client models, each representing one logical place.

Figure 8. A CPN page modeling the client of the replicated model (part I).
Figure 2. A small CPN diagram used for processing shipping orders. The transition Process Orders has one input place, Order In, and one output place, Order Out. The token color Order is associated with the place Order In, and the equivalent token color ProductShipped is associated with the place Order Out. The color set Order is declared to hold Big and Small as data values (see the declaration node). A variable Ordent is declared in the declaration node.