
BG: A Benchmark to Evaluate Interactive Social Networking Actions

Sumita Barahmand, Shahram Ghandeharizadeh
Database Laboratory Technical Report 2012-06
Computer Science Department, USC
{barahman,shahram}@usc.edu
Los Angeles, California 90089-0781
February 14, 2013

Abstract

BG is a benchmark that rates a data store for processing interactive social networking actions using a pre-specified service level agreement, SLA. An example SLA may require 95% of issued requests to observe a response time faster than 100 milliseconds. BG computes two different ratings named SoAR and Socialites. In addition, it elevates the amount of unpredictable data produced by a data store to a first-class metric, including it as a key component of the SLA and quantifying it as a part of the benchmarking process. One may use BG for a variety of purposes, ranging from comparing different data stores with one another, to evaluating alternative physical data organization techniques given a data store, to quantifying the performance characteristics of a data store in the presence of failures (either CP or AP in the CAP theorem), among others. This study illustrates BG's first use case, comparing a document store with an industrial-strength relational database management system (RDBMS) deployed either in stand-alone mode or augmented with memcached. No one system is superior for all BG actions. However, when considering a mix of actions, the memcached-augmented RDBMS produces higher ratings.

A Introduction

Social networking sites such as LinkedIn, Facebook, and Twitter (see [34] for a list) are cloud service providers for person-to-person communication. There are different approaches to building these sites, ranging from SQL to NoSQL, Cache Augmented SQL [24, 18, 15] (CASQL), graph databases [1], and others. (See [9] for a survey.)

Footnote: A shorter version of this paper appeared in the biennial Conference on Innovative Data Systems Research, CIDR'13, Asilomar, California, January 2013.
Some provide a tabular representation of data while others offer alternative data models that scale out [10]. Some may sacrifice strict ACID [17] properties and opt for BASE [9] to enhance performance. Independent of a qualitative discussion of these approaches and their merits, a key question is how these systems compare with one another quantitatively. BG is a benchmark designed to answer this question for interactive social networking actions that either read or update a very small amount of the entire dataset [30, 14]. In addition to traditional metrics such as response time and throughput, BG quantifies the amount of unpredictable data produced by a solution. This metric refers to either stale, inconsistent, or invalid data produced by a data store, see Section F.

BG emphasizes interactive actions of a social networking application such as browsing a profile, generating a friend request and accepting one, and (not so sociable) practices such as thawing a friendship and rejecting a friend request. Table 1 shows the different actions and their overlap with several popular social networking sites.

BG's database consists of a fixed number of members with a registered profile. Its workload generator implements a closed simulation model with a fixed number of threads, T. Each thread emulates a sequence of members performing a social action shown in Table 1. At any instance in time, an emulated member who is actively engaged in a social action is called a socialite. While a database may consist of millions of members, at most T simultaneous socialites issue requests with BG's workload generator.

One may use BG to compute either a Social Action Rating (SoAR) or a Socialites rating of a data store given a pre-specified service level agreement (SLA). An SLA requires at least α percent of requests to observe a response time equal to or faster than β, with the amount of unpredictable data less than τ, for some fixed duration of time Δ.
For example, an SLA might require 95% (α=95%) of actions to be performed faster than 100 msec (β=0.1 second) with no more than 0.1% (τ=0.1%) unpredictable data for 1 hour (Δ=3600 seconds). SoAR pertains to the highest throughput (actions per second) of a data store that satisfies this SLA. Socialites is the highest number of threads (largest T value) that satisfies this SLA, see Figure 12.a.

BG quantifies the amount of unpredictable data in a system at the granularity of a social action. It does so by considering concurrent socialites and all possible race conditions to compute a range of values for a retrieved data item, e.g., the number of friends for a member's profile. If a data store fetches a value that falls outside this range then it has produced unpredictable data.

BG is inspired by prior benchmarks that evaluate cloud services such as YCSB [11] and YCSB++ [23], e-commerce sites [2], and object-oriented [8] and transaction processing systems [16]. Its contributions are twofold. First, it emphasizes interactive social actions that retrieve a small amount of data. Second, it promotes the amount of unpredictable data produced by a solution as a first-class metric for comparing different data stores with one another. The value of this metric is impacted by BG's knobs, such as the exponent of the Zipfian distribution used to generate referenced members and the inter-arrival time between two socialites emulated by a thread. These knobs enable one to approximate a realistic use case of an application to quantify unpredictable data practically.

Figure 1: Conceptual and logical data models of BG's database. (1.a: conceptual data model; 1.b: JSON-like data model; 1.c: relational data model.)
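The range test described above can be sketched in a few lines. This is a minimal illustration, not BG's actual validation code: the bounds, the example numbers, and the helper name are all hypothetical, standing in for the range BG derives from the log of concurrent socialites.

```python
def is_unpredictable(observed, lower_bound, upper_bound):
    """Flag a fetched value as unpredictable when it falls outside the
    range of values possible under all concurrent race conditions."""
    return not (lower_bound <= observed <= upper_bound)

# Example: a profile showed 10 friends at the start of the read window;
# concurrent socialites could have completed at most 2 accepts and 1 thaw,
# so any value in [9, 12] is predictable.
lower, upper = 10 - 1, 10 + 2
ok = is_unpredictable(12, lower, upper)   # 12 lies within [9, 12]
bad = is_unpredictable(14, lower, upper)  # 14 can never be observed
```

A data store that returns 14 friends here has served unpredictable (stale, inconsistent, or invalid) data in BG's terms.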
BG might be used for a variety of purposes, ranging from comparing different data stores with one another to characterizing the performance of a data store under different settings: normal mode of operation with alternative physical data organizations [6], in the presence of a failure (either CP or AP in CAP [21]), and when exercising the elasticity of a data store by adding or removing nodes incrementally. This paper illustrates BG's use case by comparing the following 3 different data stores with one another:

SQL-X: An industrial-strength relational database management system with ACID properties and a SQL query interface. Due to licensing restrictions, we cannot reveal its identity and name it SQL-X.

MongoDB: Version 2.0.6, a document store for storage and retrieval of JavaScript Object Notations, JSON. MongoDB is a representative NoSQL system. See [9] for a survey.

CASQL: SQL-X extended with memcached server version 1.4.2 (64 bit). BG employs the Whalin memcached client version 2.5.1 to communicate with the memcached server. We configured the Whalin client to compress (uncompress) key-value pairs when storing (retrieving) them in (from) memcached.

The rest of this paper is organized as follows. Section B presents the conceptual data model of BG and its logical design for relational and JSON-like data models. Social networking actions that constitute BG are detailed in Section C. Section D enumerates BG's sessions that consist of a sequence of actions. Section E describes limitations of a centralized single node benchmarking framework and presents a parallel implementation of BG using a shared-nothing architecture. In Section F, we describe how BG quantifies the amount of unpredictable data produced by a data store.

Figure 2: Throughput of SQL-X as a function of T with the View Profile action, 12 KB profile image size, β=100 msec, τ=0%, θ=0.27. Confidence (α) is shown in red.
A heuristic search technique to rate data stores is presented in Section G. Section H presents related work. Brief conclusions along with our future research directions are detailed in Section I.

B Conceptual Data Model and Performance Metrics

Figure 1.a shows the ER diagram of BG's database. The Member entity set contains those users with a registered profile. It consists of a unique identifier and a fixed number of string attributes whose length can be adjusted to generate different member record sizes. In addition, each member may have either zero or two images. With the latter, one is a thumbnail and the second is a higher resolution image. Typically, thumbnails are displayed when listing friends of a member and the higher resolution image is displayed when a member visits a profile. Thumbnail images are typically small (in the order of KBs) and their use (instead of larger images, in the order of tens and hundreds of KBs and MBs) has a dramatic impact on system performance, see the discussion in Section C.1.

A member may either extend an invitation to or be friends with another member. Both are captured using the "Friend" relationship set. An attribute of this relationship set (not shown) differentiates between invitations and friendships. A resource may pertain to an image, a posted question, a technical manuscript, etc. These entities are captured in one set named "Resources". In order for a resource to exist, it must be "Owned" by a member, a binary relationship between a member and a resource. A member may post a resource, say an image, on the profile of another member, represented as a ternary relationship between two members and a resource. (In this relationship, the two members might be the same member where the member is posting the resource on her own profile.) A member (either the owner or another) may comment on a resource (not shown). A member may restrict the ability to comment on a resource only to her friends.
This is implemented using the "Manipulation" relationship set. Figures 1.b and 1.c show the logical design of the ER diagram with both MongoDB's JSON-like and relational data models.

An experimentalist builds a database by specifying the number of members (M) in the social network, the number of friends per member (φ), and the number of resources per member (ρ). Some of the relationships might be generated either uniformly or using a Zipfian distribution. For example, one may use a Zipfian distribution with exponent (θ) 0.27 to assign 80% of friendships to 20% of members.

One may specify BG workloads at the granularity of an action, a session, or a mix of these two possibilities. A session is a sequence of actions with a think time between actions and an inter-arrival time between sessions. Table 1 shows BG's list of actions and their compatibility with several social networking sites. Section D enumerates the different sessions supported by BG. One may extend BG with new sessions consisting of an arbitrary mix of actions.

Similar to YCSB [11], BG exposes both its schema and its actions to be implemented by a developer. Thus, a developer may target an arbitrary data store, specify its physical data model for the conceptual data model of Figure 1.a, provide an implementation of the actions of Table 1, and run BG to evaluate the target data store. As detailed in Section E, these functionalities are divided between a coordinator, named BGCoord, and N slave processes, named BGClients. When generating a workload, BG is by default set to prevent two simultaneous threads from emulating the same member concurrently. This is to model real life user interactions as closely as possible. An experimentalist may eliminate this assumption by modifying a setting of BG.

BG rates a system with at least α percent of actions observing a response time equal to or less than β, with at most τ percent of requests observing unpredictable data, in Δ time units.
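A skewed member-selection step like the one above can be sketched with an inverse-CDF sampler. This is an illustrative stand-in for BG's Zipfian generator, not its actual code; the function name and structure are assumptions, while the exponent θ=0.27 comes from the paper's running example.

```python
import bisect
import random

def make_zipfian_sampler(n, theta, seed=42):
    """Return a sampler over member ids 0..n-1 where the probability of
    the member at rank r is proportional to 1/(r+1)**theta, so low ranks
    (popular members) are referenced more often (theta=0.27 in BG's examples)."""
    weights = [1.0 / (r + 1) ** theta for r in range(n)]
    total = sum(weights)
    cdf, acc = [], 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    rng = random.Random(seed)
    # Inverse-CDF sampling: draw u in [0,1) and find its rank in the CDF.
    return lambda: bisect.bisect_left(cdf, rng.random())

sample = make_zipfian_sampler(10000, 0.27)
member_ids = [sample() for _ in range(1000)]
```

Assigning friendships so that roughly 80% of them concentrate on 20% of members amounts to drawing both endpoints of each friendship edge from such a sampler.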
For example, an experimentalist may specify a workload with the requirement that at least 95% (α=0.95) of actions observe a response time equal to or less than 100 msec (β=0.1 second) with at most 0.1% (τ=0.001) of requests observing unpredictable data for 1 hour (Δ=3600 seconds). With such a criterion, BG computes two possible ratings for a system:

1. SoAR: Highest number of completed actions per second that satisfy the specified criterion. Given several systems, the one with the highest SoAR is desirable.

2. Socialites: Highest number of simultaneous threads that satisfy the specified SLA. It quantifies the multi-threading capability of the data store and whether it suffers from limitations such as the convoy phenomenon [7] that diminishes its throughput rating with a large number of simultaneous requests. Given several systems, the one with the highest Socialites rating is more desirable.

These ratings are not a simple function of the average service time of a workload. The specified confidence (α), the tolerable response time (β), and the amount of unpredictable data (τ) observed from a system impact its SoAR and Socialites ratings. To illustrate, Figure 2 shows the throughput of SQL-X as a function of the number of threads T for a read-only action, τ=0. We show the different confidence values for β=0.1 second. As we increase the number of threads, the throughput of the system increases. Beyond 4 threads, a queue of requests forms, causing an increase in system response time. This is reflected in a lower α value. With 32 threads, almost all (99.94%) requests observe a response time higher than 100 msec.

Figure 3: SoAR of 3 different systems with View Profile and different profile image sizes, M=10,000, β=100 msec, α=95%, zero think and inter-arrival times, θ=0.27.
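The pass/fail test behind both ratings can be sketched for a single observation window. This is a minimal sketch under the SLA definition above; the Δ-long run and the heuristic search of Section G are omitted, and the function and parameter names are illustrative, not BG's API.

```python
def satisfies_sla(response_times, unpredictable_flags,
                  alpha=0.95, beta=0.1, tau=0.001):
    """True when at least alpha of the requests completed within beta
    seconds AND at most tau of them observed unpredictable data."""
    n = len(response_times)
    fast = sum(1 for t in response_times if t <= beta)
    unpredictable = sum(1 for f in unpredictable_flags if f)
    return fast / n >= alpha and unpredictable / n <= tau

# 98 of 100 requests finished in 20 msec, none observed unpredictable data:
times = [0.02] * 98 + [0.5, 0.9]
meets = satisfies_sla(times, [False] * 100)  # 98% >= 95% and 0% <= 0.1%
```

SoAR is then the highest offered throughput, and Socialites the largest T, for which such a check still passes over the whole Δ-long run.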
C Actions

This section provides a specification of BG's social actions, see Table 1. We present their implementation using SQL-X, MongoDB, and CASQL; see Section A for a description. Subsequently, Section C.4 presents 3 workloads consisting of a mix of actions. For the first two actions, we present SoAR numbers using the following SLA: 95% of requests observe a response time equal to or faster than 100 msec with the amount of stale data less than 0.1%. Member ids are generated using a Zipfian distribution with exponent 0.27.

Reported numbers were obtained from a dedicated hardware platform consisting of six PCs connected using a gigabit Ethernet switch. Each PC consists of a 64 bit 3.4 GHz Intel Core i7-2600 processor (4 cores with 8 threads) configured with 16 GB of memory, 1.5 TB of storage, and one gigabit networking card. Even though these PCs have the exact same model and were purchased at the same time, there is some variation in their performance. To prevent this from polluting our results, the same one node hosts the different data stores for all ratings. This node hosts both memcached and SQL-X to realize CASQL. Either all or a subset of the remaining 5 nodes are used as BGClients to generate requests for this node. With all reported SoAR values greater than zero, either the disk, all cores, or the networking card of the server hosting a data store becomes fully utilized. When SoAR is zero, this means the data store failed to satisfy the SLA with one single-threaded BGClient issuing requests, N=T=1.
Table 1: Socialite actions and their compatibility with several social networking sites. The actions are: View Profile (VP); List Friends (LF); View Friend Requests (VFR); Invite Friend (IF), which corresponds to "Add to Circle" on Google+, "Follow" on Twitter, FourSquare, Delicious, and Reddit.com, and "Subscribe" on YouTube; Accept Friend Request (AFR); Reject Friend Request (RFR); Thaw Friendship (TF), which correspondingly maps to "Remove from Circle", "Unfollow", and "Unsubscribe"; View Top-K Resources (VTR); View Comments on a Resource (VCR); Post Comment on a Resource (PCR), e.g., reply to a tweet, recommend a colleague's work, comment on a video, comment on a check-in, add a tag to a link, or post an answer to a question; and Delete Comment from a Resource (DCR), the corresponding withdrawal or removal. (The original table marks LF, VFR, AFR, and RFR as unsupported on several of the follow-based sites.)

C.1 View Profile, VP

View Profile (VP) emulates a socialite visiting the profile of either herself or another member. Its inputs include the socialite's id and the id of the referenced member. BG generates these two ids using a random number conditioned using the Zipfian distribution of access with a pre-specified exponent (footnote 1) chosen by the experimentalist who is benchmarking a system. The socialite's id may equal the id of the referenced member, emulating a socialite referencing her own profile. Its output is the profile information of the referenced member. This includes the member's attributes and the following two pieces of aggregate information: the member's number of friends and her number of resources (e.g., images). If the socialite is referencing her own profile then VP retrieves a third piece of aggregate information: her number of pending friend invitations. VP retrieves all attributes of the referenced member except her thumbnail image. This includes her profile image, assuming the database is created with images, see Section B.
An implementation of VP with the different data stores is as follows. With MongoDB (SQL-X), it looks up the document (row) corresponding to the specified userid. With MongoDB, VP computes the number of pending invitations and friends by counting the number of elements in the pendingFriends and confirmedFriends arrays, respectively. It counts the number of resources posted on the member's wall by querying the Resources collection using the predicate "walluserid = referenced member's userid". With SQL-X, VP issues different aggregate queries. With CASQL, VP constructs two different keys using the referenced member's userid: self profile when the socialite's id equals that userid and browse profile when it does not. Depending on whether the socialite's id equals the referenced userid, it looks up the appropriate key in memcached. If a value is returned, it proceeds to uncompress and deserialize it, producing it as its output. Otherwise, it performs the same set of steps as those with SQL-X, computes the final output, serializes it, and stores it in memcached as the value associated with the appropriate key. This key-value pair is used by future references.

The presence of a profile image and its size impact the SoAR of different data stores for VP dramatically [26, 6]. Figure 3 shows the performance of the three systems for a BG database consisting of no images, and of a 2 KB thumbnail image with different sizes for the profile image: 2 KB, 12 KB, and 500 KB. These settings constitute the x-axis of Figure 3. The y-axis reports the SoAR of the different systems. With no images, MongoDB provides the best performance, outperforming both SQL-X and CASQL by almost a factor of two. With 12 KB images, the SoAR of SQL-X drops dramatically from thousands to hundreds (footnote 2). With 500 KB image sizes, SQL-X cannot perform even one VP action per second that satisfies the 100 msec response time (with 1 thread), producing a SoAR of zero.

Footnote 1: The exponent θ used in this section is 0.27.
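The CASQL path for VP described above is a cache-aside pattern. The following is a toy sketch of that flow, not BG's code: a dict stands in for memcached, a second dict for SQL-X's tables, and the key format and profile fields are invented for illustration. The compress-on-store step mirrors the Whalin client configuration mentioned in Section A.

```python
import json
import zlib

cache = {}  # stand-in for memcached
members = {1: {"name": "A", "friends": 3, "resources": 5}}  # toy store rows

def view_profile(member_id):
    """Cache-aside VP lookup: probe the cache and uncompress/deserialize
    on a hit; otherwise compute from the data store, then serialize,
    compress, and populate the cache for future references."""
    key = "browse_profile:%d" % member_id
    if key in cache:
        return json.loads(zlib.decompress(cache[key]))
    profile = dict(members[member_id])  # stands in for SQL-X's queries
    cache[key] = zlib.compress(json.dumps(profile).encode("utf-8"))
    return profile

first = view_profile(1)   # miss: computed from the store and cached
second = view_profile(1)  # hit: served from the compressed cache entry
```

The second call never touches the backing store, which is why CASQL's advantage grows with the cost of recomputing the value (e.g., large profile images).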
The SoAR of MongoDB and CASQL also decreases as a function of larger image sizes because they must transmit a larger amount of data to the BGClient over the network. However, their decrease is not as dramatic as SQL-X's. CASQL outperforms SQL-X because these experiments are run with a warm-up phase that issues 500,000 requests to populate memcached with key-value pairs pertaining to different member profiles. Most requests are serviced using memcached (instead of SQL-X). While this does not pay off with small images (footnote 3), with 12 KB and 500 KB image sizes it enhances the performance of SQL-X considerably.

C.2 List Friends, LF

List Friends (LF) emulates a socialite viewing either her own list of friends or another member's. This action retrieves the profile information of each friend. In the presence of images, it retrieves only the thumbnail image of each friend. At database creation time, BG empowers an experimentalist to configure a database with a fixed number of friends per member (φ). Figure 4 shows the SoAR of the alternative data stores for LF as a function of the number of friends (φ) per member. (The median Facebook friend count is 100 [31, 4].) A larger φ value lowers the rating of all data stores. Overall, CASQL provides the best performance with 50 and 100 friends per member. Even though MongoDB performs no joins, its SoAR is zero for all the examined φ values. Below, we describe implementation details of each system.

Figure 4: SoAR of List Friends with 3 different data stores as a function of the number of friends per member (φ), M=10,000, β=100 msec, α=95%, zero think and inter-arrival times, θ=0.27.

SQL-X must join the Friends table with the Members table (see Figure 1.c) to compute the socialite's list of friends. We assume the friendship relationship between two members is represented as one record in the Friends table (footnote 4), see Figure 1.c. CASQL caches the final results of the LF action and enhances the SoAR of SQL-X by less than 10% with φ values of 50 and 100. With φ=1000, SQL-X slows down considerably and can no longer satisfy the 100 msec response time requirement. The CASQL alternative is also unable to meet this SLA because each key-value pair is larger than 1 MB, the maximum key-value size supported by memcached. This renders memcached idle, redirecting all requests issued by CASQL to SQL-X, producing a SoAR of zero for the system. One may modify memcached to support key-value pairs larger than 2 MB (φ=1000 and each thumbnail is 2 KB) to realize an enhanced SoAR with CASQL.

With MongoDB, an implementation of LF may retrieve the confirmed friends either one document at a time or as a set of documents. With both approaches, the BGClient starts by retrieving the confirmedFriends array of the referenced member, see Figure 1.b. With one document at a time, the client processes the array and, for each userid, retrieves the profile document of that member. With a set at a time, the client provides MongoDB with the array of userids to retrieve a set containing their profile documents. These two alternatives cannot satisfy the 100 msec SLA requirement, producing a SoAR of zero for the different values of φ. With fewer friends per member, say 10, the SoAR of MongoDB is 6 actions per second.

Footnote 2: We use SQL-X with the physical data design shown in Figure 1.c. This design can be enhanced to improve the performance of SQL-X tenfold or more. See [6] for details.
Footnote 3: There are several suggested optimizations to the source code of memcached to improve its performance [25, 3]. Their evaluation is a digression from our main focus. Instead, we focus on the standard open source version 2.5.1 [22].
Footnote 4: See [6] for a discussion of representing friendship as 2 records and its impact on SoAR.
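The two MongoDB retrieval strategies for LF can be contrasted with in-memory stand-ins. This is a toy sketch, not BG's implementation: plain dicts model the Members collection, and the batched variant plays the role of a single query with an id array (akin to MongoDB's $in predicate).

```python
# Toy stand-ins for MongoDB's Members collection, keyed by userid.
members = {
    1: {"userid": 1, "confirmedFriends": [2, 3]},
    2: {"userid": 2, "confirmedFriends": [1]},
    3: {"userid": 3, "confirmedFriends": [1]},
}

def list_friends_one_at_a_time(member_id):
    """One lookup per friend id: as many round trips as friends."""
    friend_ids = members[member_id]["confirmedFriends"]
    return [members[fid] for fid in friend_ids]

def list_friends_set_at_a_time(member_id):
    """A single query carrying the whole id array (akin to $in)."""
    wanted = set(members[member_id]["confirmedFriends"])
    return [doc for doc in members.values() if doc["userid"] in wanted]

one = list_friends_one_at_a_time(1)
batch = list_friends_set_at_a_time(1)
```

Both return the same documents; against a real server the difference is in round trips (φ lookups versus one query), which is why neither helps when φ itself is large.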
It retrieves the profile information of each member extending a friend request invitation, along with her thumbnail (assuming the database is configured with images). Both the implementation and the behavior of SQL-X, MongoDB, and CASQL with VFR are similar to the discussion of LF.

Invite Friend, IF: This action enables a socialite to invite another member, say A, of the social network to become her friend. With MongoDB, this action inserts the socialite's userid into A's array of pendingFriends, see Figure 1.b. With both SQL-X and CASQL, this operation inserts a row in the Friends table with status set to "pending", see Figure 1.c. CASQL invalidates the memcached key-value pairs corresponding to A's self profile (with a count of pending invitations) and A's list of pending invitations. A subsequent VP invocation that references these key-value pairs observes a cache miss, computes the latest key-value pairs, and inserts them in the cache.

Table 2: BG's parameters and their definitions.
  Database parameters: M, number of members in the database; φ, number of friends per member; ρ, number of resources per member.
  Workload parameters: total number of sessions emulated by the benchmark; think time between social actions constituting a session; inter-arrival time between users emulated by a thread; θ, exponent of the Zipfian distribution.
  SLA parameters: α, percentage of requests with response time at most β; β, max response time observed by requests; τ, max percentage of requests that observe unpredictable data; Δ, min length of time the system must satisfy the SLA.
  Environmental parameters: N, number of BGClients; T, number of threads.

Accept Friend Request, AFR: A socialite A uses this action to accept a pending friend request from member B of the social network. With MongoDB, this action inserts (a) A's userid in B's array of confirmedFriends, and (b) B's userid in A's array of confirmedFriends, see Figure 1.b.
Moreover, it removes B's userid from A's array of pendingFriends. With both SQL-X and CASQL, this operation updates the "status" attribute value of the row corresponding to B's friend request to A to "confirmed", see Figure 1.c. CASQL invalidates the memcached key-value pairs corresponding to the self profiles of members A and B, the profiles of members A and B as visited by others, the lists of friends for members A and B, and the list of pending invitations for member A.

Reject Friend Request, RFR: A socialite uses RFR to reject a pending friend request from a member B. BG assumes the system does not notify member B of this event. With MongoDB, we implement RFR by simply removing B's userid from the socialite's array of pendingFriends, see Figure 1.b. With both SQL-X and CASQL, RFR deletes the friend request row corresponding to B's friend request to the socialite, see Figure 1.c. CASQL invalidates the key-value pairs corresponding to the socialite's self profile and pending friend invitations from memcached.

Thaw Friendship, TF: This action enables a socialite A to remove a member B as a friend. With MongoDB, TF removes A's userid from B's array of confirmedFriends and vice versa, see Figure 1.b. With both SQL-X and CASQL, TF deletes the row corresponding to the friendship of users A and B (with status equal to "confirmed") from the Friends table, see Figure 1.c. CASQL invalidates the key-value pairs corresponding to the lists of friends for users A and B, the self profiles of users A and B, and the profiles of users A and B as visited by other users (because their number of friends has changed).

View Top-K Resources, VTR: When BG populates a database, it requires each member to create a fixed number of resources. Each resource is posted on the wall of a randomly chosen member, including one's own wall. View Top-K Resources (VTR) enables a socialite to retrieve and display the top k resources posted on her wall. Both the value of k and the definition of "top" are configurable.
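The TF invalidation rule above (six keys, three per member) can be made concrete with a small sketch. The key formats are hypothetical stand-ins for whatever naming scheme CASQL actually uses; only the set of invalidated items (friend lists, self profiles, browse profiles of both members) follows the text.

```python
def thaw_friendship_invalidations(a, b):
    """Keys CASQL drops when members a and b thaw their friendship:
    both friend lists, both self profiles, and both profiles as seen
    by others, since the friend counts in all of them have changed."""
    keys = []
    for m in (a, b):
        keys += ["list_friends:%d" % m,
                 "self_profile:%d" % m,
                 "browse_profile:%d" % m]
    return keys

cache = {k: b"cached" for k in thaw_friendship_invalidations(7, 9)}
for key in thaw_friendship_invalidations(7, 9):
    cache.pop(key, None)  # delete-on-write invalidation
```

The next read of any of these six keys misses and recomputes its value from SQL-X, which is exactly how write-heavy mixes erode CASQL's advantage in Section C.4.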
"Top" may correspond to those resources with the highest number of "likes", the date of last view/comment (recency), or simply the resource id. At the time of this writing, BG supports the last one. With MongoDB, VTR queries the Resources collection in a sorted order to retrieve the top k resources owned by the socialite. With SQL-X and CASQL, VTR queries the Resources table and retrieves the top k rows ordered using their rid. CASQL constructs a unique key using the action and socialite userid, serializes the results as a value, and inserts the key-value pair in memcached for future reference.

View Comments on Resource, VCR: A socialite displays the comments posted on a resource with a unique rid using the VCR action. BG generates rids for this action by randomly selecting a resource owned by a member (selected using a Zipfian distribution). With MongoDB, we looked into two different implementations. The first implementation supports the schema shown in Figure 1.b, where the comments for every resource are stored within the manipulation array attribute for that resource. With this implementation, VCR retrieves the elements of the manipulation array of the referenced resource, see Figure 1.b. The second implementation creates a separate collection for the comments named Manipulations, see Figure 5. With this implementation, VCR queries the Manipulations collection for all those documents whose rid equals the referenced resource id. (A comparison of these alternative physical data designs is a future research direction.) With SQL-X, VCR employs the specified identifier of a resource to query the Manipulation table and retrieve all attributes of the qualifying rows, see Figure 1.c. CASQL constructs a unique key using the rid to look up the cache for a value. If it observes a miss, it invokes the procedure for SQL-X to construct a value. The resulting key-value pair is stored in memcached for future reference.

Post Comment on a Resource, PCR: A socialite uses PCR to comment on a resource with a unique id.
BG generates rids by randomly selecting a resource owned by a member selected using a Zipfian distribution. It generates a random array of characters as the comment for a user. The number of characters is a configurable parameter. With MongoDB, PCR is implemented by either generating an element for the manipulation array attribute of the selected resource, see Figure 1.b, or generating a document, setting its rid to the unique identifier of the referenced resource, and inserting it into the Manipulations collection, see Figure 5. With SQL-X and CASQL, PCR inserts a row in the Manipulation table. CASQL invalidates the key-value pair corresponding to comments on the specified resource id.

Delete Comment from a Resource, DCR: This action enables a socialite to delete a unique comment posted on one of her owned resources, chosen randomly. With MongoDB, an implementation of DCR either removes the element corresponding to the comment from the manipulation array attribute of the identified resource, see Figure 1.b, or removes the document corresponding to the comment posted on the referenced resource from the Manipulations collection, see Figure 5. With SQL-X and CASQL, DCR deletes a row of the Manipulation table. CASQL invalidates the key-value pair corresponding to comments on the specified resource id.

Figure 5: An alternative JSON-like data model of BG's database.

Table 3: Three mixes of social networking actions.
  Action (Type): Very Low (0.1%) Write / Low (1%) Write / High (10%) Write
  View Profile, VP (Read): 40% / 40% / 35%
  List Friends, LF (Read): 5% / 5% / 5%
  View Friend Requests, VFR (Read): 5% / 5% / 5%
  Invite Friend, IF (Write): 0.02% / 0.2% / 2%
  Accept Friend Request, AFR (Write): 0.02% / 0.2% / 2%
  Reject Friend Request, RFR (Write): 0.03% / 0.3% / 3%
  Thaw Friendship, TF (Write): 0.03% / 0.3% / 3%
  View Top-K Resources, VTR (Read): 49.9% / 49% / 45%
  View Comments on a Resource, VCR (Read): 0% / 0% / 0%
  Post Comment on a Resource, PCR (Write): 0% / 0% / 0%
  Delete Comment from a Resource, DCR (Write): 0% / 0% / 0%

C.4 Mix of Actions

One may evaluate a data store by specifying a mix of actions. Three different mixes are shown in Table 3. To simplify discussion, actions are categorized into read and write. These mixes exercise write actions that impact the friendship relationship of two members of the social network, invalidating the cached CASQL results of read actions such as View Profile, List Friends, and View Friend Requests. Each mix consists of a different percentage of write actions, ranging from very low (0.1%) to high (10%). We use MongoDB with its strict write concern, which requires each write to wait for a response from the server [19]. Without this option, MongoDB produces stale data (less than 0.01%).

Figure 6: SoAR for 3 mixes of read and write actions, M=10,000, 12 KB image size, φ=100, β=100 msec, α=95%, τ=0.01%, zero think and inter-arrival times, θ=0.27.

Figure 6 shows the SoAR of the different systems with the 3 mixes for a database with 10,000 members and 100 friends per member. MongoDB outperforms SQL-X for the different mixes by almost a factor of 3. CASQL is sensitive to the percentage of write actions, as they invalidate cached key-value pairs, causing read actions to be processed by the RDBMS. With a very low (0.1%) write mix, CASQL outperforms MongoDB by more than a factor of 3. With a high percentage of write actions, the SoAR of CASQL is slightly higher than MongoDB's. The observed trends with SQL-X and MongoDB change depending on the mix of VP and LF actions. In Figure 6, MongoDB outperforms SQL-X because the frequency of VP is significantly higher than LF. If this were switched such that LF were more frequent than VP, then SQL-X would outperform MongoDB. A system evaluator should decide the mix of actions based on the characteristics of a target application.

D Sessions

A session is a sequence of actions performed by a socialite.
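Drawing each socialite's next action from a mix like those in Table 3 amounts to weighted sampling over the action frequencies. The sketch below uses Table 3's "Very Low (0.1%) Write" column; the function name and structure are illustrative, not BG's workload generator.

```python
import random

# The "Very Low (0.1%) Write" column of Table 3: action -> frequency.
VERY_LOW_WRITE_MIX = {
    "VP": 0.40, "LF": 0.05, "VFR": 0.05, "VTR": 0.499,
    "IF": 0.0002, "AFR": 0.0002, "RFR": 0.0003, "TF": 0.0003,
}
WRITE_ACTIONS = {"IF", "AFR", "RFR", "TF"}

def next_action(rng, mix=VERY_LOW_WRITE_MIX):
    """Draw the next socialite action according to the mix frequencies."""
    actions = list(mix)
    return rng.choices(actions, weights=[mix[a] for a in actions])[0]

rng = random.Random(7)
trace = [next_action(rng) for _ in range(1000)]
write_fraction = sum(a in WRITE_ACTIONS for a in trace) / len(trace)
```

Over a long run, the write fraction of such a trace converges to the column's 0.1%, which is what makes this mix "very low write" for CASQL's invalidation behavior.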
BG employs the Zipfian distribution to select one of the members to be the socialite. The selected session is based on a probability computed using the frequencies specified for the different sessions in a configuration file. A key conceptual difference between actions and sessions is the concept of think time: the delay between the different actions of a session emulated on behalf of a socialite. BG supports the concept of an inter-arrival time between the socialites emulated by a thread with both actions and sessions. Currently, BG supports 8 sessions. The first session is the starting point for the remaining 7 sessions. These sessions are as follows:

1. ViewSelfProfileSession, {VP(M_i), VTR(M_i)}: A member M_i visits her profile page to view her profile image (if available), number of pending friend requests, number of confirmed friends, and number of resources posted on her wall. Next, the member lists her top k resources.

2. ViewFrdProfileSession, {VP(M_i), VTR(M_i), LF(M_i), VP(M_j), VTR(M_j) | M_j ∈ LF(M_i)}: After viewing self profile and top k resources, member M_i lists her friends and picks one friend randomly, M_j. Next, M_i views M_j's profile and M_j's top k resources. If M_i has no friends, the session terminates without performing the two actions on M_j.

3. PostCmtOnResSession, {VP(M_i), VTR(M_i), VP(M_rand), VTR(M_rand), VCR(R_rand), PCR(R_rand), VCR(R_rand) | R_rand ∈ VTR(M_rand)}: After viewing self profile and top k resources, member M_i views the profile of a randomly chosen member M_rand, lists M_rand's top k resources, and picks one resource randomly, R_rand. If there are no resources, the rest of the actions are not performed. Otherwise, M_i views comments posted on R_rand, posts a comment on R_rand, and views all comments on R_rand a second time.

4.
DeleteCmtOnResSession, {VP(M_i), VTR(M_i), VCR(R_rand), DCR(R_rand), VCR(R_rand) | R_rand ∈ VTR(M_i)}: After viewing self profile and top k resources, member M_i views comments on one of her own randomly selected resources, R_rand, deletes a comment from this resource (assuming it exists), and views comments on R_rand again. If R_rand has no comments, she skips the remaining actions and the session terminates.

5. InviteFrdSession, {VP(M_i), VTR(M_i), LF(M_i), IF(M_j), VFR(M_i) | M_j ∉ LF(M_i)}: After viewing self profile and top k resources, member M_i lists her friends and selects a random member M_j who has no pending or confirmed relationship^5 with M_i. (If all members of the database are M_i's friends, then the remaining two actions are not performed.) She invites M_j to be friends and concludes by listing her own pending friend requests.

6. AcceptFrdReqSession, {VP(M_i), VTR(M_i), LF(M_i), VFR(M_i), AFR(M_j), VFR(M_i), LF(M_i) | M_j ∈ VFR(M_i)}: After viewing self profile and top k resources, member M_i lists her friends and pending friend requests. Next, she picks a pending friend request by member M_j and accepts this friend request (if any). She reviews her friend requests a second time and concludes by listing her friends. If M_i has no pending friend requests, she skips the remaining actions and the session terminates.

7. RejectFrdReqSession, {VP(M_i), VTR(M_i), LF(M_i), VFR(M_i), RFR(M_j), VFR(M_i), LF(M_i) | M_j ∈ VFR(M_i)}: After viewing self profile and top k resources, member M_i lists her friends and pending friend invitations to select an invitation from member M_j. She rejects the friend request from M_j, views her own friend requests, and lists her friends a second time. If M_i has no pending friend requests, she skips the remaining actions and the session terminates.

8.
ThawFrdshipSession, {VP(M_i), VTR(M_i), LF(M_i), TF(M_j), LF(M_i) | M_j ∈ LF(M_i)}: After viewing self profile and top k resources, member M_i lists her friends and selects a friend M_j randomly. Next, M_i thaws friendship with M_j. This session concludes with M_i listing her friends. If M_i has no friends, she skips the remaining actions and the session terminates.

^5 Includes friendship, a pending invitation from M_i to M_j, and a pending invitation from M_j to M_i.

Note the dependency between the values of M_i and M_j with ViewFrdProfileSession, InviteFrdSession, RejectFrdReqSession, and ThawFrdshipSession. For example, with ViewFrdProfileSession, M_j must be a friend of M_i. If M_i has no friends, the session terminates without performing the remaining actions. Moreover, some of the sessions cannot be implemented by simply using the state of the database in the data store because multiple concurrent threads may race with one another to change the database state simultaneously. BG is not privy to how the data store serializes the concurrent actions and whether the data store implements the concept of transactions (ACID). Hence, to detect unpredictable data accurately, BG maintains in-memory data structures that track the state of the database and employs them to decide the identity of entities manipulated by the actions and sessions, see Section F for details. As an example, consider the DeleteCmtOnResSession. Conceptually, it enables a socialite to delete a comment created on one of the resources posted on her wall. The Delete Comment from a Resource, DCR, action of this session consumes a randomly generated resource, R_rand. BG does not generate R_rand using the state of the database. Instead, it maintains in-memory data structures that track the state of the database to generate R_rand.
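The race described above can be illustrated with a lock guarding the tracked comments, so that two concurrently emulated socialites can never claim the same comment for deletion. This is a minimal sketch; `tracked_comments` and `delete_comment` are hypothetical names, and BG's actual bookkeeping is richer:

```python
import threading

# Hypothetical in-memory bookkeeping: (resource id, comment id) pairs that
# currently exist according to BG's tracked database state.
tracked_comments = {("r7", "c1"), ("r7", "c2")}
lock = threading.Lock()

def delete_comment(rid, cid):
    """Atomically claim a comment for deletion; returns False if already claimed."""
    with lock:
        if (rid, cid) not in tracked_comments:
            return False
        tracked_comments.remove((rid, cid))
        return True
```

With this guard, if two emulated socialites race to issue DCR on the same comment, exactly one claim succeeds and only that socialite issues the action against the data store.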
It uses semaphores to manage the integrity of these data structures and prevents multiple socialites from deleting the same comment on a resource simultaneously. In essence, R_rand is a member of VTR(M_i) and is guaranteed to exist prior to the invocation of DCR. BG is an extensible framework and one may specify sessions consisting of a different mix of actions.

E Parallelism

Today's data stores employ techniques that may fully utilize the resources (CPU and network bandwidth) of a benchmarking framework deployed on a single node. For example, the Whalin client for memcached (CASQL) is configured to compress key-value pairs prior to inserting them in the cache. It decompresses key-value pairs upon their retrieval to provide the uncompressed version to its caller, i.e., BG. The use of compression minimizes CASQL's network transmissions and enhances its cache hit rate by reducing the size of key-value pairs given a limited cache space. It also causes the CPU of the node executing BG to become 100% utilized for certain workloads. This is undesirable because the resulting SoAR reflects the capabilities of the benchmarking framework instead of the data store.

To address this issue, BG implements a scalable benchmarking framework using a shared-nothing architecture. Its software components are as follows:

Figure 7: BG's Visualization Deck.

1. A coordinator, BGCoord, computes the SoAR and Socialites ratings of a data store by implementing both an exhaustive and a heuristic search technique. Its inputs are the SLA specifications and the parameters of an experiment, see Table 2. It computes the fraction of the workload that should be issued by each worker process, named BGClient, and communicates it with that BGClient. BGCoord monitors the progress of each BGClient periodically, aggregates their current response time and throughput, and reports these metrics to BG's visualization deck for display, see Item 3. Once all BGClients terminate, BGCoord aggregates the final results for display by BG's visualization deck.
2. A BGClient is a slave to BGCoord and may perform three possible tasks. First, create a database. Second, generate a workload for the data store that is consistent with the BGCoord specifications. Third, compute the amount of unpredictable data produced by the data store. It transmits key metrics, except for the amount of unpredictable data, to BGCoord periodically. At the end of the experiment, it computes all metrics and transmits them to BGCoord.

3. BG's visualization deck enables a user to specify parameter settings for BGCoord, initiate the rating of a data store, and monitor the rating process, see Figure 7.

Once BGCoord activates the N BGClients, each BGClient generates its workload independently to enable the benchmarking framework to scale to a large number of nodes. We realize this by constructing the physical database of Section B to consist of N logical self-contained fragments. Each fragment consists of a unique collection of members, resources, and their relationships. BG can realize this because it generates the benchmark database. BGCoord assigns a logical fragment to one BGClient to generate its workload. This partitioning enables BG to ensure the uniqueness of concurrent socialites, i.e., the same member does not manipulate the database simultaneously. Note that the construction of logical fragments has no impact on the size of the physical database and its parameter settings such as the number of friendships.

Figure 8: MongoDB's throughput as a function of the number of threads T with the view profile (VP) action and different numbers of BGClients, N. 10,000 members, no image, 95% of requests observing a response time of 100 msec or faster, no think time or inter-arrival delay, and a Zipfian exponent of 0.27.

With BG, an experiment may specify a Zipfian distribution with a fixed exponent and vary the number of BGClients, N. BGClients implement a decentralized Zipfian, D-Zipfian [5], that produces the
same distribution of references with different values of N. This enables us to compare results obtained with different numbers of BGClients with one another. We implement D-Zipfian to incorporate the heterogeneity of nodes (hosting BGClients) where one node produces requests at a faster rate than the other nodes. D-Zipfian assigns more load to the fastest node by assigning a larger logical fragment to it and requiring it to produce more requests. Hence, the N BGClients complete issuing requests at approximately the same time. For details of D-Zipfian, see [5].

Figure 8 shows the throughput of MongoDB as a function of socialites. The presented results pertain to different numbers of BGClients performing the view profile (VP) action with D-Zipfian and exponent 0.27. The Socialites rating is the length of each curve along the x-axis. While it is 317 with 1 BGClient, it increases by a factor of 3.2 to 1024 with 8 (and 16) BGClients. A solid rectangular box denotes the SoAR rating with a given number of BGClients. It also increases as a function of N: from 15,800 with 1 BGClient to 33,200 with 16 BGClients. With 1 BGClient, the client component of MongoDB limits the observed ratings. We know it is not the hardware platform because we can run multiple BGClients on one node to observe higher ratings. Four physical nodes are used in the experiments of Figure 8. Both the SoAR and Socialites ratings remain unchanged from 8 to 16 BGClients. D-Zipfian ensures the same distribution of requests is generated with 1 to 16 BGClients.

F Unpredictable Data

Unpredictable data is either stale, inconsistent, or simply invalid data produced by a data store. For example, the design of a CASQL may incur dirty reads [18] or suffer from race conditions that leave the cache and the database in an inconsistent state [15], or a data store may employ an eventual consistency [32, 29] technique that produces either stale or inconsistent data for some time [23], among others.
The requirements of an application dictate whether these techniques are appropriate or not. A key question is how much unpredictable data is produced by a data store for interactive social networking actions. This section describes how BG quantifies an answer to this question.

Figure 9: A write of D_i may overlap a read of D_i in four possible ways.

Conceptually, BG is aware of the initial state of each data item in the database (by creating them using deterministic functions) and the change of value applied by each update operation. There is a finite number of ways for a read of a data item to overlap with concurrent actions that write it. BG enumerates these to compute a range of acceptable values that should be observed by the read operation. If a data store produces a different value, then it has produced unpredictable data. This process is named validation and its details are as follows.

BG implements validation in an off-line manner after it rates a data store, preventing it from exhausting the resources of a BGClient. During the benchmarking phase, each thread of a BGClient invokes an action that generates one log record. There are two types of log records, a read and a write log record, corresponding to either a read or a write of a data item. A log record consists of a unique identifier, the action that produced it, the data item referenced by the action, its socialite session id, and the start and end time stamps of the action. The read log record contains the value observed from the data store. The write log record contains either the new value (named Absolute Write Log, AWL, records) or the change (named Delta Write Log, DWL, records) to the existing value of its referenced data item. The start and end time stamps of each log record identify the duration of an action that either read or wrote a data item. They enable BG to detect the 4 possible ways that a write operation may overlap a read operation, see Figure 9.
During the validation phase, for each read log record that references data item D_i, BG enumerates all completed write log records that reference D_i and overlap the read log record, computing a range of possible values for this data item. If the read log record contains a value outside of this range, then its corresponding action has observed unpredictable data.

To elaborate, BG uses the set of n DWL records to compute all serializable schedules that a data store may generate. The theoretical upper bound on the number of schedules is n!. However, BG computes fewer schedules when two or more DWL records do not overlap: the end time stamp of one is prior to the start of the second. This produces an accurate range of possible values for the read operation. This is best illustrated with an example. Consider the four log records of Figure 10 where 3 DWL records overlap 1 read log record. Theoretically, there is a maximum of six (3!) possible ways for the updates to overlap one another. However, the actual number of possibilities is {{DWL1, DWL2, DWL3}, {DWL2, DWL1, DWL3}} because DWL3 has a start time stamp after the end of the other two, DWL1 and DWL2. Thus, assuming the value of D_1 is zero at time zero, the acceptable values for the read are {-1, 0, 1, 2}, flagging the observed value 3 as unpredictable. If one had incorrectly assumed 3! possible schedules, then the value 3 would have appeared in the acceptable set, incorrectly confirming Read1 as valid.

Operation id   Type    Data item   Start   End   Value
Read1          Read    D_1         0       10    3
DWL1           Write   D_1         1       3     -1
DWL2           Write   D_1         2       4     1
DWL3           Write   D_1         5       6     2

Figure 10: Example log records.

Log records produced by one BGClient are independent of those produced by the remaining N-1 BGClients because BGCoord partitions members and resources among the BGClients logically. Thus, there are no conflicts across BGClients and each BGClient may perform validation independently to compute the number of actions (sessions) that observe unpredictable data.
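The range computation for a single read log record can be sketched as follows, assuming delta (DWL) records that all overlap the read; the enumeration of feasible schedules respects the real-time order of non-overlapping writes, exactly as in the example above. Field names are illustrative:

```python
from itertools import permutations

def acceptable_values(initial, dwls):
    """Union of values a reader may observe under any serializable schedule of
    the overlapping DWL records that respects real-time (non-overlap) order."""
    values = {initial}
    for order in permutations(dwls):
        # A schedule is feasible only if it preserves the order of
        # non-overlapping writes: if a ends before b starts, a precedes b.
        if any(order[j]["end"] < order[i]["start"]
               for i in range(len(order)) for j in range(i + 1, len(order))):
            continue
        v = initial
        for w in order:          # the read may observe any prefix of the schedule
            v += w["delta"]
            values.add(v)
    return values

# The example of Figure 10: D_1 starts at 0, three delta writes overlap the read.
dwls = [{"start": 1, "end": 3, "delta": -1},
        {"start": 2, "end": 4, "delta": 1},
        {"start": 5, "end": 6, "delta": 2}]
ok = acceptable_values(0, dwls)
```

Running it on the example yields the acceptable set {-1, 0, 1, 2}, so an observed value of 3 is flagged as unpredictable. The brute-force enumeration is only a sketch; BG's implementation prunes infeasible schedules rather than generating all n! permutations.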
BGCoord collects these numbers from all BGClients to compute the overall percentage of actions (sessions) that observed unpredictable data.

Depending on the number of threads, a BGClient may produce a large number of log records. These records are scattered across multiple files. Currently, there are two centralized implementations of the validation phase: one uses interval trees [12] as in-memory data structures and the other uses a persistent store, namely a relational database. The latter is more appropriate when the total size of the write log records exceeds the available memory of a BGClient. We also have a preliminary implementation of the validation phase using MapReduce [13] that requires the log files to be scattered across a distributed file system. Below, we describe the centralized implementation of the validation phase.

Both the in-memory and persistent implementations of validation are optimized for workloads dominated by actions that read data items [1]. These optimizations are as follows. First, if there are no update log records, then there is no need for a validation phase; the validation phase terminates by deleting the read log file(s) and reporting 0% unpredictable reads. Second, write log records are processed first to construct a main memory data structure (independent of the interval trees or the RDBMS) that maintains, for each updated data item, its value prior to the first write log record and after the last write log record, the start time stamp of the first write log record, and the end time stamp of the last write log record. This enables BG to quickly process read log records that either reference data items that were never updated (they do not exist in the main memory data structure) or were issued before the first or after the last writer (there is only one possible value for these, available in the main memory data structure).
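The second optimization can be sketched as a pre-pass over the write log that summarizes, per data item, the value before the first writer, the value after the last writer, and the time span covered by writers. This is a simplified sketch assuming delta writes and an initial value of zero per item; record fields are illustrative:

```python
def summarize_writes(write_log):
    """Summarize the write log per data item for the read fast path."""
    summary = {}
    for w in sorted(write_log, key=lambda w: w["start"]):
        s = summary.setdefault(w["item"], {"before": 0, "after": 0,
                                           "first_start": w["start"],
                                           "last_end": w["end"]})
        s["after"] += w["delta"]
        s["last_end"] = max(s["last_end"], w["end"])
    return summary

def fast_path_value(summary, item, read_start, read_end):
    """Return the single possible value, or None if full validation is needed."""
    s = summary.get(item)
    if s is None:                  # never updated: only the initial value is possible
        return 0
    if read_end < s["first_start"]:
        return s["before"]         # read completed before the first writer
    if read_start > s["last_end"]:
        return s["after"]          # read started after the last writer
    return None                    # overlaps a writer: fall back to interval trees
```

Only reads that return None from the fast path require the full enumeration of overlapping write schedules.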
Third, multiple threads may process the read log records by accessing the aforementioned data structure with no semaphores as they are simply looking up data. This makes the validation phase suitable for multi-core CPUs as it employs multiple threads to process the read log files simultaneously.

Figure 11: Percentage of unpredictable data as a function of the number of threads T with memcached (CASQL) and three TTL values (30, 60, and 120 seconds). Mixed workload with 10% write actions, see Table 3. 10,000 members, 12 KB image size, 100 friends per member, a response time threshold of 100 msec, and a Zipfian exponent of 0.27; the percentage of requests observing a response time of 100 msec or faster is shown in red.

Figure 11 shows the percentage of unpredictable data produced by a CASQL system that employs a time to live (TTL) to keep its key-value pairs up to date (instead of the invalidation techniques of Section C that require the design, development, debugging, and testing of software). Figure 11 shows the behavior of three different TTL values, 30, 60, and 120 seconds, as a function of the number of BG threads, T. We assume 10% of actions are writes (see Table 3). The obtained results show that a higher TTL value increases the likelihood of both a write causing a key-value pair to become stale and the stale key-value pair being referenced. This explains the larger amount of unpredictable data with higher TTL values. Note that with the invalidation implementation of Section C, the amount of observed unpredictable data is less than 0.001% in all experiments. This can be reduced to zero by extending memcached with a race condition prevention technique such as Gumball [15].

A higher TTL value also enhances the performance of CASQL by increasing the number of references that observe a cache hit. This is shown with a higher percentage of requests that observe a response time faster than 100 msec with T=100: the percentage increases from 79.8% with a 30-second TTL to 98.15% with a 2-minute TTL.
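The trade-off can be illustrated with a toy TTL cache driven by a simulated clock: a write updates the database without touching the cache, so the cached copy stays visible (stale) until its TTL expires. All names here are illustrative; memcached's actual expiry is enforced server-side:

```python
class TTLCache:
    """Toy TTL cache: entries expire, they are never invalidated by writes."""
    def __init__(self, ttl):
        self.ttl, self.store = ttl, {}   # key -> (value, expiry time)

    def get(self, key, now):
        entry = self.store.get(key)
        if entry and now < entry[1]:
            return entry[0]              # hit: value may be stale
        return None                      # miss or expired

    def put(self, key, value, now):
        self.store[key] = (value, now + self.ttl)

db = {"profile:1": "v1"}
cache = TTLCache(ttl=30)
cache.put("profile:1", db["profile:1"], now=0)
db["profile:1"] = "v2"                   # write at t=5: no invalidation under TTL
stale = cache.get("profile:1", now=10)   # within TTL: stale (unpredictable) read
fresh = cache.get("profile:1", now=40)   # expired: miss, forcing a database read
```

A larger TTL widens the window in which `stale` is served, which is precisely why both the hit rate and the percentage of unpredictable data rise with the TTL value.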
In essence, a higher TTL value enhances the performance of CASQL at the price of producing a higher amount of stale data.

G Rating a Data Store

Once the BGClient for a data store has been developed (debugged and tested), N instances of it are deployed across one or more servers. Next, BGCoord is provided with the identity (IP and port) of these BGClient instances and an SLA consisting of the response time threshold, the percentage of requests that must satisfy it, the tolerable amount of unpredictable data, and the duration of each experiment (see Table 2). BGCoord employs the N BGClients to compute the SoAR and Socialites ratings of the data store. It rates a data store by conducting several experiments, each with a fixed number of threads T. These enable BGCoord to compute the SoAR and Socialites ratings of the data store. Details are as follows.

Each experiment uses T threads (spread across the N BGClients) to issue actions for the specified duration. At the end of the experiment, each BGClient reports its observed number of unpredictable reads and the percentage of requests that observed a response time equal to or faster than the SLA threshold. This experiment is successful as long as the following^6 hold true: 1) with each BGClient, the percentage of unpredictable reads is less than or equal to the SLA-specified tolerable amount of unpredictable reads, and 2) with each BGClient, the percentage of requests that observe a response time less than or equal to the SLA threshold is greater than or equal to the SLA-specified percentage. Otherwise, this experiment has failed to meet the specified SLA.

Figure 12: Assumptions of heuristic search. (a) Throughput as a function of T. (b) Average response time as a function of T. (c) Percentage of unpredictable data as a function of T. (d) Percentage of requests satisfying the response time threshold as a function of T. (e) SoAR search space.

One approach to compute the SoAR and Socialites ratings of a data store is to conduct experiments starting with T=1 and increment T by one every time an experiment succeeds. It would maintain the highest observed throughput and the highest T value.
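The per-experiment success test described above can be phrased directly; this is a minimal sketch with illustrative field names:

```python
def experiment_succeeds(client_results, alpha_ms, beta_pct, tau_pct):
    """An experiment succeeds only if every BGClient passes both SLA checks:
    at least beta_pct of its requests finish within alpha_ms, and its
    percentage of unpredictable reads is at most tau_pct."""
    for r in client_results:
        fast = sum(1 for rt in r["response_times_ms"] if rt <= alpha_ms)
        fast_pct = 100.0 * fast / len(r["response_times_ms"])
        if fast_pct < beta_pct or r["unpredictable_pct"] > tau_pct:
            return False
    return True

# Only 3 of 4 requests (75%) meet the 100 msec threshold, so this fails a 95% SLA.
results = [{"response_times_ms": [40, 60, 80, 200], "unpredictable_pct": 0.0}]
ok = experiment_succeeds(results, alpha_ms=100, beta_pct=95, tau_pct=0.01)
```

Each BGClient is tested individually because its fragment of the social network is independent of the others.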
And, it terminates once an experiment fails (see Assumption 1 below) to satisfy the SLA, reporting the highest observed throughput as the SoAR and the largest T as the Socialites rating of the data store. A limitation of this strategy is that it requires a substantial amount of time. For example, in Figure 8, MongoDB supports a Socialites rating of 1000 (T=1000). An exhaustive search starting with 1 thread and assuming a 10-minute duration per experiment would require almost 7 days.

BGCoord employs heuristic search to expedite the rating of a data store by conducting fewer experiments than an exhaustive search. This technique makes the following 3 assumptions about the behavior of a data store as a function of T:

1. The throughput of a data store is either a square root function or a concave inverse parabola of the number of threads, see Figure 12.a.

2. The average response time of a workload either remains constant or increases as a function of the number of threads, see Figure 12.b.

3. The percentage of stale data produced by a data store either remains constant or increases as a function of the number of threads, see Figure 12.c.

These are reasonable assumptions that hold true in most cases. Below, we formalize the second assumption in greater detail. Subsequently, we detail the heuristics for the SoAR and Socialites ratings. Finally, we describe sampling using experiment durations shorter than the SLA-specified duration to further expedite the rating process.

Figure 12.b shows the average response time (RT) of a workload as a function of T. With one thread, RT is the average service time (S) of the system for processing the workload. With a handful of threads, RT may remain a constant due to the use of multiple cores and sufficient network and disk bandwidth to service requests with no queuing delays.
As we increase the number of threads, RT may increase due to either (a) an increase in the service time S attributed to the use of synchronization primitives by the data store that slow it down [7, 20], (b) queuing delays attributed to fully utilized server resources, where RT = S + Q and Q is the average queuing delay, or both. In the absence of (a), the throughput of the data store is a square root function of T, see Figure 12.a. In scenario (b), the queuing delay is bounded with a fixed number of threads since BG emulates a closed simulation model where a thread may not issue another request until its pending request is serviced. Moreover, as T increases, the percentage of requests observing an RT lower than or equal to the SLA threshold decreases, see Figure 12.d.

^6 We treat each BGClient individually because its fragment of the social network is independent of the others, see Section E.

Figure 13: Number of experiments conducted to compute SoAR.

The heuristic search technique to compute the Socialites rating of a data store starts with an experiment using one thread, T=1. If the experiment succeeds, it doubles the value of T. It repeats this process until an experiment fails, establishing an interval for the value of T. The minimum value of this interval is the previous value of T that succeeded and its maximum is the value of T that failed. The heuristic performs a binary search of T in this interval to compute the highest T value that enables an experiment to succeed. This is the Socialites rating of the data store. It is accurate as long as Assumption 1 is satisfied, see Figure 12.a.

The heuristic to compute SoAR is similar to Socialites with several key differences. First, BGCoord maintains the highest throughput observed with each T value.
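The doubling-plus-binary-search procedure for the Socialites rating can be sketched as follows; `run_experiment` stands in for a complete BG experiment with T threads and is modeled here by a synthetic predicate:

```python
def socialites_rating(run_experiment):
    """Double T while experiments succeed, then binary-search the interval
    (last success, first failure) for the highest T that satisfies the SLA."""
    t = 1
    if not run_experiment(t):
        return 0
    while run_experiment(t * 2):      # doubling phase
        t *= 2
    lo, hi = t, t * 2                 # lo succeeded, hi failed
    while hi - lo > 1:                # binary search phase
        mid = (lo + hi) // 2
        if run_experiment(mid):
            lo = mid
        else:
            hi = mid
    return lo

# Synthetic data store that satisfies the SLA for up to 317 concurrent socialites.
rating = socialites_rating(lambda t: t <= 317)
```

With this synthetic predicate, the search reports 317 after fewer than 20 experiments instead of the 317 an exhaustive, increment-by-one search would require.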
It stops doubling T once an experiment produces a throughput lower than the highest observed so far or fails to satisfy the pre-specified SLA, as is the case with the square root curve of Figure 12.a. This is the point denoted as 2T in Figure 12.e. It may not simply focus on the interval (T, 2T) because the peak throughput might be in the interval (T/2, T), see Figure 12.e. Instead, it identifies the peak throughput as follows. It conducts experiments with T±1 threads to determine the slope of the curve in each direction. If both slopes are negative, then T is the peak and its throughput is reported as the SoAR of the data store. Otherwise, it focuses on the interval that contains the peak and performs a hill climbing process to identify the peak.

The heuristic to compute SoAR visits tens of states even though SoAR might be in the order of hundreds of millions of actions per second. To illustrate, we used a quadratic function to model the throughput of a data store as a function of the number of threads T: throughput = aT - bT^2, with all computed negative values reset to zero. An experiment employs a fixed number of threads T that serves as the input to the function to report the observed throughput. The vertex of this function is the maximum throughput, SoAR, and is computed by solving the first derivative of the quadratic function: T = a/(2b). The heuristic must discover this value of T as the SoAR of a system modeled using the function. We select different values of a and b to model diverse systems whose SoAR varies from 500 to 100 million actions per second. Figure 13 shows the number of visited states when the search starts with T=1. When SoAR is 100 million, the heuristic conducts 54 experiments to compute the value of T that maximizes the output of the function. Ten states are repeated from previous iterations with the same value of T. To eliminate these, the heuristic maintains the observed results for the different values of T and performs a look up of the results prior to conducting the experiment.
This reduces the number of unique experiments to 40. This is 2.6 times the number of experiments conducted with a system modeled to have a SoAR of 500 (which is several orders of magnitude lower than 100 million).

During its search process, BGCoord may run the different experiments with a duration shorter than the SLA-specified duration to expedite the rating process. Once it identifies the ideal value of T for SoAR (Socialites), it runs a final experiment with the full duration to compute the final SoAR (Socialites rating) of the data store. A key question is what the ideal value of the shorter duration is. It should be small enough to expedite the time required to rate a data store and large enough to enable BG to rate the data store accurately. There are several ways to address this. For example, one may compare the throughput computed with the shorter and the full duration for the final experiment and, if they differ by more than a certain percentage, repeat the rating process with a larger value. Another possibility is to employ an increasing set of candidate durations. If the highest two candidate durations produce identical ratings, then they establish the duration for that experiment. The number of candidate durations in the set should be small enough to render the rating process faster than performing the search with the full duration. The shorter duration is an input to BGCoord. If it is left unspecified, BG uses the full SLA-specified duration for the rating process. As an example, the numbers of Figure 8 are generated using a 3-minute duration. The solid rectangular boxes (SoAR ratings with different N) are generated using the full 10-minute duration.

H Related Work

BG falls in the vector based approach of [27] that models application behavior as a list of actions and sessions (the 'vector') and randomly applies each action to its target data store with the frequency a real application would apply the action. The input workload file of BG specifies the frequency of different actions and sessions, configuring BG to emulate a wide range of social networking applications.
(See Table 3 for three example mixes.) This flexibility is prevalent with both YCSB [11] and YCSB++ [23]. In fact, our implementation of BG employs the core components of YCSB and extends them with new ones such as the actions of Section C, D-Zipfian, BGCoord, and BG's visualization deck. Those with hands-on experience with YCSB will find BG familiar, with the following key modifications and extensions:

1. A more complex conceptual schema specific to social networks.

2. The simple table operations of YCSB have been replaced with social actions and sessions.

3. BG consumes an SLA to compute two ratings for a data store: SoAR and Socialites. If no SLA is specified, BG executes the same as YCSB.

4. BG quantifies the amount of unpredictable data produced by a data store.

5. BG employs a shared-nothing architecture and constructs self-contained fragments of its database to ensure concurrent socialites emulated by independent BGClients are unique, see Section E. This eliminates the need for coordination between BGClients during the benchmarking phase, enabling BG to scale to a large number of nodes.

Some of BG's extensions to YCSB are similar to those that differentiate YCSB++ from YCSB. For example, the concept of multiple BGClients managed by BGCoord is similar to how YCSB++ supports multiple YCSB clients. However, there are also differences. First, YCSB++ includes mechanisms specific to evaluating table stores such as HBase. These include function shipping and fine grained access control. Instead of these, BG focuses on the interactive social networking actions of Section C and their implementation with alternative data stores. While extension 5 of BG (see the previous paragraph) is similar to the ingest-intensive extension of YCSB++, it goes beyond simple ranges that partition data across multiple nodes: friendships and resources of members are logically partitioned to construct N self-contained independent social networks, where N is the number of BGClients.
Second, YCSB++ consists of an elegant mechanism to quantify the inconsistency window: the lag in acknowledged data store changes that are not seen by other clients for some time due to the use of a weak consistency semantic such as eventual consistency [32]. BG captures the impact of such design decisions by quantifying the amount of unpredictable data. Both metrics are in synergy and may co-exist in a benchmark. Finally, while both YCSB and YCSB++ lack the concept of an SLA to rate a data store, SLAs are the essence of the TPC-A/C benchmarks [16]. For example, TPC-A measures transactions per second (tps) subject to a response time constraint. BG is similar as it employs SLAs to obtain its ratings. It is different than TPC because it focuses on social networking actions and incorporates unpredictable data as a component of its SLAs.

I Future Research

While there are many data stores, there is a "gaping hole" with a scarcity of benchmarks to substantiate the claims of these data stores [9]. Social networking companies continue to contribute data stores to address their requirements for interactive member actions, e.g., Cassandra and TAO [1] by Facebook and Voldemort by LinkedIn. BG is a benchmark to evaluate these alternative implementations and their claims objectively. The most important feature of BG is its ability to scale to characterize the performance of a data store accurately.

Our immediate short-term activities are as follows. First, we are studying alternative physical designs of data with both the relational [6] and JSON-Like models. Second, we are extending the validation phase of BG to utilize its log records to compute the lag for an acknowledged update to become visible to all clients [33]. Third, we are extending BG with additional interactive actions such as posting and viewing a tweet with Twitter, a newsfeed with Facebook, and a job change with LinkedIn [28].
Fourth, we are using BG in a number of studies to evaluate the elasticity of data stores and their behavior in the presence of failures.

J Acknowledgments

We thank Jason Yap for his implementation of the CASQL BGClient. We are grateful to the anonymous CIDR 2013 reviewers for their insights and valuable comments.

References

[1] Z. Amsden, N. Bronson, G. Cabrera III, P. Chakka, P. Dimov, H. Ding, J. Ferris, A. Giardullo, J. Hoon, S. Kulkarni, N. Lawrence, M. Marchukov, D. Petrov, L. Puzar, and V. Venkataramani. TAO: How Facebook Serves the Social Graph. In SIGMOD Conference, 2012.

[2] C. Amza, A. Chanda, A. Cox, S. Elnikety, R. Gil, K. Rajamani, W. Zwaenepoel, E. Cecchet, and J. Marguerite. Specification and Implementation of Dynamic Web Site Benchmarks. In Workshop on Workload Characterization, 2002.

[3] C. Aniszczyk. Caching with Twemcache, http://engineering.twitter.com/2012/07/caching-with-twemcache.html.

[4] L. Backstrom. Anatomy of Facebook, http://www.facebook.com/note.php?note_id=10150388519243859, 2011.

[5] S. Barahmand and S. Ghandeharizadeh. D-Zipfian: A Decentralized Implementation of Zipfian, USC DBLAB Technical Report 2012-04, http://dblab.usc.edu/users/papers/dzipfian.pdf, 2012.

[6] S. Barahmand, S. Ghandeharizadeh, and J. Yap. Physical Relational Database Design for Interactive Social Networking Actions, USC DBLAB Technical Report 2012-08, http://dblab.usc.edu/users/papers/RelationalBG.pdf.

[7] M. W. Blasgen, J. Gray, M. F. Mitoma, and T. G. Price. The Convoy Phenomenon. Operating Systems Review, 13(2):20–25, 1979.

[8] M. J. Carey, D. J. DeWitt, and J. F. Naughton. The OO7 Benchmark. In SIGMOD Conference, pages 12–21, 1993.

[9] R. Cattell. Scalable SQL and NoSQL Data Stores. SIGMOD Rec., 39:12–27, May 2011.

[10] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A Distributed Storage System for Structured Data. ACM Trans. Comput. Syst., 26(2), 2008.

[11] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking Cloud Serving Systems with YCSB. In Cloud Computing, 2010.

[12] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms, chapter 15, pages 290–295. MIT Press, 2001.

[13] J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In Symposium on Operating Systems Design & Implementation - Volume 6, 2004.

[14] A. Floratou, N. Teletia, D. J. DeWitt, J. M. Patel, and D. Zhang. Can the Elephants Handle the NoSQL Onslaught? In VLDB, 2012.

[15] S. Ghandeharizadeh and J. Yap. Gumball: A Race Condition Prevention Technique for Cache Augmented SQL Database Management Systems. In Second ACM SIGMOD Workshop on Databases and Social Networks, 2012.

[16] J. Gray. The Benchmark Handbook for Database and Transaction Systems (2nd Edition), Morgan Kaufmann 1993, ISBN 1-55860-292-5.

[17] J. Gray and A. Reuter. Transaction Processing: Concepts and Techniques, pages 677–680. Morgan Kaufmann, 1993.

[18] P. Gupta, N. Zeldovich, and S. Madden. A Trigger-Based Middleware Cache for ORMs. In Middleware, 2011.

[19] C. Harris. Overview of MongoDB Java Write Concern Options, November 2011, http://www.littlelostmanuals.com/2011/11/overview-of-basic-mongodb-java-write.html.

[20] R. Johnson, I. Pandis, N. Hardavellas, A. Ailamaki, and B. Falsafi. Shore-MT: A Scalable Storage Manager for the Multicore Era. In EDBT, pages 24–35, 2009.

[21] N. Lynch and S. Gilbert. Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services. ACM SIGACT News, 33:51–59, 2002.

[22] memcached. Memcached, http://www.memcached.org/.

[23] S. Patil, M. Polte, K. Ren, W. Tantisiriroj, L. Xiao, J. López, G. Gibson, A. Fuchs, and B. Rinaldi. YCSB++: Benchmarking and Performance Debugging Advanced Features in Scalable Table Stores. In Cloud Computing, New York, NY, USA, 2011. ACM.

[24] D. R. K. Ports, A. T. Clements, I. Zhang, S. Madden, and B. Liskov. Transactional Consistency and Automatic Management in an Application Data Cache. In OSDI. USENIX, October 2010.

[25] P. Saab. Scaling memcached at Facebook, https://www.facebook.com/note.php?note_id=39391378919.

[26] R. Sears, C. V. Ingen, and J. Gray. To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem. Technical Report MSR-TR-2006-45, Microsoft Research, 2006.

[27] M. Seltzer, D. Krinsky, K. Smith, and X. Zhang. The Case for Application Specific Benchmarking. In HotOS, 1999.

[28] A. Silberstein, A. Machanavajjhala, and R. Ramakrishnan. Feed Following: The Big Data Challenge in Social Applications. In DBSocial, pages 1–6, 2011.

[29] M. Stonebraker. Errors in Database Systems, Eventual Consistency, and the CAP Theorem. Communications of the ACM, BLOG@ACM, April 2010.

[30] M. Stonebraker and R. Cattell. 10 Rules for Scalable Performance in Simple Operation Datastores. Communications of the ACM, 54, June 2011.

[31] J. Ugander, B. Karrer, L. Backstrom, and C. Marlow. The Anatomy of the Facebook Social Graph. CoRR, abs/1111.4503, 2011.

[32] W. Vogels. Eventually Consistent. Communications of the ACM, Vol. 52, No. 1, pages 40–45, January 2009.

[33] H. Wada, A. Fekete, L. Zhao, K. Lee, and A. Liu. Data Consistency Properties and the Trade-offs in Commercial Cloud Storages: The Consumers' Perspective. In CIDR, 2011.

[34] Wikipedia, The Free Encyclopedia. List of Social Networking Websites, http://en.wikipedia.org/wiki/List_of_social_networking_websites.
