Performance Analysis of Cloud Databases Handling Social Networking Data
2013, 2013 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM)
https://doi.org/10.1109/CCEM.2013.6684437…
4 pages
Abstract
With the growing popularity of social networking, the need to store and analyze the data generated by such applications has grown by leaps and bounds. With the advent of cloud computing tools that handle large amounts of data with ease, there is an increased usage of such tools to manage social networking data. However, the usage of tools for accessing and analyzing social networking data needs to be optimized. The performance of such tools depends largely on the nature of the database used in the back-end. This research illustrates that the choice of database determines, to a large extent, the performance of the tool and the related application. This research also clarifies the use of social networking tools for specific purposes and the effect of MapReduce on various storage structures.






Related papers
2011
Social platforms such as Facebook and Twitter have been growing exponentially in the last few years. As a result of this growth, the amount of social data has increased enormously, and the need to store and analyze social data has become crucial. New storage solutions (also called NoSQL) were therefore created to fulfill this need. This thesis analyzes the structure of social data and gives an overview of currently used storage systems and their respective advantages and disadvantages for differently structured social data. Thus, the main goal of this thesis is to find out the structure of social data and to identify which types of storage systems are suitable for storing and processing social data. Based on concrete implementations of the different storage systems, it analyzes which solutions fit which type of data and how the data can be processed and analyzed in the respective system. A focus lies on simple analysis methods such as degree centrality and simplified PageRank calculations.
In this paper, we report on benchmark experiments and results from optimizing database connectivity for querying social networking data from Apache Shindig in a Neo4j database. We built on our experiments from [1] and tried to improve the performance of the current RESTful HTTP connection in comparison to JDBC in order to fully utilize the performance benefits of the graph database compared to relational database management systems. We implemented a database driver based on WebSockets. We found that BSON is a better data transfer format than JSON, and that compression increases performance in some settings while decreasing it in others. Multiple WebSocket connections are needed to scale to a high number of client requests and fully utilize database performance. Multi-threading is another key factor for scalability. By implementing a kind of stored procedure, we were able to further increase throughput and decrease response times.
2011
The amount of traffic on web-based social networks is very difficult to predict. In order to avoid wasting resources during low-traffic periods or being overloaded during peak periods, it would be interesting to adapt the amount of resources dedicated to the service. In this work we detail the design and implementation of our own social network application, called Bwitter. Our first goal is to make Bwitter's performance scale with the number of machines we dedicate to it. Our second goal is linked to the first: we want to make Bwitter elastic, so that it can react to flash crowds without suspending its services by adding resources to handle the load. To achieve the desired scalability and elasticity, Bwitter is implemented on a scalable key/value datastore with transactional capabilities running on the cloud. During our tests we study the behaviour of Bwitter using the Scalaris datastore, with both running on Amazon's Elastic Compute Cloud. We show that the performance of Bwitter increases almost linearly with the number of resources we allocate to it. Bwitter is also able to improve its performance significantly in a matter of minutes.
Modeling Decisions for Artificial Intelligence, 2014
In recent years, with the increase in available data from social networks and the rise of big data technologies, social data has emerged as one of the most profitable markets for companies to increase their benefits. In addition, social computation scientists see such data as a vast ocean of information for studying modern human societies. Nowadays, enterprises and researchers either develop their own mining tools in house or outsource their social media mining needs to specialised companies, with the consequent economic cost. In this paper, we present the first cloud computing service to facilitate the deployment of social media analytics applications, allowing data practitioners to use social mining tools as a service. The main advantage of this service is the possibility of running different queries at the same time and combining their results in real time. Additionally, we introduce twearch, a prototype for developing Twitter mining algorithms as services in the cloud.
The increasing use of internet technologies, together with the convergence of computing machines from mainframes to cellular devices and multimodal HCIS, has resulted in a tremendous amount of data that is rich in volume and variety. Since not all data is always required, but only the data relevant to a particular information need, methods are required to supply the user with application-specific data. The techniques for handling this huge amount of data and extracting value from it are collectively called Big Data. In recent years, there has been an emerging interest in big data for social media analysis.
International Journal of Reasoning-based Intelligent Systems, 2015
A new database model called NoSQL plays a vital role in Big Data analytics. NoSQL databases are non-relational, open-source data stores that are employed in massively scaled web site scenarios and deliver better performance by retrieving multiple data sets from commodity machines. NoSQL databases can be a hybrid mix of different databases, depending on the specific use case. In this paper, we focus on four different NoSQL databases used by social networking sites. Their features are compared and analysed, including for the subcategories of the four NoSQL databases, with a view to fast querying of social networks. In the detailed analysis presented, features such as data storage and the fast retrieval phase of query processing are given primary importance. We also present a comparison of the time taken by insert and read operations on Facebook social network data. The results are compared, and the most suitable database among the NoSQL graph database subcategories for insert and read operations on Facebook data is identified.
Indonesian Journal of Electrical Engineering and Computer Science
In the era of rapid growth of cloud computing, performance calculation of a cloud service is an essential criterion for assuring quality of service. Nevertheless, it is a perplexing task to analyze the performance of a cloud service effectively, due to the complexity of cloud resources and the diversity of Big Data applications. Hence, we propose to examine the performance of Big Data applications with Hadoop and thus to determine the performance in a cloud cluster. Hadoop is built on MapReduce, one of the widely used programming models in Big Data. In this paper, a performance analysis of the Hadoop MapReduce WordCount application on Twitter data is presented. A 4-node in-house Hadoop cluster was set up and an experiment was carried out to analyze the performance. Through this work, it was concluded that Hadoop is efficient for Big Data applications with 3 or more nodes and a replication factor of 3. It was also observed that system time was relatively higher than user time for Big Data applications beyond 80 GB. The experiment also revealed certain patterns in the actual data blocks used to process the WordCount application.
2013
Today, social networks produce massive amounts of data by the hour that need storage. With the aid of cloud computing, social network users can have their data stored in data centers belonging to the cloud anywhere around the globe. This paper focuses on how to allocate user data to the appropriate global data centers from a social networking point of view. The proposed algorithm uses a number of factors, such as read rate, write rate, and the number and location of friend connections, to calculate which data center would yield shorter latency, and therefore better results, if the user data were stored at that location. In validation carried out via simulation, the algorithm yielded sufficiently improved data-access latency scores in all test cases.
International Journal of Advanced Computer Science and Applications, 2017
For the last three decades, relational databases have been used in organizations of many kinds, such as education, health, and business, and in many other applications. Traditional databases show tremendous performance and are designed to handle structured data with the ACID (Atomicity, Consistency, Isolation, Durability) properties to manage data integrity. In the current era, organizations store more data besides structured data for decision making, e.g. videos, images, and blogs. Similarly, social media and scientific applications generate large amounts of semi-structured data of varied nature. Relational databases cannot process and manage such large amounts of data efficiently. To overcome this problem, another paradigm, NoSQL databases, was introduced to manage and process massive amounts of unstructured data efficiently. NoSQL databases are divided into four categories, and each category is used according to the nature and needs of the specific problem. In this paper we compare the Oracle relational database and a NoSQL graph database using optimized queries and physical database tuning techniques. The comparison is twofold. First, we compare various kinds of queries, such as simpler queries, apply database tuning to the Oracle relational database, such as sub databases, and run these queries in our desired environments. Second, we perform predictive analysis on the results obtained from our experiments.
Twitter, one of the largest and best-known social media sites, receives millions of tweets every day on a variety of important topics. This large amount of raw data can be used for industrial, social, economic, government policy, or business purposes once it is organized and processed according to need. Hadoop is one of the best tool options for Twitter data analysis, and it works with distributed big data, streaming data, time-stamped data, text data, and so on. This paper discusses how to use Flume to extract Twitter data and store it in HDFS, and then how to use the Hadoop ecosystem to analyse the data.
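Several of the abstracts above, like the main abstract, refer to MapReduce, and one of them analyzes a Hadoop WordCount job over Twitter data. As a rough illustration of that programming model only, the following is a minimal in-memory sketch in Python. It does not use Hadoop or any of the systems described in these papers; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample tweets are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(tweet):
    """Map step: emit a (word, 1) pair for each whitespace-separated token."""
    return [(word.lower(), 1) for word in tweet.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def word_count(tweets):
    """Run map, shuffle, and reduce sequentially over an in-memory list."""
    mapped = chain.from_iterable(map_phase(t) for t in tweets)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical sample input, not data from any of the papers.
tweets = ["Hadoop stores tweets", "hadoop counts words in tweets"]
counts = word_count(tweets)
```

In a real cluster the map and reduce steps would run in parallel on separate nodes and the shuffle would move data across the network; this sketch runs everything in one process to show only the data flow.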