
Case study of Big Data Implementation: Grab (Model, Method, Measurement, Accuracy)

  • Model used
In this case study, Grab uses the "Optimize and Extend" model, which applies when a company has problems with its existing data. Grab addressed its data problems by building a new data model: it rewrites its data system roughly every two years so that the data can be processed in a structured manner by the company itself.


  • Methodology used
Based on the problems presented above, Grab uses the "Clustering" and "Prediction" methods.
Clustering is a method for grouping data. According to Edy Irwansyah (citing Tan, 2006), clustering is the process of grouping data into several clusters or groups such that data within one cluster has maximum similarity and data between clusters has minimum similarity. According to Irwansyah (2017), clustering is widely used in applications such as business intelligence, image pattern recognition, web search, the biological sciences, and security. In business intelligence, for example, clustering can organize many customers into groups with strongly similar characteristics. In this case study, Grab applies clustering when processing its very large data volumes: the data collected by Grab's big data platform reaches 4 petabytes, or about 4,000 terabytes. In line with this, at the end of 2016 the company decided to change its existing architecture, moving its servers to the cloud service on Amazon and adopting a data lake by utilizing the Helios service from Amazon, so that the data generated is available for analysis closer to real time.
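To make the clustering idea concrete, here is a minimal sketch of k-means, a common clustering algorithm, grouping hypothetical pickup coordinates into demand hotspots. The coordinates, the zone layout, and the use of k-means itself are illustrative assumptions, not details from Grab's actual pipeline.

```python
def kmeans(points, k, iters=20):
    """Minimal k-means over 2-D points (e.g. lat/lon of ride pickups)."""
    centroids = points[:k]  # simple deterministic init: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        # Assignment step: each point joins its nearest centroid
        for p in points:
            nearest = min(range(k),
                          key=lambda c: (p[0] - centroids[c][0]) ** 2
                                      + (p[1] - centroids[c][1]) ** 2)
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical pickup coordinates around two demand hotspots
pickups = [(1.30 + 0.01 * i, 103.80) for i in range(5)] \
        + [(1.40 + 0.01 * i, 103.90) for i in range(5)]
centroids, clusters = kmeans(pickups, k=2)
print([len(c) for c in clusters])  # [5, 5] — one cluster per hotspot
```

Each resulting cluster corresponds to a group of nearby pickups, which is the same "maximum similarity within, minimum similarity between" property described above.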
According to bootupacademyai, prediction is a process that finds particular patterns in data. The pattern can be identified from the variables in the data, and the pattern obtained can then be used to predict other variables whose value or type is unknown. In this case study, Grab uses this method to address the problem of drivers gathering in one area, which leaves other areas with few or no drivers. The method identifies the regions and hours at which crowds of customers are likely to order the services Grab offers, which helps spread drivers more evenly and reduces the number of under-served areas.
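As a toy illustration of this kind of prediction, the sketch below uses a simple frequency heuristic: the zone with the most historical orders at a given hour is predicted to be the busiest at that hour in the future. The zone names and order records are invented for the example; Grab's real models are not described in the source.

```python
from collections import Counter

def predict_busiest_zone(order_history, hour):
    """Predict the busiest zone at a given hour from (hour, zone)
    records, using past order frequency as the pattern."""
    counts = Counter(zone for h, zone in order_history if h == hour)
    return counts.most_common(1)[0][0] if counts else None

# Hypothetical (hour, zone) order records
orders = [(18, "Orchard"), (18, "Orchard"), (18, "Changi"),
          (9, "CBD"), (9, "CBD"), (9, "Orchard")]
print(predict_busiest_zone(orders, 18))  # Orchard
print(predict_busiest_zone(orders, 9))   # CBD
```

A dispatch system could use such hour-by-zone predictions to nudge idle drivers toward areas where demand is expected, which is the rebalancing effect described above.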


  • Measurement
In this case study, Grab initially had a problem with slow data input into its system, which caused the company to miss its targets for the data to be analyzed. Grab therefore adopted a new model and changed its existing architecture, on the grounds that the old architecture could not provide real-time data analytics and that upgrading the servers to handle the large number of requests would be costly. Grab decided to move its servers to the cloud service on Amazon and to use a data lake by utilizing the Helios service from Amazon. In doing so, Grab tried to fix everything that prevented its data from being ingested optimally. It can be clearly seen that, with big data playing this role, the problems at Grab could be overcome and the company could meet its intended targets.


  • Accuracy
The steps Grab took to fix the modeling of its data have worked well. They include limiting who can access data in the Grab application and restricting the query actions that can be run, so that incoming data stays structured and can be read easily by the company. In addition, Grab intends to implement the Kafka service from Apache, combined with Spark (also from Apache), to support real-time data streaming. The results of this streaming can then be used to build a real-time monitoring system that helps the company make decisions and understand the system better. These are the decisions Grab made to fix its data so that the company can meet its future targets.
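To show what such a streaming aggregation computes, here is a minimal tumbling-window count in plain Python: orders are bucketed into fixed time windows and counted per zone, which is a toy stand-in for what a Kafka-to-Spark Structured Streaming job would compute continuously on live data. The event timestamps and zone names are invented for the example.

```python
from collections import defaultdict

def window_counts(events, window_s=60):
    """Tumbling-window aggregation: count orders per zone within
    fixed-length time windows of `window_s` seconds."""
    windows = defaultdict(lambda: defaultdict(int))
    for ts, zone in events:
        windows[ts // window_s][zone] += 1  # bucket by window index
    return {w: dict(z) for w, z in sorted(windows.items())}

# Hypothetical stream of (timestamp_seconds, zone) order events
events = [(5, "A"), (30, "A"), (65, "A"), (70, "B")]
print(window_counts(events))  # {0: {'A': 2}, 1: {'A': 1, 'B': 1}}
```

A real-time dashboard built on these per-window counts would give operators the kind of live view of demand that the monitoring system described above is meant to provide.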



References
https://mti.binus.ac.id/2017/10/04/1886/
https://bootup.ai/blog/data-mining-adalah/
https://socs.binus.ac.id/2017/03/09/clustering/
