Case Study of Big Data Analytics Implementation: Grab (Objective, Problem, Solution)


  • Objective
Grab uses Big Data to support its services and day-to-day operations, chiefly by understanding the behavior of its users, both customers and drivers.
With Big Data, the company can also find out which regions generate the most orders and which travel destinations are most common.
Data processing can also regulate the flow of orders in an area during rush hour.
It can optimize ordering speed.
Thus, Grab uses Big Data to support its business and services, not only in the short term but also in the long term.
  • Problems
The sheer volume of data at Grab. The data Grab has collected reaches 4 petabytes, or about 4,000 terabytes. That is equivalent to 2 trillion pages of text or a 53-year-long high-resolution video.
Drivers gather in one spot, which leaves other areas with few drivers or none at all.
  • Solutions
Grab rewrites its system every two years.
One example of innovation built on big data is GrabShare and GrabNow. GrabShare is a ride-sharing service in which passengers heading in the same direction share a ride. GrabNow is the fastest way to get a driver: the passenger directly approaches the nearest driver who is not currently booked. GrabNow grew out of observations in Jakarta that users find it easier to order a Grab that is right in front of them.
Grab invested in building research and development centers (R&D centers) in various locations.
Grab collaborates with Microsoft in many areas, such as big data, artificial intelligence, and machine learning. One planned implementation is mobile facial recognition with built-in AI for drivers and passengers.
Big data analytics not only helps to understand the information contained in the data but also helps to identify the data that is most important for current and future business decisions.
Grab is a company that combines services with technology, and it uses Big Data to support its services and operations. Big data is an important element for Grab in understanding the behavior of its users as well as its driver partners. With Big Data, Grab can find out which regions generate the most orders and which trip destinations are most common. Grab uses this to support its business and services.
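To make the idea of finding the busiest regions concrete, here is a minimal sketch in Python. The order records and region names are hypothetical, and this is only an illustration of the kind of counting involved, not Grab's actual method.

```python
from collections import Counter

# Hypothetical order records: (pickup_region, destination_region).
orders = [
    ("Kemang", "Sudirman"),
    ("Kemang", "Kuningan"),
    ("Menteng", "Sudirman"),
    ("Kemang", "Sudirman"),
    ("Kuningan", "Sudirman"),
]

def top_regions(orders, n=3):
    """Return the n most frequent pickup regions and the n most frequent destinations."""
    pickups = Counter(pickup for pickup, _ in orders)
    destinations = Counter(dest for _, dest in orders)
    return pickups.most_common(n), destinations.most_common(n)

top_pickups, top_destinations = top_regions(orders)
print(top_pickups[0])       # busiest pickup region and its order count
print(top_destinations[0])  # most common destination and its order count
```

At real scale the same grouping would run on a distributed engine rather than in memory, but the logic stays a count per region.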
"The more data we collect, the more insight we have. In one day we receive 10 TB of data. In total, that adds up to multiple petabytes of data. This is what makes us the most requested online transportation service in Southeast Asia," said Grab's Head of Engineering, Ditesh Gathani, on October 25, 2017.
His team uses Big Data from the track records of passengers and drivers and processes it further. He calls this "Data Demand". Data processing can also regulate the flow of orders in certain areas during rush hour. For example, the processed data is used to optimize how passengers make bookings and how drivers accept them.
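One way to picture how rush-hour order flow could be monitored is to compare open orders with idle drivers per area. The sketch below is an assumption made for illustration: the area names, the numbers, and the `max_orders_per_driver` threshold are all hypothetical and do not come from Grab.

```python
# Hypothetical rush-hour snapshot: open orders and idle drivers per area.
open_orders = {"Sudirman": 120, "Kemang": 40, "Menteng": 15}
idle_drivers = {"Sudirman": 30, "Kemang": 40, "Menteng": 0}

def underserved_areas(open_orders, idle_drivers, max_orders_per_driver=2.0):
    """Return (area, ratio) pairs whose orders-per-driver ratio exceeds the threshold, worst first."""
    ratios = {}
    for area, orders in open_orders.items():
        drivers = idle_drivers.get(area, 0)
        if orders == 0:
            ratios[area] = 0.0           # no demand, nothing to serve
        elif drivers == 0:
            ratios[area] = float("inf")  # demand but no drivers at all
        else:
            ratios[area] = orders / drivers
    flagged = [(area, ratio) for area, ratio in ratios.items() if ratio > max_orders_per_driver]
    return sorted(flagged, key=lambda item: item[1], reverse=True)

hotspots = underserved_areas(open_orders, idle_drivers)
print(hotspots)
```

Areas at the top of the list are the ones where drivers are scarce relative to demand, which is the situation that arises when drivers cluster in a single spot.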
One example of innovation built on big data is GrabShare and GrabNow. GrabShare is a ride-sharing service in which passengers heading in the same direction share a ride. GrabNow is the fastest way to get a driver: the passenger directly approaches the nearest driver who is not currently booked.
He said that to solve the problem, his team applied a hyperlocal approach. For example, it sent 15 Grab teams to spend six months in Jakarta. They found that people in Jakarta find it easier to order a Grab that is right in front of them.
Ditesh explained that Grab's goal is to change the transportation system in the cities where it operates within the next 10 years. Grab wants to achieve this by combining big data with machine learning, which can predict consumer demand. Grab also plans to collaborate with local governments. Grab said it already shares real-time data with the Singapore government, covering location, direction of travel, and speed, for analyzing traffic flow.
As quoted from Tech Wire Asia on Monday (12/17/2018), the data Grab has collected reaches 4 petabytes, or about 4,000 terabytes. That is equivalent to 2 trillion pages of text or a 53-year-long high-resolution video.
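The figures can be checked with simple arithmetic. The sketch below assumes decimal (SI) units and, purely for illustration, a constant intake of 10 TB per day as in Ditesh's quote. Real intake would have grown over time, so the number of days is only a rough order of magnitude.

```python
TB_PER_PB = 1_000        # decimal (SI) units, matching "4 petabytes or about 4,000 terabytes"
total_pb = 4             # total volume reported in the article
daily_intake_tb = 10     # "10 TB of data" per day, assumed constant for this estimate

total_tb = total_pb * TB_PER_PB
days_to_accumulate = total_tb / daily_intake_tb

print(total_tb)            # total volume in terabytes
print(days_to_accumulate)  # days needed at a constant 10 TB per day
```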
Ditesh also said that the abundance of data forces Grab to rewrite its system every two years. The Grab engineering team therefore builds solutions that are only expected to remain valid for about two years.
This abundance of data has made the company willing to invest heavily in research and development centers (R&D centers) in various locations. Grab has six R&D centers in total: Seattle (US), Ho Chi Minh City (Vietnam), Singapore, Beijing (China), Bangalore (India), and Jakarta (Indonesia). The locations were not chosen arbitrarily: Grab considers the availability of qualified local engineers who can help its business. Locations outside Grab's business area, such as Seattle, Beijing, and Bangalore, were chosen because they have many talented engineers thanks to the presence of top-tier technology companies.
Given all of these data processing results, Grab is investing even more in big data. It has also secured funding from SoftBank worth US$750 million, or about Rp 9.8 trillion.
Meanwhile, the cooperation with Microsoft will cover many areas, such as big data, artificial intelligence, and machine learning. One planned implementation is mobile facial recognition with built-in AI for drivers and passengers.
After the ETL process is completed, other analytics services such as Holistics, Tableau, and Spark access the data in the data warehouse. The drawback of this architecture is that the analyzed data is not real time, because it is the previous day's data, and the load on Redshift as the data warehouse becomes very high as the demand for data analytics grows.
At the end of 2016, the company decided to replace the existing architecture, because the old architecture could not provide real-time data analytics and the servers had become hard to operate under the large number of requests, so upgrading them would have been expensive. Grab decided to move its servers to Amazon's cloud service and to adopt a Data Lake using the Helios service from Amazon. Every existing MySQL database is connected to the PyroisOrchestrator service, which automatically performs ETL every hour and stores the results directly in the Data Lake. In the Data Lake, data is stored as Parquet and partitioned by time. According to Grab, partitioning must follow the company's requirements; time-based partitions are considered appropriate for Grab because the data it most needs to analyze is bounded by the time domain. Grab also built a Data Gateway integrated with Google authentication to restrict access to the Data Lake. With this additional security layer, Grab can limit who can access the data and which access and queries they can perform.
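To illustrate why time-based partitioning suits time-bounded queries, here is a minimal sketch that maps timestamps to hourly partition paths. The Hive-style `year=/month=/day=/hour=` layout, the `bookings` table name, and both function names are assumptions made for this example. The sources do not describe Grab's actual directory layout.

```python
from datetime import datetime, timedelta

def partition_path(table, ts):
    """Build an hour-level partition path for a timestamp (Hive-style layout, assumed)."""
    return (f"{table}/year={ts.year:04d}/month={ts.month:02d}/"
            f"day={ts.day:02d}/hour={ts.hour:02d}")

def partitions_between(table, start, end):
    """List the hourly partitions that a query over [start, end) would need to read."""
    current = start.replace(minute=0, second=0, microsecond=0)  # round down to the hour
    paths = []
    while current < end:
        paths.append(partition_path(table, current))
        current += timedelta(hours=1)
    return paths

# Hypothetical query window: late evening on 25 Oct 2017 to 01:00 the next day.
paths = partitions_between("bookings",
                           datetime(2017, 10, 25, 22, 30),
                           datetime(2017, 10, 26, 1, 0))
for path in paths:
    print(path)
```

Because each hourly ETL run writes into its own partition, a query limited to a time window only has to read the matching partitions instead of scanning the whole Data Lake.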


References
Firdaus, S. (2018, December 15). IMPLEMENTASI BIG DATA ANALYTICS PADA APLIKASI GRAB. Retrieved from satrianifirdaus.my.id: http://satrianifirdaus.my.id/2018/12/15/implementasi-big-data-analytics-pada-aplikasi-grab/
KumparanTECH. (2017, October 25). Cara Grab Pakai Big Data untuk Memahami Penumpang dan Pengemudi. Retrieved from Kumparan: https://kumparan.com/@kumparantech/cara-grab-pakai-big-data-untuk-pahami-penumpang-dan-pengemudi
Nabila, M. (2017, October 26). Mengintip Strategi Grab Optimalkan Big Data dalam Operasional. Retrieved from DailySocialId: https://dailysocial.id/post/mengintip-strategi-grab-optimalkan-big-data-dalam-operasional
R., J. I. (2017, October 25). Grab Sebut Big Data Jadi Strategi Penunjang Layanan. Retrieved from Liputan6: https://www.liputan6.com/tekno/read/3140751/grab-sebut-big-data-jadi-strategi-penunjang-layanan?utm_expid=.9Z4i5ypGQeGiS7w9arwTvQ.0&utm_referrer=http%3A%2F%2Fsatrianifirdaus.my.id%2F2018%2F12%2F15%2Fimplementasi-big-data-analytics-pada-aplikasi-grab%2F
Setiawan, R. (2018, December 17). Grab Punya Big Data 4.000 TB, Setara Video HD Durasi 53 Tahun. Retrieved from DetikInet: https://inet.detik.com/cyberlife/d-4347750/grab-punya-big-data-4000-tb-setara-video-hd-durasi-53-tahun
