Technologies used in Big Data
- Cloud computing: Cloud computing is the delivery of computing services over the internet. Users access applications as utilities over the network. Examples: Yahoo, Gmail, Hotmail. Amazon was among the first providers of a public cloud, launching Amazon Web Services in 2006.
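- Hadoop: Apache Hadoop is an open-source framework for the distributed storage and processing of Big Data. The name is sometimes expanded as "High Availability Distributed Object Oriented Platform", but it is not an official acronym; the project was named after a toy elephant that belonged to the son of its co-creator, Doug Cutting.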
Two modules of Hadoop
- MapReduce: MapReduce is a parallel programming model. It processes massive datasets on large clusters of commodity hardware by splitting the work into a map step and a reduce step. The data can be structured, semi-structured, or unstructured.
- HDFS: HDFS stands for Hadoop Distributed File System. It stores large datasets across the nodes of a cluster so that they can be processed in parallel.
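
The map and reduce steps mentioned above can be sketched in a few lines of plain Python. This is only a single-machine illustration of the programming model, not Hadoop's actual API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and a real cluster would run the map and reduce calls on many nodes in parallel.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line is mapped independently, which is what makes the
    # map step easy to distribute across machines.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big clusters", "data on clusters"])
print(counts)
```

Because each input line is mapped on its own and each key is reduced on its own, both steps can in principle be spread over as many machines as there are lines or keys.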
Tools of Hadoop
- Sqoop: Sqoop transfers data between Hadoop and relational databases. It imports data from an RDBMS into HDFS or Hive and exports it back.
- Pig: Pig performs data manipulation operations on Hadoop. Its scripts are written in Pig Latin, a high-level data flow language.
- Hive: Hive runs on top of Hadoop and processes data stored in HDFS using SQL-like queries. It is used mainly for data analysis tasks such as data mining.
- Predictive analytics: As the name suggests, predictive analytics uses existing data to predict what is likely to happen in the future. It helps businesses anticipate uncertainty so that they can improve performance and reduce risk.
- NoSQL: NoSQL means "non-relational" or "non-SQL" database. It is used mainly to handle unstructured data and can store massive amounts of data without a fixed schema. Many NoSQL databases are open source. Non-relational databases have existed since the 1960s, but they became popular in the 21st century after large websites such as Facebook and Google adopted them.
- Search and knowledge discovery: These tools search across data sources, collect the relevant information, extract knowledge from it, and deliver the results in context. The information sources can include APIs, streams, etc.
- Data virtualization: Data virtualization lets applications retrieve and manipulate data without needing to know where or how it is stored. The data is delivered in real time or near real time.
- Data integration: for example, Amazon Elastic MapReduce (Amazon EMR).
- Big Data in Excel: Microsoft HDInsight connects to Big Data stores. Data stored in Hadoop can be accessed from Excel 2013.
- Presto: Presto is a distributed SQL query engine developed by Facebook. It can retrieve data quickly with interactive queries.
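
To make the NoSQL entry in the list above more concrete, here is a rough sketch of storing records with no particular schema. The `DocumentStore` class is a hypothetical in-memory stand-in written for this illustration; it is not the client API of any real NoSQL database, and the sample records are made up.

```python
import copy


class DocumentStore:
    """Hypothetical in-memory collection of schema-free documents."""

    def __init__(self):
        self._docs = {}
        self._next_id = 1

    def insert(self, doc):
        """Store a copy of the document and return its generated id."""
        doc_id = self._next_id
        self._docs[doc_id] = copy.deepcopy(doc)
        self._next_id += 1
        return doc_id

    def find(self, **criteria):
        """Return every document whose fields match all given criteria."""
        return [
            doc for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


store = DocumentStore()
# The documents do not share a fixed set of fields.
store.insert({"user": "alice", "likes": 10})
store.insert({"user": "bob", "tags": ["photo", "video"], "city": "Pune"})
store.insert({"user": "carol", "likes": 10, "city": "Pune"})

in_pune = store.find(city="Pune")
print([doc["user"] for doc in in_pune])
```

A relational table would require every row to declare the same columns up front; here a document that lacks a field is simply skipped by a query on that field.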