COIT 20253 Business Intelligence using Big Data

Questions-

What Big Data is, and the difference between Online and Offline Big Data

How to select the right Big Data application for your business, project and desired outcomes

What are the technologies available in Big Data?

Business Impact of Big Data

Organizational Impact of Big Data.

 

Answer -

 

What is Big Data?

Big Data refers to collections of both structured and unstructured datasets at very large volume. The data is so large that traditional data management tools are unable to store and process it. Example: Facebook, a social media website, stores 500+ terabytes of new data every day, including the photos, videos, messages and comments that users upload.

Categories of Big Data

  1. Structured
  2. Unstructured
  3. Semi-structured

 

  1. Structured: data that can be stored in an SQL database is known as structured data. Information is stored in tables with rows and columns. It is easy to manage because it is stored, accessed and processed in a fixed format. Structured data represents only about 10 to 20% of data.
  2. Unstructured: data that cannot be stored in the form of a table. It accounts for about 80% of data. Unstructured data includes images, videos, audio, presentations and much other information, and it is difficult to manage. It is either machine generated or human generated. Example: the output of a Google search.
  3. Semi-structured: data that contains elements of both structured and unstructured data. Example: part of the information held as RDBMS tables and the rest as an XML file.
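
The three categories can be illustrated with a small Python sketch. This is only an illustration under stated assumptions: the sample records, the field names and the helper field_names are hypothetical and do not come from any real dataset.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (hypothetical sample).
structured_csv = "user_id,name,country\n1,Asha,IN\n2,Ben,AU\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: records describe themselves with keys, but the keys can
# differ from one record to the next (hypothetical sample).
semi_structured = json.loads(
    '[{"user_id": 1, "tags": ["news"]},'
    ' {"user_id": 2, "device": {"os": "android"}}]'
)

# Unstructured: free text with no schema at all (hypothetical sample).
unstructured = "Loved the new phone, battery lasts all day!"


def field_names(records):
    """Return the set of field names seen across all records."""
    names = set()
    for record in records:
        names.update(record.keys())
    return names


structured_fields = field_names(rows)
semi_fields = field_names(semi_structured)

# Every structured row carries exactly the same fields ...
same_shape_structured = all(set(r.keys()) == structured_fields for r in rows)
# ... while semi-structured records need not.
same_shape_semi = all(set(r.keys()) == semi_fields for r in semi_structured)
```

Here same_shape_structured is True and same_shape_semi is False, which is the practical difference between a fixed format and a flexible one. The unstructured string has no fields to inspect at all.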

Characteristics of Big Data

  1. Volume
  2. Variety
  3. Velocity
  4. Variability
  5. Veracity

 

  1. Volume: the quantity of data generated every second, which can be measured in zettabytes. One zettabyte is equal to 10^21 bytes.
  2. Variety: the nature and format of the data. Data can be structured, unstructured or semi-structured, and can come as email, video, audio and other types.
  3. Velocity: the speed at which data is generated and processed. Example: credit card transactions and social media messages are produced within milliseconds.
  4. Variability: the inconsistency that the data can show at times, which hampers the process of handling and managing it effectively.
  5. Veracity: the messiness, and therefore the trustworthiness, of the data. Example: Twitter posts with hashtags.
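
To give a feel for the Volume figures, the arithmetic can be sketched in Python. It assumes decimal (SI) units and reuses the 500 terabytes per day figure from the Facebook example earlier; the variable names are illustrative only.

```python
# Decimal (SI) byte units.
TERABYTE = 10**12
ZETTABYTE = 10**21

# "500+ terabytes" of new data per day, taken from the Facebook example.
daily_ingest_bytes = 500 * TERABYTE

# Days needed to accumulate one zettabyte at that constant rate.
days_to_one_zettabyte = ZETTABYTE // daily_ingest_bytes

# Rough conversion to years, ignoring leap years.
years_to_one_zettabyte = days_to_one_zettabyte / 365

print(days_to_one_zettabyte)  # 2000000
```

At a constant 500 terabytes per day it would take two million days, roughly 5,479 years, to reach a single zettabyte, which shows how large that unit is.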

Strategy to select the right Big Data application

Follow these steps to select the right Big Data application:

  1. Identify business objectives: first, review your company's goals, objectives and business requirements.
  2. Review the Big Data infrastructure: find the infrastructure best suited to your business, considering data sizes, properties, location, etc.
  3. Define the data enrichment and transformation approach: decide how data will be enriched and transformed to support the business objectives.
  4. Implement the recommended strategy.

Difference between Online and Offline Big Data

Online Big Data                                         | Offline Big Data
Created, transformed, managed and analyzed in real time | Processed by long-running jobs
Low latency                                             | Runs for hours or more
Highly available                                        | Less available
Example: MongoDB and other NoSQL databases              | Example: Spark or Hadoop based workloads
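
The contrast between the two access patterns can be sketched in plain Python. This is a minimal illustration, not how MongoDB, Spark or Hadoop work internally; the event data, the class OnlineCounter and the function offline_batch_count are hypothetical names introduced for this example.

```python
from collections import Counter

# Hypothetical stream of (user, action) events.
events = [
    ("alice", "click"),
    ("bob", "view"),
    ("alice", "purchase"),
    ("carol", "click"),
    ("alice", "view"),
]


class OnlineCounter:
    """Online style: update state per event so an answer is always current."""

    def __init__(self):
        self.counts = Counter()

    def handle(self, user, action):
        # Small, constant amount of work per event keeps latency low.
        self.counts[user] += 1
        return self.counts[user]


def offline_batch_count(stored_events):
    """Offline style: scan the whole stored dataset in one long-running pass."""
    return Counter(user for user, _action in stored_events)


# Online: results are available immediately after each event arrives.
online = OnlineCounter()
running_totals = [online.handle(user, action) for user, action in events]

# Offline: nothing is available until the full batch job has finished.
batch_totals = offline_batch_count(events)
```

Both approaches end with the same totals. The difference is when the answer becomes available: after every event in the online case, and only after the whole pass in the offline case.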

 

 

 

 
