7082 CEM Big Data Management and Data Visualisation
Task:
1. Select a dataset of your choice from one of the open dataset repositories (Kaggle/UCI/others).
Students are advised to email the module leader to say which dataset they have chosen to work on,
and to get approval.
2. Use PySpark (or another Big Data program from the Hadoop Ecosystem) to analyze the dataset. You
should perform one or a combination of data analysis tasks (regression, clustering, classification, etc.).
You should explain why you chose the technique(s) used.
3. Use visualization to show the results of your analysis. You can use either Tableau or another program
of your choice.
4. Critically analyze your findings: the results and the methods used.
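Step 2 allows clustering, and the marking scheme gives credit for programming the method yourself. As a minimal, hedged sketch of what hand-coding one technique could look like (not a required or recommended approach), the plain-Python k-means below implements Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing moves. The function name `kmeans`, the fixed seed of 42, the 100-iteration cap and the toy data are illustrative assumptions. In a PySpark submission the library route would instead be `pyspark.ml.clustering.KMeans` applied to a feature-vector column.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iterations=100, seed=42):
    """Cluster a list of numeric tuples into k groups with Lloyd's k-means.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)  # fixed seed so reruns give the same result
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data: two well-separated groups of 2-D points.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centres, assignment = kmeans(data, k=2)
print(centres, assignment)
```

Fixing the seed keeps the run reproducible, which matters because k-means results depend on the starting centroids; a real analysis would also compare several values of k and several seeds before interpreting the clusters.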
Clarifications:
You can install your program on any operating system you prefer.
This document is for Coventry University students for their own use in completing their
assessed work for this module and should not be passed to third parties or posted on any
website. Any infringements of this rule should be reported to
facultyregistry.eec@coventry.ac.uk.
Coding the analysis you are performing yourself is a plus.
Given the nature of this module and the task, you should document everything you do.
Everything you do should be reproducible. The link to the dataset should be clear (a direct link
to the dataset itself, not to the site where it is hosted). If you use code from an external source, the
link to it should be clear and direct. If the code is not too long, it is better to include it in the report
(in the appendix, or as snippets in the report) or to submit it as a separate file with your submission. If
you modify external code, the modification should be very clearly indicated (meaning you should show
the original part that you modified, and the modification you made).
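The brief asks for a direct link to the dataset. One optional way to make that link verifiable (an assumption, not something the brief requires) is to report the SHA-256 checksum of the exact file you downloaded, so a marker can confirm they are analysing the same data even if the hosted file later changes. A minimal sketch using Python's standard `hashlib` module; `sha256_of_file` is a hypothetical helper name:

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`.

    The file is read in chunks (1 MiB by default) so large datasets
    do not have to fit in memory at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_of_file("dataset.csv")` (hypothetical file name) returns a 64-character hex string that can be quoted in the report next to the dataset link; `sha256sum` on Linux or `shasum -a 256` on macOS produces the same value from the command line.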
Report Structure:
Your report should typically have:
o A title.
o An introduction in which you briefly describe your project.
o An implementation part, in which you should introduce the program you are using (PySpark or
another; the description should be more detailed if you use another program from the Hadoop
Ecosystem), how it is installed, how it is configured, how it works, the dataset you are applying
your program to, and the data analysis task you are performing.
o A discussion of your findings.
o A conclusion.
o References.
Mark distribution:
Technical quality (45 Marks): This aspect concerns the depth of the information presented in the report.
Difficulty (15 Marks): This aspect concerns the difficulty of the program used or the analysis
applied, the complexity of the dataset, applying several data analysis
tasks, or the student programming the method themselves.
Visualization (20 Marks): This aspect concerns the quality of the visualizations produced.
Reproducibility (10 Marks): This aspect concerns using screenshots, providing the code used, and clearly
explaining the steps taken.
Style and format (10 Marks)
Notes:
1. You are expected to use the Coventry University APA style for referencing. For support and
advice on this, students can contact the Centre for Academic Writing (CAW).
2. Please notify your registry course support team and module leader for disability support.
3. Any student requiring an extension or deferral should follow the university process as outlined
here.
4. The University cannot take responsibility for any coursework lost or corrupted on disks, laptops
or personal computers. Students should therefore regularly back up their work and are advised to
save it on the University system.
5. If there are technical or performance issues that prevent students from submitting coursework
through the online coursework submission system on the day of a coursework deadline, an
appropriate extension to the coursework submission deadline will be agreed. This extension will
normally be 24 hours or the next working day if the deadline falls on a Friday or over the
weekend period. This will be communicated via your Module Leader.
6. You are encouraged to check the originality of your work by using the draft Turnitin links on Aula.
7. Collusion between students (where sections of your work are similar to the work submitted by
other students in this or previous module cohorts) is taken extremely seriously and will be
reported to the academic conduct panel. This applies to both coursework and exam answers.
8. A marked difference between the writing style, knowledge and skill level you demonstrate in class
discussion or under test conditions and those demonstrated in a coursework assignment may result in
you having to undertake a viva voce to prove that the coursework assignment is entirely your
own work.
9. If you make use of the services of a proofreader in your work, you must keep your original version
and make it available as a demonstration of your written efforts.
10. You must not submit work for assessment that you have already submitted (partially or in full),
either for your current course or for another qualification of this university, with the exception of
resits, where for the coursework you may be asked to rework and improve a previous attempt.
This requirement will be specifically detailed in your assignment brief or specific course or module
information. Where earlier work by you is citable, i.e. it has already been published/submitted,
you must reference it clearly. Identical pieces of work submitted concurrently may also be
considered to be self-plagiarism.
Mark allocation