Unit Details
Unit Code: 301046
Unit Name: Big Data (PG)
Credit Points: 10
Unit Level: 7
Assumed Knowledge: Students enrolled in this unit are expected to have basic programming skills in
any programming language and a working knowledge of elementary probability and statistics,
including the concepts of random variables, basic probability distributions, expectations, mean
and variance.
Note: Students with any problems, concerns or doubts should discuss those with the Unit Coordinator as early as they can.
Unit Coordinator
Name: Miroslav Filipovic
Phone: 4736 0836
Location: Penrith Y2 08
Email: m.filipovic@westernsydney.edu.au
Consultation Arrangement:
by appointment only
Teaching Team
Name: Ivan Bojicic
Location: Penrith Y2 08
Email: i.bojicic@westernsydney.edu.au
Consultation Arrangement:
by appointment only
Note: The relevant Learning Guide Companion supplements this document
Contents
1 About Big Data (PG)
1.1 An Introduction to this Unit
1.2 What is Expected of You
1.3 Changes to Unit as a Result of Past Student Feedback
2 Assessment Information
2.1 Unit Learning Outcomes
2.2 Approach to Learning
2.3 Contribution to Course Learning Outcomes
2.4 Assessment Summary
2.5 Assessment Details
2.5.1 Labs
2.5.2 Project
2.5.3 In-Class Exam
2.6 General Submission Requirements
3 Teaching and Learning Activities
4 Learning Resources
4.1 Recommended Readings
1 About Big Data (PG)
1.1 An Introduction to this Unit
“Big data” is the label for the ever-increasing gigantic amount of data with which humanity has to cope. The
availability of data and the development of cloud computing architectures to process and analyse these data have
made data analytics a central tool in our endeavours. This unit will introduce students to the realm of “big data”,
covering the important principles and technologies of retrieving, processing and managing massive real-world data
sets. It is designed to provide the basic techniques required by any discipline that needs to make sense of the
growing amount of data, and to equip students with the knowledge and key skills needed to be competitive in the
growing job market in the analytics field.
1.2 What is Expected of You
Study Load
A student is expected to study one hour per credit point per week. For example, a 10 credit point unit would require 10
hours of study per week. This includes the time spent in classes during lectures, tutorials or practicals.
Attendance
It is strongly recommended that students attend all scheduled learning activities to support their learning.
Online Learning Requirements
Unit materials will be made available on the unit’s vUWS (E-Learning) site (https://vuws.westernsydney.edu.au/).
You are expected to consult vUWS at least twice a week, as all unit announcements will be made via vUWS. Teaching
and learning materials will be regularly updated and posted online by the teaching team.
No E-Learning resources required for this Unit.
Special Requirements
Essential Equipment:
Not Applicable
Legislative Pre-Requisites:
Not Applicable
1.3 Changes to Unit as a Result of Past Student Feedback
The University values student feedback in order to improve the quality of its educational programs. The feedback
provided helps us improve teaching methods and units of study. The survey results inform unit content and design,
learning guides, teaching methods, assessment processes and teaching materials.
You are welcome to provide feedback that is related to the teaching of this unit. At the end of the semester
you will be given the opportunity to complete a Student Feedback on Unit (SFU) questionnaire to assess the unit.
You may also have the opportunity to complete a Student Feedback on Teaching (SFT) questionnaire to provide
feedback for individual teaching staff.
As a result of student feedback, the following changes and improvements to this unit have recently been made:
– We are upgrading our practicals and lectures so that their visuals and content are more appealing and relevant to
students.
2 Assessment Information
2.1 Unit Learning Outcomes
Outcome
1 Explain the major trends in technology, business, and science behind big data
2 Analyse and compare a selection of major big data management techniques in use today, including parallel
databases, NoSQL, MapReduce, cloud services
3 Evaluate the relative strengths and weaknesses of MapReduce and parallel database systems and apply the
appropriate technique to tackle relevant big data problems
4 Apply proper methods of data pre-processing and cleaning for big data analysis
2.2 Approach to Learning
Lectures are designed to introduce students to this emerging technology and to keep them up to date with the major trends behind
big data. Students will also be taught the important concepts and
principles of big data processing and analytics, including databases, parallel computing and data wrangling.
Practicals will help students to acquire and develop the practical skills required for big data processing and analytics. In particular, students will gain hands-on experience in Python programming, one of the most
popular and powerful languages for data analysis, and in database management skills such as SQL and NoSQL query
methods, as well as in the use of the MapReduce framework to solve practical big data problems.
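To give a flavour of the MapReduce pattern mentioned above, the following is a minimal single-machine sketch of the classic word-count example in Python. It is illustrative only and is not taken from the unit's lab material: the function names (map_words, shuffle, reduce_counts, word_count) and the sample documents are assumptions, and a real framework such as Hadoop would distribute the map and reduce steps across many machines.

```python
from collections import defaultdict


def map_words(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: combine the values for one key into a single total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_words(doc)]
    groups = shuffle(pairs)
    return dict(reduce_counts(key, values) for key, values in groups.items())


if __name__ == "__main__":
    # Hypothetical sample input, chosen only for illustration.
    docs = ["big data is big", "data about data"]
    print(word_count(docs))
```

Keeping the three steps as separate functions mirrors how a MapReduce job is usually described: only the map and reduce functions are problem-specific, while the grouping in between is generic.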
2.3 Contribution to Course Learning Outcomes
3701: Graduate Certificate in Information and Communications Technology
Course Learning Outcomes ULO 1 ULO 2 ULO 3 ULO 4
1. Demonstrate knowledge of core concepts related to ICT, including established theories and recent
developments, with an understanding of both local and international perspectives.
Introduced Developed Introduced Introduced
2. Identify, analyse and communicate problems related to ICT, and respond to stakeholder needs and
goals, within the framework of professional and ethical practice.
Developed Introduced Developed Assured
3700: Graduate Diploma in Information and Communications Technology
Course Learning Outcomes ULO 1 ULO 2 ULO 3 ULO 4
1. Develop an advanced understanding of core concepts related to the ICT body of knowledge, including
established theories and recent developments, with an understanding of both local and international
perspectives.
Introduced Introduced
2. Identify, analyse and communicate problems and issues related to ICT and articulate appropriate
solutions in order to respond to stakeholder needs and goals, within the framework of professional and
ethical practice.
Introduced Developed Introduced
3. Demonstrate a high level of personal autonomy and accountability in acquisition and application of
knowledge and skills.
Developed Developed Developed
3699: Master of Information and Communications Technology
Course Learning Outcomes ULO 1 ULO 2 ULO 3 ULO 4
1. Demonstrate an advanced understanding of core and specialised concepts related to the ICT body of
knowledge, including established theories and recent developments, with an understanding of both
local and international perspectives.
Introduced Introduced
2. Identify and analyse problems and issues related to ICT and articulate appropriate solutions and justify
propositions in order to respond to stakeholder needs and goals, within the framework of professional and
ethical practice.
Introduced Developed Introduced
3. Demonstrate a high level of personal autonomy and accountability, in acquisition and application of
knowledge and skills and in problem solving in professional context.
Developed Developed Developed
4. Apply enquiry-based learning, investigate and synthesise complex ideas and concepts, and develop ways
of learning by exploring new knowledge within ICT discipline.
Assured Assured
5. Develop skills in scholarly research and communicate complex ideas in a variety of formats to diverse
audiences.
Assured Assured
3698: Master of Information and Communications Technology (Advanced)
Course Learning Outcomes ULO 1 ULO 2 ULO 3 ULO 4
1. Demonstrate an in-depth understanding of core concepts related to the ICT body of knowledge, including
established theories and recent developments, with an understanding of both local and international
perspectives.
Introduced Introduced
2. Further develop knowledge and skills in specialised areas that are closely applicable to ICT profession.
Introduced Developed Introduced
3. Identify and analyse problems and issues related to ICT and articulate appropriate solutions and justify
propositions in order to respond to stakeholder needs and goals, within the framework of professional and
ethical practice.
Developed Developed Developed
4. Demonstrate a high level of personal autonomy and accountability in acquisition and application of
knowledge and skills and in problem solving in professional context.
Assured Assured
5. Apply enquiry-based learning, investigate and synthesise complex ideas and concepts, and develop ways
of learning in exploring new knowledge within ICT discipline.
Assured Assured
6. Develop skills in scholarly research, critically evaluate contemporary literatures in ICT field and
communicate complex ideas in a variety of formats to diverse audiences.
Introduced Developed Introduced
3702: Master of Information and Communications Technology (Research)
Course Learning Outcomes ULO 1 ULO 2 ULO 3 ULO 4
1. Demonstrate an in-depth understanding of core concepts related to the ICT body of knowledge,
including established theories, professional ethics and recent developments with an understanding of both
local and international perspectives.
Introduced Introduced
2. Develop advanced knowledge for identifying and analysing research problems and acquisition and
application of research methods and techniques related to ICT.
Introduced Developed Introduced
3. Demonstrate cognitive, creative and technical skills to generate and evaluate complex concepts at an
abstract level for problem solving in a research context.
Developed Developed Introduced Developed
4. Analyse, investigate and synthesise complex ideas and concepts, and develop ways of learning in
exploring new knowledge within the ICT discipline.
Assured Assured Developed
5. Evaluate contemporary literature, and create a high-level plan for conducting original research in the
ICT field and communicate complex ideas and research results in a variety of formats to diverse audiences.
Assured Developed Assured
6. Design, execute and evaluate a substantial research-based project in the ICT field with a high level of
personal autonomy and accountability.
Introduced Developed Introduced
3735: Master of Data Science
Course Learning Outcomes ULO 1 ULO 2 ULO 3 ULO 4
1. Apply Data Science methods to problems in various disciplines (e.g. Business, Science, Social Science,
Engineering, Education and the Humanities).
Developed Developed Assured Developed
2. Conduct and manage the formulation of problems and the use of data ethically and responsibly.
Developed Introduced Introduced Developed
3. Design and conduct data gathering and analysis to provide information and advice that is reliable, valid,
timely and relevant.
Developed Developed Assured Developed
4. Generate interpretive and predictive reports, working alongside professional colleagues in
decision-making.
Developed Introduced Assured Developed
5. Provide expert advice to professional colleagues on the validity and reliability of interpretations and
predictions based on analysis of large complex data sets.
Developed Developed Assured Developed
2.4 Assessment Summary
The assessment items in this unit are designed to enable you to demonstrate that you have achieved the unit learning
outcomes. Completion and submission of all assessment items which have been designated as mandatory or compulsory is essential to receive a passing grade.
To pass this unit you must:
1. Achieve an overall mark of at least 50%.
2. Achieve at least 50% in the In-Class Exam, which is a threshold assessment item.
Item Weight Due Date ULOs Assessed Threshold
Labs 30% at the end of each lab 1, 2, 3, 4 No
Project 35% Friday 7th June 2019 (17:00) – week 14 2, 3, 4 No
In-Class Exam 35% Friday 31st May 2019 – week 13 1, 2, 3 Yes
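The two pass conditions and the weights in the table above can be combined into a short calculation. The following Python sketch is illustrative only: it assumes each item's result is expressed as a percentage of that item (0 to 100), and the names WEIGHTS, overall_mark and passes_unit, as well as the sample student, are hypothetical.

```python
# Weights taken from the assessment summary table (they sum to 100%).
WEIGHTS = {"labs": 30, "project": 35, "exam": 35}
PASS_MARK = 50  # percent, used for both the overall mark and the exam threshold


def overall_mark(percentages):
    """Weighted overall mark (0-100) from per-item percentages (0-100)."""
    return sum(WEIGHTS[item] * percentages[item] for item in WEIGHTS) / 100


def passes_unit(percentages):
    """Return True only if both pass conditions are met."""
    meets_overall = overall_mark(percentages) >= PASS_MARK
    meets_threshold = percentages["exam"] >= PASS_MARK
    return meets_overall and meets_threshold


if __name__ == "__main__":
    # Hypothetical student: strong labs and project, but below 50% in the exam.
    student = {"labs": 90, "project": 80, "exam": 45}
    print(overall_mark(student))  # weighted overall mark
    print(passes_unit(student))   # threshold rule applied
```

In this example the overall mark is well above 50%, yet the unit is not passed because the In-Class Exam threshold is not met.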
Feedback on Assessment
Feedback is an important part of the learning process that can improve your progress towards achieving the learning
outcomes. Feedback is any written or spoken response made in relation to academic work such as an assessment
task, a performance or product. It can be given to you by a teacher, an external assessor or student peer, and may
be given individually or to a group of students. As a Western Sydney University student, it is your responsibility to
seek out and act on feedback that is provided to you as a resource to further your learning.
In this unit you can expect oral feedback on your lab work during each practical session and on your presentation for assignment 2 during the final week’s practical session, when student presentations are given.
Written feedback on your assignment 2 report submitted to vUWS will be provided upon completion of the marking process.
2.5 Assessment Details
2.5.1 Labs
Weight: 30%
Type of Collaboration: Individual
Due: at the end of each lab
Submission: in class
Format: 10 Labs (2 Hours Each)
Length: 2 hours (10 labs)
Curriculum Mode: Practical
In each practical session, you are expected to complete a few exercises. These may include programming exercises
and/or short answer questions. Your work will be marked by the tutor at the end of each lab session. Marks are
assigned based on attendance as well as the quality of the answers provided for the marked exercises conducted in
each session. The total mark achievable for this assessment item is 30%, with 3% for each lab.
Resources:
Check the lecture notes for the corresponding week and references therein for further details
Marking Criteria:
Criteria: Marks are assigned based on attendance as well as the quality of the answers provided for the planned exercises conducted in each session.
– High Distinction (85%+): Demonstrates complete understanding of the problem. All requirements of the question are included in response.
– Distinction (75% – 85%): Demonstrates considerable understanding of the problem. Most requirements of the question are included in response.
– Credit (65% – 75%): Demonstrates partial understanding of the problem. Many requirements of the question are included in response.
– Pass (50% – 65%): Demonstrates little understanding of the problem. Many requirements of the question are missing from response.
– Unsatisfactory (<50%): Demonstrates no understanding of the problem, or no response/question not attempted.
2.5.2 Project
Weight: 35%
Type of Collaboration: Individual
Due: Friday 7th June 2019 (17:00) – week 14
Submission: Online (report/programme), In class (Presentation)
Format: A report with a maximum of 2000 words + Final presentation 10 minutes including
question time. A report should be submitted in PDF format.
Length: A report with a maximum of 2000 words + final presentation 10 minutes including
question time
Curriculum Mode: Report
This is a take-home assignment that includes two parts. For the first part, you have two options. For option 1,
you can write a technical report to discuss one of the major techniques in big data, including but not limited to
parallel database, NoSQL, MapReduce, data wrangling and cleaning, predictive analysis, etc. For option 2, you need
to write a program to perform data acquisition and manipulation for a simple data analysis task. You need to submit
the report or the source code of your programme through the submission link on the vUWS site of this unit by the
submission deadline (5pm Friday Week 14). The first part of the project is worth 20%.
Whichever option you take, you will have to present your work in the lab class in week 14 as the second part of
this project. You will be allocated 10 minutes, including question time, to go through the content of your report in
PowerPoint slides or show your programme to the audience (tutor and rest of the class), followed by questions from
the audience. The mark for your presentation will be determined by peer assessment, i.e. by averaging the marks that the
tutor and other students give you. The second part of the project is worth 15%.
This is an individual project and you must work by yourself. However, under special circumstances and subject to the
approval of the unit coordinator, you can form groups of 2 students and work in pairs. In this case, you will have to
clearly label the individual contributions in the submitted report/programme. Both of you have to participate in
the presentation. You will still be marked based on individual performance.
Resources:
– Lecture notes and links provided therein
– Recommended readings
Marking Criteria:
Criteria (with weight) and the standard expected at each grade:
Objective (10%)
– High Distinction / Distinction: The report has clear aim and is well focused on a single important technique in big data
– Credit / Pass: Mainly focusing on a single area of big data, with some deviations and irrelevant discussions
– Unsatisfactory: Not focusing on a single area of big data
Originality (20%)
– High Distinction / Distinction: The report contains some original thoughts and analysis besides discussions on facts and observations
– Credit / Pass: There are some original points made by the author although not to a sufficient extent
– Unsatisfactory: Basically repetition of what is discussed in the literature and no insight/input made on the author’s part
Presentation (20%)
– High Distinction / Distinction: The report reads very well, easy to understand and free from grammatical mistakes, spelling errors and use of colloquial English
– Credit / Pass: The report reads well in general, despite some grammatical mistakes and lack of clarity on some minor issues
– Unsatisfactory: The report is poorly written and can hardly be understood
Technical Quality (30%)
– High Distinction / Distinction: The report is up to a high calibre of technicality and reflects high level of critical thinking from the author
– Credit / Pass: The report has good quality in general, despite some errors and inaccuracies in the discussion
– Unsatisfactory: The report contains many factual errors and contains virtually no technical value
Organisation (10%)
– High Distinction / Distinction: The report is well structured into sections. Each section covers one point and is further divided into paragraphs.
– Credit / Pass: The report is structured into paragraphs and sections, although not all of them make sense.
– Unsatisfactory: The report is poorly organised.
References (10%)
– High Distinction / Distinction: Contains sufficient references in bibliographic section and references are cited in the main text
– Credit / Pass: References included but are insufficient; or references not linked in the main text
– Unsatisfactory: Lacking references
2.5.3 In-Class Exam
Weight: 35%
Type of Collaboration: Individual
Due: Friday 31st May 2019 – week 13
Submission: in class
Format: 2 Hours Open Book. The exam contains both multiple choice questions and short
answer questions.
Length: 2 hours open book exam including reading time
Curriculum Mode: Quiz
Threshold Detail: A threshold for In-class Exam of minimum 50% – in order to pass the whole unit.
This is a paper-based open book exam to be held during the lab class in week 13. You are allowed to bring any
reference books, notes and dictionaries. You can also use non-programmable calculators. You are not allowed to use
computers, including desktops, laptops, iPads or smartphones. The exam is 2 hours long including reading time.
The exam contains both multiple choice questions and short answer questions. You need to attempt all questions
and provide your answers in clear writing. The exam covers all teaching materials from week 1 to week 13. Sample
questions will be provided online on the vUWS site of this unit closer to the date of the exam.
Resources:
– Lecture notes and links provided therein
– Recommended Readings
Marking Criteria:
Criteria: Correct Answer
– High Distinction (85%+): Demonstrates complete understanding of the problem. All requirements of the question are included in response.
– Distinction (75% – 85%): Demonstrates considerable understanding of the problem. Most requirements of the question are included in response.
– Credit (65% – 75%): Demonstrates partial understanding of the problem. Many requirements of the question are included in response.
– Pass (50% – 65%): Demonstrates little understanding of the problem. Many requirements of the question are missing from response.
– Unsatisfactory (<50%): Demonstrates no understanding of the problem, or no response/question not attempted.
2.6 General Submission Requirements
Submission
– All assignments must be submitted by the specified due date and time.
– Complete your assignment and follow the individual assessment item instructions on how to submit. You must
keep a copy of all assignments submitted for marking.
Turnitin
– The Turnitin plagiarism prevention system may be used within this unit. Turnitin is accessed via logging into
vUWS for the unit. If Turnitin is being used with this unit, this means that your assignments have to be
submitted through the Turnitin system. Turnitin from iParadigms is a web-based text-matching software that
identifies and reports on similarities between documents. It is also widely utilised as a tool to improve academic
writing skills. Turnitin compares electronically submitted papers against the following:
– Current and archived web: Turnitin currently contains over 24 billion web pages including archived pages
– Student papers: including Western Sydney University student submissions since 2007
– Scholarly literature: Turnitin has partnered with leading content publishers, including library databases,
text-book publishers, digital reference collections and subscription-based publications (e.g. Gale, Proquest, Emerald and Sage)
– Turnitin is used by over 30 universities in Australia and is increasingly seen as an industry standard. It is
an important tool to assist students with their academic writing by promoting awareness of plagiarism. By
submitting your assignment to Turnitin you will be certifying that:
– I hold a copy of this assignment if the original is lost or damaged
– No part of this assignment has been copied from any other student’s work or from any other source except
where due acknowledgement is made in the assignment
– No part of the assignment has been written for me by any other person/s
– I have complied with the specified word length for this assignment
– I am aware that this work may be reproduced and submitted to plagiarism detection software programs for
the purpose of detecting possible plagiarism (which may retain a copy on its database for future plagiarism
checking).
Self-Plagiarising
– You are to ensure that no part of any assignment or product submitted for this unit has been submitted by
yourself in another (previous or current) assessment from any unit, except where appropriately referenced and
with prior permission from the Lecturer/Tutor/Unit Coordinator of this unit.
Late Submission
– If you submit a late assessment without receiving approval for an extension of time (see next item), you will
be penalised by 10% per day for up to 10 days. In other words, marks equal to 10% of the assignment’s weight
will be deducted from the mark awarded for each day late.
– For example, if the highest mark possible is 50, 5 marks will be deducted from your awarded mark for each late
day.
– Saturday and Sunday are counted as one calendar day each.
– Assessments will not be accepted after the marked assessment task has been returned to students.
– This is consistent with Clause 51 of the Western Sydney University’s Assessment Policy – Criteria and Standards-Based Assessment.
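The late penalty arithmetic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than an official calculator: the function name late_penalised_mark is hypothetical, the penalised mark is assumed never to fall below zero, and late days beyond the tenth are simply capped at ten because the guide does not say how they are treated.

```python
MAX_LATE_DAYS = 10            # the penalty applies "for up to 10 days"
PENALTY_PERCENT_PER_DAY = 10  # 10% of the highest possible mark per late day


def late_penalised_mark(awarded, max_mark, days_late):
    """Return the mark after the late penalty has been deducted.

    Saturdays and Sundays count as ordinary calendar days, so days_late
    is the number of calendar days after the due date.
    """
    days = min(max(days_late, 0), MAX_LATE_DAYS)
    deduction = max_mark * PENALTY_PERCENT_PER_DAY * days / 100
    # Assumption: the result is floored at zero rather than going negative.
    return max(awarded - deduction, 0.0)


if __name__ == "__main__":
    # Example from the guide: highest possible mark 50, so 5 marks per late day.
    print(late_penalised_mark(awarded=40, max_mark=50, days_late=2))
```

With a highest possible mark of 50, two late days remove 10 marks, which matches the 5 marks per day given in the example above.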
Extension of Due Date for Submission
Extensions are only granted in exceptional circumstances. To apply for an extension of time, locate an application
form via the Western Sydney University homepage or copy the following link:
https://www.westernsydney.edu.au/currentstudents/current_students/forms
Application forms must be submitted to the Unit Coordinator/Convenor. Requests for extension should be made as
early as possible and submitted within policy deadlines. Appropriate, supporting documentation must be submitted
with the application. An application for an extension does not automatically mean that an extension will be granted.
Assessments will not be accepted after the marked assessment task has been returned to students.
Resubmission
Resubmission of assessment items will not normally be granted if requested.
Application for Special Consideration
It is strongly recommended that you attend all scheduled learning activities to support your learning. If you have
suffered misadventure, illness, or you have experienced exceptional circumstances that have prevented your attendance
at class or your completion and submission of assessment tasks, you may need to apply for Special Consideration via the
Western Sydney University website (https://www.westernsydney.edu.au/currentstudents/current_students/services_and_facilities/special_consideration2)
or the Student Centre/Sydney City Campus Reception. Special Consideration is
not automatically granted. It is your responsibility to ensure that any missed content has been covered. Your lecturer
will give you more information on how this must be done.
3 Teaching and Learning Activities
Week 1 (04-03-2019)
Topic: Introduction to Big Data
Lecture: Unit introduction and syllabus; overview of big data and useful applications; tools in big data
Prac/Lab: Get started with Python programming and go through the online tutorials by Codecademy
Independent: Useful online tutorials
https://www.codecademy.com/en/tracks/python
https://www.learnpython.org/
Assessments Due: Labs

Week 2 (11-03-2019)
Topic: Python Basics I
Lecture: Python data types; operations; selection statements
Prac/Lab: Complete Python programming exercises
Independent: Useful online tutorials
https://www.codecademy.com/en/tracks/python
https://www.learnpython.org/
Assessments Due: Labs

Week 3 (18-03-2019)
Topic: Python Basics II
Lecture: Advanced data types; loops; file I/O
Prac/Lab: Practice SQL selection statements for database query using sqlite
Independent: Useful online tutorials
https://www.codecademy.com/en/tracks/python
https://www.learnpython.org/
Assessments Due: Labs

Week 4 (25-03-2019)
Topic: Relational Database
Lecture: Fundamentals of relational database; Structured Query Language (SQL)
Prac/Lab: Practice NoSQL CRUD (create, read, update, delete) operations with MongoDB
Independent: SQL links
http://www.w3schools.com/sql/
http://www.sqlite.org/docs.html
Assessments Due: Labs

Week 5 (01-04-2019)
Topic: NoSQL Database
Lecture: Fundamentals of NoSQL database; comparison of SQL and NoSQL; applications of NoSQL
Prac/Lab: Practice I/O operations on files with different formats
Independent: NoSQL and MongoDB links
http://www.mongodb.com/nosql-explained
https://www.tutorialspoint.com/mongodb/
https://www.w3resource.com/mongodb/nosql.php
Assessments Due: Labs

Week 6 (08-04-2019)
Topic: Data format
Lecture: CSV, XLS, HTML, XML, JSON
Prac/Lab: Write Python programmes to automatically acquire and process Twitter tweets and Yahoo Finance data
Independent:
XML: https://www.w3.org/XML/
JSON: http://json.org/
Assessments Due: Labs

Week 7 (15-04-2019)
Topic: Data Acquisition and Wrangling I and II
Lecture: Web APIs; Python Pandas library for data processing
Prac/Lab: Advanced data wrangling exercises with Python Pandas
Independent:
Twitter API: https://dev.twitter.com/
Pandas tutorial: http://pandas.pydata.org/pandas-docs/stable/tutorials.html
Assessments Due: Labs

Week 8 (22-04-2019)
BREAK

Week 9 (29-04-2019)
BREAK

Week 10 (06-05-2019)
Topic: MapReduce I
Lecture: Fundamentals of MapReduce; parallel processing patterns; simple examples
Prac/Lab: Write Python programmes to implement simple examples of MapReduce discussed in the lecture
Independent: MapReduce links
http://research.google.com/archive/mapreduce.html (the original MapReduce paper)
http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html (tutorial)
Assessments Due: Labs

Week 11 (13-05-2019)
Topic: MapReduce II
Lecture: Fundamentals of MapReduce; parallel processing patterns; simple examples
Prac/Lab: Write Python programmes to implement simple examples of MapReduce discussed in the lecture
Independent: MapReduce links
http://research.google.com/archive/mapreduce.html (the original MapReduce paper)
http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html (tutorial)
Assessments Due: Labs

Week 12 (20-05-2019)
Topic: Predictive Analytics
Lecture: Introduction to machine learning; overview of techniques and tools; classification
Prac/Lab: Create an SVM classifier for digit recognition using the Python scikit-learn package
Independent:
The Coursera Machine Learning course: https://www.coursera.org/course/ml
Online collection of SVM tutorials
Assessments Due: Labs

Week 13 (27-05-2019)
Topic: EXAM
Lecture: Unit revision
Prac/Lab: In-Class Exam (Assessment 3)
Independent: All lecture notes and references therein
Assessments Due: In-Class Exam

Week 14 (03-06-2019)
Topic: Project
Lecture: Project presentation
Prac/Lab: Project presentation continued
Independent: All lecture notes and references therein
Assessments Due: Project

Week 15 (10-06-2019)
Week 16 (17-06-2019)
Week 17 (24-06-2019)
The above timetable should be used as a guide only, as it is subject to change. Students will be advised of any changes as they become known on the unit’s vUWS site.
4 Learning Resources
4.1 Recommended Readings
Essential Reading
– Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1), 107-113. doi: 10.1145/1327452.1327492
– Gates, A. (2011). Programming Pig. Sebastopol, CA: O’Reilly & Associates Inc.
– Holmes, A. (2012). Hadoop in practice. Shelter Island, NY: Manning.
– McKinney, W. (2013). Python for data analysis. Beijing: O’Reilly.
– Miner, D., & Shook, A. (2013). MapReduce design patterns. Sebastopol, CA: O’Reilly.
– Rajaraman, A., & Ullman, J. D. (2012). Mining of massive datasets. Retrieved from http://infolab.stanford.edu/~ullman/mmds.html
– Sadalage, P. J., & Fowler, M. (2013). NoSQL distilled: a brief guide to the emerging world of polyglot persistence. Upper Saddle River, NJ: Addison-Wesley.
– Schutt, R., & O’Neil, C. (2013). Doing data science. Beijing: O’Reilly Media.
– Stonebraker, M., Abadi, D., Dewitt, D. J., Madden, S., Paulson, E., Pavlo, A., & Rasin, A. (2010). MapReduce and Parallel DBMSs: Friends or Foes? Communications of the ACM, 53(1), 64-71. doi: 10.1145/1629175.1629197