Data science
Dr Nina Dethlefs
n.dethlefs@hull.ac.uk
16 September 2021
Assignment – Understanding
Artificial Intelligence
771763 – 2021/2022
Description:
The assessment for this module is via a portfolio of work that will be assembled over the
course of our three lab sessions. Topics will include (1) a data analytics, interpretation and
visualisation task (lab session 1, 4/5 Nov – 18/19 Nov), (2) a computer vision task (lab session
2, 02/03 Dec), and (3) an ethical analysis (based on the lecture and materials in Week 5, w/c
29 Nov). You will receive formative support on all of these items during our lab sessions.
Specific topics that should be included in the portfolio are as follows.
Component 1 — Water quality analysis:
This component uses CEFAS’ 2021 data on biotoxins and phytoplankton (see
https://www.cefas.co.uk/data-and-publications/habs/england-and-wales-biotoxins-and-phytoplankton-results-2021/)
to find patterns of higher or lower concentration of either (or
both) according to features provided. You should read the data into a program (second tab on
phytoplankton), clean it and then train a multi-layer feed-forward neural network to predict
from a set of input features whether the phytoplankton level detected is above the threshold
specified (see end of file). You will need to make a range of decisions in your analysis on data
cleaning, network architecture and evaluation setup.
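The cleaning and labelling step described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the field name `cells_per_litre`, the missing-value markers and the threshold of 100 are hypothetical and are not taken from the CEFAS file, so check the real spreadsheet (and the threshold given at the end of the file) before reusing any of it.

```python
from typing import Optional

# Hypothetical markers for missing measurements; check the real spreadsheet.
MISSING_MARKERS = {"", "nd", "n/a", "na", "nan", "-"}


def parse_count(raw: object) -> Optional[float]:
    """Convert a raw spreadsheet cell to a float, or None if it is unusable."""
    if raw is None:
        return None
    text = str(raw).strip().replace(",", "")
    if text.lower() in MISSING_MARKERS:
        return None
    # Values such as "<40" are treated as the stated bound (an assumption).
    text = text.lstrip("<>").strip()
    try:
        return float(text)
    except ValueError:
        return None


def label_rows(rows, count_field, threshold):
    """Drop rows with unusable counts and attach a binary target.

    Returns (row, label) pairs, with label 1 when the count is above the threshold.
    """
    labelled = []
    for row in rows:
        count = parse_count(row.get(count_field))
        if count is None:
            continue  # cleaning decision: discard the row rather than impute
        labelled.append((row, 1 if count > threshold else 0))
    return labelled


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample = [
        {"site": "A", "cells_per_litre": "1,200"},
        {"site": "B", "cells_per_litre": "ND"},
        {"site": "C", "cells_per_litre": "<40"},
    ]
    for row, label in label_rows(sample, "cells_per_litre", threshold=100):
        print(row["site"], label)
```

Discarding rows with missing counts is only one possible cleaning decision; imputing values is another, and whichever you choose should be justified in the report.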
You should answer the following questions:
• Specify the accuracy you achieved across 3 architectural modifications (e.g. different
numbers of layers, different hyperparameters, etc.)
• Why do you think your accuracy is not higher / lower?
DATA ANALYSIS AND VISUALISATION 1
• What effect does the optimisation function have on network performance?
• What happens if you include more than 4 (hidden) layers?
• What is the effect of the data size on your accuracy?
Generate and include in your report the most suitable graphical plot of the data.
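For the questions on why accuracy is not higher or lower and on the effect of data size, it may help to compare your network against a majority-class baseline and to score it at several training-set sizes. The sketch below is one possible harness, not a prescribed method: `train_and_score` is a hypothetical callable that you would replace with a function that trains your network and returns its test accuracy, and the data here is synthetic.

```python
import random
from collections import Counter


def majority_baseline(train, test):
    """Accuracy of always predicting the most common training label."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    hits = sum(1 for _, label in test if label == most_common)
    return hits / len(test)


def learning_curve(data, train_and_score, fractions=(0.25, 0.5, 1.0),
                   test_share=0.2, seed=0):
    """Score a model on a fixed test set while growing the training set.

    `data` is a list of (features, label) pairs. Returns a list of
    (training rows used, accuracy) pairs, one per fraction.
    """
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_share))
    test, pool = shuffled[:n_test], shuffled[n_test:]
    results = []
    for fraction in fractions:
        n_train = max(1, int(len(pool) * fraction))
        results.append((n_train, train_and_score(pool[:n_train], test)))
    return results


if __name__ == "__main__":
    # Synthetic, imbalanced stand-in data: 20% positives.
    data = [((i,), 1 if i % 5 == 0 else 0) for i in range(100)]
    for n_train, accuracy in learning_curve(data, majority_baseline):
        print(f"{n_train:3d} training rows -> accuracy {accuracy:.2f}")
```

If most samples fall on one side of the threshold, the baseline alone will already score highly, which is worth keeping in mind when interpreting the accuracy of your network.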
Component 2 — Multi-object recogniser:
Download the “vehicles” dataset (https://universityofhull.box.com/s/9bg1lyysbt3ktczo82zxub2r5v2tgm1u) and adapt your CNN from the lab session to
recognise the 4 object types in the dataset. Generate a graphical plot of your training and
validation accuracy during training. Then answer the following questions:
• How long does the network need to train until reaching an accuracy of 95% (or
does it not reach this level at all)?
• What is the tradeoff between using many layers (i.e. having a “deeper” network) and
accuracy? And between the number of layers and training time?
• What is the effect of changing the pooling mechanism, e.g. average vs max?
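To make the pooling question concrete, the sketch below applies non-overlapping 2×2 max and average pooling to a small hand-made feature map in plain Python. It is an illustration of the two operations only; in a Keras CNN the corresponding layers would be `MaxPooling2D` and `AveragePooling2D`, and the feature map values here are invented.

```python
def pool2d(image, size=2, mode="max"):
    """Apply non-overlapping size x size pooling to a 2D list of numbers."""
    if mode not in ("max", "average"):
        raise ValueError("mode must be 'max' or 'average'")
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        out_row = []
        for c in range(0, cols - size + 1, size):
            # Collect the values inside the current window.
            window = [image[r + i][c + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                out_row.append(max(window))
            else:
                out_row.append(sum(window) / len(window))
        pooled.append(out_row)
    return pooled


# Invented 4x4 feature map with two isolated strong activations (8 and 4).
feature_map = [
    [0, 0, 1, 1],
    [0, 8, 1, 1],
    [2, 2, 0, 0],
    [2, 2, 0, 4],
]

print(pool2d(feature_map, mode="max"))      # [[8, 1], [2, 4]]
print(pool2d(feature_map, mode="average"))  # [[2.0, 1.0], [2.0, 1.0]]
```

Max pooling keeps the isolated strong activations, while average pooling dilutes them to the same value as a uniformly weak region; this is one effect to look for when you compare the two in your own network.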
As a follow-on part, collect your own dataset of images containing the four object
categories above. Make sure that they occur in different contexts, e.g. close-up, far-away, in a
busy visual context, in an isolated image, etc. It is up to you how you collect these images: you
can either take photos yourself or collect images from the internet. You should collect 20
images and copy these into your report, so I can see them.
• How well does your network do at classifying these images?
• Does fine-tuning make a difference?
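With only 20 images, a single overall accuracy figure can hide which categories the network struggles with. The sketch below reports accuracy per class from lists of true and predicted labels; the class names are placeholders, since the brief does not name the four object types, and the predictions are made up rather than produced by a model.

```python
from collections import defaultdict


def per_class_accuracy(true_labels, predicted_labels):
    """Return {class: fraction of that class predicted correctly}."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    totals = defaultdict(int)
    correct = defaultdict(int)
    for truth, prediction in zip(true_labels, predicted_labels):
        totals[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    return {cls: correct[cls] / totals[cls] for cls in totals}


if __name__ == "__main__":
    # Placeholder class names and made-up predictions, for illustration only.
    truth = ["class_a", "class_a", "class_b", "class_c", "class_d"]
    predicted = ["class_a", "class_b", "class_b", "class_c", "class_a"]
    for cls, accuracy in sorted(per_class_accuracy(truth, predicted).items()):
        print(f"{cls}: {accuracy:.2f}")
```

Running the same function before and after fine-tuning gives a simple like-for-like comparison for the second question.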
Extra challenge – integrate explainability methods, such as tf-explain
(https://github.com/sicara/tf-explain), to visualise how your model makes predictions for a
small set of example images.
Component 3 — Discussion of Ethics in AI:
Choose one of these three research papers to discuss:
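1. Energy and Policy Considerations for Deep Learning in NLP (https://arxiv.org/pdf/1906.02243)
2. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification (http://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a)
3. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜 (https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)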
In 800 words: highlight briefly the ethical challenge described and the researchers’
approach to uncovering and addressing it. Discuss in more detail areas of applied AI where
you speculate similar challenges may occur and what incentives can be provided to AI
researchers to tread carefully around ethical challenges.
This part of your portfolio should use a formal academic writing style and references
in Harvard style; see https://libguides.hull.ac.uk/referencing/harvard for guidance.
Marking and components
Portfolio 100%, with each component being worth 1/3 of the overall mark.
DO NOT include programming code in the report, i.e. screenshots or similar. If you
want to present an algorithm, neural network architecture etc., then use pseudocode, a
diagram or some other presentation that is not copy-pasted code.
Code submission:
You will need to submit your code alongside your report. It will not be marked
separately but will be checked to ensure that it supports the functionality described in the
report and is not plagiarised.
Hand-in and deadline:
The portfolio is due: 14 December 2021, 2pm
Hand-in will be via Canvas.
Marking criteria:
Portfolio marking criteria and weighting:
Each component is marked in one of three bands: DISTINCTION, MERIT or PASS.

Component 1 – Neural Net
DISTINCTION (100 points max):
• All questions are answered (correctly) and quantitative evidence is provided to support
the answer (e.g. a table of results, learning plot, reasoned explanation + reference).
• Evidence of 3 architectural variants is provided.
• The data is fully visualised in an appropriate plot.
• Code is submitted and fully replicates the results.
• Top mark >90%: a small discussion paragraph is written that relates your own findings
to the background literature on the topic (note: you’ll need to identify this literature
yourself).
MERIT (69 points max):
• Most questions are answered (correctly) and some evidence is provided to support the
answer.
• Evidence of more than 1 architectural variant is provided.
• A visualisation of the input data is provided.
• Code is submitted and fully replicates the results.
PASS (59 points max):
• Some questions are answered (correctly) and some evidence is provided to support the
answer.
• Code is submitted and fully replicates the results.

Component 2 – Computer vision
DISTINCTION (100 points max):
• A CNN is successfully trained for the multi-label object recognition task; a learning plot
and results are provided in evidence.
• All questions are answered (correctly) and quantitative evidence is provided to support
the answer.
• A dataset is gathered and shown in the report as evidence. The dataset is varied and
includes multiple visual perspectives.
• The code successfully transfers to the new data (accuracy is not important here).
• Code is submitted and fully replicates the results.
• Top mark >90%: the extra challenge is fully completed.
MERIT (69 points max):
• A CNN is successfully trained for the multi-label object recognition task; a learning plot
and results are provided in evidence.
• Most questions are answered (correctly) and quantitative evidence is provided to support
the answer.
• A dataset is gathered and shown in the report as evidence.
• The code transfers to the new data (accuracy is not important here).
• Code is submitted and fully replicates the results.
PASS (59 points max):
• A CNN is trained for the multi-label object recognition task and some evidence is
provided for this.
• Some questions are answered (correctly) and quantitative evidence is provided to support
the answer.
• A dataset is gathered and shown in the report as evidence.
• Code is submitted and fully replicates the results.

Component 3 – Ethics
DISTINCTION (100 points max):
• The research question and methodology of the academic paper are clearly stated.
• The ethical dilemma is identified and stated clearly.
• At least 3 real world applications of the research are proposed and ethical consequences
are discussed in a manner that is analytical, critical and reflective.
• Top mark >90%: a set of novel recommendations is generated from your review that could
influence policy making.
MERIT (69 points max):
• The research question and methodology of the academic paper are stated.
• The ethical dilemma is identified and stated.
• At least 1 real world application of the research is proposed and ethical consequences
are discussed in a manner that is analytical, critical and reflective.
PASS (59 points max):
• The research question and methodology of the academic paper are stated.
• The ethical dilemma is identified and stated.
• At least 1 real world application of the research is proposed and ethical consequences
are discussed.