Data science

Dr Nina Dethlefs
n.dethlefs@hull.ac.uk

16 September 2021

Assignment – Understanding Artificial Intelligence

771763 – 2021/2022

Description:

The assessment for this module is via a portfolio of work that will be assembled over the

course of our three lab sessions. Topics will include (1) a data analytics, interpretation +

visualisation task (lab session 1, 4/5 Nov – 18/19 Nov), (2) a computer vision task (lab session

2, 02/03 Dec), and (3) an ethical analysis (based on the lecture and materials in Week 5, w/c

29 Nov). You will receive formative support on all of these items during our lab sessions.

Specific topics that should be included in the portfolio are as follows.

Component 1 — Water quality analysis:

This component uses CEFAS’ 2021 data on biotoxins and phytoplankton (see

https://www.cefas.co.uk/data-and-publications/habs/england-and-wales-biotoxins-and-phytoplankton-results-2021/)

to find patterns of higher or lower concentration of either (or

both) according to features provided. You should read the data into a program (second tab on

phytoplankton), clean it and then train a multi-layer feed-forward neural network to predict

from a set of input features whether the phytoplankton level detected is above the threshold

specified (see end of file). You will need to make a range of decisions in your analysis on data

cleaning, network architecture and evaluation setup.
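As a minimal sketch of the cleaning and labelling step described above, the example below drops rows whose count is missing or non-numeric and derives a binary target ("above the threshold" or not). The column names (`site`, `month`, `cells_per_litre`) and the threshold value of 100 are hypothetical placeholders, not taken from the CEFAS file; the real column names and thresholds need to be read from the spreadsheet itself.

```python
import math


def clean_and_label(rows, count_key, threshold):
    """Drop rows with an unusable count and attach a 0/1 label.

    Returns a list of (features, label) pairs, where label is 1 when the
    count is strictly above the threshold. The count itself is removed from
    the features so that the network cannot simply read off the answer.
    """
    cleaned = []
    for row in rows:
        try:
            count = float(row.get(count_key))
        except (TypeError, ValueError):
            continue  # blank cell, None, or text that is not a number
        if math.isnan(count):
            continue
        features = {k: v for k, v in row.items() if k != count_key}
        cleaned.append((features, int(count > threshold)))
    return cleaned


# Hypothetical rows standing in for the phytoplankton tab.
rows = [
    {"site": "A", "month": 5, "cells_per_litre": "120"},
    {"site": "B", "month": 6, "cells_per_litre": ""},
    {"site": "C", "month": 7, "cells_per_litre": "40"},
    {"site": "D", "month": 8, "cells_per_litre": None},
]
examples = clean_and_label(rows, "cells_per_litre", threshold=100)
labels = [label for _, label in examples]
print(labels)
```

Whether to drop incomplete rows or fill them in is one of the cleaning decisions the brief leaves to you; dropping is only the simplest option.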

You should answer the following questions:

• Specify the accuracy you achieved across 3 architectural modifications (e.g. different

numbers of layers, different hyperparameters, etc.)

• Why do you think your accuracy is not higher / lower?

• What effect does the optimisation function have on network performance?

• What happens if you include more than 4 (hidden) layers?

• What is the effect of the data size on your accuracy?
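One possible way to approach the questions on accuracy and data size is sketched below: a majority-class baseline gives a reference point for judging whether an accuracy is high or low, and a small harness scores a model on growing random subsets of the training data. The function names and the toy data are illustrative assumptions; `majority_baseline` is only a stand-in for a function that trains and evaluates your own network.

```python
import random


def majority_baseline(train, test):
    """Accuracy obtained by always predicting the most common training label."""
    train_labels = [label for _, label in train]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(1 for _, label in test if label == majority) / len(test)


def accuracy_by_train_size(train, test, fractions, score_fn, seed=0):
    """Score `score_fn(train_subset, test)` for each training-set fraction."""
    rng = random.Random(seed)
    shuffled = list(train)
    rng.shuffle(shuffled)  # fixed seed keeps the subsets reproducible
    results = {}
    for fraction in fractions:
        n = max(1, int(len(shuffled) * fraction))
        results[fraction] = score_fn(shuffled[:n], test)
    return results


# Toy (features, label) pairs; real ones would come from the cleaned data.
train = [({"month": m}, int(m in (6, 7))) for m in range(1, 13)]
test = [({"month": 3}, 0), ({"month": 6}, 1), ({"month": 9}, 0), ({"month": 11}, 0)]

curve = accuracy_by_train_size(train, test, [0.25, 0.5, 1.0], majority_baseline)
for fraction, accuracy in curve.items():
    print(f"{fraction:.0%} of training data: accuracy {accuracy:.2f}")
```

If most samples fall below the threshold, a network that scores close to the baseline has learned little beyond the class balance, which is worth bearing in mind when discussing your results.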

Generate and include in your report the most suitable graphical plot of the data.

Component 2 — Multi-object recogniser:

Download the “vehicles” dataset from here and adapt your CNN from the lab session to

recognise the 4 object types in the dataset. Generate a graphical plot of your training and

validation accuracy during training. Then answer the following questions:

• How long does the network need to train until reaching an accuracy of 95% (or

does it not reach this level at all)?

• What is the tradeoff between using many layers (i.e. having a “deeper” network) and

accuracy? And between the number of layers and training time?

• What is the effect of changing the pooling mechanism, e.g. average vs max?
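To make the difference between the two pooling mechanisms concrete, here is a small pure-Python sketch of non-overlapping 2×2 pooling over a toy feature map. It is an illustration only and not the lab's CNN code; if your network is built with Keras, the corresponding layers are `MaxPooling2D` and `AveragePooling2D`.

```python
def pool2d(feature_map, size, reduce_fn):
    """Apply non-overlapping size x size pooling to a 2-D list of numbers."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


def mean(values):
    return sum(values) / len(values)


feature_map = [
    [1, 3, 0, 0],
    [2, 8, 0, 4],
    [5, 5, 1, 1],
    [5, 5, 1, 9],
]

max_pooled = pool2d(feature_map, 2, max)   # [[8, 4], [5, 9]]
avg_pooled = pool2d(feature_map, 2, mean)  # [[3.5, 1.0], [5.0, 3.0]]
print(max_pooled)
print(avg_pooled)
```

Max pooling keeps the strongest activation in each window, whereas average pooling smooths over it; the top-right window shows how a single strong response (4) is diluted to 1.0 by averaging.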

As a follow-on part, collect your own dataset of images containing the four object

categories above. Make sure that they occur in different contexts, e.g. close-up, far-away, in a

busy visual context, in an isolated image, etc. It is up to you how you collect these images: you

can either take photos yourself or collect images from the internet. You should collect 20

images and copy these into your report, so I can see them.

• How well does your network do at classifying these images?

• Does fine-tuning make a difference?
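With only 20 images, a per-class breakdown can be more informative than a single accuracy figure. The sketch below assumes you already have the true and predicted labels as two lists; the class names are hypothetical, since the four object types are defined by the dataset rather than by this brief.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Return {class: fraction of that class's images classified correctly}."""
    totals = Counter(true_labels)
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    return {cls: correct[cls] / totals[cls] for cls in totals}


# Hypothetical labels for eight self-collected images.
true_labels = ["car", "car", "bus", "bus", "bike", "bike", "truck", "truck"]
predicted = ["car", "bus", "bus", "bus", "bike", "car", "truck", "truck"]

scores = per_class_accuracy(true_labels, predicted)
overall = sum(t == p for t, p in zip(true_labels, predicted)) / len(true_labels)
print(scores)
print(f"overall accuracy: {overall:.2f}")
```

Running the same comparison before and after fine-tuning gives a simple like-for-like table for the report.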

Extra challenge – integrate explainability methods, such as tf-explain (https://github.com/sicara/tf-explain), to visualise how your model makes predictions for a small set of example images.

Link to the “vehicles” dataset for Component 2: https://universityofhull.box.com/s/9bg1lyysbt3ktczo82zxub2r5v2tgm1u

Component 3 — Discussion of Ethics in AI:

Choose one of these three research papers to discuss:

1. Energy and Policy Considerations for Deep Learning in NLP

2. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification

3. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜

In 800 words: highlight briefly the ethical challenge described and the researchers’

approach to uncovering and addressing it. Discuss in more detail areas of applied AI where

you speculate similar challenges may occur and what incentives can be provided to AI

researchers to tread carefully around ethical challenges.

This part of your portfolio should use a formal academic writing style and references

in Harvard style, see here for guidance.

Marking and components

Portfolio 100%, with each component being worth 1/3 of the overall mark.

DO NOT include programming code in the report, i.e. screenshots or similar. If you

want to present an algorithm, neural network architecture etc., then use pseudocode, a

diagram or some other presentation that is not copy-pasted code.

Code submission:

You will need to submit your code alongside your report. It will not be marked

separately but will be checked to ensure that it supports the functionality described in the

report and is not plagiarised.

Hand-in and deadline:

The portfolio is due: 14 December 2021, 2pm

Hand-in will be via Canvas.

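Links for Component 3:

1. Energy and Policy Considerations for Deep Learning in NLP: https://arxiv.org/pdf/1906.02243

2. Gender Shades: http://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a

3. On the Dangers of Stochastic Parrots: https://dl.acm.org/doi/pdf/10.1145/3442188.3445922

Harvard referencing guidance: https://libguides.hull.ac.uk/referencing/harvard

Marking criteria: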

Portfolio marking criteria and weighting:

Component 1 – Neural Net

DISTINCTION (100 points max)

• All questions are answered (correctly) and quantitative evidence is provided to support the answer (e.g. a table of results, learning plot, reasoned explanation + reference).
• Evidence of 3 architectural variants is provided.
• The data is fully visualised in an appropriate plot.
• Code is submitted and fully replicates the results.
• Top mark >90% – a small discussion paragraph is written that relates your own findings to the background literature on the topic (note: you’ll need to identify this literature yourself).

MERIT (69 points max)

• Most questions are answered (correctly) and some evidence is provided to support the answer.
• Evidence of more than 1 architectural variant is provided.
• A visualisation of the input data is provided.
• Code is submitted and fully replicates the results.

PASS (59 points max)

• Some questions are answered (correctly) and some evidence is provided to support the answer.
• Code is submitted and fully replicates the results.

Component 2 – Computer vision

DISTINCTION (100 points max)

• A CNN is successfully trained for the multi-label object recognition task; a learning plot and results are provided in evidence.
• All questions are answered (correctly) and quantitative evidence is provided to support the answer.
• A dataset is gathered and shown in the report as evidence. The dataset is varied and includes multiple visual perspectives.
• The code successfully transfers to the new data (accuracy is not important here).
• Code is submitted and fully replicates the results.
• Top mark >90% – extra challenge is fully completed.

MERIT (69 points max)

• A CNN is successfully trained for the multi-label object recognition task; a learning plot and results are provided in evidence.
• Most questions are answered (correctly) and quantitative evidence is provided to support the answer.
• A dataset is gathered and shown in the report as evidence.
• The code transfers to the new data (accuracy is not important here).
• Code is submitted and fully replicates the results.

PASS (59 points max)

• A CNN is trained for the multi-label object recognition task; some evidence is provided for this.
• Some questions are answered (correctly) and quantitative evidence is provided to support the answer.
• A dataset is gathered and shown in the report as evidence.
• Code is submitted and fully replicates the results.

Component 3 – Ethics

DISTINCTION (100 points max)

• The research question and methodology of the academic paper are clearly stated.
• The ethical dilemma is identified and stated clearly.
• At least 3 real world applications of the research are proposed and ethical consequences are discussed in a manner that is analytical, critical and reflective.
• Top mark >90% – a set of novel recommendations is generated from your review that could influence policy making.

MERIT (69 points max)

• The research question and methodology of the academic paper are stated.
• The ethical dilemma is identified and stated.
• At least 1 real world application of the research is proposed and ethical consequences are discussed in a manner that is analytical, critical and reflective.

PASS (59 points max)

• The research question and methodology of the academic paper are stated.
• The ethical dilemma is identified and stated.
• At least 1 real world application of the research is proposed and ethical consequences are discussed.
