Need help with my assessment on Social Media Analytics. I need it by Tuesday 25th January or before.
The assessment must be very well written and the assessment manual needs to be followed thoroughly. It is a 2000-word report. There can’t be any plagiarism, as it will be checked, and I would need the plagiarism check report. Please follow the assessment manual and word limits for each section carefully. The report needs SIX independently researched sources. Referencing must be done properly with a bibliography at the end; Harvard style referencing is preferred. Related class lectures and the assessment manual are attached.
Please message me only if you can actually do it properly; if this goes well, I will come back for help with more assessments.
(*) A use case is a description of one aspect of how a product (e.g. the software) is to be used in a practical
application. Example: entering login details to log into the homepage is a common use case.
Subject Code: DATA4500
Subject Name: Social Media Analytics
Assessment Title: Investigative Report – Sentiment Analytics & Product Development
Assessment Type: Individual Report
Weighting: 35%
Word Count: 2000 Words (+/-10%)
Total Marks: 35
Submission: Turnitin
Due Date: Monday Week 10, 23:55 AEST
Assessment Description
You are a tech consultant working for Ion Plus, a boutique tech consulting firm. Your firm has been
approached by Facebook AI, a technology division of Facebook developing AI augmented
conversations and human-machine interfacings, in particular, around language processing.
Facebook AI’s core use cases* are focused on (social) network language systems across speech,
audio and vision. Through machine learning and deep learning algorithms, Facebook has deployed
its solutions across an ecosystem of over 3 billion users worldwide. This is a lucrative market but it
is beginning to show signs of saturation. To continue to grow the business, new ideas and new
markets are required.
Facebook AI has engaged Ion Plus to help them break into the market for chatbots in customer
service and social media advertising. Facebook already has a mature AI engine for natural
language processing. A transition into chatbots and broader communication spectrums is
commercially lucrative and would allow Facebook to leverage some of its established capabilities.
You have been called to lead the project.
Language expressed over social forums (including product reviews), social networks and customer
service apps is unique. Text is often abbreviated, misspelt (sometimes deliberately) and augmented
with other media elements such as images, emojis, sound clips and videos. For a chatbot to pass
convincingly as human, we need to see whether it can be taught to recognise social media speak
and adjust accordingly in a dynamic environment where conversation and context are in
constant flux. Facebook’s goal is that once the prototype ‘social
media language’ AI has been sufficiently trained it can be used to:
1. respond to social media posts and generate influence; and
2. respond intelligently and naturally to customer service enquiries (Q&A).
Your task is to develop a report for the Board of Directors of Facebook AI. The report needs to have
an executive summary and a body addressing the below issues:
A) Discuss the value proposition of chatbots and outline the business opportunities that exist
for Facebook AI if they were to push ahead with developing a commercial prototype.
https://ai.facebook.com/
8 marks
B) Describe the characteristics of a chatbot and how current AI capabilities can be adapted to
develop a chatbot capable of mimicking human personalities in an online setting to achieve
marketing objectives.
8 marks
C) Identify and examine the commercial cases for adopting chatbots in customer service and
discuss to what extent Facebook’s existing AI capabilities can be leveraged to develop
cost-effective chatbots to field advanced customer service enquiries.
10 marks
D) Recommend THREE specific actions Facebook AI can take over the next 12 months to test
the viability of developing and adding chatbots to their existing business.
9 marks
Students are provided with the following sources to consult:
• The truth behind Facebook AI inventing a new language
https://towardsdatascience.com/the-truth-behind-facebook-ai-inventing-a-new-language-37c5d680e5a7
• Research in Brief: Unsupervised Question Answering by Cloze Translation
https://ai.facebook.com/blog/research-in-brief-unsupervised-question-answering-by-cloze-translation/
• Introducing long-form question answering
https://ai.facebook.com/blog/longform-qa/
• A new generative QA model that learns to answer the whole question
https://ai.facebook.com/blog/a-new-generative-qa-model-that-learns-to-answer-the-whole-question/
• Research in Brief: Training AI to Answer Questions Using Compressed Search Results
https://ai.facebook.com/blog/research-in-brief-training-ai-to-answer-questions-using-compressed-search-results/
• Chatbots in Customer Service – Accenture Interactive
https://www.accenture.com/t00010101t000000__w__/br-pt/_acnmedia/pdf-45/accenture-chatbots-customer-service
• Chatbots Point of View – Deloitte Digital
https://www2.deloitte.com/content/dam/Deloitte/nl/Documents/deloitte-analytics/deloitte-nl-chatbots-moving-beyond-the-hype
Students should refer to the RECOMMENDED RESOURCES above and provide SIX of their own
independently researched sources in their report.
The report should be appropriately referenced based on the resources found on MyKBS (Academic
Success Centre): https://elearning.kbs.edu.au/mod/page/view.php?id=194263
Task-Based Rubric
Section: A) Research and Description of Current Trends
Criteria:
Demonstrate an ability to describe, evaluate and conduct
research on current trends in big data and analytics as they relate
to sentiment analytics and applications of natural language
processing and latent semantic analysis.
Integrate an understanding of analytics techniques, governance
and social media to address the application of big data and data
analytics to commercial developments.
Section: B) Predictions and Reflections
Criteria:
Synthesise theory and practice taught in the course, personal
research and the reports provided to provide a logical, convincing
and supported conclusion/prediction on the effects of big data and
analytics on the future of social media, social networks and
AI-assisted recommendation engines.
Section: C) Recommend strategic actions by the bank
Criteria:
Recommend three specific strategies an organisation could
undertake to address the changing landscape of social media and
social networks in light of emerging machine learning and analytics
capabilities. Briefly justify each strategy with reference to current
theory and practice.
Section: Referencing, Presentation and Communication
Criteria:
Construct a report that follows a logical sequence of an executive
summary, introduction, clear sections and a summary/conclusion.
Judge the needs of the report audience (senior management and
data professionals) and adapt the report structure, presentation
and jargon to address those needs.
Conduct research into the report topic based on provided
readings and personal research. Demonstrate an understanding of
which sources are most relevant to addressing the report focus.
Apply Harvard Referencing conventions to acknowledging all
sources used in constructing the report.
Page 4 Kaplan Business School Assessment Outline
Important Study Information
Academic Integrity Policy
KBS values academic integrity. All students must understand the meaning and consequences
of cheating, plagiarism and other academic offences under the Academic Integrity and Conduct
Policy.
What is academic integrity and misconduct?
What are the penalties for academic misconduct?
What are the late penalties?
How can I appeal my grade?
Click here for answers to these questions:
http://www.kbs.edu.au/current-students/student-policies/.
Word Limits for Written Assessments
Submissions that exceed the word limit by more than 10% will cease to be marked from the point
at which that limit is exceeded.
Study Assistance
Students may seek study assistance from their local Academic Learning Advisor or refer to the
resources on the MyKBS Academic Success Centre page. Click here for this information.
https://elearning.kbs.edu.au/course/view.php?id=1481
W2: DATA4500
User Composition, Behaviour
&
A look at the force of positive feedback in a social
network, the network effect and strategies for
developing monetised channels.
DATA4500 Roadmap
Week 1: A Brief History
Week 2: User Behaviour & Monetisation
Week 3: Methods of Analysis
Week 4: Commercial Opportunities
Week 5: Sentiment Analysis – Part 1
Week 6: Assessment: Case Study 1
Week 7: Sentiment Analysis – Part 2
Week 8: Insights Mining Part 1
Week 9: Insights Mining Part 2
Week 10: Ethical Considerations Part 1
Week 11: Ethical Considerations Part 2
Week 12: Assessment: Case Study 2
Lesson Learning Outcomes
1 Develop an appreciation of social media user
trends.
2 Examine the user composition of social media sites
and draw conclusions about their behaviour
3 Evaluate the efficacy and impact of various
monetisation channels.
4 Analyse the integration of SMA with traditional and
digital transformative business models.
5 Consider the social aspects of online gaming and its
synergies with social media.
Celinne Da Costa
Writer, speaker & freelance
journalist
“should the business just
be a product or service
provider, or a vision that
an audience believes in
and subscribes to?”
This is what we will cover…
• Trends in content
• eCommerce & spend behaviour
• Facebook monetisation
• Youtube monetisation
• LinkedIn’s business model
• Gaming & Streaming
Pull up your search history from
your phone, laptop or tablet.
1. Which sites do you regularly
visit?
2. What do you use the site for
(e.g. news, social,
eCommerce)?
More than half the world’s population have persistent access to the
internet and almost as many are mobile internet users (2019).
Source: Hootsuite (??)
World’s most visited websites – based on number of visitors and total
page views from Alexa (2019).
How internet users engage social media – survey based results (2019).
Top Youtube search queries (2018).
Content streaming activities* (% of internet users who stream each
type of content, 2019).
More than 1 billion people around the world now stream games over
the internet each month, with games like Fortnite becoming global
phenomena.
* Note that the above is derived from survey results.
Social media penetration by country as % of total population (2019).
Social media growth rankings – absolute (2019).
Social media growth rankings – relative (2019).
Social media demographics – countries with the most stark gender
skews (2019).
eCommerce
The world’s first online sale was recorded by Pizza Hut in 1994…
Pizza Hut’s PizzaNet had one problem – can you guess what it was?
eCommerce
E-Commerce spend vs. total retail spend (2019).
eCommerce
Proliferation of E-Commerce amongst internet users (2019).
Class Activity
Do you spend more buying things online or offline in physical
stores?
If you buy online, what is your most-used website to make purchases?
How would the following features affect your likelihood of making
online purchases:
More than 2,200 stores are closing in 2020.
Follow the link to the article. What are the reasons for these closures?
How is social media contributing to retail’s future?
Pier 1 Imports: 450 stores
Papyrus: 254 stores
Gap: 230 stores
Walgreens: 200 stores
Chico’s: 200 stores
Forever 21: 178 stores
Destination Maternity: 183 stores
A.C. Moore: 145 stores
Bose: 119 stores
Olympia Sports: 76 stores
Sears: 51 stores
Earth Fare: 50 stores
Kmart: 45 stores
Bed, Bath & Beyond: 44 stores
Lucky’s Market: 32 stores
Express: 31 stores
Macy’s: 30 stores
Hallmark: 16 stores
JCPenney: 6 stores
https://www.businessinsider.com.au/stores-closing-in-2020-list-2020-1?r=US&IR=T
• Growth in social media access is highly correlated with internet access
and smartphone usage.
• China, India, Indonesia and Brazil are strong markets. However, the most
rapid growth is coming from other developing nations.
• 25 years after the first online sale, eCommerce remains a rapid and
transformative force in defining retail competition.
• Time spent on social media for personal and work purposes makes
it an integral part of narrative shaping.
Group activities:
Q1. How do the internet and social media stimulate business and
commercial activity? Use figures from previous slides to justify your position.
Q2. Is there any correlation between how users spend their time on the
internet and the sites they visit? Use figures from previous slides to justify
your position.
Q3. Imagine you’re running a social media platform (e.g. Facebook, Twitter,
Tik Tok etc). Based on the information and statistics from the previous slides,
what should you do to try and expand your business?
Monetisation
Monetisation in social media is the process of deriving revenue from
content.
Fee for Service
An example of monetisation is paid content. You may have a channel
that provides free and premium content. Users may subscribe to
premium content for a monthly fee.
User Productisation
By far, the most popular method of monetisation is the insertion of
advertisements throughout ‘free’ content. Users ‘pay’ for their content
by agreeing to become products ‘consumed’ by advertisers.
Facebook
Monetisation channels…
• In-stream Ads
• Branded Content
• Fan Subscriptions
• Audience Network
• Instant Articles
(The slide groups these channels into On Platform and Off Platform.)
• Facebook’s in-stream video ads: 5-15 second video ads shown to people
watching videos on Facebook.
• Strict eligibility requirements apply to placing ad breaks in video
content, e.g. Community Standards: authenticity, safety, privacy, dignity
etc.
Why in-stream ads?
• Deliver complex marketing messages.
• Multimodal engagement – sound, video, captions.
• Reach (1+ billion audience) and penetration (92% ad impression).
What is it? A marketing technique of creating content linked to a brand
that increases consumer connection with the brand
Why branded content?
• Publishers create content for social feeds and mobile environments.
• Build credibility with their audience – brand trustworthiness and
loyalty.
• Business partners benefit from high-quality content shared with
audiences from trusted sources.
Source: http://alvomedia.com/10-brand-content-marketing-examples/
What is it? Exclusive access for “fans” to content for a subscription fee
to be part of a community.
Why subscription content?
• Earnings stability – predictable monthly recurring earnings, helping
you to plan ahead.
• Supporter benefits – Reward your most loyal fans with exclusive
access, content and more.
• Custom terms – Tailor the supporter experience in a way that’s
authentic to your brand, style and community.
News Content
News Feed is the first thing people see when they log in to Facebook. Its
goal is to show people the stories they care about most.
Principles for effective distribution:
• Deliver value through meaningful, informative content.
• Ensure accuracy and authenticity.
• Observe standards for safe, respectful behaviour.
News feeds track content, preferences and sentimentality:
• Inventory – The collection of stories shared and pages followed.
• Signalling – Detailed metadata on the content and its viewers.
• Predictions – Recommendation algorithms for content choices.
• Scoring – Rank content on consolidated signals.
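The four stages above can be sketched as a small ranking pipeline. This is only an illustration: the `Story` fields, the probabilities and the weights are all assumptions made up for the example, and the slides do not describe how Facebook actually computes its predictions or scores.

```python
from dataclasses import dataclass


@dataclass
class Story:
    """One item in the inventory (hypothetical fields)."""
    story_id: str
    from_close_friend: bool   # signal: who posted it
    hours_old: float          # signal: recency
    is_video: bool            # signal: content type


def predict_engagement(story: Story) -> dict:
    """Predictions: toy probabilities that the viewer comments or watches.

    The numbers are invented; a real system would use trained models.
    """
    p_comment = 0.30 if story.from_close_friend else 0.05
    p_watch = 0.40 if story.is_video else 0.10
    return {"comment": p_comment, "watch": p_watch}


def score(story: Story) -> float:
    """Scoring: consolidate the predictions into one number, discounted by age."""
    p = predict_engagement(story)
    relevance = 2.0 * p["comment"] + 1.0 * p["watch"]  # assumed weights
    recency = 1.0 / (1.0 + story.hours_old)
    return relevance * recency


def rank_feed(inventory: list) -> list:
    """Return story ids ordered from highest to lowest score."""
    return [s.story_id for s in sorted(inventory, key=score, reverse=True)]


# Inventory: the candidate stories for one viewer.
inventory = [
    Story("brand_video", from_close_friend=False, hours_old=1.0, is_video=True),
    Story("friend_photo", from_close_friend=True, hours_old=1.0, is_video=False),
    Story("old_friend_post", from_close_friend=True, hours_old=9.0, is_video=False),
]
print(rank_feed(inventory))
```

With these assumed weights, a recent post from a close friend outranks a recent brand video, and an older post from the same friend drops to the bottom.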
Audience Network takes the concept of Facebook ads and extends them
to external apps and websites.
Audience Network uses the same analytics and targeting tools as
Facebook ads. Campaign optimisation and delivery is automated to
reduce the advertiser’s overhead.
Why use Audience Network?
• Targeted monetisation – AN ad optimisation is surgical and
selective. Ads are more likely to match up with audience interests.
• Quality control – Pre- and post-campaign transparency, block lists,
keyword blocking and severity level controls for brand-safe ads.
• Market-driven bidding process – Bidding provides an impartial and
open auction that drives revenue yield.
For the following questions, use your phone or laptop to search relevant
statistics to support your conclusions.
Q1. Define the Audience Network, fan subscription and branded content
monetisation channels. How could a furniture company like Freedom
(https://www.freedom.com.au) use these channels to promote its products?
Q2. Why does Facebook maintain multiple monetisation channels if all of its
cash generating capacity is in advertising?
Q3. In groups, identify Facebook’s biggest revenue stream. What is the biggest
threat to that revenue stream? What could cause Facebook’s revenue
composition to change in the future?
Youtube
Monetisation channels…
• Advertising Revenue
• Channel Memberships
• Merchandise
• Premium Subscription
TrueView Ads
Skippable ads (after 5 secs) that appear at the beginning of YouTube videos.
Cost effective – pay only when the video plays for more than 30 secs or is played
until the end.
Non-skippable Video Ads
Max length 20 secs; a video that tells a deeper, more nuanced story with build-up.
Skilful ad targeting is essential to home in on the desired audience with a willingness
to listen and engage.
Bumper Ads
Non-skippable but max length 6 secs. Paid on a cost-per-thousand-impressions (CPM) basis.
Ideal for targeting mobile users and for recycling longer content.
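The pay-per-view rule for skippable ads can be sketched as below. The function names, the cost per view and the watch times are hypothetical. The sketch treats "more than 30 secs" as 30 seconds or more and ignores any other billable events.

```python
def is_billable_view(seconds_watched: float, ad_length: float) -> bool:
    """Return True if a skippable ad view would be charged.

    Assumption: a view is charged when the viewer watches at least
    30 seconds, or reaches the end of an ad shorter than that.
    """
    watched_to_end = seconds_watched >= ad_length
    return seconds_watched >= 30 or watched_to_end


def campaign_cost(watch_times, ad_length: float, cost_per_view: float) -> float:
    """Total cost: only billable views are charged."""
    billable_views = sum(is_billable_view(t, ad_length) for t in watch_times)
    return billable_views * cost_per_view


# Four viewers of a 45-second ad: two skip early, two count as views.
watch_times = [5, 12, 30, 45]
print(campaign_cost(watch_times, ad_length=45, cost_per_view=0.10))
```

Viewers who skip after a few seconds cost the advertiser nothing, which is why the format is described as cost effective.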
Channel memberships allow viewers to join your channel through
monthly recurring payments (subscriptions).
Why channel memberships?
• Create a loyal fanbase that generates recurrent revenue.
• Forge brand stability through loyal following.
• Use as launching pad for campaigns targeting broader audience.
• Gather comprehensive data and metadata on users – far more
comprehensive than for regular Youtube users.
Most popular YouTube Channels: https://www.brandwatch.com/blog/most-subscribed-youtubers-channels/
Content creators offering physical and digital merchandise alongside
premium content.
Adds a new feature ‘BUY’ to existing features ‘LIKE’ and ‘SUBSCRIBE’.
Why merchandising?
• Once a side show, merchandising is now a significant revenue stream.
E.g. Star Wars
– $9.31B worldwide box office.
– $3B in toys, $3.5B in video games, $2B in book sales, $1.3B in
licensing deals.
Similar figures for Marvel’s Avengers series and DC’s Batman.
Chat Flairs
Flair: Internet ‘badge of honour’ for a username or avatar. It can be text,
icons / emojis or a combination of both.
They have no monetary value. Basic flairs are usually available free of
charge. Channel members have access to premium flairs from their
subscription services.
Why flairs?
• Good question – nobody really knows. But they are very popular.
• There is no evidence of causality between flairs and revenue.
• Flairs come with subscriptions so channel members almost always
use flairs. Flairs are one type of artefact that help promote identity
and belonging in a forum.
Youtube Premium
Premium Youtube: Removes ads and allows users to download videos.
Premium Youtube and standard Youtube have a cannibalising
relationship – they are mutually exclusive and every additional
premium user means one less audience member for advertisers.
Why offer premium?
• Brand authenticity – Ad free content has value and it would be
disingenuous to not offer it when doing so is trivial.
• More data points on users – it is important to understand and
differentiate:
– users that prefer the ‘freemium’ model where they are subject to
ads and become the product; and
– users that prefer a ‘fee for service’ model.
Part of Facebook and Youtube’s appeal is the straightforwardness of their
value proposition and monetisation model (esp. Youtube)…
Revenue sharing agreements between the platform provider and content
creators are open, transparent and market-driven.
[Diagram: payment flows ($$, $) linking advertisers, premium users and freemium users.]
Influencer
1. Which public figures are you most influenced by?
2. What makes them trustworthy to you?
3. What have they influenced you to do?
LinkedIn
A multi-sided platform for multi-sided customer segments…
• Hiring – HR Managers
• Learning & Development – Professionals
LinkedIn
LinkedIn’s primary growth tool is its freemium model. As a professional
social network, anyone can join the platform thus enabling its viral growth.
Q1. Examine Facebook’s monetisation channels. How do they work and why
do they work?
Q2. Examine Youtube’s monetisation channels. What similarities can you
draw between Youtube’s and Facebook’s monetisation channels? What
differences can you draw?
Q3. What are the key features of Facebook’s and Youtube’s business models?
Q4. In groups, look up LinkedIn’s business model. What are the key features?
How can the platform be monetised?
Historically, getting users to pay for content has been an uphill battle.
‘Free-to-play’ online games changed that dynamic…
League of Legends was the largest gaming category in December by hours
watched at 74 million hours.
Gaming & Streaming
The US gaming industry generated $120 – $180 billion USD in revenue in
2019.
It was already a hot industry in the 1990s and 2000s, having gained
traction with Gen X, Gen Y and Millennials.
Social media helped to delocalise and unlock value in gaming by creating
a platform that elevated gaming to a sport giving it an audience.
Social media gave gamers an identity from which to tell their stories.
Twitch (Amazon): 9.3 billion hours (73.2%)
Youtube (Google): 2.7 billion hours (21.2%)
Facebook: 356 million hours (2.8%)
Mixer (Microsoft): 353 million hours (2.8%)
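The percentages quoted for each platform can be reproduced from the hours watched, assuming each share is taken over the combined total of these four platforms only. A short check:

```python
# Hours watched, in millions, as quoted above.
hours_watched = {
    "Twitch": 9300,
    "Youtube": 2700,
    "Facebook": 356,
    "Mixer": 353,
}

# Assumption: shares are relative to the four platforms combined.
total_hours = sum(hours_watched.values())

shares = {
    platform: round(100 * hours / total_hours, 1)
    for platform, hours in hours_watched.items()
}

for platform, share in shares.items():
    print(f"{platform}: {share}%")
```

The rounded results match the quoted figures, which suggests the shares exclude any platforms not listed.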
Gaming & Streaming
Google and Amazon engaged in a bidding war in
2014 for Twitch. Amazon won the bid, shelling out
$1 billion USD cash for the acquisition. Twitch
was valued at nearly $4 billion USD in 2019.
Youtube maintains a formidable share of the
gaming market thanks to the organic synergies
between video streaming and game streaming.
However, growth has stagnated in recent years.
Facebook and Microsoft (Mixer) are relative
latecomers but growing fast. Facebook has the
world’s largest social media audience and
Microsoft has a rich gaming eco-system with a
large number of titles (PC & XBox).
The social and monetary ecosystem of gaming…
[Diagram: game vendors, advertisers, the gaming community, influencers (product partners) and the platform, connected by flows of sponsorship $, game revenue $, subscription $, ad views and streaming content.]
Q1. Look up a popular online game… how does the game make money?
Q2. How and why are the tech giants diversifying themselves across a
multitude of social media and gaming networks? Provide examples to support
your response.
This is what we have covered…
✅ Trends in content
✅ eCommerce & spend behaviour
✅ Facebook monetisation
✅ Youtube monetisation
✅ LinkedIn’s business model
✅ Gaming & Streaming
Established Methods in Social Media Analytics
• Social Media Analytics (SMA) and its business impact.
• Distinctions between SMA and traditional (statistical)
analysis.
• Capabilities introduced by Sentiment Analysis and Insights
Mining.
• Applications of SMA in trend detection and its merits as an early
warning system.
W3: DATA4500
Established Methods in
Social Media Analytics (SMA)
Current directions in SMA capabilities development
and applying a marketer’s perspective toward public
perceptions and brand profitability.
Lesson Learning Outcomes
1 Develop an appreciation of Social Media Analytics
(SMA) and its business impact.
2 Examine the distinctions between SMA and
traditional (statistical) analysis.
3 Evaluate the capabilities introduced by Sentiment
Analysis and Insights Mining.
4 Analyse the process of developing and implementing
institutional SMA capabilities.
5 Consider applications of SMA in trend detection and its
merits as an early warning system.
Carly Fiorina
Former CEO, HP
“The goal is to turn data
into information and
information into insight.”
This is what we will cover…
• Know Your Audience
• SMA & Business Opportunities
• Motivations & Capabilities
• Sentiment Analysis
• Insights Mining
• SMA Costs & Benefits
Human activity in the digital domain has grown exponentially ever
since the advent of the commercial internet…
…and this trend is not expected to change…
Metric                  FY2009         FY2019
Active Blogs            200 million    500+ million
Active Internet Users   625 million    4540 million
World Population        6.8 billion    7.7 billion
Social media may be defined as a “group of Internet-based applications
that build on the ideological and technological foundations of Web 2.0,
and that allow the creation and exchange of user generated content”
FY2010: 30 million^ → FY2019: 330 million^
FY2010: 431 million^ → FY2019: 2449 million^
FY2010: 750 million^ → FY2010: 1860 million^
^ Active Monthly Users
The use of social media creates vast amounts of information on a daily
basis:
• consumer opinions and experiences
• sentiments toward brands, products and services
This information is used by businesses in their decision feedback loop to:
• address business funnel constraints and pain points
• design new campaigns
Business Impact
Benefits of organisations bolstering their analytical capabilities to analyse
and interpret vast amounts of online information to gain customer and
business insight:
• Scale and Speed
Social media campaigns can be set up rapidly. The distribution
infrastructure already exists in the internet, optimised search engines
and modern browsers.
• Cost efficiency and operational flexibility
Reduced reliance on physical infrastructure saves material and human
capital. Real time reporting and a tighter feedback loop allow pruning
to occur at a much faster pace.
Machine Learning is the right tool for SMA – this is a data rich environment
with overwhelming quantities generated on a daily basis.
It’s all about making predictions.
Data → Prediction
Sun + Wind + Time → Future Energy Price
Radiology → COVID-19 Precursor
Genetic Data → IVF Trait Selection
Emails → Spam Filter
Social Media Data → Sentiment Analysis
AI – replicate intelligence in machines.
ML – study of algorithms that learn through experience (data).
DL – applying neural networks to ML.
Normal software is basically a set of rules, written by a human, intended
to achieve a particular output.
Machine learning software finds rules (patterns) on its own and tries to
produce a certain output. It’s software that writes software.
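The contrast can be illustrated with a toy spam filter. In the first function a human writes the rule; in the second, the "rule" (a set of suspicious words) is derived from labelled examples. The messages, labels and the word-set approach are assumptions chosen for brevity, not a realistic spam model.

```python
from collections import Counter


def is_spam_rule(text: str) -> bool:
    """Traditional programming: a human-written rule."""
    return "free money" in text.lower()


def train(examples) -> set:
    """Machine learning (toy): find words seen in spam but never in non-spam."""
    spam_words, ham_words = Counter(), Counter()
    for text, label in examples:
        target = spam_words if label == "spam" else ham_words
        target.update(set(text.lower().split()))
    return {word for word in spam_words if word not in ham_words}


def predict(model: set, text: str) -> bool:
    """Apply the learned word set to a new message."""
    return any(word in model for word in text.lower().split())


# Existing data with the desired output (labels).
examples = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting moved to monday", "ham"),
    ("are you free on monday", "ham"),
]
model = train(examples)

# New data: the hand-written rule misses this message, the learned one flags it.
print(is_spam_rule("claim your prize"), predict(model, "claim your prize"))
```

Note that the learned model drops the word "free" on its own, because the examples show it appearing in ordinary messages too.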
Traditional Programming: Data + Program → Computation → Output
Machine Learning: Existing Data + Desired Output → Training → Model / Algorithm; New Data + Model / Algorithm → Output
Traditional Programs vs. ML
Here’s a quick and easy flowchart to help you figure out what type of
algorithm is being executed…
1. Pattern recognition using big data? NO: Not ML. YES: Machine Learning.
2. Algo receiving explicit instructions? YES: Supervised Learning. NO: go to 3.
3. Reach objective through trial and error? YES: Reinforcement Learning. NO: Unsupervised Learning.
4. Algo uses artificial neural networks? YES: Deep Learning.
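The flowchart can be read as a short decision function. The branch order below is one reading of the slide (explicit instructions lead to supervised learning, trial and error to reinforcement learning, otherwise unsupervised learning), so treat it as an assumption. The function and parameter names are hypothetical.

```python
def classify_algorithm(
    pattern_recognition_on_big_data: bool,
    explicit_instructions: bool,
    trial_and_error: bool,
    uses_neural_networks: bool,
) -> list:
    """Return the labels the flowchart would assign to an algorithm."""
    if not pattern_recognition_on_big_data:
        return ["Not ML"]

    labels = ["Machine Learning"]
    if explicit_instructions:
        labels.append("Supervised Learning")
    elif trial_and_error:
        # The slide prints this box as "Reinforced Learning".
        labels.append("Reinforcement Learning")
    else:
        labels.append("Unsupervised Learning")

    if uses_neural_networks:
        labels.append("Deep Learning")
    return labels


# A sentiment classifier trained on labelled posts with a neural network.
print(classify_algorithm(True, True, False, True))
```

Deep learning is modelled here as an extra label rather than a separate branch, since a neural network can be used with any of the three learning styles.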
Traditional business analytics systems use structured data.
Structured Data Source → Statistical Pattern Recognition → Reporting
Name of Area PopDensity SexRatio PropMAgric PropMMining PropFManuf PropFDomServ DRTuberculosis DRLung ClaimRisk
Aberayron 0.205 0.807 0.483 0.003 0.067 0.112 3.901 1.204 HIGH
Abergavenny/Bedwelty 0.804 1.119 0.104 0.301 0.068 0.088 2.161 4.279 MEDIUM
Aberystwyth 0.199 0.884 0.34 0.22 0.075 0.135 2.578 1.837 MEDIUM
Abingdon 0.376 0.935 0.49 0.006 0.204 0.131 2.489 3.427 HIGH
Alcester 0.334 0.988 0.449 0.015 0.256 0.111 1.803 3.064 LOW
Alderbury/Salisbury 0.464 0.926 0.313 0.01 0.112 0.187 2.464 3.097 HIGH
Alnwick 0.206 0.94 0.413 0.07 0.074 0.145 2.644 2.36 MEDIUM
Alresford 0.182 1.046 0.534 0.004 0.057 0.176 2.449 1.342 LOW
Alston 0.172 0.98 0.108 0.582 0.071 0.141 2.251 4.75 LOW
Alton 0.236 0.987 0.589 0.006 0.058 0.156 2.189 3.323 LOW
Altrincham (Bucklow) 0.614 0.933 0.461 0.019 0.087 0.21 2.289 2.957 MEDIUM
Social media data is…
Heterogeneous
There is high variability of data types and formats. They
are possibly ambiguous and low quality due to missing
values, high data redundancy, and untruthfulness.
Unstructured
Lacks a pre-defined data model or is not organized in a
pre-defined manner. Data is typically text-heavy and may
be plagued by irregularities and ambiguities.
SMA Capabilities
Combining new analytics technologies with organisational processes,
people and knowledge creates SMA capabilities.
Infrastructure: heterogeneous & unstructured data; organisational
processes, people & knowledge.
Organisational Motivation
What are the drivers that are pushing the
organisation to innovate?
Is there a need and a want to improve
customer insights?
SMA Capabilities
What knowledge, competencies and
technologies are available to the
organisation to help it better understand
its customers and environment?
Value Add
Has there been any quantifiable change
in the organisation’s enterprise value as a
result of implementing SMA for better
decision making?
Q1. What information is generated by social media and how is that
information used by businesses in their decision feedback loop?
Q2. What are the benefits of Social Media Analytics to the organisation? Give a
product example (good or service).
Q3. What is the difference between SMA and traditional analytics? What
typically characterises social media data?
Q4. Describe how SMA capabilities are created. In your response, address the
business impact framework for SMA.
Organisational motivations are defined as the goals that an organisation
pursues, and guide the subsequent actions of that organisation.
• Product Leadership
An organisation may have a goal to develop
product leadership within its market niche by
the introduction of innovations in the design of
existing products.
• Market Position
Strengthening the organisation’s market
position by the introduction of online
marketing innovations to increase customer
intimacy.
Motivation –
Customer Insights
Businesses need insights into customer’s values and behaviour in order to
create products and services that are sought after and fit for purpose.
“With the goal of targeting specific interest groups, the company needs a
clear understanding of each group’s needs and habits.”
Excited to meet the @samsungmobileus at #sxsw so I can
show them my Sprint Galaxy S still running Android 2.1. #fail
.@wesley83 I have a 3G iPhone. After 3 hrs
tweeting at #RISE_Austin, it was dead! I need to
upgrade. Plugin stations at #SXSW.
At #sxsw. Oooh. RT @mention Google to Launch Major New
Social Network Called Circles, Possibly Today {link}
The impact of on-line marketing campaigns can be scientifically gauged.
It is necessary to track whether the messages were delivered to intended
customer segments which impacts the return on investment of social
media initiatives.
Emotions that make online content go viral (based on the 10,000 most
shared articles across the web).
Organisations need to gather new ideas about brands, products and
services, including online feedback, in order to continue driving growth.
• Why does your brand exist?
• Why does your brand matter?
• What are your customers’ unique set of needs?
• What sets you apart and why is that meaningful for
your customers?
• What do you want your customers to say about you
when you’re not in the room?
With the increasing use of ad-blockers and public mistrust in institutional
policies toward data privacy, the need to identify people or communities
that have the power and ability to influence the intentions of others
becomes all the more important.
Organisation
SM Influencer
Social Media Ecosphere
Ad Blockers
Privacy
Users
Motivation – Social Influencers
From motivations to capabilities…
Organisational Motivations
• Provide customer insights
• Develop social media strategies and initiatives
• Gather ideas about brands, products and services
• Determine the impact of online campaigns
• Identify social influencers
SMA Capabilities aimed to…
• Understand content, context and business impact of online posts
and conversations – e.g. sentiment analysis.
• Discover valuable customer information – e.g. insight mining.
• Monitor relationships between online users and communities – e.g.
influence analysis, network analysis.
Q1. What is the motivation behind understanding (consumer) sentiment?
How does this lead to new ideas?
Q2. Discuss the motivation behind using social media influencers to market
products and brands. What are the risks and rewards?
Q3. Form two groups. In your group discuss how organisational motivations
can lead to the development of SMA capabilities. Use practical examples to
supplement your views.
Capabilities are the ability of organisations to utilise resources to perform a
coordinated set of tasks.
Capabilities are a key concept within the Resource-based view (RBV) of
the firm.
Resource Based View
Organisation
Intangible Tangible
Skills
Knowledge
Routines Processes
Hardware
Software
People
Data
SMA capabilities are a mutually reinforcing system of SMA technology
assets and organisational SMA competencies.
Customer Insight Mining Example
It is the ability of an organisation to utilise SMA related resources to
perform SMA tasks.
SMA
Technology
SMA
Competencies
Hardware
Software
Data
Skills
Knowledge
People
Routines Processes
Customer Insights
A number of SMA capabilities may be identified from the literature and
success stories.
These include the ability of organisations to discover insights into:
• customer behaviours – what they do vs. what they say they do;
• customer intentions – long term behavioural, end state;
• customer preferences – wants and needs; and
• customer demographics – population strata and structures.
“…[sentiment analysis was used] to assess whether a given text [customer
comment on product] expresses a positive, negative or neutral comment.”
(IBM 2012, p4)
Sentiment analysis is the contextual mining of text to identify and extract
subjective information in source material. It helps a business to
understand the social sentiment of its brand, product or service while
monitoring online conversations.
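As a rough illustration of the positive / negative / neutral labelling described above, the sketch below scores a comment against two tiny word lists. The word lists and the function name `classify_sentiment` are hypothetical and chosen only for this example; a production system would more likely rely on a trained model or a much larger lexicon.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE_WORDS = {"good", "great", "love", "excellent", "happy"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "terrible", "fail"}


def classify_sentiment(comment: str) -> str:
    """Label a comment by counting hits against the two word lists."""
    words = re.findall(r"[a-z']+", comment.lower())
    positive_hits = sum(word in POSITIVE_WORDS for word in words)
    negative_hits = sum(word in NEGATIVE_WORDS for word in words)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"
```

A simple count like this ignores negation and sarcasm ("not good at all"), which is one reason context matters in sentiment analysis.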
Creative use of advanced artificial intelligence techniques can be an
effective tool for doing in-depth research. Insights generated include:
• Key aspects of a brand’s product that customers care about.
• Users’ underlying intentions and reactions concerning those aspects.
“What was the response to our social media
campaign broken down by demographic?”
https://www.paralleldots.com/sentiment-analysis
Capability – Sentiment Analysis
Sentiment analysis is being used to help:
• improve service at a hotel chain by analysing
guest comments;
• customize incentives and services to address
what customers really want;
• determine how consumers really feel based on
opinions from social media.
Q1. Discuss how SMA capabilities building and capabilities deployment can
generate and preserve enduring value for the organisation
Q2.
Consider how sentiment analysis may be used at AirBnB.
Identify a business area where sentiment analysis can be applied.
Outline the data that need to be collected, the features in that data and how
these features can be used to explain the ebbs and flows of public opinion.
Discuss how these insights may be applied to create value.
* A use case is an example that highlights the use of an instrument or framework.
https://www.airbnb.com.au
Discover insight into customer behaviours, intentions, and preferences.
More than just gathering customer insights, insight mining is a dynamic,
continuous and iterative process that seeks to craft marketing messages
that shape and influence those insights.
Consumer
Understanding
New Ideas,
Innovation
Consumer
Insights
Impactful
Messaging
Capability – Insight Mining
A good insight is a deeply felt human perception that feels as relevant to
the brand as it does to us.
Good insights are difficult to extract. Good insights are discovered by
asking the right questions.
Questions fail if they are:
• Not challenging enough;
• Not thorough enough;
• Directed at the wrong audience.
Consider the Australia Talks National Survey which asked 54,000
Australians about their lives and what matters to them.
https://australiatalks.abc.net.au/
Capability – Insight Mining
Role Play.
Imagine you’re Airbnb. Your goal is to increase bookings made through
your platform.
Based on the survey responses:
1. What follow-up questions should you ask to ascertain Australians’
optimism toward travel and tourism?
2. How can analytics help to create and adjust an impactful marketing
message that will spur positive sentiment and behaviour that benefits
Airbnb?
Trends…
• ‘Top heavy’ demography and longevity have socio-economic implications.
• Energy diversification creates supply opportunities and risks.
• Technology on scalable efficiency and productivity gains.
• Climate change on energy consumption.
Increasing Trends / Decreasing Trends
Driving Change: What are some of the trends that are driving change?
Counter Trends: What are the counter trends being provoked as the result of
these (initial) trends?
Capability – Trends, Issues, Stability
Emerging issues – what are the new ideas, issues and technologies that
are latent at present but could mature into powerful and transformational
drivers of change?
Capability – Trends, Issues, Stability
Stabilities – factors that slow down or prevent change.
Examples:
• Rules, customs and traditions.
• Physical and/or logistical constraints.
• Patterns of behaviour.
• Powerful stakeholders and incumbents.
The 5 facets of stability – STEEP:
• Social
• Technological
• Economic
• Environmental
• Political
SMA Capabilities (to monitor and measure change):
• Influence analysis – identify the key people or
communities that have made significant contributions to
a particular issue.
• Competitive analysis – track and monitor comments
about brands and products of competitors.
• Marketing initiative measurement – track and
monitor comments about particular brands and
products related to marketing campaigns.
• Crisis analysis – track and monitor comments that
contain negative sentiment about brands and products.
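One very simple proxy for influence analysis is to count how often each account is mentioned across a set of posts. The sketch below does only that; the function name `mention_counts` is hypothetical, and real influence analysis would usually go further, for example by applying network measures to who replies to or reshares whom.

```python
import re
from collections import Counter


def mention_counts(posts):
    """Count how often each @handle is mentioned across a list of posts."""
    counts = Counter()
    for post in posts:
        # Handles are compared case-insensitively.
        counts.update(handle.lower() for handle in re.findall(r"@(\w+)", post))
    return counts
```

Calling `mention_counts(posts).most_common(5)` would then list the five most frequently mentioned handles in the sample.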
Q1. Discuss how SMA capabilities can help detect trends and counter trends.
Q2. How can SMA capabilities detect emerging issues well in advance of their
maturation?
Q3. How can SMA capabilities be deployed to gauge stability factors and
inform decision-makers as to how and when to take action?
What is the extent to which SMA capabilities contribute to the success of
individuals, groups and the organisation?
For digitally enabled businesses, benefits from SMA innovation include:
Financial
• Increased sales
• Decreased costs
• Improved fundraising
prospects
• Improved capital
utilisation
Perceptual
• Customer satisfaction
• Market efficiency
• Improved brand
awareness.
• Improved product –
customer value
alignment
Behavioural
• Integrated application
of BA insights
• Improved, data-
enhanced, decision-
making
• Better attract and
develop balanced
teams
SMA Benefits
• Marketing strategy improvement – create and refine
marketing strategies, initiatives and channels in order to
effectively deliver messages to targeted customers.
• Better customer engagement – provide two-way
communication with targeted customers, based on their
values and preferred channels.
• Customer service improvement – provide timely and
appropriate responses to customer feedback.
• Better brand awareness and reputation
management – monitor and maintain brand and
product reputation in the market.
SMA Benefits
• Product development and improvement – innovate
products based on customer inputs.
• Social media metrics development – develop
instruments for measuring the effectiveness and ROI of
social media initiatives.
• Business process improvement – improve and
optimise decision-making processes, business
operations and the value chain.
• New business opportunities – ongoing monitoring of
emerging opportunities to generate more revenue and
income.
Q1. Briefly describe the three major categories of SMA benefits.
Q2. Pick two SMA benefits and briefly discuss the organisation motivation and
types of SMA capabilities that would enable those benefits.
Q3. Form two groups. In your group discuss how SMA capabilities can lead to
benefits. Use practical examples to supplement your views.
This is what we have covered…
✅ Know Your Audience: DONE!
✅ SMA & Business Opportunities: DONE!
✅ Motivations & Capabilities: DONE!
✅ Sentiment Analysis: DONE!
✅ Insights Mining: DONE!
✅ SMA Costs & Benefits: DONE!
Social Media Analytics – Risks & Opportunities
• SMA applications in competitive digital marketing.
• Use of influence as a market share and revenue leading
indicator
• Real time capabilities of social media monitoring.
• Applications of SMA in formulating crisis response measures.
W5: DATA4500
Twitter-verse Sentiment
Analysis using Natural
Language Toolkit (NLTK)
PART 1
A case study look into how machines process and
understand human language using Twitterverse
data.
DATA4500 Roadmap
Week 1: A Brief History
Week 2: User Behaviour & Monetisation
Week 3: Methods of Analysis
Week 4: Commercial Opportunities
Week 5: Sentiment Analysis – Part 1
Week 6: Assessment: Case Study 1
Week 7: Sentiment Analysis – Part 2
Week 8: Insights Mining Part 1
Week 9: Insights Mining Part 2
Week 10: Ethical Considerations Part 1
Week 11: Ethical Considerations Part 2
Week 12: Assessment: Case Study 2
Lesson Learning Outcomes
1 Develop an understanding of the applications of
Natural Language Processing.
2 Examine the differences between structured and
unstructured data regimens.
3 Evaluate the steps involved in developing an NLP
sentiment classifier.
4 Analyse and discuss the metrics used to measure
model efficacy for the sentiment classification
algorithm.
5 Consider the implications of language analysis for
revealing hidden features and clusters.
Abraham Lincoln
16th President of the United States
“Public sentiment is
everything. With public
sentiment nothing can
fail. Without it nothing
can succeed.”
Case Study 1
Company Profile
Australian based boutique tech consulting
firm founded in 1994 by Pippa Ingram.
55 employees and growing.
Website: www.ionplus.io
Focus Areas
• Accelerated commercialisation.
• Research & Development assistance.
• Cryptocurrency & blockchain.
• AI risk management.
• IT governance & compliance.
• Data security & privacy.
• Sustainability.
http://www.ionplus.io/
Tech consulting is the nexus between a high end work force and the broader
business community, providing subject matter expertise to innovators with
great ideas and connecting entrepreneurs with tech talent.
AGILE firmware development
for 5G infrastructure in
Singapore (Singtel)
Artificial Intelligence in next
generation service chatbots
and recommendation engines.
Blockchain technology for
smart contracts in exchange
settled products (EuroNEXT).
Building cybersecurity
resilience in mission critical
systems and supply chains.
Cloud computing and SAAS
innovations to rebuild cost
structures and minimize total
cost of ownership.
Leverage the Network of
Networks concept in platform
integration to extend influence.
Technology is empowering
and endangering our way of
life.
Those who fail to keep pace
with the rate of change will
be made irrelevant and
redundant.
Miniaturised chipsets for cryptocurrency
mining.
This chip is one tenth the surface area of a
fingertip.
It can be implemented at scale to mine for
Bitcoin with an efficient energy profile.
Amazon Inc. is a US tech conglomerate focusing on cloud
computing, e-commerce, digital streaming, and artificial
intelligence.
Deep learning algorithms implemented through
artificial neural networks provide real time feedback on:
• Flows across Just-in-Time logistics networks.
• The totality of customer reviews on every product
and service offered by Amazon.
• Competitor intelligence and market sentiment.
• Fraud detection & transactional forensics.
Cloud Computing
Artificial Intelligence
Amazon Inc is one of a handful of companies that have
benefited from the COVID-19 pandemic.
• Direct sales – strong growth.
• Significantly increased demand for web service
products.
Despite the obvious good news, some analysts have detected
potential negative macro headwinds:
• Elevated delays and errors in deliveries.
• Discontent amongst warehouse and administrative staff who
feel overworked and underpaid.
Crisis & Opportunity Monitoring
How a company responds during times of crisis can have far reaching
consequences…
Amazon has engaged Ion Plus to help them develop an omni-channel social
sentiment analyser.
This tool will be used to provide Amazon with real time data on public
sentiment toward the company.
Your approach is to develop a prototype sentiment analyser for Twitter using
relevant tweet handles and hashtags to evaluate public sentiment.
STEP ONE
Scrape AMZN related tweets.
STEP TWO
Analyse language patterns within tweets for sentiment clues.
STEP THREE
Develop summary statistics on Amazon tweets.
Q1.
Discuss the differences in communication style and format between:
• Social media (comments, posts, personal blogs)
• Professional writing (emails, letters, articles)
Q2.
Amazon has an AI language tool that’s currently used to review formal
documents such as business plans, internal memos and outgoing company
mail.
Discuss whether it is possible to modify Amazon’s AI language tool so that it
can be used to analyse informal communications such as those over social
media.
The following high level roadmap outlines the seven development stages for
building and testing the Twitter sentiment analyser prototype…
ONE: Get the tweets
TWO: Break up tweets
THREE: Sort words
FOUR: Reduce noise
FIVE: Word frequency
SIX: Build model
SEVEN: Visualise!
FACT: A vast amount of data that is generated today is unstructured.
FACT: Increasingly, gaining competitive advantage requires generating insights
from unstructured data sources.
Examples of unstructured data: news articles, social media posts, search
history, chat logs and audio / visual media.
NOTE:
Unstructured data
still has some
structure to it.
The ability to store
and process data
relies on there
being some form of
underlying
structure.
Businesses like Amazon must manage structured and unstructured data to
scale up…
The process of analysing natural languages and deriving sense, context and
meaning from them falls under the field of Natural Language Processing (NLP).
There are FIVE key segments in the application of NLP.
Machine Translation
Algorithmic translation
between natural languages.
Q&A Chatbots
Automation of FAQs and
generic queries.
Info. Retrieval
Search queries and search
engine optimisation.
Info. Extraction
Identification of key markers
and references across content.
Sentiment Analysis
Identification of the opinion or
emotion expressed in text.
Applications
NLP is not a new concept. It has been widely implemented across multiple
applications, with early examples in word processing and, later, in predictive
algorithms such as auto-correct.
Autocomplete helps users with search query and narrative
suggestions.
Google search’s predictive typing helps users through ‘next
word’ recommendations.
Spell checker in your email application saves users from
typing errors (mixed results here).
Spam detection in your mail box separates spam and
phishing mails from regular mail.
Q1.
What are the differences between structured and unstructured data?
Q2. What is the reason behind unstructured data analysis being able to deliver
more competitive advantage?
Q3. What are the major applications of Natural Language Processing for an
organisation like Amazon?
To have a conversation about NLP, we need to have a basic understanding of
the lingo used in the industry…
Tokenisation — Breaking up sentences into individual words.
Corpus / Corpora — A (usually very large) collection of text documents.
Stemming — Extracting the ‘stem’ of a term by removing modifiers.
Bag of Words — A list of words (usually a VERY long list) and their
frequency of occurrence.
Stop Words — Joining words that don’t have meaning on their own.
Word Boundaries — Identifying the start and stop of sentences in audio and
visual recordings.
NLP Terminologies
tf-idf — Short for ‘term frequency-inverse document frequency’. A statistic
used to measure the RELEVANCE of a word.
Term Frequency — How often a word appears in a document or a corpus.
Inverse Document Frequency — A measure of the importance of a word.
Disambiguation — Resolving the meaning of a word that has multiple
meanings.
Topic Model — A statistical representation of abstract topics.
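To make the tf-idf definitions above concrete, here is a minimal sketch using one common textbook formulation: term frequency is the share of a document's tokens that are the term, and inverse document frequency is the natural log of (number of documents / number of documents containing the term). The function name `tf_idf` and this exact weighting are assumptions for illustration; libraries often apply smoothed or otherwise adjusted variants.

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, corpus):
    """Score one term for one document, given a corpus of token lists."""
    if not doc_tokens:
        return 0.0
    # Term frequency: share of this document's tokens that are the term.
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    # Document frequency: how many documents contain the term at all.
    docs_with_term = sum(1 for tokens in corpus if term in tokens)
    if docs_with_term == 0:
        return 0.0
    idf = math.log(len(corpus) / docs_with_term)
    return tf * idf
```

Under this formulation a word that appears in every document scores zero, because it does nothing to distinguish one document from another, while a word concentrated in a few documents scores higher.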
Before we move on, let’s do a quick review. Match the following terms with
their definitions…
Terms:
• word boundaries
• stop words
• topic model
• disambiguation
• tf-idf
• term frequency
• corpus
• bag of words
• stemming
• tokenisation
• inverse document frequency
Definitions:
• break up sentence into words
• collection of text documents
• remove word modifiers
• list of words and their frequencies
• joining words – don’t make sense on their own
• features identifying sentence stop and start
• a statistic that measures word RELEVANCE
• a word’s frequency of appearance
• a statistic that measures word IMPORTANCE
• resolve multiple meanings in a word
• statistical representation of abstract topics
The Twitter API is a software tool provided by Twitter for developers to automate
the collection of Twitter feed data … The Standard API is free.
Step 1: Twitter API
Data requested from Twitter’s public sources are stored in a data frame.
Twitter has premium APIs that give businesses access to real time tweets.
Is this structured
or unstructured
data?
Q1. Stemming removes language modifiers.
E.g. Handling → Handle, Verified → Verify, Steadily → Steady
Why is this necessary?
Q2. Word boundaries.
The chart above refers to an audio recording. The height (amplitude) of the
signal is proportional to the sound level (measured in Decibels).
How can we use this signal sequence to identify word boundaries?
Machines cannot process raw text – some pre-processing is required.
Tokenisation is the process of splitting sentence strings into individual words
called ‘tokens’.
{{ Tokenisation },{ is },{ the },{ process },{ of },{ splitting },{ sentence },
{ strings },{ into },{ individual },{ words },{ called },{ ‘ },{ tokens },{ ’ },{ . }}
A token is a sequence of characters in text that serves as a unit.
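A minimal tokeniser can be sketched with a single regular expression that treats each run of word characters as one token and each punctuation mark as its own token, which reproduces the split shown above. The function name `tokenise` is hypothetical; NLTK offers `nltk.tokenize.word_tokenize` for production use, and its output may differ in detail.

```python
import re


def tokenise(text):
    """Split text into word tokens and single-character punctuation tokens."""
    # \w+ matches a run of letters, digits or underscores;
    # [^\w\s] matches any one character that is neither word nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)
```

Note that this simple pattern would split a hashtag such as #sxsw into two tokens, so tweet-specific tokenisers usually add rules for hashtags, handles and emoticons.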
Words have different forms—for instance, “ran”, “runs”, and “running” are
various forms of the same verb, “run”.
Depending on the requirement of your analysis, all of these versions may need
to be converted to the same form, “run”. Normalisation (lemmatisation) in NLP is the
process of converting a word to its canonical form (its lemma).
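The idea can be sketched with a small lookup table that maps each word form to its canonical form. The table `LEMMA_TABLE` and the function `lemmatise` are hypothetical and cover only the example above; in practice a dictionary-backed lemmatiser such as NLTK's `WordNetLemmatizer` would be used, typically guided by part-of-speech tags.

```python
# Hypothetical lookup table covering only the "run" example from the text.
LEMMA_TABLE = {"ran": "run", "runs": "run", "running": "run"}


def lemmatise(tokens):
    """Lower-case each token and replace it with its lemma when one is known."""
    return [LEMMA_TABLE.get(token.lower(), token.lower()) for token in tokens]
```

After this step "ran", "runs" and "running" are all counted as the same word, which keeps later word-frequency statistics from being split across forms.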
[(‘#FollowFriday’, ‘JJ’),
(‘@France_Inte’, ‘NNP’),
(‘@PKuchly57’, ‘NNP’),
(‘@Milipol_Paris’, ‘NNP’),
(‘for’, ‘IN’),
(‘being’, ‘VBG’),
(‘top’, ‘JJ’),
(‘engaged’, ‘VBN’),
(‘members’, ‘NNS’),
(‘in’, ‘IN’),
(‘my’, ‘PRP$’),
(‘community’, ‘NN’),
(‘this’, ‘DT’),
(‘week’, ‘NN’),
(‘:)’, ‘NN’)]
JJ = Adjective
NNP = Proper Noun, Singular
IN = Preposition or subordinating conjunction
VBG = Verb, gerund or present participle
VBN = Verb, past participle
NNS = Noun, plural
PRP$ = Possessive Pronoun
NN = Noun, singular or mass
DT = Determiner
Lemmatisation – why do we care?
The percent of these language components in text can indicate the type of
communication taking place.
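As a small sketch of how those percentages could be computed, the code below takes the tagged tweet shown earlier as (token, tag) pairs and reports each tag's share of the total. The names `TAGGED_TWEET` and `tag_percentages` are hypothetical, introduced only for this example.

```python
from collections import Counter

# The tagged tweet from the earlier slide, as (token, tag) pairs.
TAGGED_TWEET = [
    ("#FollowFriday", "JJ"), ("@France_Inte", "NNP"), ("@PKuchly57", "NNP"),
    ("@Milipol_Paris", "NNP"), ("for", "IN"), ("being", "VBG"), ("top", "JJ"),
    ("engaged", "VBN"), ("members", "NNS"), ("in", "IN"), ("my", "PRP$"),
    ("community", "NN"), ("this", "DT"), ("week", "NN"), (":)", "NN"),
]


def tag_percentages(tagged_tokens):
    """Return each part-of-speech tag's share of all tokens, in percent."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: 100 * n / total for tag, n in counts.items()}
```

Comparing such profiles across many texts, rather than for a single tweet, is what allows conversational and formal writing to be told apart.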
Corpus data (evidence from large collections of different document types) can
be used to provide statistical insights.
Corpus evidence used in the ‘Longman Grammar of Spoken and Written
English’ provides approximate frequencies, in thousands of occurrences per
million words. This data can be found in the file WK5_EnglishCompositionText.xlsx.
Conversational Constructs
Formal Constructs
Lemmatisation – why do we care?
Data file WK5_EnglishCompositionText.xlsx contains the term frequencies of
English language constructs split across 3 dimensions…
We would expect conversational English to have different term frequencies
than formal / academic English.
WORDS
LEXICAL FUNCTIONAL
CONVERSATION ACADEMIC
Lemmatisation – why do we care?
Insert a new Pivot Table and select range ‘A1:D25’.
Place the Pivot Table in a new worksheet.
Lemmatisation – why do we care?
Place Level 1 in the Filter category.
Place Level 2 in the Columns category.
Place Level 3 in the Rows category.
Place Frequency in the Values category (set calculation to ‘Sum’).
Lemmatisation – why do we care?
Filter for Level 1 should select ‘All’ by default.
Highlight range ‘A4:C16’ and insert a 2D column chart.
Functional & Lexical Combined
By setting the Level 1 filter to ‘All’, data across functional and lexical words are
combined.
Across functional
and lexical
terms, how
would we
determine
whether a text is
academic or
conversational?
Functional Only
By setting the Level 1 filter to ‘Function Words’, we are able to focus on
practical English datasets.
Examining the
functional terms
subset only, how
would we
determine
whether a text is
academic or
conversational?
Lexical Only
By setting the Level 1 filter to ‘Lexical Words’, we are able to focus on
definitional English datasets.
Examining the
lexical terms
subset only, how
would we
determine
whether a text is
academic or
conversational?
Step 4: Noise Reduction
Noise is any part of the text that does not add meaning or information to data.
Noise is specific to each project, so what constitutes noise in one project may
not be in a different project. For instance, the most common words in a
language are called stop words.
Some examples of stop words are “is”, “the”, and “a”. They are generally
irrelevant when processing language, unless a specific use case warrants their
inclusion.
Step 4: Noise Reduction
We will need to remove the following artefacts from our tweets:
• Hyperlinks – All hyperlinks in Twitter are converted to the URL shortener
t.co. Therefore, keeping them in the text processing would not add any
value to the analysis.
• Twitter handles in replies – These Twitter usernames are preceded by a @
symbol, which does not convey any meaning.
• Punctuation and special characters – While these often provide context to
textual data, this context is often difficult to process. For simplicity, you will
remove all punctuation and special characters from tweets.
I added a video to a @YouTube playlist http://t.co/HVVPhSYakA. I’m back
on twitch and today it’s going to be league 🙂 – 1 / 3
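The three removal rules above can be sketched with regular expressions. The function name `remove_noise` and the exact patterns are assumptions for illustration; hyperlinks are removed first so that stripping punctuation does not leave URL fragments behind.

```python
import re


def remove_noise(tweet):
    """Strip hyperlinks, @handles, punctuation and special characters."""
    tweet = re.sub(r"https?://\S+", "", tweet)  # hyperlinks such as t.co links
    tweet = re.sub(r"@\w+", "", tweet)          # Twitter handles
    tweet = re.sub(r"[^\w\s]", "", tweet)       # punctuation, symbols, emoji
    return " ".join(tweet.split())              # collapse leftover whitespace
```

Applied to the sample tweet above, this would drop the @YouTube handle, the t.co link, the apostrophes and the emoji, leaving only plain words and digits.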
Step 5: Word Density
The most basic form of analysis on textual data is to calculate the word
frequency. Sentiment is established by calculating the positive : negative ratio.
A single tweet is too small of an entity to find out the distribution of words,
hence, the analysis of the frequency of words would be done on all tweets
within a category [positive, negative and neutral].
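A minimal sketch of this step, assuming the tweets have already been tokenised, cleaned and labelled: word counts are pooled per category, and the positive : negative ratio is taken over the labels. The names `STOP_WORDS`, `word_frequencies` and `positive_negative_ratio` are hypothetical, and the stop-word set contains only the three examples given earlier.

```python
from collections import Counter

# Only the three stop words named in the text; real lists are much longer.
STOP_WORDS = {"is", "the", "a"}


def word_frequencies(labelled_tweets):
    """Pool word counts per category from (label, tokens) pairs."""
    frequencies = {}
    for label, tokens in labelled_tweets:
        counter = frequencies.setdefault(label, Counter())
        counter.update(token for token in tokens if token not in STOP_WORDS)
    return frequencies


def positive_negative_ratio(labels):
    """Return positive count / negative count, or None if no negatives."""
    counts = Counter(labels)
    if counts["negative"] == 0:
        return None
    return counts["positive"] / counts["negative"]
```

A ratio above 1 would indicate more positive than negative tweets in the sample; the per-category word counts then show which words are driving each side.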
Activity 4
Q1. What is the difference between tokenisation and lemmatisation?
Q2. What type of content is removed in noise reduction?
Q3. How can we use word density to gauge sentiment?
Q4.
1. Find a tweet.
2. Apply noise reduction to remove no-context artefacts.
3. What does the lemmatised string look like?
Next Week
Case Study 1!
We continue with Sentiment Analysis in week 7.