COMP 10261: The FAQ Bot Plus Project
Sam Scott, Mohawk College, January 2022
OVERVIEW
An FAQ Bot answers questions about a particular topic. It is a
conversational interface to a stock set of questions and answers.
When an FAQ Bot receives an utterance, it determines the
user’s intent by matching that utterance to one of its stored
question and answer pairs. If it succeeds in determining intent in
this way, it uses the answer as its response. In the example exchange
reproduced in this handout (from Vajjala et al.’s Practical Natural Language
Processing), the FAQ Bot has determined that the first two
utterances have the same intent and has responded with the
same text in both cases.
If an FAQ Bot fails to determine intent, it usually outputs a
standard message to let the user know that it does not know the
answer. But your FAQ Bot Plus will use linguistic knowledge
from spaCy to get a bit chattier in this case.
This handout brings together all the project requirements for
the final project submission.
PHASE 1: FAQ BOT
In this phase, the goal is to update your Phase 0 FAQ Bot using fuzzy regular expressions to determine a
user’s intent.
1. From Phase 0 (should already be complete): Determine your FAQ Bot’s knowledge domain and
prepare a set of 20 question and answer pairs. One easy way to do this is to find a long
Wikipedia page and copy sections of 1 to 3 sentences as each answer and generate a question
to go with each answer. Make sure you reference all online sources in comments.
2. Generalize by generating at least one more possible question for each answer. Ideally, the new
question should have a different wording, representing another way a user might ask for the
information in the answer.
3. Create a fuzzy regular expression for each answer that is capable of matching key parts of both
possible questions and is tolerant to a limited number of typos in each question.
4. Store questions, answers, and regular expressions in text files.
5. Create a Python program (or modify your Phase 0 FAQ Bot) to load the answers and regular
expressions from files, then allow the user to make utterances. Try to find the best match for
the user’s utterance from your list of regular expressions and output the corresponding answer
as a response. When there are multiple matches, you should have some strategy for
determining which match is better.
6. The bot should also respond to “hello” by greeting the user, and “goodbye” or “quit” by ending
the program. If it fails to match an utterance, the bot should politely let the user know that it
didn’t recognize their question.
[Figure: example FAQ Bot exchange, from Vajjala et al.’s Practical Natural Language Processing]
Utterance: Are there limits to the size of dataset I can use for training?
Response: Amazon Machine Learning can train models on datasets up to 100GB in size.
Utterance: What is the maximum size of training dataset?
Response: Amazon Machine Learning can train models on datasets up to 100GB in size.
Utterance: What algorithm does Amazon Machine Learning use to generate models?
Response: Amazon Machine Learning currently uses an industry standard logistic regression algorithm to generate models.
Test your bot as much as possible. Use the original question, the alternate wordings, and any other
wordings you can think of. If possible, give the bot to a friend or family member to play with and see
how well it works for them. Tweak your regular expressions as necessary to get the best possible
performance.
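As a rough sketch of steps 3 to 5, the code below wraps each stored pattern in a fuzzy quantifier and keeps the match with the fewest typos. It assumes the third-party regex module (the built-in re module has no fuzzy matching) and a made-up file layout with one pattern and one answer per line, separated by a tab. The names load_pairs, fuzzy, pick_best and answer_for are illustrative only, and "fewest errors, then longest match" is just one possible strategy for choosing between multiple matches.

```python
def load_pairs(lines):
    """Parse 'pattern<TAB>answer' lines (a hypothetical file layout).

    Blank lines and lines starting with '#' are skipped.
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        pattern, answer = line.split("\t", 1)
        pairs.append((pattern, answer))
    return pairs


def fuzzy(pattern, max_errors=2):
    """Wrap a pattern so the regex module tolerates up to max_errors typos."""
    return "(?:%s){e<=%d}" % (pattern, max_errors)


def pick_best(candidates):
    """Choose among (answer, error_count, matched_length) candidates.

    The fewest errors wins; ties go to the longest matched span.
    Returns None when there are no candidates.
    """
    if not candidates:
        return None
    return min(candidates, key=lambda c: (c[1], -c[2]))[0]


def answer_for(utterance, pairs, max_errors=2):
    """Return the best answer for an utterance, or None if nothing matches."""
    # Imported here so the helpers above work without the library installed.
    import regex  # third-party: pip install regex

    candidates = []
    for pattern, answer in pairs:
        m = regex.search(fuzzy(pattern, max_errors), utterance,
                         flags=regex.IGNORECASE | regex.BESTMATCH)
        if m:
            # fuzzy_counts is (substitutions, insertions, deletions)
            errors = sum(m.fuzzy_counts)
            candidates.append((answer, errors, m.end() - m.start()))
    return pick_best(candidates)
```

With a stored pattern such as max(imum)? size, a call like answer_for("What is the maxmum size?", pairs) should still find the answer despite the typo. Keep the error limit small relative to the length of the pattern: a short pattern with a generous limit will match almost anything.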
PHASE 2: FAQ BOT PLUS
In this phase, the goal is to make the FAQ Bot a bit chattier or human-like using linguistic knowledge
from the spaCy module. It should still answer the user’s questions as before, but if it fails to figure out a
user’s intent, it should employ a range of strategies to try to craft an appropriate response. This part of the
project is open-ended and creative, but you must make use of the spaCy pattern matcher with parts of
speech and/or lemmas in at least one part of your bot.
NAMED ENTITY RECOGNITION AND NOUN CHUNKS
When the bot doesn’t know what the user is talking about, Named Entity Recognition or even Noun
Chunks could help implement a fallback strategy. Here are some examples:
Utterance: Does the college have a relationship with Twitter?
(spaCy reports that Twitter is an organization – label ORG)
Response: Sorry, I don’t know. I don’t work for Twitter.
Utterance: Does Chicago have any colleges?
(spaCy reports that Chicago is a geo-political entity – label GPE)
Response: Sorry, I don’t know. I’ve never been to Chicago.
Utterance: Where is the general store located?
(spaCy finds the noun chunk “the general store”)
Response: Sorry, I don’t know anything about the general store.
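The three examples above could be produced by something like the following sketch. It assumes a spaCy English pipeline has been loaded elsewhere (for example with spacy.load("en_core_web_sm"), if that model is installed) and is passed in as nlp. The function names and the choice to use the last noun chunk are assumptions made for illustration, not a required design.

```python
def analyze(utterance, nlp):
    """Return (entities, noun_chunks) for an utterance.

    nlp is a loaded spaCy pipeline. Entities are (text, label) pairs and
    noun chunks are plain strings, so the reply logic below does not
    depend on spaCy objects.
    """
    doc = nlp(utterance)
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    chunks = [chunk.text for chunk in doc.noun_chunks]
    return entities, chunks


def fallback_response(entities, chunks):
    """Build an 'I don't know' reply from entities and noun chunks."""
    for text, label in entities:
        if label == "ORG":  # organization
            return f"Sorry, I don't know. I don't work for {text}."
        if label == "GPE":  # geo-political entity
            return f"Sorry, I don't know. I've never been to {text}."
    if chunks:
        # Arbitrary choice: echo back the last noun chunk in the utterance.
        return f"Sorry, I don't know anything about {chunks[-1]}."
    return "Sorry, I don't know the answer to that."
```

A call such as fallback_response(*analyze("Does Chicago have any colleges?", nlp)) ties the two together. What spaCy actually labels as an entity or noun chunk depends on the model, so try it on your own utterances before relying on it.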
SPEECH ACT CLASSIFICATION
To make the bot seem chattier or more human-like when it fails to match a user intent, you could
attempt to classify the speech act of the utterance. You can think of a speech act as a very high-level
intent that indicates what kind of action the user is trying to accomplish with their utterance. For
example, they could be asking a question, giving a command, promising something, agreeing or
disagreeing with the bot, greeting the bot, etc. You might be able to figure this out by developing some
linguistic patterns in spaCy.
If the bot cannot determine the user’s intent using fuzzy regular expressions, it would at least be useful
to figure out if they are asking a question, trying to give you a command, or simply making a statement.
You could respond to questions with “Sorry, I don’t know the answer to that.” Or even “Sorry, I don’t
know about ___” if you can identify some noun phrase that represents what the user is asking about.
You could respond to commands differently: “Sorry, I don’t know how to do that.” Or, if you can figure
out what they want the bot to do, you could say “Sorry, I don’t know how to ___”.
EXAMPLE QUESTIONS
To get you started, here’s a list of questions – see any patterns here?
Do you know anything about Jujitsu?
What is the capital of Albania?
How did you know that?
Where is my phone?
Why won’t you answer my questions?!?!?!
You’re what kind of bot, now?
Do I really have time for this…
(Note: The question marks are obviously a useful clue about whether something is a question or not, but
users will not always type them, and speech recognition systems might not include them when they
transcribe voice to text. Make sure you create patterns that will still work when there is no
punctuation.)
EXAMPLE COMMANDS
And here’s a list of commands…
Give me info about Jujitsu.
Tell me something interesting.
Don’t say “I don’t know” again.
Go get me some useful information.
Make me a cup of coffee.
Drive me to the airport, please.
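One possible reading of the two lists above: many questions contain a wh-word or start with an auxiliary followed by a subject, while many commands start with a base-form verb. The sketch below encodes that guess with the spaCy pattern matcher using parts of speech and tags, so it does not rely on punctuation. The pattern set, the labels and the function names are all hypothetical, and how well they work depends on the tags the loaded model assigns.

```python
WH_TAGS = ["WDT", "WP", "WP$", "WRB"]  # Penn Treebank tags for wh-words

# Hypothetical patterns: label -> list of token patterns for spaCy's Matcher.
PATTERNS = {
    # what, where, why, how, ... anywhere in the utterance
    "WH_WORD": [[{"TAG": {"IN": WH_TAGS}}]],
    # auxiliary followed by a likely subject: "Do you ...", "Is the ..."
    "AUX_FIRST": [[{"POS": "AUX"},
                   {"POS": {"IN": ["PRON", "PROPN", "DET", "NOUN"]}}]],
    # base-form main verb ("Give ...") or negated form ("Don't say ...")
    "IMPERATIVE": [[{"TAG": "VB", "POS": "VERB"}],
                   [{"POS": "AUX"}, {"DEP": "neg"}, {"TAG": "VB"}]],
}


def build_matcher(nlp):
    """Create a Matcher holding the patterns above for a loaded pipeline."""
    from spacy.matcher import Matcher  # imported here; requires spaCy

    matcher = Matcher(nlp.vocab)
    for label, patterns in PATTERNS.items():
        matcher.add(label, patterns)
    return matcher


def labelled_matches(doc, matcher):
    """Run the matcher and return (label, start, end) triples."""
    return [(doc.vocab.strings[match_id], start, end)
            for match_id, start, end in matcher(doc)]


def pick_speech_act(matches):
    """Classify an utterance from its (label, start, end) matches.

    An imperative at token 0 is a command; an auxiliary at token 0 or a
    wh-word anywhere is a question; everything else is a statement.
    """
    at_start = {label for label, start, _ in matches if start == 0}
    anywhere = {label for label, _, _ in matches}
    if "IMPERATIVE" in at_start:
        return "COMMAND"
    if "AUX_FIRST" in at_start or "WH_WORD" in anywhere:
        return "QUESTION"
    return "STATEMENT"


def classify(utterance, nlp, matcher):
    """Return 'QUESTION', 'COMMAND' or 'STATEMENT' for an utterance."""
    return pick_speech_act(labelled_matches(nlp(utterance), matcher))
```

Checking the imperative first means an utterance like “Tell me what you know” is treated as a command even though it contains a wh-word. Run classify over both example lists, with and without punctuation, to see where these patterns fall short.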
OTHER IDEAS
What other things do you think a user might say to your bot? Can you use spaCy patterns to identify
more things you could respond to, or even plant some fun easter eggs for the user to find by saying
something that fits the right pattern? Feel free to implement any other ideas you may have on how to
make the bot chattier using linguistic knowledge. Have fun with it.
PHASE 3: DISCORD
Once the bot is working well in the Python shell, you should repackage it as a Discord bot and include a
link to add the bot to a server. If you want to host your Discord bot on CSUNIX or some other server, go
for it, but it’s not necessary as long as you hand in the code so that the instructor can run it themselves.
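One way to repackage the bot is to keep all of its logic in a single function and wrap that function in two thin interfaces, so the shell and Discord versions share everything except the interface code. The sketch below assumes the discord.py library, version 2.x. The respond function is only a stand-in for your real bot, and the function names and the prompt string are made up for illustration.

```python
def respond(utterance):
    """Return the bot's reply (stand-in for the real FAQ Bot Plus logic)."""
    text = utterance.strip().lower()
    if text == "hello":
        return "Hello! Ask me a question."
    if text in ("goodbye", "quit"):
        return "Goodbye!"
    return "Sorry, I didn't recognize that question."


def run_shell():
    """Talk to the bot in the Python shell until the user says goodbye."""
    while True:
        utterance = input(">>> ")
        print(respond(utterance))
        if utterance.strip().lower() in ("goodbye", "quit"):
            break


def run_discord(token):
    """Run the same bot on Discord using the given bot token."""
    # Imported here so the shell version works without discord.py installed.
    import discord  # third-party: discord.py 2.x assumed

    intents = discord.Intents.default()
    intents.message_content = True  # needed to read message text in 2.x
    client = discord.Client(intents=intents)

    @client.event
    async def on_message(message):
        if message.author == client.user:  # ignore the bot's own messages
            return
        await message.channel.send(respond(message.content))

    client.run(token)
```

Calling run_shell() gives the standalone version and run_discord(token) gives the Discord version. Load the token from a file or environment variable rather than writing it into code that you hand in.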
HANDING IN
You should place all the following into a single project folder, then zip it up and hand it in on Canvas.
1. A folder containing all the code and supporting files for your bot. It should be possible to run the
bot (both Discord and standalone) from this folder using Anaconda Python 3 with spaCy and the
English language models installed.
2. A text file called “phase 1.txt” containing the questions and answers that you used when
developing the FAQ Bot. There should be two questions for each answer, and it should be clear
which answer goes with which questions. I will use the questions in this file when I’m testing
your bot.
3. A text file called “phase 2.txt”. This file should contain any special instructions needed to get the
most out of the “chattier” aspects of your bot. How should we test your bot to see all the cool
stuff you included? Describe what kinds of utterances your bot can respond to and give us some
sample utterances that show your bot behaving at its chatty best.
4. A text file called “phase 3.txt”. This file should contain the link to the Discord version of your bot
along with any special instructions required to talk to it (prefixes, etc.), or any other special
features you want to show off that are unique to this version of the bot.
EVALUATION
Your project will be marked out of 20 using the following rubric.

Phase 1: FAQ Bot (4 points)
Level 4 (100%): Uses regex efficiently and effectively to answer all questions identified by the developer. Uses fuzzy regex efficiently to tolerate a small number of typos.
Level 3 (75%): Uses regex to answer most questions correctly. Offers useful responses to novel questions some of the time. Uses fuzzy regex to tolerate a small number of typos.
Level 2 (50%): Uses regex and/or fuzzy regex to answer some questions correctly.
Level 1 (25%): Correctly answers some questions.

Phase 2: FAQ Bot Plus (4 points)
Level 4 (100%): Uses linguistic pattern matching and other linguistic knowledge to respond appropriately when user intent is unknown. Exhibits a range of responses and echoes back phrases from the utterance in some cases.
Level 3 (75%): Uses linguistic pattern matching or other linguistic knowledge to respond appropriately when user intent is unknown. Exhibits a range of such responses.
Level 2 (50%): Uses linguistic pattern matching or other linguistic knowledge to respond appropriately sometimes when user intent is unknown. Exhibits some range of such responses.
Level 1 (25%): Responds appropriately sometimes when user intent is unknown. Exhibits a limited range of such responses.

Phase 3: Discord (2 points)
Level 4 (100%): Bot can be added to a Discord server and functions as well as the Python shell version.
Level 3 (75%): Bot can be added to a Discord server and functions almost as well as the Python shell version.
Level 2 (50%): Bot can be added to a Discord server and responds to utterances.
Level 1 (25%): Bot can be added to a Discord server.

Code Structure (6 points)
Level 4 (100%): Highly effective and efficient use of regex, fuzzy regex, and spaCy pattern matching. Uses highly modular and well-structured code. Discord and shell versions of the bot are identical other than the interface code.
Level 3 (75%): Effective use of regex, fuzzy regex, and spaCy pattern matching and/or mostly modular code, shared between the two bot versions.
Level 2 (50%): Uses regex, fuzzy regex, and spaCy pattern matching and/or somewhat modular code.
Level 1 (25%): Limited use of regex, fuzzy regex, and spaCy pattern matching and/or limited modular structure.

External Documentation (2 points)
Level 4 (100%): Phase 1, 2, and 3 text files are present and complete. Instructions and test cases are complete enough to coax the best possible behavior from the bot.
Lower level: Some of phase 1, 2, and 3 text files are present and/or the instructions and test cases are somewhat complete.

Internal Documentation (2 points)
Level 4 (100%): Commenting and naming conventions are consistent with the course standards (based on PEP 8 and PEP 257). All files contain a docstring with a description, author information, and links to original sources. All functions contain a docstring with description of behavior, parameters, and return values.
Lower level: Commenting and naming conventions are somewhat consistent with course standards and/or docstrings are missing or incomplete for some files.