PROJECT
DESCRIPTION
You need to write a Java program that takes an HTML document file in English as input and produces a text document as output. The output document contains only the text that a web browser would display when reading the file, without the presentation details specified by the markup language.
· Your program needs to extract titles, section headers, paragraphs, and tables.
· The output of the program will be written to a text file containing the text without any formatting.
· Each table row will be rendered as a sequence of values separated by commas.
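One possible way to drop markup without regular expressions is to scan the input one character at a time and keep only the characters that fall outside a tag. The sketch below is illustrative only: the class name TagStripper and the method stripTags are hypothetical, it assumes well-formed markup with no '<' or '>' inside attribute values, and it assumes StringBuilder is acceptable under the restrictions (plain String concatenation would work the same way).

```java
// Hypothetical helper; names are illustrative, not part of the assignment.
class TagStripper {

    /**
     * Removes everything between '<' and '>' by scanning character by character.
     * Assumption: the markup is well formed and attribute values contain no '<' or '>'.
     *
     * @param html a fragment of HTML text (input)
     * @return the same text with all tags removed (output)
     */
    static String stripTags(String html) {
        StringBuilder text = new StringBuilder();
        boolean insideTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                insideTag = true;       // a tag starts here; stop copying
            } else if (c == '>') {
                insideTag = false;      // the tag ended; resume copying
            } else if (!insideTag) {
                text.append(c);         // ordinary text outside any tag
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<h1>Matador Song</h1>"));
    }
}
```

This sketch does not decide which elements to keep, add line breaks between paragraphs, or handle tables; those parts still need their own logic.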
HTML in Browser | Text file contains
Matador Song
Fight, Matadors, for Tech! Songs of love we’ll sing to thee, Bear our banners far and wide.
Ever to be our pride, Fearless champions ever be.
Stand on heights of victory.
Strive for honor evermore.
Long live the Matadors!
Music by Harry Lemaire, words by R.C. Marshall
HTML in Browser | Text file contains
Our Customers
Company, Contact, Country
Alfreds Futterkiste, Maria Anders, Germany
Centro comercial Moctezuma, Francisco Chang, Mexico
Ernst Handel, Roland Mendel, Austria
Island Trading, Helen Bennett, UK
Laughing Bacchus Winecellars, Yoshi Tannamuri, Canada
Magazzini Alimentari Riuniti, Giovanni Rovelli, Italy
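The comma-separated rows in the example above could be produced along these lines. This is a minimal sketch under stated assumptions, not a required design: the class TableRowRenderer and the method rowToLine are hypothetical names, tag names are assumed to be lowercase, the input is assumed to be a single table row, and each cell is assumed to contain plain text with no nested tags. It uses only String.indexOf and substring, so no regular expressions are involved.

```java
// Hypothetical helper; names are illustrative, not part of the assignment.
class TableRowRenderer {

    /**
     * Turns one table row into a line of comma-separated cell values.
     * Assumptions: lowercase tag names, one <tr> per call, plain-text cells.
     *
     * @param row the HTML of a single table row (input)
     * @return the cell values joined by ", " (output)
     */
    static String rowToLine(String row) {
        StringBuilder line = new StringBuilder();
        int pos = 0;
        while (true) {
            // Find the next cell, whether it is a data cell or a header cell.
            int td = row.indexOf("<td", pos);
            int th = row.indexOf("<th", pos);
            int open;
            if (td < 0) {
                open = th;
            } else if (th < 0) {
                open = td;
            } else {
                open = Math.min(td, th);
            }
            if (open < 0) {
                break;                          // no more cells in this row
            }
            int start = row.indexOf('>', open); // end of the opening tag
            if (start < 0) {
                break;                          // malformed row; stop here
            }
            int end = row.indexOf("</t", start); // matches </td>, </th>, </tr>
            if (end < 0) {
                end = row.length();
            }
            if (line.length() > 0) {
                line.append(", ");
            }
            line.append(row.substring(start + 1, end).trim());
            pos = end;
        }
        return line.toString();
    }

    public static void main(String[] args) {
        System.out.println(rowToLine(
            "<tr><td>Island Trading</td><td>Helen Bennett</td><td>UK</td></tr>"));
    }
}
```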
RESTRICTIONS
To solve this project you can use arrays, files, Strings, and any number of methods you need. You can use one data structure, such as a linked list. You cannot use prebuilt or downloaded classes that remove markup, and you cannot use regular expressions. The purpose of this project is for you to get familiar with coding in Java. You are not required to use multiple classes; future projects will provide practice for that once we cover more material and concepts.
SUBMISSION
Your solution should include the following documentation; missing elements will result in lost points.
· Each file needs to include a header containing the following:
    · Course, Section and Semester
    · Your Name
    · The title “Project 1”
· Each method or function should include a description of its purpose.
    · The description should explain its inputs (parameters) and outputs (return values).
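As an illustration of the documentation items listed above, a file header and a method description might look like the sketch below. The class name Project1, the method countChar, and the exact layout of the comments are assumptions made for the example; the header fields are left as placeholders to fill in.

```java
/*
 * Course, Section and Semester: (fill in)
 * Name: (fill in)
 * Project 1
 */
class Project1 {

    /**
     * Counts how many times a character occurs in a line of text.
     *
     * @param line   the text to scan (input)
     * @param target the character to count (input)
     * @return the number of times target appears in line (output)
     */
    static int countChar(String line, char target) {
        int count = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }
}
```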