Sagar Vikmani | Work | Voice Driven Dynamic Generation of Webpages

Voice Driven Dynamic Generation of Webpages

Problem Description

For our final-year undergraduate project, my team and I wanted to build software that would simplify creating webpages for non-technical users. We decided to use a natural language interface so that a user could simply tell the computer what to build and create webpages without needing much technical expertise.

Challenges and constraints

Scoping the project was essential, because even a simple HTML webpage can involve a large number of tags, and we wanted to incorporate CSS styling as well. Limiting the tags to a feasible number and planning the project over the course of 8 months was therefore a challenge. Although our main aim was code automation, we knew that working with voice recognition would not be easy because of the accuracy issues that would eventually crop up. Natural language processing was also a first for my entire team, so that was another aspect we had to be wary of.

With the long project duration in mind, we started our research early, particularly on implementing accurate voice recognition and natural language processing.

Project Description

Type: Group project
Role: Software Developer
Course: Final year B.E. project
Duration: 8 months
Tools used: Java, JavaFX, Google Speech API

Proposed Solution

After researching the available techniques for each stage of the project, we outlined how the system should behave. The following is the gist of the proposed system:

  • Text conversion: The speech of the user captured by the system should simultaneously be converted to text for faster and better processing.
  • Text pre-processing: The text converted from speech should then be pre-processed to remove redundant phrases. Dependencies would be established between words to better represent the knowledge in the natural language text.
  • Knowledge Extraction: This step would extract the important data from the text. This should be the basic understanding module.
  • Code generation: This step would use the extracted information as input. The extracted knowledge should be stored in a file, and the data in this file should then be used to generate HTML/CSS code.
Here are the proposed use case and activity flow diagrams from the beginning phase of the project:

Final System

We studied a few voice recognition techniques and started with the Sphinx4 library for voice recognition in Java. We soon realized, however, that it is hard to achieve decent accuracy with offline methods and to get them to handle different accents. To focus more on the code automation part, we switched to the Google Speech API for speech-to-text conversion. It not only had better accuracy, but also made the whole text processing part a lot easier to manage. The following are the implementation details:

  • Speech Acquisition: The user's speech is treated as a command for the system. As the user starts speaking, the speech is acquired via a microphone.
  • Text conversion: The speech captured by the system is converted to text as the user speaks, for faster and better processing. The system uses the online Google Speech API: the spoken commands are recorded into a sound (.wav) file, which is sent over the Internet to Google, and Google responds with the corresponding text for the speech file. The converted phrase is also displayed to the user to confirm that the conversion is accurate; in case of anomalies, the user can either edit the text or speak again.
  • Text processing: The text converted from speech is then pre-processed by a tagger built on a specifically designed word tagging system. Here, every word in the given phrase is associated with a tag similar to a ‘part-of-speech’ tag, which is fetched from a predefined dictionary built for this purpose.
  • Knowledge Extraction: In this step, the system extracts the important data from the text. This is the basic understanding module, where the system interprets the user's requirements from the processed text. Its input is the output of the previous text processing stage. This phase searches the given command for specific tags, which lets the program extract knowledge efficiently. For example, a verb specifies what action to perform and a noun specifies what component to draw, which enables the system to identify the necessary action.
  • Code generation: This step uses the extracted information as input. The extracted knowledge is stored in several data structures, and the data in these structures is then used to generate HTML/CSS code. When the user gives the command, the generated code is saved with an ‘.html’ extension, with the styling implemented as inline/internal CSS. The user is free to view the code and make manual changes to it.
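The text processing, knowledge extraction and code generation steps above can be sketched in Java roughly as follows. This is a minimal illustration, not the project's actual code: the class name `VoicePageSketch`, the tag set, the dictionary entries and the word-to-element mapping are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: every name and dictionary entry here is illustrative.
class VoicePageSketch {

    // Tags loosely analogous to part-of-speech tags (assumed tag set).
    enum Tag { ACTION, COMPONENT, COLOR, OTHER }

    // Predefined dictionary: known word -> tag (assumed entries).
    static final Map<String, Tag> DICTIONARY = new HashMap<>();
    // Spoken component name -> HTML element (assumed mapping).
    static final Map<String, String> ELEMENTS = new HashMap<>();

    static {
        DICTIONARY.put("add", Tag.ACTION);
        DICTIONARY.put("create", Tag.ACTION);
        DICTIONARY.put("heading", Tag.COMPONENT);
        DICTIONARY.put("paragraph", Tag.COMPONENT);
        DICTIONARY.put("button", Tag.COMPONENT);
        DICTIONARY.put("red", Tag.COLOR);
        DICTIONARY.put("blue", Tag.COLOR);
        ELEMENTS.put("heading", "h1");
        ELEMENTS.put("paragraph", "p");
        ELEMENTS.put("button", "button");
    }

    static final class TaggedWord {
        final String word;
        final Tag tag;
        TaggedWord(String word, Tag tag) { this.word = word; this.tag = tag; }
    }

    static final class Command {
        final String action;
        final String component;
        final String color; // null when no colour was spoken
        Command(String action, String component, String color) {
            this.action = action; this.component = component; this.color = color;
        }
    }

    // Text processing: look each word up in the dictionary; unknown words get OTHER.
    static List<TaggedWord> tag(String phrase) {
        List<TaggedWord> tagged = new ArrayList<>();
        for (String word : phrase.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            tagged.add(new TaggedWord(word, DICTIONARY.getOrDefault(word, Tag.OTHER)));
        }
        return tagged;
    }

    // Knowledge extraction: keep the first word found for each tag of interest.
    static Command extract(List<TaggedWord> tagged) {
        String action = null, component = null, color = null;
        for (TaggedWord tw : tagged) {
            if (tw.tag == Tag.ACTION && action == null) action = tw.word;
            if (tw.tag == Tag.COMPONENT && component == null) component = tw.word;
            if (tw.tag == Tag.COLOR && color == null) color = tw.word;
        }
        if (action == null || component == null) {
            throw new IllegalArgumentException("Command needs an action and a component");
        }
        return new Command(action, component, color);
    }

    // Code generation: emit one element with optional inline CSS.
    // The component name doubles as placeholder text; both known actions mean "insert".
    static String generateHtml(Command c) {
        String element = ELEMENTS.get(c.component);
        String style = c.color == null ? "" : " style=\"color: " + c.color + ";\"";
        return "<" + element + style + ">" + c.component + "</" + element + ">";
    }

    public static void main(String[] args) {
        // prints <h1 style="color: red;">heading</h1>
        System.out.println(generateHtml(extract(tag("Add a red heading"))));
    }
}
```

Keeping the three stages as separate methods mirrors the pipeline described above: each stage only consumes the output of the previous one, so the dictionary can grow without touching the generator.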
The final system architecture and snapshots of the running system:

What I learned

  • Developing interfaces with JavaFX
  • Natural Language Processing techniques
  • Strategizing long-term projects