For our final-year undergraduate project, my team and I wanted to build software that would simplify creating webpages for non-technical users. We decided to use a natural language interface, so that a user could communicate with the computer easily and create webpages without needing much technical expertise.
Scoping the project was essential, because even a simple HTML webpage can involve a large number of tags. On top of that, we wanted to incorporate CSS styling as well. Limiting the tags to a feasible number and planning the project over the course of 8 months was therefore a challenge. Though our main aim was code automation, we knew that working with voice recognition wasn't going to be easy because of the accuracy issues that would eventually crop up. Natural language processing was also a first for my entire team, so that was another aspect we had to be wary of.
With the long project duration in mind, we started our research early, particularly on implementing accurate voice recognition and natural language processing.
After researching the available techniques for the various stages of the project, we sketched out how the system should behave. The following is the gist of the proposed system:
We studied a few voice recognition techniques and started with the Sphinx4 library for voice recognition in Java. We soon realized, however, that with offline methods it is hard to achieve decent accuracy and to handle different accents. To focus more on the code automation part, we switched to the Google Speech API for voice-to-text recognition. It not only had better accuracy, but also made the whole text processing part a lot easier to manage. The following are the implementation details: