I need someone to create automated lip-reading software for me.
The software should analyse the input from a connected webcam, then, when there is a face in front of the webcam, output the mouth shape to a txt file.
I have listed the 34 mouth positions the software will need to detect in the attached file "[login to view URL]".
There should be a txt file named [login to view URL] as part of the software. It should write the currently detected mouth position to it (such as 'a', 'ee', 'st', etc.). There should only be one mouth position written to this txt file at any time. It should update every 0.2 seconds, replacing the mouth position previously written there.
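To make the update behaviour concrete, here is a minimal Python sketch of the write loop, offered as an illustration only. The file name `mouth_position.txt`, the function names, and the `detect_mouth_position` callback are all assumptions; the real output file name is the one given in the attachment. The file is replaced as a whole so that a program reading it should not see a half-written value. On Windows the rename step can fail if another program holds the file open at that moment, so a retry or a plain overwrite may be needed there.

```python
import os
import tempfile
import time

# Hypothetical file name; the real name is the one given in the attachment.
OUTPUT_PATH = "mouth_position.txt"
UPDATE_INTERVAL = 0.2  # seconds, as specified above


def write_single_value(path, value):
    """Replace the file's contents with exactly one value."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same folder, then swap it into place.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write(value)
    os.replace(tmp_path, path)


def run(detect_mouth_position, path=OUTPUT_PATH, max_updates=None):
    """Call the detector every UPDATE_INTERVAL seconds and overwrite the file.

    detect_mouth_position is a hypothetical callback returning a label
    such as 'a', 'ee' or 'st'. max_updates=None means run until closed.
    """
    updates = 0
    while max_updates is None or updates < max_updates:
        started = time.monotonic()
        write_single_value(path, detect_mouth_position())
        updates += 1
        # Sleep for whatever is left of the 0.2 second slot.
        elapsed = time.monotonic() - started
        time.sleep(max(0.0, UPDATE_INTERVAL - elapsed))
```

Calling `run(my_detector)` with no `max_updates` would keep the file updated until the program is closed, with no button presses needed.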
Obviously, I am aware that there will not be a flawless system which reads every mouth shape correctly all of the time as current technology isn't quite there yet, but it should have a reasonable amount of accuracy.
Extra notes:
There should also be a txt file named face.txt. When there is a face detected by the webcam, the software should write a 1 to it. When there is no face detected, it should write a 0.
The face the software must detect will always be in the approximate centre of the webcam's view. The software should be made in a way where, if a second face appears on the webcam away from the centre, it will not interfere with detecting and analysing the first face.
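As an illustration of the two notes above, the sketch below picks the detected face closest to the centre of the frame and derives the value for face.txt from it. It assumes the face detector returns bounding boxes as `(x, y, width, height)` tuples in pixels. The function names and the `max_offset` tolerance are hypothetical, not part of the requirements.

```python
def pick_centre_face(faces, frame_width, frame_height, max_offset=0.25):
    """Return the (x, y, w, h) box nearest the frame centre, or None.

    max_offset is a hypothetical tolerance: the largest allowed distance
    between the box centre and the frame centre, expressed as a fraction
    of the frame diagonal. Faces further out are ignored.
    """
    cx, cy = frame_width / 2.0, frame_height / 2.0
    diagonal = (frame_width ** 2 + frame_height ** 2) ** 0.5
    best, best_dist = None, None
    for (x, y, w, h) in faces:
        # Distance from this box's centre to the frame centre.
        dist = ((x + w / 2.0 - cx) ** 2 + (y + h / 2.0 - cy) ** 2) ** 0.5
        if dist <= max_offset * diagonal and (best_dist is None or dist < best_dist):
            best, best_dist = (x, y, w, h), dist
    return best


def face_flag(face):
    """Value for face.txt: "1" when a centred face was found, else "0"."""
    return "1" if face is not None else "0"
```

With this approach a second face near the edge of the picture is never chosen while a face remains near the centre, and only the chosen box would be passed on for mouth analysis.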
The purpose of this software is to work passively while other programs check the txt files. It must therefore work automatically, without anyone needing to press buttons while it is in use. I should be able to just start it up, then not need to press anything again until I need to close the program (however, if it’s easier, a start and stop button within the software is permitted, as long as it’s not something which must be pressed frequently while the software is working).
Finally, it is worth mentioning that I will be using this software with ManyCam, so it will need to be compatible with it.
This software can be written in any programming language. I will need the source code in case any alterations need to be made in the future.
The software must be able to run without the need for additional software, such as MATLAB.
The software will run on Windows 7.
Hi Greg, this is the kind of problem that would be rather interesting to work on. I did some initial research and I can handle face detection, together with mouth-area recognition. From there it would require building a list of possible shapes and mapping those shapes to phonemes. However, tracking a person's lip movements in real time and converting them to words as he is talking is a hard problem, especially if you want to handle normal speech (not over-pronounced words) and catch every word.
Also, can you provide a set of faces pronouncing each shape (for building the phoneme-shape mapping)?
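One simple way the shape-to-label mapping mentioned in this reply could be sketched is nearest-prototype matching: each mouth position gets a reference feature vector, and a measured mouth is assigned the label of the closest one. Everything in the example is an assumption made for illustration: the two features (mouth width and mouth opening, each divided by face width), the prototype numbers, and the function name. Real prototypes would have to come from sample footage of each mouth position.

```python
# Hypothetical prototypes: (mouth width, mouth opening), each normalised by
# face width. The numbers are made up for illustration, not measured.
PROTOTYPES = {
    "a": (0.40, 0.30),
    "ee": (0.50, 0.10),
    "st": (0.45, 0.05),
}


def classify_mouth_shape(features, prototypes=PROTOTYPES):
    """Return the label whose prototype is closest to the measured features."""
    def squared_distance(label):
        # Squared Euclidean distance between measurement and prototype.
        return sum((f - p) ** 2 for f, p in zip(features, prototypes[label]))

    return min(prototypes, key=squared_distance)
```

For example, `classify_mouth_shape((0.48, 0.12))` would return `'ee'` with the illustrative numbers above. A real system would likely need more features and smoothing over several frames to reach reasonable accuracy.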
Hi,
I have good knowledge of MATLAB and Simulink. I have been using MATLAB for the last 6 years and have experience with several MATLAB toolboxes, including:
-Communication Toolbox
-Filter Design Toolbox
-Embedded MATLAB Toolbox
-Image Acquisition Toolbox
-Image Processing Toolbox
-Signal Processing Toolbox
-Computer Vision Toolbox
I would also like to mention that I have implemented a face recognition system using a webcam in MATLAB. An image database is built first; an image is then acquired from the webcam and compared with the images in the database, and the person is identified based on that comparison. The core concept used in this algorithm is the eigenvectors and eigenvalues of the images.
I have also implemented text extraction from images for car number plate recognition using MATLAB, which is essentially optical character recognition. I have also implemented recognition of the type, number and color of a playing card using MATLAB.
All the functionalities you have mentioned can be implemented. I can build this application in MATLAB and provide you with the executables, so that you can install it on your computer and it will not need MATLAB.
Can you please provide me with a couple of sample videos showing the location of the face and the background? I can foresee that this application will require training data to detect the pronounced word. Will you provide sample videos for all 34 phonemes?
Looking forward to discussing this in detail with you.
Regards