A. What I am trying to implement.
A web application that performs real-time speech recognition inside a web browser (like this).
B. Technologies I am currently thinking of using to achieve A.
- JavaScript
- Node.js
- WebRTC
- Microsoft Speech API, Pocketsphinx.js, or something else (I cannot use the Web Speech API)
C. Very basic workflow
- Web browser establishes connection to Node server (server acts as a signaling server and also serves static files)
- Web browser acquires audio stream using getUserMedia() and sends user's voice to Node server
- Node server passes audio stream being received to speech recognition engine for analysis
- Speech recognition engine returns result to Node server
- Node server sends text result back to initiating web browser
- (Node server repeats steps 1 to 5 for each request from other browsers)
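To make the workflow above concrete, here is a minimal sketch of how I picture the server side. It keeps one session object per browser connection. `FakeRecognizer`, `createSession`, `onAudio`, `onEnd`, and `sendText` are hypothetical names invented for illustration. The recognizer is a stand-in that only counts bytes, so it is not the API of Pocketsphinx.js or the Microsoft Speech API, and the transport between browser and server is left out entirely.

```javascript
'use strict';

// Hypothetical stand-in for a real engine. It only counts bytes,
// so the sketch runs without any speech library installed.
class FakeRecognizer {
  constructor() {
    this.bytes = 0;
  }
  feed(chunk) {
    this.bytes += chunk.length;
  }
  finish() {
    return `recognized ${this.bytes} bytes of audio`;
  }
}

// One session per browser connection, so clients never share state (step 6).
// Audio chunks come in through onAudio (steps 2 and 3); the text result
// goes back to that browser through sendText (steps 4 and 5).
function createSession(sendText, recognizer = new FakeRecognizer()) {
  return {
    onAudio(chunk) {
      recognizer.feed(chunk);
    },
    onEnd() {
      sendText(recognizer.finish());
    },
  };
}

// Two browsers talking to the server at the same time.
const sessionA = createSession((text) => console.log('to browser A:', text));
const sessionB = createSession((text) => console.log('to browser B:', text));
sessionA.onAudio(Buffer.alloc(4));
sessionB.onAudio(Buffer.alloc(3));
sessionA.onAudio(Buffer.alloc(6));
sessionA.onEnd();
sessionB.onEnd();
```

In a real server, `onAudio` and `onEnd` would presumably be wired to whatever connection carries the audio, and `sendText` would write back over that same connection.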
D. Questions
- Would Node.js be suitable to achieve C?
- How could I pass received audio streams from my Node server to a speech recognition engine running separately from the server?
- Could my speech recognition engine run as another Node application (if I use Pocketsphinx)? In that case, my Node server would communicate with a separate Node speech recognition server.
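To make the second question more concrete, one option I am considering is a plain TCP connection between the two processes, with each audio chunk wrapped in a small frame so the recognition process can tell sessions apart. The sketch below shows only the framing. The 8-byte header layout and the names `encodeFrame` and `createFrameDecoder` are my own assumptions, not part of any library or standard.

```javascript
'use strict';

// Frame layout (an assumption made for this sketch):
// 4 bytes session id | 4 bytes payload length | payload (raw audio bytes)
const HEADER_BYTES = 8;

// Web server side: wrap one audio chunk before writing it to the socket.
function encodeFrame(sessionId, audioChunk) {
  const header = Buffer.alloc(HEADER_BYTES);
  header.writeUInt32BE(sessionId, 0);
  header.writeUInt32BE(audioChunk.length, 4);
  return Buffer.concat([header, audioChunk]);
}

// Recognition side: TCP delivers a byte stream, so frames can arrive
// split or merged. The decoder buffers data and emits only complete frames.
function createFrameDecoder(onFrame) {
  let pending = Buffer.alloc(0);
  return function push(data) {
    pending = Buffer.concat([pending, data]);
    while (pending.length >= HEADER_BYTES) {
      const length = pending.readUInt32BE(4);
      if (pending.length < HEADER_BYTES + length) {
        break; // wait for the rest of this frame
      }
      const sessionId = pending.readUInt32BE(0);
      onFrame(sessionId, pending.subarray(HEADER_BYTES, HEADER_BYTES + length));
      pending = pending.subarray(HEADER_BYTES + length);
    }
  };
}
```

With Node's built-in `net` module, I imagine the web server calling `socket.write(encodeFrame(id, chunk))` on a socket from `net.connect()`, and the recognition process passing each `'data'` event to the decoder. Text results could travel back over the same socket in a similar frame.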