And just a couple of thoughts:
1. Verification: A preview of the first few dozen lines of the transcript would let the user verify that the transcription worked as expected.
2. Index and search: The user could search the transcript for a particular word or sentence to locate the segment of the video they are interested in.
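To make the two ideas above concrete, here is a rough Python sketch. It assumes the transcript is available as a list of timestamped segments; the `Segment` type, the function names, and the sample data are all made up for illustration and are not taken from any existing tool.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the video
    text: str


def format_ts(seconds):
    """Format seconds as M:SS, or H:MM:SS for videos longer than an hour."""
    total = int(seconds)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}"


def preview(segments, max_lines=24):
    """Return the first few transcript lines for a quick sanity check."""
    return [f"[{format_ts(s.start)}] {s.text}" for s in segments[:max_lines]]


def search(segments, query):
    """Case-insensitive substring search over the transcript segments."""
    needle = query.casefold()
    return [s for s in segments if needle in s.text.casefold()]


# Made-up sample data, for illustration only.
transcript = [
    Segment(0.0, "Welcome to the discussion."),
    Segment(12.5, "Today we talk about sleep and memory."),
    Segment(95.0, "Memory consolidation happens during deep sleep."),
]

for line in preview(transcript, max_lines=2):
    print(line)

for hit in search(transcript, "memory"):
    print(format_ts(hit.start), hit.text)
```

A plain substring match is the simplest thing that could work here. A real implementation might prefer a proper full-text index, but the user-facing behaviour (type a word, get back timestamps) would be the same.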
Use case:
I’m interested in adding a YouTube video to knowledge and then interacting with an agent to discuss its subjects in depth. I want the agent to use the transcript to cite the segment of the video where a particular point is mentioned.
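As a sketch of what such a citation could look like: YouTube watch URLs accept a `t` parameter that starts playback at a given second, so a citation can link straight to the segment. The function names and the `VIDEO_ID` placeholder below are hypothetical, just to show the shape of it.

```python
from urllib.parse import urlencode


def cite_url(video_id, start_seconds):
    """Build a YouTube watch URL that starts playback at the given second."""
    query = urlencode({"v": video_id, "t": f"{int(start_seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"


def cite(video_id, start_seconds, quote):
    """Format a citation line that an agent could attach to its answer."""
    return f'"{quote}" ({cite_url(video_id, start_seconds)})'


# "VIDEO_ID" is a placeholder, not a real video.
print(cite("VIDEO_ID", 95.0, "Memory consolidation happens during deep sleep."))
```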
The application of this feature would be the integration of external knowledge sources (e.g. expert discussions on various subjects) into a user’s proprietary knowledge.
I would prefer to have it provided as a tool (I could create a command prompt myself if I wanted to), so I can easily create a transcription with a minimal number of taps on the phone.
I must be able to do this within seconds of watching a YouTube video, even while I’m in a conversation with someone, so that the person (or my child) doesn’t think I’m not paying attention. That is the real-life constraint!