
software development: transcribe speech to text using public APIs

$250-750 USD

Cancelled
Posted more than 7 years ago

Paid on delivery
1. Task

You are asked to write a program that:
- takes an audio file as input,
- chops it into clips at sentence boundaries,
- sends these audio clips one by one to three different public speech recognition services, and
- saves the audio clips, together with the text transcribed by those services, into a MySQL database with the columns timestamp, length, audio clip, Google, Baidu, iFlyTek, flag, where:
  - timestamp (4-byte time): starting time of the audio clip in the original audio file
  - length (4-byte integer): length of the audio clip in milliseconds
  - audio clip (binary): 16-bit, 16 kHz, single-channel PCM
  - Google transcription (text): UTF-8
  - Baidu transcription (text): UTF-8
  - iFlyTek transcription (text): UTF-8
  - flag (integer): 0 if all three transcriptions are the same, 1 if two match, 2 if all are different.

2. Audio Source

The audio could be in mp3/m4a/aac/ogg/wma format. It is extracted from YouTube videos; our target is educational lectures. One example is this YouTube video: [login to view URL]. You can extract its audio with [login to view URL]; the downloadable mp3 result is at [login to view URL]. You can use this for any YouTube content.

3. Audio Segmentation

If you view an audio file with a waveform tool (there are many), you will visually see the separation between silences and voices. Some silences are merely word boundaries or even just syllable boundaries. The rule we ask you to implement is: cut where either the silence is long enough, or the "sentence" is already 7 seconds long. In the latter case, we need to chop at the locally longest silence gap. I expect sentence-boundary identification to be the most challenging part for those not familiar with audio signal processing, so I have outlined the logic above. Still, the next question is: how do you actually calculate "silence"?
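The posting leaves the silence measure to the methods on its linked page. Purely as an illustration, here is a minimal Python sketch of one common technique, short-time RMS energy thresholding, combined with the two cutting rules above. Every number in it is an assumption, not part of the specification: 20 ms frames, an RMS threshold of 500 on the 16-bit sample scale, 300 ms as "long enough", and cutting at the middle of a gap. The names `frame_energies`, `silence_runs`, and `segment` are hypothetical. The posting expects this logic inside an Android APK; Python is used here only to keep the sketch short.

```python
import math

SAMPLE_RATE = 16_000                         # 16 kHz, as required for the stored PCM
FRAME_MS = 20                                # assumption: 20 ms analysis frames
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000   # 320 samples per frame


def frame_energies(samples):
    """RMS energy of each non-overlapping frame; a trailing partial frame is dropped."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + FRAME_LEN]) / FRAME_LEN)
        for i in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN)
    ]


def silence_runs(energies, threshold):
    """List (first_frame, n_frames) for every run of frames whose RMS is below threshold."""
    runs, start = [], None
    for i, e in enumerate(energies):
        if e < threshold and start is None:
            start = i
        elif e >= threshold and start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(energies) - start))
    return runs


def _mid(run):
    """Frame index in the middle of a silence run (assumed cut point)."""
    return run[0] + run[1] // 2


def segment(samples, threshold=500.0, min_silence_ms=300, max_clip_ms=7000):
    """Return (start_ms, end_ms) clips: cut at long silences, or at the 7 s cap."""
    energies = frame_energies(samples)
    n = len(energies)
    min_sil = min_silence_ms // FRAME_MS
    max_len = max_clip_ms // FRAME_MS
    cuts, clip_start, pending = [], 0, []    # pending = short gaps in the current clip

    def force_cut():
        # Clip reached the cap: cut at the longest short gap inside it, else hard-cut at the cap.
        nonlocal clip_start, pending
        inside = [r for r in pending if clip_start < _mid(r) <= clip_start + max_len]
        cut = _mid(max(inside, key=lambda r: r[1])) if inside else clip_start + max_len
        cuts.append(cut)
        clip_start = cut
        pending = [r for r in pending if _mid(r) > cut]

    for run in silence_runs(energies, threshold):
        while run[0] - clip_start > max_len:  # speech ran past the cap before this gap
            force_cut()
        if run[1] >= min_sil:                 # long enough: treat as a sentence boundary
            clip_start, pending = _mid(run), []
            cuts.append(clip_start)
        else:                                 # short gap: keep as a fallback cut point
            pending.append(run)
    while n - clip_start > max_len:           # the tail may also exceed the cap
        force_cut()

    bounds = [0] + cuts + [n]
    return [(a * FRAME_MS, b * FRAME_MS) for a, b in zip(bounds, bounds[1:]) if b > a]
```

For example, on a synthetic 10 s signal with only a 100 ms gap at 3 s and a 200 ms gap at 5 s, `segment` returns `[(0, 5100), (5100, 10000)]`: neither gap reaches 300 ms, so the 7 s cap forces a cut at the longer of the two. The `timestamp` and `length` columns then follow as `start_ms` and `end_ms - start_ms`. A fixed threshold is the weakest assumption here; a real recording would likely need it set relative to its own noise floor.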
Please follow the methods listed on this page: [login to view URL]. As one of the acceptance criteria for this project, we will randomly select 50 audio clips (using a random number generator on the Internet), listen to them, and confirm that the sentence-boundary error rate is less than 5%.

4. Speech Recognition

The three speech recognition engines are:
- Google: [login to view URL]
- Baidu: a Python wrapper for the Baidu Yuyin API [login to view URL] [login to view URL]
- iFlyTek (Xunfei): Integrate iflytek SDK to Implement Chinese Voice Recognition in AOSP [login to view URL]

Note that integration with all three speech recognition engines is required. That is, you need to do three integrations, each with its own complexities, such as applying for a free account and receiving tokens/keys. For both Baidu and iFlyTek, you are encouraged to use Google Translate, as much of the content is in Chinese.

Both Google and Baidu are simple REST APIs, which allows you to implement them on essentially any platform and in any language. The iFlyTek API, however, is really an SDK, and the best example I found is the Android version given above. Taken together, your only choice is an Android application.

5. Implementation

We are open to suggestions, but given the above, we expect a pure Android APK implementation. I will first push/copy several extracted/converted audio files onto an Android phone or tablet, then run your Android APK and get the results in a corresponding set of files, either as a MySQL database or simply in CSV format. I will then pull/copy these files back to my computer. You must provide a way for me to go to a random clip, play its audio, read the transcribed text, place it into, say, the Google web service, and see the results.
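The `flag` column from section 1 reduces to counting distinct transcripts. The sketch below assumes that "the same" means equal after case-folding and removing whitespace and punctuation, since the posting does not define it. For the CSV variant mentioned in section 5 it also assumes each audio clip is saved as a separate PCM file referenced by path, rather than stored inline as binary. `normalize`, `agreement_flag`, and `to_csv` are hypothetical names.

```python
import csv
import io
import unicodedata


def normalize(text):
    """Assumed comparison rule: case-fold, then drop whitespace and Unicode punctuation (P*)."""
    return "".join(
        ch for ch in text.casefold()
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def agreement_flag(google, baidu, iflytek):
    """0 if all three transcripts match, 1 if exactly two match, 2 if all differ."""
    return len({normalize(google), normalize(baidu), normalize(iflytek)}) - 1


def to_csv(rows):
    """Render rows of (timestamp_ms, length_ms, clip_path, google, baidu, iflytek) as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["timestamp", "length", "audio_clip", "google", "baidu", "iflytek", "flag"])
    for timestamp_ms, length_ms, clip_path, google, baidu, iflytek in rows:
        writer.writerow([timestamp_ms, length_ms, clip_path, google, baidu, iflytek,
                         agreement_flag(google, baidu, iflytek)])
    return buf.getvalue()
```

Counting distinct normalized strings gives the mapping directly: one distinct value means all three agree (0), two means exactly two match (1), and three means all differ (2). Because Unicode punctuation categories are used, full-width Chinese punctuation such as `，` and `。` is ignored as well, so `"你好，世界。"` and `"你好 世界"` compare equal.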
Project ID: 11391108

About the project

15 proposals
Remote project
Active 8 years ago

15 freelancers are bidding an average of $582 USD for this job
I want to discuss this project with you further; let me know the most suitable time for you to schedule a meeting. Feel free to message me at any time. I am usually online 14 hours a day on this website, so you will probably get a quick response from me.
$773 USD in 10 days
5.0 (11 reviews)
6.5
Do you have any API in mind to implement?
$555 USD in 5 days
4.7 (8 reviews)
4.6
I am a person with strong analytical ability in mathematics, statistics, economics, and finance, holding a BSc (specialized in statistics), an MBA (specialized in finance), and an MSc (specialized in financial mathematics). On-time delivery, clear communication, and hard work are my key attributes. I have performed a number of data analyses for qualitative and quantitative research: correlation analysis (correlation test / cross-tabulation / chi-square test / Granger causality test), variance analysis (ANOVA/MANOVA), regression analysis (simple linear regression / logistic regression / multiple regression), time series analysis (ARIMA), econometric analysis (VAR), experimental design (DOE), and factor analysis (CFA/EFA). I am familiar with SPSS, Stata, MINITAB, EViews, LISREL, and EQS.
$250 USD in 10 days
5.0 (11 reviews)
4.0
I'm interested, but there is no project description, so I don't know what to write here. Message me back with info if you're up for it. Cheers, Alek
$555 USD in 10 days
5.0 (1 review)
3.0
Hello, professional developers with similar expertise here. We are posting our bid as an expression of interest and would appreciate further discussion on the private message board. We are waiting for your message so that we can provide you with a detailed proposal, including pricing and timeline.
$526 USD in 10 days
4.4 (1 review)
2.4
A proposal has not yet been provided
$500 USD in 8 days
0.0 (0 reviews)
2.4

About this client

Cupertino, United States
5.0
1
Payment method verified
Member since Aug 29, 2016

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)