OCI Speech is a service that uses automatic speech recognition (ASR) to convert speech to text. The service enables developers, business units, content providers, tinkerers, and other users to transcribe audio files. With OCI Speech, users can transcribe call center calls or meetings, generate closed captions, and index and search audio and video content.
You should use OCI Speech if you need a fast, accurate, time-stamped transcription service. If you are using OCI to store your audio files, you will also enjoy lower latencies and no network costs associated with transcription.
Start here to create your first transcription, or read more about the service here.
We currently support file-based asynchronous transcription. We do not offer real-time transcription at this time.
Transcription comes with pretrained models for the following languages: English, Spanish, and Portuguese.
No. We only transcribe your content and keep no information from the file.
Like any other transcription service, the quality of the output depends on the quality of the input audio file. Speakers' accents, background noise, switching between languages, using fusion languages (such as Spanglish), and multiple people speaking simultaneously can all affect the quality of transcription. We are constantly working to improve the performance of the service to provide more accurate transcriptions for all inputs and speakers.
Not currently (but soon).
We support single-channel, 16-bit PCM WAV audio files with a 16 kHz sample rate. We recommend Audacity (GUI) or ffmpeg (command line) for audio transcoding. Additional audio formats are coming soon.
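As an illustration, the format requirements above can be checked locally before uploading a file. The following is a minimal sketch using Python's standard wave module; the function name check_wav is hypothetical and is not part of OCI Speech or any OCI SDK.

```python
import wave


def check_wav(path):
    """Return a list of problems; an empty list means the file matches
    the stated format (single-channel, 16-bit PCM WAV at 16 kHz).

    Hypothetical helper for illustration only, not an OCI API.
    """
    problems = []
    # wave.open raises wave.Error if the file is not an uncompressed PCM WAV.
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected 1 channel, found {wav.getnchannels()}")
        if wav.getsampwidth() != 2:  # sample width in bytes; 2 bytes = 16 bits
            problems.append(f"expected 16-bit samples, found {wav.getsampwidth() * 8}-bit")
        if wav.getframerate() != 16000:
            problems.append(f"expected 16000 Hz, found {wav.getframerate()} Hz")
    return problems
```

Calling check_wav("recording.wav") on a conforming file returns an empty list; otherwise each entry describes one mismatch that transcoding would need to fix.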
We support JSON (the default) and SRT (an optional format at no additional cost).
We use precision billing: we charge $0.50 for every hour of transcription, but we measure aggregated usage in seconds. For example, if you uploaded three files with durations of 3,600 seconds, 4,575 seconds, and 1,421 seconds, your monthly bill will be calculated as the sum of your seconds (9,596) divided by 3,600 (the number of seconds in an hour), multiplied by $0.50. In other words, you will be charged about $1.33: 9,596/3,600 x $0.50 ≈ $1.333.
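The arithmetic in the example above can be reproduced with a short sketch. The function and constant names are hypothetical, the rate is the $0.50 per transcription hour quoted above, and the sketch deliberately ignores any free monthly allowance and any rounding the actual invoice may apply.

```python
RATE_PER_HOUR = 0.50      # USD per transcription hour, as quoted above
SECONDS_PER_HOUR = 3600


def monthly_charge(durations_seconds, rate_per_hour=RATE_PER_HOUR):
    """Sum usage in seconds, convert to hours, and multiply by the hourly rate.

    Illustrative only: no free allowance or invoice rounding is applied.
    """
    total_seconds = sum(durations_seconds)
    return total_seconds / SECONDS_PER_HOUR * rate_per_hour


# The three files from the example: 9,596 seconds in total.
charge = monthly_charge([3600, 4575, 1421])
print(f"${charge:.3f}")
```

Because usage is aggregated in seconds before conversion to hours, three short files cost the same as one file of the combined length.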
We named our billable metric “transcription hour.” Transcription hour measures the number of audio hours transcribed during a given month of the service.
No. OCI Speech does not have any setup charges or minimum service commitments. And there’s no hardware required.
Yes. We offer five hours of free transcription every month per tenancy.
Punctuation, like SRT output, is provided at no additional cost. Storing SRT files may increase your storage fees.
Speech works with any recording device, and is not device-specific.
We recommend using the ffmpeg utility with the following command: ffmpeg -i <input.ext> -fflags +bitexact -acodec pcm_s16le -ac 1 -ar 16000 <output.wav>.
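For batch conversion, the command above can be wrapped in a script. This is a minimal sketch that assumes ffmpeg is installed and on the PATH; the function names build_ffmpeg_command and convert_to_wav are hypothetical, and the flags are exactly those in the command above.

```python
import subprocess


def build_ffmpeg_command(input_path, output_path):
    """Build the argument list for the ffmpeg command shown above."""
    return [
        "ffmpeg",
        "-i", input_path,        # source audio in any format ffmpeg can read
        "-fflags", "+bitexact",
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        "-ac", "1",              # one channel (mono)
        "-ar", "16000",          # 16 kHz sample rate
        output_path,
    ]


def convert_to_wav(input_path, output_path):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(input_path, output_path), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.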
See the Speech Policy Setup.