Being the good little technophile that I am, I have let Alexa sneak into our family in the guise of an Amazon Echo (or four). My wife claims that Alexa gets more attention than she does. I think that's a bit harsh: it would be unfair to expect Sarah to keep track of three different timers and play Radio 6 Music while I prepare dinner.
At some point in the mid-90s I read a book by Bill Gates in which he described some of the tech in his house. The details are hazy, but I remember a sense of awe that visitors would don a badge that let the house recognise them in different rooms and automatically cater for their personal preferences of temperature, music and so on (I was a geeky child, OK?). It's taken a while for the rest of us to catch up, but that house of the 'future' is no longer the sole preserve of a multi-billionaire. I can happily bark orders at Alexa along the lines of "Alexa, turn on the lounge lights", "Alexa, turn on the heating", "Alexa, play me the best of Showaddywaddy" or "Alexa, how many seconds are there in a year?"
What astounds me more than the range of things Alexa can do is how well she understands me once I've used the "wake word" to call her (or it, whatever; the lines are blurred at this point!). If I ask her to set a pasta timer for thirteen minutes, that's just what I get. Not thirty minutes but thirteen. Like I asked. It's genuinely amazing, and because these machines are constantly learning from the input they receive, they will keep on getting better and better.
That got me thinking about what other applications speech recognition might have. One that relates directly to my industry is transcribing qualitative interviews. Is the technology good enough to start challenging the traditional transcription service?
One of the reasons automated transcription piqued my interest, beyond the arrival of speech recognition in our daily lives, was that I already had access to it through the web conferencing system we used. I could tick a box in my meeting preferences and an automated transcript of the meeting would be provided to me shortly after it ended, at no additional cost. It seemed too good to be true. Is it pie in the sky to think that, now or in the future, we could do without the cost and turnaround times of manual transcription? Before I began any proper evaluation of the quality of automated transcription, I asked one of the transcribers we used how she viewed it. I wondered whether she considered it a threat or a potential opportunity. To my amazement, she told me that she'd never heard of it but had no concerns that it would affect her. Let's see whether I think she's right to be so unconcerned.
Before I get to my own findings, I want to credit a LinkedIn post that I saw recently. It came at an opportune time: I was in the middle of my own evaluations and this blog post was not even a twinkle in my eye. The post is "Automated transcription - are we there yet?" by Saul Dobney. You can find it here.
Saul uses a three-point scale to assess the quality of auto-transcription. It's a good, practical scale (I'd have thrown in a "0" for garbage!).
I don't want to spoil Saul's article if you haven't read it, but he concludes that the current level lies somewhere between points 2 and 3. That is my finding as well, but with a few important caveats.
Saul tested this by uploading a podcast. It is spoken word, but it's a fair assumption that the audio quality was high. The podcast in question was by a wealthy and influential individual, so it wouldn't be a huge leap to assume he used professional recording equipment. Why does that matter? It comes back to the old adage of "garbage in, garbage out". In the real world, do we always have control over the audio quality of the files we want transcribed? And where we don't, can automated speech recognition match human listeners under those conditions?
So while my findings for audio that went directly through our web conferencing system agree with Saul's, the story was quite different when I played in a poorer-quality interview recorded on an external system. Saul acknowledges that his evaluation assumes good-quality audio, but doesn't say whether he tested any recordings of poorer quality. I did, and what came back from some of the systems we tested was garbage or, in the case of our web conferencing provider, an outright refusal to transcribe: just a message saying that transcription was unavailable. If we want to go down this route, that strengthens the case for controlling interview audio quality ourselves, using our own system, which we know delivers audio of the required standard.
The next consideration was how well auto-transcription would cope with the specialised field of healthcare. How would it handle disease areas, drug names and so on? Not too badly, as it turns out. There were some notable comical missteps ("polar bears" repeatedly crept into one transcript), but on the whole it did well. At the very least, the transcripts produced from good-quality recordings were a firm level 2 on Saul's scale, and for the expected purpose that is entirely adequate. The nice thing about the auto-transcripts we received was that they were synced to the audio/video recording of the interview. They weren't word-for-word perfect, but they weren't far off, and being able to search for a word and have the transcript jump to the point in the interview where it was spoken is a useful tool. As things stand, auto-transcription isn't viable as a client deliverable in its own right, but as a value-add I believe it is an interesting option, so long as its limitations are clearly understood. We can make no promises on accuracy, and where an accurate word-for-word transcript is required, it should still be produced through the normal, manual route.
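For the curious, here is a minimal Python sketch of how that search-and-jump feature might work under the hood. It assumes the transcript arrives as a list of timestamped segments; the `Segment` class, the `find_word` and `format_timestamp` helpers, and the sample lines are all hypothetical and are not the format our web conferencing system actually exports.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical structure: one chunk of transcript with its position in the recording.
    start: float  # seconds from the start of the recording
    end: float
    text: str


def find_word(segments, word):
    """Return the start time of every segment containing `word` (whole word, case-insensitive)."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [seg.start for seg in segments if pattern.search(seg.text)]


def format_timestamp(seconds):
    """Convert seconds into an HH:MM:SS string a media player could seek to."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Invented sample transcript, for illustration only.
transcript = [
    Segment(0.0, 4.2, "Thanks for joining us today."),
    Segment(4.2, 9.8, "Which treatments do you usually prescribe first?"),
    Segment(9.8, 15.1, "Mostly metformin, then we review after three months."),
    Segment(15.1, 21.0, "Side effects with metformin are the main concern."),
]

for t in find_word(transcript, "metformin"):
    print(format_timestamp(t))
```

Running it prints `00:00:09` and `00:00:15`, the two points a player could jump to. Matching whole words avoids false hits on substrings, but no amount of searching will find a drug name that the recogniser heard as "polar bears".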
The other option to consider is whether these transcripts could serve as a stepping stone to a fully reviewed transcript. Might transcribers use them in this way, tidying them up as appropriate? How much time would that save, and what impact would it have on cost? That's beyond what I set out to discover, but I daresay the transcriber who had never heard of auto-transcription would be well advised to start paying attention to it.