Guest article from Navigant: Let’s talk about Voice

I recently recorded a video with Tanya Gross of Navigant in London called The technology changes which allow proportionate disclosure of audio data. You can find it here and at the foot of this page. At the same time, Tanya Gross and Alex Dunstan-Lee of Navigant wrote an article which is a useful survey of how audio technology has developed in step with the volume, and the importance, of audio as disclosable evidence. I give it here:

Let’s talk about Voice

A structured and methodical approach to audio analysis and review in e-discovery by Tanya Gross and Alex Dunstan-Lee of Navigant in London

Over the last 5 years, the need to gather and review audio data (voice recordings) as part of an e-discovery exercise has increased dramatically. Technology capable of helping lawyers search and review audio data is evolving from obscurity into the mainstream.

This evolution has been driven by two key factors: first, legal requirements (predominantly in the UK and US) stipulating that certain recordings must be made and kept for a certain time (most specifically in relation to telephone conversations connected to the trading of financial instruments); and, second, the increase in financial services investigations over that period, which has called for this data to be discovered.

However, this evolution is nascent. Parallels might be drawn with the evolution of tools designed to handle e-mail review. The evolution of audio review tools is perhaps 10 years behind that of traditional review tools. Ten years ago, technology did exist to support large-scale e-mail review, but the ease of use, evidential integrity, search functionality and reliability of those tools were far from what can be found today. Even more importantly, people with experience and a deep understanding of the tools themselves were few. As with those early review tools, the audio technology needs to be treated with caution – it can be highly effective, but it needs to be in the right hands.

We were recently asked by a lawyer: ‘so what’s the best audio technology?’ Our answer was (predictably) that ‘it depends’ – but the question itself betrays a common misconception (typically held by clients who are new to e-discovery) that success hinges on the technology alone. In our view, it’s not a simple decision between one piece of technology and another. The efficient and safe discovery of data is not just about technology – it’s about the right process and the right people. Now more than ever, as regards audio data, clients need reminding of that.

This article provides an overview of the key areas lawyers and their support teams need to consider when approaching an investigation or litigation involving audio data.

Identification, Collection and Preservation

 As with any data capture exercise, the capture of audio requires the cooperation of a number of parties.

  • The in-house legal team, to understand the context of the regulatory request and the time periods in question,
  • The IT team to ensure the appropriate data is harvested, and
  • The business operations/HR to obtain historical information such as employee records.

Before the data is extracted, it’s prudent to examine a sample data set in order to understand the information available. This will arm the legal team with the appropriate information to respond to the regulator in advance of disclosure and provide them with the detail to assist with their review once the data is harvested.

Most regulatory requests will state a particular date range of interest, and possibly the names of several suspect traders (custodians) and potentially even the content of the suspect target audio data. For the sampling exercise one may wish to select two or three custodians of interest to analyse what data content is available before all data is extracted for the period in question.

Detailed below are some areas that should be considered for the sampling exercise and the wider extraction and preparation of the entire data set in question:

(1) Restricting Time Periods:

Identifying accurate time periods of relevance will be a major factor in restricting the volume of data. A traditional date “x” to date “y” is likely to be over inclusive and contain large levels of non-relevant audio. Uncovering patterns of activity that deviate from expected levels can help design date range tranches. These tranches can initially be the preceding days/hours leading up to the periods of deviation and then expanded as necessary. Each case will differ in the methodology used to identify unusual patterns of activity. The nature of the issue will dictate where and what type of information might be used to define expected levels. In recent financial services market abuse cases, the underlying rate submission data could be used to identify pertinent time periods. Trading profile reviews can also be implemented to detect irregular orders and activity by traders.
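One way to picture the tranche approach is the minimal Python sketch below. It assumes a hypothetical table of daily activity counts (orders per day, for instance), flags days that deviate from the median by more than a chosen multiple of the median absolute deviation, and opens a window over the preceding days. The function names, the threshold of 3 and the two-day lookback are illustrative assumptions, not values drawn from any case.

```python
from datetime import date, timedelta
from statistics import median


def unusual_days(daily_counts, threshold=3.0):
    """Return dates whose count deviates strongly from the median.

    daily_counts: dict mapping date -> activity count (hypothetical input).
    threshold: assumed multiple of the median absolute deviation (MAD).
    """
    values = list(daily_counts.values())
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        # Avoid division by zero when at least half the days share the median.
        return []
    return sorted(d for d, v in daily_counts.items()
                  if abs(v - centre) / mad > threshold)


def tranches(flagged, lookback_days=2):
    """Build (start, end) windows covering each flagged day and the days before it.

    Overlapping or adjacent windows are merged into one.
    """
    windows = []
    for day in sorted(flagged):
        start = day - timedelta(days=lookback_days)
        if windows and start <= windows[-1][1] + timedelta(days=1):
            windows[-1] = (windows[-1][0], day)
        else:
            windows.append((start, day))
    return windows


# Invented counts for illustration only.
counts = {date(2013, 3, d): c for d, c in
          [(4, 20), (5, 22), (6, 19), (7, 21), (8, 95), (11, 20), (12, 23)]}
flagged = unusual_days(counts)
review_windows = tranches(flagged)
```

The windows can then be widened if the first pass of review suggests that relevant conversations started earlier.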

(2) Custodians and Metadata:

Identifying which audio data to extract can be challenging. Many audio systems store data by audio channel; therefore it’s important to understand which channel relates to which custodian required for analysis. Depending on business policies, this information may not be readily available, as historical employee records detailing where each individual sat during the period in question may not have been maintained.

Furthermore, telephone extension numbers could be shared between a number of potential custodians. Close attention should be paid to how the callers are identified (ID, name, extension number, date of the call), whether outgoing calls are recorded differently from incoming calls and how the calls divert if not picked up. Extracting the audio without the metadata provides very little context and will make responding to a regulator’s request very difficult.
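The channel-to-custodian problem can be sketched as a lookup against seating records. The record layout, channel names and custodian names below are hypothetical; real recording systems and HR records will be organised differently.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Seating:
    """One row of a hypothetical seating record: who used a channel, and when."""
    channel: str
    custodian: str
    start: date
    end: date


def custodians_for(channel, call_date, seating_records):
    """Return every custodian recorded against the channel on the call date.

    More than one name means the line was shared and the call needs
    further attribution; an empty list means the records have a gap.
    """
    return sorted({r.custodian for r in seating_records
                   if r.channel == channel and r.start <= call_date <= r.end})


# Invented records for illustration only.
records = [
    Seating("CH-014", "Trader A", date(2012, 1, 1), date(2012, 6, 30)),
    Seating("CH-014", "Trader B", date(2012, 7, 1), date(2012, 12, 31)),
    Seating("CH-020", "Trader A", date(2012, 7, 1), date(2012, 12, 31)),
    Seating("CH-020", "Trader C", date(2012, 7, 1), date(2012, 12, 31)),
]
```

A call on CH-020 in September 2012 would come back with two candidate custodians, which is exactly the situation in which the remaining call metadata is needed to decide who was speaking.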

 (3) Audio input devices:

Desk phones are typical input devices for audio recording, but in a trading floor scenario ‘squawk boxes’ (intercom systems used by broker/dealers and other financial professionals to broadcast offer, bid and other market information to traders and other market participants) will also need to be taken into consideration.

 Squawk boxes can be configured in a number of ways. Some trading floors choose to record audio data for 24 hours a day, seven days a week; other trading floors set them to be motion activated or configure them to be controlled by specific keys on the keyboard for activation.

 This means the audio recordings from these devices can be quite long (minutes rather than seconds) with long areas of silence. One way to limit the amount of squawk box data for legal review purposes is to seek agreement to process only data recorded during working hours during the working week.
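A working-hours restriction of this kind might be sketched as follows. The 07:00 to 18:00 window, the file names and the timestamps are assumptions for illustration; the actual window would be whatever is agreed.

```python
from datetime import datetime, time


def in_working_hours(started_at, open_at=time(7, 0), close_at=time(18, 0)):
    """True if a recording started on a weekday between open_at and close_at.

    The default window is an assumption, not an agreed standard.
    """
    is_weekday = started_at.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and open_at <= started_at.time() < close_at


# Invented recordings for illustration only.
recordings = [
    ("sq-001.wav", datetime(2013, 3, 8, 9, 15)),   # Friday morning
    ("sq-002.wav", datetime(2013, 3, 8, 22, 40)),  # Friday night
    ("sq-003.wav", datetime(2013, 3, 9, 10, 0)),   # Saturday
]
to_process = [name for name, started in recordings if in_working_hours(started)]
```

This sketch tests only the start time, so a long recording that begins before the window and runs into it would be excluded; a real filter would need to consider the duration as well.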

 (4) Audio storage:

Audio data is typically stored in an encrypted and compressed format which needs to be converted to a usable format that can be run in a Windows environment for analysis. The need to carry out this conversion (which can be time-consuming) needs to be taken into account when estimating timelines for the review.

 Data Preparation and Processing

 At this stage, it’s important to consider how to reduce the volume of data and refine it for the review.

The data can be de-duplicated, which will reduce the volume of data to be indexed. This matters because many tools charge per hour of audio loaded.
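As a minimal sketch, exact duplicates can be found by hashing file contents. This assumes duplicates are byte-for-byte identical; the same call captured on two channels, or re-encoded during conversion, would not match and would need a different comparison. The function name and return shape are illustrative.

```python
import hashlib


def deduplicate(files):
    """Keep the first file seen for each distinct content hash.

    files: iterable of (name, content_bytes) pairs. Returns (kept, dropped),
    where dropped maps each removed name to the name it duplicates.
    """
    first_seen = {}
    kept, dropped = [], {}
    for name, content in files:
        digest = hashlib.sha256(content).hexdigest()
        if digest in first_seen:
            dropped[name] = first_seen[digest]
        else:
            first_seen[digest] = name
            kept.append(name)
    return kept, dropped


# Invented file contents for illustration only.
kept, dropped = deduplicate([
    ("call-a.wav", b"\x01\x02\x03"),
    ("call-b.wav", b"\x09\x09"),
    ("call-a-copy.wav", b"\x01\x02\x03"),
])
```

Recording which file each dropped item duplicated keeps an audit trail, which helps when explaining the reduction to a regulator or the other side.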

Further steps can be taken to remove white noise or audio silence using audio forensic tools to both reduce the volume of data and improve the review experience.
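To illustrate the idea of finding silence, the sketch below measures the root-mean-square level of fixed windows of samples and reports the spans that fall below a floor. It is not how any particular audio forensic tool works; the half-second window and the floor of 200 are assumed values that would need tuning against real recordings.

```python
import math


def silent_spans(samples, rate, window_s=0.5, rms_floor=200.0):
    """Return (start_s, end_s) spans whose RMS level is below rms_floor.

    samples: sequence of signed PCM sample values (mono).
    rate: samples per second.
    window_s and rms_floor are assumed values, not recommended settings.
    """
    window = max(1, int(rate * window_s))
    spans = []
    for i in range(0, len(samples), window):
        chunk = samples[i:i + window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        if rms < rms_floor:
            start, end = i / rate, (i + len(chunk)) / rate
            if spans and spans[-1][1] == start:
                spans[-1] = (spans[-1][0], end)  # extend the previous span
            else:
                spans.append((start, end))
    return spans


# Synthetic signal: one second loud, two seconds silent, one second loud.
rate = 8000
speech = [3000, -3000] * (rate // 2)
quiet = [0] * (rate * 2)
spans = silent_spans(speech + quiet + speech, rate)
```

The spans can be used either to trim the audio before loading or simply to let reviewers skip ahead.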


Searching audio data presents two key challenges, distinct from those involved in a traditional data search. The first relates to the quality of the data. No matter how good the search technology may be, if the recording quality is poor then there may be very little that can be done. That said, today’s technology coupled with an iterative approach (as discussed below) can help ‘train’ the system to recognise poor quality sounds that it might otherwise miss.

The second key challenge is that verbal language is very different to written language. In general, it is less formulaic and the sort of words and phrases used will depend on the context and the individuals themselves. Again, the iterative process of optimising the technology as part of the searching technique is the key to managing this challenge.

There are two main searching techniques currently used in the e-discovery industry for audio data.

(1) Phonetic Searching

This technology enables the audio to be searched on the basis of sound, where a ‘phoneme’ is the smallest component of speech. This enables languages and dialects to be indexed and can be very effective if you are searching for unique words or phrases. Phonetic searching becomes challenging when the keywords are phonetically similar to other words, e.g. ‘realise’, ‘real eyes’, ‘real lies’. The number of such false hits can be adjusted by changing the threshold at which results are returned, and further reduced by optimising the search through validating the sounds of the words you are trying to identify. Often this means validating all of the results which the search yields.
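The effect of the results threshold can be sketched as below. The 0 to 100 score, the file names and the hits themselves are invented for illustration; real tools report confidence on their own scales.

```python
def filter_hits(hits, threshold):
    """Keep hits scoring at or above threshold, best first.

    hits: list of (recording, offset_seconds, score) tuples, where score is
    an assumed 0-100 match confidence reported by the search tool.
    """
    kept = [h for h in hits if h[2] >= threshold]
    return sorted(kept, key=lambda h: h[2], reverse=True)


# Hypothetical hits for the search term "realise".
hits = [
    ("call-101.wav", 12.4, 91),  # actually "realise"
    ("call-207.wav", 48.0, 74),  # actually "real eyes"
    ("call-310.wav", 5.2, 68),   # actually "real lies"
    ("call-412.wav", 30.9, 85),  # actually "realise"
]
strict = filter_hits(hits, 80)
loose = filter_hits(hits, 60)
```

A higher threshold gives reviewers fewer false hits to validate but risks missing a poorly recorded true hit, which is why the threshold is usually settled iteratively against a validated sample.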

 (2) Automatic Speech Recognition

Here the computer transcribes the spoken word from the audio into text. The transcription relies on a library of terms to convert the audio to text. Once the text is transcribed it can be searched using keywords as with any other text-based document. The challenge with this method is that the computer requires training to increase the accuracy of the results, along with the supplementation of other data sources to widen the library of terms to ensure the output reflects the audio. As with phonetic searching, it requires sampling and validation that the transcription output reflects the audio. One benefit of this approach is that the text can be indexed and clustered by concept or theme.
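Once transcripts exist, keyword searching works as it does for any text. The sketch below assumes the transcripts have already been produced by a speech recognition tool; the transcript text and file names are invented for illustration.

```python
import re


def keyword_hits(transcripts, keywords):
    """Map each keyword to the recordings whose transcript contains it.

    Matching is case-insensitive and on whole words or phrases.
    transcripts: dict mapping recording name -> transcribed text.
    """
    results = {}
    for keyword in keywords:
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        results[keyword] = sorted(name for name, text in transcripts.items()
                                  if pattern.search(text))
    return results


# Hypothetical transcripts for illustration only.
transcripts = {
    "call-101.wav": "Can you move the fixing up a touch today",
    "call-207.wav": "Lunch at one then, see you there",
    "call-310.wav": "I need a low fixing, do me a favour",
}
found = keyword_hits(transcripts, ["fixing", "favour", "fix"])
```

Whole-word matching means ‘fix’ does not match ‘fixing’ here. Whether that is desirable depends on the case, and a transcription error in the underlying text will defeat either kind of match, which is why sampling and validation remain necessary.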

 Review and Categorisation

If the previous stages have been managed correctly, then by this stage the legal team should be reviewing a significantly smaller volume of audio than was extracted at the outset. Many of the available tools charge on the volume of hours loaded – so it is in the client’s interest to reduce the volumes prior to loading into a review tool. (The technology expense is also likely to be dwarfed by the expense of having legal reviewers reviewing large sets of irrelevant material.)

The metadata along with the audio data can be uploaded to a system to enable review. The legal team can then listen, classify and annotate the results on a database.

The initial sampling exercise can be useful in understanding the likely costs and timelines involved in reviewing the data. Having that information early in the process can help with negotiations with the regulator or the other side in litigation over the extent of the search and the deadlines for producing results.


As with traditional data discovery, it is important to remember that there are options when it comes to audio data. There is no ‘one size fits all’. The sophisticated technology on the market can be very helpful but, in some cases, the expense of such tools may not be required – there may be a simpler way. At Navigant, we had a case where the lawyers knew that they were going to have to listen to every single conversation, and so we developed a fairly simple dashboard to help them move through the conversations and tag them as interesting or not – the cost of this tool was minimal.

In our experience, the regulators do not have fixed views on how to handle audio data as part of an investigation. Certainly we have seen the CFTC and the FCA open to an approach involving well-considered searching techniques (phonetic or otherwise). There is an opportunity to guide the regulator here.

Current regulatory rules will help to define the scope, timeframes and requirements for storing and retrieving trade and other regulated information, but are unlikely to guide organisations on how to prepare for or respond to regulatory action. Audio will continue to be a major form of business communication and, while the storage and recovery options will likely improve over time, the overall need to maintain, analyse, and report will remain complex. Developing and maintaining a pragmatic, sensible and proportional approach, both proactively and reactively, is critical to saving costs and time when regulatory challenges arise.

Tanya Gross and Alex Dunstan-Lee of Navigant

My video interview with Tanya Gross on the same subject is given below:


About Chris Dale

I have been an English solicitor since 1980. I run the e-Disclosure Information Project which collects and comments on information about electronic disclosure / eDiscovery and related subjects in the UK, the US, AsiaPac and elsewhere.
