

Metadata: The Silent Witness


April 2009 | Author: Adrian Cassidy (7Safe)

The migration of litigation into the electronic age has brought many challenges, but the same technology from which they were born also provides the means to confront them. Although metadata analysis will not be crucial to every case, it must remain a fundamental consideration for every eDiscovery practitioner at each stage of the investigation model. The ‘silent witness’ that resides within electronic documents speaks volumes; we just need to listen to what it has to say.

The purpose of this paper is to explore the evidential potential of a new stream of data brought about by this electronic migration. This data, known as metadata, is a characteristic and critical component of electronically stored documents. To the mainstream computer user it could be regarded as ‘silent’, because the user is typically unaware of its existence and in many cases oblivious to its relevance. Metadata is the crucial difference between electronic documents and printed or typed ones. All the available information in a paper document is displayed on its face, but this is not so with electronic documents, which carry their history with them. In other words, paper shows a document in its entirety, whereas metadata tells us where the document went and what it did.

In the age of paper-only documents, litigation lawyers relied solely on document content to assess and sort relevant evidence. Even then there were issues to overcome, relating to ownership and, of course, to duplicate and near-duplicate documents. The age of electronic documents, however, brings a whole new set of challenges when searching for, locating and presenting relevant evidence. Many of these challenges are reworkings of the original issues. For example, the very nature of the electronic environment easily leads to tens, or even hundreds, of duplicate copies of a single document. In addition, user documents are now stored within, and alongside, electronic data containers and system files that have no bearing on the content of the document. Fortunately, the same technology that has bred these challenges has also produced powerful and efficient solutions for handling the mass of data that even a relatively small investigation produces. This has brought new techniques of assessment and culling, including indexing for fast keyword searching and filtering by document type or by custodian.
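Exact duplicates of the kind described above are commonly identified by hashing file content. The sketch below is a minimal Python illustration, assuming the documents are ordinary files on disk: it groups byte-identical files by SHA-256 digest and keeps one representative per group for review. It does not detect near-duplicates (a reformatted or lightly edited copy hashes differently), and the function names are illustrative rather than taken from any particular tool.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_exact_duplicates(paths) -> dict:
    """Map each content hash to every path holding byte-identical content."""
    groups = defaultdict(list)
    for p in map(Path, paths):
        groups[sha256_of(p)].append(p)
    return dict(groups)


def review_set(groups: dict) -> list:
    """Pick one representative (the first seen) per hash for the reviewers."""
    return [members[0] for members in groups.values()]
```

Keeping the whole group, rather than discarding the extra copies, preserves the record of where, and with which custodians, each copy was found.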

What is Metadata?
Metadata (also known as “embedded data” or “embedded information”) literally means “data about data.” Nearly all modern software applications embed various categories, or fields, of metadata within the documents a user creates. This metadata may describe how, when and by whom an electronic document was created, modified or even transmitted. This clerical information not only assists in data retrieval but can also reveal a document’s history. Neither lawyers nor investigators practising in the age of eDiscovery can ignore or underestimate this source of evidence. Lawyers can use it to strengthen their cases and present a fuller, more comprehensive account, while analysts and investigators can streamline document recovery and review through attributes such as date and time, custodian, author, subject or, in the case of email, the service provider.

Types of Metadata
As stated, files created by different software applications store data about the data. For instance, Microsoft Word automatically creates a number of metadata fields containing personal data that could prove crucial to an investigation. One well-documented case shows how the British government inadvertently circulated sensitive security information on the Downing Street website. In February 2003, Downing Street published a dossier on its website in native Microsoft Word format. Downloading was unrestricted, so the document was freely available together with its embedded metadata. The government suffered some embarrassment in the following days, as the embedded metadata held revision logs showing the names of four government officials along with details of the “Communication Information Centre”, a unit of the British Government. It could also be seen that portions of the document were plagiarised from a previously saved document. The document was eventually removed and replaced with a secure PDF version.

As a forensic investigator I have myself analysed metadata on a number of occasions to prove the custody and ownership of documents and images. In one simple example, I recovered a Microsoft Word document that had been created as a fraudulent motor insurance document. Its embedded metadata held the creation date, the offender’s name and the last printed date. Initial denials gave way to red-faced admissions.
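Fields such as author, creation date and last printed date can be read programmatically. The following is a minimal sketch, assuming the later Office Open XML (.docx) format, in which these values are stored as XML inside a ZIP package, conventionally at `docProps/core.xml`. The documents in the cases above predate .docx, and the older binary .doc format stores its properties differently, so this sketch would not read them. The friendly names on the left of `CORE_FIELDS` are illustrative labels chosen here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of an OOXML package
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Illustrative label -> element path inside docProps/core.xml
CORE_FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
    "last_printed": "cp:lastPrinted",
    "revision": "cp:revision",
}


def read_core_properties(docx_bytes: bytes) -> dict:
    """Return whichever core metadata fields are present in a .docx file."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        # Assumes the conventional location; the part is optional
        if "docProps/core.xml" not in package.namelist():
            return {}
        root = ET.fromstring(package.read("docProps/core.xml"))
    props = {}
    for name, path in CORE_FIELDS.items():
        element = root.find(path, NS)
        if element is not None and element.text:
            props[name] = element.text.strip()
    return props
```

For a file on disk, something like `read_core_properties(open("suspect.docx", "rb").read())` (the file name is hypothetical) would return a dictionary of whichever fields are present. An absent `cp:lastPrinted` value only means the field was not recorded, and should not on its own be read as proof that the document was never printed.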

From an eDiscovery perspective, one of the most valuable sources of metadata is email. An email carries information about its author, creation date, attachments and recipients (including cc and bcc recipients). When an email is used to transport documents such as word-processed files, spreadsheets or images, links to those attachments become part of the email’s metadata. Recovering these links can help a reviewer establish which document, and which version of it, was attached to a particular email.
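As a sketch of how these fields might be pulled out, the Python standard library's `email` package can parse a message stored in the standard RFC 5322 form (for example an exported `.eml` file); mail held in proprietary stores would need to be converted first, which is not shown. The function below returns sender, recipients, date and subject and, for each top-level attachment, its filename and a SHA-256 hash. The hash is an assumption added here as one way to tie an attachment to a particular version of a document found elsewhere in the collection.

```python
import hashlib
from email import policy
from email.parser import BytesParser
from email.utils import getaddresses, parsedate_to_datetime


def email_metadata(raw: bytes) -> dict:
    """Extract review-relevant header and attachment metadata from one message."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)

    def addresses(header: str) -> list:
        # getaddresses splits "Name <addr>, addr2" lists; keep lower-cased addresses
        return [addr.lower() for _, addr in getaddresses(msg.get_all(header, [])) if addr]

    attachments = []
    for part in msg.iter_attachments():  # immediate non-body parts only
        data = part.get_payload(decode=True)  # transfer-decoded bytes; None for nested messages
        attachments.append({
            "filename": part.get_filename(),
            "sha256": hashlib.sha256(data).hexdigest() if data is not None else None,
        })

    return {
        "from": addresses("From"),
        "to": addresses("To"),
        "cc": addresses("Cc"),
        "bcc": addresses("Bcc"),
        "date": parsedate_to_datetime(msg["Date"]) if msg["Date"] else None,
        "subject": str(msg["Subject"] or ""),
        "attachments": attachments,
    }
```

Bcc recipients are normally removed from the copies that other recipients receive, so a Bcc header will usually appear, if at all, only in the sender's own copy of the message.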

With the accumulation of specialised software tools and an increasing appreciation of its evidential value, there is a growing awareness of the importance of metadata in litigation. It has become essential that metadata is preserved throughout the collection, processing, review and presentation processes. If at any point a printed document is provided for production, its metadata is lost: as highlighted above, a printout displays only the content of the document and none of its embedded metadata. An uninformed litigation team could easily believe it is conducting “electronic discovery” when in fact it is merely working with electronic images of documents. The processes undertaken must include an appreciation and assessment of any relevant metadata fields. A number of forensic applications will readily extract and present this data, but consider the consequences of failing to acknowledge or consider this information: litigators may be requested, or even ordered, to produce vital information that was never considered, or never collected because of bad practice.
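Preservation can be made checkable by recording a fingerprint of each item before and after it is copied. The sketch below assumes files are collected by copying them into a destination folder; it records a SHA-256 hash of the content (which covers any embedded metadata, since that is part of the file's bytes) together with the size and file-system modification time, then reports which recorded values changed in the copy. The manifest layout and function names are hypothetical, not a standard format.

```python
import hashlib
import os
import shutil
from pathlib import Path


def fingerprint(path) -> dict:
    """Record a content hash plus basic file-system metadata for one file."""
    p = Path(path)
    digest = hashlib.sha256()
    with p.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    st = os.stat(p)
    return {
        "sha256": digest.hexdigest(),  # covers content, including embedded metadata
        "size": st.st_size,
        "mtime_ns": st.st_mtime_ns,    # file-system last-modified time
    }


def collect_with_manifest(src, dest_dir) -> dict:
    """Copy src into dest_dir and report which recorded properties changed."""
    before = fingerprint(src)
    dest = Path(dest_dir) / Path(src).name
    shutil.copy2(src, dest)  # copy2 also tries to carry over timestamps; shutil.copy does not
    after = fingerprint(dest)
    changed = sorted(k for k in before if before[k] != after[k])
    return {"source": str(src), "copy": str(dest),
            "before": before, "after": after, "changed": changed}
```

Even `shutil.copy2` does not carry over every piece of file metadata: the copy gets its own inode change time on POSIX systems and its own creation time on Windows, for example. That is one reason forensic collection often works from a verified image of the source media rather than from ordinary copies.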

Efficient searching and review
Metadata has such an impact on eDiscovery that it must be used efficiently to streamline practice. Accurate and reliable keyword searching across a large data set is of course invaluable to the eDiscovery process, and using metadata to harvest relevant information is another powerful procedure, provided it is undertaken with proper consultation. For example, a litigation review team may ask for specific documents based on custodian (authors and recipients), creation or sent date, or email with CC or BCC activity. A forensics team will be able to evaluate the relevant metadata fields and build dynamic searches that combine them with traditional keyword techniques. Metadata analysis can also provide a graphical representation of the exchange of information. The following image shows how metadata can be used to illustrate communication in an investigation of email correspondence.

[Figure: using metadata to illustrate communication in an investigation of email correspondence]
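The custodian, date-range, CC/BCC and keyword criteria mentioned above can be combined in a single filter, and the same records can be tallied into sender-to-recipient counts, which are the raw material for a communication diagram. This is a minimal Python sketch: the `EmailRecord` structure and its fields are hypothetical, assuming the metadata has already been extracted during processing, and keyword matching here is a plain case-insensitive substring test rather than a real search index.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable, List, Optional, Set


@dataclass
class EmailRecord:
    """Hypothetical record of metadata already extracted from one email."""
    custodian: str
    sender: str
    to: List[str]
    cc: List[str] = field(default_factory=list)
    bcc: List[str] = field(default_factory=list)
    sent: Optional[datetime] = None
    body: str = ""


def select(records: Iterable[EmailRecord], custodians: Optional[Set[str]] = None,
           start: Optional[datetime] = None, end: Optional[datetime] = None,
           has_cc_or_bcc: Optional[bool] = None,
           keywords: Iterable[str] = ()) -> List[EmailRecord]:
    """Return records matching every criterion supplied; None means 'do not filter'."""
    keywords = [k.lower() for k in keywords]
    hits = []
    for r in records:
        if custodians is not None and r.custodian not in custodians:
            continue
        # Records without a sent date cannot satisfy a date-range criterion
        if start is not None and (r.sent is None or r.sent < start):
            continue
        if end is not None and (r.sent is None or r.sent > end):
            continue
        if has_cc_or_bcc is not None and bool(r.cc or r.bcc) != has_cc_or_bcc:
            continue
        body = r.body.lower()
        if not all(k in body for k in keywords):  # every keyword must appear
            continue
        hits.append(r)
    return hits


def communication_edges(records: Iterable[EmailRecord]) -> Counter:
    """Count messages per (sender, recipient) pair across To, CC and BCC."""
    edges = Counter()
    for r in records:
        for recipient in r.to + r.cc + r.bcc:
            edges[(r.sender, recipient)] += 1
    return edges
```

The edge counts could be handed to any graph-drawing tool; weighting each line by message count makes the heaviest channels of communication stand out.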

It takes little imagination to see the worth of metadata. It will be invaluable during the eDiscovery process, whether in searching for relevant documents, collating evidence or simply assisting with the review of masses of documents against a given deadline. It can also establish details that might otherwise lead to disputes or a need for further discovery.

Metadata within the EDRM
The Electronic Discovery Reference Model (EDRM) was created to develop and establish practical guidelines and standards for electronic discovery.

As the diagram is intended as a basis for discussion, we can use it to examine the practical consideration of metadata throughout the process. Metadata is embedded within the native documents that are themselves subject to the model, which raises the question: at which stage of the EDRM must metadata be considered? The answer is that consideration should be constant, throughout. As previously stated, metadata is a hidden wealth of information that is both potentially evidential and searchable. I will highlight a few examples of such considerations for each stage of the EDRM.

Information Management – Metadata, by definition, comes into existence when documents and electronic communications are created. Effective information management will help mitigate risk and expense should electronic discovery become an issue, and it may well depend on reliable metadata.

Identification – Early and effective identification of relevant electronically stored information may be assisted by analysis of embedded data. There may be a case for producing documents according to creator, author or custodian, or communications within specific date ranges. Such details can be culled from the analysis of embedded metadata.

Preservation and Collection – A number of applications exist for removing metadata. Locating any such application, along with any evidence indicating its use, could itself be evidential. To confirm preservation, a forensic approach may need to investigate the use of any such tools at the point of collection.

Processing, Review and Analysis – Any processing of collected data should account for the extraction of all metadata fields and their presentation to the review platforms. Review teams need all relevant material displayed in order to assess its value in terms of metadata.
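One way to make sure no field is silently dropped on the way to a review platform is to export every field seen in the processed data, leaving blanks where a document lacks a value. The sketch below writes such a table as CSV; the field names used with it are hypothetical, and the delimiters, column names and encoding a particular review platform expects will differ, so in practice the field list would usually be agreed in advance and passed in explicitly rather than inferred.

```python
import csv
import io
from typing import Dict, List, Optional


def to_load_file(records: List[Dict[str, str]],
                 fieldnames: Optional[List[str]] = None) -> str:
    """Serialise metadata records as CSV with one column per field.

    If no field list is given, every field seen in any record becomes a column
    (in first-seen order), so no field is dropped just because some documents lack it.
    """
    if fieldnames is None:
        fieldnames = []
        for record in records:
            for name in record:
                if name not in fieldnames:
                    fieldnames.append(name)
    buffer = io.StringIO()
    # restval="" leaves a blank cell where a record lacks a field;
    # extrasaction="raise" rejects fields missing from an explicit field list
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", extrasaction="raise")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

For example, `to_load_file([{"doc_id": "D1", "author": "J. Smith"}, {"doc_id": "D2", "last_printed": "2009-01-12"}])` produces the columns `doc_id,author,last_printed`, with blank cells where a document has no value.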

Production and Presentation – A “native data” file is a file in the original format in which it was created (i.e., the format of the specific software application used to create each document); examples include Microsoft Word, Microsoft Excel and WordPerfect files. Should metadata be relied upon heavily as evidence, native production might best be replaced by another electronic format, or by TIFF conversion, accompanied by a full representation of the metadata.

As we work through the EDRM process, it becomes obvious that metadata has a significant impact throughout eDiscovery, and its importance has recently been acknowledged: in February 2009 the EDRM released a metadata dictionary along with an XML metadata spreadsheet. This document sets out guidance on the metadata fields that should be considered throughout the EDRM model. As practitioners we can use this information to ensure that the export and presentation of documents are, at a minimum, in accordance with the EDRM approach. The welcome creation of standards relating to metadata will undoubtedly raise awareness of embedded data amongst eDiscovery practitioners. Perhaps, then, the challenge lies not in analysing or producing this data but in bridging the knowledge gap among professionals working within the eDiscovery process. Depth of knowledge and experience in computer forensics must continue to be the foundation of efficient practice, but it has to be complemented by an understanding of how that knowledge applies within eDiscovery. As we all know, small changes to procedure can have a substantial impact on both finances and resources. An effective eDiscovery solution will always be based on respectful consultation that allows knowledge to flow freely between lawyers and forensics practitioners.

References
1. Government word bytes Tony Blair, found at http://www.computerbytesman.com/privacy/blair.htm (date accessed 18 February 2009)

2. Windows Incident Response: metadata and eDiscovery, found at http://windowsir.blogspot.com/2006/09/metadata-and-ediscovery.html (date accessed 23 February 2009)

3. The hidden dangers of documents, found at http://news.bbc.co.uk/1/hi/technology/3154479.stm (date accessed 18 February 2009)

4. EDRM (Electronic Discovery Reference Model), found at http://www.edrm.net/ (date accessed 24 February 2009)

5. Version 1.1 EDRM Schema – metadata tags, found at http://www.edrm.net/blog/archives/153 (date accessed 24 February 2009)

Contact: Adrian Cassidy is a Lead Information Security Consultant with 7Safe. Tel: +44 870 600 1667

 
