Clarify Requests for Native ESI

Poring over Requests for Production this morning, I was gratified to see the client sought native forms of electronically-stored information; but the request said only, “All documents shall be Bates stamped and provided in native format.”  Is that sufficient? To me, specifying forms of production is best done via an agreed ESI production protocol, but failing that, requesting parties should supply more detail than simply asking for “native format.”  I believe requests need to lay out the forms sought for particularized types of ESI and specify the essential ancillary metadata to be produced in load files. 

Requesting native forms in discovery demands a few adaptations versus the way hard copy documents were sought in years past.  Take that request, “All documents shall be Bates stamped and provided in native format.”  If a document is supplied natively and not printed out or “flattened” to a static TIFF, where do you “stamp” the Bates number?  The solution is simple (in the file name and load file), but not obvious to lawyers unschooled in e-discovery.
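
To make that parenthetical concrete, here is a minimal Python sketch of "stamping" a native file by renaming it to its Bates number while carrying the original name, extension and hash into the load file. The helper name `stamp_native`, the sample Bates number and the relative link are illustrative assumptions, not features of any particular review tool; the field names are borrowed from the load file list in Appendix A below.

```python
import hashlib
from pathlib import PurePath


def stamp_native(original_name: str, content: bytes, bates: str) -> dict:
    """Return load-file values for a native file renamed to its Bates number.

    Hypothetical helper: the Bates number becomes the produced file's name,
    and the original name travels in the load file instead.
    """
    path = PurePath(original_name)
    extension = path.suffix.lstrip(".")      # "xlsx", or "" if there is none
    produced_name = f"{bates}.{extension}" if extension else bates
    return {
        "BEGBATES": bates,
        "FILENAME": path.stem,               # original name without extension
        "FILEEXTENSION": extension,
        "NATIVEFILELINK": produced_name,     # assumed relative link
        # The hash covers file content only, so renaming does not change it.
        "HASHVALUE": hashlib.md5(content).hexdigest(),
    }


row = stamp_native("Q3 Budget.xlsx", b"hello", "ABC00000123")
print(row["NATIVEFILELINK"], row["FILENAME"], row["HASHVALUE"])
```

Because the hash is computed over the file's content rather than its name, the produced file can still be matched to the original after renaming.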

Specifying more than “native format” in the request is sensible because much ESI doesn’t lend itself to production in its “true” native forms.  The “true” native form of email is typically a database of multiple user accounts holding messages, calendars, contacts, to-do lists, etc.  An opponent need not (and won’t) produce such a massive, undifferentiated blob of data.  So the better practice is to specify the near-native forms you prefer: forms that preserve the integrity and utility of the evidence but support the granularity needed for discovery of only relevant, non-privileged material.  As well, providing a load file specification ensures you obtain metadata values that only the producing party can give you (like Bates numbers, originating hash values, source paths and custodians). Too, you want that metadata in a structure suited to your needs and tools.

Native productions are more utile and cost-effective, but only when requesting parties are prepared to reap their superior utility and savings.  One reason producing parties have gotten away with producing inefficient and unsearchable static image formats (TIFFs) for so long is that TIFF images can be viewed in a browser; hence, recipients of TIFF productions can read documents page-by-page without review software.  Yet, that easy access comes at a perilous cost.  TIFF productions are many times larger in byte volume than a native production of the same material, making it significantly more costly for requesting parties to ingest and host the evidence.  Moreover, TIFF images tend not to work well for common formats like spreadsheets and PowerPoint presentations and don’t work at all for, e.g., video and sound files.  Finally, evidence is shorn of metadata and searchable electronic content when produced as TIFF images, so the stripped metadata and searchable content must be produced separately and reconstructed using software to comprise, at best, a degraded “TIFF Plus” facsimile of the evidence.

For these reasons and more, requests for production must either follow the entry of an agreed or court-ordered production protocol, or requesting parties must include useful and practical instructions about the forms of production right in the body of the Request.

To simplify my client’s task, I drafted an Appendix to be grafted onto the Requests for Production and suggested my client take out “All documents shall be Bates stamped and provided in native format” and substitute the phrase: “All production should be produced in accordance with the instructions contained in Appendix A to this Request.” It’s not perfect, but it should get the job done.

The Appendix I supplied reads as follows, and I don’t offer it as a paragon of legal draftsmanship.  Each time I create something like this, it’s a struggle deciding what details to omit versus supplying all features of a full-fledged production protocol.  I’ve kept it to about 1,000 words, and a tad verbose at that.  It’s for you to decide if it adds substantial value over simply asking for “native format.”  Tell me what you think in the comments. If you’d like a Microsoft Word version of Appendix A to play with, you can download it from this link:

Appendix A: Forms of Production

I. Definitions

“Electronically Stored Information” or “ESI” includes communications, presentations, writings, drawings, graphs, charts, photographs, posts, video and sound recordings, images, and other data or data compilations existing in electronic form on any medium including, but not limited to: (i) e-mail, texting, social media or other means of electronic communications; (ii) word processing files (e.g., Microsoft Word); (iii) computer presentations (e.g., Microsoft PowerPoint); (iv) spreadsheets (e.g., Microsoft Excel); (v) database content and (vi) media files (e.g., jpg, wav).

“Metadata” means and refers to (i) structured (fielded) information embedded in a native file which describes the characteristics, origins, usage, and/or validity of the electronic file; (ii) information generated automatically by operation of a computer or other information technology system when a native file is created, modified, transmitted, deleted, or otherwise manipulated by a user of such system; (iii) information, such as Bates numbers, created during the course of processing documents or ESI for production; and (iv) information collected during the course of collecting documents or ESI, such as the name of the media device, or the custodian or non-custodial data source from which it was collected.

“Native Format” means and refers to the format of ESI in which it was generated and/or as used by the producing party in the usual course of its business and in its regularly conducted activities. For example, the native format of an Excel workbook is a .xls or .xlsx file and the native format of a Microsoft Word document is a .doc or .docx file.

“Near-Native Format” means and refers to a form of ESI production that preserves the functionality, searchability and integrity of a Native Format item when it is infeasible or unduly burdensome to produce the item in Native Format.  For example, an MBOX is a suitable near-native format for production of Gmail, an Excel spreadsheet is a suitable near-native format for production of Google Sheets, and EML and MSG files are suitable near-native formats for production of e-mail messages.  Static images are not near-native formats for production of any form except Hard Copy Documents.

II. Production

1. Responsive electronically stored information (ESI) shall be produced in its Native Format with Metadata.

2. If it is infeasible to produce an item of responsive ESI in its Native Format, it may be produced in a Near-Native Format with options for same set out in the table below:

Source ESI | Native or Near-Native Form or Forms Sought
Microsoft Word documents | .DOC, .DOCX
Microsoft Excel Spreadsheets | .XLS, .XLSX
Microsoft PowerPoint Presentations | .PPT, .PPTX
Microsoft Access Databases | .MDB, .ACCDB
WordPerfect documents | .WPD
Adobe Acrobat Documents | .PDF
Photographs | .JPG, .PDF
E-mail | Messages should be produced in a form or forms that readily support import into standard e-mail client programs; that is, the form of production should adhere to the conventions set out in RFC 5322 (the internet e-mail standard).  For Microsoft Exchange or Outlook messaging, .PST format will suffice.  Single message production formats like .MSG or .EML may be furnished, if source foldering data is preserved and produced.  If your workflow requires that attachments be extracted and produced separately from transmitting messages, attachments should be produced in their native forms with parent/child relationships to the message and container(s) preserved and produced in a delimited text file.
Social Media | Social media content should be collected using industry standard practices incorporating reasonable methods of authentication, including but not limited to MD5 hash values.  Social media and webpages should be produced as HTML faithful to the content and appearance of the native source, or as JPG images with searchable, document-level files containing textual content and delimited metadata (including “likes” and comments).
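
As an illustration of the parent/child relationships called for in the e-mail entry above, the following Python sketch reads an RFC 5322 message with the standard library `email` package and emits one row for the message and one for each attachment. The function name `family_rows`, the sample addresses and Bates numbers, and the choice to leave PARENTBATES blank on the parent row are assumptions made for the example, not requirements of this Appendix.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def family_rows(raw_message: bytes, parent_bates: str, child_bates: list[str]) -> list[dict]:
    """Describe a message and its attachments as parent/child rows."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    attachments = list(msg.iter_attachments())
    if len(attachments) != len(child_bates):
        raise ValueError("need exactly one Bates number per attachment")
    names = [part.get_filename() or "" for part in attachments]
    rows = [{
        "BEGBATES": parent_bates,
        "PARENTBATES": "",  # assumption: left blank on the parent itself
        "MSGID": str(msg.get("Message-ID", "")),
        "IRTID": str(msg.get("In-Reply-To", "")),
        "ATTACHCOUNT": str(len(attachments)),
        "ATTACHNAMES": ";".join(names),
    }]
    for bates, name in zip(child_bates, names):
        rows.append({"BEGBATES": bates, "PARENTBATES": parent_bates, "FILENAME": name})
    return rows


# Build a small sample message so the sketch runs on its own.
sample = EmailMessage()
sample["From"] = "alice@example.com"
sample["To"] = "bob@example.com"
sample["Subject"] = "Q3 numbers"
sample["Message-ID"] = "<1234@example.com>"
sample.set_content("Spreadsheet attached.")
sample.add_attachment(b"fake spreadsheet bytes", maintype="application",
                      subtype="octet-stream", filename="q3.xlsx")

rows = family_rows(sample.as_bytes(), "ABC00000001", ["ABC00000002"])
for row in rows:
    print(row)
```

Rows like these could then be written to the delimited text file, so the recipient can reassemble each message family even when attachments are produced as separate files.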

3. Paper (Hard-Copy) documents or items requiring redaction shall be produced in static image formats scanned at 300 dpi, e.g., single-page Group IV TIFF or multipage PDF images. If an item uses color to convey information and not merely for aesthetic reasons, the producing party shall not produce the item in a form that does not display color. The full content of each document will be extracted directly from the native source where feasible or, where infeasible, by optical character recognition (OCR) or other suitable method to a searchable text file produced with the corresponding page image(s) or embedded within the image file.  Redactions shall be logged along with other information items withheld on claims of privilege.

4. Each item produced shall be identified by naming the item to correspond to a Bates number according to the following protocol:

i. The first three (3) characters of the filename will reflect a unique alphanumeric designation identifying the party making production.

ii. The next eight (8) characters will be a unique, consecutive numeric value assigned to the item by the producing party. This value shall be padded with leading zeroes as needed to preserve its length.

iii. The final six (6) characters are reserved for a sequence consistently beginning with a dash (-) or underscore (_) followed by a five-digit number reflecting pagination of the item when printed to paper or converted to an image format for use in proceedings or when attached as exhibits to pleadings.

iv. This format of the Bates identifier must remain consistent across all productions. The number of digits in the numeric portion and characters in the alphanumeric portion of the identifier should not change in subsequent productions, nor should spaces, hyphens, or other separators be added or deleted except as set out above.
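
The naming protocol in items i through iv can be sketched in Python as follows. The names `make_bates` and `is_valid_bates` and the sample prefix `ABC` are hypothetical, and the sketch assumes the six-character page suffix is present only when an item has been paginated.

```python
import re
from typing import Optional

# Party prefix (3 alphanumerics) + item number (8 digits) + optional
# page suffix (dash or underscore followed by 5 digits).
BATES_PATTERN = re.compile(r"[A-Za-z0-9]{3}[0-9]{8}(?:[-_][0-9]{5})?")


def make_bates(party: str, item: int, page: Optional[int] = None, sep: str = "-") -> str:
    """Build an identifier such as ABC00000123 or ABC00000123-00002."""
    if not re.fullmatch(r"[A-Za-z0-9]{3}", party):
        raise ValueError("party designation must be three alphanumeric characters")
    if not 0 <= item <= 99_999_999:
        raise ValueError("item number must fit in eight digits")
    identifier = f"{party}{item:08d}"      # pad with leading zeroes
    if page is None:
        return identifier
    if sep not in ("-", "_") or not 0 <= page <= 99_999:
        raise ValueError("page suffix must be - or _ plus a five-digit number")
    return f"{identifier}{sep}{page:05d}"


def is_valid_bates(identifier: str) -> bool:
    """Check that an identifier follows the fixed-width layout."""
    return BATES_PATTERN.fullmatch(identifier) is not None


print(make_bates("ABC", 123))
print(make_bates("ABC", 123, page=2))
print(is_valid_bates("ABC123"))
```

A check like `is_valid_bates` is one way a receiving party might confirm that later productions keep the same width and separators as the first.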

5. If a response to discovery requires production of discoverable electronic information contained in a database, you may produce standard reports; that is, reports that can be generated in the ordinary course of business and without specialized programming.  All such reports shall be produced in a delimited electronic format preserving field and record structures and names.  If the request cannot be fully answered by production of standard reports, Producing Party should advise the Requesting Party of same so the parties may meet and confer regarding further programmatic database productions.
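
By way of illustration only, a report that preserves field and record structures and names might be exported along the following lines. The sketch uses Python's built-in `sqlite3` and `csv` modules; the `invoices` table, its columns and the function name `report_to_delimited` are invented for the example and stand in for whatever database and standard report the producing party actually uses.

```python
import csv
import io
import sqlite3


def report_to_delimited(conn: sqlite3.Connection, query: str, delimiter: str = ",") -> str:
    """Run a report query and return delimited text with a header row."""
    cursor = conn.execute(query)
    field_names = [column[0] for column in cursor.description]
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(field_names)   # field names preserved as the first record
    writer.writerows(cursor)       # one line per database record
    return buffer.getvalue()


# Hypothetical sample data so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(1, "Acme, Inc.", 250.0), (2, "Globex", 99.5)],
)
report = report_to_delimited(
    conn, "SELECT invoice_id, customer, amount FROM invoices ORDER BY invoice_id"
)
print(report)
```

The `csv` writer quotes any value that contains the delimiter, so a customer name with an embedded comma still occupies a single field.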

III. Load Files

Producing party shall furnish a delimited load file in industry-standard Opticon and Concordance formats supplying the metadata field values listed below for each item produced (to the extent the values exist and as applicable):

CUSTODIAN | Name of person or source from which data was collected.  Where redundant names occur, individuals should be distinguished by an initial which is kept constant throughout productions (e.g., Smith, John A. and Smith, John B.)
ALL_CUSTODIANS | If deduplication employed, name(s) of any person(s) from whom the identical item was collected and deduplicated.
BEGBATES | Beginning Bates Number (production number)
ENDBATES | End Bates Number (production number)
BEGATTACH | First Bates number of first attachment in family range
ENDATTACH | Last Bates number of last attachment in family range (i.e., Bates number of the last page of the last attachment).
ATTACHCOUNT | Number of attachments to an e-mail.
ATTACHNAMES | Name of each individual attachment, separated by semi-colons.
PARENTBATES | BEGBATES number for the parent email of a family (will not be populated for documents that are not part of a family)
ATTACHBATES | Bates number from the first page of each attachment
PGCOUNT | Number of pages in the document
FILENAME | Original filename at the point of collection, without extension of native file
FILEEXTENSION | File extension of native file
FILEPATH | File source path for all electronically collected documents and emails, which includes location, folder name, file name, and file source extension.
NATIVEFILELINK | For documents provided in native format only
TEXTPATH | File path for OCR or Extracted Text files
FROM | Sender
TO | Recipient
CC | Additional Recipients
BCC | Blind Additional Recipients
SUBJECT | Subject line of e-mail.
DATESENT (mm/dd/yyyy hh:mm:ss AM) | Date Sent
EMAILDATSORT (mm/dd/yyyy hh:mm:ss AM) | Sent Date of the parent email (physically top email in a chain, i.e., immediate/direct parent email)
MSGID | Email system identifier assigned by the host email system.
IRTID | E-mail In-Reply-To ID assigned by the host e-mail system.
CONVERSATIONID | E-mail thread identifier.
HASHVALUE | MD5 Hash Value of production item
TITLE | Title provided by user within the document
AUTHOR | Creator of a document
DATECRTD (mm/dd/yyyy hh:mm:ss AM) | Creation date
LASTMODD (mm/dd/yyyy hh:mm:ss AM) | Last Modified Date

The chart above describes the metadata fields to be produced in generic, commonly used terms.   You should adapt these to the specific types of electronic files you are producing to the extent such metadata fields exist in the original ESI and can be extracted as part of the electronic data discovery process. Any ambiguity about a metadata field should be discussed with the Requesting Party prior to processing and production.
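
For readers who have not seen one, a Concordance-style delimited load file can be sketched in a few lines of Python. The delimiters shown (character 20 between fields, þ around each value, ® in place of embedded line breaks) are the customary Concordance defaults; the subset of fields, the sample record and the function names are assumptions chosen for brevity. The sketch does not cover the separate Opticon image cross-reference file.

```python
from datetime import datetime

FIELD_SEP = "\x14"  # character 20, customary Concordance field delimiter
QUOTE = "\u00fe"    # þ (character 254), customary text qualifier
NEWLINE = "\u00ae"  # ® (character 174), stands in for line breaks in a value

# Illustrative subset of the fields listed in the chart above.
FIELDS = ["BEGBATES", "ENDBATES", "CUSTODIAN", "DATESENT", "SUBJECT"]


def format_date(value: datetime) -> str:
    """Render a date as mm/dd/yyyy hh:mm:ss AM (AM/PM text depends on locale)."""
    return value.strftime("%m/%d/%Y %I:%M:%S %p")


def dat_line(values: list[str]) -> str:
    """Qualify each value and join the values with the field delimiter."""
    cleaned = [v.replace("\r\n", NEWLINE).replace("\n", NEWLINE) for v in values]
    return FIELD_SEP.join(f"{QUOTE}{v}{QUOTE}" for v in cleaned)


def build_dat(records: list[dict]) -> str:
    """Return load file text: a header line, then one line per item."""
    lines = [dat_line(FIELDS)]
    for record in records:
        lines.append(dat_line([record.get(name, "") for name in FIELDS]))
    return "\n".join(lines) + "\n"


records = [{
    "BEGBATES": "ABC00000001",
    "ENDBATES": "ABC00000001",
    "CUSTODIAN": "Smith, John A.",
    "DATESENT": format_date(datetime(2021, 3, 5, 14, 7, 9)),
    "SUBJECT": "Line one\nLine two",
}]
dat = build_dat(records)
print(dat)
```

Unusual delimiters are used because commas and quotation marks occur so often in subject lines and file paths; fields with no value are written as empty qualified strings so every line keeps the same number of columns.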

Written by Craig Ball

Craig Ball of New Orleans and Austin is a Texas trial lawyer, computer forensic examiner, law professor and noted authority on electronic evidence. He limits his practice to serving as a court-appointed special master and consultant in computer forensics and electronic discovery and has served as the Special Master or testifying expert in computer forensics and electronic discovery in some of the most challenging and celebrated cases in the U.S.
