Run a Premium eDiscovery Investigation

📋 Overview

About This Lab

When legal investigations or regulatory inquiries arise, organisations must be able to identify, preserve, collect, and review electronic evidence in a defensible manner. In this lab you will conduct a full eDiscovery (Premium) investigation in Microsoft Purview: creating cases, adding custodians with legal holds, building targeted search queries across Exchange, SharePoint, OneDrive, and Teams, loading results into review sets, applying machine-learning relevance training to prioritise documents, and producing final exports for external legal counsel.

🏢 Enterprise Use Case

A healthcare company’s legal team is investigating a potential data breach involving patient records. They need to identify all communications and documents related to the incident across email, Teams chats, SharePoint sites, and OneDrive accounts of six key custodians over a three-month window. The investigation must produce a defensible collection, apply privilege review to protect attorney-client communications, and export a production set in industry-standard format for external regulatory review, all within a tight court-imposed deadline.

🎯 What You Will Learn

Understand the end-to-end eDiscovery Premium workflow from case creation to export
Create and configure eDiscovery cases with appropriate access controls
Add custodians and place them on legal hold to preserve relevant data
Build targeted search queries using KQL, date ranges, and content conditions
Load search results into review sets for detailed analysis
Apply analytics including near-duplicate detection, email threading, and themes
Use relevance training with machine learning to prioritise responsive documents
Tag, annotate, and review documents for privilege and responsiveness
Produce final exports in standard formats (PST, PDF, native) for legal counsel
Manage legal holds and ensure compliance throughout the investigation lifecycle

🔑 Why This Matters

eDiscovery is legally required for litigation, regulatory investigations, and internal compliance inquiries. Failure to preserve and produce relevant evidence can result in court sanctions, adverse inference instructions, and significant fines. With the average enterprise storing over 2.5 petabytes of data across Microsoft 365, the ability to efficiently search, cull, and review electronic evidence is critical. Mastering eDiscovery Premium ensures defensible data collection, reduces legal risk, and can cut document review costs by up to 70% through machine-learning relevance training compared to traditional linear review.

⚙️ Prerequisites

Completed Lab 03: Insider Risk Management configured
eDiscovery Manager or eDiscovery Administrator role: assigned in the Purview compliance portal
Microsoft 365 E5 license: or E5 eDiscovery add-on for Premium features
Audit logging enabled: with 1-year retention minimum for legal defensibility
Test custodian accounts: with sample mailbox and OneDrive data for search testing
Exchange Online PowerShell module: for compliance search automation

💡 Pro Tip: Always coordinate with legal counsel before beginning any eDiscovery investigation. Improper evidence handling can result in court sanctions and adverse inference instructions.

Step 1 · Navigate to eDiscovery (Premium)

Open compliance.microsoft.com > eDiscovery > Premium. This is the full-featured investigation tool for legal holds, evidence collection, document review, and production in regulatory and litigation scenarios.

Review existing cases. Each case is a self-contained investigation with its own custodians, searches, review sets, and exports. Cases can remain active for months or years depending on the investigation.

Step 2 · Create a New Case

Click Create a case. Enter the case name using a consistent naming convention: YYYY-MM-InvestigationType-ShortDescription (e.g., 2024-06-Litigation-ContractDispute). Set the case description, assign case members, and configure the case settings.
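A naming convention only helps if it is applied consistently, so it can be worth validating names before a case is created. The following is a minimal Python sketch, assuming the convention above; the helper names `build_case_name` and `is_valid_case_name` are hypothetical and are not part of Purview.

```python
import re
from datetime import date

# Pattern for the YYYY-MM-InvestigationType-ShortDescription convention.
CASE_NAME_RE = re.compile(r"^(\d{4})-(0[1-9]|1[0-2])-([A-Za-z0-9]+)-([A-Za-z0-9]+)$")


def build_case_name(opened: date, investigation_type: str, short_description: str) -> str:
    """Compose a case name; spaces and punctuation are stripped from the text parts."""
    def clean(part: str) -> str:
        return re.sub(r"[^A-Za-z0-9]", "", part)

    name = f"{opened:%Y-%m}-{clean(investigation_type)}-{clean(short_description)}"
    if not CASE_NAME_RE.match(name):
        raise ValueError(f"Case name does not follow the convention: {name!r}")
    return name


def is_valid_case_name(name: str) -> bool:
    """Return True when an existing name already follows the convention."""
    return CASE_NAME_RE.match(name) is not None


print(build_case_name(date(2024, 6, 1), "Litigation", "Contract Dispute"))
```

The same check can be run over an exported list of existing case names to find ones that drifted from the convention.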

Case members control who can access the investigation data. Limit membership to authorized investigators and legal counsel. All case member activity is logged for audit purposes.

Step 3 · Add Custodians

Open the case and navigate to Data sources > Add custodians. Add the individuals whose data sources need to be searched and preserved: select the users, map their mailboxes and OneDrive accounts, and add any additional data sources.

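For each custodian, you can also map additional locations that the custodian participates in, such as shared mailboxes, Teams, SharePoint sites, and Yammer groups. Locations that are relevant to the case but not tied to a specific custodian are added separately as non-custodial data sources.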

Step 4 · Place Legal Holds

For each custodian, enable Hold. This places a legal hold on all mapped data sources so that content is preserved even if a user later edits or deletes it. Legal holds take precedence over user deletions and retention policy deletions.

Configure hold conditions if needed: hold all content (safest) or hold content matching specific conditions (date range, keywords). For litigation holds, always hold all content unless counsel directs otherwise.

💡 Pro Tip: Document the date and time of hold placement and the scope of data held. You may need to demonstrate to opposing counsel or the court that a proper litigation hold was placed promptly after the duty to preserve was triggered.
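One way to capture the hold details consistently is a small structured record written at the moment the hold is enabled. This is a sketch only: the field names and the `hold_record` helper are assumptions for illustration, and the record is kept in your own case notes, not in Purview.

```python
import json
from datetime import datetime, timezone


def hold_record(case_name, custodian, locations, scope, placed_at=None):
    """Build a hold-placement record with a UTC timestamp for the case file."""
    placed_at = placed_at or datetime.now(timezone.utc)
    if placed_at.tzinfo is None:
        raise ValueError("placed_at must be timezone-aware")
    return {
        "case": case_name,
        "custodian": custodian,
        "locations": sorted(locations),
        "scope": scope,
        "hold_placed_utc": placed_at.astimezone(timezone.utc).isoformat(timespec="seconds"),
    }


record = hold_record(
    "2024-06-Litigation-ContractDispute",
    "john@contoso.com",
    ["OneDrive", "Mailbox"],
    "All content",
    datetime(2024, 6, 3, 14, 30, tzinfo=timezone.utc),
)
print(json.dumps(record, indent=2))
```

Storing the timestamp in UTC avoids ambiguity when custodians and counsel are in different time zones.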

Step 5 · Create Search Queries

Build search queries using KQL (Keyword Query Language). Start with broad queries and refine iteratively to narrow the result set.

Portal Instructions

In the case, navigate to Searches > New search
Name the search: Initial-BroadSearch-ContractDispute
Select data sources: Custodians (auto-includes their mailbox + OneDrive)
Enter your KQL query in the search box
Click Save & run
Review search statistics: total items, data size, items per location

Sample KQL Queries

# Query 1: Broad keyword search with date range
# WHAT: Finds emails with "contract dispute" in the subject within a specific timeframe
# WHY: Start broad to establish the universe of potentially relevant documents
# OUTPUT: All emails with "contract dispute" in the subject, received from January 2023 through June 2024
Subject:"contract dispute" AND Received:2023-01-01..2024-06-30

# Query 2: Specific sender/recipient with confidentiality terms
# WHAT: Finds all communications to/from a specific custodian containing sensitive terms
# WHY: Targets a key person of interest and filters for documents likely to be relevant
# OUTPUT: Emails where John was a sender or recipient discussing confidential matters
(From:john@contoso.com OR To:john@contoso.com) AND 
("confidential" OR "proprietary" OR "trade secret")

# Query 3: Document types in SharePoint/OneDrive with M&A terms
# WHAT: Searches for Word, Excel, and PDF files related to mergers and acquisitions
# WHY: Key deal documents are typically in these formats; narrows results significantly
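# OUTPUT: Documents in SharePoint/OneDrive that contain both "merger" and "acquisition"
# NOTE: Parentheses group the OR terms so the keyword conditions apply to every file type
(filetype:docx OR filetype:xlsx OR filetype:pdf) AND
"merger" AND "acquisition"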

# Query 4: Teams chat messages from a specific participant
# WHAT: Searches Teams chat and channel messages for a project codename
# WHY: Teams chats are often where informal but critical discussions happen
# Kind:microsoftteams limits results to Teams content only
# Participants: filters to chats involving a specific user
Kind:microsoftteams AND "project alpha" AND 
Participants:jane@contoso.com

# Query 5: Large email attachments within a date range
# WHAT: Finds emails with attachments where the total message size exceeds 5 MB (5,242,880 bytes)
# WHY: Large attachments may contain data exports, spreadsheets, or document packages
#      that are central to the investigation. Helps identify bulk data transfers.
# CONCERN: Size-based queries can surface large but irrelevant files - combine with keywords
HasAttachment:true AND Size>5242880 AND 
Sent:2024-01-01..2024-06-30
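When queries are assembled by script or from a list of agreed search terms, OR alternatives (such as the file types in Query 3) are safest wrapped in parentheses before they are combined with AND. The sketch below shows one way to do that in Python; `any_of`, `all_of`, `phrase`, and `date_range` are hypothetical helper names, and the output is simply a string to paste into the search box.

```python
def any_of(*terms: str) -> str:
    """Join alternatives with OR and wrap them in parentheses."""
    if not terms:
        raise ValueError("at least one term is required")
    return terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")"


def all_of(*clauses: str) -> str:
    """Join clauses with AND, skipping empty ones."""
    return " AND ".join(c for c in clauses if c)


def phrase(text: str) -> str:
    """Quote a keyword or phrase, dropping any embedded double quotes."""
    return '"' + text.replace('"', "") + '"'


def date_range(prop: str, start: str, end: str) -> str:
    """Format a property date range such as Received:2023-01-01..2024-06-30."""
    return f"{prop}:{start}..{end}"


query = all_of(
    any_of("filetype:docx", "filetype:xlsx", "filetype:pdf"),
    phrase("merger"),
    phrase("acquisition"),
)
print(query)
```

Generating the text this way also gives you a reproducible record of exactly which terms went into each search.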

PowerShell: Compliance Search

# Connect to Security & Compliance PowerShell
# REQUIRES: The ExchangeOnlineManagement module (Install-Module ExchangeOnlineManagement)
# WHY: Establishes the session required for eDiscovery and compliance search cmdlets
Connect-IPPSSession -UserPrincipalName admin@contoso.com

# Create a compliance search scoped to specific custodian mailboxes
# WHAT: Defines a search targeting two custodians' Exchange mailboxes
# -ExchangeLocation: Lists specific mailboxes to search (not "All" - scoped for defensibility)
# -ContentMatchQuery: KQL query filtering by subject and date range
# WHY: Scoped searches are more defensible in court - they show targeted, proportional collection
New-ComplianceSearch -Name "ContractDispute-Search01" `
  -ExchangeLocation "john@contoso.com","jane@contoso.com" `
  -ContentMatchQuery 'Subject:"contract dispute" AND Received:2023-01-01..2024-06-30'

# Start the search (runs asynchronously in the background)
# WHAT: Begins scanning the specified mailboxes for matching content
# NOTE: Searches can take minutes to hours depending on data volume
Start-ComplianceSearch -Identity "ContractDispute-Search01"

# Check search status and result count
# OUTPUT: Name, Status (NotStarted/InProgress/Completed), Items (total matches),
#         Size (total data volume of matches in bytes)
# EXPECT: Status = "Completed" before proceeding to review results
Get-ComplianceSearch -Identity "ContractDispute-Search01" | 
  Select-Object Name, Status, Items, Size

# View detailed search statistics including the query and full results
# OUTPUT: Full details including ContentMatchQuery (confirms the right query ran),
#         Items, Size, and any errors encountered during the search
# USE: Document this output as evidence of your search methodology for legal defensibility
Get-ComplianceSearch -Identity "ContractDispute-Search01" | 
  Format-List Name, Status, Items, Size, ContentMatchQuery

💡 Pro Tip: Start broad, then narrow. If your initial search returns 50,000 items, add date ranges, specific keywords, or participant filters to reduce to a manageable review set. Document each iteration of your search methodology for defensibility.
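To document each iteration, it can help to log the query text and item count every time a search is refined. The following Python sketch assumes you copy those values from the search statistics by hand; the `SearchIteration` type, the second query, and the 8,200 item count are made-up illustrations, not results from a real search.

```python
from dataclasses import dataclass


@dataclass
class SearchIteration:
    name: str
    query: str
    items: int
    note: str


def reduction_pct(previous: SearchIteration, current: SearchIteration) -> float:
    """Percentage drop in item count between two iterations."""
    if previous.items == 0:
        return 0.0
    return round(100 * (previous.items - current.items) / previous.items, 1)


broad = SearchIteration(
    "Initial-BroadSearch-ContractDispute",
    'Subject:"contract dispute"',
    50000,
    "Initial broad search",
)
narrowed = SearchIteration(
    "Narrowed-DateRange-ContractDispute",
    'Subject:"contract dispute" AND Received:2023-01-01..2024-06-30',
    8200,  # hypothetical count for illustration
    "Added date range agreed with counsel",
)
print(f"{narrowed.name}: {reduction_pct(broad, narrowed)}% fewer items than {broad.name}")
```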

Step 6 · Refine Search with Query Builder

Use the Conditions builder for more precise queries. Add conditions: Date (sent/received window), Participants (To/From/CC/BCC), File type (documents, emails, chats), Message kind (email, meeting, chat), and Compliance label.

Run the search and review the statistics: total items found, total data size, items per location, and common file types. If the result set is too large, add conditions to narrow the scope.

Step 7 · Add Search Results to a Review Set

Create a review set: a curated collection of items for detailed review. Select your search results and click Add to review set. Choose to add all search results or a subset matching additional conditions.

Adding items to a review set triggers processing: content extraction, text recognition (OCR for images), metadata indexing, and near-duplicate detection. This can take minutes to hours depending on data volume.

Step 8 · Explore the Review Set

Open the review set and explore the navigation tools: filter panel (by author, date, file type, tags), search bar (full-text keyword search), analytics panel (near-duplicates, email threads, themes).

The review set provides a document-by-document review experience. Each item shows: the rendered document, raw text, metadata properties, and the review history (who reviewed it, when, what tags were applied).

Step 9 · Configure Review Set Analytics

Run analytics on the review set: click Settings > Analytics. Enable Near-duplicate detection (identifies documents that are substantially similar), Email threading (groups related email threads), and Themes (groups documents by topic).

Near-duplicate detection dramatically reduces review volume: if 100 documents are near-duplicates, review one thoroughly and spot-check the rest. This can reduce review time by 40-60 percent.
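To build intuition for what "substantially similar" means, the sketch below compares two texts by the overlap of their three-word sequences (Jaccard similarity over word shingles). This is a generic textbook technique chosen for illustration; it is not a description of the algorithm Purview uses, and the 0.65 threshold is an assumed example value.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word sequences in the text (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str) -> float:
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a: str, b: str, threshold: float = 0.65) -> bool:
    """Treat two texts as near-duplicates when similarity meets the threshold."""
    return jaccard(a, b) >= threshold


doc_a = "the contract renewal terms were sent to legal on monday"
doc_b = "the contract renewal terms were sent to legal on tuesday"
print(round(jaccard(doc_a, doc_b), 3), is_near_duplicate(doc_a, doc_b))
```

The two sample sentences differ by one word, so seven of their nine distinct shingles match and they land above the example threshold.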

Step 10 · Train the Relevance Model

Navigate to the Relevance module. Create an issue (the matter being investigated) and begin training: the system presents sample documents, you tag each as Relevant or Not Relevant, and the model learns your criteria.

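Complete multiple training rounds. After each round, the system shows: richness (the estimated percentage of documents in the set that are relevant), recall (the percentage of all relevant documents that the model has found), and the overall relevance score distribution. Continue training until the model stabilizes, that is, until these metrics stop changing materially between rounds.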
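The training metrics are simple ratios, and working one example by hand makes them easier to interpret. The sketch below uses the standard definitions of richness, recall, and precision; the sample counts are invented for illustration, and precision is included as a common companion metric even though the step above does not list it.

```python
def richness(relevant_in_sample: int, sample_size: int) -> float:
    """Estimated share of the document set that is relevant."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return relevant_in_sample / sample_size


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of all relevant documents that the model identified."""
    total_relevant = true_positives + false_negatives
    if total_relevant == 0:
        raise ValueError("no relevant documents in the sample")
    return true_positives / total_relevant


def precision(true_positives: int, false_positives: int) -> float:
    """Share of documents flagged as relevant that really are relevant."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("the model flagged no documents")
    return true_positives / flagged


# Hypothetical control sample: 500 documents, 60 tagged Relevant by reviewers.
# Of those 60, the model flagged 48 and missed 12; it also flagged 32 others.
print(f"richness  = {richness(60, 500):.0%}")
print(f"recall    = {recall(48, 12):.0%}")
print(f"precision = {precision(48, 32):.0%}")
```

Low richness means reviewers see few relevant documents per batch, so more training rounds are usually needed before recall becomes a reliable figure.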

Step 11 · Apply Tags and Review Workflow

Create a tagging structure for the review workflow: Relevance (Relevant / Not Relevant / Needs Review), Privilege (Attorney-Client / Work Product / Not Privileged), Responsiveness (Responsive / Non-Responsive / Partially Responsive).

Configure review assignments: assign batches of documents to reviewers, set quality control checkpoints (senior reviewer validates a percentage of junior reviewer tags), and track review progress per reviewer.
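A quality control checkpoint can be as simple as drawing a reproducible random sample of a junior reviewer's batch and measuring how often the senior reviewer agrees. The sketch below is one possible approach in Python; the function names, the 10 percent rate, and the document IDs are assumptions for illustration.

```python
import random


def qc_sample(doc_ids, percent: int, seed: int):
    """Pick a reproducible random sample of a batch for senior review."""
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    ids = sorted(doc_ids)
    k = (len(ids) * percent + 99) // 100  # round up so small batches still get checked
    return sorted(random.Random(seed).sample(ids, k))


def agreement_rate(junior_tags: dict, senior_tags: dict) -> float:
    """Share of re-reviewed documents where both reviewers applied the same tag."""
    checked = [d for d in senior_tags if d in junior_tags]
    if not checked:
        raise ValueError("no overlapping documents to compare")
    agree = sum(1 for d in checked if junior_tags[d] == senior_tags[d])
    return agree / len(checked)


batch = [f"DOC-{n:04d}" for n in range(1, 101)]
sample = qc_sample(batch, percent=10, seed=42)
print(len(sample), "documents selected for QC")
```

Recording the seed alongside the batch lets you show later exactly which documents were selected and that the selection was not hand-picked.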

Step 12 · Review Email Threads

Use the Email threading view to review full conversation threads as a single unit rather than individual messages. This provides context and reduces redundant review: you only need to review the latest inclusive message, which contains all prior messages in the thread.
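The idea of an inclusive message can be illustrated with sets: a message is worth reviewing on its own only if no other message in the thread already contains everything it contains. The Python sketch below is a simplified model under that assumption. It ignores attachments and other factors a real threading engine considers, and the message IDs are hypothetical.

```python
from typing import Dict, List, Set


def inclusive_messages(thread: Dict[str, Set[str]]) -> List[str]:
    """Return messages whose content is not fully contained in another message.

    `thread` maps each message ID to the set of message IDs whose text it
    contains (itself plus everything quoted below it).
    """
    return sorted(
        msg_id
        for msg_id, content in thread.items()
        if not any(content < other for other_id, other in thread.items() if other_id != msg_id)
    )


thread = {
    "m1": {"m1"},              # original message
    "m2": {"m1", "m2"},        # reply quoting m1
    "m3": {"m1", "m2", "m3"},  # reply quoting m2 and m1
    "m4": {"m1", "m4"},        # separate reply to m1 (a branch)
}
print(inclusive_messages(thread))
```

In this example only `m3` and `m4` need full review, because `m1` and `m2` are quoted in their entirety inside `m3`.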

Identify key conversation threads that are central to the investigation. Tag entire threads consistently and document the significance of key communications in your case notes.

Step 13 · Handle Privileged Content

Use the Attorney-Client Privilege detection model to identify potentially privileged content. Review flagged items with legal counsel and apply the Privileged tag. Privileged content must be withheld from production.

Create a privilege log documenting each withheld item: document date, author, recipients, subject matter (without revealing privileged content), and the privilege claim basis. The privilege log may be shared with opposing counsel.
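A privilege log is usually maintained as a spreadsheet, so a small script that enforces the required columns can prevent incomplete entries. The sketch below writes the fields listed above to CSV using Python's standard library; the column names and the sample entry are assumptions for illustration, and counsel should confirm the format your jurisdiction expects.

```python
import csv
import io

FIELDS = ["document_date", "author", "recipients", "subject_matter", "privilege_basis"]


def privilege_log_csv(entries) -> str:
    """Render privilege log entries as CSV, rejecting entries with missing fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        missing = [f for f in FIELDS if not entry.get(f)]
        if missing:
            raise ValueError(f"privilege log entry is missing: {', '.join(missing)}")
        writer.writerow({f: entry[f] for f in FIELDS})
    return buffer.getvalue()


log = privilege_log_csv([
    {
        "document_date": "2024-03-14",
        "author": "counsel@contoso.com",
        "recipients": "john@contoso.com; jane@contoso.com",
        "subject_matter": "Legal advice regarding incident response",
        "privilege_basis": "Attorney-Client",
    }
])
print(log)
```

Keep the subject matter description generic enough that the log itself does not disclose the privileged content.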

Step 14 · Configure Redactions

For documents that contain both responsive and privileged content, or responsive content with personal information that must be redacted, use the Redact feature to black out sensitive sections before production.

Document every redaction: which document, which sections, and the basis for redaction (privilege, privacy, relevance). Be consistent: apply the same redaction standards across all documents.

Step 15 · Prepare Production Sets

Navigate to Exports > Add export. Configure the production: select which review set items to export (tagged Responsive, not tagged Privileged), choose the output format (native files, PST, PDF), and set the naming convention.
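Before running the export, it is worth double-checking the selection rule against an exported tag report so that nothing privileged slips into the production. The Python sketch below expresses the rule "tagged Responsive and not tagged with any privilege tag"; the tag names follow the tagging structure from Step 11 and Step 13, and the item structure is a hypothetical stand-in for whatever your tag report contains.

```python
# Tags that must keep an item out of the production set.
PRIVILEGE_TAGS = {"Privileged", "Attorney-Client", "Work Product"}


def select_for_production(items):
    """Return IDs of items tagged Responsive that carry no privilege tag."""
    return [
        item["id"]
        for item in items
        if "Responsive" in item["tags"] and not PRIVILEGE_TAGS & set(item["tags"])
    ]


review_set = [
    {"id": "DOC-001", "tags": ["Relevant", "Responsive", "Not Privileged"]},
    {"id": "DOC-002", "tags": ["Relevant", "Responsive", "Attorney-Client"]},
    {"id": "DOC-003", "tags": ["Not Relevant", "Non-Responsive"]},
    {"id": "DOC-004", "tags": ["Responsive"]},
]
print(select_for_production(review_set))
```

Comparing this list with the item count shown in the export dialog is a quick sanity check before anything leaves the tenant.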

Production formats matter: native files preserve metadata and functionality, PDFs provide a consistent review experience, and PSTs are efficient for email collections. Discuss format requirements with legal counsel before producing.

Step 16 · Generate Production Reports

After export, generate production reports: document count, total data volume, file type breakdown, Bates number range, and a load file mapping document IDs to file names.
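Bates numbers are sequential page identifiers, so the range for each document follows directly from its page count. The Python sketch below assigns ranges and produces the rows for a simple load file; the `CONTOSO` prefix, the seven-digit width, and the page counts are assumptions for illustration, and the real numbering scheme should be agreed with counsel.

```python
def bates_ranges(doc_page_counts, prefix: str, start: int = 1, width: int = 7):
    """Assign a first and last Bates number to each document, in order."""
    rows = []
    next_page = start
    for doc_id, pages in doc_page_counts:
        if pages < 1:
            raise ValueError(f"{doc_id} must have at least one page")
        first = f"{prefix}{next_page:0{width}d}"
        last = f"{prefix}{next_page + pages - 1:0{width}d}"
        rows.append((doc_id, first, last))
        next_page += pages
    return rows


production = [("DOC-001", 3), ("DOC-002", 1), ("DOC-003", 12)]
for doc_id, first, last in bates_ranges(production, prefix="CONTOSO"):
    print(f"{doc_id}\t{first}\t{last}")
```

The first number of the first row and the last number of the final row give the Bates range to quote in the production report.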

Create a production cover letter documenting: what was produced, date range covered, custodians included, search methodology, and any known limitations or errors in the collection or processing.

Step 17 · Close and Archive the Case

When the investigation concludes, release all legal holds (only after counsel confirms the duty to preserve has ended), archive the case (preserving the audit trail), and document the case closure in your case management system.

Export the case audit log before archiving: all searches run, review set activities, tag changes, and export activities. This audit trail demonstrates that the investigation was conducted thoroughly and defensibly.

Step 18 · Establish eDiscovery Operations

Document your eDiscovery standard operating procedures: case creation requirements, legal hold procedures, search methodology standards, review workflow templates, production standards, and case closure checklist.

Train your legal and compliance teams on the eDiscovery workflow. Create role-based training: case managers learn case setup and hold management, reviewers learn tagging and document review, and administrators learn system configuration.

📚 Documentation Resources

Resource	Description
eDiscovery solutions in Microsoft Purview	Overview of eDiscovery capabilities
Get started with eDiscovery (Premium)	Setup and configuration guide
Add custodians to an eDiscovery case	Manage custodial data sources
Create and manage holds	Preserve content for investigation
Search for content in a case	Build and run content searches
Review set analytics	Near-duplicate detection, email threading, and theme analysis

Summary

What You Accomplished

Created eDiscovery cases for legal and compliance investigations
Configured legal holds to preserve relevant content in place
Ran content searches across Exchange, SharePoint, and OneDrive
Reviewed and exported search results for legal review
Managed the full eDiscovery case lifecycle from creation to closure

Next Steps

Next Lab: Configure Communication Compliance, Audit & Data Lifecycle Management
Extend content searches to Teams and Yammer conversation data
Integrate with external review platforms for large-scale document review