Hash the Universe: Differentially Private Text Extraction with Feature Hashing

In this work, we show how differential privacy can be applied to feature hashing. Feature hashing is a common technique for handling out-of-dictionary vocabulary, and for creating a lookup table to find feature weights in constant time.
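
As an illustration of the feature hashing idea described above, the following is a minimal sketch and not the paper's implementation. The table size `NUM_BUCKETS`, the use of MD5 as the hash function, and the helper names `bucket`, `hash_features`, and `score` are all assumptions chosen for this example. It shows how any token, including one never seen in training, maps to a fixed-size index space, and how a weight lookup takes constant time per token.

```python
import hashlib

# Hypothetical table size; real systems typically use a much larger power of two.
NUM_BUCKETS = 2 ** 10


def bucket(token: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a token to a bucket index using a stable hash (MD5 is an assumption here)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def hash_features(tokens, num_buckets: int = NUM_BUCKETS):
    """Build a fixed-length count vector; no vocabulary dictionary is needed."""
    vec = [0] * num_buckets
    for tok in tokens:
        vec[bucket(tok, num_buckets)] += 1
    return vec


def score(tokens, weights):
    """Sum the weights for each token; each lookup is a single list index."""
    return sum(weights[bucket(tok, len(weights))] for tok in tokens)


# Usage: an out-of-dictionary token still lands in a valid bucket.
features = hash_features(["merger", "acquisition", "zyxwv-unseen-token"])
weights = [1.0] * NUM_BUCKETS
total = score(["merger", "acquisition"], weights)
```

Because the output space is a fixed number of buckets, distinct tokens can collide in the same bucket; the table size trades memory against collision rate.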

Towards Protecting Sensitive Text with Differential Privacy

Traditionally, differential privacy involves adding noise to hide the true value of data points. We show that due to the finite nature of the output space when using feature hashing, a noiseless approach is also theoretically sound. This approach opens up the possibility of applying strong differential privacy protections to NLP models trained with feature hashing.

The Utility of Context When Extracting Entities From Legal Documents

When reviewing documents for legal tasks such as Mergers and Acquisitions, granular information (such as start dates and exit clauses) needs to be identified and extracted. Inspired by previous work in Named Entity Recognition (NER), we investigate how NER techniques can be leveraged to aid lawyers in this review process.

Dancing with the AI Devil: Investigating the Partnership Between Lawyers and AI

In this paper, we present insights gleaned from a qualitative user study conducted with nine users of our software, all of whom are legal professionals. We find that, as our participants become more accustomed to the system, they begin to subtly alter their behaviors and interactions with it.

Spectator: An Open Source Document Viewer

Building a high-quality document viewer is often beyond the resources of many researchers. In this paper, we describe the design and architecture of our new open-source document viewer, Spectator. In particular, we provide a look into the algorithmic details of how Spectator accomplishes tasks like mapping annotations back to the canonical document.

On Tradeoffs Between Document Signature Methods for a Legal Due Diligence Corpus

We examine the tradeoffs that document signature methods face in the due diligence domain. In particular, we quantify the tradeoffs among signature length, time to compute, number of hash collisions, and number of nearest neighbours for a 90,000-document due diligence corpus.

A Reliable and Accurate Multiple Choice Question Answering System for Due Diligence

We propose a question answering system that first identifies the excerpt in the contract that potentially contains the answer to a given question, and then builds a multi-class classifier to choose the answer based on the content of this excerpt.

On Interpretability and Feature Representations: An Analysis of the Sentiment Neuron

Using adversarial examples, we show that the generated representation containing the Sentiment Neuron (i.e., the final hidden cell state in an LSTM) is particularly sensitive to the end of a processed sequence.

From Bubbles to Lists

Following an iterative design methodology, we conducted several user studies, spanning three distinct phases and 27 users, with different versions of a document-level clustering feature. We found that the interface should adapt to a user’s understanding of what “similar documents” means so that trust can be established in the feature.

Variations in Assessor Agreement in Due Diligence

In this paper, we present a study of nine lawyers conducting a simulated review of 50 contracts for five topics. We find that lawyers agree on the general location of relevant material at a higher rate than in other assessor agreement studies, but they do not entirely agree on the extent of the relevant material.

A Dataset and an Examination of Identifying Passages for Due Diligence

We present and formalize the due diligence problem, where lawyers extract data from legal documents to assess risk in a potential merger or acquisition, as an information retrieval task. Furthermore, we describe the creation and annotation of a document collection for the due diligence problem that will foster research in this area. 

Redesigning Document Viewer for Legal Documents

This paper reports on the user-focused redesign of our document viewer, which clients use to review documents and to train machine learning algorithms that find pertinent information in contracts.

Automatic and Semi-Automatic Document Selection for Technology-Assisted Review

In this work, we investigate the extent to which the observed effectiveness of the different methods may be confounded by chance, by inconsistent adherence to the Track guidelines, by selection bias in the evaluation method, or by discordant relevance assessments.
