Hash the Universe: Differentially Private Text Extraction with Feature Hashing
In this work, we show how differential privacy can be applied to feature hashing. Feature hashing is a common technique for handling out-of-dictionary vocabulary, and for creating a lookup table to find feature weights in constant time.
Read More
Towards Protecting Sensitive Text with Differential Privacy
Traditionally, differential privacy involves adding noise to hide the true value of data points. We show that due to the finite nature of the output space when using feature hashing, a noiseless approach is also theoretically sound. This approach opens up the possibility of applying strong differential privacy protections to NLP models trained with feature hashing.
Read More
The Utility of Context When Extracting Entities From Legal Documents
When reviewing documents for legal tasks such as Mergers and Acquisitions, granular information (such as start dates and exit clauses) need to be identified and extracted. Inspired by previous work in Named Entity Recognition (NER), we investigate how NER techniques can be leveraged to aid lawyers in this review process.
Read More
Dancing with the AI Devil: Investigating the Partnership Between Lawyers and AI
In this paper, we present the insights that we have gleaned from a qualitative user study conducted with nine of our software’s users who are all legal professionals. We find that as our participants become more accustomed to the system they begin to subtly alter their behaviors and interactions with the system.
Read More
Spectator: An Open Source Document Viewer
Building a high-quality document viewer often exceeds the resources of many researchers and so, in this paper, we describe the design and architecture of our new open-source document viewer, Spectator. In particular, we provide a look into the algorithmic details of how Spectator accomplishes tasks like mapping annotations back to the canonical document.
Read More
On Tradeoffs Between Document Signature Methods for a Legal Due Diligence Corpus
We present an examination of the tradeoffs that document signature methods face in the due diligence domain. In particular, we quantify the trade-off between signature length, time to compute, number of hash collisions, and number of nearest neighbours for a 90,000 document due diligence corpus.
Read More
A Reliable and Accurate Multiple Choice Question Answering System for Due Diligence
We propose a question answering system which first identifies the excerpt in the contract which potentially contains the answer to a given question, and then builds a multi-class classifier to choose the answer to the question, based on the content of this excerpt.
Read More
On Interpretability and Feature Representations: An Analysis of the Sentiment Neuron
Using adversarial examples, we show that the generated representation containing the Sentiment Neuron (i.e., the final hidden cell state in a LSTM) is particularly sensitive to the end of a processed sequence.
Read More
From Bubbles to Lists
Following an iterative design methodology, we conducted several user studies with different versions of a document-level clustering feature consisting of three distinct phases and 27 users. We found that the interface should adapt to a user’s understanding of what “similar documents” means so that trust can be established in the feature.
Read More
Variations in Assessor Agreement in Due Diligence
In this paper, we present a study of 9 lawyers conducting a simulated review of 50 contracts for five topics. We find that lawyers agree on the general location of relevant material at a higher rate than in other assessor agreement studies, but they do not entirely agree on the extent of the relevant material.
Read More
A Dataset and an Examination of Identifying Passages for Due Diligence
We present and formalize the due diligence problem, where lawyers extract data from legal documents to assess risk in a potential merger or acquisition, as an information retrieval task. Furthermore, we describe the creation and annotation of a document collection for the due diligence problem that will foster research in this area.
Read More
Redesigning Document Viewer for Legal Documents
This paper reports on the user-focused redesign of our document viewer that is used by clients to review documents and train machine learning algorithms to find pertinent information from these contracts.
Read More
Automatic and Semi-Automatic Document Selection for Technology-Assisted Review
In this work, we investigate the extent to which the observed effectiveness of the different methods may be confounded by chance, by inconsistent adherence to the Track guidelines, by selection bias in the evaluation method, or by discordant relevance assessments.
Read More
Services Resources
Find more resources about professional services.