AI Research

We believe that science and technology are advanced through the ongoing, free dissemination of research and best practices. Kira has been built in part on the work of others, and as technology leaders we continue the tradition of sharing our research.

Hash the Universe: Differentially Private Text Extraction with Feature Hashing

In this work, we show how differential privacy can be applied to feature hashing. Feature hashing is a common technique for handling out-of-dictionary vocabulary, and for creating a lookup table to find feature weights in constant time.
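For readers unfamiliar with the technique, feature hashing can be sketched in a few lines of Python. This is a minimal illustration only, not the method from the paper and without any differential privacy: the table size `NUM_BUCKETS`, the choice of MD5 as the hash, and the helper names `bucket`, `hash_features`, and `score` are all assumptions made for the example.

```python
import hashlib

NUM_BUCKETS = 2 ** 10  # hypothetical table size, chosen only for illustration


def bucket(feature: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map any string feature to a fixed index in [0, num_buckets)."""
    # A stable hash (unlike Python's built-in hash(), which is salted per
    # process) so that the same feature always lands in the same bucket.
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def hash_features(tokens, num_buckets: int = NUM_BUCKETS):
    """Build a fixed-length count vector; no dictionary of known words is needed."""
    vec = [0.0] * num_buckets
    for token in tokens:
        vec[bucket(token, num_buckets)] += 1.0
    return vec


def score(tokens, weights):
    """Look up each feature's weight in constant time via its bucket index."""
    return sum(weights[bucket(token, len(weights))] for token in tokens)
```

Because every string hashes to some bucket, a word never seen during training still maps to a valid index, which is how the technique handles out-of-dictionary vocabulary. The cost is that distinct features can collide in the same bucket and share a weight.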

Towards Protecting Sensitive Text with Differential Privacy

Traditionally, differential privacy involves adding noise to hide the true value of data points. We show that due to the finite nature of the output space when using feature hashing, a noiseless approach is also theoretically sound. This approach opens up the possibility of applying strong differential privacy protections to NLP models trained with feature hashing.

The Utility of Context When Extracting Entities From Legal Documents

When reviewing documents for legal tasks such as Mergers and Acquisitions, granular information (such as start dates and exit clauses) needs to be identified and extracted. Inspired by previous work in Named Entity Recognition (NER), we investigate how NER techniques can be leveraged to aid lawyers in this review process.

Dancing with the AI Devil: Investigating the Partnership Between Lawyers and AI

In this paper, we present the insights gleaned from a qualitative user study conducted with nine users of our software, all of whom are legal professionals. We find that, as our participants become more accustomed to the system, they begin to subtly alter their behaviors and interactions with the system.

Spectator: An Open Source Document Viewer

Building a high-quality document viewer often exceeds the resources available to researchers, so in this paper we describe the design and architecture of our new open-source document viewer, Spectator. In particular, we provide a look into the algorithmic details of how Spectator accomplishes tasks like mapping annotations back to the canonical document.

On Tradeoffs Between Document Signature Methods for a Legal Due Diligence Corpus

We present an examination of the tradeoffs that document signature methods face in the due diligence domain. In particular, we quantify the tradeoffs among signature length, time to compute, number of hash collisions, and number of nearest neighbours for a 90,000-document due diligence corpus.

A Reliable and Accurate Multiple Choice Question Answering System for Due Diligence

We propose a question answering system that first identifies the excerpt in the contract that potentially contains the answer to a given question, and then builds a multi-class classifier to choose the answer to the question based on the content of this excerpt.

On Interpretability and Feature Representations: An Analysis of the Sentiment Neuron

Using adversarial examples, we show that the generated representation containing the Sentiment Neuron (i.e., the final hidden cell state in an LSTM) is particularly sensitive to the end of a processed sequence.

From Bubbles to Lists

Following an iterative design methodology, we conducted several user studies, spanning three distinct phases and 27 users, with different versions of a document-level clustering feature. We found that the interface should adapt to a user’s understanding of what “similar documents” means so that trust can be established in the feature.

Variations in Assessor Agreement in Due Diligence

In this paper, we present a study of nine lawyers conducting a simulated review of 50 contracts for five topics. We find that lawyers agree on the general location of relevant material at a higher rate than in other assessor agreement studies, but they do not entirely agree on the extent of the relevant material.
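The distinction between agreeing on location and agreeing on extent can be illustrated with a small Python sketch. It is an assumption-laden example rather than the measure used in the study: highlighted passages are modelled as hypothetical `(start, end)` character offsets, `overlaps` stands in for location agreement, and `jaccard` stands in for extent agreement.

```python
def overlaps(a, b):
    """Location agreement: do two half-open spans (start, end) share any text?"""
    return a[0] < b[1] and b[0] < a[1]


def jaccard(a, b):
    """Extent agreement: shared length divided by the combined length."""
    intersection = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union else 0.0


# Two hypothetical assessors highlight the same clause with different boundaries.
assessor_1 = (100, 200)
assessor_2 = (150, 400)

same_location = overlaps(assessor_1, assessor_2)  # True
extent_score = jaccard(assessor_1, assessor_2)    # 50 / 300
```

Here both assessors point at the same region of the contract, so they agree on location, yet only 50 of the 300 characters covered by either highlight are shared, so their agreement on extent is low.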

A Dataset and an Examination of Identifying Passages for Due Diligence

We present and formalize the due diligence problem, where lawyers extract data from legal documents to assess risk in a potential merger or acquisition, as an information retrieval task. Furthermore, we describe the creation and annotation of a document collection for the due diligence problem that will foster research in this area.

Redesigning Document Viewer for Legal Documents

This paper reports on the user-focused redesign of our document viewer, which clients use to review documents and to train machine learning algorithms to find pertinent information in contracts.

Automatic and Semi-Automatic Document Selection for Technology-Assisted Review

In this work, we investigate the extent to which the observed effectiveness of the different methods may be confounded by chance, by inconsistent adherence to the Track guidelines, by selection bias in the evaluation method, or by discordant relevance assessments.