An academic article published this year in the Stanford Technology Law Review demonstrates how machine learning and natural language processing techniques can provide new insights within the study of law. Machine learning, a technique developed in the field of artificial intelligence, typically uses computer algorithms to identify (i.e. “learn”) patterns in large datasets. In many applications, including the one in this paper, these patterns are then used to make predictions about new data. Common commercial applications include the predictive analytics behind the recommendations shown to Netflix and Amazon customers.
In this particular paper, the authors developed a model to predict which Supreme Court Justice wrote per curiam (i.e. unsigned and anonymous) opinions. The predictions are based solely on the texts of the opinions, which are decomposed into three types of word groupings called “n-grams”: (i) unigrams, consisting of a single word, (ii) bigrams, consisting of two consecutive words, and (iii) trigrams, consisting of three. The following table from the paper shows an example of this decomposition process:
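To make the decomposition concrete, here is a minimal sketch of how a text can be split into unigrams, bigrams, and trigrams; the example sentence is hypothetical and not drawn from the paper:

```python
def ngrams(text, n):
    """Split text into overlapping word groupings of length n."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = "the judgment is affirmed"
print(ngrams(sentence, 1))  # unigrams:  ['the', 'judgment', 'is', 'affirmed']
print(ngrams(sentence, 2))  # bigrams:   ['the judgment', 'judgment is', 'is affirmed']
print(ngrams(sentence, 3))  # trigrams:  ['the judgment is', 'judgment is affirmed']
```

Note that an n-word sentence yields n - 1 bigrams and n - 2 trigrams, since each grouping overlaps its neighbor by all but one word.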
Based on the frequency with which the Justices use these n-grams, the researchers correctly predicted the authorship of 95 of the 117 signed opinions in the test dataset (81.2% accuracy). The following table from the paper shows the most common n-grams associated with each of the current Justices:
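The core idea of frequency-based attribution can be sketched in a few lines. The snippet below is an illustrative stand-in, not the paper's method: it trains on two hypothetical one-sentence "opinions," pools unigrams, bigrams, and trigrams into a feature bag, and scores a new passage by summed smoothed log frequencies (a naive-Bayes-style scoring rule, simpler than the classifiers the paper actually uses):

```python
import math
from collections import Counter

def ngrams(text, n):
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def features(text):
    # Pool unigrams, bigrams, and trigrams into one feature bag.
    return [g for n in (1, 2, 3) for g in ngrams(text, n)]

# Hypothetical training snippets standing in for signed opinions.
training = {
    "Justice A": "the statute plainly requires the agency to act",
    "Justice B": "our precedents compel a different result in this case",
}

counts = {j: Counter(features(t)) for j, t in training.items()}
vocab = set().union(*counts.values())

def predict(text):
    # Score each author by summed log relative frequency with add-one
    # smoothing, so unseen n-grams do not zero out a candidate.
    scores = {}
    for justice, c in counts.items():
        total = sum(c.values()) + len(vocab)
        scores[justice] = sum(
            math.log((c[g] + 1) / total) for g in features(text)
        )
    return max(scores, key=scores.get)

print(predict("the statute requires the agency"))  # → Justice A
```

A real system would train on the full corpus of signed opinions and hold some out as a test set, as the researchers did with their 117 opinions.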
The researchers also used their model to predict the authorship of some of the most prominent and controversial unsigned opinions issued in recent memory: those in National Federation of Independent Business v. Sebelius, the case that largely upheld the Affordable Care Act (ACA). Both the majority and dissenting opinions were left unsigned by the Court, leading to speculation about the authorship of each. One prominent theory is that Chief Justice Roberts, who switched his vote in favor of upholding the ACA during the course of the Court proceedings, actually penned both the dissenting and majority opinions. The paper’s model suggests that this theory is false: it indicates that the majority opinion in Sebelius was almost certainly written by Chief Justice Roberts, while the dissenting opinion was most likely written by Justice Scalia. The paper’s figures below show the probability of authorship for each Justice and opinion. Results are reported for two models; the more reliable model (MaxEnt-IG) is shown on the right.
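A maximum-entropy (MaxEnt) classifier like the one behind these figures is a multinomial logistic regression: each candidate Justice receives a linear score from the n-gram weights present in the opinion, and a softmax turns those scores into probabilities. The sketch below illustrates only that final step, with made-up scores; it does not reproduce the paper's model or its learned weights:

```python
import math

def softmax(scores):
    # Convert raw per-Justice scores into probabilities that sum to 1.
    # Subtracting the max score first keeps exp() numerically stable.
    m = max(scores.values())
    exps = {j: math.exp(s - m) for j, s in scores.items()}
    z = sum(exps.values())
    return {j: e / z for j, e in exps.items()}

# Hypothetical linear scores (sums of learned n-gram weights found in an
# unsigned opinion); the paper's actual values are not reproduced here.
scores = {"Roberts": 2.1, "Scalia": 0.4, "Kennedy": -0.3}
probs = softmax(scores)
print(probs)  # Roberts receives the largest share of probability
```

Per-Justice probabilities of this form are exactly what the paper's bar charts display for the Sebelius majority and dissent.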
This paper not only offers a glimpse of how machine learning can provide novel information on important legal topics, but also showcases an approach that might be key to holding judges and Justices more accountable for their legal decisions.