The previous post was written two weeks ago. Since then, I've been working primarily with Paul Deschner of the HLSL Innovation Lab to design an algorithm that can mine cases from a full-text law review database. Doing this raises some interesting challenges.
We've been able to acquire test files for running test searches, and here are some of the challenges that we've come up against. First off, we've discovered that the uniform system of citation to case law, which we take for granted today, did not exist prior to its widespread adoption in the mid-1930s. Case names did not consistently use "v." or "vs." between party names. In fact, I've found many footnotes where both forms were used in the same footnote! Designing an algorithm to capture all cases depends on knowing all forms of case citation, and for early law reviews this may be tricky. In addition to case names with "v." or "vs.", we also need to account for other forms of case name, such as "in re".
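To make the matching problem concrete, here is a minimal sketch of how the name forms mentioned above might be caught with regular expressions. It is not the algorithm we are building, only an illustration; the names `PARTY`, `ADVERSARIAL`, `IN_RE`, and `find_case_names` are hypothetical, and the pattern assumes party names are runs of capitalized words.

```python
import re

# Assumption: a party name is one or more capitalized words.
PARTY = r"[A-Z][A-Za-z'&-]*(?:\s+[A-Z][A-Za-z'&-]*)*"

# Party names joined by either "v." or "vs.".
ADVERSARIAL = re.compile(rf"\b(?P<first>{PARTY})\s+(?:vs|v)\.\s+(?P<second>{PARTY})")

# Captions of the form "In re X" or "in re X".
IN_RE = re.compile(rf"\b[Ii]n\s+re\s+(?P<subject>{PARTY})")


def find_case_names(text):
    """Return case names in order of appearance, normalized to 'v.' and 'In re'."""
    found = []
    for m in ADVERSARIAL.finditer(text):
        found.append((m.start(), f"{m.group('first')} v. {m.group('second')}"))
    for m in IN_RE.finditer(text):
        found.append((m.start(), f"In re {m.group('subject')}"))
    return [name for _, name in sorted(found)]


sample = ("The rule in Brown vs. Kendall was revisited in "
          "Palsgraf v. Long Island Railroad; compare in re Debs.")
print(find_case_names(sample))
```

A pattern this simple has obvious gaps: a capitalized word directly before a party name (such as "See") would be swallowed into the name, and abbreviations ending in a period (such as "Co.") are not handled.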
But because we're trying not only to find case citations but also to use those citations to link to the cases themselves, we also need to account for where a case's citation actually falls within the writing. For example, it's fairly common for a case discussed in an article to be mentioned by name in the text of the article, with the citation to the case in a footnote. Linking the name with the citation, so that the case itself can be retrieved, poses a challenge.
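As a rough illustration of the name-to-footnote problem, the sketch below pairs the last case name appearing before each footnote marker with the first reporter citation found in that footnote. Everything here is an assumption made for the example: footnote markers are taken to look like `[1]`, footnotes are supplied as a dictionary keyed by number, party names are single words, and a citation has the simple shape "volume reporter page". The names `MARKER`, `CASE_NAME`, `REPORTER_CITE`, and `link_names_to_citations` are hypothetical.

```python
import re

MARKER = re.compile(r"\[(\d+)\]")  # assumed footnote marker, e.g. [1]
CASE_NAME = re.compile(r"\b[A-Z][A-Za-z]+\s+vs?\.\s+[A-Z][A-Za-z]+")
REPORTER_CITE = re.compile(r"\b\d{1,3}\s+[A-Z][A-Za-z.]*\s+\d{1,4}\b")  # e.g. 60 Mass. 292


def link_names_to_citations(body, footnotes):
    """Map each case name in the body to the citation in its footnote."""
    links = {}
    previous_end = 0
    for marker in MARKER.finditer(body):
        # Only look at text since the previous marker.
        segment = body[previous_end:marker.start()]
        previous_end = marker.end()
        names = CASE_NAME.findall(segment)
        cite = REPORTER_CITE.search(footnotes.get(int(marker.group(1)), ""))
        if names and cite:
            links[names[-1]] = cite.group(0)  # nearest name before the marker
    return links


body = ("The negligence standard of Brown v. Kendall[1] is familiar, "
        "as is the review power announced in Marbury v. Madison.[2]")
footnotes = {1: "60 Mass. 292 (1850).", 2: "5 U.S. 137 (1803)."}
print(link_names_to_citations(body, footnotes))
```

Real footnotes often cite several cases at once, so a "nearest name" heuristic like this would need to be checked against the test files before being relied on.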
Another challenge is the common use of short form citations to cases. We hope to design our database so that it will rank cases by the number of times each has been referred to, both within a single article and across other articles. The use of short form names means that once we identify a citation, we'll need an algorithm that can identify later references to the same case, even when the full name is not used. This can be tricky when the short form is a common name or word, or when it isn't made up of a party name at all....
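One way to picture the short form problem: once a full name is known, treat each party name as a candidate short form and count every later hit. The sketch below does that, skipping a small, hypothetical list of party names too generic to be useful (`GENERIC_PARTIES`). The function names are illustrative only, and the approach assumes the short form really is one of the party names, which the paragraph above notes is not always true.

```python
import re
from collections import Counter

# Assumption: party names too generic to serve as short forms.
GENERIC_PARTIES = {"State", "People", "Commonwealth", "United States"}


def short_forms(full_name):
    """Split 'A v. B' into its party names, dropping generic ones."""
    parties = re.split(r"\s+vs?\.\s+", full_name)
    return [p for p in parties if p not in GENERIC_PARTIES]


def count_references(text, full_names):
    """Count mentions of each case by full name or by short form."""
    counts = Counter()
    for name in full_names:
        forms = set([name] + short_forms(name))
        # Longest alternative first, so a full name counts as one hit.
        ordered = sorted(forms, key=len, reverse=True)
        pattern = "|".join(re.escape(form) for form in ordered)
        counts[name] = len(re.findall(rf"\b(?:{pattern})\b", text))
    return counts


text = ("Brown v. Kendall set the standard. Later courts read Brown narrowly, "
        "and Kendall is still cited. In State v. Post the court disagreed, "
        "though the State objected.")
print(count_references(text, ["Brown v. Kendall", "State v. Post"]))
```

A short form like "Post" shows the weakness directly: any ordinary use of the word would be counted as a reference to the case, so some check on surrounding context would be needed.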
I've also begun to notice two additional factors that present challenges. First, it was fairly common to include tables of cases in law reviews and journals. This could be very helpful in designing our algorithm, or it could present a large challenge. Tables of cases are given a variety of titles in the journals, such as Table of Cases Discussed or simply Tables of Cases. Sometimes the cases are even listed in the indexes that were once regularly published for each volume. When cases are listed in indexes, they are sometimes grouped by jurisdiction or court, which makes identifying the cases cited in articles tricky, since we're looking for the number of times each case is discussed as well as whether it's cited at all.
Another interesting challenge that I hadn't anticipated is the practice among many law reviews and journals (what's the difference between a law review and a law journal, anyway?) of running a regular feature usually called something like a "Survey of Recent or Notable" cases. These surveys usually discuss cases in very brief form and amount to abstracts of cases that the law review editors feel are noteworthy for one reason or another. It seems to me that cases merely mentioned in regular surveys of recent cases don't qualify as "leading" cases. Therefore, references to these cases should probably be discounted.
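Discounting could be as simple as weighting each mention by the kind of section it appears in. The sketch below is only one possible scheme: the title pattern, the section labels, and the weight of 0.25 for survey mentions are all made-up values for illustration, not figures we have settled on.

```python
import re
from collections import defaultdict

# Assumption: survey features have titles mentioning "recent" or "notable" cases.
SURVEY_TITLE = re.compile(r"\b(?:recent|notable)\b.*\bcases\b", re.IGNORECASE)

# Hypothetical weights: a survey mention counts for a quarter of an article mention.
SECTION_WEIGHTS = {"article": 1.0, "survey": 0.25}


def section_kind(title):
    """Classify a section by its title as 'survey' or 'article'."""
    return "survey" if SURVEY_TITLE.search(title) else "article"


def rank_cases(mentions):
    """Rank cases by weighted mention count; mentions are (case, section title) pairs."""
    scores = defaultdict(float)
    for case, title in mentions:
        scores[case] += SECTION_WEIGHTS[section_kind(title)]
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


mentions = [
    ("Brown v. Kendall", "The Law of Torts"),
    ("Brown v. Kendall", "Negligence Reconsidered"),
    ("Smith v. Jones", "Survey of Recent Cases"),
    ("Smith v. Jones", "Recent Cases"),
    ("Smith v. Jones", "Notable Cases"),
]
print(rank_cases(mentions))
```

With these weights, two article mentions outrank three survey mentions; setting the survey weight to zero would drop survey mentions from the ranking entirely.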