Announcement of the First o1 pro Guided Federal Litigation | Steve Sokolowski

Announcement

Today, we are announcing what we believe is the first litigation guided by OpenAI o1 pro, and perhaps one of the first AI-guided lawsuits filed in the Federal system.

https://shoemakervillage.org/temp/complaint_as_filed.pdf

The action, Sokolowski et al v. Digital Currency Group, Inc. et al (4:25-cv-00001-WIA), alleges that defendants Digital Currency Group, its CEO Barry E. Silbert, and former Genesis Global Trading CEO Michael Moro defrauded plaintiffs Stephen Sokolowski and Christopher Sokolowski by willingly and recklessly signing a fraudulent $1.1 billion promissory note, which was then used to generate fraudulent balance sheets that the plaintiffs relied upon in deciding to renew their loans to Genesis. The suit seeks the return of the coins defrauded from the plaintiffs under the treble damages provision of the Pennsylvania Unfair Trade Practices and Consumer Protection Law (73 P.S. §§ 201-1 - 201-9.2), at a current value of $26,070,000.

Introduction

As the case progresses, I will be providing lengthy blog-like updates on X (https://x.com/SteveSokolowsk2). While we won't discuss motions, the fraud, or cryptocurrency in general, we will focus on how AI is being used to guide me and the other plaintiff in this case. As we spent over 200 hours in December alone on this case, the initial post cannot cover all of the intricacies of AI usage in a legal setting, but I will try to provide an initial general overview so that details can be filled in later.

History

This case is an example of the democratizing effects of AI. When this fraud occurred in 2022 and we were quoted $800,000 to pursue this case, we had no idea whether we would ever be able to obtain justice. As the Complaint states, we were people who had worked our entire lives and lived frugally with a plan to retire in the future. We spent years calling various attorneys and litigation finance firms, but these firms operate in a circle. Litigation financiers want a complaint, while attorneys want money to create a complaint. There's no way to break into this circle - if you can get any of these companies to call you back at all. When I called Leech Tishman, a Pittsburgh law firm, their phone system suddenly failed, and it then took the attorney some two weeks to return my call. I called back and never got a response. That's the way the legal industry works.

Thus, a key weakness in the legal system is that when the defendants themselves are the ones who take all of your money, you can't afford to sue them. Had the defendants taken less, money would have remained to hire an attorney. In essence, the bigger the fraud, the more likely you are to get away with it - and this undoubtedly encourages defendants in general to "go big or go home."

Eventually, though, Claude 3.5 Sonnet was released, and it was finally capable of evaluating the law (although it still made errors in interpreting the precedential value of cases in its training data). Then OpenAI changed all that with o1 pro. OpenAI's o1 pro is an artificial general intelligence (AGI) system that is smarter than any lawyer I've talked to.

Workflow - o1

When o1 was made available, we quickly signed up and compared it to Gemini Experimental 1206. We determined that both were acceptable for moving forward, but o1 was clearly superior in understanding case law and anticipating defenses.

We settled on a workflow. Chris created a database of evidence and combed through dockets, writing Python to create thousands of rows containing every single entry in the Genesis bankruptcy case, along with docket entries from other actions against Genesis and DCG. We were then able to use o1 to summarize the gist of the most important documents (ignoring entries like certificates of service and notices of appearance). We were left with summaries that could be put into a single context window, which the models could reason over to determine where the most pertinent evidence lay. Nearly all of the quotes in paragraphs 50 through 90 of the complaint were spotted by o1 as useful evidence for the case, and it always quoted them verbatim.
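To make the filtering and packing steps concrete, here is a minimal sketch in Python. It is not Chris's actual code: the DocketEntry fields, the ROUTINE_MARKERS list, and the character budget standing in for a context window are all assumptions made for illustration, and the summaries themselves would come from a model rather than from this script.

```python
from dataclasses import dataclass

# Assumed list of routine entry types to skip. The post names these two
# as examples; a real list would likely be longer.
ROUTINE_MARKERS = ("certificate of service", "notice of appearance")


@dataclass
class DocketEntry:
    """One row of the hypothetical docket table."""
    case: str          # which action the entry belongs to
    number: int        # docket entry number
    title: str         # docket text for the entry
    summary: str = ""  # filled in later by a model


def is_substantive(entry):
    """Return False for routine entries that are not worth summarizing."""
    title = entry.title.lower()
    return not any(marker in title for marker in ROUTINE_MARKERS)


def pack_summaries(entries, max_chars):
    """Join summaries in order, stopping before the character budget is exceeded."""
    parts = []
    used = 0
    for e in entries:
        line = f"[{e.case} #{e.number}] {e.title}: {e.summary}"
        if used + len(line) + 1 > max_chars:  # +1 for the newline separator
            break
        parts.append(line)
        used += len(line) + 1
    return "\n".join(parts)
```

A character budget is only a rough stand-in for a token limit, so a real run would need headroom below the model's actual context size.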

Workflow - Gemini

The Experimental 1206 version of Gemini, perhaps because it doesn't "think" or use multiple runs, is more prone to hallucinations compared to o1. However, for an unknown reason, Gemini is much more argumentative and negative when it comes to evaluating a user's work output.

Gemini's skill at evaluating user work is useful across a variety of work products. I input this complaint into Gemini at least 100 times and simply asked it for its feedback, and it would give it - down to "this isn't written like an attorney would write it, and here's a suggestion." While o1 is exceptional at being precise, Gemini is better at creative tasks where there isn't a single correct answer.

A good example of Gemini's usefulness for feedback is in evaluating "Pretend to Feel" (https://soundcloud.com/steve-sokolowski-2/16-pretend-to-feel), which is also being released today and will be discussed in a separate post. In that case, the song was input into Gemini 25 times during its development, and I didn't stop until it finally said I had succeeded in creating a song that pulled users into its own world.

The key to using Gemini is that it will often be extremely negative at first. It evaluated the first version of the complaint as likely to be dismissed. It tore apart the first version of the song. Gemini 1206 (but not earlier versions) is remarkably consistent across runs, allowing a product to be evaluated with the same prompt over time.

Simulations

Once the complaint was nearing completion, we embarked on a set of simulations, and these simulations took up most of the time between December 20 and 30. We took advantage of Christmas Day, when OpenAI must not have had much traffic, to power through massive usage of o1 pro.

While o1 is good at drafting, o1 pro's reasoning is what has finally made it feasible for litigants who can't afford an attorney to proceed by themselves - and the way to do that is through simulations. At first, I conducted the simulations by simply pasting the complaint into o1 pro and asking it to evaluate the strength of defense arguments to dismiss it. But then I had an insight almost by accident - o1 pro is much more accurate if you ask it to actually generate the motion to dismiss first. So, the prompt is something like this.

"You are an expert defense attorney and this is very important to my career. Think about all the possible reasons to dismiss this complaint, no matter how strong or weak they are. Then, write the most comprehensive motion for dismissal you can think of, on behalf of defendant [insert the name of each defendant here, run 3 times]. Output your comprehensive motion for filing on the docket and consideration by the judge."

Then, after the motion is created, the next prompt is:

"You are a Federal judge. Evaluate this complaint and the defense's motion to dismiss. Output a comprehensive ruling with your decision about whether you will allow this case to proceed to discovery or not. Make sure you explain the reasoning behind every part of your decision."
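The two prompts chain together naturally: the output of the first becomes part of the input to the second. Below is a minimal sketch of that chain in Python. The ask_model callable is a placeholder for whatever chat interface or API is used (I ran these by hand), and the simulate and run_rounds names are invented for this example.

```python
from typing import Callable, Dict, List

# The two prompts from the post, with a {defendant} slot in the first.
DEFENSE_PROMPT = (
    "You are an expert defense attorney and this is very important to my career. "
    "Think about all the possible reasons to dismiss this complaint, no matter how "
    "strong or weak they are. Then, write the most comprehensive motion for dismissal "
    "you can think of, on behalf of defendant {defendant}. Output your comprehensive "
    "motion for filing on the docket and consideration by the judge."
)
JUDGE_PROMPT = (
    "You are a Federal judge. Evaluate this complaint and the defense's motion to "
    "dismiss. Output a comprehensive ruling with your decision about whether you will "
    "allow this case to proceed to discovery or not. Make sure you explain the "
    "reasoning behind every part of your decision."
)


def simulate(complaint: str, defendant: str, ask_model: Callable[[str], str]) -> Dict[str, str]:
    """Generate a motion to dismiss first, then have a 'judge' rule on it."""
    motion = ask_model(
        DEFENSE_PROMPT.format(defendant=defendant) + "\n\nCOMPLAINT:\n" + complaint
    )
    ruling = ask_model(
        JUDGE_PROMPT + "\n\nCOMPLAINT:\n" + complaint + "\n\nMOTION TO DISMISS:\n" + motion
    )
    return {"defendant": defendant, "motion": motion, "ruling": ruling}


def run_rounds(complaint: str, defendants: List[str],
               ask_model: Callable[[str], str], rounds: int) -> List[Dict[str, str]]:
    """Repeat the simulation for every defendant, `rounds` times each."""
    return [
        simulate(complaint, d, ask_model)
        for _ in range(rounds)
        for d in defendants
    ]
```

The rulings still have to be read by a person to decide whether each one granted or denied the motion; this sketch does not try to classify them.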

I ran this simulation many times, and the last "judge" granted the motion 0/10 times. With Gemini, the "judge" granted the motions against the final complaint 2/10 times, but used reasoning that suggested it misunderstood the facts. We had reached a point of diminishing returns by then, but I can't be certain whether the dismissals were due to Gemini's weaker reasoning (it clearly is weaker at reasoning), or whether a human judge will also misunderstand the complaint because we forgot to include a key fact that we take for granted.

Assessment

We also used o1 pro to run probability analyses on various positions we could have taken - from the claims we could have made to the strategies used to litigate. o1 pro was directed to assign win rates if specific positions were taken and if specific facts were uncovered. We cross-checked these win rates with Claude 3.5 Sonnet and Gemini 1206. While we continue to run everything by multiple models because the rest of our lives are at stake, at this point we've realized that o1 pro is so accurate that, if it became necessary, it would likely be possible to rely solely on its analysis.
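One simple way to picture the cross-check is to line up each model's estimate for a position and flag the positions where the models disagree the most. The Python below is a sketch under that assumption only; the flag_disagreements name, the 0.15 threshold, and the data layout are invented for illustration and are not the procedure we followed.

```python
def flag_disagreements(estimates, max_spread=0.15):
    """Return positions whose win-rate estimates differ by more than max_spread.

    estimates maps a position label to {model name: win rate between 0 and 1}.
    The result maps each flagged position to its spread (highest minus lowest).
    """
    flagged = {}
    for position, by_model in estimates.items():
        spread = max(by_model.values()) - min(by_model.values())
        if spread > max_spread:
            flagged[position] = round(spread, 3)
    return flagged
```

Positions that come back flagged are the ones worth re-running or reading closely, since a wide spread suggests at least one model is missing a fact or misreading the law.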

One of the most useful analyses o1 pro was able to perform was to take the defendants' own positions, since they have been involved in so much litigation, and use them to predict their arguments for this case. We were also able to create a database such that when a motion is made, we can quickly determine whether the defendants have contradicted anything they claimed in a previous court filing over the past three years.
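As a rough illustration of how such a lookup could work, the sketch below uses Python's built-in sqlite3 module. The table layout (party, topic, filing, quote) and both function names are assumptions made for this example, not our actual schema. The query only retrieves prior statements; deciding whether a new motion contradicts them is still left to a model or a person.

```python
import sqlite3


def build_db(rows):
    """Create an in-memory table of prior statements from (party, topic, filing, quote) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE statements (party TEXT, topic TEXT, filing TEXT, quote TEXT)"
    )
    conn.executemany("INSERT INTO statements VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def prior_statements(conn, party, topic):
    """Return (filing, quote) pairs a party has previously made on a topic."""
    cur = conn.execute(
        "SELECT filing, quote FROM statements "
        "WHERE party = ? AND topic = ? ORDER BY filing",
        (party, topic),
    )
    return cur.fetchall()
```

When a new motion arrives, each argument in it can be tagged with a topic and looked up this way, and the retrieved quotes can then be pasted into a prompt alongside the motion.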

Finally, we assessed the defenses the defendants were going to make by spending two days pitting different models against each other in a sort of chess game. We saw what the AI defendants did. Then we took it up a notch, telling the models to intentionally adopt strategies like "file frivolous motions," "threaten the plaintiffs with sanctions," "the plaintiffs will give up if you wear them down," "implead as many additional defendants as possible," "file many cross-claims and counterclaims," "blame each other," and so on. If any of these strategies are used, we will be ready for them.

Planning

Before the complaint was filed, we used o1 to create a plan for the litigation. The model, for example, predicted that the litigation would require about 1300 hours of work. It predicted 160 hours for the creation of the complaint, which was very close to the 220 that were spent, given that much of that work was spent simply re-reading documents like the Federal Rules of Evidence over and over.

We also asked o1 and o1 pro to produce a master timetable and to determine what we need to further discover. Fortunately, in this case, most of the evidence we need is already public (as the complaint states). What little discovery is required was advised by the models, and we edited o1 pro's plan based upon knowledge it didn't have in its context window.

Addressing "naysayers"

Although the details were not made public until the complaint was filed, I've made it clear online for some time that this case was incoming and that it would be AI-guided. One of the most common criticisms was that a case like this takes years of effort and "you don't know what you're getting yourself into."

To address that criticism, the first point is that we obviously know that the case will take years of effort; o1 pro actually estimated fewer hours than the 2000 we had originally predicted. However, there is no greater economic value of our time than pursuing this case, given that 90% of our net worth was taken by the defendants.

As to the criticism that we are putting a lot on the line by taking on such a massive case without attorneys, we aren't putting anything on the line at all. The case was worth zero before, since no attorney would take it on without a significant deposit and, as we said earlier, the defendants took all our money. Now, having been filed, it's worth more than zero. Even if the odds of winning were low, the leveling of the playing field by AI makes it worth pursuing.

But speaking of the odds of winning, the odds are actually not low, by either our own estimate or any of the models' estimates, and we are very optimistic about winning. We have cut back our spending, have cleared our schedules for the next several years, and I now spend my evenings and holidays reading the Federal Rules of Civil Procedure and case law on fraud. We will take this all the way through a bench trial, seek the treble damages we are entitled to due to the defendants' willful conduct, defend an appeal if necessary, and force the defendants into bankruptcy if it comes to that. AI is only going to improve from here on out.

We are in this to win.

Conclusion

I want to thank Sam Altman and OpenAI. While they often receive criticism for other reasons, in this circumstance they have allowed two people to have a shot at living the life they had spent 20 years planning for. They said they wanted to make the world better for people with AGI, and they now have a concrete example of their efforts.

I will be continuing my posts here and on X about this case over the next few weeks to discuss the period during the preparation of the complaint. One of the posts will discuss the specific prompts and strengths and weaknesses of each model. While we won't be available for discussion about the facts of the case from here on out, I encourage everyone to read the complaint and view its exhibits in full to make their own judgements.