Why some Jane Street traders are "sloppy" data scientists

Jane Street has been expanding its machine learning capabilities, buying thousands of GPUs and hiring machine learning talent. Those hires need to be data science experts, but traders using other strategies can be more lax about data science principles. Speaking on the trading firm's podcast, Signals and Threads, Jane Street trader In Young Cho says some colleagues use "sloppy data science."

Sloppily processed data can be fine for models using techniques like linear regression, says Cho. There is often very little data for these "brutally simple" models, but she says they can still work if you're "very careful about the number of hypotheses that you test." To get the best from the models, she recommends maximizing in-sample data at the expense of out-of-sample data, even though this is a sloppy approach to data analysis that risks overfitting.
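To see why a trader with little data might favor in-sample observations, here is a minimal sketch of the trade-off. It is not Cho's method: the 100-observation budget, the true slope of 0.05, the noise level, and the function names are all assumptions chosen for illustration. It fits a one-variable least-squares regression many times and measures how far the fitted slope lands from the true one.

```python
import math
import random

def ols_slope(xs, ys):
    """Least-squares slope of y on x for a single predictor."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def slope_error(n_train, true_slope=0.05, trials=500, seed=0):
    """Root-mean-square error of the fitted slope using n_train observations."""
    rng = random.Random(seed)
    squared_error = 0.0
    for _ in range(trials):
        # Hypothetical data: a weak linear signal buried in unit-variance noise.
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_train)]
        ys = [true_slope * x + rng.gauss(0.0, 1.0) for x in xs]
        squared_error += (ols_slope(xs, ys) - true_slope) ** 2
    return math.sqrt(squared_error / trials)

# Out of an assumed 100 observations, compare a 50/50 split with a 90/10 split.
for n_train in (50, 90):
    print(f"{n_train} in-sample observations: slope error {slope_error(n_train):.3f}")
```

In this toy setup the larger in-sample set gives a tighter slope estimate. The cost is the one the article names: with only 10 observations held out, there is little left to catch an overfit model.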

A 2025 study from Imperial College and hedge fund Qube suggests that this approach can work. While it notes that "low true Sharpe ratio signals are particularly vulnerable to overfitting," it says out-of-sample performance is more consistent with in-sample performance when you minimize both the number of signals you test and the number of assets you trade on your hypothesis. Less is more.
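The effect of testing many signals can be illustrated with a small simulation. This is a rough sketch, not the study's methodology: every signal here is pure noise with a true Sharpe ratio of zero, and the period count, trial count, and function names are assumptions. It covers only the number of signals, not the number of assets. For each trial it picks the signal with the best in-sample Sharpe ratio and then checks that same signal out of sample.

```python
import math
import random

def sharpe(returns):
    """Per-period Sharpe ratio: mean return divided by sample standard deviation."""
    n = len(returns)
    mean = sum(returns) / n
    variance = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / math.sqrt(variance)

def best_signal_gap(n_signals, n_periods=100, trials=50, seed=0):
    """Average in-sample and out-of-sample Sharpe of the best-looking noise signal."""
    rng = random.Random(seed)
    in_sample, out_of_sample = [], []
    for _ in range(trials):
        best_is, best_oos = float("-inf"), 0.0
        for _ in range(n_signals):
            # Each signal is pure noise, so any in-sample edge is luck.
            train = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
            test = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
            s = sharpe(train)
            if s > best_is:
                best_is, best_oos = s, sharpe(test)
        in_sample.append(best_is)
        out_of_sample.append(best_oos)
    return sum(in_sample) / trials, sum(out_of_sample) / trials

for n in (1, 10, 100):
    is_sr, oos_sr = best_signal_gap(n)
    print(f"{n:>3} signals tested: in-sample {is_sr:+.3f}, out-of-sample {oos_sr:+.3f}")
```

The more noise signals are tested, the better the winner looks in sample, while its out-of-sample Sharpe stays near zero. That gap is the overfitting risk that limiting the number of hypotheses is meant to contain.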

In contrast, Cho says that it would be "fatal" for Jane Street's machine learning models to have such a sloppy approach to data science. Machine learning models deal with very large datasets, to the point where "you can't understand the data you're putting in, much less the interaction effects you are modelling." Here, the challenge is ensuring that teams aren't duplicating hypotheses in a way that undermines other teams' results, as this would be bad data science "in a way that's kind of scary."

Machine learning traders at Jane Street do exhibit some sloppiness of their own. They're encouraged to work with GPUs using PyTorch for their machine learning models, which lets them evaluate hypotheses very quickly, even though the code itself can sometimes be very inefficient.

Machine learning is a rapidly expanding part of Jane Street's culture. Cho says the firm has access to "over the mid-thousands of very high-end GPUs" (we presume this means more than 5,000), and "veraciously" consumes "tens of terabytes" of data per day. Still, that hardware fleet pales in comparison to that of rival XTX Markets, which has miles more, literally. The UK-based firm has 25,000 GPUs which, if you laid them end to end, would stretch ~1.4 miles farther than Jane Street's cluster.
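The article does not say what GPU length lies behind the ~1.4 mile figure, but it can be backed out from the numbers given. The sketch below assumes Jane Street has exactly 5,000 GPUs (the article only presumes "more than 5,000") and uses the standard 1,609.344 meters per mile. The function name is made up for illustration.

```python
METERS_PER_MILE = 1609.344

def implied_gpu_length_m(xtx_gpus=25_000, jane_street_gpus=5_000, extra_miles=1.4):
    """Per-GPU length implied by the extra distance XTX's line would cover."""
    # jane_street_gpus is an assumption; the source gives only a lower bound.
    extra_gpus = xtx_gpus - jane_street_gpus
    return extra_miles * METERS_PER_MILE / extra_gpus

length_m = implied_gpu_length_m()
print(f"Implied length per GPU: {length_m * 100:.1f} cm")  # prints 11.3 cm
```

If Jane Street's count is higher than 5,000, the implied length per GPU rises, since the same 1.4 miles is then spread over fewer extra units.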

Photo by Ricardo Viana on Unsplash

Author: Alex McMurray, Reporter
