Computational Bayesian Statistics and Applied Mathematics Expert

We're building a large-scale benchmark to test how well advanced AI systems can solve hard scientific and engineering problems. As a task designer, you'll create challenging computational problems that check whether AI can use real scientific software to do research-level work — running simulations, interpreting results, designing experiments, and uncovering hidden information from data.

This isn't a typical data-labeling job. You'll design original, graduate-level problems based on real scientific workflows, test them against cutting-edge AI models, and fine-tune them until the difficulty is just right.

You'll create problems that require skilled use of specialized scientific software. Some will ask the AI to compute exact answers from a fully defined setup — testing whether it can correctly carry out complex, multi-step workflows. Others will be harder: the AI must plan a series of queries or experiments to uncover information that isn't directly visible, which means thinking strategically about what to measure, how to read partial results, and how to narrow down the possibilities efficiently.
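As a rough illustration of the second, harder kind of problem, here is a minimal sketch of a query-limited oracle. Everything in it is hypothetical: the class name `HiddenThresholdOracle`, the function `locate_threshold`, the budget, and the tolerance are invented for this example and are not part of any actual task harness. The oracle hides a single threshold in [0, 1] and answers only yes/no questions, so the solver has to choose its queries carefully, here by bisection.

```python
class HiddenThresholdOracle:
    """Hypothetical oracle: hides a threshold and answers 'is x >= threshold?'."""

    def __init__(self, hidden, budget):
        self._hidden = hidden   # the value the solver must uncover
        self.budget = budget    # maximum number of queries allowed
        self.calls = 0          # number of queries used so far

    def query(self, x):
        if self.calls >= self.budget:
            raise RuntimeError("query budget exhausted")
        self.calls += 1
        return x >= self._hidden


def locate_threshold(oracle, lo=0.0, hi=1.0, tol=1e-3):
    """Bisect [lo, hi] until the bracket is no wider than tol."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if oracle.query(mid):
            hi = mid   # threshold is at or below mid
        else:
            lo = mid   # threshold is above mid
    return 0.5 * (lo + hi)


oracle = HiddenThresholdOracle(hidden=0.3721, budget=12)
estimate = locate_threshold(oracle)
print(f"estimate={estimate:.4f} after {oracle.calls} queries")
```

With these assumed settings, each query halves the bracket, so ten queries suffice to reach a width below 1e-3. A real task would typically hide something richer (a model parameter, a boundary condition, a mesh feature) and make each query an expensive simulation, but the budget-versus-information trade-off is the same.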

Each problem goes through a testing loop against state-of-the-art AI models, and you'll refine it until it hits the target difficulty.

We're especially interested in experts with deep, hands-on experience in:

Computational Bayesian Statistics and Applied Mathematics — working with libraries such as:

Bayesian statistics: PyMC, PyStan, PyJAGS, CmdStanPy
Applied mathematics and numerical PDEs: FEniCS, FEniCSx, DOLFINx, scikit-fem, FiPy, Devito, Dedalus
Computational topology: GUDHI
Differential algebra: DACEyPy

Experience with MCMC, Bayesian modeling, finite element or finite difference methods, mesh-based numerical modeling, computational topology, differential algebra, or other specialized Python-based math and statistics methods is valuable. You don't need experience with all of these — solid experience with even one will be highly regarded.
Experience with other specialized software in this domain will also be considered.

You have graduate-level expertise (MS or PhD preferred) in the domain above, with real hands-on experience using these tools — not just theoretical knowledge. You've written code using these libraries to solve actual research problems, and you understand where they break, what their edge cases are, and what makes a problem genuinely hard rather than just complicated.

Beyond domain expertise, the best candidates think like puzzle designers: building problems where the challenge comes from smart reasoning rather than raw computation, where several approaches seem plausible but only careful analysis reveals the right one, and where surface-level pattern matching won't get you to the answer.

Requirements:

Graduate-level training in a relevant STEM field (MS, PhD, or equivalent research experience)
Proven proficiency with at least one of the listed scientific software libraries, shown through research publications, open-source contributions, or professional work
Strong Python skills — you'll be writing problem setups, oracle functions, and solution validators
Ability to work independently and refine problem designs based on feedback
Comfortable working in a Linux/terminal environment with remote compute sandboxes
Available for at least 15–20 hours per week
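
To give a sense of the Python work mentioned in the requirements above, here is a minimal sketch of a solution validator. The function name `validate_answer` and the tolerance defaults are assumptions made for this example, not a prescribed interface. It checks a submitted vector of numbers against a reference answer and rejects wrong lengths and non-finite values.

```python
import math


def validate_answer(submitted, reference, rel_tol=1e-6, abs_tol=1e-12):
    """Return True if every submitted value matches the reference within tolerance."""
    if len(submitted) != len(reference):
        return False
    return all(
        math.isfinite(s) and math.isclose(s, r, rel_tol=rel_tol, abs_tol=abs_tol)
        for s, r in zip(submitted, reference)
    )


print(validate_answer([1.0000001, 2.0], [1.0, 2.0]))  # within tolerance
print(validate_answer([1.01, 2.0], [1.0, 2.0]))       # too far off
```

Choosing the tolerance is part of the design: too tight and a correct solution fails on floating-point noise or sampler variance, too loose and a wrong approach passes.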

Nice to have:

Experience across multiple listed domains or tools
Familiarity with benchmark or evaluation design
Background in scientific teaching or exam/problem-set design
Experience with computational reproducibility and containerized environments

Please note: This application includes a coding assessment as part of the evaluation process.

We consider all qualified applicants without regard to legally protected characteristics and provide reasonable accommodations upon request.

Contract and Payment Terms:

You will be engaged as an independent contractor.
This is a fully remote role that can be completed on your own schedule.
Projects can be extended, shortened, or concluded early depending on needs and performance.
Your work at Mercor will not involve access to confidential or proprietary information from any employer, client, or institution.
Payments are made weekly via Stripe or Wise, based on services rendered.
Please note: We are unable to support H-1B or STEM OPT candidates at this time.

About Mercor:

Mercor partners with leading AI labs and enterprises to train frontier models using human expertise. You will work on projects that focus on training and enhancing AI systems. You will be paid competitively, collaborate with leading researchers, and help shape the next generation of AI systems in your area of expertise.


Computational Bayesian Statistics and Applied Mathematics Expert Part-time

Mercor