mini_imo is a complete, minimal implementation of an IMO-style mathematics evaluation pipeline, featuring:
- 🔢 Math Solver (LLM-based or your tiny Transformer)
- 🧩 Short-Answer AutoGrader with equivalence checking
- 📚 Proof AutoGrader using IMO-style rubric
- 🚀 End-to-End Evaluation Script
- 🧪 Ready-to-run sample benchmark
- 🔬 Optional: tiny GPT math model (PyTorch)
This project mimics modern math-evaluation pipelines used in LLM reasoning research.
## 🧩 Short-Answer AutoGrader
- Extracts the final answer from the model's solution
- Checks algebraic / numeric equivalence (sketched below)
- Strict grading (Correct / Incorrect)
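A minimal sketch of how such an equivalence check can work, using `sympy` for symbolic comparison. The function names and the `Answer:`-line extraction heuristic are illustrative assumptions, not mini_imo's actual API:

```python
import re
import sympy
from sympy.parsing.sympy_parser import parse_expr

def extract_final_answer(solution: str) -> str:
    """Grab the last 'Answer: ...' line from a solution (illustrative heuristic)."""
    matches = re.findall(r"[Aa]nswer:\s*(.+)", solution)
    return matches[-1].strip() if matches else solution.strip().splitlines()[-1]

def is_equivalent(predicted: str, reference: str) -> bool:
    """Return True if the two answers are algebraically/numerically equal."""
    try:
        diff = sympy.simplify(parse_expr(predicted) - parse_expr(reference))
        return diff == 0
    except (sympy.SympifyError, SyntaxError, TypeError):
        # Fall back to strict string comparison for answers sympy cannot parse
        return predicted.strip() == reference.strip()
```

The string-comparison fallback keeps the grader usable for answers (tuples, text) that a symbolic parser rejects.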
## 📚 Proof AutoGrader
- Four-level rubric: Incorrect / Partial / Almost / Correct
- Scoring mapped to {0, 1, 6, 7} (sketched below)
- Judges correctness & completeness
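A sketch of the rubric-to-score mapping; the dictionary and function names are assumptions for illustration:

```python
# Rubric levels mapped to IMO-style point values.
# Identifiers here are illustrative, not mini_imo's actual names.
RUBRIC_SCORES = {
    "Incorrect": 0,  # no meaningful progress
    "Partial": 1,    # nontrivial partial progress
    "Almost": 6,     # essentially complete proof with a minor gap
    "Correct": 7,    # complete and correct proof
}

def score_proof(level: str) -> int:
    """Map a graded rubric level to its {0, 1, 6, 7} point value."""
    return RUBRIC_SCORES[level]
```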
## 🔢 Math Solver
- GPT-style LLM via the OpenAI API (sketched below)
- Or your own `mini_gpt_math.py` model
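A minimal sketch of the LLM-backed solver path, assuming the `openai>=1.0` Python client; the model name and prompt wording are placeholders, not the project's defaults:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def solve(problem: str, model: str = "gpt-4o-mini") -> str:
    """Ask the LLM for a step-by-step solution ending in a final answer line."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "Solve the math problem step by step. "
                        "End with a line 'Answer: <final answer>'."},
            {"role": "user", "content": problem},
        ],
    )
    return response.choices[0].message.content
```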
## 🚀 Evaluation Script
- Reads a JSONL benchmark file
- Solves → Grades → Produces a CSV report (sketched below)
- Summary: accuracy & proof score
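A minimal sketch of the end-to-end loop for the short-answer path, reusing the helpers from the sketches above; the JSONL field names (`problem`, `answer`) and CSV columns are assumptions:

```python
import csv
import json

def evaluate(benchmark_path: str, report_path: str) -> None:
    """Solve each benchmark problem, grade it, and write a CSV report."""
    with open(benchmark_path) as f:
        problems = [json.loads(line) for line in f]
    rows, correct = [], 0
    for item in problems:
        prediction = solve(item["problem"])            # solver sketch above
        predicted = extract_final_answer(prediction)
        ok = is_equivalent(predicted, item["answer"])  # grader sketch above
        correct += ok
        rows.append({"problem": item["problem"], "predicted": predicted,
                     "reference": item["answer"], "correct": ok})
    with open(report_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
    # Proof-graded items would analogously average their {0, 1, 6, 7} scores.
    print(f"Accuracy: {correct / len(problems):.1%}")
```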