Tidy, Tested, Safe: A Brief Guide to Productionizing Data Science Code
New operations research (OR) graduates enter a world where they are expected to implement solutions, yet the typical OR curriculum fails to discuss implementation details. This talk, from a software developer with 30 years of experience, seeks to bridge that gap.
The speaker eschews an “art of modeling” approach in favor of a clear and unambiguous checklist. Solutions should meet the following criteria:
- Organized into a library that can then be imported by another engineer.
- Accompanied by a rich suite of test data that fully exercises the production code.
- Bulletproofed against input data that fails basic sanity checks.
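As an illustration of all three criteria together, here is a minimal sketch that assumes a diet model whose inputs arrive as plain Python dictionaries. The function `check_diet_inputs`, its dictionary layout, and the test below are hypothetical examples, not the speaker's code: the function sits in an importable module, rejects inputs that fail basic sanity checks, and ships with a test.

```python
"""diet_checks.py: hypothetical sanity checks for diet-model inputs."""
import math


def _is_nonneg_number(x):
    # Accept finite ints/floats >= 0; reject bools, NaN, infinity, and strings.
    return (isinstance(x, (int, float)) and not isinstance(x, bool)
            and math.isfinite(x) and x >= 0)


def check_diet_inputs(costs, nutrition, min_intake):
    """Raise ValueError if the inputs fail basic sanity checks.

    costs:      {food: cost per unit}
    nutrition:  {(food, nutrient): amount per unit}
    min_intake: {nutrient: minimum required amount}
    """
    for food, cost in costs.items():
        if not _is_nonneg_number(cost):
            raise ValueError(f"cost for {food!r} must be a non-negative number, got {cost!r}")
    for (food, nutrient), amount in nutrition.items():
        if food not in costs:
            raise ValueError(f"nutrition data references unknown food {food!r}")
        if nutrient not in min_intake:
            raise ValueError(f"nutrition data references unknown nutrient {nutrient!r}")
        if not _is_nonneg_number(amount):
            raise ValueError(f"amount of {nutrient!r} in {food!r} must be non-negative, got {amount!r}")
    for nutrient, need in min_intake.items():
        if not _is_nonneg_number(need):
            raise ValueError(f"requirement for {nutrient!r} must be non-negative, got {need!r}")
        # A positive requirement that no food supplies guarantees an infeasible model.
        if need > 0 and not any(nutrition.get((f, nutrient), 0) > 0 for f in costs):
            raise ValueError(f"no food supplies required nutrient {nutrient!r}")


def test_rejects_negative_cost():
    # A pytest-style test: fails if a negative cost slips through.
    try:
        check_diet_inputs({"bread": -1.0}, {}, {})
    except ValueError:
        return
    raise AssertionError("negative cost was accepted")
```

Calling `check_diet_inputs({"bread": 2.0, "milk": 3.5}, {("bread", "calories"): 80, ("milk", "calories"): 120}, {"calories": 2000})` returns `None`, while adding a `"protein"` requirement that no food supplies raises `ValueError` before any solver is invoked. Failing fast like this gives the engineer importing the library a clear message instead of an opaque "infeasible" status.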
The speaker motivates these requirements with lessons from actual work experience and references public examples built on common instructional models such as diet and network flow. He deliberately avoids the temptation to frame real-world challenges in terms of ever more complex mathematical requirements. Instead, he uses simple, well-understood models to better focus on the boilerplate code that makes data science logic truly useful.
The ideal audience member is someone who is proud of what their computer programs can do, but nervous about sharing their code with someone else. OR graduates are often incredibly sophisticated in their ability to craft mathematical logic that correctly addresses complex modeling requirements. Yet their work product often lacks basic software hygiene. This is a challenge that’s easily addressed, given clear instructions and a willingness to commit to a disciplined practice.