Best Practices
Posted on 12 February 2022
This is a list of best practices for programming. The list is non-exhaustive, ever-evolving and should not be considered final. Every rule also has many exceptions; none of them is meant to be followed to the letter. Almost every rule should be read as ending in "when possible".
General
- Discuss best practices with the team.
- Include the best practices in the repository.
Tooling
- Enforce a linter and formatter for every language in the project.
- Choose linters and formatters that are as opinionated as possible.
- Linter warnings should be errors, unless explicitly silenced.
- Set up CI/CD for linters and formatters on every PR.
- Use Git.
- In projects with heavy data processing, use a DAG pipelining tool.
- Don't jump on the latest technology immediately, but don't be afraid to try new things either.
- Pick tooling that is not specific to a single editor (e.g. no Org-mode).
- Team members should pick their own editor.
- Use strictly typed (variants of) languages (e.g. TypeScript, Python with MyPy, Rust).
- Tooling, frameworks and libraries are often more important than the language.
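As an illustration of the typing advice above, here is a minimal sketch of type-annotated Python that a checker such as MyPy can verify. The function `mean_length` and its input are hypothetical and only serve as an example.

```python
from typing import Optional, Sequence


def mean_length(words: Sequence[str]) -> Optional[float]:
    """Return the mean word length, or None for an empty input."""
    if not words:
        return None
    return sum(len(word) for word in words) / len(words)


# A type checker would flag a call like `mean_length(42)` before the code runs,
# and the Optional return type pushes callers to handle the None case.
result = mean_length(["lint", "format", "test"])
if result is not None:
    print(f"{result:.2f}")
```

The annotations cost little to write, and they turn a class of runtime errors into errors that CI can report on every PR.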
Comments & Documentation
- Use doc comments.
- Comments should describe "why", not "what". Comments may also describe "how", but only when the "how" is complicated.
- Doc comments should not repeat the function signature.
- Generate documentation from function signatures.
- The README should be focused on usage, not development, of the project.
- Large projects should document the structure of the project.
- Markdown is the lingua franca for documentation.
- Outdated documentation is worse than no documentation.
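A small sketch of what these rules can look like in practice. The function `retry_delay`, its default values and the 30-second cap are hypothetical; the point is that the doc comment adds information beyond the signature and the inline comment explains "why".

```python
def retry_delay(attempt: int, base_seconds: float = 0.5) -> float:
    """Delay before the next retry, growing exponentially with each attempt.

    The delay is capped at 30 seconds so that a long outage does not push
    retries far into the future.
    """
    # Cap the delay: without it, attempt 20 would already wait more than six days.
    return min(base_seconds * 2 ** attempt, 30.0)


print(retry_delay(3))
```

Note that the doc comment does not restate the parameter names or types; a documentation generator can take those from the signature.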
Issues & Pull Requests
- Pull requests should be focused on a single change.
- Always link to the issue the PR is solving.
- Describe both the problem and the solution.
- Don't shy away from changes just because they might cause a merge conflict.
- Pull Requests should pass all tests.
- Issues should be specific (e.g. not "Refactor X and Y", but "Decouple functions X and Y").
Reviews
- Reviews should always acknowledge the work put in by the contributor and thank them, regardless of the quality of the code.
- Reviews should always assume the best intentions.
- A review should often be a discussion instead of forcing changes.
Tests
- There is no need to write all tests up front.
- Add tests for every fix, so that any regression is caught.
- Test both successes and failures.
- Tests should be trivial to read and understand.
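The points above could be sketched as follows. The function `parse_port` and the bug it once had are hypothetical. The tests use plain `assert` statements and the `test_` naming convention, so a runner such as pytest would collect them, but they also run as ordinary functions.

```python
def parse_port(text: str) -> int:
    """Parse a TCP port number, rejecting values outside 1-65535."""
    port = int(text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


def test_parses_valid_port():
    # Success case.
    assert parse_port("8080") == 8080


def test_rejects_port_zero():
    # Failure case, added as a regression test for a hypothetical bug
    # where "0" was accepted as a valid port.
    try:
        parse_port("0")
    except ValueError:
        return
    raise AssertionError("expected ValueError for port 0")


test_parses_valid_port()
test_rejects_port_zero()
```

Each test checks one thing and can be understood without reading the others.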
Performance
- Every claim about performance should be backed up by measurements.
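A minimal sketch of such a measurement, using the `timeit` module from the Python standard library. The two functions being compared are hypothetical stand-ins for whichever alternatives are under discussion, and the numbers will differ per machine.

```python
import timeit


def join_with_concat(parts):
    result = ""
    for part in parts:
        result += part
    return result


def join_with_str_join(parts):
    return "".join(parts)


parts = ["x"] * 1_000
runs = 50

for func in (join_with_concat, join_with_str_join):
    # repeat() returns one total time per repetition; the minimum is the
    # least affected by other activity on the machine.
    best = min(timeit.repeat(lambda: func(parts), number=runs, repeat=3))
    print(f"{func.__name__}: {best / runs * 1e6:.1f} µs per call")
```

Measure on realistic input sizes: a difference that shows up at 1,000 elements may vanish or reverse at 10.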
Specifics
Rust
- Think in `Iterator`s.
- `unsafe` code written for performance reasons should be put into a separate crate exposing a safe API when possible.
HTML & CSS
- Use plain HTML & CSS for small projects.
- HTML/CSS are a compilation target in large projects.
- Use flexbox and grid.
Jupyter Notebooks
- Imports that apply to multiple blocks should go into a separate code block at the top.
- Imports that you use once can be put in the block where they are needed.
- Structure the notebook with headings (see also here).
- Long operations should be in separate code blocks.
- Running the code from top to bottom should always work.
- Don't make the code blocks too dependent on each other. Normally, you have some preparation steps (loading and preparing the data) and then some things that you tried. Ideally, those exploratory parts are only dependent on the preparation part and not on each other. This makes it easier to run specific parts without having to wait on the entire thing. If that is not feasible, then make it clear what the minimum amount of code is to run a certain block.
- Make sure that the data is described in the notebook itself (which columns are we dealing with, what data types do they have & what do they represent).
- Data science is often about exploration, so it's nice to document the things you tried that failed. You can do this by making your notebook into a narrative: describe what you tried and why you tried it, then show the code, evaluate and repeat. (Of course, do make it very clear what didn't work.)
- Notebooks should be reproducible (taken from here)
- The output of code cells should not be checked into Git.
- Notebooks are purely for exploration. Code that belongs in the data pipeline should be moved into the rest of the codebase.
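To keep cell outputs out of Git, the outputs can be stripped before committing. Dedicated tools such as nbstripout exist for this; the sketch below only illustrates the idea with the standard library, assuming the notebook follows the nbformat 4 JSON layout. The function name `strip_outputs` and the example notebook are hypothetical.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Remove outputs and execution counts from all code cells, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


# A tiny hand-written notebook, standing in for a file loaded with json.load().
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Load data"]},
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["1 + 1"],
            "execution_count": 3,
            "outputs": [
                {
                    "output_type": "execute_result",
                    "data": {"text/plain": ["2"]},
                    "metadata": {},
                    "execution_count": 3,
                }
            ],
        },
    ],
}

print(json.dumps(strip_outputs(example), indent=1))
```

Running something like this from a pre-commit hook keeps diffs small and avoids committing data that happened to be displayed in a cell.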