I read a paper- Best Practices for Scientific Computing-today. This article makes me impressed. The authors gave me a lot of informations and guidelines for practices in data science. They said: "However, while most scientists are careful to validate their laboratory and field equipment, most do not know how reliable their software is." Here is their summary of best practices: 1. Write programs for people, not computers. (a) A program should not require its readers to hold more than a handful of facts in memory at once. (b) Make names consistent, distinctive, and meaningful. (c) Make code style and formatting consistent. 2. Let the computer do the work. (a) Make the computer repeat tasks. (b) Save recent commands in a file for re-use. (c) Use a build tool to automate workflows. 3. Make incremental changes. (a) Work in small steps with frequent feedback and course correction. (b) Use a version control system. (c) Put everything that has been created manually in version control. 4. Don’t repeat yourself (or others). (a) Every piece of data must have a single authoritative representation in the system. (b) Modularize code rather than copying and pasting. (c) Re-use code instead of rewriting it. 5. Plan for mistakes. (a) Add assertions to programs to check their operation. (b) Use an off-the-shelf unit testing library. (c) Turn bugs into test cases. (d) Use a symbolic debugger. 6. Optimize software only after it works correctly. (a) Use a profiler to identify bottlenecks. (b) Write code in the highest-level language possible. 7. Document design and purpose, not mechanics. (a) Document interfaces and reasons, not implementations. (b) Refactor code in preference to explaining how it works. (c) Embed the documentation for a piece of software in that software. 8. Collaborate. (a) Use pre-merge code reviews. (b) Use pair programming when bringing someone new up to speed and when tackling particularly tricky problems. (c) Use an issue tracking tool. Reference: Wilson G, Aruliah DA, Brown CT, Chue Hong NP, Davis M, Guy RT, et al. (2014) Best Practices for Scientific Computing. PLoS Biol 12(1): e1001745. https://doi.org/10.1371/journal.pbio.1001745 Data science
0 Comments
|
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |