Research projects¶

There are some great resources on good data organization, such as the OSF’s guide. Here, I’ll document the aspects of my protocols that work well.

Main ideas¶

✅ DO: Designate data as either living or frozen. Code should never read from living files and should never write to frozen files.

✅ DO: Indicate manually curated / edited data as such. When ready, make a frozen copy and add a timestamp. Let scripts read the frozen copy.

❌ DO NOT: Have scripts read living data.

❌ DO NOT: Just organize into input/ and output/ directories.

✅ DO: Store basic machine-readable metadata alongside the data. (Don’t store information that can change, such as comments, project names, or people’s names.)

✅ DO: Reorganize your files if your current structure isn’t working.

Example¶

├── src/
│ └── pkg/
│ ├── __init__.py
│ ├── generate_raw_name_mapping.py
│ └── analyze.py
├── data/
│ ├── temp-output/
│ ├── raw-name-mapping.tsv
│ └── figures/
│ ├── living/
│ │ ├── name-mapping.tsv
│ │ └── figures/
│ │ │ └── samples.pdf
│ └── frozen/
│ ├── name-mapping-2022-01-14.tsv
│ ├── microscopy/
│ │ └── 20220615T164555.confocal3
│ │ ├── 20220615T164555.confocal3.tif
│ │ └── 20220615T164555.confocal3.json
│ └── reference/
│ └── weird-sample-analysis-2022-01-24.ipynb
│ └── weird-samples-2022-01-24.pdf
└── README.md

Here, temp-output/ and living/ both contain living data. temp-output/ can be overwritten any time, while living/ contains manually curated files. You should probably avoid checking output/ into a repository.

How the layout is used¶

src/pkg/generate_raw_name_mapping.py outputs data/temp-output/raw-name-mapping.tsv. Maybe it maps microscope filenames to sample names, but that mapping can’t be fully automated. So, we copy to data/living/name-mapping.tsv and edit it. When we’re done, we copy it to data/frozen/name-mapping-2022-01-14.tsv.

We then run src/pkg/analyze.py, which outputs figures to data/temp-output/figures/. They need a little tweaking, so we copy each to living/figures/ and align labels, etc. Or maybe we combine several figure panels into full figures. We want to keep this exact analysis, so we copy it to data/frozen/reference/. And we write a script to generate that figure (in this case, a Jupyter notebook). The script writes to data/temp-output/ (not data/frozen).

JSON metadata¶

What’s in 20220615T164555.confocal3.json?

{
  "id": "20220615T164555.confocal3",
  "instrument": "confocal3",
  "instrument-type": "microscope",
  "datetime": "2022-06-15T16:45:55",
  "duration-ms": 185.225075,
  "roi": {
    "type": "rectangle",
    "values": {
      "top": 4101,
      "bottom": 4358,
      "left": 528,
      "right": 744
    }
  }
}

Research projects¶

Main ideas¶

Example¶

How the layout is used¶

JSON metadata¶

Further information¶