Here are the steps I took:
1. Run one single-variant WGS analysis and one gene-based WGS analysis on one phenotype on about 9,000 subjects as a test.
2. Find the cost after the runs finish on https://console.cloud.google.com/billing (~$15 for the single-variant analysis and ~$80 for the gene-based analysis, about $100 total for one phenotype).
3. Multiply $100 by the number of phenotypes to be tested.
4. Add extra budget for running interactive Jupyter Notebook sessions and for creating and debugging workflows (a rough budget calculation is sketched below).
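A minimal back-of-the-envelope sketch of that arithmetic; the phenotype count, interactive allowance, and buffer below are hypothetical placeholders, not recommendations:

```python
# Rough WGS budget sketch based on the per-phenotype test costs above.
single_variant_cost = 15.0   # USD, from the single-variant test run
gene_based_cost = 80.0       # USD, from the gene-based test run
n_phenotypes = 12            # hypothetical number of phenotypes to test

per_phenotype = single_variant_cost + gene_based_cost
compute_total = per_phenotype * n_phenotypes

interactive_budget = 100.0     # assumed allowance for Jupyter sessions and workflow debugging
buffer = 0.20 * compute_total  # assumed 20% safety margin

total_request = compute_total + interactive_budget + buffer
print(f"Estimated credit request: ${total_request:,.0f}")
```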
Hi Christopher,
If you are working on an image analysis project using deep learning, here are a few steps I would take to estimate costs:
Run several smaller batches of increasing size (e.g., 1, 10, 100 samples) and then use those numbers to extrapolate the cost of your proposed project, keeping in mind that the larger the sample size, the less accurate the initial estimates will be (see the extrapolation sketch after this list).
Always leave a buffer so you don't get stuck in the middle of a job.
Understand the scalability of your code, and if it has certain choke points, pay extra attention to their potential cost.
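A rough sketch of that extrapolation; the batch sizes, observed costs, and project size are hypothetical placeholders, so substitute the actual billing figures from your own test runs:

```python
# Extrapolate project cost from small test batches with a simple linear fit.
import numpy as np

batch_sizes = np.array([1, 10, 100])           # samples per test batch
observed_costs = np.array([0.40, 3.50, 33.0])  # USD per batch, hypothetical

# cost ≈ fixed_overhead + per_sample * n  (polyfit returns slope first)
per_sample, fixed_overhead = np.polyfit(batch_sizes, observed_costs, 1)

project_size = 5000  # hypothetical number of samples in the full project
estimate = fixed_overhead + per_sample * project_size

# Pad the estimate, since small batches tend to understate the true per-sample cost.
buffered = estimate * 1.25
print(f"Extrapolated cost: ${estimate:,.0f} (with 25% buffer: ${buffered:,.0f})")
```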
I'd add two points:
1. Run multiple test samples, especially if using pre-emptible instances, because the cost might vary depending on how often they get pre-empted.
2. Don't forget to allocate funds for interactive analysis and for storage (a rough sketch combining both points follows).
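A hedged sketch combining both points, with illustrative numbers only: repeated preemptible test runs are averaged, and an assumed storage line item is added on top.

```python
# Costs of the same test workflow run several times on preemptible instances;
# runs that get preempted and restarted cost more, so the spread can be wide.
preemptible_run_costs = [4.10, 5.80, 4.30, 7.20, 4.50]  # USD, hypothetical

mean_cost = sum(preemptible_run_costs) / len(preemptible_run_costs)
worst_case = max(preemptible_run_costs)

n_runs = 200  # hypothetical number of production runs

# Storage: assumed 2 TB of outputs kept for 6 months at ~$0.02 per GB-month
# (roughly standard Cloud Storage pricing; check current rates for your region).
storage_cost = 2000 * 0.02 * 6

budget = worst_case * n_runs + storage_cost
print(f"Mean run cost ${mean_cost:.2f}; budgeting on the worst case: ${budget:,.0f}")
```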
On Terra, I have also recently been using a notebook derived from the "Workflow Cost Estimator" to estimate the cost of my runs, including cost estimates at the task level. I think there is also a function for exploring how different configurations affect the cost.
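Independent of that notebook, here is a hedged sketch of one way to pull task-level costs from a GCP billing export in BigQuery. The project/dataset/table name, the workflow id, and the label keys (cromwell-workflow-id, wdl-task-name) are assumptions, so check how billing is exported and labeled in your own workspace.

```python
# Sum billed cost per workflow task from a BigQuery billing export (sketch).
from google.cloud import bigquery

client = bigquery.Client()

query = """
WITH costs AS (
  SELECT
    (SELECT value FROM UNNEST(labels) WHERE key = 'wdl-task-name') AS task,
    cost
  FROM `my-billing-project.my_billing_dataset.gcp_billing_export_v1`  -- hypothetical table
  WHERE EXISTS (
    SELECT 1 FROM UNNEST(labels)
    WHERE key = 'cromwell-workflow-id'
      AND value = 'cromwell-<workflow-uuid>'  -- hypothetical workflow id
  )
)
SELECT task, SUM(cost) AS total_cost
FROM costs
WHERE task IS NOT NULL
GROUP BY task
ORDER BY total_cost DESC
"""

for row in client.query(query).result():
    print(f"{row.task}: ${row.total_cost:.2f}")
```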
Christopher Erdmann
I am in the process of submitting a request for cloud credits on BioData Catalyst using the following form:
https://biodatacatalyst.nhlbi.nih.gov/resources/cloud-credits
Does anyone have any recommendations to share as I develop my justification?
Thanks!