Estimating Cloud Credits? : BioData Catalyst

Christopher Erdmann

started a topic over 4 years ago

I am in the process of submitting a request for cloud credits on BioData Catalyst using the following form:

https://biodatacatalyst.nhlbi.nih.gov/resources/cloud-credits

Does anyone have any recommendations to share as I develop my justification?

Thanks!

Dandi Qiao

said over 4 years ago

Here are the steps I took:

1. Run one single-variant WGS analysis and one gene-based WGS analysis on one phenotype, with about 9,000 subjects, as a test.

2. After the run finishes, look up the cost at https://console.cloud.google.com/billing (about $15 for the single-variant analysis and $80 for the gene-based analysis, so about $100 in total for one phenotype).

3. Multiply $100 by the number of phenotypes to be tested.

4. Add extra budget for running interactive sessions in Jupyter Notebook and for creating and debugging workflows.
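The arithmetic in these steps can be sketched in a few lines of Python. This is only an illustration: the $100 per-phenotype figure is the rounded test cost quoted above, while the function name, the 20% allowance for interactive work, and the example of 10 phenotypes are assumptions chosen for the example, not recommendations.

```python
def estimate_wgs_budget(n_phenotypes, cost_per_phenotype=100.0,
                        interactive_fraction=0.2):
    """Scale a measured per-phenotype test cost up to a full project.

    cost_per_phenotype: cost of one test run (about $100 in the post above).
    interactive_fraction: ASSUMED extra share for Jupyter sessions and
        workflow debugging; pick your own value.
    """
    workflow_cost = cost_per_phenotype * n_phenotypes
    interactive_cost = workflow_cost * interactive_fraction
    return {
        "workflow_cost": workflow_cost,
        "interactive_cost": interactive_cost,
        "total": workflow_cost + interactive_cost,
    }


if __name__ == "__main__":
    # Hypothetical project with 10 phenotypes.
    print(estimate_wgs_budget(10))
```

Replacing the defaults with the cost of your own test run keeps the estimate tied to something you actually measured.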

Pietro Nardelli

said over 4 years ago

Hi Christopher,

If you are working on an image analysis project using deep learning, these are the steps I would take to estimate costs:

1. Identify the largest batch size that can be used for your training/validation on a V100 GPU.
2. Identify the right instance type for your analysis based on the CPU memory you require. Also take into account that instances with more than one GPU will cost more.
3. Estimate the time required for each epoch by running your training/validation for a few epochs.
4. Multiply the time required for a single epoch by the number of epochs you plan to train for.
5. Understand the scalability of your code, and take into account the trial-and-error process required at the beginning to improve your code, refine your neural network, and choose the parameters.

Finally, I would add some extra budget for testing and for Jupyter Notebook sessions.
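As a rough sketch of the calculation described above, the Python snippet below multiplies the measured epoch time by the planned number of epochs and an hourly instance price. All names and numbers here are assumptions for illustration: the hourly rate is a placeholder (look up the current price of your chosen instance), and the trial-and-error multiplier is a guess you should adjust to your own project.

```python
def estimate_training_cost(seconds_per_epoch, n_epochs, hourly_rate_usd,
                           trial_and_error_factor=1.5):
    """Estimate GPU training cost from a short timing run.

    seconds_per_epoch: measured by running a few epochs.
    hourly_rate_usd: PLACEHOLDER price of the chosen instance per hour.
    trial_and_error_factor: ASSUMED multiplier covering early debugging,
        network refinement, and parameter selection.
    """
    training_hours = seconds_per_epoch * n_epochs / 3600.0
    base_cost = training_hours * hourly_rate_usd
    return base_cost * trial_and_error_factor


if __name__ == "__main__":
    # Hypothetical numbers: 6 minutes per epoch, 100 epochs, $3.00 per hour.
    print(estimate_training_cost(360, 100, 3.0))
```

The extra budget for testing and Jupyter Notebook sessions would be added on top of this figure.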

Harrison Brand

said over 4 years ago

Here are a few steps I took.

Run several smaller batches of increasing size (e.g., 1, 10, 100 samples) and use those numbers to extrapolate the cost of your proposed project, keeping in mind that the larger the target sample size, the less accurate the initial estimates will be.
Always leave a buffer to make sure you don't get stuck in the middle of a job.
Understand the scalability of your code, and if it has certain choke points, pay extra attention to their potential cost.
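One possible way to do the extrapolation is to fit a power law, cost = a * n^b, to the small test batches and evaluate it at the full sample size. The sketch below does this with a least-squares fit in log-log space. The power-law form, the function names, the 25% buffer, and the example costs are all assumptions made for illustration; real workflows may scale differently, especially around choke points.

```python
import math


def fit_power_law(sizes, costs):
    """Least-squares fit of cost = a * size**b in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(c) for c in costs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def extrapolate_cost(sizes, costs, target_size, buffer_fraction=0.25):
    """Predict the cost at target_size and add an ASSUMED safety buffer."""
    a, b = fit_power_law(sizes, costs)
    return a * target_size ** b * (1.0 + buffer_fraction)


if __name__ == "__main__":
    # Hypothetical test runs on 1, 10, and 100 samples (costs in USD).
    print(extrapolate_cost([1, 10, 100], [0.5, 5.0, 50.0], 1000))
```

An exponent b well above 1 in the fit is a hint that some step scales worse than linearly and deserves a closer look before you commit to the full run.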

Jean Monlong

said over 4 years ago

I'd add to the above:

1. Run multiple test samples, especially if using pre-emptible instances, because the cost might vary depending on how often they get pre-empted.

2. Don't forget to allocate funds for interactive analysis and also storage.
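To illustrate the first point, here is a small Python sketch that summarizes the cost of several repeated test runs and derives a cautious per-sample planning figure. The function name, the choice of mean plus two standard deviations, and the example costs are assumptions for illustration only, not something prescribed by Terra or BioData Catalyst.

```python
import statistics


def summarize_test_runs(costs_per_sample, safety_sds=2.0):
    """Summarize repeated test-run costs (e.g., on pre-emptible instances).

    safety_sds: ASSUMED number of standard deviations added to the mean
        to get a conservative per-sample planning cost.
    """
    mean_cost = statistics.mean(costs_per_sample)
    sd_cost = statistics.stdev(costs_per_sample)  # needs at least 2 runs
    return {
        "mean": mean_cost,
        "stdev": sd_cost,
        "planning_cost": mean_cost + safety_sds * sd_cost,
    }


if __name__ == "__main__":
    # Hypothetical costs (USD) of the same workflow on three test samples.
    print(summarize_test_runs([1.0, 2.0, 3.0]))
```

A large spread relative to the mean suggests running more test samples before settling on a number.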

On Terra, I have also recently been using a notebook derived from the "Workflow Cost Estimator" to estimate the cost of my runs, including cost estimates at the task level. I think it also has a function for exploring the effect of different configurations on the cost.