I don't do this kind of thing any more, but back when I did, the one thing that consistently bit me was exploratory analysis requiring one-hot encoding of categorical data where you might have thousands upon thousands of categories. Take something like the Walmart shopper segmentation challenge on Kaggle that a hobbyist might want to take a shot at. That's just exploratory analysis, not model training. Having to do that in the cloud would be quite annoying when your feedback loop involves updating plots that you would really like to have in-memory on a machine connected to your monitor. Granted, you can forward a Jupyter server from an EC2, but also the high-memory EC2s are extremely expensive for hobbyists, way more than just buying your own RAM if you're going to do it often.
I think there are studies showing that one hot encodings are not as efficient as an embedding, so maybe you would want to reduce the dimensions before attempting the analysis.