
Training LLMs from scratch is a super important issue that affects the pace and breadth of AI iteration almost as much as raw hardware improvements do. The blog post is fun but somewhat shallow: not very technical, and not very surprising if you've worked with GPU clusters in any capacity over the years. (I liked the perspective of a former Googler, but I'm not sure why past colleagues would recommend JAX over PyTorch for LLMs outside of Google.) I hope this newco eventually releases a more technical report about their training adventures, like the PDF file here: https://github.com/facebookresearch/metaseq/tree/main/projec...


If you’re doing research JAX makes some sense. Probably some Google bias in there too.


To be honest, most researchers in applied ML in the Bay Area say the opposite. If you are trying to be nimble and prototype, use PyTorch. If you're trying to gain some optimizations as you near deployment, rewrite in JAX.
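As a rough illustration of the kind of optimization people usually mean here (an assumption; the comment doesn't say which), this is a minimal sketch of `jax.jit`, which traces a function once and compiles it with XLA so later calls with the same input shapes reuse the compiled code. The toy layer, shapes, and names are hypothetical.

```python
import jax
import jax.numpy as jnp

def dense_gelu(params, x):
    # Hypothetical toy model: one dense layer followed by a GELU activation
    w, b = params
    return jax.nn.gelu(x @ w + b)

# jax.jit compiles the function with XLA on first call; subsequent calls
# with the same input shapes/dtypes reuse the compiled executable.
fast_dense_gelu = jax.jit(dense_gelu)

key = jax.random.PRNGKey(0)
k_w, k_x = jax.random.split(key)
w = jax.random.normal(k_w, (16, 8))
b = jnp.zeros(8)
x = jax.random.normal(k_x, (4, 16))

eager_out = dense_gelu((w, b), x)     # op-by-op execution
jit_out = fast_dense_gelu((w, b), x)  # compiled, fused execution
print(jit_out.shape)
```

Whether this actually beats a tuned PyTorch setup depends on the model and hardware; the sketch only shows the mechanism, not a benchmark.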


Interesting perspective about possible JAX optimizations. Assuming these models are trained and deployed on non-TPU hardware, are there any real advantages to using JAX for deployment on GPUs? I'd have assumed that inference for large transformer-based models is largely a solved optimization problem (with any low-hanging fruit from custom CUDA code already picked), and that the details are shifting toward infrastructure tradeoffs and the availability of efficient GPUs. But I may be out of the loop with the latest gossip. Or do you simply mean there are cases where TPU inference makes financial sense and using JAX makes a difference?


Interesting. I've never heard that. I could see that argument going both ways, since PyTorch has the larger ecosystem and is the most used in published work.


Where does TensorFlow stand in this?


TensorFlow has been falling behind since they stopped caring about backward compatibility. PyTorch is the leading framework. JAX is getting some traction at Google and was used to train Gemini.


Somewhere next to Theano, MXNet, or Caffe.


So, obsolete?


What about Keras?



