Found this the other day and wanted to share it here - doing automatic differentiation on code that has already been processed and optimized by the compiler is an amazing idea, and appears to work quite well.
Depending on whether this covers ALL LLVM code, that would mean autodiff for every language that compiles through LLVM, including C/C++!!
There is actually a thread somewhere in which the creators tried it on Rust, and it appeared to work!
Edit: see here and here.
Since it's as on topic here as it's ever going to be: does anyone know of research on, or have an opinion about, the limits of automatic differentiation when applied to programs in the general case? As in, what kinds of models can be usefully trained using AD, and what is their theoretical capability in terms of computability theory? My intuition says we can't use gradients to train things that are more advanced than propositional logic. FOL already involves structure search that is generally considered intractable, and that gradients are not (imo) useful for. Through the lens of formal grammars / automata, it seems reasonable to assume we can train FSMs, maybe even PDAs (doubtful imo), but we're definitely hosed if we want to train Turing machines. (Perhaps I should clarify/generalize: what I mean is whether it's possible or impossible, using gradients, to train a model to perform the same task as such an automaton, i.e. accept/reject words from the respective grammar.) That means I think we cannot use AD to find a model that accepts a Type-0 language, but Type-3 sounds very doable. In between I'm unsure, and I would like to know for sure whether my intuition here is right.
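To make the Type-3 case a bit more concrete, here is a rough sketch of what "training an FSM with gradients" could look like: relax the transition table into softmax-normalized probabilities, so the acceptance score becomes a smooth function of the parameters. Everything here is my own assumption for illustration (the names `accept_probability` and `parity_logits`, the 2-state parity machine); it is not taken from the linked project, and it only shows the differentiable forward pass, not a training loop.

```python
import math

def softmax(row):
    """Turn a list of unnormalized scores into a probability distribution."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def accept_probability(word, logits, start, accepting):
    """Acceptance score of a 'soft' finite-state machine.

    logits[symbol][s] holds unnormalized scores over next states when the
    machine reads `symbol` in state s. Only sums, products and softmax are
    used, so the result is differentiable in the logits.
    """
    n = len(start)
    dist = list(start)  # probability distribution over states
    for symbol in word:
        rows = [softmax(logits[symbol][s]) for s in range(n)]
        dist = [sum(dist[s] * rows[s][t] for s in range(n)) for t in range(n)]
    return sum(dist[s] * accepting[s] for s in range(n))

# Hypothetical example: near-one-hot logits encoding the exact DFA for
# "even number of 1s" over the alphabet {0, 1}. State 0 = even, state 1 = odd.
BIG = 50.0
parity_logits = {
    "0": [[BIG, -BIG], [-BIG, BIG]],  # reading 0 keeps the current state
    "1": [[-BIG, BIG], [BIG, -BIG]],  # reading 1 flips the state
}
start = [1.0, 0.0]      # begin in the "even" state
accepting = [1.0, 0.0]  # only the "even" state accepts

if __name__ == "__main__":
    for w in ["", "0110", "0111"]:
        print(repr(w), accept_probability(w, parity_logits, start, accepting))
```

With sharp logits this behaves like the ordinary DFA; with all-zero logits every non-empty word scores 0.5, and an AD tool could in principle push the logits toward the sharp solution by minimizing a loss over labeled words. Whether that optimization actually finds the right machine is exactly the open part of my question.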
Relatedly, I can't seem to decide whether to quantify the (theoretical) power of a model using logical calculi or using grammars/automata. If someone could help me square those away, that would be cool too.
This is awesome... that CUDA-clang support woulda saved me a year of my PhD!