Building a simple neural net in Java (smalldata.tech)
137 points by wheresvic4 on Dec 20, 2020 | hide | past | favorite | 55 comments


> A neural net is a software representation of how the brain works

A NN shares as much with the brain, as a decision tree does with a forest: nothing at all.

It would be preferable to completely dispense with any talk of the brain, and introduce it simply as a modified form of (high school) linear regression.

Any biological talk at this point is unhelpful mystification.
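One way to make that framing concrete: a single sigmoid "neuron" is exactly high-school linear regression plus a squashing function. A minimal sketch (the weights and inputs here are arbitrary, chosen only for illustration):

```java
// A single "neuron" is high-school linear regression (w.x + b)
// followed by a non-linear squashing function.
public class Neuron {
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Linear part: exactly the regression formula y = w1*x1 + w2*x2 + b.
    static double forward(double[] w, double b, double[] x) {
        double z = b;
        for (int i = 0; i < w.length; i++) z += w[i] * x[i];
        return sigmoid(z); // the only step beyond plain regression
    }

    public static void main(String[] args) {
        double y = forward(new double[]{0.5, -0.5}, 0.0, new double[]{1.0, 1.0});
        System.out.println(y); // sigmoid(0) = 0.5
    }
}
```

No biology needed anywhere in that description.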


One can perhaps say: "A neural net is a collection of machine learning approaches that are somewhat inspired by the human brain. However, neural nets have little to do with how the brain actually works." I think this statement (or a similar, more accurate version) is actually useful, because the name "neural net" is somewhat deceiving.


I agree with you that the biological inspiration for a neural network is tenuous at best. Especially the basic two-layer network outlined here.

However, at least for visual perception, there is some scientific basis for convolutional neural networks sharing properties with biological visual perception. The work of Hubel and Wiesel demonstrated visual-cortex activations that look very similar to the first layers of CNN kernels.

ref: https://knowingneurons.com/2014/10/29/hubel-and-wiesel-the-n...


There is no biological analogue to backprop, any supervised learning step, etc. Nor, conversely, a mathematical analogue of neuroplasticity, biochemical signalling, etc.

I haven't looked far into CNNs vs. the visual cortex, but I could only imagine the analogue exists at the hierarchical-geometric level; i.e., it isn't a model of function, but simply a model of how one can in principle process visual information.

(Which any system parsing information would follow; AST parsers, etc., are likewise hierarchical.)


I find an article that I recently read very relevant to this discussion: "On the cruelty of really teaching computing science". For anyone not familiar with it, I'll just say: wait until you get to the end.

https://www.cs.utexas.edu/users/EWD/transcriptions/EWD10xx/E...


Forget the biological connection, I'm sitting here waiting for someone to write a paper about how gradient descent and backprop are basically equivalent to the platonic notion of the dialectic within epistemology.

I wouldn't throw out biological plausibility, especially with the spiking neural network variant of them...


While we're waiting for the paper on the Platonic link ... here's a book about how machine learning is all just a rerun of Oliver Wendell Holmes's theory of epistemology in the law, from the 1890s.

https://link.springer.com/book/10.1007/978-3-030-43582-0


For some biological neural nets, one would imagine the loss function being nearly completely genetic. That is, the weights are ultimately stored in the genetic code the same way any other cell-differentiation information is stored, and were converged on in the first place by random mutations being more or less fit for their ecological niche. In particular, this could explain the relatively fixed-function components like our V1 and V2 regions, our motor region, and probably the entire nervous system of simpler animals.

Basically, in a lot of cases evolution takes the place of backpropagation, and large structures are what we'd call inference-only, without learning. That doesn't make them any less neural nets.


There's this theory, often cited in this context:

https://en.wikipedia.org/wiki/Hebbian_theory
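For contrast with backprop, the core Hebbian rule ("cells that fire together wire together") is a purely local, unsupervised update: no error signal, no backward pass. A toy sketch, with an arbitrary learning rate picked for illustration:

```java
// Toy Hebbian update: the weight between two units grows in
// proportion to the product of their activities. The rule is
// purely local -- no error signal, no backward pass.
public class Hebb {
    static double hebbStep(double w, double pre, double post, double lr) {
        return w + lr * pre * post; // delta-w = eta * x * y
    }

    public static void main(String[] args) {
        double w = 0.0;
        // Correlated activity strengthens the connection...
        for (int i = 0; i < 10; i++) w = hebbStep(w, 1.0, 1.0, 0.1);
        System.out.println(w); // approximately 1.0 after ten correlated steps
        // ...while a silent presynaptic unit leaves it unchanged.
        System.out.println(hebbStep(w, 0.0, 1.0, 0.1) == w); // true
    }
}
```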


It's not clear how these sorts of ideas stack up against modern critiques of computational approaches.

E.g., this sort of analysis has been applied to CPUs, where it yields incoherent/obviously false results.


>There is no biological analogue to backprop, any supervised learning step, etc.

Excuse me? How do you figure? Have you tried to do any physical task before, and not gotten it right the first time, tried again, and got closer? Hello, backprop.

Has someone trained you, telling you you're doing it wrong? Hello, supervised learning. There is an entire portion of your brain tasked with measuring the difference between what you intended and what you did. If you damage it, you actually end up in a state where other people's feedback is what you have to rely on.

The equations behind NNs are "non-biological" only in the sense that you're taking a general math function (the sigmoid) and burning it into silicon gates that are not part of a living organism -- one that also has to deal with the excess baggage of staying alive, or of operating as a chemically based computer.

However, the dynamics and fundamental capabilities of a biological neural network and a silicon-based one are fundamentally the same. You can just scale the input domains and speed of the silicon-based one a lot more easily than the bio-based one, due to the difference in computing media. You can also wipe out and retrain a silicon net without being considered to have "killed" anything. Interestingly, silicon nets have no analogue of running and training simultaneously (or are just starting to get one, from the lit I've kept up with), nor is an analogue of forgetting typically applied to them, making the biological net the far more interesting information-processing construct. To be frank, NNs not resembling the biological models says more about the inefficiencies of our media of computation, our woeful lack of understanding of the nature of human/biological perception, and the arrogance of human beings who lack the capacity to recognize in the math something fundamental to their own existence.

This post written by a multiple-decade-uptime GA/C/R/D (pick your letter), constantly training, internetworked lattice of sigmoid-encoded control networks, whose morning your lack of perspective just tainted, and whose training is thus far unsuccessful in attempts to extinguish the need to point out mis- or incomplete understandings of those sharing the same messaging medium, as it generally does. Your overfitting to textual and numerical symbolic-representation recognition will not serve you well if you can't extend that capacity to discern the pattern into the overarching context of human existence. That's where you really start gaining an appreciation for the power of NNs. It's the building block of the information-processing construct that can eventually recognize and reproduce itself.

Now if you'll excuse me, I need to get back to some obligatory biological maintenance. This bag of organs doesn't maintain itself, you know.


By that standard, try distinguishing a rock rolling down a hill from a brain: the rock "finds the best path" by having its route "supervised" by the surface of the hill, etc.

Everything is everything if your level of abstraction is "thing".

The relevant characteristics of "biological intelligence" and of the functioning of the brain do not exist at the "try and try again" level of abstraction. (At this level, as above, we couldn't distinguish an animal from a rock; nor, I imagine, basically any physical process from any other.)

Organic systems grow in response to interaction with their environments, acquiring novel physical structure and causal properties. Neurological intelligence allows for theory-formulation on single-example cases (eg., a child burning their hand once is sufficient to build a theory of their immediate environment).

The list goes on.

The capacities of these systems do not obtain in the machine case, and likewise, the machine case has no functional analogues at the relevant level of distinction.

This dumb form of statistics, call it "approximate associative modelling over 1tn cases", i.e., Machine Learning, has nothing new to say about intelligence, biology or neurology.

We have been doing non-linear regression and optimisation since the Victorian era.


Sure, you can find arbitrary similarities in everything. You can also find arbitrary differences in everything.

Experiments like GPT3 do seem to point in the direction of "scale" as the dominant factor. Until we can reach the same level of scale as a real brain, the question of whether "meat is special" is undecided. Everything that is you may just be a non-linear regression on a chemical computer.


GPT3 isn't even in the same domain as intelligence.

Statistical patterns in trillions of examples is a sideshow.

Words in natural language refer to the world; that is their point. Communication is the coordination of that reference between speakers in a shared environment.

You cannot take GPT3 to New York and ask it what it thinks of the city: it cannot be anywhere (it isn't causally connected to an environment); and it cannot coordinate with any listener (it has nothing to say).

Text generation is certainly reaching new heights. This isn't a form of communication, however, and isn't even relevant to it.


I think you're falling into the Chinese Room fallacy. I agree that GPT3 isn't sophisticated enough to be considered AGI.

On the other hand, based on the progression of GPT -> GPT2 -> GPT3, remarkable things happen when you add orders of magnitude more nodes to the network.

You might try to argue that no matter how convincing GPT50 passes the Turing test, it's still not intelligent. How is that different from saying the Chinese Room doesn't speak Chinese? Why is your meat-based Chinese Room special?


It isn't meat-based, it's in the world.

It's a distinction in kind, not degree. You're presuming that we are just bleak repositories of trillions of sentences stitched together: we aren't a meat version of any ML program; not GPT or any other.

We do not learn the meaning of "Green", "Tree", or any basic concept via examples in language.

An infinite amount of complexity applied to an infinite amount of text cannot refer to the world; it has never been in it.

We aren't statistical patterns in trillions of books. You already presume that GPT is something that it isn't when you presume it is even capable of communicating anything.


> You're presuming that we are just bleak repositories of trillions of sentences stiched together

...and you are presuming we aren't. Yes yes, we input from more sources than books, but electronic NNs can too. And it's not clear which inputs are or aren't important. Humans that are blind from birth are still intelligent.

The only thing we know for certain right now is that order-of-magnitude increases in the complexity of NNs produces dramatic results. And we're still quite a few orders of magnitude away from the complexity in a human brain.


It's not a presumption. When I say, "I like the wine I'm drinking" I mean the wine I am drinking.

I am not using a term "wine" in some sentence constructed from fragments of prior text. I mean to refer to the object in my hand.

And likewise, if I ask a friend to "pass me the wine", I mean a particular part of our shared environment.

Text reassembly can appear to refer, but it is genuinely proto-schizophrenic to attribute reference to this system. It isn't saying anything.

It isn't with me, expressing an attitude about our shared environment. It's generating text. It will generate inconsistent text fragments on each generation run. It isn't talking about anything; there isn't any intention to express anything behind the generation of text.

There is, in fact, no mechanism by which it can speak about an environment. It's fragments of prerecorded text reassembled on every run, which appears to say something, but only because the people it steals from said something at the time. Now it is just a bad rehearsal.


When you say "I like the wine I'm drinking", your neural patterns are firing in a way that makes you want to output that text. Anything more than that is a presumption. You may think "well, I actually like this wine!" but I think that's largely programmed - we even have a phrase for this, "acquired taste".

You don't think tastes can be programmed purely textually? Advertising seems to suggest otherwise.

The problem with GPT[n] as an AGI is that (as I understand it) it doesn't have a continuous retraining process the same way that human brains do. Neurons aren't being repotentiated with each interaction, so there's no short-term memory. But that seems a technical point; it's not hard to imagine this as a feature of GPT50.


The problem isn't retraining. It isn't even referring to anything.

When the light from the sun bounces off my glass and into my eye the biochemical neurological reaction we call "thought" forms about that glass.

Not some generic glass. That glass. It is why I can ask you to pass me it: the light bounces in your eye too.

There is no text generation system I am aware of which conceptualizes and responds to an environment.

It is literally just generating text, it isn't thinking about anything. If you request repeated runs of generation, you receive inconsistent results.

On one run you get, "I like wine!", on another, "Wine is horrible!".

The machine doesn't know what wine is; and certainly does not like wine, nor find it horrible. These are just meaningless symbolic patterns that are "statistically similar" to examples given to it.

It has nothing it wishes to say; and nothing it wishes to talk about. It's a trick.


I hope you don't take this as an insult, you are just meaningless statistical patterns in a 100-billion-count neural processing machine. Sure, your neural network is hooked up to a basic chemical analyzer and a couple other nifty I/O devices, but they all just produce neural firing patterns. All that thought and feeling are meaningless statistical noise when looked at in a micro scale.

Who is to say what GPT3 feels when processing text about wine? Maybe it really likes the sensations of certain firing patterns triggered by symbols in particular orders. And maybe with enough complexity it will be able to describe the sensation of these firing patterns.

I don't see any reason to believe your firing patterns are more "real" than GPT[n]'s.

You do have access to better equipment than GPT3, however I'd hesitate to ascribe too much significance to this. A future machine intelligence with integrated gas chromatograph might conclude that humans don't really like or dislike wine itself; mostly they judge taste by the shape of the bottle and the price tag:

https://www.theatlantic.com/health/archive/2011/10/you-are-n...

The symbol processing may be more important than you think.


We aren't comparing Commander Data and a human being. This isn't a philosophical point.

It is a literal point. There is no ML system that can talk to me: there is no system that I can ask if it likes my clothes; or where my shoes are.

You might think this is "just adding some I/O", but then show me that system.

This is the same shysterism and self-delusion that accompanies every generation of AI hysteria: the first lot in the 40s and 50s claimed, likewise, self-driving cars were "in development" and "almost ready", etc.

It isn't true.

Animal intelligence is embedded in an environment and it is about an environment. That is what it is, that is what it is for.

Efforts which don't even have a mechanism to do this aren't even in the same field.

GPT3 cannot have any internal models of an environment because it isn't "trained" on an environment.

This isn't a philosophical debate; it is an observation that this system cannot do almost anything of interest. It is a toy: literally. It isn't with anyone anywhere modelling anything, saying anything, observing anything.

It isn't reasoning counterfactually; it isn't inferring any future states of an environment given a potential change. It isn't talking about what I am doing. It isn't responding to changes anywhere, it isn't asking for changes in response to its needs. It has no causal models; it has no environment models; it has no intentions; it has no memories; it has no desires. It expresses nothing because it has nothing to express.

The list of things it isn't doing is absurdly long. The philosophical point is moot. It is technically incapable of having a conversation with me about almost anything.

It can only generate a long-form document which is grammatically correct, and semantically -- when read by a human -- coherent. Regenerate, and the document would be completely different, with contradictory claims in it.

Even if human beings were mere symbol manipulators, it wouldn't make a ferris wheel a viable alternative. And GPT is nothing much more.


You keep repeating this assertion... but of course you would. Your neural networks were programmed with the idea of humans as something special and unique in the universe. It feels good to fire in those potentiation patterns, right?

Is someone who has never had vision incapable of understanding vision? How do they learn about it, if not through symbols? Is someone who has never had hearing incapable of understanding hearing? Their experiences may not be exactly identical to yours, but they aren't meaningless.

It is, at present, impossible to know what is/isn't required for a neural network to be "intelligent". Maybe symbol processing is enough, maybe it isn't. Right now the obvious source of improvement is increasing the size of the network. Let's see what happens with GPT50. Even without vision, you can describe your clothes to it and see if it likes them.

You're also discounting incredible progress in the field. I talk to my house. Cars drive themselves better than some humans I know. War is being fought by drones. Compared to the world I grew up in, this is amazing - and I'm not even that old. Progress isn't happening as fast as you might like, but it is happening.


Cars cannot drive themselves better than humans. And I'd say most self-driving car projects will be closed down within the decade as boondoggles.

You do not talk to your house. Rather than using a dial which reads "off", you say "off".

"Progress" in the sense you mean isn't happening at all.

I have no pretensions about human intelligence; I do not think it is much above a dog's.

The issue is that computer "scientists" (i.e., mathematicians) haven't even bothered to understand a dog's -- or any animal's. Or even intelligence itself.

The field which will produce any artificial intelligence is biology, not computer science. And very little progress is being made there, with pretty much all theoretical approaches to neuroscience, I think, wholly invalidated.

The computational approach has been a pointless detour/dead-end.


> Have you tried to do any physical task before, and not gotten it right the first time, tried again, and got closer? Hello, backprop.

Backprop is a very specific mathematical operation using derivatives, not a general concept. IIRC there is counterevidence against the idea that neurons train by taking stored inbound activations and updating their weights with a backwards-propagating set of gradients.
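To underline how specific the operation is: for a single sigmoid neuron with squared-error loss, "backprop" is literally the chain rule applied to the loss, not a vague try-again loop. A minimal sketch (the initial weight, target, and learning rate are arbitrary):

```java
// Backprop for one sigmoid neuron with squared-error loss.
// The update is an explicit chain-rule derivative, not a generic
// "try again": dL/dw = (y - t) * y * (1 - y) * x.
public class Backprop {
    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    static double train(int steps) {
        double w = 0.5, b = 0.0, x = 1.0, target = 1.0, lr = 0.5;
        for (int i = 0; i < steps; i++) {
            double y = sigmoid(w * x + b);            // forward pass
            double grad = (y - target) * y * (1 - y); // dL/dz by the chain rule
            w -= lr * grad * x;                       // dL/dw = dL/dz * x
            b -= lr * grad;                           // dL/db = dL/dz
        }
        return sigmoid(w * x + b); // prediction after training
    }

    public static void main(String[] args) {
        System.out.println(train(100)); // creeps toward the target, 1.0
    }
}
```

Nothing in the brain is known to store the forward activations and replay them against a propagated gradient like this.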


Thank you for pointing this out. I studied math and still get tripped up by the flowery language. I understand the reason for the analogy, but it hinders application of concepts like regression, line of best fit, and other things that make machines "smarter."


>Any biological talk at this point is unhelpful mystification.

The business marketing groups for NNs might push back on this a bit, I suspect.


> A NN shares as much with the brain, as a decision tree does with a forest: nothing at all.

A more accurate statement might be "A NN shares as much with a human brain as a decision tree does with an actual tree: The architecture of the first sometimes appears similar to gross oversimplification of the architecture of the second." Tree:forest is a change in scale not present in NN:brain, and while NN can sometimes look like a very simplified drawing of actual neurons, a decision tree can also look like a very simplified drawing of an actual tree.


I am afraid that a decision tree HAS a direct impact on the forest. Multiple trees, averaged, give the forest's response. It's like saying one vote in an election has nothing to do with the final decision.

Sure, we have heard many times that it IS like a brain, and many times that it is NOT like a brain. It really depends on your intended audience, but it is not worth picking on; it feels like the tabs/spaces problem. BTW: tabs, of course!


I think the poster meant real world wooded forests as an example. As in, the human brain is as distant from this mathematical construct as an actual forest with bears is from decision trees.


Well, that is possible. I have probably worked too much with trees and forests.


It only works because of non-linearities. So that seems like too much of a simplification, limiting it to "linear algebra" when it involves both vector calculus and non-linear functions.


In principle. In practice, the "ReLU Algebra" is almost a linear algebra.

NN solutions end up being discontinuous stacks of linear regressions.

So it's not a bad place to start in explaining NNs. ReLU NN models are piece-wise linear.
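The piecewise-linear point is easy to see directly: away from the "kinks", a ReLU net is exactly an affine function of its input. A tiny one-input sketch with hand-picked weights:

```java
// A tiny 1-input ReLU net: y = relu(x - 1) + relu(-x - 1).
// On each region cut out by the kinks at x = -1 and x = 1,
// the output is exactly linear (slope -1, 0, or +1).
public class ReluPieces {
    static double relu(double z) { return Math.max(0.0, z); }

    static double net(double x) {
        return relu(x - 1.0) + relu(-x - 1.0);
    }

    public static void main(String[] args) {
        // Flat piece between the kinks:
        System.out.println(net(0.0));            // 0.0
        // Linear piece of slope 1 to the right of x = 1:
        System.out.println(net(3.0) - net(2.0)); // 1.0
    }
}
```

Each hidden unit contributes one kink; everything between kinks is a plain linear regression.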


That's not true. A neural net is a simplified model of neuron networks. A perceptron is a simple model of a biological neuron.


A perceptron was a simple model of a biological neuron. We now know that biological neurons work rather differently.


I cover this topic quite a lot in my talks -- Deep Learning in Java for software engineers. Today you can build pretty much any deep learning model in Java that you can with Python, given the availability of plenty of ML & deep learning frameworks for Java. If you're interested in this, check out projects like:

- https://djl.ai

- https://deeplearning4j.org

- https://github.com/tensorflow/java

- https://github.com/pytorch/java-demo

- https://www.deepnetts.com

- https://tribuo.org

- ONNX support for Java

- MLflow for Java: https://docs.databricks.com/applications/mlflow/index.html

- Spark MLlib: https://spark.apache.org/mllib/

All of the above projects allow one to perform deep learning training and to deploy models into production. Happy to answer any questions on this subject. You can also sign up for my newsletter on this topic: https://docs.google.com/forms/d/1oa_TtltDRmnov2bv5Cqo2Hiwfb0...


I wish someone would take this simple example and add GPU support (and show how that is best accomplished -- OpenCL/CUDA, etc.), just so one could understand how the GPU actually helps accelerate things when the number of neurons/layers grows.

Edit: A bit sad that OpenCL does not work on Jetson Nano and I can't find anything recent that allows Java to run CUDA!? So we're back to AMD?

Also: "In the next post we will see if adding another layer to our neural network can help in improving the predictions ;)" -- but there is no next post, even after 4 years!


It's not very relevant, but here is an open source code generator for "from scratch" x86-64 SIMD convolutional neural nets in C99:

https://nn-512.com/

The C99 code is stand-alone, with no external dependencies, and there are numerous example blocks to look at. The algorithms are state-of-the-art (multi-tile Winograd-Cook-Toom-Lavin, multi-tile strided Fourier, arbitrary dilated/strided convolutions with no im2col redundancy, etc.)


The math involved turns out to be really simple linear algebra. Matrix multiplication is an embarrassingly parallel task.
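In the naive sense, that's easy to demonstrate: every cell of C = A * B depends only on one row of A and one column of B, so rows can be computed independently. A sketch using parallel streams (which deliberately ignores the memory-bandwidth story that makes fast implementations hard):

```java
import java.util.stream.IntStream;

// Naive "embarrassingly parallel" matrix multiply: each row of
// C = A * B can be computed independently of every other row,
// here farmed out via a parallel stream.
public class ParMatMul {
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, m = b[0].length, k = b.length;
        double[][] c = new double[n][m];
        IntStream.range(0, n).parallel().forEach(i -> {
            for (int j = 0; j < m; j++) {
                double sum = 0.0;
                for (int p = 0; p < k; p++) sum += a[i][p] * b[p][j];
                c[i][j] = sum;
            }
        });
        return c;
    }

    public static void main(String[] args) {
        double[][] a  = {{1, 2}, {3, 4}};
        double[][] id = {{1, 0}, {0, 1}};
        System.out.println(multiply(a, id)[0][1]); // 2.0
    }
}
```

This parallelizes trivially; making it fast is another matter entirely, as the reply notes.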


> Matrix multiplication is an embarrassingly parallel task.

(GPU programmer take.) This is a common misconception. In reality, matrix multiplication is _extremely tricky_ to implement a fast parallel algorithm for, because the number of useful instructions you can perform per byte is limited and you end up memory-bound very quickly, so you start needing to think about properly employing LDS, managing bank conflicts, etc. Matrices are great for sorting out the theory, but for many applications that actually need to run fast, once the paper problem is sorted out it's a good time to come up with something that will actually run fast.


Is this why my memory consumption explodes when I increase the batch size of my inputs? Sorry, I'm a relative GPU noob, but I find it fascinating to observe some higher-level interactions with e.g. PyTorch; it seems like the GPU is a very leaky abstraction, and you quickly find yourself battling GPU memory management, truncating input sizes and whatnot in order to deal with the insane memory requirements.


Batch size is an orthogonal concept. The reason your memory increases so much is because with larger batches, you need to store more intermediate results before running your optimizer. This would be true with a pure CPU implementation of an SGD (stochastic gradient descent) optimizer as well.
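The linear scaling is easy to estimate back-of-the-envelope: every layer's activations are kept per sample until the backward pass runs. A rough sketch (the layer sizes are made up, and only float32 activations are counted -- weights and optimizer state are ignored):

```java
// Rough activation-memory estimate for training: each layer's
// activations are stored per sample until the backward pass,
// so the total grows linearly with batch size.
public class BatchMemory {
    static long activationBytes(int[] layerSizes, int batchSize) {
        long floats = 0;
        for (int size : layerSizes) floats += (long) size * batchSize;
        return floats * 4; // 4 bytes per float32 activation
    }

    public static void main(String[] args) {
        int[] layers = {784, 512, 512, 10}; // illustrative MLP shape
        System.out.println(activationBytes(layers, 32));   // ~0.23 MB
        System.out.println(activationBytes(layers, 1024)); // exactly 32x larger
    }
}
```

Double the batch, double the activation memory; the model weights themselves stay constant.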


Look up the JCUDA project, and this article if you're interested in learning more about that: https://blogs.oracle.com/javamagazine/programming-the-gpu-in...


Thx, after some more investigating I decided to write my own CPU stuff first then maybe add OpenGL compute shaders if I need them...


Ok, but these are open source projects that companies like Oracle and NVIDIA are working on, which is awesome.


CUDA is not so open-source... and the glue between Java and CUDA seems tricky!


I mentioned JCUDA (Java bindings for CUDA), which is an open source project: https://github.com/jcuda Here is the tutorial: http://www.jcuda.org/tutorial/TutorialIndex.html


I really recommend the book "Make Your Own Neural Network" by Tariq Rashid for beginners. It uses Python and Numpy.

I was able to adapt the code for my own mouse-drawn-symbols recognizer for my AI class project.


Maybe it's just semantics - but I believe that is a perceptron (single-layer NN)


The proper name would be logistic regression, but linear models aren't so sexy anymore nowadays :)


I very much like these simple tutorials, from which you can start something bigger.


The last time a post like this appeared on HN prompted me to write a gist with a simple neural network in Python (with Numpy). It downloads the MNIST dataset for you, trains a fully connected network on it, prints the accuracy on the validation set and plots the loss. It's pretty verbose with plenty of terms and comments to search the web for if you're interested.

https://gist.github.com/stfwn/62e51d86ca4ff155becd3c6a14adf6...



Thanks!


Here's another along the same lines, with a demo learning faces according to the Machine Learning book by Tom Mitchell@CMU:

https://github.com/pablo-mayrgundter/freality/tree/master/ml...



