14 votes

Can I have some advice on the neural net I've been working on?

Apologies if this isn't an appropriate place to post this.

Inspired by a paper I found a while back (https://publications.lib.chalmers.se/records/fulltext/215545/local_215545.pdf), I tried my hand at implementing a program (in C#) to create ASCII art from an image. It works pretty well, but as the authors observed, it's pretty slow to compare every tile against 90-some glyphs. In the paper, they build a decision tree that replicates this matching at much higher speed.
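The brute-force matching described above amounts to something like the sketch below (hypothetical Python, not the actual C# implementation; `best_glyph` and the plain-MSE scoring are illustrative stand-ins for whatever similarity metric the real code uses):

```python
# Rough sketch of brute-force tile-to-glyph matching. Comparing every tile
# against ~90 glyph bitmaps is exactly what makes this approach slow.
def best_glyph(tile, glyphs):
    """tile: flat list of grayscale pixels; glyphs: dict mapping a character
    to a same-length flat list of pixels for its rendered bitmap."""
    def mse(a, b):
        # Mean squared error between two equal-length pixel lists.
        return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)
    # Pick the glyph whose bitmap is closest to the tile.
    return min(glyphs, key=lambda g: mse(tile, glyphs[g]))
```

The decision tree (or a neural net) replaces this exhaustive `min` over all glyphs with a fixed, much shorter sequence of decisions per tile.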

Recently, I revisited this. I thought I'd try making a neural net, since I found the idea interesting. I've watched some videos on neural nets, refreshed myself on linear algebra, and I think I've gotten pretty close. That said, I feel like there's something I'm missing (especially since the loss isn't really decreasing). I suspect the problem is specifically in my backpropagation.

Here is a link to the TrainAsync method in GitHub: https://github.com/bendstein/ImageToASCII/blob/1c2e2260f5d4cfb45443fac8737566141f5eff6e/LibI2A/Converter/NNConverter.cs#L164C59-L164C69. The forward and backward propagation methods are below it.

If anyone can give me any feedback or advice on what I might be missing, I'd really appreciate it.

6 comments

  1. sparksbet

    ...wow, this is giving me flashbacks to when I was a TA in grad school. Building a simple feed-forward network with backpropagation was a big assignment that a ton of people struggled with (and which I struggled to grade). And they were coding in Python, not C#.

    I don't have time atm to give this the attention it'd need for my advice to be useful or for me to spot any mistakes. But good luck with it; I hope someone else can give suitable advice. Coding a neural net from scratch is tough work.

    5 votes
    1. a_sharp_soprano_sax

      I can definitely understand why the students had difficulties! I didn't have the opportunity to take the machine learning class back when I was in college, but it's definitely something I would've liked to take. Thanks!

      2 votes
  2. archevel

    I had a quick peek at the code and didn't spot any obvious errors, though granted, a 10-minute review is probably not enough. Some general advice to narrow down the problem:

    • Use a simpler problem to verify your training algorithm. E.g., train an XOR network or a simple classification network.
    • Remove the parallelism. You could likely do this without much effort by reducing the number of parallel batches to 1.
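    The XOR suggestion above fits in a few dozen lines of pure Python. The sketch below (all names hypothetical; a minimal 2-4-1 sigmoid network with per-sample gradient descent, not the poster's code) is the kind of reference to test a training loop against — if even this can't drive the loss down, the backprop math is the first suspect:

    ```python
    import math
    import random

    random.seed(0)

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    # Tiny 2-4-1 network: enough capacity for XOR, small enough to check by hand.
    N_IN, N_HID = 2, 4
    W1 = [[random.uniform(-1, 1) for _ in range(N_IN)] for _ in range(N_HID)]
    b1 = [0.0] * N_HID
    W2 = [random.uniform(-1, 1) for _ in range(N_HID)]
    b2 = 0.0

    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

    def total_loss():
        loss = 0.0
        for x, t in data:
            h = [sigmoid(sum(W1[j][i] * x[i] for i in range(N_IN)) + b1[j])
                 for j in range(N_HID)]
            y = sigmoid(sum(W2[j] * h[j] for j in range(N_HID)) + b2)
            loss += 0.5 * (y - t) ** 2
        return loss

    lr = 1.0
    before = total_loss()
    for _ in range(5000):
        for x, t in data:
            # Forward pass.
            h = [sigmoid(sum(W1[j][i] * x[i] for i in range(N_IN)) + b1[j])
                 for j in range(N_HID)]
            y = sigmoid(sum(W2[j] * h[j] for j in range(N_HID)) + b2)
            # Backward pass: for MSE loss and sigmoid output, dL/dz = (y-t)*y*(1-y).
            dy = (y - t) * y * (1 - y)
            for j in range(N_HID):
                # Hidden delta uses W2[j] *before* it is updated this step.
                dh = dy * W2[j] * h[j] * (1 - h[j])
                W2[j] -= lr * dy * h[j]
                for i in range(N_IN):
                    W1[j][i] -= lr * dh * x[i]
                b1[j] -= lr * dh
            b2 -= lr * dy
    after = total_loss()
    ```

    Convergence from an arbitrary initialization isn't guaranteed for XOR, but the loss should drop substantially; a loss that doesn't move at all usually means a sign error or a stale/overwritten weight in the backward pass.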

    Finally, consider using a preexisting .NET library for training NNs, or a non-.NET library like PyTorch, Keras, or some other widely used training tool, and find a compatible .NET library that can load the weights after training. Writing your own backprop is only really useful for understanding the algorithm, so if that's your goal, then go at it! If your goal is to use a NN to solve your actual problem, then it's likely a waste to implement the training algo yourself (unless you enjoy it, of course).

    5 votes
    1. a_sharp_soprano_sax

      Making the problem smaller is a good idea; if I reduce the number of ASCII glyphs to 2, I could probably even work it out by hand to verify.

      Originally I was using ML.NET (the standard .NET machine learning library), but I didn't know what I was doing, so I decided rolling my own would be a good way to learn. I understand the intuition well enough now that I could go back to it, but at this point I'm committed to getting my own version working. Plus it's just a personal project, so time isn't an issue.

      Thank you!

      3 votes
      1. archevel

        Good luck! I've done this a couple of times now (in Java and Octave/MATLAB). It is a great way to get an intuition about neural networks for sure.

        2 votes
  3. krellor

    I don't see an obvious error. That said, I have a few suggestions: if you are worried you aren't implementing the math or algorithm correctly, start with a simplified version of the code with fewer abstractions and no parallelization, and train and test on trivial data sets. Write out the mathematical calculations (sums of products, etc.) and crosswalk each portion of the equations to the code that implements it.
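    One concrete way to do that crosswalk is a finite-difference check: for each parameter θ, compare the analytic derivative from the backprop code against (L(θ+ε) − L(θ−ε)) / 2ε. A minimal sketch for a single sigmoid neuron with squared-error loss (Python; every name here is illustrative, not from the poster's repo):

    ```python
    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def loss(w, b, x, t):
        """Squared-error loss of a single sigmoid neuron."""
        y = sigmoid(w * x + b)
        return 0.5 * (y - t) ** 2

    def analytic_grads(w, b, x, t):
        """Backprop by hand: dL/dy = (y - t), dy/dz = y * (1 - y)."""
        y = sigmoid(w * x + b)
        dz = (y - t) * y * (1 - y)
        return dz * x, dz  # dL/dw, dL/db

    w, b, x, t = 0.7, -0.3, 2.0, 1.0
    eps = 1e-6
    # Central differences: numerical estimates of dL/dw and dL/db.
    num_dw = (loss(w + eps, b, x, t) - loss(w - eps, b, x, t)) / (2 * eps)
    num_db = (loss(w, b + eps, x, t) - loss(w, b - eps, x, t)) / (2 * eps)
    ana_dw, ana_db = analytic_grads(w, b, x, t)
    # The two estimates should agree to several decimal places; a large gap
    # points directly at the backprop term that computes that derivative.
    ```

    The same trick scales to a full network: perturb one weight at a time and compare against the gradient your backward pass produces for it.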

    I haven't written any NNs from scratch for a number of years, and that was in C. But I remember it being helpful to start with a really simple implementation using basic arrays that I could refactor after I had it working and validated. If there is an error, I suspect you are overlooking it due to some of the syntactic sugar of C# or the parallelization. Once you have a working version, you can refactor one piece at a time, testing to make sure you don't introduce an error.

    As far as the parallelization goes, your strategy looks sound: batching up the data, gathering the weights across the batches, and then using a reduce step to average the gradients within the epoch. I know you have set everything to a single thread, so it's not some subtle runtime issue with multiple threads. However, I can't help but think some of the syntax supporting it is hiding something non-obvious.
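    For reference, the reduce step described above is just an element-wise mean over the per-batch gradients, which can be sketched as (hypothetical Python, treating each gradient as a flat list of numbers):

    ```python
    def average_gradients(batch_grads):
        """Element-wise mean over per-batch gradient vectors.
        batch_grads: one flat gradient list per parallel batch."""
        n = len(batch_grads)
        return [sum(grads[i] for grads in batch_grads) / n
                for i in range(len(batch_grads[0]))]
    ```

    In the real code it may be worth asserting that every batch contributes a gradient of the same length and that none are silently empty; a mismatch there is exactly the kind of thing the parallel plumbing can hide.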

    Good luck!

    3 votes