mghaderi 13 hours ago

I implemented a neural network from scratch in x86 assembly (no frameworks, no Python) to recognize handwritten digits from MNIST. Feedback on performance optimizations or next steps is welcome. It uses AVX-512 SIMD for parallel float32 operations (~7× faster than NumPy) and runs in a lightweight Debian Slim Docker container. The goal was to understand neural networks at the CPU level.

  • checker659 13 hours ago

    > ~7× faster than NumPy

    Is that on the CPU? (I'm not sure whether NumPy has a GPU backend.)

    • mghaderi 13 hours ago

      Yes, on the CPU, with the same resources and the same implementation.