Citrate Part 1: My First Rust Project#

In this blog post, I want to announce Cerebral a framework written in Rust for experimenting with Self-organizing Neural Networks such as the famous Kohonen networks. It is designed in a way that a network can be defined by composing a model out of a variety of algorithm that define its behavior. Unlike deep learning architectures, the behavior is not defined by a composition of layers, loss functions and diverse learning methods. This type of network defines a less hierarchical topology and learning rules affect all neurons at once. Therefore, this project has different goals as prominent deep learning frameworks, and thus, a different architecture. Being my first Rust library (applications are simpler to implement), I had to evaluate a few options to achieve my goal and I often failed enough battling with Rust’s occasionally frustrating type system.

I started this journey already a good time ago last year, but only now found the time to finally start documenting my progress and experience. In the meanwhile, I even started a new project—Potpourri—which I will write about in parallel despite having started only a couple of months ago.

This article is the first in a series in which I will explain the process of designing and implementing Cerebral, and it will focus on the motivation and the initial extensions that I needed to use ndarray—the numerical library that I chose for its strong resemblance to Python’s Numpy package.

Please keep in mind that this is not an introduction at learning Rust from scratch and I recommend learning about the basic concepts and ideas first (e.g., here).

Motivation#

The main motivation for writing Cerebral was the desire to learn Rust a language that I just learned the basics of, and creating a something that is slightly more useful than a tutorial or standard training project (although I am convinced that there are many perfectly suited available). So I settled on a framework for self-organizing neural networks that still are the foundation for my ongoing robotics research. I wrote a blog post about these networks which are also trained by most favorite algorithm I encountered in my studies of computer science. In addition, I already have implemented the algorithm on multiple occasions in Python or Matlab (or GNU Octave to be precise) in the past. I recommend reading the blog post if you are not familiar with the algorithm.

It turned out (to little surprise) that writing a library in rust is a bit more challenging at the beginning than writing an application, and it eventually turned out less generic than I hoped it to be—mainly because of the lack of GAT in the stable rust compiler at the time of when I began. In the meanwhile, GAT have been included in standard Rust and I incorporated them into the potpourri project.

The reasons for having chosen Rust as the next new language to learn, are that I am a big supporter of strictly typed languages and enforce type annotations every time I work with python or the web (via Typescript). Having been an extensive C++ user during my time in a robotics lab made, learning its “spiritual successor” made immediate sense. As I was less interested in foremost building distributed systems, Rust ultimatetively won over Go. I appreciate a lot the modern ideas and concepts behind rust and enjoy the “fresh” feeling and the excellent tool chain. I often struggled with the type system nevertheless, which is mainly due to the numeric/scientific nature of the project (numerical libraries come with many generic parameters), and the fact that the design of a generic framework needs to be designed more carefully. Furthermore, I hope this series of blog posts will be interesting for anybody who aims at doing the same.

First impressions with Rust#

I have to immediately point out, that development took significantly longer than (I had) anticipated and required significantly more boilerplate code than it would have been in python. To be fair, I had to implement some of numpy’s functionality which is not yet included in the ndarray crate and naturally, having a Rust course a few months ago (plus some occasional experimenting with tree structures) certainly didn’t magically turn me into an experienced Rustacean. Yet, I am afraid that you always sacrifice development speed for speed, memory safety and clean, maintainable code. At the end of the day, I will continue to do the initial algorithm design and general rapid prototyping in Python, and will not move to Rust before a running proof of concept. However, I am convinced that once I spend time with Rust, things will be different.

In general, I rarely could work on side projects for an extended amount of time at once and returning to a codebase after a few weeks means additional effort (and probably more discipline) to pick up the work again.

I organized my code in a monorepo called citrate—the salt of the citric acid in a reference to the “lemonfold” domain. Setting up a monorepo with Cargo and Poetry (which I removed recently), including a skeleton for a Python extension and some basic documentation only took about half a day. This mainly thanks to the excellent job the Rust’s cargo tool does. Implementing the pure algorithm as simple as it is in another story. Again, I am still very new to Rust, but it took me nearly one and a half days to get through countless compiler error messages. Parts of it are due to how shapes are handled in the ndarray crate and my attempts to work around that. Getting used to reference borrowing surely takes a while and prerequisites a good deal of patience. Eventually, I got used to the concept of borrowing after approximately two days into it and changed my opinion (I now really like the idea).

In the remainder, I will describe a basic implementation of the algorithm and some workarounds/helpers that I needed to get it running.

Numerical computations#

As an array for numerical computation and multidimensional arrays, I use the excellent ndarray crate. There are others that could have been used (e.g., nalgebra) but I enjoyed the resemblance to the numpy package in Python. There’s even a migration guide available. Ideally, I would have designed the framework to be generic in that choice, but I did not succeed with that endeavor (due to the lack of support for GAT at that time). I I resolved the issue in my Potpourri crate and will document the solution in one of the first blog posts about it.

In the remainder of this section, I will write about the functionality I miss in ndarray and some convenience functions. We start with two simple functions for linear algebra that compute the differences and distances of a point (1D) to a point set (2D, row-matrix):

/// The L2 norm
pub fn row_norm_l2<A, S>(points: &ArrayBase<S, Ix2>) -> Array1<A>
where
    S: Data<Elem = A>,
    A: Float,
{
    points.mapv(|e| e.powi(2)).sum_axis(Axis(1)).mapv(A::sqrt)
}

/// Differences point, point set
pub fn get_differences<A, S, T>(points: &ArrayBase<S, Ix2>, point: &ArrayBase<T, Ix1>) -> Array2<A>
where
    S: Data<Elem = A>,
    T: Data<Elem = A>,
    A: Float,
{
    points - &point.view().insert_axis(Axis(0))
}

/// Distances point, point set
pub fn get_distances<A, S, T>(points: &ArrayBase<S, Ix2>, point: &ArrayBase<T, Ix1>) -> Array1<A>
where
    S: Data<Elem = A>,
    T: Data<Elem = A>,
    A: Float,
{
    row_norm_l2(&get_differences(points, point))
}

To understand the type annotations, it is useful to know the concept of associated types in Rust. Associated types are declared in an interface definition, and are assigned in its implementations. That way, the return types or parameters of a function can vary with the struct that implements the interface for instance. In case of arrays in the ndarray crate, this allows for keeping the underlying data type generic in the data ownership interface: Data<Elem = A> translates to any data ownership (owned or shared) as long as the elements are of type A. It gets especially interesting as soon as generics are allowed (Generic Associate Types (GAT)) which is a feature that has been added only recently to Rust at the time of writing.

Note that this seems to be the most generic way of defining a function that accepts all different types of data types and arrays that own their data or are views. Since then, I now mostly opt for accepting references to ArrayViews as arguments which has the downside of requiring to convert owned arrays with the .view() method—even if this probably is slightly slower because memory for the view needs to be allocated. Functions are more convenient to write that way and, in machine learning, we anyway deal with views most of the time, when splitting a data set into training and test data for instance.

We can now extend all 2D arrays to act as point sets that support those two methods:

pub trait PointSet<A>{
    fn get_differences<S>(&self, point: &ArrayBase<S, Ix1>) -> Array2<A>
    where S: Data<Elem = A>, A: Float;

    fn get_distances<S>(&self, point: &ArrayBase<S, Ix1>) -> Array1<A>
    where S: Data<Elem = A>, A: Float;
}

impl<A,T> PointSet<A> for ArrayBase<T, Ix2>  where T: Data<Elem=A>, A: Float {
    fn get_differences<S>(&self, point: &ArrayBase<S, Ix1>)  -> Array2<A>
    where S: Data<Elem = A> {
        self - &point.view().insert_axis(Axis(0))
    }

    fn get_distances<S>(&self, point: &ArrayBase<S, Ix1>) -> Array1<A>
    where S: Data<Elem = A>, A: Float {
        row_norm_l2(&self.get_differences(point))
    }
}

fn main(){

    let a = array![[1.,2.,3.,-5.,6.],[1.,2.,3.,-5.,6.5]];
    let b = array![1.,2.,3.,-5.,6.];

    // run with views and arrays
    println!("a-b {}", a.get_differences(b))
    println!("a-b {}", a.get_differences(&b))
}

First version of the algorithm#

Now, we can create a first functional implementation of the algorithm. We plan on creating a toolbox that lets us modify every significant bit of the model—even if the algorithm itself is not very complex by all means (it is intended as an educational project after all). For now, we therefor settle for the classical implementation only. In addition, we limit ourselves to working with the f64 data type.

When implementing a self-organizing map where all neurons are arranged in a rectangular grid, one can use the array indices as coordinates in the latent space. However, this approach is very limiting (if you prefer different topologies), and therefore this algorithm expects a second array in addition to the codebook vectors. It holds the immutable coordinates in the latent space.

#[derive(Debug)]
pub struct SelfOrganizingMap {
    feature: Array<f64, Ix2>,
    latent: Array<f64, Ix2>,
}

impl SelfOrganizingMap {
    fn new<S, T>(feature: &ArrayBase<S, Ix2>, latent: &ArrayBase<T, Ix2>) -> SelfOrganizingMap
    where
        S: Data<Elem = f64>,
        T: Data<Elem = f64>,
    {
        // TODO: Check if #rows is equal for S, T
        let feature = feature.to_owned();
        let latent = latent.to_owned();

        SelfOrganizingMap { feature, latent }
    }

    /// Single update step
    fn adapt<S>(&mut self, feature: &ArrayBase<S, Ix1>, influence: f64, rate: f64)
    where
        S: Data<Elem = f64>,
    {
        let differences = self.feature.get_differences(&feature); // in feature space

        let best_matching = argmin(&self.feature.get_distances(&feature)); // row-index in array
        // To avoid double computation, the feature space differenes is computed explicitedly
        // already here and the method is not used for determining the best matching unit
        // let best_matching = argmin(&row_norm_l2(&differences));
        let best_matching = self.latent.slice(s![best_matching, ..]); // latent coordinate

        let distances = &self.latent.get_distances(&best_matching); // in latent space

        // Compute the Gauss kernel
        let strength = distances.mapv(|e| (-1.0 * e.powi(2) / influence / 2.0).exp());

        let updated = &self.feature - (rate * strength.insert_axis(Axis(1)) * differences); // update rule

        self.feature.assign(&updated);
    }

    /// Batch training of a PointSet repeatedly for a number of `epochs`.
    /// Analog to the `adapt` method, `influences` and `rates` can be specified
    /// as tuples of the start and end values at the beginning and end of
    /// the learning respectively.
    fn batch<S>(
        &mut self,
        features: ArrayBase<S, Ix2>,
        influences: Option<(f64, f64)>,
        rates: Option<(f64, f64)>,
        epochs: Option<usize>,
    ) where
        S: Data<Elem = f64>,
    {
        let influences = influences.unwrap_or((3.0, 1.0));
        let rates = rates.unwrap_or((0.7, 0.1));
        let epochs = epochs.unwrap_or(1);

        let n_samples = features.len_of(Axis(0));

        for epoch in 0..epochs {
            for (i, feature) in features.outer_iter().enumerate() {
                let progress = ((epoch * n_samples + i) as f64) / ((epochs * n_samples) as f64);
                let rate = rates.0 * (rates.1 / rates.0).powf(progress);
                let influence = influences.0 * (influences.1 / influences.0).powf(progress);

                self.adapt(&feature, influence, rate);
            }
        }
    }
}
from dataclasses import dataclass

import numpy as np
from numpy.typing import NDArray


@dataclass
class SelfOrganizingMap:
    """Data class for a self-organizing map

    args:
        latent (np.ndarray): Topology of the network
        codebook (np.ndarray): Feature vectors of the neurons
    """
    latent: NDArray[np.float_]
    codebook: NDArray[np.float_]

    def adapt(self, sample: NDArray[np.float_], rate: float, influence: float) -> None:
        """A single update step

        Args:
            samples (np.ndarray): The training samples
            rate (float): Learning rate
            influence (float): Influence
        """
        n_neurons, output_dim = self.codebook.shape  # (3)

        # Differences between the neurons weights and the sample
        diffs = self.codebook - np.tile(sample, (n_neurons, 1))  # (4)

        # Neural coordinates of the winning neuron (BMU)
        winning = self.latent[np.argmin(np.linalg.norm(diffs, axis=1))]  # (5)

        # Influence of the BMU ∀ neurons
        neighborhood = np.exp(-0.5 * np.sum((self.latent - np.tile(winning, (n_neurons, 1))) ** 2, axis=1) / influence**2)  # (6)

        # Update the codebook
        som.codebook -= rate * diffs * np.tile(neighborhood[:, np.newaxis], (1, output_dim))  # (7)


    def batch(
        self,
        samples: NDArray[np.float_],
        rate_start: float = 0.7,
        rate_end: float = 0.1,
        influence_start: float = 2,
        influence_end: float = 0.1,
    ) -> None:
        """Train a SOM.

        Args:
            samples (np.ndarray): The training samples
            rate_start (float): Learning rate at the start (≤ 1 ∧ > 0)
            rate_end (float): Learning rate at the end.
            influence_start (float): Influence of the BMU at the beggining (>0)
            influence_end (float): Influence of the BMU at the end.
        """

        n_samples, output_dim = samples.shape
        assert output_dim == self.codebook.shape[1]

        for i, x in enumerate(samples):

            rate = rate_start * (rate_end / rate_start) ** (float(i) / n_samples)  # (8)
            influence = influence_start * (influence_end / influence_start) ** (float(i) / n_samples)  # (9)
            self.adapt(x, rate, influence)

The implementations for Python and Rust are approximately of same length, so the overhead from Rust seems negligible in that respect. This is because of the high abstraction level when working with element-wise array operations only.

The initialization of the latent space (the coordinates of the neurons) is done manually, the differences between the languages becomes more imminent:

use num_traits::cast;

fn main() {
    let (n1, n2) = (10, 10);

    let mut som = SelfOrganizingMap::new(
        &Array2::<f64>::zeros((n1 * n2, 3)),
        &Array2::<f64>::zeros((n1 * n2, 2)),
    );

    // 1st variant:  "Classical" nested loops and indexing
    for x in 0..n1 {
        for y in 0..n2 {
            som.latent
                .row_mut(x * n2 + y)
                .assign(&Array::from_vec(vec![cast(x).unwrap(), cast(y).unwrap()]))
        }
    }

    // 2nd variant: More modern, functional approach with iterators
    som.latent
        .axis_iter_mut(Axis(0)) // For all rows (mutable)
        .zip((0..n1).cartesian_product(0..n2)) // for all combinations of 0..n1 and 0..n2 do
        .for_each(|(mut row, (x, y))| {
            row.assign(&Array::from_vec(vec![cast(x).unwrap(), cast(y).unwrap()]))
        });

    println!("{:?}", som);
}

if __name__=="__main__":
    som = SelfOrganizingMap(
        latent = np.array([[x, y] for x in range(nx) for y in range(ny)]),  # (1)
        codebook = np.random.rand((nx * ny, nd))  # (2)
    )
    print(som)

\[\begin{split} \text{latent} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \\ \vdots & \vdots \\ n_1 & n_2 \end{pmatrix} \end{split}\]

Here, the convenience of using Python becomes more obvious. The rust code is more complex and the conversion between types generates a considerable amount of friction. Python’s tendency to convert types for you certainly reduces the amount of code and time to program.

However, I showed two ways for initializing the arrays. The first one is how I would have tackled the problem with my background in C/C++. It involves two nested loops and indexing of the array. While easy to understand, the indexing can resolve in out-of-bounds memory access. At least, Rust will panic in those cases (better than the alternative). The second, functional variant, however, is way more elegant in my opinion once one gets used to it, and it reads like natural language after a while. Note that both variants are equally performant thanks to zero-cost abstractions in Rust, and even more, these expressions can be parallelized among multiple cores with the addition of a single line using the rayon crate. I will investigate further into these claims when discussing the implementation of my Potpourri crate later.

I will stop at this point and continue in the next with breaking down the algorithm into smaller pieces and an architecture design for composable algorithms.