Sorting arrays of the ndarray crate#

I came across a problem where I needed to sort an array by its values in one column and potentially sort other arrays with that index. In Python, this is very easy:


a = np.random.rand(4,3)
print(a)
# array([[0.98782953, 0.3603707 , 0.07544306],
#        [0.3278871 , 0.97414566, 0.36348919],
#        [0.77329616, 0.82990707, 0.07634337],
#        [0.5545888 , 0.19784322, 0.37579676]])
idx = np.argsort(a[:,0])
print(idx)
# array([1, 3, 2, 0])
print(a[idx,:])
# array([[0.3278871 , 0.97414566, 0.36348919],
#       [0.5545888 , 0.19784322, 0.37579676],
#       [0.77329616, 0.82990707, 0.07634337],
#       [0.98782953, 0.3603707 , 0.07544306]])

However, in rust this seems to be much more complicated, and I could not find to solve this issue efficiently. I stumbled over the ndarray-slice crate but preferred to try my own solution. As efficiency is not really that important (I need it for testing) this approach will suffice for now. Maybe it proves useful for you too (if so, please let me know in the comments).

We start by including the ndarray prelude and itertools—it includes implementations for sorting iterators.

use itertools::Itertools;
use ndarray::prelude::*;

Then, let’s start with a simple array and select the first column which we want to use for sorting.

fn main() {
    let a = arr2(&[
        [0.3898262290068988, 0.8167391344557614],
        [0.11878159178282421, 0.5626218062608417],
        [0.5704567148662028, 0.7053201952102196],
        [0.43654539176129825, 0.39508426718930445],
    ]);

    println!("{}", a.slice(s![.., 0]));
    // Output: [0.3898262290068988, 0.11878159178282421, 0.5704567148662028, 0.43654539176129825]

We now have to convert the ndarray slice into a standard rust slice using as_slice. This method fails if the memory is not aligned (which is the case when slicing a column). Therefore, we create a memory aligned copy to_owned (which is an unnecessary copy). As we want the indices that sort the rows, we enumerate the values of the column. Using itertools gives us sort_by, and we can use it to sort the column values. For now, we will print the rows of the sorted array one after another together with their row indices in the original array.

    // slice needs to be owned / copied to be memory aligned (if iterating over rows)
    a.slice(s![.., 0])
        .to_owned().as_slice().unwrap()
        .into_iter()
        .enumerate()
        .sorted_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .for_each(|x| println!("{} {}", x.0, x.1));
    // 1 0.11878159178282421
    // 0 0.3898262290068988
    // 3 0.43654539176129825
    // 2 0.5704567148662028

We are interested in the index, so we modify the code such that we can store it in a vector.

    let idx: Vec<usize> = a.slice(s![.., 0])
        .to_owned().as_slice().unwrap()
        .into_iter()
        .enumerate()
        .sorted_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|x| x.0).collect();

    println!("{:?}", idx);
    // Output: [1, 0, 3, 2]

Now that we have the index, we still cannot sort by this index as this seem to be not supported by the Slice object. It seems we can only slice by ranges or single values but not an array.

One solution sufficient for my use case is to iterate over the index and copy the rows in a new array:

    let mut b = a.clone();

    idx.into_iter()
        .enumerate()
        .for_each(|x| {
            b.slice_mut(s![x.0, ..]).assign(&a.slice(s![x.1, ..]));
        });
    // Output:
    // [[0.11878159178282421, 0.5626218062608417],
    //  [0.3898262290068988, 0.8167391344557614],
    //  [0.43654539176129825, 0.39508426718930445],
    //  [0.5704567148662028, 0.7053201952102196]], shape=[4, 2], strides=[2, 1], layout=Cc (0x5), const ndim=2

Or skipping the explicit index:

    let mut b = a.clone();

    a.slice(s![.., 0])
        .to_owned().as_slice().unwrap()
        .into_iter()
        .enumerate()
        .sorted_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .enumerate()
        .for_each(|x| {
            b.slice_mut(s![x.0, ..]).assign(&a.slice(s![x.1.0, ..]));
        });
}

We can include multiple assignments in the last for_each so that we can sort multiple arrays by the index vector. Eventually, I’ll use the last bit to sort the components of (Gaussian) mixture models by their relevance/weight.

To further improve readability, we can destructure the x variable in the fore_each method into more traditional index names i and j. We can do the same for sorted_by step in the pipeline. Variables that we don’t need (i.e., the indices) can be replaced by the placeholder _ to avoid compiler warnings (and improve readability even further).

    let mut b = a.clone();

    a.slice(s![.., 0])
        .to_owned()
        .as_slice()
        .unwrap()
        .into_iter()
        .enumerate()
        .sorted_by(|(_,a), (_,b)| a.partial_cmp(b).unwrap())
        .enumerate()
        .for_each(|(i, (j, _))| {
            b.slice_mut(s![i, ..]).assign(&a.slice(s![j, ..]));
        });
    println!("{:?}", b);

We can of course apply the same to the case using the index. Here, I used iter instead of into_iter. The latter gives away the ownership of the index which is what we don’t want. Note the dereferencing operator * which is then required.

    let mut b = a.clone();

    idx.clone().iter().enumerate().for_each(|(i, j)| {
        b.slice_mut(s![i, ..]).assign(&a.slice(s![*j, ..]));
    });
    println!("{:?}", b);

Updated on 08 August 2023

Added paragraph about tuple destructuring and ownership preservation of the index