Fun with the EE MMI instructions

Maximus32

Developer
I wanted to experiment with the MMI instructions to see how much they can benefit performance. I searched for examples but couldn't find any, so I created a few tests. I must say I'm surprised at how powerful the MMI instructions are compared to the standard MIPS III instructions!

So basically, what we get are 128-bit integer vector instructions that operate on the following vectors:
- 16x int8_t
- 8x int16_t
- 4x int32_t
- 2x int64_t
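
To make the lane layout concrete, here is a portable sketch (plain C++, no MMI) that views the same 16 bytes as each of the four vector types listed above. `kLanes` and `to_lanes` are illustrative names made up for this sketch; they are not part of libeevec.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// One 128-bit MMI register is 16 bytes; the lane count depends on the element type.
template <typename T>
constexpr std::size_t kLanes = 16 / sizeof(T);

static_assert(kLanes<std::int8_t> == 16, "16x int8_t");
static_assert(kLanes<std::int16_t> == 8, "8x int16_t");
static_assert(kLanes<std::int32_t> == 4, "4x int32_t");
static_assert(kLanes<std::int64_t> == 2, "2x int64_t");

// Hypothetical helper: reinterpret a 16-byte block as kLanes<T> lanes of type T.
// memcpy is used because type-punning through a union is undefined behaviour in C++.
template <typename T>
std::array<T, kLanes<T>> to_lanes(const std::array<unsigned char, 16>& reg) {
    std::array<T, kLanes<T>> out{};
    std::memcpy(out.data(), reg.data(), reg.size());
    return out;
}
```

The same bits are simply interpreted differently; which interpretation applies is decided by the instruction (byte, halfword, word or doubleword variant), not by the register.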

I created a small library named "libeevec", an EE Vector Library:
https://gitlab.com/ps2max/libeevec

Using C++ classes, it's possible to write:
Code:
int32_t i1[] = {1, 2, 3, 4};
int32_t i2[] = {2, 3, 4, 5};
CInt32_4 vi1(i1), vi2(i2);
CInt32_4 vresult = vi1 + vi2;
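
For readers without a PS2 at hand, here is a hypothetical portable stand-in showing what such a class boils down to. `Vec32x4` is a name invented for this sketch, not libeevec's `CInt32_4`; the assumption is that on the EE the `operator+` would map to a single PADDW (Parallel Add Word), while here it is an ordinary loop.

```cpp
#include <cstdint>

// Hypothetical portable stand-in for a 4x int32_t vector class (not libeevec code).
class Vec32x4 {
public:
    explicit Vec32x4(const std::int32_t (&v)[4]) {
        for (int i = 0; i < 4; ++i) lane_[i] = v[i];
    }

    // Lane-wise add. PADDW wraps on overflow, so the add is done in unsigned
    // arithmetic (well defined) and converted back, instead of relying on
    // signed overflow, which is undefined in C++. The conversion back is
    // modular in C++20 and in GCC.
    Vec32x4 operator+(const Vec32x4& rhs) const {
        Vec32x4 r = *this;
        for (int i = 0; i < 4; ++i)
            r.lane_[i] = static_cast<std::int32_t>(
                static_cast<std::uint32_t>(lane_[i]) +
                static_cast<std::uint32_t>(rhs.lane_[i]));
        return r;
    }

    std::int32_t operator[](int i) const { return lane_[i]; }

private:
    std::int32_t lane_[4];
};
```

With `i1` and `i2` from the example above, `Vec32x4(i1) + Vec32x4(i2)` yields {3, 5, 7, 9}. The point of wrapping this in a class is that the caller writes `a + b` and the class decides whether that becomes one MMI instruction or four scalar adds.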

As a demo, I tried to optimize a sound mixer that mixes int16_t samples from 32 different sound sources. This scales very well using the CInt16_8 class, and the result is a 10x speedup.

The reason it's more than 8x faster is that the MMI instructions also clamp the samples for free. Instead of writing this for each sample:
Code:
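// dest must be wider than int16_t (e.g. int32_t) so the sum can exceed the int16_t range
// Mix
dest += source[iSource].sample[iSample];
// Clamp max
if (dest > 32767)
    dest = 32767;
// Clamp min (the int16_t minimum is -32768)
if (dest < -32768)
    dest = -32768;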
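
Put together, the scalar version of the mixer looks roughly like the sketch below. The `Source` struct, `mix_scalar` and the vector-based buffers are assumptions made for illustration, not the demo's actual code. It clamps to the full int16_t range [-32768, 32767] after every add, which is what a chain of saturating adds produces.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sound source: one buffer of 16-bit PCM samples.
struct Source {
    std::vector<std::int16_t> sample;
};

// Scalar reference mixer. Assumes every source holds at least numSamples samples.
// For each output sample, all sources are summed in a 32-bit accumulator and the
// running sum is clamped after every add, like the per-sample code above.
std::vector<std::int16_t> mix_scalar(const std::vector<Source>& sources,
                                     std::size_t numSamples) {
    std::vector<std::int16_t> out(numSamples, 0);
    for (std::size_t iSample = 0; iSample < numSamples; ++iSample) {
        std::int32_t dest = 0;
        for (const Source& src : sources) {
            dest += src.sample[iSample];       // mix
            if (dest > 32767) dest = 32767;    // clamp max
            if (dest < -32768) dest = -32768;  // clamp min
        }
        out[iSample] = static_cast<std::int16_t>(dest);
    }
    return out;
}
```

Note that clamping after each add is not the same as clamping once at the end: with sources 30000, 10000 and -5000, the running sum saturates at 32767 and then drops to 27767, whereas clamping only at the end would give 32767. Two compares per sample per source is exactly the work the saturating instruction removes.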

With a single instruction, we can mix and clamp 8 samples at once:
Code:
// PADDSH : Parallel Add with Signed Saturation Halfword
// dest and sample[iSample8] are CInt16_8 (8x int16_t)
dest += source[iSource].sample[iSample8];
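
To check results on a PC, the behaviour of PADDSH can be emulated in portable C++. This is a sketch of the instruction's semantics (lane-wise int16_t add with signed saturation), not how libeevec implements it; `Int16x8` and `paddsh` are names chosen for this example.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

using Int16x8 = std::array<std::int16_t, 8>;

// Portable emulation of PADDSH semantics: add eight int16_t lanes pairwise and
// saturate each result to [-32768, 32767] instead of wrapping around.
// On the EE this whole function corresponds to one instruction.
Int16x8 paddsh(const Int16x8& a, const Int16x8& b) {
    Int16x8 r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        // Widen to 32 bits first; the sum of two int16_t values cannot overflow there.
        std::int32_t sum = std::int32_t{a[i]} + std::int32_t{b[i]};
        sum = std::min<std::int32_t>(32767, std::max<std::int32_t>(-32768, sum));
        r[i] = static_cast<std::int16_t>(sum);
    }
    return r;
}
```

Chaining `dest = paddsh(dest, src)` over all sources gives, lane for lane, the same result as the scalar clamp-after-every-add loop, which makes it a handy reference when testing the MMI version.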

Anyway, it was just a small experiment. I hope this encourages some of you to experiment with the EE MMI as well. Any additions to the 'library' via GitLab merge requests are welcome, but other than this experiment I have no plans for it at this time.

Some more random thoughts on the vector instructions:
- With newer compilers it is possible to automatically use vector instructions. Search for "auto vectorization".
- What the MMI instructions are for integers, the VU0 is for floats. We can create a CFloat32_4 class that uses the VU0. Perhaps an experiment for another day ;-).
- Compilers' auto-vectorization could also target the VU0 ... dreaming of clang with MMI and VU0 auto-vectorization support ...
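
As a rough illustration of what "auto-vectorization friendly" means, here is a loop in the shape that vectorizers generally handle well: a counted loop over contiguous arrays, no branches, no dependency between iterations. `add_i32` is a hypothetical example function; whether a given PS2 GCC build actually turns it into MMI instructions such as PADDW depends on the compiler and its flags, and this sketch does not show that.

```cpp
#include <cstddef>
#include <cstdint>

// Element-wise add of two int32_t arrays. Each iteration is independent, so a
// vectorizer can process several elements per instruction and handle any
// leftover elements (n not a multiple of the vector width) in a scalar tail.
void add_i32(const std::int32_t* a, const std::int32_t* b, std::int32_t* out,
             std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

If `out` might overlap `a` or `b`, the compiler has to add runtime overlap checks or give up; GCC and Clang accept `__restrict__` on the pointer parameters as an extension to promise there is no overlap.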
 
There is actually quite a lot you can do with the MMI instructions. There are usually 8/16/32/64-bit versions of the parallel instructions. It's been quite a while since I looked at them, though. Most of the optimizations that could be taken advantage of may be on the compiler end, where the programmer need not know about them. Unfortunately, that's a lot of work for the toolchain.
 
It's being worked on. Check out this thread for more information:
https://www.psx-place.com/threads/wip-updated-homebrew-toolchain-for-ps2.20539

The GCC sources with auto-vectorization are in one of the *mmi* branches here:
https://gitlab.com/ps2max/toolchain/gcc

An example of the auto-vectorization output is here:
https://gitlab.com/ps2max/testing/test_compiler/-/blob/gcc-output-compare/test_autovec.S

A lot is working, but there are also still a lot of issues.
 

Could you point out where the necessary changes for auto-vectorization are?