But what does "complete support for MMI" mean? Older versions of GCC do not have auto-vectorization, so they will never generate MMI code.
Surprisingly, it does. Ours was a toolchain modified for the R5900. If you look at its R5900.md file, you'll see MMI defined there.
Under the right conditions, it will emit LQ/SQ and other MMI instructions.
Compared to our aging toolchain, the modern, standard GCC still lacks a great deal of the functionality needed to generate efficient code for the R5900.
Other than LQ and SQ, what other MMI instructions does GCC 3.2.3 generate?
I've forgotten, but it is perhaps at least on par with Sony's Linux GCC compiler.
I've seen it generate MMI before, but it isn't very common. I've seen it in OPL, but I've forgotten which function that was (and hence under what conditions MMI gets emitted).
"Quick memory copies involved the vector modes, which were already used by GCC (for Loongson MIPS, if I remember right). I never finished implementing support for MMI, so only memory copy operations used LQ/SQ in a sane way."
What are these quick memory copies? Can you give a C example that would produce LQ/SQ code?
Later versions of GCC can inline memcpy() calls and loops that copy data. This is designed to use architecture-specific capabilities for copying memory.
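A minimal sketch of the kind of copy that is a candidate for inlining, assuming a fixed-size block aligned to 16 bytes (LQ and SQ operate on 16-byte-aligned quadwords). The type block64_t and the function copy_block are hypothetical names for illustration. Whether a given R5900 toolchain actually expands this memcpy() into LQ/SQ pairs depends on its version and flags, so check the generated assembly (e.g. with -O2 -S) rather than assume it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-byte block, aligned to 16 bytes so that each of its
   four quadwords could be moved with one LQ/SQ pair. */
typedef struct {
    uint32_t words[16];
} __attribute__ ((aligned (16))) block64_t;

/* Fixed-size copy between aligned blocks. Because the size is a
   compile-time constant, GCC may expand memcpy() inline instead of
   calling the library; on an R5900 toolchain that expansion *may*
   use LQ/SQ (assumption, not guaranteed). */
static void copy_block(block64_t *dst, const block64_t *src)
{
    /* Both pointers must be 16-byte aligned for quadword accesses. */
    assert(((uintptr_t)dst & 15u) == 0);
    assert(((uintptr_t)src & 15u) == 0);
    memcpy(dst, src, sizeof *dst);
}
```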
We've talked about this before, but a really long time ago.
From your post on psx-scene, in 2017:
Maximus32 said:
That's a nice improvement!
C code from:
Code:
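#define VEC_SIZE 1024 /* assumed value; the original post did not show it */

int ivec1[VEC_SIZE];
int ivec2[VEC_SIZE];

/* Add v1[i] into v0[i] for count elements, one 32-bit int at a time. */
static inline void _ivec_add(int *v0, int *v1, int count)
{
    while (count--)
        (*v0++) += (*v1++);
}

void ivec_add(void) { _ivec_add(ivec1, ivec2, VEC_SIZE); }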
To:
Code:
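/* GCC vector extension: four 32-bit ints packed into one 16-byte value. */
typedef int v4si __attribute__ ((vector_size (16)));

v4si v4sivec1[VEC_SIZE];
v4si v4sivec2[VEC_SIZE];

/* Same loop, but each += adds four ints at once. */
static inline void _v4sivec_add(v4si *v0, v4si *v1, int count)
{
    while (count--)
        (*v0++) += (*v1++);
}

void v4sivec_add(void) { _v4sivec_add(v4sivec1, v4sivec2, VEC_SIZE); }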
Assembly code from (inner loop only):
Code:
.L2:
    addiu   $3,$3,4
    addiu   $2,$2,4
    lw      $4,-4($3)
    lw      $5,-4($2)
    addu    $4,$4,$5
    bne     $2,$6,.L2
    sw      $4,-4($3)      # branch delay slot: store runs on every iteration
To (inner loop only):
Code:
.L7:
    addiu   $3,$3,16
    addiu   $2,$2,16
    lq      $4,-16($3)
    lq      $6,-16($2)
    paddw   $4,$4,$6       # four parallel 32-bit adds
    bne     $2,$5,.L7
    sq      $4,-16($3)     # branch delay slot: store runs on every iteration
That is exactly the same number of instructions, but it does 4x the number of calculations!
From what you wrote, perhaps it never worked for normal, non-vector expressions, because auto-vectorization does not work for the R5900.
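To check what the quoted vector loop computes without an R5900 at hand, here is a small host-side sketch using the same GCC vector extension. It uses int32_t instead of int (assuming a 32-bit int, as on the R5900), and the function names are hypothetical. Two passes of the vector loop give the same result as eight scalar additions: each v4si += does the work of four int adds, which PADDW performs in a single instruction.

```c
#include <assert.h>
#include <stdint.h>

/* GCC vector extension: four 32-bit ints in 16 bytes, the same width
   as one 128-bit R5900 GPR used by LQ/PADDW/SQ. */
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Vector version: each += adds four 32-bit lanes at once. */
static void v4si_add(v4si *dst, const v4si *src, int count)
{
    assert(count >= 0); /* a negative count would never terminate cleanly */
    while (count--)
        *dst++ += *src++;
}

/* Scalar reference: the same additions, one int at a time. */
static void int_add(int32_t *dst, const int32_t *src, int count)
{
    assert(count >= 0);
    while (count--)
        *dst++ += *src++;
}
```

On a non-R5900 compiler the vector loop is lowered to whatever SIMD the host offers, or to scalar code, so this only demonstrates the arithmetic, not MMI code generation.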
I'm not sure if GCC builtins/intrinsics are supported for MMI instructions.
Probably the TI-mode implementation isn't finished, which is why you get a pair of ld instructions.
Since GCC treats our GPRs as 64-bit, I don't think you can fit a TI-mode (128-bit) value into one. It may have been legal in our GCC 3.2.x, but that GCC was configured differently and had much more invasive rework done to it.
I don't remember whether I ever mentioned it anywhere, but the GCC team recommended that I keep the R5900 defined this way. We lack some basic (and important) arithmetic operations on 128-bit data, such as addition, multiplication and division. The only thing one could really do with TI-mode values is copy them (using MMI). So it made little sense to implement R5900 support as an unusual 128-bit target.
If one implemented MMI as vector modes, GCC should also correctly preserve the 128-bit values across function calls. GCC treats vector modes as distinct from integer modes, which would let it treat the R5900 as a 64-bit target with special 128-bit vector operations (which describes the R5900 well).
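A small sketch of why a lane-wise MMI add cannot stand in for a TI-mode add: a 128-bit integer add must carry from the low 64 bits into the high 64 bits, while a V4SI-style add (what PADDW computes) keeps four independent 32-bit lanes and drops each lane's carry. The u128 type and all three functions are hypothetical helpers in portable C; they model the arithmetic only, not GCC's internal modes.

```c
#include <assert.h>
#include <stdint.h>

/* 128-bit value as two 64-bit halves, roughly how a 64-bit target
   sees a TImode integer. */
typedef struct { uint64_t lo, hi; } u128;

/* TImode-style add: the carry out of the low half goes into the high half. */
static u128 add_u128(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);
    return r;
}

/* Extract 32-bit lane i (0..3, lane 0 = least significant). */
static uint32_t lane(u128 v, int i)
{
    assert(i >= 0 && i < 4);
    uint64_t half = i < 2 ? v.lo : v.hi;
    return (uint32_t)(half >> ((i % 2) * 32));
}

/* V4SImode-style add, as PADDW does it: four independent 32-bit adds,
   each wrapping at 32 bits with no carry into the next lane. */
static u128 add_4x32(u128 a, u128 b)
{
    u128 r = { 0, 0 };
    for (int i = 0; i < 4; i++) {
        uint64_t sum = (uint32_t)(lane(a, i) + lane(b, i));
        if (i < 2)
            r.lo |= sum << ((i % 2) * 32);
        else
            r.hi |= sum << ((i % 2) * 32);
    }
    return r;
}
```

Adding 1 to a value whose low 64 bits are all ones shows the difference: the 128-bit add carries into the high half, while the lane-wise add only wraps lane 0 to zero.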