void test_v16u8_padd (unsigned char * a, unsigned char * b, unsigned char * c) {
int i; for(i=0;i<256;i++) a[i]=b[i]+c[i];
}
lq $8,0($6)
lq $2,0($5)
paddb $2,$2,$8
sq $2,0($4)
lq $8,16($6)
lq $2,16($5)
paddb $2,$2,$8
sq $2,16($4)
# etc. for the remaining 16-byte groups
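As a cross-check for the listing above, here is a minimal portable sketch of what each lq/lq/paddb/sq group computes: sixteen independent byte additions that wrap modulo 256. The function names are made up for illustration and are not part of any PS2 SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one lq/lq/paddb/sq group: 16 byte lanes,
   each added independently with wrap-around (no saturation). */
static void paddb_model(uint8_t *a, const uint8_t *b, const uint8_t *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t lane = 0; lane < 16; lane++)
        a[lane] = (uint8_t)(b[lane] + c[lane]);
}

/* Model of the whole loop: 256 bytes handled as 16 groups of 16 lanes,
   matching the 0, 16, 32, ... offsets in the listing. */
static void padd_256_model(uint8_t *a, const uint8_t *b, const uint8_t *c)
{
    for (size_t off = 0; off < 256; off += 16)
        paddb_model(a + off, b + off, c + off);
}
```

Comparing the output of padd_256_model against the scalar loop in test_v16u8_padd on the same inputs is one way to confirm that the vectorized code is equivalent.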
void test_v8u16_pext_h(unsigned int * a, unsigned short * b) {
int i; for(i=0;i<256;i++) a[i]=b[i];
}
lq $2,0($5)
pextlb $6,$0,$2
pextub $2,$0,$2
sq $6,0($4)
sq $2,16($4)
lq $2,16($5)
pextlb $6,$0,$2
pextub $2,$0,$2
sq $6,32($4)
sq $2,48($4)
# etc. for the remaining groups
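The widening loop can be modeled in portable C as well. The sketch below assumes the halfword-width interleave that the function name suggests: the second source operand supplies the low 16 bits of each 32-bit result and the first supplies the high 16 bits, so interleaving with $0 gives zero extension. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a 16-bit interleave: rt fills the low half of
   each 32-bit result, rs the high half. first_lane is 0 for the
   "lower" form and 4 for the "upper" form. */
static void interleave_u16_model(uint32_t out[4], const uint16_t rs[8],
                                 const uint16_t rt[8], size_t first_lane)
{
    assert(first_lane == 0 || first_lane == 4);
    for (size_t i = 0; i < 4; i++)
        out[i] = ((uint32_t)rs[first_lane + i] << 16) | rt[first_lane + i];
}

/* One group of the loop: 8 halfwords in, 8 zero-extended words out.
   Passing an all-zero rs plays the role of register $0. */
static void widen_u16x8_model(uint32_t out[8], const uint16_t in[8])
{
    static const uint16_t zero[8] = {0};
    interleave_u16_model(out, zero, in, 0);      /* first 16-byte store  */
    interleave_u16_model(out + 4, zero, in, 4);  /* second 16-byte store */
}
```

Each 16-byte load therefore produces 32 bytes of output, which matches the two sq instructions per lq in the listing.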
When calling a function with 5 or more parameters using the old ABI (eabi64), the parameters are passed as:
- r4 = $a0
- r5 = $a1
- r6 = $a2
- r7 = $a3
- r8 = $t0 <- fifth parameter
Now with the new ABI n32, the register numbers are the same, but they are named differently:
- r4 = $a0
- r5 = $a1
- r6 = $a2
- r7 = $a3
- r8 = $a4 <- fifth parameter
I'm not sure what register $t0 translates to when using the new compiler, but it isn't the correct one (r8). This is a problem for all C code calling assembly functions with 5 or more parameters.
One such example was the InvokeUserModeCallback function, used by the kernel patch for the SetAlarm function. It's fixed here. I'm sure there are more issues like this in multiple projects.
Why do you think it is the wrong register? From this file, $t0 was always register 8.
From what I remember, they renamed some registers, such as $t0-$t3 to $a4-$a7. Old code that used those register names would not compile with the new assembler, but code that used the register numbers directly is unaffected. This does not affect the assembly files assembled with GCC itself, with as_reg_compat.h included.
I found and solved the issue by using the debugger from PCSX2. I searched in binutils/gas and found the following:
#define N32N64_SYMBOLIC_REGISTER_NAMES \
{"$a4", RTYPE_GP | 8}, \
{"$a5", RTYPE_GP | 9}, \
{"$a6", RTYPE_GP | 10}, \
{"$a7", RTYPE_GP | 11}, \
{"$ta0", RTYPE_GP | 8}, /* alias for $a4 */ \
{"$ta1", RTYPE_GP | 9}, /* alias for $a5 */ \
{"$ta2", RTYPE_GP | 10}, /* alias for $a6 */ \
{"$ta3", RTYPE_GP | 11}, /* alias for $a7 */ \
{"$t0", RTYPE_GP | 12}, \
{"$t1", RTYPE_GP | 13}, \
{"$t2", RTYPE_GP | 14}, \
{"$t3", RTYPE_GP | 15}
#define O32_SYMBOLIC_REGISTER_NAMES \
{"$t0", RTYPE_GP | 8}, \
{"$t1", RTYPE_GP | 9}, \
{"$t2", RTYPE_GP | 10}, \
{"$t3", RTYPE_GP | 11}, \
{"$t4", RTYPE_GP | 12}, \
{"$t5", RTYPE_GP | 13}, \
{"$t6", RTYPE_GP | 14}, \
{"$t7", RTYPE_GP | 15}, \
{"$ta0", RTYPE_GP | 12}, /* alias for $t4 */ \
{"$ta1", RTYPE_GP | 13}, /* alias for $t5 */ \
{"$ta2", RTYPE_GP | 14}, /* alias for $t6 */ \
{"$ta3", RTYPE_GP | 15} /* alias for $t7 */
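To make the difference between the two tables concrete, here is a small lookup sketch that copies the name-to-number pairs from the two macros above into plain C arrays. The struct and function names are invented for this example. It only illustrates the mapping and is not code from binutils.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct reg_name { const char *name; int num; };

/* Pairs copied from O32_SYMBOLIC_REGISTER_NAMES above. */
static const struct reg_name o32_names[] = {
    {"$t0", 8},   {"$t1", 9},   {"$t2", 10},  {"$t3", 11},
    {"$t4", 12},  {"$t5", 13},  {"$t6", 14},  {"$t7", 15},
    {"$ta0", 12}, {"$ta1", 13}, {"$ta2", 14}, {"$ta3", 15},
};

/* Pairs copied from N32N64_SYMBOLIC_REGISTER_NAMES above. */
static const struct reg_name n32_names[] = {
    {"$a4", 8},   {"$a5", 9},   {"$a6", 10},  {"$a7", 11},
    {"$ta0", 8},  {"$ta1", 9},  {"$ta2", 10}, {"$ta3", 11},
    {"$t0", 12},  {"$t1", 13},  {"$t2", 14},  {"$t3", 15},
};

/* Returns the register number for a name, or -1 if the table lacks it. */
static int lookup_reg(const struct reg_name *tab, size_t n, const char *name)
{
    assert(tab != NULL && name != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return tab[i].num;
    return -1;
}

static int reg_o32(const char *name)
{
    return lookup_reg(o32_names, sizeof o32_names / sizeof o32_names[0], name);
}

static int reg_n32(const char *name)
{
    return lookup_reg(n32_names, sizeof n32_names / sizeof n32_names[0], name);
}
```

With these tables, reg_o32("$t0") gives 8 while reg_n32("$t0") gives 12, so assembly that reads its fifth argument from $t0 picks up r12 instead of r8 under the n32 names. Writing the register as $8, or as $a4 under n32, avoids the ambiguity.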
mips64r5900el-ps2-elf-gcc -v
Using built-in specs.
COLLECT_GCC=mips64r5900el-ps2-elf-gcc
COLLECT_LTO_WRAPPER=/home/rgaiser/dev/ps2max-ee-toolchain-gcc9/ps2dev/ee/libexec/gcc/mips64r5900el-ps2-elf/11.0.0/lto-wrapper
Target: mips64r5900el-ps2-elf
Configured with: ../../ee/gcc/configure --prefix=/home/rgaiser/dev/ps2max-ee-toolchain-gcc9/ps2dev/ee --target=mips64r5900el-ps2-elf --enable-languages=c,c++ --with-float=hard --with-newlib --disable-nls --disable-shared --disable-libssp --disable-libmudflap --disable-threads --disable-libgomp --disable-libquadmath --disable-target-libiberty --disable-target-zlib --without-ppl --without-cloog --with-headers=/home/rgaiser/dev/ps2max-ee-toolchain-gcc9/ps2dev/ee/mips64r5900el-ps2-elf/include --disable-libada --disable-libatomic --disable-multilib --enable-cxx-flags=-G0
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 11.0.0 20200430 (experimental) (GCC)
Have you compared the assembled code with the original? The compiler may still rearrange the code (there's no .noreorder directive and it's declared as just "asm" instead of "asm volatile"). I think it could be the compiler reusing some registers, since this asm block is missing the part about clobbering used registers.
I tested with adding volatile, but it doesn't make a difference. I briefly tried to fill in the clobber list, but I couldn't figure out the right syntax for the vu0 registers.
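For reference, the general layout of a GCC extended asm statement with a clobber list looks roughly like the sketch below. It uses an empty template and only the generic "memory" clobber so that it builds on any GCC or Clang target. It does not attempt to name any VU0 registers, since that syntax is left unresolved here, and the function name is made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: passes a value through an asm statement that
   emits no instructions, to show where each operand list goes. */
static unsigned pass_through_asm(const unsigned *src)
{
    assert(src != NULL);
    unsigned value = *src;
    __asm__ volatile (
        ""                /* template: empty, nothing is emitted       */
        : "+r" (value)    /* outputs: value is both read and written   */
        :                 /* inputs: none                              */
        : "memory"        /* clobbers: memory may be read or written   */
    );
    return value;
}
```

The clobber list is the fourth colon-separated section. Any register that the template modifies, and that is not already listed as an output, would be named there so the compiler does not keep live values in it across the statement.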
{
u32 status, mask;
__asm__ volatile (
".set\tpush\n\t" \
".set\tnoreorder\n\t" \
...
"li\t%1, 0xffffffe7\n\t" \
...
".set\tpop\n\t" : "=r" (status), "=r" (mask));
void VU0MixVec(VU_VECTOR *a,VU_VECTOR *b,float t,VU_VECTOR *res)
{
float fVar1;
float y1;
float z1;
float w1;
float y2;
float z2;
float w2;
y1 = a->y;
z1 = a->z;
w1 = a->w;
y2 = b->y;
z2 = b->z;
w2 = b->w;
fVar1 = 1.00000000 - t;
res->x = a->x * t + b->x * fVar1;
res->y = y1 * t + y2 * fVar1;
res->z = z1 * t + z2 * fVar1;
res->w = w1 * t + w2 * fVar1;
return;
}
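A plain C reference like the following can be used to compare against the output of the inline-assembly version. It is a sketch that assumes VU_VECTOR is four consecutive floats named x, y, z and w. The typedef and the function names here are stand-ins for illustration, not the SDK definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout; the real VU_VECTOR definition may differ. */
typedef struct { float x, y, z, w; } VU_VECTOR;

/* Same formula as the listing above: a weighted by t, b by (1 - t). */
static float mix_component(float a, float b, float t)
{
    return a * t + b * (1.0f - t);
}

/* Hypothetical reference implementation for comparing results. */
static void mix_vec_ref(const VU_VECTOR *a, const VU_VECTOR *b, float t,
                        VU_VECTOR *res)
{
    assert(a != NULL && b != NULL && res != NULL);
    VU_VECTOR out;  /* temporary, so res may alias a or b */
    out.x = mix_component(a->x, b->x, t);
    out.y = mix_component(a->y, b->y, t);
    out.z = mix_component(a->z, b->z, t);
    out.w = mix_component(a->w, b->w, t);
    *res = out;
}
```

Note that with this formula t = 1 returns a and t = 0 returns b, so a quick test with those two values shows immediately whether the assembly version has its operands swapped or corrupted.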