WIP updated homebrew toolchain for PS2

Hello,
First of all thanks uyjulian for your effort, this is amazing!

I have noticed that floats are not working correctly with the new toolchain:

Code:
printf("Hello, world!\n");
float hello = 1.0f;
printf("Hello, world! %f\n", hello);

This example works fine with the old toolchain but crashes with the new one. I don't know whether I did something wrong during the installation.
Here is the "hello world repo"

Thanks
 
It may be a standard library issue, not a compiler issue. The issue may have manifested itself as incorrect output in GCC 6.x and as an error in GCC 7.x.

Edit: It IS a compiler issue. Floating point never really worked correctly, even on GCC 6.x.
The code below will error in libgcc/fp-bit.c:

Code:
#include <stdio.h>

int main(void)
{
    /* volatile stops the compiler from folding the value at compile time */
    volatile float hello;
    hello = 1.0f;
    printf("Hello, world! %f\n", hello);
    return 0;
}
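One way to narrow this down is to look at the float without going through the %f path at all. Passing a float to printf promotes it to double, so the %f call involves a float-to-double conversion on top of the formatting itself. The sketch below prints the raw bit pattern using integer formatting only. float_bits and print_float_bits are hypothetical helper names (not part of ps2sdk), and the code assumes a 32-bit IEEE 754 single-precision layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the raw bit pattern of a float.
 * memcpy avoids any float-to-double conversion and any aliasing issue. */
uint32_t float_bits(float value)
{
    uint32_t bits;
    assert(sizeof bits == sizeof value); /* assumption: float is 32 bits */
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Hypothetical helper: print a float using integer formatting only,
 * so the %f code path is not involved. */
void print_float_bits(const char *label, float value)
{
    printf("%s = 0x%08lx\n", label, (unsigned long)float_bits(value));
}
```

If print_float_bits("hello", 1.0f) shows 0x3f800000 while the %f version still fails, that would point at the conversion or formatting path rather than at single-precision values themselves.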
 
I'll update to GCC 9.2, re-add the DVP patches, and remove the PS2DEV/PS2SDK->PS2DEVUJ/PS2SDKUJ environment variable renaming soon.

I will rebase my changes on top of the newlib-libc changes at a later time.
 
Thank you for keeping this alive. In my toolchain folder I have created 3 separate folders: ee, iop and dvp. The reason is that this makes it possible to have a different binutils and GCC version per CPU type.

I know it's very hard to get the EE compiler working because of how different the CPU is from a standard MIPS CPU. But the IOP is a totally different story. It's a standard MIPS R3000, so in theory no changes should be needed to compile source files into object files. The only special part is the linking into ".irx" files. I've recently tried compiling ps2sdk with an old EE toolchain and a new (v7, I think?) IOP toolchain. With only minor patches the SDK compiles, but it doesn't run OPL or some other tests I tried :-(.

So my questions are:
1 - What is the status of the iop toolchain? And if there are problems, do we know what's causing them?
2 - What if we complete the iop toolchain and try to upstream that to the gcc/binutils folks? That would already be a big win, right?
3 - What if we "drop" DVP support for newer binutils versions, and just leave it at an old binutils version?

All in all I think separating the 3 toolchains makes a lot of sense, don't you think?
 
"1 - What is the status of the iop toolchain? And if there are problems, do we know what's causing them?"
The IOP toolchain possibly broke when I updated from binutils 2.29 to 2.31, due to large changes in the ELF code.

"2 - What if we complete the iop toolchain and try to upstream that to the gcc/binutils folks? That would already be a big win, right?"
The current method used in ps2sdk is a big hack job.
I would like a way to insert the iopmod section using external tools or linker scripts, so that it doesn't require any binutils modifications.

"3 - What if we "drop" DVP support for newer binutils versions, and just leave it at an old binutils version?
All in all I think separating the 3 toolchains makes a lot of sense, don't you think?"
I might do that.

---

I updated binutils to 2.33.1 and GCC to 9.2.0.
 
I'm currently working on GCC 5.5.0 and 6.5.0 branches, to which the MMI patches should still apply without major changes.

I'm aware that the work on 9.2.0 is currently broken.
 
I opened up a pull request for ps2sdk, incorporating most of the changes in my fork of ps2sdk. I also added support for the new toolchain in bin2s.

I think the only things broken currently are math library functions, gp register handling, and libmpeg.

Little tidbit with GAS:
The symbol ".gasversion." was added around 2011 (look in the gas history for more information).
Testing whether it is defined can be used to determine if the toolchain is new or old.
Example:
Code:
.ifdef .gasversion.
# New toolchain
                        sq          $a4,   0x0C0($s0)
                        sq          $a5,   0x0D0($s0)
                        sq          $a6,   0x0E0($s0)
                        sq          $a7,   0x0F0($s0)
.else
# Old toolchain
                        sq          $t4,   0x0C0($s0)
                        sq          $t5,   0x0D0($s0)
                        sq          $t6,   0x0E0($s0)
                        sq          $t7,   0x0F0($s0)
.endif
The check is done by the assembler itself, so this does not run through the C preprocessor.
 
I do plan to upgrade to GCC 10, which should release in the next month or so.
GCC 10 has a static analyzer pass, which should surface more potential bugs.

Even though most programs will not work correctly when compiled with this toolchain, it is still useful for finding potential bugs and undefined behavior.
 
With recent MMI tests I also decided to take a look at the assembler output of uyjworking_gcc5 and uyjworking_gcc6. It seems there's an issue, but I'm not sure.

What I'm trying to do looks like:
Code:
for(int i=0; i<8; i++)
  dest_vector_of_4x_32bit_ints += source_vector_of_4x_32bit_ints[i];

With the old gcc v3.2.3 the inner part of the loop looks like this:
Code:
lq $17,0($24)
paddsw $15, $16, $17

With the new gcc v5 and v6 the inner part of the loop looks like this:
Code:
ld $8,16($sp)
ld $9,24($sp)
paddsw $2, $6, $8

Instead of loading the 128-bit data with a single 128-bit "LQ" instruction, it seems to load the data with two 64-bit "LD" instructions. This wouldn't be a big problem if both halves ended up in the right place, but they are loaded into 2 different registers: $8 and $9 in this example. Only the first register ($8) is used.

So it looks to me like 128-bit loads and stores are broken. Can anyone confirm?

EDIT: Tested with gcc 9.2 also. The results are the same as gcc 5 and 6.
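To confirm whether the upper 64 bits are really being dropped, it can help to compare what the hardware produces against a plain C reference. The sketch below models PADDSW as four independent 32-bit signed saturating adds (an assumption about the instruction's semantics that should be checked against the EE documentation). paddsw_ref and sat_add32 are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4 /* a 128-bit register viewed as four 32-bit words */

/* Signed 32-bit add that clamps to INT32_MIN/INT32_MAX instead of wrapping. */
static int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX)
        return INT32_MAX;
    if (sum < INT32_MIN)
        return INT32_MIN;
    return (int32_t)sum;
}

/* Hypothetical reference model: lane-wise saturating add of two
 * 4 x 32-bit vectors, written in portable C with no MMI involved. */
void paddsw_ref(int32_t dest[LANES], const int32_t a[LANES],
                const int32_t b[LANES])
{
    assert(dest != 0 && a != 0 && b != 0);
    for (int i = 0; i < LANES; i++)
        dest[i] = sat_add32(a[i], b[i]);
}
```

If a build only loads the low 64 bits, lanes 0 and 1 of the real result could still match this reference while lanes 2 and 3 differ, which would be consistent with the LD/LD pattern above.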
 
Here's a simpler test:
Code:
typedef unsigned int uint128_t __attribute__((mode(TI)));

void test_lq_sq(uint128_t *input1, uint128_t *input2)
{
    // *input1 += *input2;

    asm("paddsb %[rv], %[vec1], %[vec2] \n"
        : [rv] "=&r"(*input1)
        : [vec1] "r"(*input1), [vec2] "r"(*input2));
}

This should compile into:
Code:
lq $2,0($5)
lq $3,0($4)
paddsb $5, $3, $2
sq $5,0($4)
*tested on gcc 3.2.3

But compiles into:
Code:
ld $8,0($5)
ld $9,8($5)
ld $6,0($4)
ld $7,8($4)
paddsb $2, $6, $8
sd $2,0($4)
sd $3,8($4)
*tested on gcc 5, 6, 9 and an unpatched 4.9

Since an unpatched gcc 4.9 shows the same result, I guess this issue is in upstream gcc, and has been in there ever since Jürgen Urban added R5900 support.
 
No GCC version newer than our 3.2.x has complete support for MMI. Mega man (Jurgen) wrote that support for the R5900 was basic, which meant support for standard MIPS features only.

Unlike 3.2.x, the official support for the R5900 was made to assume 64-bit GPRs, not 128-bit. In my patch for 5.3.0, I still kept this, but implemented the MMI for vector operations. Quick memory copies involved the vector modes, which was already used by GCC (for Loongson MIPS, if I remember right). I never completed implementing support for MMI, so only memory copy operations involved LQ/SQ in a sane way.

Since there is no support for the 128-bit registers, I don't think GCC will correctly interface with inline assembly involving MMI. Even if it would, it may not preserve/restore them correctly (with regards to the upper 64 bits).

I guess GCC might have assumed it may truncate the upper 64 bits of the register, since no 64-bit MIPS can process a TI-mode value in a hardware register.
 
But what does "complete support for MMI" mean? Older versions of gcc do not have auto-vectorization, so they will never generate MMI code. Using the MMI will require (inline) assembly, or auto-vectorization from newer compiler versions. Other than LQ and SQ, what more MMI instructions does gcc 3.2.3 create?

"Quick memory copies involved the vector modes, which was already used by GCC (for Loongson MIPS, if I remember right). I never completed implementing support for MMI, so only memory copy operations involved LQ/SQ in a sane way."
What are these quick memory copies? Can you give a 'C' example that would produce LQ/SQ code?
 
I'm not sure if GCC builtins/intrinsics are supported for MMI instructions.

Probably TI mode implementation isn't finished, so that is why you get the double ld instruction.
 
"But what does "complete support for MMI" mean? Older versions of gcc do not have auto-vectorization, so they will never generate MMI code."

Surprisingly, it does. Ours was a toolchain modified for the R5900. If you look at the R5900.md file, it has the MMI defined there.
Under the right conditions, it will emit LQ/SQ and other MMI instructions.

The modern, standard GCC is still missing a great deal of the functionality needed to generate efficient code for the R5900, in comparison to our aging toolchain.

"Other than LQ and SQ, what more MMI instructions does gcc 3.2.3 create?"

I forgot, but it is perhaps at least on par with the Sony Linux GCC compiler.

I've seen it generate MMI before, but it isn't too common. I've seen that in OPL, but I forgot which function that was (and hence what conditions entail the emission of MMI).

"Quick memory copies involved the vector modes, which was already used by GCC (for Loongson MIPS, if I remember right). I never completed implementing support for MMI, so only memory copy operations involved LQ/SQ in a sane way."
What are these quick memory copies? Can you give a 'C' example that would produce LQ/SQ code?

Later versions of GCC have the ability to inline memcpy() and loops that copy data. This inlining is designed to use architecture-specific capabilities for copying memory.
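As an illustration of the kind of copy meant here, a fixed-size, 16-byte-aligned copy like the one below is a typical candidate for inlining. Whether a particular R5900 build actually turns it into LQ/SQ is not something this sketch can promise; inspecting the assembler output (for example with -S) is the way to find out. quad_t, copy_quad and copy_quads are hypothetical names, and the aligned attribute is a GCC extension.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit block, aligned so that a quadword load/store
 * would be legal on a target that has one. */
typedef struct {
    uint32_t word[4];
} __attribute__((aligned(16))) quad_t;

/* Copy with a size known at compile time: the compiler may replace the
 * memcpy() call with inline loads and stores. */
void copy_quad(quad_t *dst, const quad_t *src)
{
    assert(dst != 0 && src != 0);
    memcpy(dst, src, sizeof *dst);
}

/* Plain copy loop over aligned blocks: another pattern the compiler
 * may expand using the widest moves the target description offers. */
void copy_quads(quad_t *dst, const quad_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = src[i];
}
```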

We've talked about this before, but a really long time ago.
From your post on psx-scene, in 2017:
Maximus32 said:
That's a nice improvement!

C code from:
Code:
int ivec1[VEC_SIZE];
int ivec2[VEC_SIZE];
static inline void _ivec_add(int *v0, int *v1, int count)
{
   while (count--)
     (*v0++) += (*v1++);
}
void ivec_add(){_ivec_add(ivec1, ivec2, VEC_SIZE);}

To:
Code:
typedef int v4si __attribute__ ((vector_size (16)));
v4si v4sivec1[VEC_SIZE];
v4si v4sivec2[VEC_SIZE];
static inline void _v4sivec_add(v4si *v0, v4si *v1, int count)
{
   while (count--)
     (*v0++) += (*v1++);
}
void v4sivec_add(){_v4sivec_add(v4sivec1, v4sivec2, VEC_SIZE);}

Assembly code from (inner loop only):
Code:
.L2:
   addiu   $3,$3,4
   addiu   $2,$2,4
   lw   $4,-4($3)
   lw   $5,-4($2)
   addu   $4,$4,$5
   bne   $2,$6,.L2
   sw   $4,-4($3)

To (inner loop only):
Code:
.L7:
   addiu   $3,$3,16
   addiu   $2,$2,16
   lq   $4,-16($3)
   lq   $6,-16($2)
   paddw   $4,$4,$6
   bne   $2,$5,.L7
   sq   $4,-16($3)

That is the exact same number of instructions, but doing 4x the number of calculations!

From what you wrote, perhaps it never worked for normal, non-vector expressions, because autovectorization does not work for the R5900.

"I'm not sure if GCC builtins/intrinsics are supported for MMI instructions.

Probably TI mode implementation isn't finished, so that is why you get the double ld instruction."

Since our GPRs are 64-bit to GCC, I don't think you can fit a TI-mode (128-bit) value in. It may have been legal for our GCC 3.2.x, but that GCC was just configured differently and had a lot more invasive rework done to it.

I forget whether I ever mentioned it anywhere, but the GCC team recommended that I keep the R5900 defined this way. We do not have some basic (and important) arithmetic operations for processing 128-bit data, such as multiplication, division, addition etc. The only thing one could possibly do with TI-mode values is copy data (using MMI). So it did not make much sense to implement R5900 support as an uncommon, 128-bit target.

If one implemented the MMI as vector modes, then GCC should also correctly preserve the 128-bit values between function calls. It has this concept of treating the vector mode as a different operating mode from integers, which would allow GCC to treat it as a 64-bit target with special 128-bit vector operations (which sounds like the R5900).
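A small check for the call-preservation point could look like the sketch below: a vector value is kept live across a call that is not inlined, and all four lanes are inspected afterwards. It uses the same GCC vector extension as the v4si typedef quoted above. add_v4si, add_twice and lanes_equal are hypothetical names, and the noinline attribute is there only to force real calls.

```c
#include <assert.h>

typedef int v4si __attribute__((vector_size(16)));

/* Kept out of line so the caller has to pass and receive 128-bit values. */
__attribute__((noinline)) v4si add_v4si(v4si a, v4si b)
{
    return a + b;
}

/* 'step' has to survive the first call to be used in the second one,
 * so the compiler must save and restore all 128 bits of it. */
__attribute__((noinline)) v4si add_twice(v4si acc, v4si step)
{
    v4si once = add_v4si(acc, step);
    return add_v4si(once, step);
}

/* Compare every lane, including the two that live in the upper 64 bits. */
int lanes_equal(v4si v, int e0, int e1, int e2, int e3)
{
    return v[0] == e0 && v[1] == e1 && v[2] == e2 && v[3] == e3;
}

/* Hypothetical self-check that can be called from any test program. */
void check_vector_preserved(void)
{
    v4si acc = {1, 2, 3, 4};
    v4si step = {10, 20, 30, 40};
    assert(lanes_equal(add_twice(acc, step), 21, 42, 63, 84));
}
```

If lanes 2 and 3 came back wrong on a given build while lanes 0 and 1 were right, that would suggest the upper 64 bits are not being preserved across the call.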
 