PS3 Fun with Syscon and Serial commands

marciolsf

In my efforts to understand what is wrong with my CECHA01 PS3, and move past the YLOD/replace-caps dance, I picked up a USB-to-serial adapter, soldered RX and TX to the UART points on my board, and started pecking at it. Thanks to @M4j0r's Python script, I was able to submit a few commands to syscon and confirm that it's at least alive and responding.

For example, by just plugging it all in and turning the back switch on, I get the following (currently connected via Putty to serial)

R:3A:OK 00000000

So that's good, sounds like there's somebody home! It's a start anyway.

Next up, I get started with python. If you want to follow at home, you'll need the following
  • python 2.7 (anything newer has syntax changes and the script doesn't run properly)
  • pycryptodome
  • pyserial
  • not be afraid of the command line
  • This script -- https://pastebin.com/4ymiFQbi. I saved it as ps3.py

According to M4j0r, the syntax is "python script.py <your serial port number> <CXR or CXRF>". The choice between CXR and CXRF depends on your syscon version. According to https://www.psdevwiki.com/ps3/Syscon_Hardware#Syscon_UART, COK-001 boards use CXR, so that's what I typed (make sure PuTTY is closed now, or the COM port will not be available. Your actual port number might be different from mine)

python ps3.py com3 CXR

Which gets me here

PS C:\Users\iamma\OneDrive\CFW\tools> python .\ps3.py com3 CXR
>

Now, this is not a walk in the park -- Syscon has a list of internal and external commands, all case-sensitive. The output is also not the most friendly. For example, here's the output of the VER command, which returns the syscon version (I think? -- not sure on that one)

> VER
00000000 0B8E

This looks like a pair of octets... the first half is 0, but the second, 0B8E, is the softID that maps to versions 1.30 - 4.84, as listed on this page -- https://www.psdevwiki.com/ps3/System_Controller_Firmware#Syscon_update_packages. My PS3 was on 4.84 when it died, so that checks out.

Well, the main thing we're all after is really "why won't it boot??" There's an external command for that

> ERRLOG GET 00
00000000 A0403034 FFFFFFFF

The first parameter is the command, GET is the subcommand (there are a couple), and then comes the index for the error log. On my PS3, I have 20 total (0-19), and they all return the same error. I'm assuming that the 00000000 and FFFFFFFF indicate either a memory range, or they're marking the START and END of a list, and the only error on index 00 is A0403034.

Unfortunately, this is where it ends. You have to authenticate to get access to more commands, but I can't figure out how yet. Apparently, I have to open the syscon with the internal command scopen, but when I do that, I get an error that indicates "unknown command" (see here for the list of messages for each error -- https://www.psdevwiki.com/ps3/Error_Codes#SYSCON_Error_Codes)

> scopen
F0000003

If I try to start up the ps3, I get an error that indicates "Not allowed/Not authorized"

> BOOT MODE
F0000005

Others
> PORTSTAT
F0000005
> SPU INFO
F0000005
> CID GET
F0000005
>

Does anyone out there have more details on how to get auth working properly, or the correct way to enable internal commands? I've tried a few things I've seen in other threads, but nothing has worked so far.
 
For example, by just plugging it all in and turning the back switch on, I get the following (currently connected via Putty to serial)

R:3A:OK 00000000
This means syscon booted successfully.

According to M4j0r, the syntax is "python script.py <your serial port number> <CXR or CXRF>". The choice between CXR and CXRF depends on your syscon version.
CXR is the model found on retail units, CXRF is the model used in prototypes. It's written on the IC itself.

Now, this is not a walk in the park -- Syscon has a list of internal and external commands, all case-sensitive. The output is also not the most friendly.
The script is meant as a PoC, mostly for devs.

> VER
00000000 0B8E

This looks like a pair of octets... the first half is 0, but the second, 0B8E, is the softID that maps to versions 1.30 - 4.84.
In external mode, the first part is always the status. And the SoftID is just the build ID of the firmware (https://pastebin.com/ypsayxLY).
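As a rough sketch of the "first part is always the status" rule, an external-mode reply could be split like this. The helper name and the small lookup table are made up for illustration and are not part of the actual script; the two F-codes and their meanings are the ones reported earlier in this thread, and treating 00000000 as success is an assumption based on the "R:3A:OK 00000000" boot message.

```python
# Hypothetical helper, not taken from M4j0r's script.
# Only the status values mentioned in this thread are listed.
STATUS_TEXT = {
    0x00000000: "OK",                            # assumed from "R:3A:OK 00000000"
    0xF0000003: "unknown command",
    0xF0000005: "not allowed / not authorized",
}

def parse_external_reply(line):
    """Split a reply such as '00000000 0B8E' into (status, text, payload)."""
    words = line.split()
    if not words:
        raise ValueError("empty reply")
    status = int(words[0], 16)                   # first word is the status
    text = STATUS_TEXT.get(status, "unlisted status")
    return status, text, words[1:]               # the rest is command-specific

print(parse_external_reply("00000000 0B8E"))
print(parse_external_reply("F0000005"))
```

With this split, the VER reply above yields status 0 and a one-word payload (the SoftID), while a bare F0000005 yields an empty payload.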

> ERRLOG GET 00
00000000 A0403034 FFFFFFFF

The first parameter is the command, GET is the subcommand (there are a couple), and then comes the index for the error log. On my PS3, I have 20 total (0-19), and they all return the same error. I'm assuming that the 00000000 and FFFFFFFF indicate either a memory range, or they're marking the START and END of a list, and the only error on index 00 is A0403034.
Yes, it's a list which wraps around. The error code means that error 3034 (CELL BE) occurred at step 40 of the power-on sequence, that's it. Some time ago a list "explaining" these errors leaked. The "explanation" just tells you which IC is causing the fault.
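The step/error reading of A0403034 can be sketched as a simple string split. This assumes every code follows the same 2/2/4 digit layout as that one example; the function name is hypothetical, and the meaning of the leading "A0" is not explained in this thread, so it is returned untouched.

```python
def decode_errlog_code(code):
    """Split e.g. 'A0403034' into (prefix, step, error) as hex strings.

    Assumed layout: 2 digits prefix, 2 digits power-on step, 4 digits error.
    """
    code = code.strip().upper()
    if code.startswith("0X"):
        code = code[2:]
    if len(code) != 8:
        raise ValueError("expected 8 hex digits, got %r" % code)
    int(code, 16)  # raises ValueError if the text is not hexadecimal
    return code[0:2], code[2:4], code[4:8]

prefix, step, error = decode_errlog_code("A0403034")
print("step %s, error %s" % (step, error))
```

The split only separates the fields. Which IC a given error number points to still has to come from the leaked list mentioned above.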

Unfortunately, this is where it ends. You have to authenticate to get access to more commands, but I can't figure out how yet. Apparently, I have to open the syscon with the internal command scopen, but when I do that, I get an error that indicates "unknown command"
You just have to type "auth" to activate the privileged commands, but internal mode is something different. It's easy to spot if you look at the code.
In external mode you use AUTH1/AUTH2 to get privileged, in internal mode scopen, but the script does everything for you if you just type "auth".

For the internal mode you need to execute "EEP SET 3961 01 00" in external mode, ground the diag pin of the syscon and reset.
If you want to power the console on, you also need to correct the eeprom checksum at 0x39FE as described around here: https://www.psx-place.com/threads/s...o-what-does-it-mean.26148/page-12#post-236929 .
 
You just have to type "auth" to activate the privileged commands, but internal mode is something different. It's easy to spot if you look at the code.
In external mode you use AUTH1/AUTH2 to get privileged, in internal mode scopen, but the script does everything for you if you just type "auth".


Thanks M4j0r! I did see the auth command (my specialty is SQL, but I can poke my way around python), and that looked like the right thing to use, so thanks for confirming that. In playing with auth, I learned that if I run the script, run a bunch of commands and then auth, I'll get the error "Auth1 response invalid". However, if I perform a cold boot, run the script and then auth, it succeeds. I'm guessing it sets internal variables/keys to a known state?

The output is also not the most friendly.

The script is meant as a PoC, mostly for devs.

I agree... I didn't mean that as a criticism, more as a warning for people that have never done this level of analysis before (like me :) ). There won't be anything like "you're missing a bridge wire".

The error code means that error 3034 (CELL BE) occurred at step 40 of the power-on sequence, that's it. Some time ago a list "explaining" these errors leaked. The "explanation" just tells you which IC is causing the fault.

That's very interesting... I'll get looking for that list then, at least to see what else it shows.

For the internal mode you need to execute "EEP SET 3961 01 00" in external mode, ground the diag pin of the syscon and reset.

Just for kicks, I tried this, but only the SET portion, without ground. Once I reset the system, it booted up with 3 beeps and a blinking red light, and would not respond to the ON switch on the front of the console. Lucky for me, I'd saved the original value of 3961, so I set that back and was able to get back to my previous state... whew. Gotta get that ground wire in place. I guess this little experiment also shows that the value doesn't get set back post reboots, I'll have to keep track of what I changed.
 
With my elevated rights, I was able to set 3961 to 00, and then I grounded the diag pin on syscon and restarted the system. Unfortunately, I still get the 3 beeps (maybe that's expected?) and I'm still unable to enter internal commands.

At one point, after setting the values to 00, I booted the console and instead got no lights or beeps whatsoever, and syscon was still responsive... but no internal commands either. I'm guessing that's the state I want to reach, I just need to figure out and document what I did to get there.
 
I'll get the error "Auth1 response invalid". However, if I perform a cold boot, run the script and then auth, it succeeds. I'm guessing it sets internal variables/keys to a known state?
The auth routine doesn't clear the uart receive buffer prior to executing the auth commands which can lead to errors.
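One possible workaround, sketched under assumptions: discard whatever is still sitting in the receive buffer right before starting the auth exchange. The wrapper and the `send_auth` callable are placeholders for whatever the script does internally; only `reset_input_buffer()` (pyserial 3.x) and the older `flushInput()` are real pyserial methods. Whether this is enough to avoid "Auth1 response invalid" has not been verified here.

```python
def auth_with_clean_buffer(port, send_auth):
    """Discard unread bytes on `port`, then run the `send_auth` callable."""
    # pyserial 3.x calls this reset_input_buffer(); older releases used flushInput().
    if hasattr(port, "reset_input_buffer"):
        port.reset_input_buffer()
    else:
        port.flushInput()
    return send_auth()


class FakePort(object):
    """Stand-in for serial.Serial so the sketch runs without hardware."""
    def __init__(self):
        self.pending = b"stale bytes from earlier commands"

    def reset_input_buffer(self):
        self.pending = b""


port = FakePort()
print(auth_with_clean_buffer(port, lambda: "auth sent"))
```

With real hardware, `port` would be the open `serial.Serial` object and `send_auth` the routine that performs the AUTH1/AUTH2 exchange.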

Just for kicks, I tried this, but only the SET portion, without ground. Once I reset the system, it booted up with 3 beeps and a blinking red light, and would not respond to the ON switch on the front of the console. Lucky for me, I'd saved the original value of 3961, so I set that back and was able to get back to my previous state... whew. Gotta get that ground wire in place. I guess this little experiment also shows that the value doesn't get set back post reboots, I'll have to keep track of what I changed.
If you just change the value, the syscon stays in external mode. The flag is non-volatile.


If you want to use the internal commands, you have to connect with the option CXRF after setting the EEPROM flag and grounding the diag pin. External mode works at 57600 bps and internal mode at 115200 bps.
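The option-to-speed pairing described here could be captured as below. This is only a sketch: the helper names are invented, `open_port` leaves every other serial setting at pyserial's defaults (an assumption, not something confirmed in this thread), and pyserial is imported lazily so the mapping itself can be checked without the adapter plugged in.

```python
# Script option -> baud rate, as stated in the post above.
BAUD_BY_OPTION = {
    "CXR": 57600,    # external mode
    "CXRF": 115200,  # internal mode
}

def baud_for(option):
    """Return the baud rate for a script option, rejecting anything else."""
    try:
        return BAUD_BY_OPTION[option.upper()]
    except KeyError:
        raise ValueError("option must be CXR or CXRF, got %r" % option)

def open_port(port_name, option):
    """Open the adapter with pyserial (third-party package, imported lazily)."""
    import serial  # pyserial
    return serial.Serial(port_name, baudrate=baud_for(option), timeout=1)

print("%d %d" % (baud_for("CXR"), baud_for("CXRF")))
```

Rejecting unknown options up front also catches the easy CXR/CRX typo before a port is opened at the wrong speed.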
 
Thanks for your help and patience! I was finally able to get into internal mode (turns out my diag to ground wire wasn't as well connected as I thought).

I ran a lot of commands and got a lot of information to sort through. There are a few things that jump out:


Code:
> eepcsum
Addr:0x000032fe should be 0x52b7
Addr:0x000034fe should be 0x7115
sum:0x0100
Addr:0x000039fe should be 0x0f38
Addr:0x00003dfe should be 0x00ff
Addr:0x00003ffe should be 0x00ff

The only change I made so far was 3961, which (if I'm converting that right) doesn't correspond to any of those addresses.

Code:
> bestat
(Error State) (Unknown Error)

That might correspond with 0xa0403034, which you mentioned previously was a CELL BE error (3034) at power-on step 40

Code:
> rrsxc 00 01
Command: rrsxc 00 01
Answer: rrsxc 00 01
+0       +4       +8       +C
-----------------------------------
I attempted to read from the RSX, like someone else tried over in the Syscon thread, just to see what I got. I jumped around different offsets and lengths, and got different results for a bit, but then I started getting the same results. I guess I need to go read the RSX documentation to see what that means.

Code:
> rbe 00
Command: rbe 00
Answer: rbe 00
Error status : c6000006
*** MMIO Access Error ***
[mullion]$
Another attempt to read from Cell, and once again, obvious errors. Really starting to look like there's some major issues with Cell. I ran across MMIO over in the EEPROM page (I think?), so I need to go look that up again.

Coincidentally, Cell is where I last changed caps, and the signal there isn't quite as clean as the signal I'm getting from the RSX, so maybe it's time to add another tantalum or 2 until I get better signals.

Another user (LSL) and I are also trying to track down some information on power supply into Cell and RSX, so maybe that'll help us identify the root cause for the power failure.

I've been looking high and low for that leaked list of error codes, but so far I've come up empty. sandungas, you seem to know a bit of just about everything around here, would you happen to know where that list is?
 
Code:
> eepcsum
Addr:0x000032fe should be 0x52b7
Addr:0x000034fe should be 0x7115
sum:0x0100
Addr:0x000039fe should be 0x0f38
Addr:0x00003dfe should be 0x00ff
Addr:0x00003ffe should be 0x00ff

The only change I made so far was 3961, which (if I'm converting that right) doesn't correspond to any of those addresses.
Address 0x3961 is in the 0x39FE range, so this checksum needs to be corrected.
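To see which checksum goes with a changed byte, the eepcsum output can be parsed and searched. The sketch below ASSUMES that each listed address is the checksum slot for the addresses just below it, down to the previous slot; that matches the 0x3961 -> 0x39FE statement above, but the exact region boundaries are not given in this thread. All names are illustrative.

```python
import re
from bisect import bisect_left

EEPCSUM_OUTPUT = """\
Addr:0x000032fe should be 0x52b7
Addr:0x000034fe should be 0x7115
sum:0x0100
Addr:0x000039fe should be 0x0f38
Addr:0x00003dfe should be 0x00ff
Addr:0x00003ffe should be 0x00ff
"""

LINE = re.compile(r"Addr:0x([0-9a-fA-F]+) should be 0x([0-9a-fA-F]+)")

def parse_eepcsum(text):
    """Return {checksum_address: reported_value} from eepcsum output."""
    return {int(a, 16): int(v, 16) for a, v in LINE.findall(text)}

def checksum_slot_for(address, slots):
    """First checksum address at or above `address` (ASSUMED to cover it)."""
    ordered = sorted(slots)
    i = bisect_left(ordered, address)
    if i == len(ordered):
        raise ValueError("no checksum slot at or above 0x%X" % address)
    return ordered[i]

slots = parse_eepcsum(EEPCSUM_OUTPUT)
slot = checksum_slot_for(0x3961, slots)
print("0x3961 -> checksum at 0x%04X, reported 0x%04X" % (slot, slots[slot]))
```

The "sum:0x0100" line is ignored by the pattern, since its meaning is not explained here.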

Code:
> bestat
(Error State) (Unknown Error)

That might correspond with 0xa0403034, which you mentioned previously was a CELL BE error (3034) at power-on step 40
Yes, it does.

Code:
> rrsxc 00 01
Command: rrsxc 00 01
Answer: rrsxc 00 01
+0       +4       +8       +C
-----------------------------------
The valid range is 0x200000 - 0x20FFFF.

Code:
> rbe 00
Command: rbe 00
Answer: rbe 00
Error status : c6000006
*** MMIO Access Error ***
[mullion]$
The valid range for rbe is 0x100000 - 0x100020.
Both of these commands only work if the system is running.
You can also use r32 (range 0x300000 - 0x30FFFF) to read from the SB.
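The three ranges above can be kept in a small table and checked before a read command is sent. This is a minimal sketch: the function name is made up, and treating both ends of each range as inclusive is an assumption.

```python
# Valid address ranges as given in the post above (ends assumed inclusive).
VALID_RANGES = {
    "rbe":   (0x100000, 0x100020),  # Cell BE
    "rrsxc": (0x200000, 0x20FFFF),  # RSX
    "r32":   (0x300000, 0x30FFFF),  # SB
}

def address_ok(command, address):
    """True if `address` lies inside the range listed for `command`."""
    low, high = VALID_RANGES[command]
    return low <= address <= high

# The two reads attempted earlier in the thread both used address 00.
for cmd in ("rrsxc", "rbe"):
    print("%s 00 -> in range: %s" % (cmd, address_ok(cmd, 0x00)))
```

An in-range address is still not enough on its own, since rrsxc and rbe only work while the system is running.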


If you run the auth command in internal mode, you can also get the boot log via the syscon uart. It just prints it on bootup.
Furthermore, if you set the 0x48C02 flag to 0x02 it will allow you to get the lv0+ log via the SB uart. You can also activate more lv0ldr debug messages over the syscon uart by setting 0x48C11 to 0x03. (https://www.psdevwiki.com/ps3/SC_EE..._Block_Offset_Mapping_Table_.28NVS_Service.29)
 
While getting some scope reads from IC6103, I think I shorted something, and now I'm getting a much faster boot failure -- seemingly no power to Cell now. I jumped on internal command mode, ran errlog and got the following errors

0xa0203010 -- error 3010 on power up step 20
0xa0091001 -- error 1001 on power up step 9
0xa0081001 -- error 1001 on power up step 8
0xa0093003 -- error 3003 on power up step 9

M4J0r, can I bother you one more time to help me decode these? It sounds pretty obvious the values for boot sequence are serialized, and then I'm guessing the first two digits are the device id (30 for cell) and then 10* identifies something else. I was messing around with pins 19-21 of IC6103 (the PWM), and those are connected to IC6104, IC6105 and IC6106 (all buck switching regulators), so I'm guessing that 10* is going to reference one of them. Those, in turn, go out into the capacitor array + Tokins (or tantalums, in my case), and then out to Cell, so the extended conclusion here is that I damaged either IC6103, or one of the other regulators.
 
Thank you! I've looked all over the place for this (I'll be able to close several browser tabs now). Looks like it's mostly around BE errors still, but POWER FAIL this time, so maybe I blew a fuse somewhere. Yay... time to track that sucker down.
 
Thank you! I've looked all over the place for this (I'll be able to close several browser tabs now). Looks like it's mostly around BE errors still, but POWER FAIL this time, so maybe I blew a fuse somewhere. Yay... time to track that sucker down.

I got these errors on a CECHA fixed with 16 tantalum caps (8 on CELL and 8 on RSX):

  1. 00000000 A0801103 0B6BF0C7
  2. 00000000 A0801103 0B6BEEEC
  3. 00000000 A0801001 0B6335FF
  4. 00000000 A0801002 0B633216
  5. 00000000 A0801002 0B6331C7
  6. 00000000 A0801002 0B6331AD
  7. 00000000 A0801002 0B63315A
  8. 00000000 A0801002 0B63311E
  9. 00000000 A0801002 0B633110
  10. 00000000 A0801002 0B6330BE
  11. 00000000 A0003001 0B492A07
  12. 00000000 A0003001 0B4929FE
  13. 00000000 A0003001 0B4929FE
  14. 00000000 A0003001 0B4929F5
  15. 00000000 A0003001 0B4929F5
  16. 00000000 A0003001 0B4929EF
  17. 00000000 A0003001 0B4929EF
  18. 00000000 A0003001 0B4929C9
  19. 00000000 A0003001 0B4929C9
  20. 00000000 A0003001 0B4929C4
How do I know what they mean?
 
On another CECHA (with a 0.5-second YLOD) I got this:

> VER
Magic
> VER
Magic
> ERRLOG GET 00
Magic
> ERRLOG GET 01
Magic
 

I don't know what happened, but it started to work! On this CECHA00 (0.5-second YLOD) I got 20 errors:
00000000 A0101002 FFFFFFFF
 
To get the script to run, you need to type

Code:
python script.py <port #> CXR

The way the errors work is that every time you boot and get an error, that becomes error 00, and then all the other errors get bumped +1

To get error 00, you type

Code:
ERRLOG GET 00

If you're getting 20 errors, then the last 20 boots got the same errors. The error you got means

101002 - boot step number 10, RSX Vram power fail.
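The "newest error becomes 00, everything else gets bumped +1" behaviour can be modelled with a fixed-length deque. This only illustrates the ordering described in this post; it is not how syscon stores the log, and the 20-slot size is simply what was observed in this thread.

```python
from collections import deque

LOG_SLOTS = 20  # indexes 00-19, as observed in this thread

def record_error(log, code):
    """Put the newest error at index 0; the oldest falls off the far end."""
    log.appendleft(code)
    return log

log = deque(maxlen=LOG_SLOTS)
for _ in range(25):                 # 25 failed boots with the same error
    record_error(log, "A0101002")
record_error(log, "A0003001")       # one boot with a different error

print("%d entries, ERR 00 = %s, ERR 01 = %s" % (len(log), log[0], log[1]))
```

Once the log is full, each new failure pushes the oldest entry out, which is why 20 identical entries only prove that at least the last 20 failed boots hit the same error.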
 
I got these errors on a CECHA fixed with 16 tantalum caps (8 on CELL and 8 on RSX):

  1. 00000000 A0801103 0B6BF0C7
  2. 00000000 A0801103 0B6BEEEC
  3. 00000000 A0801001 0B6335FF
  4. 00000000 A0801002 0B633216
  5. 00000000 A0801002 0B6331C7
  6. 00000000 A0801002 0B6331AD
  7. 00000000 A0801002 0B63315A
  8. 00000000 A0801002 0B63311E
  9. 00000000 A0801002 0B633110
  10. 00000000 A0801002 0B6330BE
  11. 00000000 A0003001 0B492A07
  12. 00000000 A0003001 0B4929FE
  13. 00000000 A0003001 0B4929FE
  14. 00000000 A0003001 0B4929F5
  15. 00000000 A0003001 0B4929F5
  16. 00000000 A0003001 0B4929EF
  17. 00000000 A0003001 0B4929EF
  18. 00000000 A0003001 0B4929C9
  19. 00000000 A0003001 0B4929C9
  20. 00000000 A0003001 0B4929C4
How do I know what they mean?
I have the same errors, did you fix it by changing the capacitors?

===================================
ERR 00: 00000000 A0003001 FFFFFFFF
ERR 01: 00000000 A0003001 FFFFFFFF
ERR 02: 00000000 A0003001 FFFFFFFF
ERR 03: 00000000 A0003001 FFFFFFFF
ERR 04: 00000000 A0801002 FFFFFFFF
ERR 05: 00000000 A0801002 FFFFFFFF
ERR 06: 00000000 A0801002 FFFFFFFF
ERR 07: 00000000 A0801002 FFFFFFFF
ERR 08: 00000000 A0801002 FFFFFFFF
ERR 09: 00000000 A0801002 FFFFFFFF
ERR 10: 00000000 A0101002 267496D7
ERR 11: 00000000 A0801001 1C1D1BEC
ERR 12: 00000000 A0801001 1BE18FDE
ERR 13: 00000000 A0801001 1B6F080D
ERR 14: 00000000 A0801001 1B6D54E0
ERR 15: 00000000 A0801001 1B471260
ERR 16: 00000000 A0801001 1B3FC1FD
ERR 17: 00000000 A0801001 1B3B2346
ERR 18: 00000000 A0801001 1B0EDAAD
ERR 19: 00000000 A0801001 1B0E6447
===================================
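A dump like this is easier to read once identical codes are grouped. The following sketch tallies the entries and splits each code using the step/error layout described earlier in the thread; the function name is hypothetical and the third column is ignored because its meaning is not established here.

```python
from collections import Counter

DUMP = """\
ERR 00: 00000000 A0003001 FFFFFFFF
ERR 01: 00000000 A0003001 FFFFFFFF
ERR 02: 00000000 A0003001 FFFFFFFF
ERR 03: 00000000 A0003001 FFFFFFFF
ERR 04: 00000000 A0801002 FFFFFFFF
ERR 05: 00000000 A0801002 FFFFFFFF
ERR 06: 00000000 A0801002 FFFFFFFF
ERR 07: 00000000 A0801002 FFFFFFFF
ERR 08: 00000000 A0801002 FFFFFFFF
ERR 09: 00000000 A0801002 FFFFFFFF
ERR 10: 00000000 A0101002 267496D7
ERR 11: 00000000 A0801001 1C1D1BEC
ERR 12: 00000000 A0801001 1BE18FDE
ERR 13: 00000000 A0801001 1B6F080D
ERR 14: 00000000 A0801001 1B6D54E0
ERR 15: 00000000 A0801001 1B471260
ERR 16: 00000000 A0801001 1B3FC1FD
ERR 17: 00000000 A0801001 1B3B2346
ERR 18: 00000000 A0801001 1B0EDAAD
ERR 19: 00000000 A0801001 1B0E6447
"""

def tally(dump):
    """Count how often each error code appears in an 'ERR NN: ...' dump."""
    counts = Counter()
    for line in dump.splitlines():
        parts = line.split()
        # Expected shape: 'ERR', 'NN:', status, code, third field
        if len(parts) == 5 and parts[0] == "ERR":
            counts[parts[3]] += 1
    return counts

for code, n in tally(DUMP).most_common():
    print("%s x%d (step %s, error %s)" % (code, n, code[2:4], code[4:8]))
```

Separator lines such as the rows of "=" do not match the expected shape and are skipped.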
 
UPDATE: I was able to fix it by changing the RSX capacitors.
 
After looking through this thread, I think this board had a BGA problem from a long time ago (4421x3034), which has now also become a capacitor issue (1002x3001).

Screenshot of the internal error log:
https://imgur.com/7Y83qoe

I don't believe in reflowing and I don't have the equipment for a reball. Thinking of just making this a parts board. I bought this CECHGQ1 (SEM-001) with a working PS2 and working OGXB for $80.

Thanks in advance for any input.
 