PS3 Fault finding YLOD with the SYSCON - First steps and Error reporting

Hey, I really liked that video! He makes a ton of valid points that apply equally to the Mac community and the PS3 community, and yes, it's quite possible that in the cases where replacing the NEC TOKINs didn't help, the RSX is simply dead. I'm hoping that with this thread we can come up with a more reliable way to determine when that's really the case, and when the issue lies elsewhere.
 
What I find interesting in your errlog is that you have multiple errors at the same step, #40, namely:
* 3034: BE error (in the Fatal booting error category)
* 4412: BE or RSX error (in the Data error category)

I don't think we've quite nailed the root cause of 3034 errors, but we do have at least one confirmed case of it disappearing after a reflow. I haven't seen a 4412 error before, but since it's in the "Data error" category, it sounds like a communication issue between the Cell and the RSX? There aren't many components between those two (as seen on pages 29 and 30 of the service manual), pretty much just capacitors, so maybe it's worth checking the resistance on those? At the very least, they should not show 0 or infinite resistance.
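To make the "not 0, not infinite" rule a bit more concrete, here's a small Python sketch for sanity-checking meter readings. It assumes readings are typed in exactly as the meter shows them, with "OL" meaning an open circuit (as on many multimeters), and the 0.5 ohm "near short" threshold is a hypothetical value of mine, not a figure from the service manual.

```python
import math

# Hypothetical threshold: a reading at or below this is treated as a near-short.
NEAR_SHORT_OHMS = 0.5


def parse_reading(text: str) -> float:
    """Convert a meter reading such as '4,2', '13.1' or 'OL' to ohms."""
    text = text.strip().upper()
    if text == "OL":  # many multimeters display OL for an open circuit
        return math.inf
    return float(text.replace(",", "."))  # accept decimal commas as well


def check_reading(text: str) -> str:
    """Classify one resistance reading as 'short', 'open' or 'ok'."""
    ohms = parse_reading(text)
    if ohms <= NEAR_SHORT_OHMS:
        return "short"
    if math.isinf(ohms):
        return "open"
    return "ok"


if __name__ == "__main__":
    for reading in ["4,2", "0", "OL", "82,6"]:
        print(reading, "->", check_reading(reading))
```

Note that "ok" only means the reading is neither shorted nor open; it says nothing about whether the value is the correct one for that point.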

Sorry for the quality.
NEC TOKINs (RSX): 4.2 ohm and 4.1 ohm (left and right side)
NEC TOKINs (CELL): 2.8 ohm (left and right side)
bottom right: 14.2 ohm
top right: 13.1 ohm
top left: 13.1 ohm
bottom left: 82.3 / 82.6 ohm (measured twice), which is interesting.
W8TdHRl.jpg
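The bottom-left pair stands out at roughly 82 ohm against 13 to 14 ohm for the other three points. A quick way to flag that kind of odd one out is to compare each reading with the median of the group. This is only a sketch: the 50% tolerance is a hypothetical choice, and it doesn't tell you which value is actually correct for the board.

```python
from statistics import median
from typing import Dict, List

# Readings from the post above, in ohms (decimal commas converted to points).
readings = {
    "bottom right": 14.2,
    "top right": 13.1,
    "top left": 13.1,
    "bottom left": 82.3,
}

# Hypothetical tolerance: flag a point more than 50% away from the group median.
TOLERANCE = 0.5


def find_outliers(values: Dict[str, float], tolerance: float = TOLERANCE) -> List[str]:
    """Return the names of readings that deviate strongly from the group median."""
    mid = median(values.values())
    return [
        name
        for name, ohms in values.items()
        if abs(ohms - mid) / mid > tolerance
    ]


print(find_outliers(readings))  # ['bottom left']
```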
 
Log from my second PS3: it had a YLOD but was repaired (now working) with a heat gun.

Code:
>$ errlog
errlog
ofst[124]:err_code:0xffffffff, clock:0x24d35cdb  2019/07/30 19:54:03
ofst[  0]:err_code:0xa0404402, clock:0x2596313d  2019/12/25 14:39:57
ofst[  4]:err_code:0xa0403034, clock:0x2596313d  2019/12/25 14:39:57
ofst[  8]:err_code:0xa0404402, clock:0x25963150  2019/12/25 14:40:16
ofst[ 12]:err_code:0xa0403034, clock:0x25963150  2019/12/25 14:40:16
ofst[ 16]:err_code:0xa0404402, clock:0x25cb7ff4  2020/02/04 01:05:56
ofst[ 20]:err_code:0xa0403034, clock:0x25cb7ff4  2020/02/04 01:05:56
ofst[ 24]:err_code:0xa0404402, clock:0x25cb8002  2020/02/04 01:06:10
ofst[ 28]:err_code:0xa0403034, clock:0x25cb8002  2020/02/04 01:06:10
ofst[ 32]:err_code:0xa0404402, clock:0x25cb8dc4  2020/02/04 02:04:52
ofst[ 36]:err_code:0xa0403034, clock:0x25cb8dc4  2020/02/04 02:04:52
ofst[ 40]:err_code:0xa0404402, clock:0x25cb8dca  2020/02/04 02:04:58
ofst[ 44]:err_code:0xa0403034, clock:0x25cb8dcb  2020/02/04 02:04:59
ofst[ 48]:err_code:0xa0404402, clock:0x25d80c22  2020/02/13 13:31:14
ofst[ 52]:err_code:0xa0403034, clock:0x25d80c22  2020/02/13 13:31:14
ofst[ 56]:err_code:0xa0404402, clock:0x25f50eba  2020/03/06 13:38:02
ofst[ 60]:err_code:0xa0403034, clock:0x25f50eba  2020/03/06 13:38:02
ofst[ 64]:err_code:0xa0404402, clock:0x2613c6d8  2020/03/29 20:51:36
ofst[ 68]:err_code:0xa0403034, clock:0x2613c6d8  2020/03/29 20:51:36
ofst[ 72]:err_code:0xa0404402, clock:0x265ea960  2020/05/25 16:05:52
ofst[ 76]:err_code:0xa0403034, clock:0x265ea960  2020/05/25 16:05:52
ofst[ 80]:err_code:0xa0404402, clock:0x267e66db  2020/06/18 17:54:35
ofst[ 84]:err_code:0xa0403034, clock:0x267e66db  2020/06/18 17:54:35
ofst[ 88]:err_code:0xa0404402, clock:0x267e6802  2020/06/18 17:59:30
ofst[ 92]:err_code:0xa0403034, clock:0x267e6802  2020/06/18 17:59:30
ofst[ 96]:err_code:0xa0404402, clock:0x26865fb7  2020/06/24 19:02:15
ofst[100]:err_code:0xa0403034, clock:0x26865fb7  2020/06/24 19:02:15
ofst[104]:err_code:0xa0404402, clock:0x26865fc1  2020/06/24 19:02:25
ofst[108]:err_code:0xa0403034, clock:0x26865fc1  2020/06/24 19:02:25
ofst[112]:err_code:0xa0404402, clock:0x26866398  2020/06/24 19:18:48
ofst[116]:err_code:0xa0403034, clock:0x26866398  2020/06/24 19:18:48
ofst[120]:err_code:0xa0801200, clock:0x26866cc6  2020/06/24 19:57:58

>$ bringup
bringup
[SSM] state: 0000 -> 0101
Bringup Mode #0 (0xFF)
[SSM] ssmCb_OnStartingBePowOn() called.
[SSM] First Boot.
[SSM] Bringup mode : syspm_stat=00000000/00000000
[POWSEQ] PowerSeq_Setup called.
[SSM] state: 0101 -> 0201
[POWSEQ] AV Backend Setup
[SSM] state: 0201 -> 0102
[SSM] state: 0102 -> 0202
[SSM] state: 0202 -> 0103
[SSM] state: 0103 -> 0203
[SSM] ssmCb_BeforeBeOn() called.
[SSM] state: 0203 -> 0104
Psbd_SbTransMode_Half:0x20e7
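With logs this long it helps to decode them mechanically. Below is a minimal Python sketch that parses the errlog lines above. Two assumptions, both labelled in the code: the two hex digits after the leading "a0" are read as the boot step (the "#40" mentioned in this thread), and the clock is taken as seconds since 2000-01-01 00:00:00, which reproduces the dates printed next to each entry (e.g. 0x2596313d gives 2019/12/25 14:39:57). The 0xffffffff entry is simply skipped as an unused slot.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Matches one errlog entry, e.g.
# ofst[  4]:err_code:0xa0403034, clock:0x2596313d  2019/12/25 14:39:57
ENTRY_RE = re.compile(
    r"ofst\[\s*(\d+)\]:err_code:0x([0-9a-fA-F]{8}),\s*clock:0x([0-9a-fA-F]{8})"
)

# Assumption inferred from the log: clock = seconds since 2000-01-01 00:00:00.
CLOCK_EPOCH = datetime(2000, 1, 1)


def parse_errlog(text):
    """Yield (offset, step, code, timestamp) for every used errlog slot."""
    for match in ENTRY_RE.finditer(text):
        offset, err_code, clock = match.groups()
        err_code = err_code.lower()
        if err_code == "ffffffff":  # treated here as an unused slot
            continue
        step = err_code[2:4]  # "40" in a0403034, read as the step (assumption)
        code = err_code[4:]   # "3034" in a0403034
        when = CLOCK_EPOCH + timedelta(seconds=int(clock, 16))
        yield int(offset), step, code, when


def summarize(text):
    """Count how often each (step, code) pair appears in the log."""
    return Counter((step, code) for _, step, code, _ in parse_errlog(text))


if __name__ == "__main__":
    sample = """\
ofst[124]:err_code:0xffffffff, clock:0x24d35cdb  2019/07/30 19:54:03
ofst[  0]:err_code:0xa0404402, clock:0x2596313d  2019/12/25 14:39:57
ofst[  4]:err_code:0xa0403034, clock:0x2596313d  2019/12/25 14:39:57
ofst[120]:err_code:0xa0801200, clock:0x26866cc6  2020/06/24 19:57:58
"""
    for entry in parse_errlog(sample):
        print(entry)
    print(summarize(sample))
```

Pasting the full errlog into `summarize` makes the pattern in this log obvious: every 4402 at step 40 is paired with a 3034 at the same timestamp.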
 
So the top right at 14.1 ohms could mean several problems: poor BGA contact (a reflow is required) or core issues.

Page 9 of the COK-001 service manual gives a pinout of those points (the B34 to B39 VO pins).

To reflow properly you need a preheater! A hot air gun alone won't complete a full reflow at the correct temperatures, and will likely fry the RSX chip.

If you can get it to reflow fully, I suspect it will work again, as it looks to me like poor connections.
Page 9 of the service manual is important and gives an indication of the suspected issue.

You can find a port mapping of the RSX on the internet to match up which ports go where.

https://imgur.com/a/EGMb00N

 
Regarding "I don't think we've quite nailed the root cause of 3034 errors": it is due to a POWERSEQ bit-training error, i.e. the initial back-and-forth between the BE and the RSX.
Look closely at the logs. Yours shows:
[POWERSEQ] Error : BitTraining BE:RRAC:RX0:GLOBAL1:RX_STATUS
and inazuma-akai's log shows:
[POWERSEQ] Error : BitTraining RSX:RRAC:RX1:GLOBAL1:RX_STATUS
 
Marciolsf kindly shared this PDF with us near the beginning of this thread, so I've just added it to my GitHub to share.

I'm really hoping that this thread will develop into something useful for preserving the PS3 and understanding fault repairs.

db260179, the syscon error log codes.pdf in your github repo is pure gold! Where did you find it?
 
Cool. Once I've soldered the wires on, though, it will be impossible to put the MB shield back on. Are you using a standard ATX power supply, and is it safe to run the PS3 without a heatsink?
 
Luckily, the fat model has a lot of space and curves, which leaves room for jumper cables. What I do is feed the 4 jumper cables through the nearest capacitor hole in the metal shield and then out through the ventilation holes. This lets me put the whole PS3 back together and connect at will from outside the case. I'll upload a photo of this setup.

Just make sure to use electrical tape to protect the soldered points from shorting.

The PS3 gets too hot too quickly to run without a heatsink. Running it without one can work as a short test to isolate which issues are occurring, but not long term.
 
Yep, I agree. We don't fully understand the error codes yet, but we roughly know which area to look at based on each error code, which is a start.

For now, the 3034 error is a VO issue, which 90% of the time is poor BGA connectivity.

I think a database of typical YLOD symptoms and error messages, along with what is likely to fix each one, would speed up diagnosis and be a great start.

A flow chart (the typical electronics diagnostics procedure) would probably help with this.

It would also be good to mention the tools to use, such as logic analyzers, preheaters, etc.

Well, let's play with the idea of a workflow! Historically, the RSX has had a higher failure rate, so we could start there by measuring the resistance points against known good values (that would cover error 3034). Assuming those are good, we can step back to the NEC TOKINs (3003/3004?), and then another step back to the PWM modules (100* ??). I'm sure I'm missing a million things (errors in the 2000 series?), but we could start high-level, maybe 3-5 steps, and then work our way down.
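Just to play with that idea, here's one hypothetical way to encode the proposed high-level workflow in Python. The code-to-step mapping carries over the question marks from the post, so treat it as a guess to refine, not a known mapping. Given the error endings from an errlog, it starts at the first step a code points to and keeps the later steps, since the plan is to "step back" if the earlier checks come out good.

```python
from fnmatch import fnmatchcase

# Hypothetical encoding of the workflow proposed above; the code patterns are
# the post's guesses ("100*" is kept as written), not confirmed mappings.
WORKFLOW = [
    ("Measure the RSX resistance points against known good values", ["3034"]),
    ("Check the NEC TOKINs", ["3003", "3004"]),
    ("Check the PWM modules", ["100*"]),
]


def plan_checks(codes):
    """Return the steps to work through, starting at the first one a code points to."""
    for index, (_, patterns) in enumerate(WORKFLOW):
        if any(fnmatchcase(code, pattern) for code in codes for pattern in patterns):
            # Keep the later steps too: they are where we "step back" to.
            return [description for description, _ in WORKFLOW[index:]]
    return []  # nothing mapped yet (e.g. the 2000-series errors)


print(plan_checks(["4402", "3034"]))
print(plan_checks(["1002"]))
```

Adding a step is just another tuple in `WORKFLOW`, which keeps it easy to grow into the 3-5 step version and then work down from there.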
 
The user @marciolsf and I started studying the SYSCON errors a few days ago; we both have a PS3 with an error ending in 3034.
We have already identified the affected section for some error endings:
3001 - No 12V from the power supply.
3003 - Problem in the Cell VRM. It can be the NEC TOKINs, the PWM or any other component in this section.
3004 - Problem in the RSX VRM. It can be the NEC TOKINs, the PWM or any other component in this section.
Good news: solutions for two more errors:

1001 - BE VRAM power fail. It can be the NEC TOKINs.
1002 - RSX VRAM power fail. It can be the NEC TOKINs.
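In the spirit of the error-code database suggested earlier in the thread, the findings so far fit in a simple lookup table. This Python sketch only collects what has been posted in this thread (paraphrased, community findings rather than official documentation) and assumes the meaningful part is the last four hex digits of the err_code.

```python
# Error endings collected in this thread so far. The descriptions paraphrase
# the posts; they are community findings, not official documentation.
KNOWN_CODES = {
    "1001": "BE VRAM power fail; can be the NEC TOKINs",
    "1002": "RSX VRAM power fail; can be the NEC TOKINs",
    "3001": "No 12V from the power supply",
    "3003": "Cell VRM problem: NEC TOKINs, PWM or another part of that section",
    "3004": "RSX VRM problem: NEC TOKINs, PWM or another part of that section",
    "3034": "Fatal boot BE error; so far usually poor BGA contact",
    "4412": "BE or RSX error (data error category)",
    "4432": "BE or RSX error",
}


def describe(err_code: str) -> str:
    """Look up a full code such as '0xa0403034' or 'A0403034', or just '3034'."""
    ending = err_code.strip().lower()[-4:]  # assumption: last four hex digits
    return KNOWN_CODES.get(ending, "not identified yet")


for code in ["A0403034", "0xa0401002", "A0402102"]:
    print(code, "->", describe(code))
```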

I had three PS3s with these errors: one with errors 1001 and 1002, and two with error 1002 only.

I made this fix on all three PS3s, and after that there were no more errors:
MiXBrgk.jpg


On the second PS3 I fixed error 1002, then tested it with The Last of Us, and after approximately 20 minutes a YLOD occurred (this unit had been working silently for over an hour with Gran Turismo HD). I connected the UART and saw the 1001 error. I closed the unit without repairing error 1001, turned it on and got a YLOD immediately; I re-opened it, accessed the SYSCON, and errors 2102 and 3034 appeared. I fixed error 1001, but it is still in YLOD. Unfortunately these two errors alternate: error 2102 occurs once, 3034 the next time, and so on.

I have three PS3s with the 3034 error; on two of them the YLOD disappears if I press on the board. Of those two, one had no further YLOD after the pressure was applied, while on the other (the one with errors 1001 and 1002) the YLOD returns once the pressure is released. On both units, pressure was applied at the same point:
r5cWGsq.jpg
 
On my CECHB01, with errors 2102 and 3034:

Error : BitTraining RSX:RRAC:BX0:BX:FLEXIO_ID

This is a picture taken hours before the errors began:
dXJvF2u.jpg



And on another PS3, also with error 3034:
[POWERSEQ] Error : BitTraining BE:RRAC:RX0:GLOBAL1:RX_STATUS
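Since these bit-training lines keep showing up, it might help to tally which side the failure is reported on across everyone's logs. This Python sketch just splits the text after "BitTraining" on colons; treating the first field as the reporting side (BE or RSX) is an assumption based on the BE/RSX contrast drawn earlier in the thread, and the key names are hypothetical labels.

```python
from collections import Counter


def split_bittraining(line):
    """Split the path after 'BitTraining' into its colon-separated fields.

    'side' and 'detail' are hypothetical labels, not official field names.
    """
    marker = "BitTraining"
    if marker not in line:
        raise ValueError("not a BitTraining error line")
    fields = line.split(marker, 1)[1].strip().split(":")
    return {"side": fields[0], "detail": fields[1:]}


# The lines posted in this thread so far.
reports = [
    "[POWERSEQ] Error : BitTraining BE:RRAC:RX0:GLOBAL1:RX_STATUS",
    "[POWERSEQ] Error : BitTraining RSX:RRAC:RX1:GLOBAL1:RX_STATUS",
    "Error : BitTraining RSX:RRAC:BX0:BX:FLEXIO_ID",
    "[POWERSEQ] Error : BitTraining BE:RRAC:RX0:GLOBAL1:RX_STATUS",
]

# Count which side each report names.
print(Counter(split_bittraining(r)["side"] for r in reports))
```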
 
I can't run "AUTH": I get "Auth1 response invalid". Print statements show that the "checksum" variable doesn't match, and I can't get it to work with Python 3.8. I've gotten one "A0404432" error and many "A0403034" errors.
CECHE-01
upload_2020-7-2_19-30-57.png

Where do I go from here? Previously I got this PS3 working again by hitting it with a hot air gun, which lasted a couple of months.
 
I tend to have issues with Auth if it's not the very first command I run. You might want to try exiting the script, switching the PS3 off and back on, then running the script and sending auth right away.

The current thinking on 3034 errors is BGA failure: we've had 2-3 test cases where they were fixed either with a reflow/reball, or by pressing on different spots on the back of the board until it worked. It definitely won't be the same spot from board to board, though!

Error 4432 corresponds to a BE or RSX error, so it's likely related to the RSX issues behind 3034.
 
I was revisiting a conversation I had with M4j0r when I was first getting started with syscon stuff, in which he suggested the following. At that point it was a bit over my head, but now that we've had more time to get familiar with syscon (thanks to db260179's guide!), it makes more sense.

Furthermore, if you set the 0x48C02 flag to 0x02 it will allow you to get the lv0+ log via the SB uart. You can also activate more lv0ldr debug messages over the syscon uart by setting 0x48C11 to 0x03. (https://www.psdevwiki.com/ps3/SC_EE..._Block_Offset_Mapping_Table_.28NVS_Service.29)

So, @db260179, can you spot-check this for me? I'm going to try to enable lv0ldr messages. This link (https://www.psdevwiki.com/ps3/SC_EEPROM#Dumpable_SC_EEPROM_Offset_-_Block_ID_and_Block_Offset_Mapping_Table_.28NVS_Service.29) explains 0x48C11 as:
Code:
bootrom_trace_level  0x48C11 0x01 [0x01 level]

I sent the following read command, so that I can restore the flag afterwards:
Code:
>$ r 48c11 01
r 48c11 01
+0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +A +B +C +D +E +F
-----------------------------------------------
FF

If I were to update that flag to 03, would I have to send the following command? (Meaning: the current value is FF, and I want to write 03 instead.)
Code:
w 48c11 ff 03

Or is this the correct command? (Meaning: the current location is 01, and I want to write 03 in that location.)
Code:
w 48c11 01 03

I'm also thinking of enabling 02 on 0x48c02, but I'm going to need a second UART :) The SB UART pins are right near the syscon ones, so it should be easy... I'll probably order one early next week.
 
Both of those are wrong: the first writes FF to 48c11 and 03 to 48c12, and the second writes 01 to 48c11 and 03 to 48c12.
The 'w' (write byte) command is the following:
w <address> <byte0> [byte1] [byte2...
So, to write 03 to 48c11 alone, the command would be w 48c11 03.
(Do not confuse it with the external command EEP SET <addr> 01 <byte>, where 01 is the length; when I tried a longer length I got an error, although longer lengths work with GET.)
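To double-check a write before sending it, it can help to expand the command into address/byte pairs, following the `w <address> <byte0> [byte1] ...` syntax described above. This Python sketch only builds and explains command strings; it doesn't open a serial port, so sending it is still up to your own UART script. The helper names are hypothetical.

```python
# Address of bootrom_trace_level, as given in the psdevwiki table quoted above.
BOOTROM_TRACE_LEVEL = 0x48C11


def build_write(address, data):
    """Build a 'w <address> <byte0> [byte1] ...' command string (hex, no 0x)."""
    return " ".join(["w", f"{address:x}"] + [f"{byte:02x}" for byte in data])


def explain_write(command):
    """Map each byte of a 'w' command to the address it would be written to."""
    parts = command.split()
    if len(parts) < 3 or parts[0] != "w":
        raise ValueError("expected 'w <address> <byte0> [byte1] ...'")
    start = int(parts[1], 16)
    # Consecutive bytes land at consecutive addresses.
    return {start + i: int(byte, 16) for i, byte in enumerate(parts[2:])}


print(build_write(BOOTROM_TRACE_LEVEL, [0x03]))  # w 48c11 03
for address, value in explain_write("w 48c11 ff 03").items():
    print(f"{address:x} <- {value:02x}")  # 48c11 <- ff, then 48c12 <- 03
```

Running `explain_write` on either of the two proposed commands makes the mistake visible: both touch 48c12 as well as 48c11.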
 
Wow, a lot has happened; I can't keep up with you.
Are you writing about "bootrom trace level (0x00: fatal errors, 0x01: errors, 0x02: information messages, 0x03: debug messages)"?
What does changing the 48c11 value to 02 or 03 give us?

Additional question: is UART the right way to read the Blu-ray drive ID and write other IDs, so that drives can be swapped?
 
To swap Blu-ray drives, you can use the QA flag and REBUG CFW to remarry the drive; there is no need to read any ID.
 
