PS3 Fault finding YLOD with the SYSCON - First steps and Error reporting

raidriar · Jan 16, 2022

RIP-Felix said:
I hear what you're saying, but using that much heat right next to the RSX/CPU will introduce a great deal of strain to the BGA. It doesn't look like you have a BGA defect yet, but that can and has changed after attempting to remove/replace tokins. I absolutely AM trying to scare you.

This is the reason I made the Tantalizers, so you can remove/replace tokins with a longer lasting solution with the minimal amount of heat.

even with a hot air station, there is still risk of damage? I was thinking of using foil around the caps to reduce heat to other areas of the board

RIP-Felix · Jan 16, 2022

raidriar said:
even with a hot air station, there is still risk of damage? I was thinking of using foil around the caps to reduce heat to other areas of the board

Yes, in fact you will not likely be able to do it with a hot air wand alone. You will most likely need to crank the heat to dangerous temperatures to saturate the ground plane. Otherwise you will not get enough heat into the board to get them off. That will damage nearby components regardless of masking. I call such damage a "heatgun special."

What you will need to do it properly is a T-8280 or another pre-heater to get the board to 150C before using the hot air wand. I also highly recommend a BGA rework jig to prevent warping the motherboard. After the rework, slowly lower the pre-heater temps to 60C (in 20C increments every 10 minutes) before turning it off and allowing the board to cool completely.
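For anyone who wants the cool-down written out as a timetable, here is a minimal Python sketch. It assumes the first step down happens 10 minutes after the rework ends and that the last step is shortened so the setpoint never drops below 60C; the function name and defaults are only illustrative.

```python
def cooldown_schedule(start_c=150, end_c=60, step_c=20, hold_min=10):
    """Return (minutes_elapsed, setpoint_c) pairs for a stepped cool-down."""
    schedule = [(0, start_c)]
    temp, minute = start_c, 0
    while temp > end_c:
        # Step down, but never undershoot the final setpoint.
        temp = max(temp - step_c, end_c)
        minute += hold_min
        schedule.append((minute, temp))
    return schedule


if __name__ == "__main__":
    for minute, temp in cooldown_schedule():
        print(f"t+{minute:>2} min: set pre-heater to {temp} C")
```

With the defaults this gives six setpoints (150, 130, 110, 90, 70, 60) over 50 minutes, after which the pre-heater is switched off.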

The tantalizer is easier and cheaper!

The easiest solution is to just do this:

If it works you know the problem and kick the ball down the road long enough to plan a Tantalizer install. At that point you can use uv curable solder mask to cover the copper you exposed to add the piggybacked cap (if you want to).

RIP-Felix · Jan 16, 2022

RIP-Felix said:
I thought I should post this here also. I have noticed a lot of 90 2120 HDMI errors are associated with 40 3034. It happened on my system where the HDMI encoder is fine, but the BGA is not. I got a 2120 on PS3#7, a reballed system that definitely has a good HDMI encoder. It was working in game before the latest Tantalum installation using the SoulKilla PCB. I had to dump a lot of heat into the area near the RSX (just used an iron, no hot air or anything crazy), but it was apparently enough to pop a BGA connection. The BGA is very delicate and the TOKINs are right there next to the BGA with a very thick copper plane to conduct the heat. It now has a classic BGA defect SYSCON error, a 40 3034. And it also had a 90 2120.

I have a theory about the 2120. Since the RSX is not communicating correctly during bittraining (3034), the system tries to shut down. The SYSCON is probably expecting some kind of confirmation from the HDMI encoder that it received the shutdown signal, or there is some coordination between the RSX and the HDMI encoder that was expected, but since the RSX isn't powered on that signal isn't working. The BGA defects on the RSX causing the 40 3034 is the bigger issue. The 2120 is meaningless...

At least that's what I thought until I had an idea. I wondered if the 2120 only happens when the HDMI cable is plugged into the console while trying to boot. So I just unplugged the HDMI cable, ran the bringup command to capture the YLOD --> Only got a 3034. Then plugged the HDMI cable back in and ran the console again --> got the 3034 and 2120. So yeah, that's all it is. If the HDMI cable is plugged in, it must be seen by the SYSCON as ready to receive a signal or something, but when the RSX never gets powered on and the shutdown error state is initialized the HDMI encoder must send some kind of "Hey I never heard anything from my friend the RSX?"

In any case, now we know to expect a 2120 if the HDMI cable is plugged into the console when there is a BGA defect (3034) - That it doesn't mean there's a problem with the HDMI encoder. And that if there is, it'll show up in a different context (like the cable is unplugged or there are no BGA defects).

Code:

Microsoft Windows [Version 10.0.18363.1379]
(c) 2019 Microsoft Corporation. All rights reserved.

C:\Users\HTPC>CD C:\Users\HTPC\Desktop\PS3\SYSCON

C:\Users\HTPC\Desktop\PS3\SYSCON>python ps3_syscon_uart_script.py COM4 CXRF
>$ AUTH
Auth successful
>$ bringup
bringup
[SSM] state: 0000 -> 0101
Bringup Mode #0 (0xFF)
[SSM] ssmCb_OnStartingBePowOn() called.
[SSM] First Boot.
[SSM] Bringup mode : syspm_stat=00000000/00000000
[POWSEQ] PowerSeq_Setup called.
[SSM] state: 0101 -> 0201
[POWSEQ] AV Backend Setup
[SSM] state: 0201 -> 0102
[SSM] state: 0102 -> 0202
[SSM] state: 0202 -> 0103
[SSM] state: 0103 -> 0203
[SSM] ssmCb_BeforeBeOn() called.
[SSM] stat >$ lasterrlog e: 0203 -> 0104
Psbd_SbTransMode_Half:0x20e2
[POWERSEQ] Error : BitTraining RSX:RRAC:BX0:BX:FLEXIO_ID
[SSM] state: 0104 -> 0304
[SSM] ssmCb_AfterBeOn2() called.
[SSM] PowSeq Fail : Detected !
[SSM] state: 0304 -> 0700
[POWSEQ] AV Backend Letup
[SSM] Shutdown mode : syspm_stat=00000000/00000000
[ERROR]: 0xa0403034
[POWSEQ] PowerSeq_Letup called.
[SSM] state: 0700 -> 0600
(PowerOff State)
(Fatal)
lasterrlog
Last Error Code:0xa0403034, Time:0x0b569e8e 2006/01/10 16:34:22
[mullion]$
>$ lasterrlog
[SSM] state: 0600 -> 0000
[SSM] Error state is cleared.
(PowerOff State)
[SSM] state: 0000 -> 0101
Bringup Mode #0 (0xFF)
[SSM] ssmCb_OnStartingBePowOn() called.
[SSM] Bringup mode : syspm_stat=00000000/00000000
[POWSEQ] PowerSeq_Setup called.
[SSM] state: 0101 -> 0201
[POWSEQ] AV Backend Setup
[SSM] state: 0201 -> 0102
[SSM] state: 0102 -> 0202
[SSM] state: 0202 -> 0103
[SSM] state: 0103 -> 0203
[SSM] ssmCb_BeforeBeOn() called.
[SSM] state: 0203 -> 0104
Psbd_SbTransMode_Half:0x20e2
[POWERSEQ] Error : BitTraining RSX:RRAC:BX0:BX:FLEXIO_ID
[SSM] state: 0104 -> 0304
[SSM] ssmCb_AfterBeOn2() called.
[SSM] PowSeq Fail : Detected !
[SSM] state: 0304 -> 0700
[POWSEQ] AV Backend Letup
[SSM] Shutdown mode : syspm_stat=00000000/00000000
[ERROR]: 0xa0403034
[ERROR]: 0xa0902120
[POWSEQ] PowerSeq_Letup called.
[SSM] state: 0700 -> 0600
(PowerOff State)
(Fatal)
lasterrlog
Last Error Code:0xa0902120, Time:0x0b569ec7 2006/01/10 16:35:19
[mullion]$
>$
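If you capture a log like the one above to a text file, the "Last Error Code" lines can be pulled out and decoded with a few lines of Python. This is only a sketch: the split into `a0` / `40` / `3034` simply mirrors how codes are written in this thread, and the clock epoch of 2000-01-01 is inferred from the log itself (0x0b569e8e is printed as 2006/01/10 16:34:22), not taken from any official documentation. All function names are illustrative.

```python
import re
from datetime import datetime, timedelta

# Assumption: the SYSCON clock counts seconds from 2000-01-01 00:00:00.
# Inferred from the log, where 0x0b569e8e is shown as 2006/01/10 16:34:22.
SYSCON_EPOCH = datetime(2000, 1, 1)

LASTERR_RE = re.compile(
    r"Last Error Code:0x([0-9a-fA-F]{8}), Time:0x([0-9a-fA-F]{8})"
)


def decode_time(raw_seconds):
    """Convert a raw SYSCON timestamp to a datetime under the epoch assumption."""
    return SYSCON_EPOCH + timedelta(seconds=raw_seconds)


def split_code(code):
    """Split e.g. 0xa0403034 into ('a0', '40', '3034'), as written in this thread."""
    text = f"{code:08x}"
    return text[0:2], text[2:4], text[4:8]


def parse_lasterrlog(log_text):
    """Return a list of (code_parts, datetime) for every lasterrlog reply found."""
    results = []
    for code_hex, time_hex in LASTERR_RE.findall(log_text):
        parts = split_code(int(code_hex, 16))
        results.append((parts, decode_time(int(time_hex, 16))))
    return results


if __name__ == "__main__":
    sample = "Last Error Code:0xa0403034, Time:0x0b569e8e 2006/01/10 16:34:22"
    for parts, when in parse_lasterrlog(sample):
        print(" ".join(parts), when)
```

Decoding both timestamps from the log this way reproduces the dates the SYSCON printed, which is the only evidence for the epoch assumption.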

RIP-Felix said:
Honestly, I'm surprised I was right about the BGA. That's a win for the research effort, even if it is a loss overall. At least we're steadily demystifying some of these errors.

I'm about ready to add DVE/HDMI errors to the list of errors caused by BGA defects. That list includes 2020/2120, 2022/2122, & 2024/2124. But 2020/2120 especially are the ones most often associated with 3034/4xxx's. Not always, but often. In the case of 2024/2124, they don't appear on 90nm consoles. They are for later models with more reliable RSX chips, so I think that's why they tend to be fixed by replacing the AV/HDMI controller IC's. But that doesn't mean they can't be caused by BGA defects too. It's just that those BGA's are more reliable and tend to outlive the chips, whereas in cases of 2020/2120 on 90nm boards the opposite is more prevalent.

Since nearly half of the RSX's BGA pads (those near the edge) are AV/HDMI related (VDDIO), it makes sense anything related to DVE/HDMI can be affected by BGA defects. And it closes the BGA related YLOD vs GLOD conundrum. YLOD = critical system failure affecting the Power on Sequence. A major voltage system, reset, power good, CELL/RSX initialization, BitTraining, or system control. GLOD = non POS critical system, like DVE/HDMI I/O or the bootloader.
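The rule of thumb from the posts above (a 2120 alongside a 3034 with the HDMI cable plugged in is probably a side effect, not an encoder fault) can be written down as a small triage helper. This is a sketch of the working hypothesis in this thread only, not a confirmed specification; the function name, note wording and the code grouping are illustrative.

```python
# Codes taken from the posts above. The grouping reflects the working
# hypothesis in this thread, not an official error-code specification.
DVE_HDMI_CODES = {"2020", "2120", "2022", "2122", "2024", "2124"}
BITTRAINING_CODE = "3034"


def triage(codes, hdmi_cable_connected):
    """codes: iterable of 4-digit error strings such as '3034' or '2120'."""
    codes = set(codes)
    notes = []
    side_effect = (
        BITTRAINING_CODE in codes and "2120" in codes and hdmi_cable_connected
    )
    if BITTRAINING_CODE in codes:
        notes.append("3034: RSX BitTraining failure, suspect a BGA defect")
    if side_effect:
        notes.append(
            "2120: likely a side effect of 3034 with HDMI plugged in; "
            "retest with the cable unplugged"
        )
    for code in sorted(codes & DVE_HDMI_CODES):
        if code == "2120" and side_effect:
            continue  # already explained above
        notes.append(
            f"{code}: DVE/HDMI error; check the AV/HDMI controller, "
            "but a BGA defect is not ruled out"
        )
    return notes


if __name__ == "__main__":
    for line in triage(["3034", "2120"], hdmi_cable_connected=True):
        print(line)
```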

It's disappointing that most errors lead back to the BGA. But by now I guess I shouldn't be surprised.

Some more context to this. I just re-read the original post I made about that 902120 I have in connection to the 403034, which the SYSCON said was a RSX:RRAC:BX0:BX:FLEXIO_ID error. I happened to catch it on the O-scope at the time:

I nicknamed it the "stairway to hell."

It was the RSX. I just pressed down on the leaf spring w/o the BR in place. I felt/heard it compress. Then it booted while it stayed warm, super stable for testing. In the morning after everything had relaxed back to normal position. Instant YLOD (40 3034 + 90 2120). Classic BGA defect.

Looks to me like the VRM is stepping down each VID increment and perhaps checking it against the voltage reference it's expecting to return. A sort of self check after encountering a break in the RSX FlexIO. Not sure what to make of it yet. Just wanted to make note of it here so I can later break it down.

If you have any ideas, I'm all ears.

EDIT:
I need to look into this further...

TwelveAtNight said:
I have great news for you! My A01 had the exact same issue as yours where F6302 was blown and C6320 was shorted. It had errors A0213013 and A0202120. I fixed it by replacing F6302 and C6320.

I may be wrong that 202120/213013 means dead CPU. I think there was another console that had issues with fuses in that area causing 3013, but I'll get back to this.

Maybe I shouldn't be posting these diatribes? This is currently one of the error combos that is confusing me and that I need to sift through the data to form a better hypothesis to explain. If you all prefer I didn't post about it until I have a better formed theory, I totally get that.

DeadEnd · Jan 16, 2022

RIP-Felix said:
Maybe I shouldn't be posting these diatribes? This is currently one of the error combos that is confusing me and that I need to sift through the data to form a better hypothesis to explain. If you all prefer I didn't post about it until I have a better formed theory, I totally get that.

I think everything you post is valuable. You're doing a massive contribution to psx place. Personally I don't mind going through a lot of text when it's so articulately written. But I'm even more mind-blown at your topology report. At this rate you'll be able to design your own ps3

marciolsf · Jan 16, 2022

We had a bit of a discussion recently about livelocks (I don't remember which thread, at this point it's all a jumble in my head). I found an IBM document about Cell (the 65nm model, in particular), in which they detail livelock resolution. From what I gather, Cell will tell syscon "hey, there's some blocking here!", and then leaves it up to the syscon to address it. In most database engines, this condition is called a "deadlock", and generally ends up with the engine killing off one of the conflicting sessions. I know this is not a database, but the principles seem similar.

This particular line is interesting

The ATTENTION signal is asserted when the Cell BE processor detects a livelock condition. This
signal is the same as the ATTENTION signal used during the power-on reset sequence.

So maybe this is backing up your idea earlier, @RIP-Felix, that a BGA breakdown during operation throws an error (I forget which one you suggested), and then it throws a 3034, which is the same ATTENTION signal that is designed to prevent a full boot.

Appendix C. Livelock Resolution Mode
Livelocks occur when one or more units in a processor element cannot make forward progress.
The Cell Broadband Engine (Cell BE) processor contains several internal mechanisms to avoid
livelock. One example is the pseudorandom retry backoff mechanism. Although this mechanism
eliminates most livelocks, some can still occur. In addition to the internal mechanisms, the
Cell BE processor provides an external notification to the system controller when a livelock is
detected and is not resolved. The notification is in the form of an ATTENTION signal to the
system controller. In response to the ATTENTION signal, the system controller can further alter
the operation by enabling the livelock resolution mode, which alters the operation of the
processor and typically resolves the livelock.
Like livelocks, rare cases exist in which a processing element is making forward progress, but at
an extremely slow rate. This is referred to as starvation. The internal mechanisms are typically
sufficient to prevent starvation. Starvation is not detected by the Cell BE processor. A system
controller can, however, randomly and at a very slow rate, enable and then disable the livelock
resolution mode to resolve any condition causing starvation.
For performance reasons, the system controller should never leave the Cell BE processor in the
livelock resolution mode for an extended period of time. The system should provide a mechanism
for the system controller to notify the operating system that a livelock was detected and resolved.

C.1 System Controller Actions
The ATTENTION signal is asserted when the Cell BE processor detects a livelock condition. This
signal is the same as the ATTENTION signal used during the power-on reset sequence. Other
conditions can also cause the ATTENTION signal to be asserted. The system controller should
monitor the ATTENTION signal using either polling or interrupts. When the ATTENTION signal is
asserted, the system controller should read the Read SPI Status Register (rd_spi_status). If
rd_spi_status[0,7] are both set, the Cell BE processor has detected a livelock condition. The
system controller should then perform the following sequence:
1. Write wr_spi_status[18] = '1' to throttle the PowerPC Processor Element (PPE).
2. Write wr_spi_status[16,17,19] = '1' to quiesce transactions and enable the livelock resolution
mode:
• wr_spi_status[16] = '1' quiesces memory flow controller (MFC) bus transactions.
• wr_spi_status[17] = '1' quiesces PPE bus transactions.
• wr_spi_status[19] = '1' enables livelock resolution mode.
3. Write wr_spi_status[18] = '0' to stop throttling the PPE. (If this step is not performed, the
Cell BE processor will not resolve the livelock.)
4. Write wr_spi_status[4] = '1' to reset and resample the livelock condition by deactivating the
ATTENTION signal.
5. Read rd_spi_status[7] to determine if the livelock is resolved. This bit will return '0' if the livelock
is resolved, or it will return '1' if the livelock is not resolved.
6. If the livelock is resolved, perform these next steps:
a. Write wr_spi_status[16,17,19] = '0' to remove the quiesce for the MFC and PPE bus
transactions, and to disable the livelock resolution mode.
b. Notify the operating system to indicate that a livelock has been detected and resolved.
7. If the livelock is not resolved, then the system controller should assert the CHECKSTOP_IN
signal and perform any system-dependent operations for reporting a checkstop condition.
Optionally, the system controller can perform steps 1 through 6a at random intervals to resolve
any potential starvation conditions. Steps 1 through 6 should be performed sequentially, and the
intervals between performing these steps should be randomly spaced.
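To make the C.1 sequence easier to follow, here it is as a small Python sketch run against a fake register model. Everything here is hypothetical scaffolding: `FakeCellSpi`, `read_bit` and `write_bits` are invented for illustration, and the fake only models "the livelock clears if resolution mode is on and the PPE is no longer throttled when bit 4 is written". It says nothing about how the PS3 SYSCON firmware actually implements this.

```python
class FakeCellSpi:
    """Hypothetical test double for rd_spi_status / wr_spi_status."""

    def __init__(self, resolves=True, livelock=True):
        # rd_spi_status[0] and [7] both set means a livelock was detected.
        self.rd = {0: int(livelock), 7: int(livelock)}
        self.wr = {}          # last value written to each wr_spi_status bit
        self.resolves = resolves
        self.log = []         # order of writes, for inspection

    def read_bit(self, bit):
        return self.rd.get(bit, 0)

    def write_bits(self, bits, value):
        for bit in bits:
            self.wr[bit] = value
        self.log.append((tuple(bits), value))
        # Toy model: resampling (bit 4) clears the livelock only when
        # resolution mode is enabled and PPE throttling has been released.
        if (4 in bits and value == 1 and self.resolves
                and self.wr.get(19) == 1 and self.wr.get(18) == 0):
            self.rd[7] = 0


def handle_attention(spi):
    """Walk steps 1-7 of C.1. Returns 'resolved', 'checkstop' or 'not-livelock'."""
    if not (spi.read_bit(0) and spi.read_bit(7)):
        return "not-livelock"            # ATTENTION had some other cause
    spi.write_bits([18], 1)              # 1. throttle the PPE
    spi.write_bits([16, 17, 19], 1)      # 2. quiesce MFC/PPE, enable resolution mode
    spi.write_bits([18], 0)              # 3. stop throttling the PPE
    spi.write_bits([4], 1)               # 4. reset and resample the condition
    if spi.read_bit(7) == 0:             # 5. is the livelock resolved?
        spi.write_bits([16, 17, 19], 0)  # 6a. remove quiesce, leave resolution mode
        return "resolved"                # 6b. caller would notify the OS
    return "checkstop"                   # 7. caller would assert CHECKSTOP_IN


if __name__ == "__main__":
    print(handle_attention(FakeCellSpi(resolves=True)))
    print(handle_attention(FakeCellSpi(resolves=False)))
```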

marciolsf · Jan 16, 2022

Reading more into this document, it has a ton of info regarding the boot up sequence. Is this known stuff? Would anyone like to see the rest of it?

RIP-Felix · Jan 16, 2022

marciolsf said:
Reading more into this document, it has a ton of info regarding the boot up sequence. Is this known stuff? Would anyone like to see the rest of it?

Please LINK, use your triforce of courage to post the location of the triforce of POWER on sequence, so that we may finally attain the triforce of wisdom...

marciolsf · Jan 16, 2022

Hopefully it helps! I don't remember where I found it, I was just looking for Cell info

https://1drv.ms/b/s!AsVyji_NVV9pz_hku5LKZm4DM-HmMg?e=m0Kpzq

M4j0r · Jan 16, 2022

marciolsf said:
Reading more into this document, it has a ton of info regarding the boot up sequence. Is this known stuff? Would anyone like to see the rest of it?

Yes, the HIG is very useful, IBM did only release the 90nm and 65nm variants, the 45nm wasn't released (but exists). You can find a lot of information from these on the devwiki.

marciolsf said:
The ATTENTION signal is asserted when the Cell BE processor detects a livelock condition. This
signal is the same as the ATTENTION signal used during the power-on reset sequence.

This signal and the Checkstop Out are mostly used if the firmware detects any type of (security) error condition (e.g. if you downgrade below the minimum version). During the CELL init process they do have different uses though.

marciolsf · Jan 16, 2022

This is another thing I ran into yesterday, but this is more "interesting" than useful, I think. It gives a bit more information on what flexIO does, and the relationship between cell and rsx.

The special characteristic of the PS3 is the connection between Cell and RSX

The big special characteristic of PS3 Graphics is the connection between Cell and RSX. The RSX itself has a similar architecture to the G70, but the host interface for the G70 is meant for the PC and is completely different. The G70 uses PCI Express x16 to connect to the chipset as 8GB/sec (4GB/sec one-way), and it cannot directly access main memory. In contrast, the RSX has a 35GB/sec (20GB/sec down, 15GB/sec up) direct connection to the Cell, and can directly render from the main memory on the Cell side.

This is a big difference, because it allows a completely different way of using the GPU from PC architectures, SCEI explained. First of all, because the bus is wider, the Cell can perform a great amount of geometry operations, then send the vertex data [to the RSX]. Conversely, the RSX side can easily send data back to the Cell.

https://www.neogaf.com/threads/detailed-overview-of-ps3-development-station-vs-ps3-console.56541/

RIP-Felix · Jan 16, 2022

RIP-Felix said:
I just wanted to post Victor's resistance values for anyone following along. It's interesting to see a direct comparison of these. I want to point out that the 2 JSD-001's have different RSX models (CXD5301 on the working console and CXD5300 on the GLOD). It shouldn't matter? But maybe there are slight differences in the resistances we measure. Also, some of those values will be different on every console. So we're really looking for values that are way off.

Working CECH25xx (JSD-001) and known good RSX (CXD5301)

VDDC = 2.7Ω (A Healthy value. New ≈ 3.2Ω)

VDDQ = 100.3Ω

VDDIO = 96.5Ω

YC_RC_VDDIO (FlexIO) = ? (He forgot to measure)

VDDA = 61.0Ω

VDDR = 449.6Ω

CECH25xx (JSD-001) GLOD Diagnosed with a Dead RSX (CXD5300):

VDDC = 1.7Ω (Marginal)

VDDQ = 235Ω (High)

VDDIO = 95.5Ω (Same)

YC_RC_VDDIO (FlexIO) = 12.6Ω (Unknown/No comparison)

VDDA = 56.4Ω (Same)

VDDR = 315.8Ω (Low)

Discussion:
1.7Ω VDDC is a bit low, but it's within margin. If this was LINK, he'd have one heart left and be panting. But he's not dead yet. There is enough separation on the Core voltage to rule out a short. That's the most common place to burn out.

I would normally be thinking bump failure on the die, if it weren't for the GLOD. The RSX must be able to send/receive information to/from the CPU/SYSCON, otherwise it would have triggered a YLOD in POST/BitTraining. So I think that might rule out DIE bump failures.

However, if there were bump failures on the RAM, leading to internal shorts or open lines, then perhaps it's not able to boot because the RSX Die can't communicate properly with its onboard RAM. Since the issue is on the RSX itself, it's not on the motherboard and doesn't prevent the RSX from responding to SYSCON check-ins, the SYSCON doesn't throw an error code because the RSX is suffering in silence. So the system is stuck in limbo (GLOD) while the RSX fails to get its $h!T together.

Bump failures seem more likely to me, probably between the DIE and RAM. Perhaps an SMD component on the RSX substrate? Might be worth probing those values with a good chip to see if there's a difference there. If so that would be an easy fix. VDDQ and VDDR are both related to the RAM. And they are the only ones significantly different. These readings rule out short conditions, but not an electromigration open line fault. VDDQ is high, perhaps that's why. It's hard to know what effect an open line fault on some microscopic trace inside the DIE or RAM would have on resistance measured there. Maybe a Bump on the VDDQ line cracked and increased the resistance. That narrative lines up with what we know about aging solder joints - the resistance tends to increase with deformation and oxidation, until an open line forms. However, we also know some resistances, like VDDC tend to decrease. VDDR is the RAM's main voltage, like VDDC is the die's core voltage. Perhaps the lower VDDR is kinda like a health meter for the RAM. Isn't VDDQ used for voltage reference, among other things? It can't be good if it's significantly off.

EDIT:
I just noticed that the difference between the working and GLOD RSX is the same for both VDDR and VDDQ. VDDQ is 135 ohm higher. VDDR is 134 ohms lower. They are both off by about the same amount? That seems sus!

EDIT2:
VDDQ: The supply voltage to the output buffers of a memory chip.
VDDR: Supply voltage to the memory.
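The side-by-side comparison from the first EDIT can be reproduced with a short Python sketch. The readings are the ones listed above; YC_RC_VDDIO is left out because there is no reading from the working console to compare against. The function name and the choice to report both an absolute and a percentage change are just illustrative.

```python
# Readings in ohms, copied from the two lists above.
WORKING = {"VDDC": 2.7, "VDDQ": 100.3, "VDDIO": 96.5, "VDDA": 61.0, "VDDR": 449.6}
GLOD = {"VDDC": 1.7, "VDDQ": 235.0, "VDDIO": 95.5, "VDDA": 56.4, "VDDR": 315.8}


def rail_deltas(reference, suspect):
    """Return {rail: (delta_ohms, percent_change)} for rails present in both."""
    out = {}
    for rail, ref_value in reference.items():
        if rail in suspect:
            delta = suspect[rail] - ref_value
            out[rail] = (round(delta, 1), round(100 * delta / ref_value, 1))
    return out


if __name__ == "__main__":
    for rail, (delta, pct) in rail_deltas(WORKING, GLOD).items():
        print(f"{rail:6s} {delta:+8.1f} ohm  ({pct:+.1f}%)")
```

VDDQ comes out 134.7 ohms higher and VDDR 133.8 ohms lower, which is the near-mirror-image difference pointed out in the EDIT.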

@SkaziChris I thought you might like this. I dredged it up while reviewing the tokin thread for more syscon codes for a project I'm working on.

vyktormvmpay25 · Jan 17, 2022

RIP-Felix said:
I probed a couple more chips, another 40nm and 90nm. But I noticed a difference between them that confused me before. So I thought I would take a closer look. Apparently, the locations labeled PLL in blue are a bit different between these model revisions...
View attachment 35829
As you can see the 40nm reads OL in a couple of spots that the 90nm doesn't. So I made this probing chart to simplify the locations to test, so it'll always return a comparable value.
View attachment 35830

Can you please check again all groups of PLL points with your multimeter set on manual range for kohms? I get very different values (around 1,20k~1,40k,only one way would work to take measurements). It seems multimeter left on auto will react in kind of diode and get different measurements if probes are twisted. I assume you did it in auto mode.
It may take me a while to complete it.

sandungas · Jan 17, 2022

DeadEnd said:
I'm afraid your point went over my head ... None of this changes what I said about the recommended way to read error codes reliably ?

No, i agree with what you and @RIP-Felix said about how to read the error codes, but i mentioned it because i think this "loop mark" inside the errorlog could be another of the problems of the windows tool some people were using (im not sure about how it was working though)
When using a command by UART like "ERRLOG GET 06" we are telling syscon the identifier 06 and syscon does the job to find the mark that indicates where the loop starts and ends. Something similar happens when we run the command "lasterrlog", the first thing syscon needs to do is to find where the loop starts and ends

But the tools that are going to read the errorlog in raw need to implement a function to find the loop start/end, otherwise if you read the codes "from top to bottom" without caring about the mark they are going to have an incorrect order (and there is going to be an error displaying code FFFFFFFF that is fake), especially in the errorlogs with invalid timestamps this could be a mess

Also, i mentioned it because im not sure if what i wrote about this special mark for the loop start/end is correct, is mostly speculation
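As a rough illustration of what such a "find the loop start/end" function could look like, here is a Python sketch. It is just as speculative as the post: it assumes a single FFFFFFFF entry marks the seam of the ring, with the newest entry right before it and the oldest right after it. If the real layout differs, the rotation would need to change. The function name is made up.

```python
EMPTY = 0xFFFFFFFF


def unroll_errorlog(raw_codes):
    """Reorder a raw circular error log so the oldest entry comes first.

    Assumption (speculative): the first 0xFFFFFFFF entry is the seam of the
    ring. Entries after it are older than entries before it.
    """
    raw_codes = list(raw_codes)
    if EMPTY not in raw_codes:
        return raw_codes                      # no marker, leave the order alone
    seam = raw_codes.index(EMPTY)
    ordered = raw_codes[seam + 1:] + raw_codes[:seam]
    # Drop any further empty slots so the fake FFFFFFFF "error" is not shown.
    return [code for code in ordered if code != EMPTY]


if __name__ == "__main__":
    raw = [0xA0403034, 0xA0902120, EMPTY, 0xA0403034]
    print([f"{code:08X}" for code in unroll_errorlog(raw)])
```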

RIP-Felix · Jan 17, 2022

vyktormvmpay25 said:
Can you please check again all groups of PLL points with your multimeter set on manual range for kohms? I get very different values (around 1,20k~1,40k,only one way would work to take measurements). It seems multimeter left on auto will react in kind of diode and get different measurements if probes are twisted. I assume you did it in auto mode.
It may take me a while to complete it.

I only have 2 models of RSX (90nm and 40nm). I measured 2 of each...

90nm PLLVDD = 22K, 22K
40nm PLLVDD = 3900K, 3500K

I have 1 or 2 more 90nm that I could measure. They need cleaned up first, I pulled them and never bothered to wick the solder bridges off. I don't have any 65nm. I'm not interested in them (40nm Frankenstein or bust).

What I noticed is that the resistance is vastly different between models.

sandungas said:
No, i agree with what you and @RIP-Felix said about how to read the error codes, but i mentioned it because i think this "loop mark" inside the errorlog could be another of the problems of the windows tool some people were using (im not sure about how it was working though)
When using a command by UART like "ERRLOG GET 06" we are telling syscon the identifier 06 and syscon does the job to find the mark that indicates where the loop starts and ends. Something similar happens when we run the command "lasterrlog", the first thing syscon needs to do is to find where the loop starts and ends

But the tools that are going to read the errorlog in raw need to implement a function to find the loop start/end, otherwise if you read the codes "from top to bottom" without caring about the mark they are going to have an incorrect order (and there is going to be an error displaying code FFFFFFFF that is fake), especially in the errorlogs with invalid timestamps this could be a mess

Also, i mentioned it because im not sure if what i wrote about this special mark for the loop start/end is correct, is mostly speculation

We need a new windows executable like the one @M4j0r pulled down (which used his code). It was more accessible to people willing to at least solder 2 wires to the motherboard. Lots of people preferred that option. It just needed fixed to display all 32 errors and have built in controls for other commands. Or perhaps be set by default to return them. Maybe even it can help with gaining internal access, IDK. I'll bet @M4j0r is working on it, otherwise why would he pull the other one down?

M4j0r · Jan 17, 2022

RIP-Felix said:
We need a new windows executable like the one @M4j0r pulled down (which used his code). It was more accessible to people willing to at least solder 2 wires to the motherboard. Lots of people preferred that option. It just needed fixed to display all 32 errors and have built in controls for other commands. Or perhaps be set by default to return them. Maybe even it can help with gaining internal access, IDK. I'll bet @M4j0r is working on it, otherwise why would he pull the other one down?

I didn't pull it down because it uses my code. It violates the license under which I released it - since it's released on the wiki you have to follow the GNU Free Documentation License Version 1.2 (the version on twitter is proprietary). I don't care what people do with the stuff I release, but I do care about the wiki. Everyone should handle the code which is released there under the GFDL as LGPL licensed, it's not explicitly stated but it makes handling the lifecycle of the application using it way easier.
I shared the method on how to use the script (and freely release your work), without conflicting with the license, multiple times:

Save this to a file: https://www.psdevwiki.com/ps3/Talk:..._squeasy_way_.28UART.29_.28CXR.2FCXRF.2FSW.29
In your python file, import the PS3UART class: "from file import PS3UART"
Create an instance of the class: "ps3 = PS3UART(serial port, type)"
Use the auth or command function: ps3.auth() returns a string, ps3.command(command) returns the command output, the format depends on the target type

Note: The script already contains a command console as an example, implemented in the main function. Other examples can be found here: https://www.psdevwiki.com/ps3/Talk:SC_Communication#Example_Scripts
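Building on the steps above, reading all 32 error log slots could look roughly like the sketch below. It only relies on what is described in this post: an object with a `command()` method that returns the reply. `FakeSyscon` is a made-up stand-in so the example runs without hardware; with a real console you would pass in the `PS3UART` instance created as described above (after calling `auth()`). Since the reply format depends on the target type, replies are returned untouched.

```python
def dump_errlog(ps3, slots=32):
    """Send ERRLOG GET 00..1F one command at a time and collect each reply.

    `ps3` is any object with a command(str) method, for example a PS3UART
    instance. Replies are stored exactly as returned.
    """
    replies = {}
    for slot in range(slots):
        cmd = f"ERRLOG GET {slot:02X}"
        replies[cmd] = ps3.command(cmd)
    return replies


class FakeSyscon:
    """Hypothetical stand-in so the sketch can run without a console attached."""

    def command(self, cmd):
        # Mimics an empty slot as seen elsewhere in this thread.
        return "00000000 FFFFFFFF FFFFFFFF"


if __name__ == "__main__":
    for cmd, reply in dump_errlog(FakeSyscon()).items():
        print(cmd, "->", reply)
```

Sending one command and waiting for its reply before sending the next also keeps each reply paired with the slot it belongs to.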

RIP-Felix · Jan 17, 2022

Okay @vyktormvmpay25 I redid all the ohm tests. I cleaned all the RSX's I have off (except 40nm #2, which was NOS and preballed, but which I managed to lose pads/balls on...not sure. Didn't feel like cleaning it until I decide if those pads are needed or not.)

Here are the results. I decided to let the meter auto range, because if I did it manually the numbers are different depending on the significant digit selected and auto made it much easier anyway. All the measurements were comparable on my meter at least. So I feel the auto range did fine.

RIP-Felix · Jan 17, 2022

M4j0r said:
I didn't pull it down because it uses my code. It violates the license under which I released it - since it's released on the wiki you have to follow the GNU Free Documentation License Version 1.2 (the version on twitter is proprietary). I don't care what people do with the stuff I release, but I do care about the wiki. Everyone should handle the code which is released there under the GFDL as LGPL licensed, it's not explicitly stated but it makes handling the lifecycle of the application using it way easier.
I shared the method on how to use the script (and freely release your work), without conflicting with the license, multiple times:

Save this to a file: https://www.psdevwiki.com/ps3/Talk:..._squeasy_way_.28UART.29_.28CXR.2FCXRF.2FSW.29

In your python file, import the PS3UART class: "from file import PS3UART"

Create an instance of the class: "ps3 = PS3UART(serial port, type)"

Use the auth or command function: ps3.auth() returns a string, ps3.command(command) returns the command output, the format depends on the target type

Note: The script already contains a command console as an example, implemented in the main function. Other examples can be found here: https://www.psdevwiki.com/ps3/Talk:SC_Communication#Example_Scripts

Sorry if I insinuated you took it down unnecessarily. That wasn't my intention. Thank you for clarifying.

My point was there needs to be another executable program, so the SYSCON is more accessible to non-coders. Seems like an easy first program for someone wanting to dip their toes in, but for someone experienced it should be a breeze. Poll request?

vyktormvmpay25 · Jan 17, 2022

So you get around 20k PLL, I get 1,2k~1,40k ohms. It gives that 20k when I twist probes.
Anyway got around 2 90nm measurements.
Not many of 65 because I often bought slim and were fixed. Few 40nm on table are in that special glod condition.
Didn't keep the rest of the faulty ones around too much time.

RIP-Felix · Jan 17, 2022

vyktormvmpay25 said:
So you get around 20k PLL, I get 1,2k~1,40k ohms. It gives that 20k when I twist probes.
Anyway got around 2 90nm measurements.
Not many of 65 because I often bought slim and were fixed. Few 40nm on table are in that special glod condition.
Didn't keep the rest of the faulty ones around too much time.

Yeah, getting a pretty consistent 20-22K on the 90nm PLL test pads shown. It was better after cleaning the RSX; I think flux residue and old solder were getting in the way before. I don't like twisting probes on these RSX's, because the pads/balls are easy to tear off. Much better to clean first, so you don't have to apply much pressure during the Ohm tests.

I can now definitively say that you must test all the voltage lines to know if an RSX is good or bad. 90nm #1 tested fine in all but the shorted out VDDQ (which I could tell you just by looking at the shorting pads, but it's not always that obvious). You can't just test VDDC and assume it's good! They all tested okay on the core.

Killuminati · Jan 17, 2022

Thanks for your replies guys

and sorry for disturbing your more advanced analysis

I did the ERRLOG GET 00 to 1F in external mode (thought external mode would be enough for just the error codes)
I am getting the same results as before (and in the windows program as well):

Code:

XX@XX-MBP ps3syscon-master % python ps3_syscon_uart_script.py /dev/tty.usbserial-A10L97SK CXR
>$ auth
Auth successful
>$ ERRLOG GET 00
ERRLOG GET 01
ERRLOG GET 02
ERRLOG GET 03
ERRLOG GET 04
ERRLOG GET 05
ERRLOG GET 06
ERRLOG GET 07
ERRLOG GET 08
ERRLOG GET 09
ERRLOG GET 0A
ERRLOG GET 0B
ERRLOG GET 0C
ERRLOG GET 0D
ERRLOG GET 0E
ERRLOG GET 0F
ERRLOG GET 10
ERRLOG GET 11
ERRLOG GET 12
ERRLOG GET 13
ERRLOG GET 14
ERRLOG GET 15
ERRLOG GET 16
ERRLOG GET 17
ERRLOG GET 18
ERRLOG GET 19
ERRLOG GET 1A
ERRLOG GET 1B
ERRLOG GET 1C
ERRLOG GET 1D
ERRLOG GET 1E
ERRLOG GET 1F
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
>$ 00000000 FFFFFFFF FFFFFFFF
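A quick way to sanity-check output like the above is to parse each reply line and count how many slots actually hold a code. In this Python sketch the meaning of the three hex words (result code, stored error code, timestamp) is an assumption and is not confirmed anywhere in this thread; the function names are made up. Under that assumption, every slot here reads FFFFFFFF, which would be consistent with an empty or cleared log.

```python
def parse_errlog_reply(line):
    """Parse one reply such as '>$ 00000000 FFFFFFFF FFFFFFFF'.

    Assumption: the three hex words are a result code, the stored error
    code and a timestamp, in that order.
    """
    words = line.replace(">$", "").split()
    if len(words) != 3:
        raise ValueError(f"unexpected reply: {line!r}")
    result, code, stamp = (int(word, 16) for word in words)
    return {
        "result": result,
        "code": code,
        "time": stamp,
        "empty": code == 0xFFFFFFFF,   # treated as an unused slot
    }


def summarize(lines):
    """Return (total_slots, used_slots) for a list of reply lines."""
    entries = [parse_errlog_reply(line) for line in lines if line.strip()]
    used = [entry for entry in entries if not entry["empty"]]
    return len(entries), len(used)


if __name__ == "__main__":
    replies = [">$ 00000000 FFFFFFFF FFFFFFFF"] * 32
    total, used = summarize(replies)
    print(f"{used} of {total} slots contain an error code")
```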

I tried to enter internal mode but I'm stuck at changing the checksum. It's a bit advanced for me so I thought it's better to ask before changing anything.

My output at eepcsum is the following:

Code:

eepcsum
Addr:0x000032fe should be 0x56b6
Addr:0x000034fe should be 0x985b
sum:0x0100
Addr:0x000039fe should be 0x7350
Addr:0x00003dfe should be 0x00ff
sum:0xfffe
Addr:0x00003ffe should be 0xffff0101

r 3900 100 gives me:

Code:

r 3900 100
+0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +A +B +C +D +E +F
-----------------------------------------------
FF BF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 
40 50 21 FF FF FF FF FF FF FF FF FF FF FF FF FF 
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 
FF FF FF 00 FF FF FF FF FF FF FF FF FF FF FF FF 
FF 03 C8 78 FF FF FF FF FF FF FF FF FF FF FF FF 
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 
02 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF 
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 
FF FF FF FF FF FF FF FF FF FF FF FF FF FF 50 74

What should I change? Don't want to mess up anything

Thanks so much for your replies
