Recovering a Bricked Rover MEMS3 ECU – Without Opening the Case
Download
Link: https://andrewrevill.co.uk/Downloads/MEMS3Tools.zip
NOTE: THIS DOCUMENTATION IS UP TO DATE AS OF VERSION 4.90 RELEASE OF
THE MEMS3 TOOLS APPLICATION SUITE.
In the latest releases of my MEMS3 Mapper tool I have gone to
great lengths to try to ensure that ECUs cannot be accidentally bricked by
writing bad firmware or map to the ECU. However, there are some things that I
just cannot protect against, for example the user manually editing the memory
bytes in the hexadecimal editor to produce something that is not valid, or
writing a firmware to hardware that just cannot support it. The small but
non-zero risk of bricking an ECU has always bothered me. Whilst researching
these ECUs I’ve bricked many, and the only way to recover them so far has been
to open them up, de-solder the main EEPROM chip using hot air, reprogram the
EEPROM with valid firmware and map using an EEPROM programmer on the bench and
then solder the chip back in, which is tedious and time-consuming and
not exactly good for the board and other components.
I always suspected that the engineers who designed it would have
left a back door in somewhere to allow a bricked ECU to be recovered.
Bricked ECUs
In addition to the firmware and map, the EEPROM chip contains a
boot loader program. When a brand new virgin ECU is supplied, the boot loader is
all that is present. The boot loader has sufficient functionality to allow the
ECU to communicate with a programmer and load the firmware and map. The boot
loader generally checks to see whether valid firmware and map are present and
if so, shortly after the ECU boots it transfers control straight to the
firmware. The boot loader initialises watchdog timers which reboot the ECU if
the firmware appears to have hung up, and what generally seems to happen in the
case of a bricked ECU is that the boot loader sets up the watchdogs, transfers
control to the firmware which fails to operate normally, the watchdogs detect
this and reboot the ECU which then just repeats the same cycle. The ECU is
stuck in an indefinite boot loop. You usually see the fuel pump starting to
prime then being switched off again several times a second forever. Because the
ECU is constantly rebooting, you never get the chance to establish stable
communications with it so cannot even begin to reprogram it with valid firmware
and map.
The boot loader code is protected and permanent. It is loaded into
the first sector of the EEPROM chip at manufacture and the ECU does not appear
to provide any method to erase or modify it. Certainly my MEMS3 Mapper and
Flasher tools never touch the boot loader in an ECU. This means that whatever
we do to the firmware and map, however badly we screw those up to brick the
ECU, the boot loader will still be clean and valid and just the way it was the
day the ECU was manufactured.
If only I could find some way to prevent the boot loader from
transferring control to the firmware, as though no firmware had been loaded …
Back Door Search
I decided to search for a possible back door by working through a
disassembly of the boot loader code. Now I’ve spent quite a long time looking
at disassembled code from the ECU in the past and I have to say it’s not been
anywhere near as successful as I would have hoped. The code is very opaque and
difficult to get into. But I thought that if I was searching for the answer to
a very specific question I might stand a chance, and in this case I had a few
pointers to help me get started. For example, once the boot loader jumps into
the firmware, the firmware mostly executes as a separate stand-alone program.
The boot loader should be independent of the firmware, so there are very few
references to firmware addresses in the boot loader code and any references
that I did find were likely to be related to the process of checking for loaded
firmware and launching the firmware.
Back Door Analysis
The analysis below is based on a disassembly of the VVC 160 ECU
NNN000160’s boot loader “bootp033” but the code varies very little across
different boot loaders.
Subroutine $001007CA is
responsible for jumping into the firmware (at addresses $001007D8, $001007DE and
$001007E2). It loads the address of
the firmware $110000 into register a0, offsets that by 4 bytes and reads the
vector at that address to put the firmware entry point into register a0, then
jumps to the address that register a0 points to:
ROM:001007CA
ROM:001007CA ; =============== S U B R O U T I N E =======================================
ROM:001007CA
ROM:001007CA
ROM:001007CA sub_1007CA: ; CODE XREF: ROM:00100622↑p
ROM:001007CA btst #0,d1
ROM:001007CE bne.s loc_1007E6
ROM:001007D0 cmpi.l #0,d0
ROM:001007D6 bne.s loc_1007E6
ROM:001007D8 movea.l #dword_110000,a0
ROM:001007DE movea.l 4(a0),a0
ROM:001007E2 jmp (a0)
ROM:001007E4 ; ---------------------------------------------------------------------------
ROM:001007E4 bra.s locret_1007EA
ROM:001007E6 ; ---------------------------------------------------------------------------
ROM:001007E6
ROM:001007E6 loc_1007E6: ; CODE XREF: sub_1007CA+4↑j
ROM:001007E6 ; sub_1007CA+C↑j
ROM:001007E6 bsr.w sub_1007F8
ROM:001007EA ; ---------------------------------------------------------------------------
ROM:001007EA
ROM:001007EA locret_1007EA: ; CODE XREF: sub_1007CA+1A↑j
ROM:001007EA rts
ROM:001007EA ; End of function sub_1007CA
ROM:001007EA
Before executing the firmware, it performs a couple of tests.
Specifically it tests that both bit 0 of register d1 is 0 (at addresses $001007CA and $001007CE) and register d0 contains 0 ($001007D0 and $001007D6)
when the subroutine is called. The firmware is only executed if both of these
tests are passed.
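As a reading aid, the two tests in the listing above can be paraphrased in Python. This is only a sketch of the branch logic: the function name and the returned strings are illustrative, and what sub_1007F8 does on the failing path is not shown in the listing, so it is represented here simply as "firmware not started".

```python
def boot_decision(d0: int, d1: int) -> str:
    """Illustrative paraphrase of the tests at $001007CA-$001007D6."""
    if d1 & 0x01:
        # btst #0,d1 / bne.s loc_1007E6
        return "firmware not started"
    if d0 != 0:
        # cmpi.l #0,d0 / bne.s loc_1007E6
        return "firmware not started"
    # movea.l #$110000,a0 / movea.l 4(a0),a0 / jmp (a0)
    return "jump to firmware"


if __name__ == "__main__":
    for d0, d1 in [(0, 0), (0, 1), (1, 0)]:
        print(f"d0={d0} d1={d1}: {boot_decision(d0, d1)}")
```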
Subroutine $001007CA
above is called right at the end of the main entry point routine of the boot
loader. Before calling $001007CA,
that routine calls the other boot loader subroutines $00100628,
$00100708, $00100740 and $001007A2:
ROM:00100612
ROM:00100612 loc_100612: ; CODE XREF: ROM:00100576↑j
ROM:00100612 bsr.w sub_100628
ROM:00100616 bsr.w sub_100708
ROM:0010061A bsr.w sub_100740
ROM:0010061E bsr.w sub_1007A2
ROM:00100622 bsr.w sub_1007CA
ROM:00100626
ROM:00100626 locret_100626: ; CODE XREF: ROM:00100610↑j
ROM:00100626 rts
ROM:00100628
Subroutine $00100740
checks for the $5AA5 signatures of the firmware (at firmware address $110410, in code starting at address $00100752) and map (at map address $13C012, in code starting at address $0010075E), and sets bits in register d0
if they are not found. (You can look at these
addresses in any valid firmware and map using MEMS3 Mapper and see that they do
indeed always contain the signature bytes $5AA5.) So the check for register d0 being 0 amounts to a check
that firmware and map are loaded.
ROM:00100740
ROM:00100740 ; =============== S U B R O U T I N E =======================================
ROM:00100740
ROM:00100740
ROM:00100740 sub_100740: ; CODE XREF: ROM:0010061A↑p
ROM:00100740 movea.l #dword_110000,a0
ROM:00100746 clr.l d0
ROM:00100748 cmpi.w #0,d7
ROM:0010074C bne.s loc_100752
ROM:0010074E bset #1,d0
ROM:00100752
ROM:00100752 loc_100752: ; CODE XREF: sub_100740+C↑j
ROM:00100752 cmpi.w #$5AA5,$410(a0)
ROM:00100758 beq.s loc_10075E
ROM:0010075A bset #0,d0
ROM:0010075E
ROM:0010075E loc_10075E: ; CODE XREF: sub_100740+18↑j
ROM:0010075E movea.l #word_13C000,a0
ROM:00100764 cmpi.w #$5AA5,$12(a0)
ROM:0010076A beq.s loc_100770
ROM:0010076C bset #2,d0
ROM:00100770
ROM:00100770 loc_100770: ; CODE XREF: sub_100740+2A↑j
ROM:00100770 movea.l #unk_110400,a1
ROM:00100776 movea.l #word_13C00A,a0
ROM:0010077C move.b #0,d2
ROM:00100780 bra.s loc_100786
ROM:00100782 ; ---------------------------------------------------------------------------
ROM:00100782
ROM:00100782 loc_100782: ; CODE XREF: sub_100740+5E↓j
ROM:00100782 addi.b #1,d2
ROM:00100786
ROM:00100786 loc_100786: ; CODE XREF: sub_100740+40↑j
ROM:00100786 cmpi.b #7,d2
ROM:0010078A bgt.s locret_1007A0
ROM:0010078C move.b (a1),d1
ROM:0010078E cmp.b (a0),d1
ROM:00100790 beq.s loc_100796
ROM:00100792 bset #3,d0
ROM:00100796
ROM:00100796 loc_100796: ; CODE XREF: sub_100740+50↑j
ROM:00100796 adda.w #1,a0
ROM:0010079A adda.w #2,a1
ROM:0010079E bra.s loc_100782
ROM:001007A0 ; ---------------------------------------------------------------------------
ROM:001007A0
ROM:001007A0 locret_1007A0: ; CODE XREF: sub_100740+4A↑j
ROM:001007A0 rts
ROM:001007A0 ; End of function sub_100740
ROM:001007A0
ROM:001007A2
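The listing above can be paraphrased as a small Python function that computes the same flag bits from a memory image. This is a sketch under assumptions rather than a definitive model: the image is assumed to be a flat dump whose first byte sits at address `base`, the meaning of register d7 (tested for bit 1) is not explained by the listing, and the 8-byte comparison that sets bit 3 appears to match every second byte from $110400 against consecutive bytes from $13C00A, but its purpose is my guess only. All Python names are illustrative.

```python
FIRMWARE_BASE = 0x110000
MAP_BASE = 0x13C000
SIGNATURE = 0x5AA5


def read_u16(image: bytes, base: int, address: int) -> int:
    """Read a big-endian 16-bit word (the 68k family is big-endian)."""
    offset = address - base
    return int.from_bytes(image[offset:offset + 2], "big")


def check_flags(image: bytes, base: int, d7: int) -> int:
    """Return the value left in d0 by a paraphrase of sub_100740."""
    d0 = 0
    if (d7 & 0xFFFF) == 0:                  # cmpi.w #0,d7 (meaning unknown)
        d0 |= 1 << 1
    if read_u16(image, base, FIRMWARE_BASE + 0x410) != SIGNATURE:
        d0 |= 1 << 0                        # firmware signature missing
    if read_u16(image, base, MAP_BASE + 0x12) != SIGNATURE:
        d0 |= 1 << 2                        # map signature missing
    for i in range(8):                      # d2 counts 0..7
        fw_byte = image[FIRMWARE_BASE + 0x400 + 2 * i - base]  # a1 steps by 2
        map_byte = image[MAP_BASE + 0x0A + i - base]           # a0 steps by 1
        if fw_byte != map_byte:
            d0 |= 1 << 3
    return d0


if __name__ == "__main__":
    base = 0x100000                         # assumed start of the dump
    image = bytearray(0x40000)
    image[0x10410:0x10412] = b"\x5a\xa5"    # firmware signature
    image[0x3C012:0x3C014] = b"\x5a\xa5"    # map signature
    print(bin(check_flags(bytes(image), base, d7=1)))
```

A result of 0 corresponds to the "firmware and map are loaded" case that allows sub_1007CA to jump into the firmware.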
So we now know that the firmware will be executed if both firmware
and map are loaded and bit 0 of register d1 is 0 when subroutine $001007CA is called. We also know from
previous experience that when an ECU does not have firmware or map loaded, it
happily runs the boot loader and allows normal programming over OBDII using the
MEMS3 Mapper application. Looking at the code, I would expect identical
behaviour if the other test failed, i.e. if bit 0 of register d1 is 1 when
subroutine $001007CA is called. So
if we can find a way of setting this bit at this point, we should have a way of
preventing the ECU from executing the firmware and making it accept programming just
as though it were a virgin ECU, and that would be the
back door we were looking for.
So what is bit 0 of register d1 doing?
Subroutine $001007A2,
which is called in the boot loader immediately after the signature checking
routine $00100740 above and immediately before $001007CA, manipulates
bit 0 of d1, and so it determines the other requirement for the firmware to be
executed. It’s a very simple subroutine which sets bit 0 of d1 entirely based
on registers of the QSM (Queued Serial Module) in the MC68336 microcontroller:
ROM:001007A2
ROM:001007A2 ; =============== S U B R O U T I N E =======================================
ROM:001007A2
ROM:001007A2
ROM:001007A2 sub_1007A2: ; CODE XREF: ROM:0010061E↑p
ROM:001007A2 btst #6,($FFFFFC0D).w
ROM:001007A8 beq.s loc_1007C4
ROM:001007AA cmpi.b #$96,($FFFFFC0F).w
ROM:001007B0 bne.s loc_1007C4
ROM:001007B2 bset #0,d1
ROM:001007B6 bset #0,($FFFFFC0B).w
ROM:001007BC bclr #0,($FFFFFC0B).w
ROM:001007C2 bra.s locret_1007C8
ROM:001007C4 ; ---------------------------------------------------------------------------
ROM:001007C4
ROM:001007C4 loc_1007C4: ; CODE XREF: sub_1007A2+6↑j
ROM:001007C4 ; sub_1007A2+E↑j
ROM:001007C4 bclr #0,d1
ROM:001007C8
ROM:001007C8 locret_1007C8: ; CODE XREF: sub_1007A2+20↑j
ROM:001007C8 rts
ROM:001007C8 ; End of function sub_1007A2
ROM:001007C8
ROM:001007CA
If bit 6 of QSM register SCSR is 0 (tested at addresses $001007A2 and $001007A8) then bit 0 of register d1 is cleared to 0 (at address $001007C4). Otherwise, if the low byte
of QSM register SCDR is not $96 (tested at addresses $001007AA and $001007B0),
bit 0 of register d1 is also cleared to 0 (at address $001007C4). Otherwise bit 0 of d1 is set to 1 (at address $001007B2) and bit 0 of QSM register SCCR1
is toggled to 1 then 0 (at addresses $001007B6 and $001007BC). The subroutine then exits in all cases.
Now the SCSR register is the SCI Status Register and SCDR register
is the SCI Data Register, and the SCI is the Serial Communications Interface
which is part of the QSM. It is basically a standard UART serial port. SCSR
tells us about the current status of port operation and SCDR holds the most
recent byte received. From the MC68336 data sheet, Bit 6 of the status register
SCSR is the RDRF flag, or Receive Data Register Full – it tells us if there’s
something in the data register SCDR.
So the condition which determines that a loaded firmware should be
executed is simply that no byte has been received by the SCI port, or that the
byte received was not $96.
In other words, we can prevent any loaded firmware being executed
by sending the byte $96 to the SCI port at the right moment.
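The effect of sub_1007A2 on bit 0 of d1 can be summarised in a few lines of Python. This is a sketch of the logic only: the function and constant names are illustrative, the two arguments stand for the low bytes of SCSR and SCDR as read by the boot loader, and the toggling of bit 0 of SCCR1 on the matching path is deliberately left out.

```python
RDRF = 1 << 6            # bit 6 of the SCSR low byte: Receive Data Register Full
BACK_DOOR_BYTE = 0x96    # value compared against the SCDR low byte


def update_d1(d1: int, scsr_low: int, scdr_low: int) -> int:
    """Return d1 with bit 0 set only if a byte was received and it was $96."""
    if (scsr_low & RDRF) and scdr_low == BACK_DOOR_BYTE:
        return d1 | 0x01     # bset #0,d1: firmware will not be started
    return d1 & ~0x01        # bclr #0,d1: firmware may be started


if __name__ == "__main__":
    print(update_d1(0, scsr_low=0x40, scdr_low=0x96))   # byte $96 received
    print(update_d1(1, scsr_low=0x00, scdr_low=0x96))   # nothing received
    print(update_d1(1, scsr_low=0x40, scdr_low=0x55))   # some other byte
```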
That leaves three questions:
1) What is the right moment?
2) What is the SCI port connected to?
3) What Baud rate should we transmit the $96 at?
The answer to question 1 is clearly “as the ECU boots”, but
exactly when during the boot sequence is not easy to determine, so I decided to
try the approach of just broadcasting the byte $96 continuously as the ECU was
powered on. I couldn’t be sure this would work but it was certainly worth a
try.
The answer to question 2 seemed likely to be the OBDII serial
port. My guess was that SCI was just connected to the OBDII K-Line as this was
used for serial UART communications with the ECU. It would be extremely
convenient if this turned out to be the case as I could provide a utility
within MEMS3 Mapper that just broadcast $96 to the OBDII port while you powered
the ECU on. Again, I couldn’t be sure
this was the case without a lot more digging in the firmware disassembly or
tracing the electronics, but the easiest thing was to give it a try.
As for question 3, one of the subroutines called immediately prior
to those discussed above by the boot loader was $00100628, and that contained the following lines:
ROM:0010069E
ROM:001006A0 ; ---------------------------------------------------------------------------
ROM:001006A0
ROM:001006A0 loc_1006A0: ; CODE XREF: ROM:0010051A↑j
ROM:001006A0 move.w #$34,($FFFFFC08).w ; '4'
ROM:001006A6 move.w #$C,($FFFFFC0A).w
ROM:001006AC bra.l loc_100520
ROM:001006B2 ; ---------------------------------------------------------------------------
ROM:001006B2
These load the SCCR0 register of the QSM with the value $34 (at
address $001006A0). SCCR0 is the SCI
Control Register 0 and controls the Baud rate at which the SCI port operates,
according to the formula Baud Rate = (Clock Frequency) / 32 / SCCR0. So with a
system clock frequency of 16 MHz and an SCCR0 value of $34 or 52 decimal, the
Baud rate was configured to 16000000 / 32 / 52 = 9615, which is as close as you
can get to the standard OBDII Baud rate of 9600 Baud. This provided even more
evidence that I was on the right lines thinking that the SCI port was actually
the OBDII K-Line.
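The arithmetic above is easy to check with a few lines of Python. This is just a worked example of the formula as quoted; the function name is illustrative, and the 16 MHz clock is the figure assumed in the text.

```python
def sci_baud_rate(clock_hz: int, sccr0: int) -> float:
    """Baud Rate = (Clock Frequency) / 32 / SCCR0, as quoted above."""
    return clock_hz / (32 * sccr0)


actual = sci_baud_rate(16_000_000, 0x34)        # SCCR0 = $34 = 52 decimal
error_pct = (actual - 9600) / 9600 * 100
print(f"{actual:.1f} Baud, {error_pct:+.2f}% from 9600")
# prints "9615.4 Baud, +0.16% from 9600"
```

For comparison, the neighbouring divisor values 51 and 53 give roughly 9804 and 9434 Baud, so 52 is indeed the closest available setting.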
So my plan was to broadcast $96 at 9600 Baud over the OBDII port
while booting up the ECU and … rather to my surprise, it worked! That’s the back door I was looking for.
So that provides a failsafe way of being able to recover a bricked ECU. I’ve
tested using it to prevent a perfectly healthy ECU from executing the firmware
- the ECU just sits in boot / programming / recovery mode in the boot loader
waiting to be programmed. I’ve tested communications with the ECU after
blocking the firmware with this signal and all seems normal. Finally I’ve used
it to unbrick a development ECU that got bricked while I was playing with it. I’m
pretty much 100% sure that this method will unbrick any MEMS3 ECU that has bad
firmware or map as the boot loader code is protected and always virgin. It
doesn’t matter how the ECU was bricked, whether it was using MEMS3 Flasher /
Mapper or other programming tools such as Galletto, this method provides a safe
recovery option.
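For readers who want to experiment outside the MEMS3 tools, the broadcast described above can be sketched in Python. This is a minimal illustration and not the code used by MEMS3 Mapper: the function name, the 10 ms interval and the duration are my own assumptions, and `port` is any object with a `write(bytes)` method that has already been opened at 9600 Baud on an adapter connected to the K-Line.

```python
import time

BACK_DOOR_BYTE = b"\x96"


def broadcast_back_door(port, duration_s: float = 10.0,
                        interval_s: float = 0.01) -> int:
    """Write $96 to `port` repeatedly for `duration_s` seconds.

    Returns the number of bytes written. The interval is a guess; the
    aim is simply that a $96 is on the line whenever the ECU boots.
    """
    sent = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        port.write(BACK_DOOR_BYTE)
        sent += 1
        time.sleep(interval_s)
    return sent


class FakePort:
    """Stand-in for a real serial port, used here only for demonstration."""

    def __init__(self) -> None:
        self.written = []

    def write(self, data: bytes) -> int:
        self.written.append(data)
        return len(data)


if __name__ == "__main__":
    fake = FakePort()
    count = broadcast_back_door(fake, duration_s=0.05, interval_s=0.001)
    print(f"sent {count} bytes")
```

With the third-party pyserial package installed, a real port object could be created with something like `serial.Serial("COM3", 9600)`, where the port name is hypothetical and depends on your adapter. Power the ECU on while the broadcast is running.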
Recover Bricked ECU
In release 4.87 of MEMS3 Flasher and MEMS3 Mapper I’ve now added
an option to the ECU Tools menu to Recover Bricked ECU. When you select
this option, the application will begin to broadcast the request code at
regular intervals and monitor for the expected response. If the ECU is stuck in
a boot loop, constantly rebooting, it should recover almost immediately as the
next time it reboots it will detect the request and remain in the boot loader
ready to accept programming. If the response is not detected within a short
time, the following dialog is displayed:
If the ECU is hanging and unable to communicate, we need to
persuade it to reboot. Switching the ignition off for at least 15 seconds (and
then back on again) is usually sufficient to ensure that it will perform a full
boot when powering on again. At this point it should again remain in the boot
loader ready to accept programming as above. If the ECU was really tightly
stuck in a loop, then just occasionally switching the ignition off is not
sufficient to break it out of the loop and trigger a reboot. In this case, you
need to briefly remove power from the ECU. When power is re-applied and the
ignition is turned on it will boot again, detect the request and remain in the
boot loader ready to accept programming as above.
Depending on the timing of the request code, it may occasionally
require more than one attempt to recover an ECU.
When the expected response is detected, the following dialog is
displayed:
Once the ECU has recovered into a state where it is running the
boot loader and willing to communicate with the application again it is
important that you then write a good replacement firmware, map or both
(depending on what was damaged, if in doubt do both). The initial recovery is
only temporary; we have broken the ECU out of the cycle that was preventing it
from communicating and accepting programming using the special code, but next
time it reboots without seeing the request broadcast it will of course continue
to try to execute the damaged firmware or map as before. In recovery mode, some
of the ECU tools and operations will fail (the ECU will reject the operations
as it will not be running the firmware) and it will not provide any engine
management functionality, but everything necessary to read and write the
firmware and map will function normally.
Once you have rewritten the ECU the repair will be permanent.