MEMS3 Debug Code Injection

MOTIVATION

As part of my ongoing quest to understand the internal workings of the MEMS3 ECU, I have been working through disassembled code, slowly trying to unpick more and more of the key routines and to use the understanding gained to decipher what the various tables and scalar values found in the maps actually mean. In my day job as a software developer, I would usually be able to inspect the internal workings of code I had written by running it under a debugger. This would allow me to place breakpoints at arbitrary code locations and pause the code execution at those points to inspect variable values etc. Sometimes it is not possible to run the code under a debugger, and in those instances we usually have to resort to inserting debug code into the software to write out debugging information to a file, to the console or to another program. Although this approach is not as flexible as live execution under a debugger and takes more time, it is still enormously useful, and with patience it is usually possible to extract all of the information you need to understand a particular situation.

So far with MEMS3 I have just been looking at disassembled source code in text form. It occurred to me that it would be so much better if I could actually debug a live running MEMS3 while it was running an engine.

Now I can’t see any practical way of executing the code on a live running engine on the road under a debugger in this situation, but it occurred to me that the second method above might just be doable …

OVERALL PLAN

The overall plan was to provide a mechanism that allowed me to inject debug code into the firmware code at any point (or almost any point as there will be some restrictions), and to have that debug code be able to log out details of processor registers, memory variable contents etc. to the OBDII port where I could monitor it with a terminal program running on a PC or laptop connected through a simple OBDII serial cable. I would need to be able to inject code at an arbitrary number of points simultaneously and I would need to be able to specify what information I wanted to log out at each point.

There’s no way that wholesale modifications could be made to MEMS3 code and recompiled into a running state without access to the original development environments. But it seemed that it should be possible to write some small routines and patches directly in 68000 assembly language, or even generate them dynamically directly in 68000 opcodes, and then patch these into the existing code. My idea was that I could replace a few bytes at the point I wanted to debug with a jump into a debugging routine. This would add the required debugging information to the buffer to be written out to the OBDII port, then execute the instructions that had been overwritten and finally jump back to the next instruction in the original routine. This would place a few small restrictions on the points at which debug code could be injected:

·        I could only insert debug code at the first byte of a machine code instruction.

·        A JMP (permanent jump) or JSR (call to a subroutine) instruction to our code would occupy 6 bytes, so 6 consecutive bytes would get overwritten.

·        Those 6 bytes may contain part of the next instruction. I would have to consider all operations at least partially contained within the 6 bytes to have been overwritten.

·        The instructions overwritten must be capable of being executed at a different address. So short branches, access to memory locations relative to the program counter etc. would not be permitted.

None of these restrictions are so onerous as to make this not worth doing.
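
The restrictions above can be checked mechanically before choosing an injection point. The following is only a minimal Python sketch of that check, not the tooling actually used here: the function names are hypothetical, and the example addresses and instruction lengths are read off the firmware main entry point disassembly. The only hard facts relied on are that JMP to an absolute long address is encoded as the opcode word $4EF9 followed by the 32-bit target.

```python
JMP_ABS_LONG = 0x4EF9   # 68000 opcode word for JMP (xxx).L
PATCH_LENGTH = 6        # opcode word plus a 32-bit absolute address


def jump_bytes(target):
    """Encode JMP (target).L as the six bytes that overwrite the original code."""
    return JMP_ABS_LONG.to_bytes(2, "big") + target.to_bytes(4, "big")


def displaced_instructions(listing, patch_address):
    """Return the (address, length) entries wholly or partly overwritten.

    `listing` holds (address, length_in_bytes) pairs read off a disassembly.
    """
    if patch_address not in {address for address, _ in listing}:
        raise ValueError("patch must start on the first byte of an instruction")
    patch_end = patch_address + PATCH_LENGTH
    return [(address, length) for address, length in sorted(listing)
            if address < patch_end and address + length > patch_address]


def resume_address(listing, patch_address):
    """Address of the first original instruction left intact after the patch."""
    address, length = displaced_instructions(listing, patch_address)[-1]
    return address + length


# Hypothetical usage: four consecutive instructions from the firmware entry code.
listing = [(0x1156B0, 6), (0x1156B6, 4), (0x1156BA, 6), (0x1156C0, 4)]
print([hex(a) for a, _ in displaced_instructions(listing, 0x1156B6)])
print(hex(resume_address(listing, 0x1156B6)))
```

In this example a 6-byte jump placed on the 4-byte instruction at $1156B6 also clips the start of the following instruction, so both would have to be re-executed by the debug routine before it jumps back. Whether the displaced instructions are position independent still has to be judged by eye.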

In order to do this I needed to find some EEPROM addresses where I could safely write my injected program code and some RAM addresses I could safely use for variables, buffers etc.

EEPROM

The firmware EEPROM address space runs from $110000 to $139FFF. The actual size of the firmware code varies from one configuration to another but it typically ends around $1357FF, leaving addresses $135800 to $139FFF unused. This gives a contiguous block of 18 kilobytes of available EEPROM space. The last few bytes of this range are used by MEMS3Mapper to perform checksum corrections, but I think I am fairly safe to assume that addresses around $139000 onwards will otherwise be unused, and this gives me close to 4 kilobytes to play with, which is a lot more than I will need.

RAM

There are two modules within the 68336 microcontroller that provide RAM, namely SRAM (4 kilobytes of Standby RAM) and TPURAM (3.5 kilobytes of TPU ROM Emulation RAM). SRAM has the capability to provide low power standby battery backup, probably not used here. TPURAM has the ability to emulate the ROM in the TPU (Time Processing Unit) for development purposes, although this isn’t used in a running ECU.

Each of these blocks of RAM can be located anywhere within the memory address space. The base address of the SRAM memory is controlled by registers RAMBAH and RAMBAL. The base address of TPURAM memory is controlled by register TRAMBAR.

The main entry point routine of the boot loader handles basic hardware configuration and sets up both of these registers:

ROM:00100478                 move.w  #$10,(Local Boot Variable TPURAM Register TRAMBAR).w

ROM:0010047E                 move.w  #$100,(Local Boot Variable TPURAM Register TRAMMCR).w

ROM:00100484                 move.w  #4,(Local Boot Variable SIM Register CSBAR0).w

ROM:0010048A                 move.w  #$3831,(Local Boot Variable SIM Register CSOR0).w

ROM:00100490                 move.w  #4,(Local Boot Variable SIM Register CSBAR1).w

ROM:00100496                 move.w  #$5831,(Local Boot Variable SIM Register CSOR1).w

ROM:0010049C                 move.w  #$1106,(Local Boot Variable SIM Register CSBAR2).w

ROM:001004A2                 move.w  #$1031,(Variable SIM Register CSOR2).w

ROM:001004A8                 move.w  #$4007,(Local Boot Variable SIM Register CSBAR3).w

ROM:001004AE                 move.w  #$7871,(Local Boot Variable SIM Register CSOR3).w

ROM:001004B4                 move.w  #$5007,(Local Boot Variable SIM Register CSBAR4).w

ROM:001004BA                 move.w  #$7871,(Boot Variable SIM Register CSOR4).w

ROM:001004C0                 move.w  #$FFE0,(Local Boot Variable SIM Register CSBAR5).w

ROM:001004C6                 move.w  #$BB71,(Boot Variable SIM Register CSOR5).w

ROM:001004CC                 move.w  #0,(Local Boot Variable SIM Register CSBAR6).w

ROM:001004D2                 move.w  #0,(Local Boot Variable SIM Register CSOR6).w

ROM:001004D8                 move.w  #0,(Local Boot Variable SIM Register CSBAR7).w

ROM:001004DE                 move.w  #0,(Local Boot Variable SIM Register CSOR7).w

ROM:001004E4                 move.w  #0,(Local Boot Variable SIM Register CSBAR8).w

ROM:001004EA                 move.w  #0,(Local Boot Variable SIM Register CSOR8).w

ROM:001004F0                 move.w  #0,(Local Boot Variable SIM Register CSBAR9).w

ROM:001004F6                 move.w  #0,(Local Boot Variable SIM Register CSOR9).w

ROM:001004FC                 move.w  #0,(Local Boot Variable SIM Register CSBAR10).w

ROM:00100502                 move.w  #0,(Local Boot Variable SIM Register CSOR10).w

ROM:00100508                 move.w  #0,(Local Boot Variable SRAM Register RAMBAL).w

ROM:0010050E                 move.w  #0,(Local Boot Variable SRAM Register RAMBAH).w

ROM:00100514                 move.w  #0,(Local Boot Variable SRAM Register RAMMCR).w

 

Once set, these are never changed. So RAMBAH and RAMBAL are set to 0 and TRAMBAR is set to $10. According to the 68336 User Manual, these values place the SRAM at addresses $0 to $FFF and the TPURAM at addresses $1000 to $1DFF. So in effect, even though coming from two different modules, there is a contiguous block of 7.5 kilobytes of fast RAM at addresses $0 to $1DFF.
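
To make the step from register values to address ranges explicit, here is a small Python sketch. The bit layouts in the comments are my reading of the user manual and should be treated as assumptions (RAMBAH carrying address bits 23..16, RAMBAL carrying bits 15..11, TRAMBAR carrying bits 23..11 in its top thirteen bits); the function names are hypothetical.

```python
SRAM_SIZE = 4 * 1024    # 4 kilobytes of standby RAM
TPURAM_SIZE = 3584      # 3.5 kilobytes of TPU emulation RAM


def sram_base(rambah, rambal):
    # Assumed layout: RAMBAH bits 7..0 = ADDR[23:16], RAMBAL bits 15..11 = ADDR[15:11].
    return ((rambah & 0x00FF) << 16) | (rambal & 0xF800)


def tpuram_base(trambar):
    # Assumed layout: TRAMBAR bits 15..3 = ADDR[23:11], so shifting left by 8
    # moves register bit 3 up to address bit 11.
    return (trambar & 0xFFF8) << 8


def region(base, size):
    """First and last address of a block of `size` bytes starting at `base`."""
    return base, base + size - 1


sram = region(sram_base(0x0000, 0x0000), SRAM_SIZE)
tpuram = region(tpuram_base(0x0010), TPURAM_SIZE)
print([hex(a) for a in sram], [hex(a) for a in tpuram])
```

Under those assumptions the values written by the boot loader reproduce the two ranges quoted above, with the TPURAM starting on the byte immediately after the end of the SRAM.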

I analysed the assembly language code for an NNN000160 (VVC 160 ECU) running ksr3p007 firmware using some software I wrote to produce a list of all RAM addresses in this range accessed by the code. This showed that the variable addresses are mostly closely stacked together (probably automatically by the compiler used) with a few notable exceptions:

·        There’s an apparently unused block of 256 bytes ending at $200.

·        There’s an apparently unused block of 254 bytes ending at $400.

·        There are a few other “holes” where the memory doesn’t seem to be accessed.

·        The variables are otherwise pretty much stacked together up to address $1405.

·        Beyond that the RAM appears to be unused to address $1AFF, a gap of 1787 bytes.

·        Between $1B00 and almost the top of RAM at $1DB3 there appear to be some larger structures as the only addresses referenced are in 32-byte increments.
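
The gap analysis itself is simple once the list of referenced addresses exists. The Python sketch below is not the software mentioned above, just an illustration of the idea with made-up data: `find_gaps` and its parameters are hypothetical names, and it treats each referenced address as a single byte, so a word or long access at the edge of a gap would shrink that gap slightly.

```python
def find_gaps(accessed, ram_start, ram_end, min_size=16):
    """Return (first_unused, last_unused, size) for each unreferenced run.

    `accessed` is any iterable of RAM addresses referenced by the code.
    Runs shorter than `min_size` bytes are ignored.
    """
    in_range = sorted({a for a in accessed if ram_start <= a <= ram_end})
    gaps = []
    previous = ram_start - 1
    # A sentinel one past the end of RAM closes the final gap.
    for address in in_range + [ram_end + 1]:
        size = address - previous - 1
        if size >= min_size:
            gaps.append((previous + 1, address - 1, size))
        previous = address
    return gaps


# Made-up example data, not taken from the real firmware.
sample = [0x00, 0x01, 0x10]
for first, last, size in find_gaps(sample, 0x00, 0x1F, min_size=4):
    print(f"${first:04X}-${last:04X}  {size} bytes")
```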

When the ECU boots its initial behaviour is determined by the boot loader vector table which starts as follows:

ROM:00100000 dword_100000:   dc.l $1400

ROM:00100000 

ROM:00100004                 dc.l Boot Entry Subroutine Boot Loader Main Entry Point

ROM:00100008                 dc.l Boot Copy Vector Subroutine Enter Background Debugging Mode

ROM:0010000C                 dc.l Boot Copy Vector Subroutine Enter Background Debugging Mode

ROM:00100010                 dc.l Boot Copy Vector Subroutine Enter Background Debugging Mode

ROM:00100014                 dc.l $10279C

ROM:00100018                 dc.l $10279C

ROM:0010001C                 dc.l $10279C

 

The first DWORD sets the initial value of the supervisor stack pointer register SP to $1400. The second DWORD is the address of the boot loader main entry point routine, which starts off as follows:

ROM:0010041C ; ---------------------------------------------------------------------------

ROM:0010041C                 cmpa.l  #$1400,sp

ROM:00100422                 beq.s   loc_10043A

ROM:00100424                 cmpi.l  #$12345677,d4

ROM:0010042A                 beq.s   loc_100434

ROM:0010042C                 cmpi.l  #$12345678,d4

ROM:00100432                 bne.s   loc_10043A

ROM:00100434

ROM:00100434 loc_100434:

ROM:00100434                 bra.l   loc_100520

ROM:0010043A ; ---------------------------------------------------------------------------

 

This tests whether the stack pointer is equal to $1400 as the first step. This initially confused me, as the stack pointer would have been set to this value immediately before calling this routine. However, I then noticed that very early on in the initialisation code it does this:

 

ROM:00100526

ROM:00100526 loc_100526:

ROM:00100526                 movea.l #$200,sp

 

This sets the stack pointer to $200. So it looks as though the check for $1400 is to determine whether the routine is being executed from cold after the ECU boots up or whether it has been reset by a jump back to the main entry point (although I haven’t yet found this in code, and I believe the Software Watchdog System would perform a full RESET if the firmware locked up).

 

It’s a very similar story when the boot loader transfers control to the main firmware. There’s a firmware vector table which starts as follows:

 

ROM:00110000 dword_110000:   dc.l $1400

ROM:00110000 

ROM:00110004                 dc.l Entry Subroutine Firmware Main Entry Point

ROM:00110008                 dc.l Copy Vector Subroutine Enter Background Debugging Mode

ROM:0011000C                 dc.l Copy Vector Subroutine Enter Background Debugging Mode

ROM:00110010                 dc.l Copy Vector Subroutine Enter Background Debugging Mode

ROM:00110014                 dc.l $115F22

ROM:00110018                 dc.l $115F22

ROM:0011001C                 dc.l $115F22

 

And the firmware main entry point routine starts off as follows:

 

ROM:0011562E ; =============== S U B R O U T I N E =======================================

ROM:0011562E

ROM:0011562E ; Attributes: noreturn

ROM:0011562E

ROM:0011562E sub_11562E:

ROM:0011562E                 move.l  #dword_110000,d0

ROM:00115634                 movec   d0,vbr

ROM:00115638                 movea.l #$400,sp

ROM:0011563E                 bsr.l   Subroutine sub_119524

ROM:00115644                 clr.l   d0

ROM:00115646                 move.l  #$1B00,d1

ROM:0011564C                 movea.l d0,a0

 

This updates the Vector Base Register to $110000 which is the address of the firmware vector table, then proceeds as for the boot loader, and shortly afterwards does this:

 

ROM:001156B0

ROM:001156B0 loc_1156B0:

ROM:001156B0                 movea.l #$400,sp

 

This sets the stack pointer to $400.

 

I scanned the full assembly code for anywhere that the stack pointer register SP was updated and found only the following lines. The first of these is the line I found above in the boot loader main entry routine, the second one is the line I found in the firmware main entry routine and the third one is also in the firmware and resets the stack pointer to the same address $400.

 

ROM:00100526                 movea.l #$200,sp

ROM:00115638                 movea.l #$400,sp

ROM:001156B0                 movea.l #$400,sp

 

So it appears the 256 byte gap in the RAM addresses accessed ending at $200 is the boot loader stack and the 254 byte gap in the RAM addresses accessed ending at $400 is the firmware stack (on a 68000 family microprocessor, the stack grows downwards to lower addresses so SP is initialised to point to the top of the otherwise unused memory block).

 

So in summary there is a boot loader stack ending at $200, we have a firmware stack ending at $400, packed around these there are variables up to $1405 and then some other structures of mostly 32 and 64 bytes in the last part of RAM from $1B00 onwards. Now $1405 is not a round number and I don’t think it has any special significance; it’s just that the actual physical RAM on the chip has more capacity than the programming of the ECU actually uses, and $1405 is where the used portion ends (with $1B00 onwards being used for some special purpose).

 

This leaves a block of 1787 bytes between $1405 and $1B00 which look as though they are unused and possibly available for me to use for my own purposes.

 

Now all of this is based on ksr3p007 firmware running on NNN000160 hardware and boot loader. It can’t be guaranteed that every other MEMS3 configuration will be exactly the same, but from what I’ve seen they all have so much in common that I am probably safe to make the following assumptions:

 

·        The general structure of the lower RAM addresses stacking up will be fixed across all MEMS3 configurations.

·        The usage of $1B00 onwards for some special purpose is also likely to remain fixed.

·        Some ECUs will probably use fewer variables and so the RAM addresses used will end below $1405.

·        Some ECUs may have more variables and so the RAM addresses used will end above $1405, although the differences in memory are likely to be relatively small as the vast majority of the code base is common.

 

So for example if I choose to work on the basis that RAM addresses from $1A00 or $1900 are overwhelmingly likely to be available in any MEMS3 configuration, this would give me either 256 or 512 bytes to play with, which is more than I will need.

 

FIRST APPROACH

 

The standard OBDII protocols run at 9600 or 10400 Baud. This limits the rate at which debug information could be logged out. My initial thought was that I would have to write a complete interrupt-driven read and write communication system and effectively take complete control of the QSM/SCI module which drives the OBDII port, replacing the existing interrupt handlers with my own. However, there were two problems with this. Firstly it was going to be a difficult job to code something like this directly in assembly language, especially when I was new to 68000 assembly. Secondly, there were places in the main code execution loop which indirectly accessed the QSM Data Register and QSM Status Register and these would be hard to patch out. A quick test of some minimal replacements for the interrupt handlers just caused the ECU to crash and it appeared that there was an element of polling going on as well as responding to interrupts.

 

A little more digging turned up the following two subroutines:

 

ROM:00112234 ; =============== S U B R O U T I N E =======================================

ROM:00112234

ROM:00112234 sub_112234:

ROM:00112234 

ROM:00112234                 move.l  (Scalar $13C124),d0

ROM:00112238                 divu.l  #$51400,d0

ROM:00112240                 move.w  d0,(Variable QSM Register SCCR0).w

ROM:00112244                 rts

ROM:00112244 ; End of function sub_112234

ROM:00112244

 

ROM:00112246 ; =============== S U B R O U T I N E =======================================

ROM:00112246

ROM:00112246 sub_112246:

ROM:00112246 

ROM:00112246                 move.l  (Scalar $13C124),d0

ROM:0011224A                 divu.l  #$4B000,d0

ROM:00112252                 move.w  d0,(Variable QSM Register SCCR0).w

ROM:00112256                 rts

ROM:00112256 ; End of function sub_112246

ROM:00112256

 

These are the only two places in the whole of the firmware where register SCCR0 is modified. This is Control Register 0 for the SCI and sets the Baud rate. According to the MC68336 User Manual, the Baud rate is (System Clock Frequency) / 32 / SCCR0. In these routines a scalar value is read from the map and divided by a constant, either $51400 or $4B000. These are just the hexadecimal representations of the two standard Baud rates 10400 and 9600, multiplied by 32. So the calculation looks to be correct according to the definitions in the User Manual if the scalar value is programmed with the system clock frequency. Checking the map, the 4-byte Long value at this address was $00F8D400 or 16,307,200 decimal. I knew the system clock frequency was nominally 16 MHz, so this made sense.
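
The arithmetic in the two routines is easy to reproduce. This is a minimal Python sketch rather than anything taken from the ECU; `sccr0_value` is a hypothetical name and the integer division stands in for the unsigned divide in the disassembly.

```python
SYSTEM_CLOCK_HZ = 0x00F8D400   # scalar read from the map at $13C124


def sccr0_value(clock_hz, divisor_constant):
    """Mirror the routines: an unsigned integer divide, truncated to a whole number."""
    return clock_hz // divisor_constant


for baud, constant in ((10400, 0x51400), (9600, 0x4B000)):
    assert constant == 32 * baud   # each constant is 32 times its Baud rate
    print(baud, sccr0_value(SYSTEM_CLOCK_HZ, constant))
```

Worked through by hand, the 10400 Baud constant divides the clock value exactly, while the 9600 Baud constant leaves a remainder that the integer division discards.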

 

These are simple self-contained routines that do nothing else other than set the Baud rate and so should be easy targets to scan for and patch in the firmware.

 

I did some experiments patching the constant for the Rover BMW protocol from 9600 Baud to higher Baud rates. I then modified the code in my MEMS3 Mapper application to talk to the ECU at the higher Baud rate and everything worked fine up to 38,400 Baud (with the constant patched to $12C000). Beyond this, it would not communicate, despite the fact that both the PC / FTDI serial cable and the QSM/SCI in the ECU were capable of going to much higher Baud rates. It was only when I came to work on the assembly language files later that I realised why:

 

·        For 9600 Baud, the calculation 16307200 / 32 / 9600 gives 53.08. As this is an integer constant we are forced to use 53, which gives an actual Baud rate of 9615. This is only a 0.15% error and communication will be fine.

·        For 38400 Baud, the calculation 16307200 / 32 / 38400 gives 13.27. Again as this is an integer constant we are forced to use 13, which gives an actual Baud rate of 39200. This is a 2% error and communication should still just about be OK (up to about 3% difference between send and receive clock rates is acceptable).

·        For 57600 Baud, the calculation 16307200 / 32 / 57600 gives 8.84. The closest integer constant we can use is 9, but a simple integer division as used in the code will round down and so yield 8, which gives an actual Baud rate of 63700. This is a 10.5% error and communication will not be possible.
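
The three cases above follow one pattern, sketched here in Python. The function names are hypothetical; the only inputs are the clock value from the map and the truncating division used by the firmware routines.

```python
SYSTEM_CLOCK_HZ = 16_307_200   # value of the clock scalar in the map


def actual_baud(requested_baud, clock_hz=SYSTEM_CLOCK_HZ):
    """Baud rate really produced once the divisor is truncated to an integer."""
    divisor = clock_hz // (32 * requested_baud)   # same rounding down as the firmware
    if divisor == 0:
        raise ValueError("requested Baud rate is too high for this clock")
    return clock_hz / (32 * divisor)


def error_percent(requested_baud, clock_hz=SYSTEM_CLOCK_HZ):
    """Signed difference between actual and requested rate, as a percentage."""
    return 100.0 * (actual_baud(requested_baud, clock_hz) - requested_baud) / requested_baud


for baud in (9600, 38400, 57600):
    print(f"{baud:6d} requested, {actual_baud(baud):8.0f} actual, {error_percent(baud):+.2f}% error")
```

Because the division always rounds down, the actual rate is never below the requested one, and the error grows quickly as the divisor gets small.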

 

I checked the specifications for the FT232RL chip in my programming cable, and that generates Baud rates by dividing an internal clock frequency of 3 MHz by the sum of an integer divisor and fractional divisor, which can take various fixed values including ¼, ½, ¾ and some others. A quick check showed that a nominal standard value of 256,000 Baud should be possible:

 

·        The QSM/SCI in the ECU would actually run at 16307200 / 32 / 2 = 254800 Baud.

·        The FT232RL serial cable would actually run at 3000000 / 11.75 = 255319 Baud.

·        The error / difference between them would be only 0.2%, and communication should therefore be reliable.
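
The same comparison can be run for any candidate rate. In this Python sketch the function names are hypothetical, and the fractional divisor is assumed to be selectable in steps of one eighth, which covers the ¼, ½ and ¾ values mentioned above; the SCCR0 value is passed in directly rather than derived by the firmware's division.

```python
FTDI_CLOCK_HZ = 3_000_000      # FT232RL reference clock used for Baud generation
ECU_CLOCK_HZ = 16_307_200      # value of the clock scalar in the map


def ftdi_divisor(requested_baud):
    # Assumption: the fractional part can be set in steps of one eighth.
    return round(FTDI_CLOCK_HZ / requested_baud * 8) / 8


def ftdi_baud(requested_baud):
    """Rate the cable would really generate for a requested nominal rate."""
    return FTDI_CLOCK_HZ / ftdi_divisor(requested_baud)


def ecu_baud(sccr0):
    """Rate the SCI generates for a given SCCR0 divisor."""
    return ECU_CLOCK_HZ / (32 * sccr0)


def mismatch_percent(requested_baud, sccr0):
    """Difference between cable and ECU rates, relative to the ECU rate."""
    ecu = ecu_baud(sccr0)
    return 100.0 * abs(ftdi_baud(requested_baud) - ecu) / ecu


print(ftdi_divisor(256000), round(ftdi_baud(256000)), ecu_baud(2))
print(f"{mismatch_percent(256000, 2):.2f}% mismatch")
```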

 

One thing that still concerned me was the direct access to the SCI registers that I found in routines called from within the main program loop. The whole ECU firmware is written as one large loop that executes repeatedly. The main firmware entry point routine initialises various aspects of the hardware not fixed by the boot loader then jumps straight into the main program loop. This starts by notifying the Software Watchdog Service that it is still operating normally, then calls a whole bunch of nested routines which form a large collection of state machines, and which handle all of the ECU’s executive functions and calculations.

 

With some more digging through the OBDII code, although I didn’t understand quite a lot of what I was reading, the overall structure started to come to me and several things told me there was a problem:

 

·        The ECU seemed to be using the same buffer for sending and receiving messages.

·        The process of sending a message seemed to involve building the message up in a buffer (and always right at the start of the buffer), then setting a count variable to say how many bytes from the buffer were to be sent, then writing the first byte to the SCDR register. From here an interrupt was generated when the first byte had finished sending, and the interrupt handler then wrote the next byte to the SCDR register. This continued until the count was 0.

 

All of this suggested that the routines expected to receive a single message, then send a single response. There was no possibility of sending and receiving overlapping as they used the same buffer, and it really wasn’t structured to allow more data to be appended to the sending queue while a message was being sent; although the bytes of a message were queued in a buffer and sent in the background through interrupt handlers, multiple messages could not be queued. This all works fine for the OBDII protocols but not for a debugging system where debug information may be appended to the queue at any time.
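
A toy model makes the limitation easier to see. The Python class below is purely illustrative and is not derived from the firmware code: the class, its method names and the buffer size are all hypothetical, `wire` stands in for bytes written to SCDR, and the error raised on a second `send` is just the model's way of making the conflict visible.

```python
class SingleMessageTransmitter:
    """Model of a send path with one buffer, one count and no message queue."""

    def __init__(self, size=64):          # buffer size is an arbitrary choice
        self.buffer = bytearray(size)
        self.position = 0                 # index of the next byte to send
        self.remaining = 0                # bytes of the current message still to send
        self.wire = bytearray()           # stands in for bytes written to SCDR

    def send(self, message):
        if not message or len(message) > len(self.buffer):
            raise ValueError("message must fit in the buffer and not be empty")
        if self.remaining:
            # The message always starts at the front of the buffer, so starting
            # another one now would overwrite bytes that have not gone out yet.
            raise RuntimeError("a message is still being sent")
        self.buffer[:len(message)] = message
        self.position = 0
        self.remaining = len(message)
        self._write_next_byte()           # first byte is written directly

    def on_transmit_complete(self):
        """Interrupt handler: send the next byte until the count reaches zero."""
        if self.remaining:
            self._write_next_byte()

    def _write_next_byte(self):
        self.wire.append(self.buffer[self.position])
        self.position += 1
        self.remaining -= 1
```

With this structure a debug routine has nowhere to put new data while a message is in flight, which is the situation described above.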

 

So it looked like there was nothing for it but to bite the bullet and write my own communications software for it.

 

SECOND APPROACH

 

The first challenge was then to ensure that existing code in the ECU wouldn’t fight against what I was doing. I needed to find a way to completely knock out all access to the QSM/SCI from the existing code, whilst leaving the engine management functions intact and functioning normally. I set about this by making a list of all of the registers that were associated with the QSM/SCI (these being the control registers SCCR0 and SCCR1, the status register SCSR and the data register SCDR) and using my MEMS3 Browser application to track down all of the references to these in the code.

 

My first idea was simple. Unfortunately a bit too simple. All of the registers are accessed as memory locations ($FFFFFC08, $FFFFFC0A, $FFFFFC0C and $FFFFFC0F). My idea was to replace all references to these addresses with references to harmless addresses in RAM. The ECU would then just read and write to and from unused memory rather than to and from the QSM/SCI control registers. The ECU would be none the wiser but the QSM/SCI would not be touched and would be free for me to manipulate in my own code. However, when I came to look at the binary instructions that wrote to these addresses I was initially confused until I learned about 68000 short addressing.

 

Here’s an instruction to write a byte to the SCDR:

 

ROM:001122C2                 move.w  #$55,(Variable OBDII Port Data Register).w

 

The address associated with the SCDR is $FFFFFC0F, but the bytes for this instruction only specify the lower word $FC0F. The address is then sign-extended, so if the high bit of the lower word is 1 then the higher word is $FFFF and if the high bit of the lower word is 0 then the higher word is $0000. This means that these instructions can only access the first 32 kilobytes of the memory space $00000000-$00007FFF or the last 32 kilobytes of the memory space $FFFF8000-$FFFFFFFF. This works in the ECU as the registers are all mapped to the top few addresses. It did allow me to cover the whole of the RAM address space, but unfortunately all I found was that the existing code got very unhappy if all it read from the registers was the data it last wrote to them, and the ECU just kept rebooting.
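
Sign extension of a short address is easy to demonstrate. The Python sketch below uses hypothetical function names and simply applies the rule described above to a few addresses that appear in this article.

```python
def short_address(word):
    """Expand a 16-bit absolute short address to the 32-bit address it selects."""
    word &= 0xFFFF
    if word & 0x8000:
        return 0xFFFF0000 | word   # high bit set: upper word becomes $FFFF
    return word                    # high bit clear: upper word becomes $0000


def reachable_with_short_addressing(address):
    """True if a full 32-bit address can be encoded in the short form."""
    return short_address(address & 0xFFFF) == address


for word in (0xFC0F, 0x1A00):
    print(f"${word:04X} -> ${short_address(word):08X}")
for address in (0x00001DFF, 0x00139000, 0xFFFFFC08):
    print(f"${address:08X} reachable: {reachable_with_short_addressing(address)}")
```

This is why the register references could be redirected into RAM by changing a single word, and also why they could never be redirected into the spare EEPROM space.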

 

So the second idea was to assume that if a subroutine was handling OBDII processing, then hopefully the code was well enough structured that the subroutine was dedicated exclusively to OBDII functions and there wasn’t a random bit of engine code in the same subroutine. On this basis, there was a chance that I would be able to disable a number of entire subroutines (by overwriting the first instruction with RTS in the case of a normal subroutine or RTE in the case of an interrupt or exception vector) and still have a functioning ECU.

 

The list of subroutines which accessed the QSM/SCI was initially quite long:

Registers: SCCR0 ($FFFFFC08), SCCR1 ($FFFFFC0A), SCSR ($FFFFFC0C), SCDR ($FFFFFC0F)

Vectors        Register references
$111680
$115E30        Ref. Ref.

Subroutines    Register references
$1116D0        Ref.
$1116D6        Ref. Ref.
$116AF6        Ref. Ref.
$118C92        Ref. Ref.
$132C8A
$111D4C        Ref.
$112172        Ref. Ref.
$1121EE        Ref.
$112210        Ref.
$112234        Ref.
$112246        Ref.
$112278        Ref. Ref.
$1122C2        Ref.
$1122CA        Ref.
$1122D2        Ref.
$1122DA        Ref.
$112318        Ref.
$116DAA        Ref. Ref.
$11862E        Ref. Ref.
$11878C        Ref. Ref.
$125030        Ref.
$1254A4        Ref.
$1254DE        Ref.
$12558C        Ref.
$134816        Ref.
$13482A        Ref.

But with a bit of investigation using the dependency tree mapping in my MEMS3 Browser application I was able to see that these mostly formed trees, where one routine which accessed the OBDII port was actually often just a child subroutine of another parent subroutine which also accessed the OBDII port, so in fact with some planning I only needed to disable a small number of routines:

 

Registers: SCCR0 ($FFFFFC08), SCCR1 ($FFFFFC0A), SCSR ($FFFFFC0C), SCDR ($FFFFFC0F)

Vectors                         Register references   Fix                                Patch bytes
$111680 (QSM Int. Vector)                             RTE (OR Replace Vector)            $111680 $4E, $111681 $73
$115E30 (Unknown Int. Vect.)    Ref. Ref.             RTS (RTS from Switch Subroutine)   $118C26 $4E, $118C27 $75

Subroutines    Register references   Referenced By    Fix   Patch bytes
$1116D0        Ref.                  Vect. $115E30    RTS   $1116D0 $4E, $1116D1 $75
$1116D6        Ref. Ref.             Main Entry       RTS   $1116D6 $4E, $1116D7 $75
$116AF6        Ref. Ref.             Vect. $115E30    RTS   $116AF6 $4E, $116AF7 $75
$118C92        Ref. Ref.             Vect. $115E30    RTS   $118C92 $4E, $118C93 $75
$132C8A                              Master Update    RTS   $132C8A $4E, $132C8B $75

Referenced Only by the Above
Subroutine     Register references   Ref. By
$111D4C        Ref.                  $111680
$112172        Ref. Ref.             $111680
$1121EE        Ref.                  $1116D6
$112210        Ref.                  $1116D6
$112234        Ref.                  $1116D6, $111680
$112246        Ref.                  $1116D6, $111680
$112278        Ref. Ref.             $1116D0, $1116D6, $111680
$1122C2        Ref.                  $1116D6
$1122CA        Ref.                  $1116D6
$1122D2        Ref.                  $1116D6
$1122DA        Ref.                  $1116D6
$112318        Ref.                  $111680
$116DAA        Ref. Ref.             $111680
$11862E        Ref. Ref.             $111680
$11878C        Ref. Ref.             $111680
$125030        Ref.                  $1116D6, $111680
$1254A4        Ref.                  $1116D6, $111680
$1254DE        Ref.                  $1116D6, $111680
$12558C        Ref.                  $1116D6
$134816        Ref.                  $132C8A
$13482A        Ref.                  $132C8A

There were two interrupt vectors to disable. The first, $111680, was assigned to interrupt number $50 in the interrupt vector table, and interrupt number $50 was assigned to the QSM in code. The second, $115E30, was assigned to interrupt number $52 in the interrupt vector table. There were then five subroutines to disable. Now although the QSM is assigned interrupt number $50, the QSM contains two submodules, the SCI (which drives the OBDII port) and the QSPI (Queued Serial Peripheral Interface). The SCI uses the assigned interrupt number $50 and the QSPI uses the next number $51. The vector for $51 is unassigned in the vector table, suggesting that the QSPI is unused in the MEMS3 ECU. I couldn’t find any references in code to interrupt number $52 but it is clearly being assigned somehow, and it broke the rule I proposed earlier that OBDII subroutines are likely to be exclusively OBDII-related, as disabling this interrupt vector left the ECU engine management function crippled. So instead I had to step one level down the tree of subroutines and disable the code within a SWITCH subroutine inside the vector routine itself and three other subroutines which it called.
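
For anyone wanting to look these vectors up in a firmware image, each vector is a 4-byte address stored at the Vector Base Register value plus four times the interrupt number. The Python sketch below assumes a firmware image held in memory starting at $110000 (the VBR value set by the firmware entry routine); the function names and the dummy image are hypothetical.

```python
VBR = 0x110000   # Vector Base Register value set by the firmware entry routine


def vector_slot(number, vbr=VBR):
    """Address of the 4-byte table entry for a given vector number."""
    return vbr + 4 * number


def read_vector(image, number, image_base=VBR):
    """Read the handler address for `number` from a firmware image."""
    offset = vector_slot(number, image_base) - image_base
    return int.from_bytes(image[offset:offset + 4], "big")


# Dummy image with only the QSM/SCI entry filled in, for illustration.
image = bytearray(0x400)
image[0x140:0x144] = (0x111680).to_bytes(4, "big")

for number in (0x50, 0x51, 0x52):
    print(f"vector ${number:02X} at ${vector_slot(number):06X} -> ${read_vector(image, number):06X}")
```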

 

In the end I had to patch over just 14 bytes:

 

$111680 $4E

$111681 $73

$118C26 $4E

$118C27 $75

$1116D0 $4E

$1116D1 $75

$1116D6 $4E

$1116D7 $75

$116AF6 $4E

$116AF7 $75

$118C92 $4E

$118C93 $75

$132C8A $4E

$132C8B $75
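
Applying a handful of byte patches like these to a firmware image is straightforward. The following Python sketch is not how MEMS3 Mapper does it; it is a minimal illustration with hypothetical names, using only the addresses listed above and the standard 68000 encodings $4E75 for RTS and $4E73 for RTE.

```python
FIRMWARE_BASE = 0x110000       # start of the firmware address space

RTS = bytes.fromhex("4E75")    # return from subroutine
RTE = bytes.fromhex("4E73")    # return from exception

# Firmware ksr3p007 only!
PATCHES = {
    0x111680: RTE,   # QSM interrupt handler
    0x118C26: RTS,   # switch subroutine inside the $115E30 handler
    0x1116D0: RTS,
    0x1116D6: RTS,
    0x116AF6: RTS,
    0x118C92: RTS,
    0x132C8A: RTS,
}


def apply_patches(image, patches, base=FIRMWARE_BASE):
    """Return a patched copy of `image`, leaving the original untouched."""
    patched = bytearray(image)
    for address, data in patches.items():
        offset = address - base
        if offset < 0 or offset + len(data) > len(patched):
            raise ValueError(f"${address:06X} is outside the firmware image")
        patched[offset:offset + len(data)] = data
    return bytes(patched)


blank = bytes(0x2A000)         # stand-in image covering $110000 to $139FFF
patched = apply_patches(blank, PATCHES)
print(sum(1 for old, new in zip(blank, patched) if old != new), "bytes changed")
```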

 

I then wrote the firmware to the ECU using MEMS3 Mapper and the result was as I hoped. The ECU still ran the engine perfectly normally, but was completely silent in terms of OBDII communications. No outgoing data and all incoming data silently ignored. Otherwise it seemed perfectly happy. Once the ECU had loaded its firmware on boot, it was now effectively “bricked” as you could not communicate with it to update the programming further, but luckily my MEMS3 Mapper tool now has a “Recover Bricked ECU” function which sends a special code to the boot loader to stop it loading the firmware, and the boot loader code was still intact. So recovery was just one click each time I needed to reprogram the ECU.

 

Now poking individual bytes into memory locations was OK just to prove a point, but obviously to do any serious programming I was going to need a rather more functional setup. I searched on Google and found the EASy68K assembler. It’s nothing fancy, not a fully-featured IDE or anything, just a basic 68000 assembler with a simulator to test bits of code, but it looked like it would do the job and it was free, so I downloaded a copy and used it to put together the start of a project. Here is the listing up to this point. It basically just declares RTS instructions at the addresses determined above:

   

    * START OF CODE

 

    START:

 

    * APPLY PATCH - DISABLE NATIVE QSM/SCI ACCESS

    *

    * The instructions below are patched into specific locations within the firmware to

    * effectively disable access to the QSM/SCI. RTS instructions are used to replace

    * existing instructions in order to force early return from a subroutine. In most

    * cases the first instruction of the subroutine is replaced, disabling the subroutine

    * entirely.

 

        ORG $118C26 * Subroutines, return with RTS. * Firmware ksr3p007 only!

        rts

        ORG $1116D0

        rts

        ORG $1116D6

        rts

        ORG $116AF6

        rts

        ORG $118C92

        rts

        ORG $132C8A

        rts

       

    END START

     

GETTING CODE INTO THE ECU

 

I was using my MEMS3 Mapper application for reading and writing to ECUs. The assembler produced two output files, a .L68 listing file (which was really for human consumption) and a .S68 file which on inspection was in Motorola S-Record format. This is a fairly simple file format to read, so I added support for importing firmware patches from .S68 files into my MEMS3 Mapper application. I added in quite a bit of checking code too, to make sure that the .S68 file was fully understood and decoded and to make sure that it could only be loaded against the firmware version for which it was written, as it was based on absolute addresses which would vary between firmware versions.

 

When the MEMS3 Mapper application loads a firmware patch from a .S68 file, it only applies changes which fall within the firmware memory address space and raises an error if it finds anything which falls outside of the firmware. It initially copies the firmware into a buffer, then overwrites any bytes specified in the S-Record file. It then copies the data back from the buffer into the firmware provided that the .S68 loaded correctly. Before doing this, it compares the Firmware ID field in the buffer with that in the firmware and rejects the patch if they are different. This allowed me to include the Firmware ID field in my patch, and the loader would only then let me load it against the correct firmware.
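
The loading steps described above can be sketched in a few lines of Python. This is not the MEMS3 Mapper code, just an illustration under stated assumptions: the function names are hypothetical, the Firmware ID is taken to be 16 bytes at $110400 (ksr3p007), and only the standard S1, S2 and S3 data records are applied. The checksum rule is the usual one for S-Records: the ones' complement of the low byte of the sum of the count, address and data bytes.

```python
FIRMWARE_BASE = 0x110000      # start of the firmware address space
FIRMWARE_ID_ADDR = 0x110400   # Firmware ID location (ksr3p007, assumed)
FIRMWARE_ID_LEN = 16          # eight characters, each stored twice

ADDRESS_BYTES = {"1": 2, "2": 3, "3": 4}   # S1, S2 and S3 data records


def parse_record(line):
    """Return (address, data) for a data record, or None for other record types."""
    line = line.strip()
    if len(line) < 10 or line[0] != "S":
        raise ValueError(f"not an S-Record: {line!r}")
    raw = bytes.fromhex(line[2:])
    if raw[0] != len(raw) - 1:
        raise ValueError("byte count does not match record length")
    if ((sum(raw[:-1]) & 0xFF) ^ 0xFF) != raw[-1]:
        raise ValueError("checksum mismatch")
    width = ADDRESS_BYTES.get(line[1])
    if width is None:
        return None               # header, count or termination record
    return int.from_bytes(raw[1:1 + width], "big"), raw[1 + width:-1]


def load_patch(lines, firmware):
    """Apply the records to a copy of `firmware` and return the patched copy."""
    buffer = bytearray(firmware)
    for line in lines:
        if not line.strip():
            continue
        record = parse_record(line)
        if record is None:
            continue
        address, data = record
        offset = address - FIRMWARE_BASE
        if offset < 0 or offset + len(data) > len(buffer):
            raise ValueError(f"record at ${address:06X} falls outside the firmware")
        buffer[offset:offset + len(data)] = data
    start = FIRMWARE_ID_ADDR - FIRMWARE_BASE
    if buffer[start:start + FIRMWARE_ID_LEN] != firmware[start:start + FIRMWARE_ID_LEN]:
        raise ValueError("patch was written for a different firmware version")
    return bytes(buffer)
```

Working on a copy means a patch that fails any check part-way through leaves the loaded firmware untouched.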

 

I added the following lines at the top of my listing. Notice that each character of the firmware ID appears twice; this is correct, as it is how the ECU stores the firmware ID, for a reason I do not know:

 

    * FIRMWARE COMPATIBILITY CHECK

    *  

    * This writes the firmware ID into the location at which it is normally stored within the

    * firmware. This allows the MEMS3 Mapper S-Record loader code to verify that the patch is

    * being applied to the firmware version for which it was written.

    *

    * Many of the addresses specified as constants within this file are specific to a firmware

    * version. Constants annotated "Firmware ksr3p007 only!" are only applicable to the

    * firmware version ksr3p007.

   

    ADDR_FIRMWARE_ID: EQU $110400

   

        ORG ADDR_FIRMWARE_ID

        DC.B 'kkssrr33pp000077'     * Firmware ksr3p007 only!
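The doubled-character form of the firmware ID shown in the DC.B line is easy to generate and to reverse. The short Python sketch below is only an illustration; the helper names doubled_id and undoubled_id are assumptions made for this example and do not come from the ECU or from MEMS3 Mapper.

```python
def doubled_id(firmware_id):
    """Repeat each character twice, as the ID appears in the firmware."""
    return "".join(ch * 2 for ch in firmware_id)


def undoubled_id(stored):
    """Recover the plain ID, rejecting text that is not properly doubled."""
    if len(stored) % 2 or any(a != b for a, b in zip(stored[::2], stored[1::2])):
        raise ValueError("not a doubled firmware ID: %r" % stored)
    return stored[::2]


print(doubled_id("ksr3p007"))
```

For the firmware used in this article, doubled_id("ksr3p007") gives the same 16 characters as the string in the DC.B directive above.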

 

INITIALISING EVERYTHING

 

Once I started putting some proper code together, I was clearly going to need to initialise things as the ECU booted up. In order to do this, I was going to have to patch into the main initialisation code of the existing firmware. Looking at the main entry point routine of the firmware using my MEMS3 Browser tool, I could see this section (all of the comments are added by me):

 

ROM:001156B0 loc_1156B0:

ROM:001156B0                 movea.l #$400,sp

ROM:001156B6                 move.w  sp,(Variable $53E).w

ROM:001156BA                 movea.l #word_13C000,a5         ; point register a5 to start of map

ROM:001156C0                 move.l  a5,(Variable Map Base Address).w

ROM:001156C4                 bsr.w   Copy Point Subroutine sub_1158BE

ROM:001156C8                 bsr.l   Subroutine sub_116056

ROM:001156CE                 clr.w   (Variable $4CC).w

ROM:001156D2                 bsr.l   Subroutine sub_119524

ROM:001156D8                 bsr.l   Point Subroutine sub_110D06

ROM:001156DE                 bsr.l   Copy Point Subroutine nullsub_52

ROM:001156E4                 bsr.l   Copy Point Subroutine sub_115E1E

ROM:001156EA                 bsr.l   Point Subroutine sub_11086A

ROM:001156F0                 bsr.l   Point Subroutine sub_11143E

ROM:001156F6                 bsr.l   Point Subroutine sub_111422

ROM:001156FC                 bsr.l   Point Subroutine sub_135582

ROM:00115702                 move.b  #0,(Variable SIM Register PFPAR).w

ROM:00115708                 move    #$2000,sr

ROM:0011570C

ROM:0011570C ; This is the start of the main program loop.

ROM:0011570C

ROM:0011570C main_program_loop:

ROM:0011570C                 move.b  #$55,(Variable Software Watchdog Service Register).w ; reset watchdog service step 1 of 2

ROM:00115712                 move.b  #$AA,(Variable Software Watchdog Service Register).w ; reset watchdog service step 2 of 2

ROM:00115718                 movea.l (Variable Map Base Address).w,a5 ; point register a5 to start of map

ROM:0011571C                 addq.w  #1,(Local Variable $C1A).w

ROM:00115720                 bvc.s   loc_115728

ROM:00115722                 move.w  #$8000,(Local Variable $C1A).w

 

The first few lines are the end of the ECU initialisation code which executes as the firmware boots up. Once that is complete it drops into the main program loop, which just executes over and over again. The first thing the main program loop does is to reset the watchdog service to tell it that it’s still alive and well and running normally (if this isn’t done within a certain timeframe the watchdog service will reboot the ECU).

 

The last but one line of the initialisation code, at address $115702, clears one of the microcontroller peripheral registers to 0. At this point all of the normal initialisation of the ECU is complete and it is ready to start normal operation in the main program loop, so this is where I needed to make any configuration changes without them being overwritten by anything. The next instruction is at address $115708, so this instruction occupies 6 bytes, which is exactly the same size as a JSR (Jump to Subroutine) instruction with a 32-bit absolute address. It is also relocatable (nothing about it accesses data relative to the address of the instruction, so it will work just the same from any memory address). So all I needed to do was to replace this one instruction with a call to my own initialisation code, and make sure that within my own code I executed the one instruction I had overwritten before returning.
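The size argument can be checked by writing out the bytes of both instructions. The Python sketch below is illustrative only: the target address is a placeholder (the base of the free ROM block used in this article), since the real address of the new routine depends on where the assembler places it.

```python
import struct

JSR_ABS_LONG = 0x4EB9  # opcode word for JSR with a 32-bit absolute address


def encode_jsr(target):
    """Encode JSR (target).L as 6 big-endian bytes."""
    return struct.pack(">HI", JSR_ABS_LONG, target)


# MOVE.B #0,($FFFFFA1F).W is an opcode word ($11FC), an immediate word
# ($0000) and a 16-bit absolute address word ($FA1F).
ORIGINAL = bytes.fromhex("11FC0000FA1F")

PATCH_ADDRESS = 0x115702
NEXT_INSTRUCTION = 0x115708
HYPOTHETICAL_TARGET = 0x139040  # assumed address of the new routine

patch = encode_jsr(HYPOTHETICAL_TARGET)

# Both instructions must fill exactly the gap between the two addresses.
assert len(ORIGINAL) == NEXT_INSTRUCTION - PATCH_ADDRESS == len(patch) == 6
print(patch.hex().upper())
```

Because the replacement is the same length as the original, the instruction at $115708 is left untouched and execution resumes there after the RTS.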

 

I added in some constants to set where everything was going to go in memory:

 

    * CONSTANTS - BASE ADDRESSES

    *

    * These constants declare the base addresses for the RAM and ROM blocks which we have

    * chosen to use. They also declare the base address of the firmware vector table which is

    * found at the very start of the firmware address space, the base address of the map and

    * the base address for the table index in the map.

 

    ADDR_BASE_RAM_DATA: EQU $1900   * Base RAM data address.    * Firmware ksr3p007 only!

    ADDR_BASE_ROM_CODE: EQU $139040 * Base ROM code address.    * Firmware ksr3p007 only!

    ADDR_BASE_VECTORS:      EQU $110000 * Base address for vectors.

   

A constant to declare where I was going to patch in the call:

 

    * CONSTANT - INITIALISATION PATCH ADDRESS

    *

    * This constant declares the address within the firmware initialisation routine where we

    * have chosen to patch in a call to our own initialisation routine.

 

    PATCH_INI_JMP:  EQU $115702 * Firmware ksr3p007 only!

 

The basic outline of an initialisation routine:

 

    * PATCH ROUTINE FOR INITIALISATION

    *

    * The _patch_initialise subroutine initialises the debug patch. It is called from the

    * main firmware initialisation routine by overwriting a MOVE.B #0,($FFFFFA1F).W

    * instruction with JSR _PATCH_INITIALISE. The _patch_initialise subroutine then

    * executes the relocatable instruction which was overwritten then proceeds to perform

    * its own initialisation tasks.

    *

    * PARAMETERS:

    *   Void.

    * RETURNS:

    *   Void.

   

    _patch_initialise:

 

        OPT NOWAR                               * Execute instruction replaced by patch.

        move.b #0,($FFFFFA1F).w

        OPT WAR

 

        move.w sr,-(sp)                     * Save status register (MOVEM cannot stack SR).

        REG_PATCH_INI: REG d0-d1/a0-a1      * Save registers (PATCH: Preserves ALL).

        movem.l REG_PATCH_INI,-(sp)         * Non-scratch registers preserved by subroutines.

       

        **** INITIALISATION CODE GOES HERE ****

       

        movem.l (sp)+,REG_PATCH_INI         * Restore registers.

        move.w (sp)+,sr                     * Restore status register.

     

        rts

 

And the actual patch to replace the instruction with a call to this routine:

 

    * APPLY PATCH - INITIALISATION

    *

    * The _patch_initialise subroutine initialises the debug patch. It is called from the

    * main firmware initialisation routine by overwriting a MOVE.B #0,($FFFFFA1F).W

    * instruction with JSR _PATCH_INITIALISE. The _patch_initialise subroutine then

    * executes the relocatable instruction which was overwritten then proceeds to perform

    * its own initialisation tasks.

   

        ORG PATCH_INI_JMP       * Patch address.

        jsr _patch_initialise   * Patch routine.

       

Notice that the first thing I did within my initialisation routine was to execute the MOVE.B #0,($FFFFFA1F).W instruction which had been overwritten by the call. In the disassembled listing shown earlier the register address $FFFFFA1F was decoded and displayed as Variable SIM Register PFPAR, but both refer to the same thing. I then had to take care to ensure that the code I was patching in did not disturb any of the register values set up by the existing code, so I pushed all registers to the stack before any of my code and restored them all afterwards.

 

REPLACING THE OBDII ROUTINES

 

Some searching on the internet turned up this: https://www.nxp.com/docs/en/application-note/AN1724.pdf “Implementing SCI Receive and Transmit Buffers in C”. Freescale (previously the semiconductor division of Motorola, later merged with NXP, who produced the MC68336 microcontroller chip) had produced an example, written in C, of a ring-buffered receiver and transmitter for the 68336 SCI. Now this should really make things a lot easier. It was in the form of generic library functions to initialise the SCI, query the status, read received bytes and queue up bytes for writing, and seemed to be exactly what I needed. Being interrupt driven, the actual transmission happened in the background and the code sending the message was not delayed waiting for the transmission to complete.

 

All I needed to do (in theory …) was to find a C compiler for the 68000 CPU that would allow me to set the base addresses for RAM and EEPROM to place them in the free areas identified earlier.

 

Of course nothing is ever that simple. All of this technology is getting a bit long in the tooth now, but luckily the 68000 was very well supported and there is still plenty of stuff available for free on the net. However, there were so many different flavours in the 68000 family that a lot of the tools out there are just not compatible with one another. In the end I had to go through the following process to end up with something that at least looked like it might work:

 

·        I used the online C to 68000 Assembly version of Compiler Explorer: https://franke.ms/cex/. I compiled the C file as though it was for an Amiga computer, using the g++-10.2.1b compiler settings.

·        This compiled to Assembly language rather than right down to binary code, but this was ideal as I could then combine this with the other assembly code I was writing and feed it through one common assembler.

·        I then used the EASy68K assembler as before.

 

On inspection, the assembled code had some problems:

 

·        One of the routines was an interrupt vector. These need to be compiled with certain particular features. In particular they must preserve the values of all registers (as they could get executed asynchronously in the middle of other random code being executed, so they must make sure that when they return it is as though they had never been called). Interrupt vector routines also need to return with an RTE (Return From Exception) instruction instead of an RTS (Return from Subroutine) instruction, to exit supervisor mode and restore the status register. The C code marked this routine with #pragma TRAP_PROC which was supposed to instruct the compiler to compile it as a vector, but the online compiler clearly hadn’t understood this and it was compiled as a regular subroutine. It was fairly easy to edit in the required changes by hand.

·        The assembly language produced by the compiler used mangled subroutine names, which the assembler rejected as being too long. These had to be manually edited down to shorter names acceptable to the assembler.

·        The assembly language produced by the compiler was intended to feed into a linker, and therefore used address-independent pseudo-opcodes like JBSR for a jump (which the linker would convert to BSR for a short branch or JSR for a long jump once it knew where everything was going in memory). This was fairly easy to work around, manually editing all of the pseudo-opcodes into real opcodes.

·        The syntax used by the compiler and assembler differed slightly, for example the compiler produced LINK.W opcodes where the assembler expected the syntax LINK. Again these were all easily edited by hand.

·        All of the meaningful constants, such as QSM register addresses, had been lost during the process of compilation and appeared in the code as meaningless numbers. This was compounded by the fact that addresses for registers such as $FFFFFC00 had been converted to NEGATIVE numbers, in this case -1024. Whilst technically correct, this was not consistent with the format used elsewhere throughout the assembly language code I was working with for the ECU and made the code difficult to read. Once again, I had to manually work my way through the assembled code defining constants and editing them in where used. This was particularly important as some of the “constants” were likely to change in the future, for example the buffer sizes I had chosen. I needed to ensure that these changes would be applied consistently.

·        The design used the same buffer size for the transmit and receive buffers. This constant was compiled into the routines that queried the queue status. For my use case this was extremely wasteful of RAM, which was a precious resource. Ideally I needed a large transmit buffer to allow me to log a lot of messages in quick succession without overrunning, but I did not need a similarly large receive buffer. In fact for the initial plans I had in mind I didn’t really need receive functionality at all, but no doubt it will come in useful in the future so I decided to retain it. I had to do some work to parameterise the queue status function and separate out the transmit and receive buffer sizes.
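The last point, a ring buffer whose size is a parameter of each queue rather than a shared compile-time constant, can be sketched in a few lines. The Python below is only a model of the idea and is not the AN1724 code or the assembly used in the ECU; the class and method names are assumptions made for the example, and the in, full and out fields are modelled loosely on the queue structure described in this article.

```python
class ByteQueue:
    """Ring buffer with its own size, tracked by in/out indices and a full flag."""

    def __init__(self, size):
        self.size = size
        self.in_index = 0      # next slot to write
        self.full = False      # distinguishes full from empty when in == out
        self.out_index = 0     # next slot to read
        self.data = bytearray(size)

    def count(self):
        """Number of bytes currently queued."""
        if self.full:
            return self.size
        return (self.in_index - self.out_index) % self.size

    def put(self, byte):
        """Queue one byte; return False if the queue is already full."""
        if self.full:
            return False
        self.data[self.in_index] = byte
        self.in_index = (self.in_index + 1) % self.size
        self.full = self.in_index == self.out_index
        return True

    def get(self):
        """Remove and return the oldest byte, or None if the queue is empty."""
        if self.in_index == self.out_index and not self.full:
            return None
        byte = self.data[self.out_index]
        self.out_index = (self.out_index + 1) % self.size
        self.full = False
        return byte


# A small receive queue and a larger transmit queue, sized independently.
rx_queue = ByteQueue(0x20)
tx_queue = ByteQueue(0x80)
```

Keeping the size inside each queue is what lets the receive queue stay small while the transmit queue is large enough to absorb a burst of log messages.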

 

CALLING CONVENTIONS

 

The biggest problem with the assembled code quickly became obvious during initial testing.

 

The standard calling convention on 68000-based systems (in fact about the only calling convention I can find documented anywhere) is that registers d0, d1, a0, a1 are “scratch registers”, which means that you have to assume that after calling a subroutine these registers will have been overwritten. All other registers must be preserved by subroutines, which is generally done using MOVEM instructions to push them onto the stack. Parameters are passed to subroutines on the stack. Local variable storage is on the stack, with a stack frame being allocated for each subroutine using LINK and UNLK instructions.

 

The 68000 has a 24-bit address bus and so can access 16 megabytes of memory. All of the above works fine on an Atari or an Amiga, where RAM is plentiful. But as I have already established, here there is only 7.5 kilobytes for the whole system, and much of that is allocated to specific purposes. The amount of RAM left allocated for the stack is absolutely tiny, especially when handling an interrupt where all of the registers get stacked. The apparent hole in the memory allocation used for the stack was only 254 bytes in size. Even just pushing the 8 data registers and 8 address registers when handling an interrupt would use 64 bytes which is a quarter of the total stack space. You really can’t nest anything very deeply using a protocol that is stack-hungry. Everything I tried to do with the assembled code above eventually led to stability problems and crashes, and in nearly every case the cause turned out to be the same; I had simply run out of stack space and the stack was overwriting other things in RAM.
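The figures in this paragraph are simple arithmetic, sketched below in Python. The constant and function names are invented for the illustration; the numbers themselves (24 address bits, 254 bytes of stack, 16 registers of 4 bytes each) are the ones quoted above.

```python
ADDRESS_BUS_BITS = 24
STACK_BYTES = 254          # apparent stack area found in the RAM map
BYTES_PER_REGISTER = 4     # each 68000 register is 32 bits wide
REGISTERS_SAVED = 8 + 8    # 8 data registers plus 8 address registers


def register_save_bytes(registers=REGISTERS_SAVED):
    """Stack bytes consumed by pushing the given number of registers."""
    return registers * BYTES_PER_REGISTER


def nesting_limit(frame_bytes, stack_bytes=STACK_BYTES):
    """How many frames of the given size fit into the stack."""
    return stack_bytes // frame_bytes


addressable_bytes = 2 ** ADDRESS_BUS_BITS
print(addressable_bytes // (1024 * 1024), "megabytes addressable")
print(register_save_bytes(), "bytes for one full register save")
print(nesting_limit(register_save_bytes()), "full register saves fit in the stack")
```

A full register save costs 64 bytes, so only three of them fit into 254 bytes, and that is before counting return addresses, parameters or stack frames.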

 

A quick analysis of the disassembled code for one particular firmware revealed that there were 1322 subroutines. The LINK and UNLK instructions, which I expected to be used in most subroutines for allocating a stack frame, were used a total of ONCE across the whole of the code. The MOVEM instruction, which I expected to be used in most subroutines for preserving registers, was used far less frequently than expected. The standard 68000 calling convention was just far too stack-hungry to use with this little RAM available in an embedded system. I was clearly going to have to do a lot of work to rework everything to work in a way more similar to the rest of the ECU code.

 

This would involve combing through all of the compiled code by hand, stripping out all of the stack frames, changing all of the parameter access from stack-based to register-based and replacing all of the local variables allocated on the stack with static RAM addresses.

 

After a lot of manual editing and formatting, I ended up with the following assembly language listing:

   

   

    * CALLING CONVENTION

    *

    * The standard 68000 calling convention is that registers d0-d1 and a0-a1 are scratch

    * and must be assumed to be trashed after calling a subroutine. All other registers

    * must be preserved by the callee. Parameters are passed on the stack in right to left

    * order. Stack frames are used to allocate local variable space on the stack. Results

    * are returned in register d0.

    *

    * HOWEVER: This calling convention is "stack-hungry" and the stack space available here

    * is very small. None of the native code within the ECU boot loader or firmware follows

    * this calling convention. Attempting to follow this convention in this code just

    * resulted in stack overflows and crashes. The code in this file is therefore written

    * to be more like the native ECU and follows the following convention:

    *

    * Registers d0-d1 and a0-a1 AND ANY REGISTERS IN WHICH PARAMETERS ARE PASSED TO A

    * SUBROUTINE are scratch and must be assumed to be trashed after calling a subroutine.

    * All other registers must be preserved by the callee. Parameters are passed in

    * registers d1, d2, d3 ... (for data values) and a1, a2, a3 ... (for pointers) in left

    * to right order. Stack frames are not used. All subroutines are non-re-entrant and

    * so local variables are allocated as global variables at fixed RAM addresses. Stack is

    * used sparingly where required to preserve specific register values. Numeric results

    * are generally returned in register d1. Register d0 is often used to return an error

    * code with $00 meaning success and anything else (often simply $FF) meaning failure.

 

    * START OF CODE

 

    START:

 

    * FIRMWARE COMPATIBILITY CHECK

    *  

    * This writes the firmware ID into the location at which it is normally stored within the

    * firmware. This allows the MEMS3 Mapper S-Record loader code to verify that the patch is

    * being applied to the firmware version for which it was written.

    *

    * Many of the addresses specified as constants within this file are specific to a firmware

    * version. Constants annotated "Firmware ksr3p007 only!" are only applicable to the

    * firmware version ksr3p007.

   

    ADDR_FIRMWARE_ID: EQU $110400

   

        ORG ADDR_FIRMWARE_ID

        DC.B 'kkssrr33pp000077'     * Firmware ksr3p007 only!

   

    * CONSTANTS - BASE ADDRESSES

    *

    * These constants declare the base addresses for the RAM and ROM blocks which we have

    * chosen to use. They also declare the base address of the firmware vector table which is

    * found at the very start of the firmware address space, the base address of the map and

    * the base address for the table index in the map.

 

    ADDR_BASE_RAM_DATA: EQU $1900   * Base RAM data address.    * Firmware ksr3p007 only!

    ADDR_BASE_ROM_CODE: EQU $139040 * Base ROM code address.    * Firmware ksr3p007 only!

    ADDR_BASE_VECTORS:      EQU $110000 * Base address for vectors.

   

    * CONSTANT - INITIALISATION PATCH ADDRESS

    *

    * This constant declares the address within the firmware initialisation routine where we

    * have chosen to patch in a call to our own initialisation routine.

 

    PATCH_INI_JMP:  EQU $115702 * Firmware ksr3p007 only!

   

    * CONSTANTS - DATA FORMAT

    *

    * These constants declare the data format to be used for communications. The BAUD_RATE

    * should be chosen to be close to (ideally within 2% of) one which an FT232RL or CH340G

    * cable can synthesize and such that 16,307,200 / 32 / BAUD_RATE is an integer. The

    * PARITY constant combines appropriate values from PARITY_EVEN, PARITY_ODD,

    * PARITY_DISABLED and PARITY_ENABLED as required. The FRAME_BITS constant may be set to

    * either FRAME_BITS_10 (e.g. 1 start bit, 8 data bits and 1 stop bit) or FRAME_BITS_11

    * (e.g. 1 start bit, 8 data bits, 1 parity bit and 1 stop bit). The DATA_FORMAT constant

    * then combines PARITY and FRAME_BITS.

 

    BAUD_9600:          EQU 9615

    BAUD_19600:         EQU 19600

    BAUD_39200:         EQU 39200

    BAUD_85000:         EQU 84933

    BAUD_127400:        EQU 127400

    BAUD_256000:        EQU 254800

    BAUD_RATE:          EQU BAUD_256000

    PARITY_EVEN:        EQU 0

    PARITY_ODD:         EQU $800

    PARITY_DISABLED:    EQU 0

    PARITY_ENABLED:     EQU $400

    PARITY:             EQU PARITY_EVEN|PARITY_ENABLED  * Parity.

    FRAME_BITS_10:      EQU 0

    FRAME_BITS_11:      EQU $200

    FRAME_BITS:         EQU FRAME_BITS_11               * Bits.

    DATA_FORMAT:        EQU PARITY|FRAME_BITS

   

    * CONSTANTS - QSM REGISTER QMCR

    *

    * These constants declare fields and values for use with the QSM module control register

    * QMCR. These define the QSM configuration registers as lying within protected supervisor

    * address space and set the QSM interrupt arbitration priority value to 7. These are the

    * same values as are normally assigned by the firmware.

   

    QSM_QMCR_SUP:       EQU $80 * QSM registers in supervisor space.

    QSM_QMCR_IARB_7:    EQU $07 * QSM interrupt arbitration priority 7.

    QSM_QMCR_VALUE:     EQU QSM_QMCR_SUP|QSM_QMCR_IARB_7

 

    * CONSTANTS - QSM REGISTER QILR

    *

    * These constants declare fields and values for use with the QSM interrupt level register

    * QILR. These set the SCI interrupt level to 6, and the QSPI interrupt level to 0 (the

    * QSPI is unused). These are the same values as are normally assigned by the firmware.

   

    QSM_QILR_ILQSP: EQU 0   * QSPI unused, interrupt level 0.

    QSM_QILR_ILSCI: EQU $06 * SCI interrupt level 6.

    QSM_QILR_VALUE: EQU QSM_QILR_ILQSP|QSM_QILR_ILSCI

 

    * CONSTANTS - QSM REGISTER QIVR

    *  

    * These constants declare fields and values for use with the QSM interrupt vector

    * register QIVR. These set the SCI interrupt vector number to $50 (the QSPI interrupt

    * vector number then automatically becomes $51 but is unused). These are the same values

    * as are normally assigned by the firmware.

 

    QSM_QIVR_INTV:  EQU $50 * SCI interrupt vector $50.

    QSM_QIVR_VALUE: EQU QSM_QIVR_INTV

 

    * CONSTANTS - QSM REGISTER SCCR0

    *

    * These constants declare fields and values for use with the QSM/SCI control register

    * SCCR0. They set the Baud rate using a calculation based on the BAUD_RATE constant

    * declared above.

 

    QSM_SCCR0_SCBR:     EQU 16307200/32/BAUD_RATE   * Baud rate.

    QSM_SCCR0_VALUE:    EQU QSM_SCCR0_SCBR

 

    * CONSTANTS - QSM REGISTER SCCR1

    *

    * These constants declare fields and values for use with the QSM/SCI control register

    * SCCR1. They combine the DATA_FORMAT constant declared above with enable flags for the

    * receiver, the transmitter and receiver interrupts by default (transmitter interrupts

    * are enabled and disabled dynamically based on whether there is more data to be sent).

   

    QSM_SCCR1_RE:       EQU $4  * RX enable.

    QSM_SCCR1_TE:       EQU $8  * TX enable.

    QSM_SCCR1_RIE:      EQU $20 * RX interrupt enable.

    QSM_SCCR1_TIE:      EQU $80 * TX interrupt enable.

    QSM_SCCR1_VALUE:    EQU DATA_FORMAT|QSM_SCCR1_RE|QSM_SCCR1_TE|QSM_SCCR1_RIE

 

    * CONSTANTS - QSM REGISTER ADDRESSES

    *

    * These constants declare the memory addresses to which the QSM registers are mapped.

    * QSM registers are accessed as though they were simply memory at the addresses below.

   

    QSM_BASE:   EQU $FFFFFC00       * Base address of the QSM registers.

    QSM_QMCR:   EQU QSM_BASE        * QSM module control register QMCR address.

    QSM_QILR:   EQU QSM_BASE+$04    * QSM interrupt level register QILR address.

    QSM_QIVR:   EQU QSM_BASE+$05    * QSM interrupt vector register QIVR address.

    QSM_SCCR0:  EQU QSM_BASE+$08    * QSM control register SCCR0 address.

    QSM_SCCR1:  EQU QSM_BASE+$0A    * QSM control register SCCR1 address.

    QSM_SCSR:   EQU QSM_BASE+$0C    * QSM status register SCSR address.

    QSM_SCDR:   EQU QSM_BASE+$0E    * QSM data register SCDR address.

   

    * CONSTANTS - QUEUE SIZES

    *

    * These constants declare the sizes of the receive and transmit queues in bytes. These

    * are the actual numbers of bytes which may be queued and not the overall sizes of the

    * structures including their control fields, which are 6 bytes larger.

 

    RX_QUEUE_SIZE: EQU $20  * RX queue size in bytes.

    TX_QUEUE_SIZE: EQU $80  * TX queue size in bytes.

   

    * RAM DATA

 

        ORG ADDR_BASE_RAM_DATA

 

    * RAM VARIABLES - RECEIVE AND TRANSMIT QUEUES

    *  

    * The labels below identify the locations of the receive and transmit queue structures

    * in RAM. These use the DS.B directive to define them as storage locations without

    * specifying the contents. It is important that the assembled .S68 file does not contain

    * data for locations which lie outside of the firmware memory address space as the MEMS3

    * Mapper application will reject .S68 files which do.

 

    _rx_queue:  DS.B RX_QUEUE_SIZE+6    * RX queue structure.

    _tx_queue:  DS.B TX_QUEUE_SIZE+6    * TX queue structure.

   

 

    * RAM VARIABLES - LOCAL VARIABLES - QUERY STATUS FOR RX OR TX QUEUE

    *

    * The labels below identify the locations used for local variables for the _queue_status

    * subroutine. Local variables use fixed locations in preference to being allocated on

    * the stack as the stack is small and this subroutine is not re-entrant.

   

    _queue_status_w_in: DS.W 1

    _queue_status_w_full:   DS.W 1

    _queue_status_w_out:    DS.W 1

 

    * RAM VARIABLES - LOCAL VARIABLES - QSM/SCI INTERRUPT VECTOR

    *

    * The labels below identify the locations used for local variables for the _sci_interrupt

    * subroutine. Local variables use fixed locations in preference to being allocated on

    * the stack as the stack is small and this subroutine is not re-entrant.

   

    _sci_interrupt_w_scsr:      DS.W 1

    _sci_interrupt_w_scdr:      DS.W 1

    _sci_interrupt_w_in:        DS.W 1

    _sci_interrupt_w_full:      DS.W 1

    _sci_interrupt_w_out:       DS.W 1

    _sci_interrupt_l_infull:    DS.L 1

 

    * RAM VARIABLES - LOCAL VARIABLES - READ BYTE FROM RX QUEUE

    *

    * The labels below identify the locations used for local variables for the _rx_byte

    * subroutine. Local variables use fixed locations in preference to being allocated on

    * the stack as the stack is small and this subroutine is not re-entrant.

   

    _rx_byte_w_in:      DS.W 1

    _rx_byte_w_full:    DS.W 1

    _rx_byte_w_out:     DS.W 1

    _rx_byte_l_fullout: DS.L 1

 

    * RAM VARIABLES - LOCAL VARIABLES - WRITE BYTE TO TX QUEUE

    *

    * The labels below identify the locations used for local variables for the _tx_byte

    * subroutine. Local variables use fixed locations in preference to being allocated on

    * the stack as the stack is small and this subroutine is not re-entrant.

   

    _tx_byte_b_txbyte:  DS.W 1

    _tx_byte_w_in:      DS.W 1

    _tx_byte_w_full:    DS.W 1

    _tx_byte_w_out:     DS.W 1

    _tx_byte_l_infull:  DS.L 1

 

    * ROM CODE

 

        ORG ADDR_BASE_ROM_CODE

       

    * FREESCALE SCI BUFFER LIBRARY FUNCTIONS

    *

    * The library functions _sci_initialise, _sci_interrupt, _queue_initialise,

    * _queue_status, _rx_byte and _tx_byte below represent a reworking of the code from the

    * Freescale Application Note AN1724 "Implementing SCI Receive and Transmit Buffers in

    * C" at https://www.nxp.com/docs/en/application-note/AN1724.pdf. This code was compiled

    * to 68000 assembly language using Compiler Explorer: https://franke.ms/cex/, compiling

    * for Amiga using the g++-10.2.1b compiler settings. The code was then edited by hand

    * as required to make it compatible with EASy68K.

   

    * INITIALISE QSM/SCI

    *

    * The _sci_initialise subroutine initialises the QMCR, QILR, QIVR, SCCR0 and SCCR1

    * registers of the QSM with constant values declared above. See the comments for the

    * various constant blocks above for further information.

    *

    * PARAMETERS:

    *   Void.

    * RETURNS:

    *   Void.

 

    _sci_initialise:

 

        move.w #QSM_QMCR_VALUE,(QSM_QMCR)       * Initialise QSM register QMCR.

        move.b #QSM_QILR_VALUE,(QSM_QILR)       * Initialise QSM register QILR.

        move.b #QSM_QIVR_VALUE,(QSM_QIVR)       * Initialise QSM register QIVR.

        move.w #QSM_SCCR0_VALUE,(QSM_SCCR0) * Initialise QSM register SCCR0.

        move.w #QSM_SCCR1_VALUE,(QSM_SCCR1) * Initialise QSM register SCCR1.

       

        rts

       

    * QSM/SCI INTERRUPT VECTOR

    *

    * The _sci_interrupt subroutine forms the interrupt vector for the QSM/SCI. This is used

    * to completely replace the interrupt vector subroutine in the firmware by patching the

    * entry in the firmware vector table. This subroutine was initially compiled from C as

    * described above, then edited by hand for compatibility. In particular the MOVEM

    * instructions were added and RTS was replaced with RTE in order to make the

    * compiled subroutine compatible with being an interrupt vector. Stack frames were

    * removed and stack local storage was replaced with fixed RAM addresses.

    *

    * PARAMETERS:

    *   Void.

    * RETURNS:

    *   Void.

       

    _sci_interrupt:

   

        REG_SCI_INT: REG d0-d1/a0-a1    * Save registers (INTERRUPT: preserves ALL).

        movem.l REG_SCI_INT,-(sp)       * SR preserved implicitly.

       

        move.w #QSM_SCSR,a0

        move.w (a0),(_sci_interrupt_w_scsr)

        moveq #0,d0

        move.w (_sci_interrupt_w_scsr),d0

        moveq #64,d1

        and.l d1,d0

        tst.l d0

        beq _sci_interrupt_4

        move.w #QSM_SCDR,a0

        move.w (a0),(_sci_interrupt_w_scdr)

        moveq #0,d0

        move.w (_sci_interrupt_w_scsr),d0

        moveq #8,d1

        and.l d1,d0

        tst.l d0

        bne _sci_interrupt_10

        move.w _rx_queue,(_sci_interrupt_w_in)

        move.w 2+_rx_queue,(_sci_interrupt_w_full)

        move.w 4+_rx_queue,(_sci_interrupt_w_out)

        move.w (_sci_interrupt_w_in),d0

        cmp.w (_sci_interrupt_w_out),d0

        bne _sci_interrupt_1

        tst.w (_sci_interrupt_w_full)

        bne _sci_interrupt_10

       

    _sci_interrupt_1:

   

        moveq #0,d0

        move.w (_sci_interrupt_w_in),d0

        move.w (_sci_interrupt_w_scdr),d1

        lea 6+_rx_queue,a0

        move.l d0,a1

        move.b d1,(0,a1,a0.l)

        move.w (_sci_interrupt_w_in),d0

        addq.w #1,d0

        move.w d0,(_sci_interrupt_w_in)

        cmp.w #RX_QUEUE_SIZE-1,(_sci_interrupt_w_in)

        bls _sci_interrupt_2

        clr.w (_sci_interrupt_w_in)

       

    _sci_interrupt_2:

   

        move.w (_sci_interrupt_w_in),d0

        cmp.w (_sci_interrupt_w_out),d0

        bne _sci_interrupt_3

        moveq #0,d0

        move.w (_sci_interrupt_w_out),d0

        swap d0

        clr.w d0

        moveq #1,d1

        or.l d1,d0

        move.l d0,(_sci_interrupt_l_infull)

        lea _rx_queue,a0

        move.l (_sci_interrupt_l_infull),(a0)          

        bra _sci_interrupt_10

       

    _sci_interrupt_3:

 

        move.w (_sci_interrupt_w_in),_rx_queue

        bra _sci_interrupt_10

 

    _sci_interrupt_4:

   

        moveq #0,d0

        move.w (_sci_interrupt_w_scsr),d0

        moveq #8,d1

        and.l d1,d0

        tst.l d0

        beq _sci_interrupt_5

        move.w #QSM_SCDR,a0

        move.w (a0),(_sci_interrupt_w_scdr)

        bra _sci_interrupt_10

       

    _sci_interrupt_5:

   

        moveq #0,d0

        move.w (_sci_interrupt_w_scsr),d0

        and.l #256,d0

        tst.l d0

        beq _sci_interrupt_9

        move.w _tx_queue,(_sci_interrupt_w_in)

        move.w 2+_tx_queue,(_sci_interrupt_w_full)

        move.w 4+_tx_queue,(_sci_interrupt_w_out)

        move.w (_sci_interrupt_w_in),d0

        cmp.w (_sci_interrupt_w_out),d0

        bne _sci_interrupt_6

        tst.w (_sci_interrupt_w_full)

        beq _sci_interrupt_8

       

    _sci_interrupt_6:

   

        moveq #0,d0

        move.w (_sci_interrupt_w_out),d0

        lea 6+_tx_queue,a0

        move.l d0,a1

        move.b (0,a1,a0.l),d0

        move.b d0,d0

        and.w #255,d0

        move.w d0,(_sci_interrupt_w_scdr)

        move.w (_sci_interrupt_w_out),d0

        addq.w #1,d0

        move.w d0,(_sci_interrupt_w_out)

        cmp.w #TX_QUEUE_SIZE-1,(_sci_interrupt_w_out)

        bls _sci_interrupt_7

        clr.w (_sci_interrupt_w_out)

       

    _sci_interrupt_7:

   

        moveq #0,d0

        move.w (_sci_interrupt_w_out),d0

        move.l d0,(_sci_interrupt_l_infull)

        lea 2+_tx_queue,a0

        move.l (_sci_interrupt_l_infull),(a0)

        move.w #QSM_SCDR,a0

        move.w (_sci_interrupt_w_scdr),(a0)

        bra _sci_interrupt_10

       

    _sci_interrupt_8:

   

        move.w #QSM_SCCR1,a0

        move.w #QSM_SCCR1_VALUE,(a0)

        bra _sci_interrupt_10

       

    _sci_interrupt_9:

   

        bsr _sci_initialise

       

    _sci_interrupt_10:

 

        movem.l (sp)+,REG_SCI_INT       * Restore registers.

       

        rte

       

    * INITIALISE RX OR TX QUEUE

    *

    * The _queue_initialise subroutine is called to initialise a queue structure (either the

    * receive or transmit queue). This subroutine was initially compiled from C as described

    * above, then edited by hand for compatibility. Stack frames were removed and stack

    * local storage was replaced with fixed RAM addresses.

    *

    * PARAMETERS:

    *   Address of queue (Long) in a1.

    * RETURNS:

    *   Void.

 

    _queue_initialise:

   

        move.w #2,(a1)

        clr.w (2,a1)

        move.w #2,(4,a1)

       

        rts

       

    * QUERY STATUS FOR RX OR TX QUEUE

    *

    * The _queue_status subroutine is called to query the number of bytes currently queued in

    * a queue structure (either the receive or transmit queue). This subroutine was initially

    * compiled from C as described above, then edited by hand for compatibility. Stack frames

    * were removed and stack local storage was replaced with fixed RAM addresses.

    *

    * PARAMETERS:

    *   Address of queue (Long) in a1.

    *   Size of queue (Word) in d1.

    * RETURNS:

    *   Word in d0.

       

    _queue_status:

   

        move.w (a1),(_queue_status_w_in)

        move.w (2,a1),(_queue_status_w_full)

        move.w (4,a1),(_queue_status_w_out)

        move.w (_queue_status_w_in),d0

        cmp.w (_queue_status_w_out),d0

        bls _queue_status_1

        move.w (_queue_status_w_in),d0

        sub.w (_queue_status_w_out),d0

        bra _queue_status_4

 

    _queue_status_1:

   

        move.w (_queue_status_w_in),d0

        cmp.w (_queue_status_w_out),d0

        bcc _queue_status_2

        move.w d1,d0

        sub.w (_queue_status_w_out),d0

        add.w (_queue_status_w_in),d0

        bra _queue_status_4

 

    _queue_status_2:

   

        tst.w (_queue_status_w_full)

        beq _queue_status_3

        move.w d1,d0

        bra _queue_status_4

 

    _queue_status_3:

   

        clr.w d0

 

    _queue_status_4:

   

         rts

 

    * READ BYTE FROM RX QUEUE

    *

    * The _rx_byte subroutine is called to get the next received byte queued from the

    * receive queue. This subroutine was initially compiled from C as described above, then

    * edited by hand for compatibility. Stack frames were removed and stack local storage

    * was replaced with fixed RAM addresses.

    *

    * PARAMETERS:

    *   Address of byte (Long) in a1.

    * RETURNS:

    *   Byte ($00 Success, $FF Failure) in d0.

       

    _rx_byte:

   

        move.w _rx_queue,_rx_byte_w_in

        move.w 2+_rx_queue,_rx_byte_w_full

        move.w 4+_rx_queue,_rx_byte_w_out

        move.w _rx_byte_w_in,d0

        cmp.w _rx_byte_w_out,d0

        bne _rx_byte_1

        tst.w _rx_byte_w_full

        beq _rx_byte_3

       

    _rx_byte_1:

   

        moveq #0,d0

        move.w _rx_byte_w_out,d0

        lea 6+_rx_queue,a0

        move.b (0,a0,d0.l),d0       * Index with d0 so the address passed in a1 is kept.

        move.b d0,(a1)              * Store the byte at the address passed in a1.

        move.w _rx_byte_w_out,d0

        addq.w #1,d0

        move.w d0,_rx_byte_w_out

        cmp.w #RX_QUEUE_SIZE-1,_rx_byte_w_out

        bls _rx_byte_2

        clr.w _rx_byte_w_out

       

    _rx_byte_2:

   

        moveq #0,d0

        move.w _rx_byte_w_out,d0

        move.l d0,_rx_byte_l_fullout

        lea 2+_rx_queue,a0

        move.l _rx_byte_l_fullout,(a0)

        clr.b d0

        bra _rx_byte_4

       

    _rx_byte_3:

   

        moveq #$FF,d0

       

    _rx_byte_4:

   

        rts

       

    * WRITE BYTE TO TX QUEUE

    *

    * The _tx_byte subroutine is called to write the next byte to be transmitted to the

    * transmit queue. The byte is queued for future background transmission under interrupt

    * control and this routine returns immediately without waiting for the transmission to

    * complete. This subroutine was initially compiled from C as described above, then edited

    * by hand for compatibility. Stack frames were removed and stack local storage was

    * replaced with fixed RAM addresses.

    *

    * PARAMETERS:

    *   Byte (Word) in d1. 

    * RETURNS:

    *   Byte ($00 Success, $FF Failure) in d0.

       

    _tx_byte:

   

        move.w d1,d0

        move.b d0,_tx_byte_b_txbyte

        move.w _tx_queue,_tx_byte_w_in

        move.w 2+_tx_queue,_tx_byte_w_full

        move.w 4+_tx_queue,_tx_byte_w_out

        move.w _tx_byte_w_in,d0

        cmp.w _tx_byte_w_out,d0

        bne _tx_byte_1

        tst.w _tx_byte_w_full

        bne _tx_byte_5

       

    _tx_byte_1:

   

        moveq #0,d0

        move.w _tx_byte_w_in,d0

        lea 6+_tx_queue,a0

        move.l d0,a1

        move.b _tx_byte_b_txbyte,(0,a1,a0.l)

        move.w _tx_byte_w_in,d0

        addq.w #1,d0

        move.w d0,_tx_byte_w_in

        cmp.w #TX_QUEUE_SIZE-1,_tx_byte_w_in

        bls _tx_byte_2

        clr.w _tx_byte_w_in

       

    _tx_byte_2:

   

        move.w _tx_byte_w_in,d0

        cmp.w _tx_byte_w_out,d0

        bne _tx_byte_3

        moveq #0,d0

        move.w _tx_byte_w_out,d0

        swap d0

        clr.w d0

        moveq #1,d1

        or.l d1,d0

        move.l d0,_tx_byte_l_infull

        lea _tx_queue,a0

        move.l _tx_byte_l_infull,(a0)

        bra _tx_byte_4

   

    _tx_byte_3:

   

        move.w _tx_byte_w_in,_tx_queue

       

    _tx_byte_4:

   

        move.w #QSM_SCCR1,a0

        move.w #QSM_SCCR1_VALUE|QSM_SCCR1_TIE,(a0)

        clr.b d0

        bra _tx_byte_6

       

    _tx_byte_5:

   

        moveq #$FF,d0

       

    _tx_byte_6:

   

        rts

       

    * PATCH ROUTINE FOR INITIALISATION

    *

    * The _patch_initialise subroutine initialises the debug patch. It is called from the

    * main firmware initialisation routine by overwriting a MOVE.B #0,($FFFFFA1F).W

    * instruction with JSR _PATCH_INITIALISE. The _patch_initialise subroutine then

    * executes the relocatable instruction which was overwritten then proceeds to perform

    * its own initialisation tasks.

    *

    * PARAMETERS:

    *   Void.

    * RETURNS:

    *   Void.

   

    _patch_initialise:

 

        OPT NOWAR                               * Execute instruction replaced by patch.

        move.b #0,($FFFFFA1F).w

        OPT WAR

 

        REG_PATCH_INI: REG d0-d1,a0-a1,sr       * Save registers (PATCH: Preserves ALL).

        movem.l REG_PATCH_INI,-(sp)         * Non-scratch registers preserved by subroutines.

       

        move.l #_rx_queue,a1                    * Initialise RX queue.

        bsr _queue_initialise

       

        move.l #_tx_queue,a1                    * Initialise TX queue.

        bsr _queue_initialise

       

        bsr _sci_initialise                 * Initialise QSM/SCI.

       

        move.w #'H',d1                          * Test code.

        bsr _tx_byte

        move.w #'e',d1

        bsr _tx_byte

        move.w #'l',d1

        bsr _tx_byte

        move.w #'l',d1

        bsr _tx_byte

        move.w #'o',d1

        bsr _tx_byte

        move.w #' ',d1

        bsr _tx_byte

        move.w #'W',d1

        bsr _tx_byte

        move.w #'o',d1

        bsr _tx_byte

        move.w #'r',d1

        bsr _tx_byte

        move.w #'l',d1

        bsr _tx_byte

        move.w #'d',d1

        bsr _tx_byte

        move.w #'!',d1

        bsr _tx_byte

       

        movem.l (sp)+,REG_PATCH_INI         * Restore registers.

     

        rts

       

    * APPLY PATCH - DISABLE NATIVE QSM/SCI ACCESS

    *

    * The instructions below are patched into specific locations within the firmware to

    * effectively disable access to the QSM/SCI. RTS instructions are used to replace

    * existing instructions in order to force early return from a subroutine. In most

    * cases the first instruction of the subroutine is replaced, disabling the subroutine

    * entirely.

 

        ORG $118C26 * Subroutines, return with RTS. * Firmware ksr3p007 only!

        rts

        ORG $1116D0

        rts

        ORG $1116D6

        rts

        ORG $116AF6

        rts

        ORG $118C92

        rts

        ORG $132C8A

        rts

       

    * APPLY PATCH - REPLACE QSM/SCI INTERRUPT VECTOR

    *

    * The _sci_interrupt subroutine forms the interrupt vector for the QSM/SCI. This is used

    * to completely replace the interrupt vector subroutine in the firmware by patching the

    * entry in the firmware vector table.

   

        ORG ADDR_BASE_VECTORS+(QSM_QIVR_INTV*4)

        DC.L _sci_interrupt

       

    * APPLY PATCH - INITIALISATION

    *

    * The _patch_initialise subroutine initialises the debug patch. It is called from the

    * main firmware initialisation routine by overwriting a MOVE.B #0,($FFFFFA1F).W

    * instruction with JSR _PATCH_INITIALISE. The _patch_initialise subroutine then

    * executes the relocatable instruction which was overwritten then proceeds to perform

    * its own initialisation tasks.

   

        ORG PATCH_INI_JMP       * Patch address.

        jsr _patch_initialise   * Patch routine.

       

    * END OF CODE  

   

    END START

   

You can see some test code towards the bottom of the listing which writes the message “Hello World!” into the transmit buffer as part of the initialisation routine.
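
The queue logic in the listing is easier to follow in a high-level model. The sketch below is a PC-side Python illustration of the same in/full/out scheme, not code that runs on the ECU. The class and method names are made up for the example, and the indices start at 0 here rather than at the values the assembly routine uses.

```python
class ByteQueue:
    """Model of the in/full/out queue header followed by a fixed data area."""

    def __init__(self, size):
        self.size = size
        self.data = bytearray(size)
        self.inp = 0      # index of the next slot to write
        self.full = 0     # set to 1 when inp has caught up with out
        self.out = 0      # index of the next slot to read

    def put(self, byte):
        """Queue one byte; return 0x00 on success or 0xFF if the queue is full."""
        if self.inp == self.out and self.full:
            return 0xFF
        self.data[self.inp] = byte & 0xFF
        self.inp = (self.inp + 1) % self.size
        if self.inp == self.out:
            self.full = 1
        return 0x00

    def get(self):
        """Return the next queued byte, or None if the queue is empty."""
        if self.inp == self.out and not self.full:
            return None
        byte = self.data[self.out]
        self.out = (self.out + 1) % self.size
        self.full = 0
        return byte

    def status(self):
        """Number of bytes currently queued."""
        if self.inp > self.out:
            return self.inp - self.out
        if self.inp < self.out:
            return self.size - self.out + self.inp
        return self.size if self.full else 0


queue = ByteQueue(16)
for ch in b"Hello World!":
    queue.put(ch)
```

The full flag is what lets the queue tell "completely full" from "completely empty", since in both cases the in and out indices are equal.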

 

So if everything went to plan, the idea was that if I listened to the OBDII port with my MEMS3 Terminal application at 256000 Baud as I powered on the ECU, I should receive the message “Hello World!”, and if I did, the basic foundations of my debug logging system were in place and working. Now I’m not pretending this worked first time, far from it. A lot of the issues I mentioned above were found out at this stage one by one, and working out what was happening inside the ECU at this stage was extremely difficult. Once the debugging system was up and running to the point where it could communicate then things would get a lot easier, but initially I had no way to pass messages out of the ECU. Tests either worked, or did not work. There was no way of seeing what was happening when they did not work.

 

I figured out that I had a way of sending out 1 bit of information from a test. If I wanted to send True, I locked the ECU into a deliberate permanent loop. This led the Software Watchdog Timer Service to detect that the ECU was no longer running normally and reset it. This was detectable externally as one of the first things the ECU does when it boots is to prime the fuel pump, so if it then timed out on the watchdog system you would see the fuel pump output cycling about once per second as the ECU constantly rebooted after a short delay. If I wanted to send False, I screwed up the stack pointer. Clearing the stack pointer to 0 and then executing RTS would lead to an immediate Exception condition which reset the ECU very quickly. The result was a much faster boot loop and the fuel pump output cycled at more like ten times per second. It was easy to see the difference. The problem was that errors in the code could also easily lead to permanent loops or Exception conditions, and I couldn’t distinguish between one of these I’d triggered deliberately and one that was just a code error, so it was very heavy going for a while.

 

Once I had some of the basic routines running to the point where I could try writing to the QSM/SCI data register SCDR, things got a bit easier. It took a while to fathom why I wasn’t initially seeing any data coming out from the OBDII port, but I tracked the issues down to:

 

·        The firmware was initially setting up the module control register QMCR in a way which wasn’t compatible with my code.

·        Some confusion over the way you need to set it up for 11-bit transmissions (1 start bit, 8 data bits, 1 parity bit and 1 stop bit).

·        The fact that although I was only sending a single BYTE out, I had to write a two-byte WORD to the SCDR data register to get anything to happen. I think this is because it supports 9-bit data formats, so it doesn’t consider that data has been written to the SCDR unless all 9 possible bits have been written, which means writing both bytes as a WORD. This certainly isn’t obvious in the documentation!

 

Once I’d got past these issues, I could write a single byte to the SCDR and then correctly receive this byte over the OBDII serial cable. I couldn’t transmit more than one byte at this point as there was no working queue, so there was no way to stop a second byte just overwriting the transmission in progress (serial data writes are relatively slow), but at least I could write out a single byte reliably and distinguish successful transmission of a byte from code errors, so this gave me a way to debug the rest of the interrupt-driven queue code.

 

One way or another though, I eventually got this:

 

 

That’s “Hello World!” in hexadecimal character codes at 256000 Baud!

 

So that meant that so far I had achieved the following:

 

·        Disconnected the firmware from the OBDII port, without upsetting general operation.

·        Taken control of the OBDII port myself and configured it for 256000 Baud operation instead of the usual 9600 Baud.

·        Implemented a full interrupt-driven, queued communications system.

 

Still remaining to be done:

 

·        Write a high-level message-formatting routine that would let me pass the parameters I wanted to log on the stack and which would format them as a sensible message with length and checksum etc. and write them to the buffer.

·        Call this from a real-life patch in say the Table Lookup function to achieve something useful.

 

SENDING MESSAGES

 

I wanted to be able to send structured, formatted debugging messages. These needed to include an arbitrary number of bytes of payload data as I didn’t know at this point what debugging needs I may come up with in the future. My first instinct, with my day-job programmer’s hat on, was to write the equivalent of a function that took an array of bytes or words and sent these to the transmit queue as a formatted message, but this meant either using an arbitrary number of registers to pass the data (there are only a few available, that’s a really bad idea) or that I needed to gather all of the payload data together into some kind of array, probably on the stack, and given the limited amount of stack space I had found to be available this didn’t seem like a good idea either.

 

Instead I decided to write a bunch of separate routines:

 

·        _tx_message_begin - This would initialise the message checksum and send the message header. It would need the message payload size and “Service Identifier” code as parameters (the “Service Identifier” in an OBDII message usually reflects the operation being performed by the message; in this case I retained it in my message format as an identifier for different debug messages coming from patches in different places in the code, so I could distinguish between the different messages and debug points in the data received at the PC end).

·        _tx_message_byte - This would send a single byte of payload data and update the message checksum.

·        _tx_message_word - This would send a single word of payload data (two bytes, implemented as two calls to _tx_message_byte) and update the message checksum.

·        _tx_message_long - This would send a single long word of payload data (two words or four bytes, implemented as two calls to _tx_message_word) and update the message checksum.

·        _tx_message_end - This would send the message checksum to complete the message.

 

In this way there was no need to gather together all of the message content in one place at one time, I could generate each part, send it and discard it, which was a lot more RAM-friendly for a small embedded system.
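
As an illustration of this streaming approach, here is a minimal Python model of the five routines. It is a PC-side sketch only. The MessageWriter class and its sink callback are hypothetical names; the $DB format byte, the length of payload plus one, the XOR checksum and the high-byte-first ordering are taken from the assembly routines listed further on.

```python
FMT_FUNCTIONAL_DEBUG = 0xDB   # format byte used in the listing


class MessageWriter:
    """Streams one framed message to a byte sink, keeping a running XOR checksum."""

    def __init__(self, sink):
        self.sink = sink          # any callable taking one byte, e.g. list.append
        self.checksum = 0

    def _send(self, byte):
        byte &= 0xFF
        self.checksum ^= byte
        self.sink(byte)

    def begin(self, payload_len, service_id):
        self.checksum = 0
        self._send(FMT_FUNCTIONAL_DEBUG)
        self._send(payload_len + 1)   # length covers service identifier plus payload
        self._send(service_id)

    def byte(self, value):
        self._send(value)

    def word(self, value):
        self._send(value >> 8)        # high byte first
        self._send(value)

    def long(self, value):
        self.word((value >> 16) & 0xFFFF)
        self.word(value & 0xFFFF)

    def end(self):
        self.sink(self.checksum)      # checksum itself is not folded in


frame = []
writer = MessageWriter(frame.append)
writer.begin(payload_len=3, service_id=0x01)
writer.byte(0x04)
writer.word(0x1234)
writer.end()
```

Only the one-byte running checksum is held between calls, which is the point of the design: nothing larger than the value currently being sent ever needs to be stored.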

 

I added an extra RAM variable to keep a track of the message checksum:

 

    * RAM VARIABLES - LOCAL VARIABLES - WRITE MESSAGE TO TX QUEUE

    *

    * The labels below identify the locations used for local variables for the

    * _tx_message_begin, _tx_message_byte, _tx_message_word, _tx_message_long and

    * _tx_message_end subroutines. Local variables use fixed locations in preference to

    * being allocated on the stack as the stack is small and these subroutines are not

    * re-entrant.

   

    _tx_message_b_checksum: DS.W 1

 

I implemented the message functions as shown below:

 

    * WRITE MESSAGE TO TX QUEUE - BEGIN

    *

    * The _tx_message_begin subroutine is called to begin sending a properly formatted

    * message to the transmit queue. It initialises the checksum byte and sends the format

    * byte, message length byte and service identifier byte.

    *

    * PARAMETERS:

    *   Payload byte count (Byte as Word) in d1.

    *   Service identifier (Byte as Word) in d2.

    * RETURNS:

    *   Byte ($00 Success, $FF Failure) in d0.

   

    FMT_FUNCTIONAL_DEBUG: EQU $DB

   

    _tx_message_begin:

   

        clr.b (_tx_message_b_checksum)      * Initialise checksum byte.

   

        move.w d1,-(sp)                 * Write format byte.

        move.w #FMT_FUNCTIONAL_DEBUG,d1

        eor.b d1,(_tx_message_b_checksum)

        bsr _tx_byte

        move.w (sp)+,d1

 

        tst.b d0                            * Check for overrun.

        bne _tx_message_begin_overrun

       

        addi.b #1,d1                        * Write message length byte.      

        eor.b d1,(_tx_message_b_checksum)

        bsr _tx_byte

 

        tst.b d0                            * Check for overrun.

        bne _tx_message_begin_overrun

       

        move.b d2,d1                        * Write service identifier byte.

        eor.b d1,(_tx_message_b_checksum)

        bsr _tx_byte

       

    _tx_message_begin_overrun:

   

        rts

 

    * WRITE MESSAGE TO TX QUEUE - BYTE

    *

    * The _tx_message_byte subroutine is called to send a message payload byte to the

    * transmit queue. It updates the checksum byte and sends the payload data byte passed.

    *

    * PARAMETERS:

    *   Payload data (Byte as Word) in d1.

    * RETURNS:

    *   Byte ($00 Success, $FF Failure) in d0.

   

    _tx_message_byte:

   

        eor.b d1,(_tx_message_b_checksum)   * Write payload byte.

        bsr _tx_byte

 

        rts

       

    * WRITE MESSAGE TO TX QUEUE - WORD

    *

    * The _tx_message_word subroutine is called to send a message payload word to the

    * transmit queue. It updates the checksum byte and sends the payload word passed (by

    * making two calls to _tx_message_byte).

    *

    * PARAMETERS:

    *   Payload data (Word) in d1.

    * RETURNS:

    *   Byte ($00 Success, $FF Failure) in d0.

   

    _tx_message_word:

   

        move.w d1,-(sp)             * Write payload word high byte.

        asr.w #8,d1

        bsr _tx_message_byte

        move.w (sp)+,d1

       

        tst.b d0                        * Check for overrun.

        bne _tx_message_word_overrun

       

        bsr _tx_message_byte            * Write payload word low byte.

       

    _tx_message_word_overrun:

 

        rts

       

    * WRITE MESSAGE TO TX QUEUE - LONG

    *

    * The _tx_message_long subroutine is called to send a message payload long to the

    * transmit queue. It updates the checksum byte and sends the payload long passed (by

    * making two calls to _tx_message_word).

    *

    * PARAMETERS:

    *   Payload data (Long) in d1.

    * RETURNS:

    *   Byte ($00 Success, $FF Failure) in d0.

   

    _tx_message_long:

   

        move.l d1,-(sp)             * Write payload long high word.

        swap d1

        bsr _tx_message_word

        move.l (sp)+,d1

       

        tst.b d0                        * Check for overrun.

        bne _tx_message_long_overrun

       

        bsr _tx_message_word            * Write payload long low word.

       

    _tx_message_long_overrun:

 

        rts

       

    * WRITE MESSAGE TO TX QUEUE - END

    *

    * The _tx_message_end subroutine is called to end sending a properly formatted

    * message to the transmit queue. It sends the checksum byte.

    *

    * PARAMETERS:

    *   Void.

    * RETURNS:

    *   Byte ($00 Success, $FF Failure) in d0.

 

    _tx_message_end:

   

        move.b (_tx_message_b_checksum),d1

        bsr _tx_byte

   

        rts

 

I then did some work on my MEMS3 Terminal program in order to allow it to receive and decode the messages sent. It now recognises the debug message format with a $DB format byte and it also recognises “known” service identifiers. I decided to dedicate service identifiers from $80 onwards to “known” debug messages that the terminal should be able to decode (for example, $80 is now “Table Lookup” and it knows how to decode the payload bytes into Table Index, X, Y and Z values, and can recognise the difference between a 2D table and a 3D table based on the number of bytes in the payload as a 2D table omits the Y value). Service identifiers $00 to $7F are then free for just sending un-decoded debugging information for whatever debugging ideas I’m working on at the time.
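
The receiving side can be sketched in a few lines. The Python below is not the MEMS3 Terminal code, just a hedged model of how a frame in this format could be picked out of the byte stream and checked. The function name parse_messages is hypothetical, and the resynchronisation strategy (skip one byte and try again) is an assumption.

```python
FMT_FUNCTIONAL_DEBUG = 0xDB
KNOWN_SERVICE_BASE = 0x80      # identifiers from $80 up are decoded by the terminal


def parse_messages(stream):
    """Yield (service_id, payload, known) for each valid frame found in stream."""
    data = bytes(stream)
    pos = 0
    while pos + 4 <= len(data):
        if data[pos] != FMT_FUNCTIONAL_DEBUG:
            pos += 1                      # resynchronise on the next format byte
            continue
        length = data[pos + 1]            # service identifier plus payload bytes
        end = pos + 2 + length            # index of the checksum byte
        if length < 1 or end >= len(data):
            pos += 1
            continue
        checksum = 0
        for b in data[pos:end]:
            checksum ^= b
        if checksum != data[end]:
            pos += 1                      # corrupt frame or a stray $DB data byte
            continue
        service_id = data[pos + 2]
        payload = data[pos + 3:end]
        yield service_id, payload, service_id >= KNOWN_SERVICE_BASE
        pos = end + 1
```

A real terminal would also have to wait for more data when a frame is still arriving, which this batch-style sketch does not attempt.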

 

PATCHING TABLE LOOKUP

 

The final piece of this jigsaw was to patch into the ECU code which looks up values in tables in order to use the code developed above to send some meaningful messages telling me what the ECU is doing internally.

 

My first idea on this was to patch in jumps to small sections of my own code in the same way as I did for the initialisation routine, but when I tried to implement it, it all got rather messy. I really needed to patch into the table lookup routine in three different places:

 

·        At the start of the routine, to record the table index and the X and Y value parameters passed in, as these were in registers which were re-used later in the routine so the original values got lost.

·        Part way through the routine at the point where it had decoded whether the table was 2D or 3D, to avoid my having to repeat the same logic in my own code.

·        At the end of the routine, to record the looked up Z value returned.

 

At each of these patch points I would need to preserve a lot of register values to avoid upsetting the existing code, and the stack was largely full when this routine was called.

 

In the end it just seemed a lot easier and cleaner to copy the whole table lookup subroutine from the firmware into my code, and then simply patch the first instruction of the native routine to jump permanently into mine, where I did the whole job and never returned to the native table lookup. This allowed me to insert code wherever I needed to within the routine in a much cleaner way. The table lookup routine has been identical in every different firmware version I’ve looked at and is pretty much identical even in the TD5 diesel ECUs, so the routine I copied into my debug patch should work fine when used with other firmware versions.
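
To make the byte-level effect of such a patch concrete, the Python sketch below overwrites locations in a firmware image with a JMP to an absolute long address (opcode $4EF9 followed by a 32-bit address) or with RTS ($4E75). This is only an illustration of what ends up in the image: the patches in this project are applied with ORG directives in the assembler listing, and the base address, patch addresses and function names here are all hypothetical.

```python
RTS = bytes([0x4E, 0x75])          # 68000 RTS opcode
JMP_ABS_L = bytes([0x4E, 0xF9])    # 68000 JMP (xxx).L opcode, followed by a 32-bit address


def patch_jmp(image, base, address, target):
    """Overwrite the code at `address` with JMP `target` (big-endian, 6 bytes)."""
    offset = address - base
    image[offset:offset + 6] = JMP_ABS_L + target.to_bytes(4, "big")


def patch_rts(image, base, address):
    """Overwrite the instruction at `address` with RTS to disable a subroutine."""
    offset = address - base
    image[offset:offset + 2] = RTS


# Hypothetical values for illustration only, not taken from any real firmware.
BASE = 0x110000
image = bytearray(0x1000)
patch_jmp(image, BASE, 0x110100, 0x13C000)
patch_rts(image, BASE, 0x110200)
```

Because the patched routine never returns to the native code, it does not matter if the six bytes of the JMP cut through the middle of a later instruction in the original routine.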

 

I had been using IDA to initially disassemble the firmware, so I copied the disassembled code for this routine into my own assembly language listing. There were a couple of small formatting changes needed for compatibility. However there was a bigger problem. There were four instructions used in the routine that I just could not get the EASy68K assembler to accept. These were fairly complex instructions used for looking things up in tables and arrays such as move.w (2,a0,d4.w*2),d1. This takes data register d4, multiplies the value by 2, adds this to address register a0, adds a further offset of 2, then uses the result as the address of a word value in memory which it moves into data register d1, all in one single machine code instruction. You can imagine a0 pointing to a table of word values and d4 holding the index of the word in the array, so you need to add two bytes for each word to the table address to get the address of the data you are looking up. There are some simpler forms of this instruction such as move.w (2,a0),d1 which omits the data register and I knew that different assemblers used different syntax to represent this instruction, e.g. some would write it as move.w 2(a0),d1. But I tried every combination I could think of and none of them were accepted.

 

Then I realised what the problem was. I was using EASy68K, which was an assembler for the 68000 microprocessor. I had kind of thought of this as being the 68000 family, but it was actually specific to the original Motorola MC68000 chip only. The Motorola MC68336 microcontroller used in the MEMS3 ECU had a later CPU32 core, derived from the MC68000 but with some new instructions and some extra addressing modes. These more complex instructions were only available on the CPU32 core and not on the MC68000, so were not recognised by the 68000 assembler. If I was doing serious MC68336 development I guess I would want to find a good CPU32 assembler that fully supported it, but having come this far with the free EASy68K and with it only being 4 instructions I decided to just work around it and copy these instructions into the listing directly as binary opcodes (so just doing the assembler’s job for it for these few cases):

 

    _patch_table_lookup_9:

 

        add.w (a0),d7

        DC.W $3230,$4202 move.w (2,a0,d4.w*2),d1    * CPU32 not 68K.

        DC.W $9270,$4200 sub.w (a0,d4.w*2),d1       * CPU32 not 68K.

        beq.s _patch_table_lookup_10

        muls.w d6,d1

        divs.w d5,d1

 

The code above shows two of the newer instructions, just defined as binary opcodes (which I simply copied from the binary dump of the original firmware routine) with the assembly language adjacent to them as comments for clarity.
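
For anyone wanting to check such opcodes rather than copy them, the two words can be rebuilt from the instruction fields. The Python sketch below is a hedged illustration covering only these two instruction forms with a data register as the index; the function names are made up for the example. The scale field in bits 10 and 9 of the extension word is the part the plain MC68000 does not support.

```python
def brief_extension(index_reg, scale, displacement, long_index=False):
    """Brief extension word for (d8,An,Dn.SIZE*SCALE) with a data index register."""
    scale_bits = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
    return (index_reg << 12) | (int(long_index) << 11) | (scale_bits << 9) | (displacement & 0xFF)


def move_w_indexed_to_dn(an, dn):
    """Opcode word for MOVE.W (d8,An,Xn),Dn."""
    return 0x3000 | (dn << 9) | (0b110 << 3) | an


def sub_w_indexed_from_dn(an, dn):
    """Opcode word for SUB.W (d8,An,Xn),Dn."""
    return 0x9000 | (dn << 9) | (0b001 << 6) | (0b110 << 3) | an


# move.w (2,a0,d4.w*2),d1 and sub.w (a0,d4.w*2),d1
move_words = (move_w_indexed_to_dn(0, 1), brief_extension(4, 2, 2))
sub_words = (sub_w_indexed_from_dn(0, 1), brief_extension(4, 2, 0))
print(" ".join(f"${w:04X}" for w in move_words + sub_words))
# prints: $3230 $4202 $9270 $4200
```

These match the DC.W values in the listing, which is a useful cross-check that the opcodes were copied from the right place in the firmware dump.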

 

It quickly became clear that even at 256000 Baud, I was going to struggle to log information about every table access in full. It seems that the main execution loop of the firmware runs about 75 times per second, and through all of the subroutines it calls it probably makes a couple of hundred table lookups in each pass, so that’s around 15000 lookups per second. To get around this I implemented two mechanisms, a table filter and some different message formats. The table filter is simply an array of binary bits, one for each table index 0 – 255. If the bit is set to 1, it logs information for that table; if the bit is set to 0 it doesn’t:

 

    * TABLE LOOKUP FILTER

    *

    * The TABLE_LOOKUP_FILTER table contains one bit for each table index in the range 0 -

    * 255. Debug messages are only sent when the ECU performs a lookup on a table if the bit

    * corresponding to the table index is set to 1 here.

   

    TABLE_LOOKUP_FILTER:

   

        *     01234567   89012345

        DC.B %00001000, %00000000   * Tables 0 - 15.

        DC.B %00000000, %00000000   * Tables 16 - 31.

        DC.B %00000000, %00000000   * Tables 32 - 47.

        DC.B %00000000, %00000000   * Tables 48 - 63.

        DC.B %00000000, %00000000   * Tables 64 - 79.

        DC.B %00000000, %00000000   * Tables 80 - 95.

        DC.B %00000000, %00000000   * Tables 96 - 111.

        DC.B %00010000, %00000000   * Tables 112 - 127.

        DC.B %00000000, %00000000   * Tables 128 - 143.

        DC.B %00000000, %00000000   * Tables 144 - 159.

        DC.B %00000000, %00000000   * Tables 160 - 175.

        DC.B %00000000, %00000000   * Tables 176 - 191.

        DC.B %00000000, %00000000   * Tables 192 - 207.

        DC.B %00000000, %00000000   * Tables 208 - 223.

        DC.B %00000000, %00000000   * Tables 224 - 239.

        DC.B %00000000, %00000000   * Tables 240 - 255.

       

In the example shown, I’m only logging information for Table 4 (coolant temperature sensor calibration) and Table 115 (Ignition Timing).
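
The bit layout can be modelled in a few lines of Python. This is a sketch of the data structure only, not the ECU code that tests the bit. It assumes the ordering implied by the column header in the listing, with the leftmost (most significant) bit of each byte being the lowest table index; the helper names are hypothetical.

```python
TABLE_LOOKUP_FILTER = bytearray(32)       # 256 tables, one bit each


def set_logged(table_filter, index, enabled=True):
    """Set or clear the filter bit for a table index (bit 7 of byte 0 is table 0)."""
    mask = 0x80 >> (index & 7)
    if enabled:
        table_filter[index >> 3] |= mask
    else:
        table_filter[index >> 3] &= ~mask & 0xFF


def is_logged(table_filter, index):
    """Return True if lookups on this table index should produce a debug message."""
    return bool(table_filter[index >> 3] & (0x80 >> (index & 7)))


set_logged(TABLE_LOOKUP_FILTER, 4)
set_logged(TABLE_LOOKUP_FILTER, 115)
```

With tables 4 and 115 enabled, byte 0 becomes %00001000 and byte 14 becomes %00010000, matching the two non-zero entries in the listing.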

 

I allowed for three different levels of logging, using three different formats:

 

    * TABLE LOOKUP FORMAT

    *

    * When TABLE_LOOKUP_FORMAT is defined as TABLE_LOOKUP_BYTES, the debug message will

    * consist of a single byte containing the table index. When TABLE_LOOKUP_FORMAT is

    * defined as TABLE_LOOKUP_INDEX the debug message will be a properly formatted message

    * containing the table index only. When TABLE_LOOKUP_FORMAT is defined as

    * TABLE_LOOKUP_PARAM the debug message will also contain the X and Y values looked up

    * and the Z value returned.

   

    TABLE_LOOKUP_BYTES:     EQU 0

    TABLE_LOOKUP_INDEX:     EQU 1

    TABLE_LOOKUP_PARAM:     EQU 2  

    TABLE_LOOKUP_FORMAT:    EQU TABLE_LOOKUP_PARAM

   

·        BYTES – This just logs one byte per lookup, containing the table number. This will allow me to very quickly scan all of the tables and identify which tables are being used under different conditions in real time. There’s no formatting, just a stream of table number bytes.

·        INDEX – This logs a short message per lookup, containing just the table number but in proper message format. This will have a much lower capacity but will allow me to mix the data with other logging information, e.g. current coolant temperature, and separate it out at the PC end.

·        PARAM – This logs a long message per lookup, containing the table number, X and Y values looked up and the interpolated Z value returned. This will allow me to look in detail at particular table usage.
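
To put rough numbers on the three formats, the Python sketch below estimates how many lookups per second each one can carry. The baud rate, the 11-bit frame, the 75 passes per second and the couple of hundred lookups per pass come from the text above, and the four bytes of message overhead come from the message routines. The payload sizes (a 1-byte table index and 2-byte X, Y and Z values) are assumptions made for the estimate.

```python
BAUD = 256000
BITS_PER_FRAME = 11            # 1 start, 8 data, 1 parity, 1 stop
LOOKUPS_PER_SECOND = 75 * 200  # main loop rate times lookups per pass (both estimates)
OVERHEAD = 4                   # format, length, service identifier, checksum

bytes_per_second = BAUD / BITS_PER_FRAME

# Assumed sizes: 1-byte table index, 2-byte X, Y and Z values.
formats = {
    "BYTES": 1,
    "INDEX": OVERHEAD + 1,
    "PARAM (2D)": OVERHEAD + 1 + 2 + 2,
    "PARAM (3D)": OVERHEAD + 1 + 2 + 2 + 2,
}

for name, size in formats.items():
    capacity = bytes_per_second / size
    share = min(1.0, capacity / LOOKUPS_PER_SECOND)
    print(f"{name:10s} {size:2d} bytes  {capacity:8.0f} lookups/s  {share:6.1%} of all lookups")
```

Under these assumptions the link carries a little over 23000 bytes per second, so only the unformatted BYTES stream can keep up with every lookup, and the formatted messages rely on the table filter to thin out the traffic.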

 

This is the modified table lookup function in full:

 

    * PATCH ROUTINE FOR TABLE LOOKUP

    *

    * The _patch_table_lookup subroutine sends a debug message whenever the ECU performs

    * a table lookup. It is a complete copy of the main firmware table lookup routine which

    * is patched by overwriting the first instructions with JMP _PATCH_TABLE_LOOKUP.

    * Additional code is then inserted to log the parameters passed and result returned in

    * a debug message.

    *

    * PARAMETERS:

    *   Table index offset (Word) in d1.

    *   Table X value (Word) in d2.

    *   Table Y value (Word) in d3.

    *   Map address (Long) in a5.

    * RETURNS:

    *   Table Z value (Word) in d1.

    *   Byte ($00 Success, $FF Failure) in d0.

   

    _patch_table_lookup:

 

        move.w d1,(_patch_table_lookup_w_offset)        * Store table offset.

 

        IFEQ TABLE_LOOKUP_FORMAT-TABLE_LOOKUP_PARAM

       

            move.w d2,(_patch_table_lookup_w_x_value)   * Store X value.

            move.w d3,(_patch_table_lookup_w_y_value)   * Store Y value.

           

        ENDC

 

        movea.w (a5,d1.w),a0

        adda.l a5,a0

        move.w (a0)+,d4

        ble.w _patch_table_lookup_17

        move.w (a0)+,d5

        ble.w _patch_table_lookup_17

       

        IFEQ TABLE_LOOKUP_FORMAT-TABLE_LOOKUP_PARAM

       

            move.w d5,(_patch_table_lookup_w_y_count)   * Store Y count.

       

        ENDC