In computer science, self-modifying code is code that alters its own instructions while it is executing. Code that overwrites itself can seem dubious and risky, and is therefore generally discouraged. However, in some cases it can significantly improve the performance of a piece of code, or achieve a behaviour that would otherwise require significantly more code to implement.
Self-modifying code is fairly straightforward to achieve in a low-level language on a von Neumann computer architecture, where both data and instructions are stored in the same memory.
You may even have inadvertently written low-level code (e.g. using LMC, the Little Man Computer) in which your program overwrote itself (by mistake rather than by design) by storing data in a memory cell already used to hold one of its instructions.
For instance, let’s consider the following program written using LMC:
- INP
- STA 3
- HLT
You can try this code using the online LMC simulator.
If you load this code into RAM using the LMC simulator, it will use the first 3 memory locations (a.k.a. mailboxes 00, 01 and 02) to store the 3 instructions of this program.
As you execute this program, the user input is stored in the fourth memory location of the RAM (mailbox 03). This is because the LMC simulator uses direct addressing, where the operand of an instruction is the memory location the data is to be loaded from or, for an STA instruction, the memory location where the data will be stored. For instance, with our code, if the user inputs the value 999, mailbox 03 will contain 999 once the program has run.
As you can see, when using the LMC simulator, both instructions and program data are stored in the same memory using the same format. This is one of the key characteristics of the von Neumann computer architecture that the LMC simulator is emulating.
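To make the shared memory concrete, here is a minimal Python sketch of an LMC-style machine. It is only an illustration, not the simulator's actual implementation: the `run` function is a hypothetical helper, and it assumes the usual LMC numeric codes (901 for INP, 3xx for STA, 000 for HLT).

```python
# Minimal sketch of an LMC-style machine: 100 mailboxes shared by code and data.
# Assumption: standard LMC numeric codes (INP = 901, STA = 3xx, HLT = 000).

def run(memory, inputs):
    """Execute from mailbox 0 until HLT; only INP, STA and HLT are supported."""
    inputs = list(inputs)
    acc, pc = 0, 0
    while True:
        instruction = memory[pc]
        pc += 1
        opcode, operand = divmod(instruction, 100)
        if instruction == 0:        # HLT: stop the program
            return memory
        elif instruction == 901:    # INP: next input goes into the accumulator
            acc = inputs.pop(0)
        elif opcode == 3:           # STA: direct addressing, operand = target mailbox
            memory[operand] = acc
        else:
            raise ValueError(f"unsupported instruction {instruction}")

memory = [0] * 100
memory[0:3] = [901, 303, 0]   # INP, STA 3, HLT loaded into mailboxes 00 to 02
run(memory, [999])
print(memory[:4])   # [901, 303, 0, 999]
```

Instructions and the user's value sit side by side in the same list, in the same numeric format, which is exactly what makes it possible for a store to land on an instruction.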
Now let’s consider our initial program and add some extra lines of code, for instance to ask for a second user input and to store the second input value next to the first input value. It would be tempting to write the following code:
- INP
- STA 3
- INP
- STA 4
- HLT
However, this code will not work as intended. While executing, the first STA instruction (STA 3) saves program data (in this case the value typed in by the user) over an existing instruction of the program: the STA 4 instruction held in mailbox 03. This is a typical example of self-modifying code that affects the expected execution of the program. For example, if the user enters 999 and 998 as input values, mailbox 03 ends up holding 999 instead of the STA 4 instruction.
A quick solution would be to store both input values in different memory locations not already used by the program, e.g.:
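The overwrite can be reproduced with a few lines of Python. This is a sketch under the assumption that the program is assembled to the usual LMC numeric codes (INP = 901, STA n = 300 + n, HLT = 000). It only performs the first store and makes no claim about how a given simulator reacts afterwards.

```python
# Assumed numeric form of the program: INP, STA 3, INP, STA 4, HLT
program = [901, 303, 901, 304, 0]
memory = program + [0] * (100 - len(program))

acc = 999                     # first INP: the user types 999
target = memory[1] % 100      # operand of the instruction in mailbox 01 (STA 3)
before = memory[target]       # what mailbox 03 held before the store
memory[target] = acc          # STA 3 executes

print(target, before, memory[target])   # 3 304 999
print(target < len(program))            # True: the store landed inside the program
```

Mailbox 03 held 304 (STA 4) before the store and 999 after it, so the second store instruction no longer exists by the time the program reaches it.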
- INP
- STA 8
- INP
- STA 9
- HLT
However, this is not the recommended solution: there is no guarantee that, when this program is executed, memory locations 8 and 9 are not already being used by another program.
The recommended solution is to use labels in your code. When the code is assembled and loaded into memory, each label is automatically replaced with the address of the memory cell reserved for it by a DAT instruction, e.g.:
- INP
- STA num1
- INP
- STA num2
- HLT
- num1 DAT
- num2 DAT
This code uses two labels, num1 and num2.
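How labels turn into memory locations can be sketched with a small two-pass assembler in Python. This is an illustrative sketch, not the simulator's own assembler: `assemble` and `OPCODES` are hypothetical names, and the numeric codes are the usual LMC ones.

```python
# Assumed standard LMC numeric codes; DAT reserves a mailbox (0 if no value given).
OPCODES = {"ADD": 100, "SUB": 200, "STA": 300, "LDA": 500, "BRA": 600,
           "BRZ": 700, "BRP": 800, "INP": 901, "OUT": 902, "HLT": 0, "DAT": 0}

def assemble(source):
    lines = [line.split() for line in source.strip().splitlines()]
    # Pass 1: a first word that is not a mnemonic is a label for that address.
    labels = {}
    for address, words in enumerate(lines):
        if words[0] not in OPCODES:
            labels[words.pop(0)] = address
    # Pass 2: replace each label operand with the address found in pass 1.
    memory = []
    for mnemonic, *operand in lines:
        value = 0
        if operand:
            value = labels[operand[0]] if operand[0] in labels else int(operand[0])
        memory.append(OPCODES[mnemonic] + value)
    return labels, memory

labels, memory = assemble("""
    INP
    STA num1
    INP
    STA num2
    HLT
    num1 DAT
    num2 DAT
""")
print(labels)   # {'num1': 5, 'num2': 6}
print(memory)   # [901, 305, 901, 306, 0, 0, 0]
```

Because the DAT lines come after HLT, num1 and num2 resolve to mailboxes 05 and 06, just past the last instruction, so neither store can hit the program itself.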
Self-Modifying Code Example
As explained above, we can occasionally decide to write self-modifying code on purpose, as demonstrated in the following example.
The aim of this code is to fill the RAM with the value 999, starting just after memory location 10. (Because the program increments the address held in its STA instruction before executing it, the first value is written to memory location 11. Note also that this code would break after reaching the maximum RAM capacity: memory location 99.)
Here is the code of our “RAM filler” program. Can you see how we have used self-modifying code to complete this “RAM Filler” program?
- start LDA line
- ADD one
- STA line
- LDA data
- line STA 10
- BRA start
- HLT
- data DAT 999
- one DAT 1
Test this code online using the LMC simulator.
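If you would rather trace the program offline, the following Python sketch simulates its first three passes. It assumes the program assembles to the usual LMC numeric codes (start = mailbox 00, line = 04, data = 07, one = 08). The `run` function is a hypothetical helper that covers only the instructions this program uses.

```python
def run(memory, max_steps):
    """Run a tiny LMC subset (ADD, STA, LDA, BRA, HLT) for at most max_steps instructions."""
    acc, pc = 0, 0
    for _ in range(max_steps):
        opcode, operand = divmod(memory[pc], 100)
        pc += 1
        if opcode == 0:      # HLT
            break
        elif opcode == 1:    # ADD: add a mailbox value to the accumulator
            acc += memory[operand]
        elif opcode == 3:    # STA: store the accumulator in a mailbox
            memory[operand] = acc
        elif opcode == 5:    # LDA: load a mailbox value into the accumulator
            acc = memory[operand]
        elif opcode == 6:    # BRA: jump to the given mailbox
            pc = operand
        else:
            raise ValueError(f"unsupported instruction at mailbox {pc - 1}")
    return memory

# Assumed assembled form: LDA 4, ADD 8, STA 4, LDA 7, STA 10, BRA 0, HLT, 999, 1
program = [504, 108, 304, 507, 310, 600, 0, 999, 1]
memory = program + [0] * (100 - len(program))

run(memory, max_steps=18)   # three passes through the six-instruction loop
print(memory[4])            # 313
print(memory[10:15])        # [0, 999, 999, 999, 0]
```

The instruction in mailbox 04 starts as 310 (STA 10) and is rewritten to 311, 312 and then 313. Since the increment happens before each store, the first 999 in this trace lands in mailbox 11.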
Achieving this “memory filling” behaviour without self-modifying code would be a lot more complex to implement in LMC. You can see how it could be done with an alternative approach that relies on indirect addressing by completing this LMC challenge.