<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.temlib.org/AtariForumWiki/index.php?action=history&amp;feed=atom&amp;title=Earxtutchap4</id>
	<title>Earxtutchap4 - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://www.temlib.org/AtariForumWiki/index.php?action=history&amp;feed=atom&amp;title=Earxtutchap4"/>
	<link rel="alternate" type="text/html" href="https://www.temlib.org/AtariForumWiki/index.php?title=Earxtutchap4&amp;action=history"/>
	<updated>2026-05-13T17:42:15Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.39.2</generator>
	<entry>
		<id>https://www.temlib.org/AtariForumWiki/index.php?title=Earxtutchap4&amp;diff=13018&amp;oldid=prev</id>
		<title>&gt;Wongck at 15:30, 12 October 2011</title>
		<link rel="alternate" type="text/html" href="https://www.temlib.org/AtariForumWiki/index.php?title=Earxtutchap4&amp;diff=13018&amp;oldid=prev"/>
		<updated>2011-10-12T15:30:21Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 11:30, 12 October 2011&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l464&quot;&gt;Line 464:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 464:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Back to [[ASM_Tutorial]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Back to [[ASM_Tutorial]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Programming&lt;/del&gt;]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Making optimized assembly code by Earx&lt;/ins&gt;]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>&gt;Wongck</name></author>
	</entry>
	<entry>
		<id>https://www.temlib.org/AtariForumWiki/index.php?title=Earxtutchap4&amp;diff=13017&amp;oldid=prev</id>
		<title>&gt;Silver Surfer: Added category</title>
		<link rel="alternate" type="text/html" href="https://www.temlib.org/AtariForumWiki/index.php?title=Earxtutchap4&amp;diff=13017&amp;oldid=prev"/>
		<updated>2009-05-02T17:10:57Z</updated>

		<summary type="html">&lt;p&gt;Added category&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 13:10, 2 May 2009&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l464&quot;&gt;Line 464:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 464:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Back to [[ASM_Tutorial]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Back to [[ASM_Tutorial]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;[[Category:Programming]]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>&gt;Silver Surfer</name></author>
	</entry>
	<entry>
		<id>https://www.temlib.org/AtariForumWiki/index.php?title=Earxtutchap4&amp;diff=13016&amp;oldid=prev</id>
		<title>&gt;Zorro 2 at 08:07, 9 October 2006</title>
		<link rel="alternate" type="text/html" href="https://www.temlib.org/AtariForumWiki/index.php?title=Earxtutchap4&amp;diff=13016&amp;oldid=prev"/>
		<updated>2006-10-09T08:07:58Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 04:07, 9 October 2006&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l462&quot;&gt;Line 462:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 462:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;                        maybe to large to fit into the 68K cache.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;                        maybe to large to fit into the 68K cache.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Back to [[ASM_Tutorial]]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>&gt;Zorro 2</name></author>
	</entry>
	<entry>
		<id>https://www.temlib.org/AtariForumWiki/index.php?title=Earxtutchap4&amp;diff=13015&amp;oldid=prev</id>
		<title>&gt;Simonsunnyboy at 17:47, 6 October 2006</title>
		<link rel="alternate" type="text/html" href="https://www.temlib.org/AtariForumWiki/index.php?title=Earxtutchap4&amp;diff=13015&amp;oldid=prev"/>
		<updated>2006-10-06T17:47:09Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&amp;lt;pre&amp;gt;&lt;br /&gt;
                       .------&amp;lt;------&amp;lt;------&amp;lt;-----.&lt;br /&gt;
                       v CHAPTER 4 : MAKING LOOPS ^&lt;br /&gt;
                       '------&amp;gt;------&amp;gt;------&amp;gt;-----'&lt;br /&gt;
&lt;br /&gt;
This chapter really is different in many ways from the last two. It is not&lt;br /&gt;
aimed at getting sound, music or interaction directly, but it shows you the&lt;br /&gt;
basics of how to make a fast and efficient loop for all of your routines.&lt;br /&gt;
Say you want to plot something like 80 pixels on your Falcon true color screen.&lt;br /&gt;
You know that every pixel is one word (16 bits). You can do either this:&lt;br /&gt;
repeat the command that moves a word 80 times in your code (this is called&lt;br /&gt;
&amp;quot;unrolling&amp;quot; or &amp;quot;hardcoding&amp;quot;), OR this: you can use a loop...&lt;br /&gt;
&lt;br /&gt;
Let's take a look at a Falcon pixel-plotting routine:&lt;br /&gt;
&lt;br /&gt;
        moveq   #0,d0                   * Prepare d0 for being a counter.&lt;br /&gt;
loop:   move.w  #$ffff,(a0)+            * Do one pixel and move to the next.&lt;br /&gt;
        addq.w  #1,d0                   * Increase the counter.&lt;br /&gt;
        cmpi.w  #80,d0                  * If the counter isn't 80 &amp;gt; again.&lt;br /&gt;
        bne.s   loop&lt;br /&gt;
&lt;br /&gt;
As you can see every loop-structure consists of an initialising part,&lt;br /&gt;
a processing part and a part to do loop-housekeeping things.&lt;br /&gt;
Well, this is reasonable. At least it is smaller than repeating the command&lt;br /&gt;
every time in your code.. But it's slower. And if there's one thing we don't&lt;br /&gt;
want in assembly it's slow code!&lt;br /&gt;
&lt;br /&gt;
The first step to faster code is:&lt;br /&gt;
&lt;br /&gt;
        moveq   #80-1,d7                * initialize counter&lt;br /&gt;
loop:   move.w  #$ffff,(a0)+            * do one pixel and move to the next&lt;br /&gt;
        dbra    d7,loop                 * Subtract 1 from d7.w and loop&lt;br /&gt;
* until d7=-1&lt;br /&gt;
&lt;br /&gt;
There you have it. If you're not using the counter for other purposes, you&lt;br /&gt;
can just as well use a dbra loop. It's simply much faster, because dbra does&lt;br /&gt;
the decrement, the test and the branch in one single instruction!&lt;br /&gt;
There are many ways to get this loop even faster, but you'll read more about&lt;br /&gt;
that in the next chapter.&lt;br /&gt;
Nested loops are a bit more complicated than this and I can hear you asking&lt;br /&gt;
what 'nested' actually means. A 'nested' loop means 'a loop in a loop'! Wow!&lt;br /&gt;
That sounds GrOoVy! Like true industrial, man!!&lt;br /&gt;
A good example of a nested loop would be something like clearing the left&lt;br /&gt;
half of the screen. Again this situation goes for Falcon true color, but the&lt;br /&gt;
same can easily be adapted to the ST-low resolution.&lt;br /&gt;
&lt;br /&gt;
        move.l  #screenaddress,a0       * Screenaddress in a0.&lt;br /&gt;
        move.w  #200-1,d7               * Initialize for bigloop.&lt;br /&gt;
bigloop:&lt;br /&gt;
        move.w  #160-1,d6               * Initialize for loop.&lt;br /&gt;
loop:   clr.w   (a0)+                   * Clear one pixel and move to the next.&lt;br /&gt;
        dbra    d6,loop                 * Loop 160 times.&lt;br /&gt;
        adda.l  #160*2,a0               * Move to next line.&lt;br /&gt;
        dbra    d7,bigloop              * Loop 200 times.&lt;br /&gt;
&lt;br /&gt;
Note that the screen is 320 pixels wide so the half is 160 pixels wide. When&lt;br /&gt;
you've cleared those 160 pixels you need to adjust a0 by adding the length&lt;br /&gt;
in bytes of the remaining 160 pixels. This brings you to the beginning of the&lt;br /&gt;
next line.&lt;br /&gt;
As you can see d7 is now reserved for 'bigloop' and d6 is reserved for&lt;br /&gt;
'loop'. This automatically means you can never have more than 8 nested loops&lt;br /&gt;
this way, because you only have 8 data registers (and in practice fewer, since&lt;br /&gt;
you need registers for your data too). It's of course possible to back up&lt;br /&gt;
the registers and restore them again, but the more memory accesses.. the&lt;br /&gt;
slower it will get.&lt;br /&gt;
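&lt;br /&gt;
A minimal sketch of backing up a counter, assuming (purely for illustration)&lt;br /&gt;
that d7 has to serve both loops and that a0 points to at least 20*10 words of&lt;br /&gt;
writable memory. The labels 'outer' and 'inner' are made up for this example:&lt;br /&gt;
&lt;br /&gt;
        move.w  #20-1,d7                * Initialize the outer counter.&lt;br /&gt;
outer:  move.w  d7,-(sp)                * Back up the outer counter on the stack.&lt;br /&gt;
        move.w  #10-1,d7                * Reuse d7 as the inner counter.&lt;br /&gt;
inner:  clr.w   (a0)+                   * Inner work: clear one word.&lt;br /&gt;
        dbra    d7,inner                * Loop 10 times.&lt;br /&gt;
        move.w  (sp)+,d7                * Restore the outer counter.&lt;br /&gt;
        dbra    d7,outer                * Loop 20 times.&lt;br /&gt;
&lt;br /&gt;
The price is two extra memory accesses on every pass of the outer loop.&lt;br /&gt;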
&lt;br /&gt;
Of course the power of loops isn't only repeating the same operations over&lt;br /&gt;
and over again without using up much space. They can also be used to make&lt;br /&gt;
code more flexible. A flexible loop can for instance allow copying/drawing&lt;br /&gt;
differently sized blocks or maybe show a starfield from whatever angle you&lt;br /&gt;
want simply by putting some different values and addresses into registers&lt;br /&gt;
before running them.&lt;br /&gt;
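&lt;br /&gt;
As a rough sketch of such a flexible loop, here is a routine that clears a&lt;br /&gt;
block of any size. It assumes a 320 pixel wide Falcon true color screen, and the&lt;br /&gt;
names CLEAR_BLOCK, cb_yloop and cb_xloop are invented for this example:&lt;br /&gt;
&lt;br /&gt;
* INPUT: a0:   address of the top-left pixel of the block&lt;br /&gt;
*        d0.w: width of the block in pixels (1 to 320)&lt;br /&gt;
*        d1.w: height of the block in lines (1 or more)&lt;br /&gt;
* Trashes d0-d3 and a0.&lt;br /&gt;
CLEAR_BLOCK:&lt;br /&gt;
        move.w  #320,d2                 * Pixels on one screenline.&lt;br /&gt;
        sub.w   d0,d2                   * Pixels left over after the block.&lt;br /&gt;
        add.w   d2,d2                   * Two bytes per pixel.&lt;br /&gt;
        subq.w  #1,d0                   * dbra counts down to -1.&lt;br /&gt;
        subq.w  #1,d1&lt;br /&gt;
cb_yloop:&lt;br /&gt;
        move.w  d0,d3                   * Fresh X counter for every line.&lt;br /&gt;
cb_xloop:&lt;br /&gt;
        clr.w   (a0)+                   * Clear one pixel and move to the next.&lt;br /&gt;
        dbra    d3,cb_xloop&lt;br /&gt;
        adda.w  d2,a0                   * Skip the rest of the screenline.&lt;br /&gt;
        dbra    d1,cb_yloop&lt;br /&gt;
        rts&lt;br /&gt;
&lt;br /&gt;
Clearing a 64*32 block then only takes different values in the registers:&lt;br /&gt;
&lt;br /&gt;
        move.l  #screenaddress,a0       * Top-left corner of the block.&lt;br /&gt;
        moveq   #64,d0                  * Width in pixels.&lt;br /&gt;
        moveq   #32,d1                  * Height in lines.&lt;br /&gt;
        bsr     CLEAR_BLOCK&lt;br /&gt;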
&lt;br /&gt;
We're now gonna go up a level to see how you could construct very big loops.&lt;br /&gt;
If you have a game you need to rebuild your playing-screen every so often.&lt;br /&gt;
For this you need a really complicated loopstructure. If you checked out my&lt;br /&gt;
last book, you might know that I explained something about game-loops. I'm&lt;br /&gt;
gonna do more or less the same, but now in assembler.&lt;br /&gt;
Here's the situation:&lt;br /&gt;
 * We want our background refreshed every time.&lt;br /&gt;
 * We want our enemy sprites moved according to their programming and they&lt;br /&gt;
   can react to the main-player too.&lt;br /&gt;
 * We want the sprites displayed with animation and some FX like explosions&lt;br /&gt;
   too.&lt;br /&gt;
 * We want our main sprite drawn in the same way.&lt;br /&gt;
 * We want a nice panel at the side of the screen that shows realtime&lt;br /&gt;
   statistics.&lt;br /&gt;
 * We want to check for joystick input and read it and check if the spacebar&lt;br /&gt;
   is pressed (spacebar exits the game)&lt;br /&gt;
&lt;br /&gt;
The loop will look something like this:&lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
        bsr     HANDLE_INPUT            * Read stick+keys and update variables.&lt;br /&gt;
&lt;br /&gt;
        bsr     HANDLE_BACKGROUND       * Initialize position+masks+animation.&lt;br /&gt;
        bsr     HANDLE_MAINSPRITE       * Handle collision+masks+speed+weapons.&lt;br /&gt;
        bsr     HANDLE_ENEMIES          * Do same for all the enemies.&lt;br /&gt;
&lt;br /&gt;
        bsr     PLOT_BACKGROUND         * Put the background in screenbuffer.&lt;br /&gt;
&lt;br /&gt;
        bsr     PLOT_MAINSPRITE         * Put the main sprite in screenbuffer.&lt;br /&gt;
        bsr     PLOT_ENEMIES            * Put the enemies in screenbuffer.&lt;br /&gt;
&lt;br /&gt;
        bsr     WAIT_VBL                * Wait for the VBL (no flicker).&lt;br /&gt;
        bsr     SWAP_SCREENS            * Swap screens (no flicker).&lt;br /&gt;
&lt;br /&gt;
        tst.b   spacepress              * Test for space.&lt;br /&gt;
        beq.s   mainloop                * if no spacepress &amp;gt; again&lt;br /&gt;
&lt;br /&gt;
As you can see you divide all the hard work into subroutines. The&lt;br /&gt;
subroutines themselves aren't in here, because they're irrelevant and it would&lt;br /&gt;
be far too much work.&lt;br /&gt;
About the order of subroutines.. The checking for input from hardware&lt;br /&gt;
devices MUST always be kept separate in big loops. If you don't, you're bound&lt;br /&gt;
to get crashes all over the place! If you simply let your program read joystick&lt;br /&gt;
input everywhere, you mess the code up so badly that even you won't know&lt;br /&gt;
what you did. Also note that you mustn't put the call to HANDLE_INPUT&lt;br /&gt;
in between other routine-calls. If you do that you'll mess up the position&lt;br /&gt;
of your main-sprite.&lt;br /&gt;
The handling routines come second. You should also keep these together.&lt;br /&gt;
They only calculate the frames, positions and what FX are necessary. After&lt;br /&gt;
that, the plotting routines do the rest.&lt;br /&gt;
Then there is some waiting for the VBL. You should know what that is from&lt;br /&gt;
chapter 2. Also the screenbuffers are swapped, which also has to do with&lt;br /&gt;
flickerless animation. I'll explain that later on. It's absolutely&lt;br /&gt;
essential that you keep these together in the right order and do them&lt;br /&gt;
AFTER THE PLOTTING!&lt;br /&gt;
Finally a byte is tested to see if the looping can continue or not. This&lt;br /&gt;
byte is updated by the HANDLE_INPUT routine.&lt;br /&gt;
BTW for those of you that don't know how subroutines work I'd like to&lt;br /&gt;
explain it. It's quite important for the understanding of structures.&lt;br /&gt;
OK, when you use a bsr or 'branch to subroutine' the 680x0 remembers the&lt;br /&gt;
position of the following instruction. Then it jumps to a label and executes&lt;br /&gt;
all the instructions until it reaches a 'rts' (return from subroutine).&lt;br /&gt;
Then it jumps back to the saved location. Just look at the picture:&lt;br /&gt;
&lt;br /&gt;
Step 1:                   |Step 2:&lt;br /&gt;
=/\====-------------------+=/\====------------------&lt;br /&gt;
        .......           |         .......&lt;br /&gt;
        ....              |         ....&lt;br /&gt;
        move.w  #1,d0     |         move.w  #1,d0&lt;br /&gt;
-&amp;gt;      bsr     routine   |         bsr     routine&lt;br /&gt;
        rol.l   #1,d0     |         rol.l   #1,d0&lt;br /&gt;
        .....             |         .....&lt;br /&gt;
        ...               |         ...&lt;br /&gt;
.                         |  .&lt;br /&gt;
.                         |  .&lt;br /&gt;
.                         |  .&lt;br /&gt;
routine:                  |  routine:&lt;br /&gt;
        .....             |  -&amp;gt;      .....&lt;br /&gt;
        ...               |          ...&lt;br /&gt;
        .....             |          .....&lt;br /&gt;
        rts               |          rts&lt;br /&gt;
&lt;br /&gt;
Step 3:                   |Step 4:&lt;br /&gt;
=/\====-------------------+=/\====------------------&lt;br /&gt;
        .......           |         .......&lt;br /&gt;
        ....              |         ....&lt;br /&gt;
        move.w  #1,d0     |         move.w  #1,d0&lt;br /&gt;
        bsr     routine   |         bsr     routine&lt;br /&gt;
        rol.l   #1,d0     |  -&amp;gt;     rol.l   #1,d0&lt;br /&gt;
        .....             |         .....&lt;br /&gt;
        ...               |         ...&lt;br /&gt;
.                         |  .&lt;br /&gt;
.                         |  .&lt;br /&gt;
.                         |  .&lt;br /&gt;
routine:                  |  routine:&lt;br /&gt;
        .....             |          .....&lt;br /&gt;
        ...               |          ...&lt;br /&gt;
        .....             |          .....&lt;br /&gt;
-&amp;gt;      rts               |          rts&lt;br /&gt;
&lt;br /&gt;
(Yeh! Now I'm real proud of myself that I made cool ASCII art again!)&lt;br /&gt;
&lt;br /&gt;
The little arrow is the current position where the 680x0 is executing the&lt;br /&gt;
instructions.&lt;br /&gt;
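&lt;br /&gt;
The saved location lives on the stack. As a rough illustration only: bsr&lt;br /&gt;
behaves much like pushing the address of the next instruction and branching,&lt;br /&gt;
and rts much like popping that address and jumping to it. The label 'back' is&lt;br /&gt;
made up for this sketch, and a1 is a scratch register that a real rts doesn't&lt;br /&gt;
need:&lt;br /&gt;
&lt;br /&gt;
        pea     back(pc)                * Push the return address (like bsr)...&lt;br /&gt;
        bra     routine                 * ...and jump to the routine.&lt;br /&gt;
back:   rol.l   #1,d0                   * Execution continues here afterwards.&lt;br /&gt;
* (rest of the program)&lt;br /&gt;
&lt;br /&gt;
routine:&lt;br /&gt;
* (body of the routine)&lt;br /&gt;
        movea.l (sp)+,a1                * Pop the return address (like rts)...&lt;br /&gt;
        jmp     (a1)                    * ...and jump back to it.&lt;br /&gt;
&lt;br /&gt;
This is also why a routine must leave the stack exactly as it found it before&lt;br /&gt;
the rts: otherwise the 680x0 pops the wrong value and jumps to it.&lt;br /&gt;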
&lt;br /&gt;
Using this bsr/rts combination is very common in most programs and it is&lt;br /&gt;
damn handy. It has the following advantages:&lt;br /&gt;
* Allows repetitive use of same piece of code without having to copy it.&lt;br /&gt;
* Allows the code to be called from different positions in the code.&lt;br /&gt;
* Makes loops more readable.&lt;br /&gt;
Of course using this technique is only handy in places where not much speed&lt;br /&gt;
is required. In the 'mainloop'-example this is the case! But beware when&lt;br /&gt;
calling from within the innermost nested loops!&lt;br /&gt;
&lt;br /&gt;
A situation like the following will drastically decrease the execution speed&lt;br /&gt;
of a loop structure:&lt;br /&gt;
&lt;br /&gt;
        movea.l screen_address,a0       * Set a0 to the first pixel on screen.&lt;br /&gt;
        move.w  #200-1,d7               * Prepare for 200 outer loops.&lt;br /&gt;
yloop:  bsr     DRAW_LINE               * Call routine to draw a screenline.&lt;br /&gt;
        adda.w  #160,a0                 * Set a0 to next screenline.&lt;br /&gt;
        dbra    d7,yloop&lt;br /&gt;
&lt;br /&gt;
* INPUT: a0: startaddress of screenline&lt;br /&gt;
DRAW_LINE:&lt;br /&gt;
        move.w  #320-1,d6               * Prepare to loop 320 times (d7 is in use).&lt;br /&gt;
xloop:  bsr     PLOT_PIXEL&lt;br /&gt;
* Go to next pixel here..&lt;br /&gt;
        dbra    d6,xloop&lt;br /&gt;
        rts&lt;br /&gt;
&lt;br /&gt;
* INPUT: a0: address of actual pixel&lt;br /&gt;
PLOT_PIXEL:&lt;br /&gt;
* Code goes in here..&lt;br /&gt;
        rts&lt;br /&gt;
&lt;br /&gt;
This piece of code is well readable, but sadly it lacks speed. A bsr/rts&lt;br /&gt;
combination every time a pixel is drawn is a bad idea. It causes enormous&lt;br /&gt;
overhead. So only use them in outer loops. This makes the global structure&lt;br /&gt;
of the program look somewhat better and easier to modify at high level.&lt;br /&gt;
The innerloops should best be kept without bsr/rts, saving/restoring of&lt;br /&gt;
registers (d0 to a6) and other costly instructions.&lt;br /&gt;
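&lt;br /&gt;
As a sketch, the same structure with the inner work pulled inline might look&lt;br /&gt;
like this. It assumes a Falcon true color screen (320 pixels of one word each&lt;br /&gt;
per line) and uses a plain white pixel as a stand-in for whatever PLOT_PIXEL&lt;br /&gt;
would really do:&lt;br /&gt;
&lt;br /&gt;
        movea.l screen_address,a0       * Set a0 to the first pixel on screen.&lt;br /&gt;
        move.w  #200-1,d7               * Prepare for 200 outer loops.&lt;br /&gt;
yloop:  move.w  #320-1,d6               * Prepare for 320 inner loops.&lt;br /&gt;
xloop:  move.w  #$ffff,(a0)+            * Plot the pixel right here, no bsr/rts.&lt;br /&gt;
        dbra    d6,xloop&lt;br /&gt;
        dbra    d7,yloop&lt;br /&gt;
&lt;br /&gt;
Because a whole line is plotted, a0 already points to the next screenline when&lt;br /&gt;
the inner loop ends, so no extra adda is needed here.&lt;br /&gt;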
&lt;br /&gt;
So, when you start coding on loopstructures you always come across the&lt;br /&gt;
well-known tradeoff between speed and readability/adaptability. Coders'&lt;br /&gt;
opinions on those two differ most of the time. Some people like their code&lt;br /&gt;
completely readable, some optimise every byte, some make a mix of the two.&lt;br /&gt;
&lt;br /&gt;
Whether you do or don't make everything optimised, you should always consider&lt;br /&gt;
this way of working with optimisation in loops.&lt;br /&gt;
1) First of all, lay out your loop-structure from the highest level. Get your&lt;br /&gt;
   code running and please don't think about speed yet.&lt;br /&gt;
2) Check out where the bottleneck in the loop is. This is mostly (always :))&lt;br /&gt;
   the loop nested deepest.&lt;br /&gt;
3) Then only optimise these innerloops. Remove unneeded branches/subroutine&lt;br /&gt;
   jumps, replace costly instructions with cheaper ones (or combinations of&lt;br /&gt;
   cheaper ones), reduce flexibility by using simpler logic and instructions,&lt;br /&gt;
   etc, etc.&lt;br /&gt;
4) If you're a perfectionist you can also optimise other pieces of code besides&lt;br /&gt;
   the innermost loop. This often is the step where the code becomes&lt;br /&gt;
   unreadable and it's wise to only do this when you want to release your&lt;br /&gt;
   final product (i.e. game, program or demo).&lt;br /&gt;
&lt;br /&gt;
Using this method you keep a good overall view AND get the speed where it is&lt;br /&gt;
needed most.&lt;br /&gt;
&lt;br /&gt;
Let's conclude this chapter with a practical example: a loop that reads a table&lt;br /&gt;
with sprite-positions and plots sprites on screen accordingly. To begin with we&lt;br /&gt;
set up our initial sluggish, but readable code. Please note that the&lt;br /&gt;
sprite-routine is for Falcon highcolor, just to keep it simple.&lt;br /&gt;
&lt;br /&gt;
*==============================================================================&lt;br /&gt;
* :STep        _/I\_ Laying out the STructure:&lt;br /&gt;
&lt;br /&gt;
******** CODE MEMORY SECTION ********&lt;br /&gt;
&lt;br /&gt;
        TEXT&lt;br /&gt;
&lt;br /&gt;
* Routine that draws all sprites in the spritetable.&lt;br /&gt;
DRAW_SPRITETABLE:&lt;br /&gt;
        lea     sprite_table,a0                 * Get the spritetable.&lt;br /&gt;
        move.w  number_of_sprites,d0            * Get the number of sprites.&lt;br /&gt;
        moveq   #0,d1                           * Initialize loopcounter.&lt;br /&gt;
&lt;br /&gt;
draw_sprite_loop:&lt;br /&gt;
        movem.l d0-d1/a0,-(sp)                  * Save used registers.&lt;br /&gt;
&lt;br /&gt;
        move.w  (a0)+,d0                        * Get X center of sprite.&lt;br /&gt;
        move.w  (a0)+,d1                        * Get Y center of sprite.&lt;br /&gt;
        bsr.s   DRAW_SPRITE                     * Jump to the spriteroutine.&lt;br /&gt;
&lt;br /&gt;
        movem.l (sp)+,d0-d1/a0                  * Restore used registers.&lt;br /&gt;
        addq    #4,a0                           * Goto next sprite.&lt;br /&gt;
        addq.w  #1,d1                           * Increase the loopcounter.&lt;br /&gt;
        cmp.w   d0,d1                           * If not all sprites are done:&lt;br /&gt;
        bne.s   draw_sprite_loop                * then loop once again.&lt;br /&gt;
        rts&lt;br /&gt;
&lt;br /&gt;
* Routine that draws a 16*16 highcolor sprite on screen at a given position.&lt;br /&gt;
* INPUT: d0.w: center X coordinate of sprite&lt;br /&gt;
*        d1.w: center Y coordinate of sprite&lt;br /&gt;
DRAW_SPRITE:&lt;br /&gt;
        movea.l #screen,a0                      * Get address of the screen.&lt;br /&gt;
        movea.l #sprite,a1                      * Get address of the sprite.&lt;br /&gt;
        subq.w  #16/2,d0                        * Get left position of sprite.&lt;br /&gt;
        subq.w  #16/2,d1                        * Get top position of sprite.&lt;br /&gt;
        add.w   d0,d0                           * / X offset: 2 bytes per pixel.&lt;br /&gt;
        mulu.w  #320*2,d1                       * | Y offset: 640 bytes per line.&lt;br /&gt;
        adda.w  d0,a0                           * | Add X offset (word sign-extended).&lt;br /&gt;
        adda.l  d1,a0                           * \ Add Y offset to the screenaddy.&lt;br /&gt;
        move.w  #16-1,d7                        * Setup Y loopcounter.&lt;br /&gt;
&lt;br /&gt;
yloop:  move.w  #16-1,d6                        * Setup X loopcounter.&lt;br /&gt;
&lt;br /&gt;
xloop:  move.w  (a1)+,(a0)+                     * Plot one pixel and goto next.&lt;br /&gt;
        dbra    d6,xloop&lt;br /&gt;
&lt;br /&gt;
        adda.w  #(320-16)*2,a0                  * Go to next screenline.&lt;br /&gt;
        dbra    d7,yloop&lt;br /&gt;
        rts&lt;br /&gt;
&lt;br /&gt;
******** DATA MEMORY SECTION ********&lt;br /&gt;
&lt;br /&gt;
        DATA&lt;br /&gt;
&lt;br /&gt;
number_of_sprites:&lt;br /&gt;
        DC.W    4                               * four sprites in our table&lt;br /&gt;
&lt;br /&gt;
sprite_table:&lt;br /&gt;
        DC.W    167,89                          * X and Y centers of sprite 1&lt;br /&gt;
        DC.W    23,156                          * X and Y centers of sprite 2&lt;br /&gt;
        DC.W    230,53                          * X and Y centers of sprite 3&lt;br /&gt;
        DC.W    97,170                          * X and Y centers of sprite 4&lt;br /&gt;
&lt;br /&gt;
sprite: INCBIN  SPRITE.SPR                      * Include 16*16 binary sprite.&lt;br /&gt;
&lt;br /&gt;
******** RESERVED MEMORY SECTION ********&lt;br /&gt;
&lt;br /&gt;
        BSS&lt;br /&gt;
&lt;br /&gt;
screen: DS.W    320*200                         * Reserve a 320*200 HC screen.&lt;br /&gt;
&lt;br /&gt;
*==============================================================================&lt;br /&gt;
&lt;br /&gt;
*==============================================================================&lt;br /&gt;
* :STep        -=&amp;lt; II &amp;gt;=- Finding the bottleneck:&lt;br /&gt;
&lt;br /&gt;
Well... Have you found the innermost loop yet? Of course, it's the &amp;quot;xloop&amp;quot; inside&lt;br /&gt;
the &amp;quot;yloop&amp;quot; inside &amp;quot;DRAW_SPRITE&amp;quot;. If you look closely you'll notice there are&lt;br /&gt;
4 sprites to draw and each sprite has 16*16=256 pixels to draw.&lt;br /&gt;
This equals a total of 4*256 = 1024 pixels, which means the xloop is executed&lt;br /&gt;
1024 (!) times!! (I love the sound of that word &amp;quot;executed&amp;quot; :-])&lt;br /&gt;
&lt;br /&gt;
If you check out the other loops, you'll see that about the same amount of&lt;br /&gt;
work is done per pass in &amp;quot;yloop&amp;quot; and quite a lot more in &amp;quot;draw_sprite_loop&amp;quot;.&lt;br /&gt;
This might be true, but &amp;quot;yloop&amp;quot; is done only 4*16 = 64 times and&lt;br /&gt;
&amp;quot;draw_sprite_loop&amp;quot; is done a mere 4 times.&lt;br /&gt;
&lt;br /&gt;
A close study of how much each loop costs gives the following results:&lt;br /&gt;
&lt;br /&gt;
xloop:                 approx. 12 clockcycles on 68030 (without cachehit)&lt;br /&gt;
yloop:                 approx. 10 clockcycles&lt;br /&gt;
draw_sprite_loop:      approx. 70 clockcycles&lt;br /&gt;
&lt;br /&gt;
Then multiply this by the number of times each loop is done.&lt;br /&gt;
&lt;br /&gt;
xloop:                 12 * 1024 = 12288 cycles&lt;br /&gt;
yloop:                 10 *   64 =   640 cycles&lt;br /&gt;
draw_sprite_loop:      70 *    4 =   280 cycles&lt;br /&gt;
&lt;br /&gt;
It's very clear that xloop is the bottleneck here and hell, you needn't even&lt;br /&gt;
do the calculations in this case, it's all quite evident. As soon as you know&lt;br /&gt;
the innerloop, it's settled.&lt;br /&gt;
&lt;br /&gt;
*==============================================================================&lt;br /&gt;
* :STep        -=&amp;gt; III &amp;lt;=- Optimising the innerloop:&lt;br /&gt;
&lt;br /&gt;
How to perform some serious clockcycle hackin' procedures on tha xloop's&lt;br /&gt;
ass?!?! Well.. You could always unroll the loop.&lt;br /&gt;
&lt;br /&gt;
yloop:&lt;br /&gt;
xloop:  move.w  (a1)+,(a0)+&lt;br /&gt;
        move.w  (a1)+,(a0)+&lt;br /&gt;
        move.w  (a1)+,(a0)+&lt;br /&gt;
        move.w  (a1)+,(a0)+&lt;br /&gt;
        move.w  (a1)+,(a0)+&lt;br /&gt;
        move.w  (a1)+,(a0)+&lt;br /&gt;
        move.w  (a1)+,(a0)+&lt;br /&gt;
        move.w  (a1)+,(a0)+&lt;br /&gt;
        move.w  (a1)+,(a0)+&lt;br /&gt;
        move.w  (a1)+,(a0)+&lt;br /&gt;
        move.w  (a1)+,(a0)+&lt;br /&gt;
        move.w  (a1)+,(a0)+&lt;br /&gt;
        move.w  (a1)+,(a0)+&lt;br /&gt;
        move.w  (a1)+,(a0)+&lt;br /&gt;
        move.w  (a1)+,(a0)+&lt;br /&gt;
        move.w  (a1)+,(a0)+&lt;br /&gt;
&lt;br /&gt;
        adda.w  #(320-16)*2,a0                  * Skip rest of the screenline.&lt;br /&gt;
        dbra    d7,yloop&lt;br /&gt;
&lt;br /&gt;
Yep... this eliminates the move.w #16-1,d6 as well as the dbra d6,xloop. This&lt;br /&gt;
reduces the number of cycles per pixel from 12 to 10. Very smart, but this&lt;br /&gt;
code looks kinda chunky. AND... There is still a way to reduce the size of the&lt;br /&gt;
code and speed it up even more!&lt;br /&gt;
&lt;br /&gt;
yloop:&lt;br /&gt;
xloop:  movem.l (a1)+,d0-d6/a2                  * Move 16 pixels into regs and&lt;br /&gt;
                                                * goto next spriteline.&lt;br /&gt;
        movem.l d0-d6/a2,(a0)                   * Move 16 pixels onto screen.&lt;br /&gt;
&lt;br /&gt;
        adda.w  #320*2,a0&lt;br /&gt;
        dbra    d7,yloop&lt;br /&gt;
&lt;br /&gt;
The movem.l instruction is specialized for moving large numbers of LONGs in/out&lt;br /&gt;
of the registers. In this example we move 8 LONGs (d0,d1,d2,d3,d4,d5,d6 and a2)&lt;br /&gt;
and every LONG holds two highcolor pixels. So 8 * 2 = 16 pixels.&lt;br /&gt;
&lt;br /&gt;
But how fast is this really? Well, according to the literature about&lt;br /&gt;
cyclefucking on the Falcon, this should be about 140 cycles for the pair, so&lt;br /&gt;
140 / 16 = a bit less than 9 cycles for a pixel. And hey, that's just a bit&lt;br /&gt;
faster.&lt;br /&gt;
&lt;br /&gt;
Let's check out the score so far:&lt;br /&gt;
Old number of cycles: 12288 + 640 + 280 = 13208&lt;br /&gt;
&lt;br /&gt;
Now for the new timings.. The xloop doesn't exist anymore.. All that's left&lt;br /&gt;
is a pair of movem's. The total time for one pass of the yloop is something&lt;br /&gt;
like 140+8 = 148.&lt;br /&gt;
&lt;br /&gt;
New number of cycles: 148*64 + 280 = 9752&lt;br /&gt;
&lt;br /&gt;
This means a ((13208/9752) - 1) * 100% = about 35 % speed increase!&lt;br /&gt;
&lt;br /&gt;
*==============================================================================&lt;br /&gt;
* :STep        /|\ IV /|\ Perfection:&lt;br /&gt;
&lt;br /&gt;
Yes kids, grandpa Earx sure met a few freaks in his lifetime. People who won't&lt;br /&gt;
give up till they killed every redundant bit of code and counted every single&lt;br /&gt;
cycle (Hi Defjam, Llama and mr. Ni!). =)&lt;br /&gt;
&lt;br /&gt;
I know some coders that don't give a damn about excessive optimisation, but&lt;br /&gt;
still it might be nice to optimise this example a bit further, even though&lt;br /&gt;
there isn't more than a few percent in speed to gain.&lt;br /&gt;
&lt;br /&gt;
Let's start with the innermost loop again. This is now &amp;quot;yloop&amp;quot;. As you can&lt;br /&gt;
see you could unroll this loop as well, but you've already seen the principle&lt;br /&gt;
of this, so we'll focus on something else..&lt;br /&gt;
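&lt;br /&gt;
For reference, a fully unrolled yloop might look like the sketch below. It&lt;br /&gt;
assumes your assembler offers a REPT/ENDR repeat directive (Devpac does) and&lt;br /&gt;
that nothing else relies on d7 being counted down, since the dbra disappears:&lt;br /&gt;
&lt;br /&gt;
        REPT    16                              * Assemble the block 16 times.&lt;br /&gt;
        movem.l (a1)+,d0-d6/a2                  * Fetch one spriteline.&lt;br /&gt;
        movem.l d0-d6/a2,(a0)                   * Store it onto the screen.&lt;br /&gt;
        adda.w  #320*2,a0                       * Goto next screenline.&lt;br /&gt;
        ENDR&lt;br /&gt;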
&lt;br /&gt;
The adda.w #320*2,a0 is quite slow, because the immediate data in the&lt;br /&gt;
instruction (the number 320*2) needs to be fetched every time this instruction&lt;br /&gt;
is done by the CPU. A better option is to put the number in a register once and&lt;br /&gt;
add this register every time. This should save maybe 3 or 4 cycles every loop&lt;br /&gt;
(shock, horror!).&lt;br /&gt;
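&lt;br /&gt;
A minimal sketch of that idea, assuming a3 is not used for anything else in&lt;br /&gt;
&amp;quot;DRAW_SPRITE&amp;quot; (an address register is picked here because d0-d6 are taken&lt;br /&gt;
by the movem's and d7 is the loopcounter):&lt;br /&gt;
&lt;br /&gt;
        movea.w #320*2,a3                       * Screenline size, set up once.&lt;br /&gt;
yloop:  movem.l (a1)+,d0-d6/a2                  * Fetch one spriteline.&lt;br /&gt;
        movem.l d0-d6/a2,(a0)                   * Store it onto the screen.&lt;br /&gt;
        adda.l  a3,a0                           * Goto next screenline.&lt;br /&gt;
        dbra    d7,yloop&lt;br /&gt;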
&lt;br /&gt;
Also, you could optimise &amp;quot;draw_sprite_loop&amp;quot; quite a bit more by keeping track&lt;br /&gt;
of which registers are used in &amp;quot;DRAW_SPRITE&amp;quot; so you don't use overlapping&lt;br /&gt;
registers and hence needn't do the register-(re)storing. Furthermore the&lt;br /&gt;
addq.w/cmp.w/bne.s combination can be transformed into a simple dbra which is&lt;br /&gt;
more efficient.&lt;br /&gt;
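&lt;br /&gt;
The counter change could look roughly like this. The &amp;quot;before&amp;quot; part is only an&lt;br /&gt;
assumed shape of the addq.w/cmp.w/bne.s loop, the labels are made up and d0 is&lt;br /&gt;
just a placeholder for whichever dataregister the loopbody leaves alone:&lt;br /&gt;
&lt;br /&gt;
* Before: count up from 0 and compare against the number of sprites.&lt;br /&gt;
        moveq   #0,d0&lt;br /&gt;
old_loop:&lt;br /&gt;
*       (loopbody goes here)&lt;br /&gt;
        addq.w  #1,d0&lt;br /&gt;
        cmp.w   #4,d0&lt;br /&gt;
        bne.s   old_loop&lt;br /&gt;
&lt;br /&gt;
* After: count down from 4-1, dbra falls through once d0 reaches -1.&lt;br /&gt;
        moveq   #4-1,d0&lt;br /&gt;
new_loop:&lt;br /&gt;
*       (loopbody goes here)&lt;br /&gt;
        dbra    d0,new_loop&lt;br /&gt;
&lt;br /&gt;
Mind that the counter now runs from 3 down to 0 instead of from 0 up to 3, so&lt;br /&gt;
a loopbody that uses it as an index would need adjusting.&lt;br /&gt;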
&lt;br /&gt;
Phew.. That's it. Ok, hope you learned something from this looping business.&lt;br /&gt;
The summary:&lt;br /&gt;
&lt;br /&gt;
Clockcycle:            A single tick generated by the CPU-clockcrystal. An&lt;br /&gt;
                       instruction takes up a number of these ticks. Some&lt;br /&gt;
                       instructions take fewer cycles than others.&lt;br /&gt;
Dbra instruction:      Nothing to do with certain female bodyparts, but&lt;br /&gt;
                       actually an instruction often used to keep count of the&lt;br /&gt;
                       number of loops done and decide whether to reloop or&lt;br /&gt;
                       terminate the loop.&lt;br /&gt;
Hardcoding:            A term often used to describe optimising a piece of code&lt;br /&gt;
                       completely and only to one specific situation. This&lt;br /&gt;
                       mostly leads to exceedingly speedy, but also very&lt;br /&gt;
                       unreadable and chunky code. The word is also often used&lt;br /&gt;
                       instead of &amp;quot;unrolling&amp;quot;.&lt;br /&gt;
Loop:                  A piece of code that is run repeatedly by the CPU.&lt;br /&gt;
Mainloop:              A term mostly used for the most important loop in a&lt;br /&gt;
                       program.&lt;br /&gt;
Nested loop:           A loop within another loop.&lt;br /&gt;
Overhead:              The time wasted when executing a certain piece of code&lt;br /&gt;
                       in a specific case. Highly adaptable code mostly suffers&lt;br /&gt;
                       from huge amounts of overhead. Highly specialised code&lt;br /&gt;
                       has minimal overhead.&lt;br /&gt;
Subroutine:            A block of code terminated with a &amp;quot;rts&amp;quot; instruction.&lt;br /&gt;
                       Yes, basically that's all there is to it.. :)) But you&lt;br /&gt;
                       should always jump to a subroutine with &amp;quot;bsr&amp;quot; or &amp;quot;jsr&amp;quot;.&lt;br /&gt;
Unrolling:             Writing out the number of loops you want to do in your&lt;br /&gt;
                       code, by repeating the looped instructions every time.&lt;br /&gt;
                       Mostly leads to fast code, but the code can get huge and&lt;br /&gt;
                       maybe too large to fit into the 68K cache.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>&gt;Simonsunnyboy</name></author>
	</entry>
</feed>