<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.temlib.org/AtariForumWiki/index.php?action=history&amp;feed=atom&amp;title=Earxtutchap7</id>
	<title>Earxtutchap7 - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://www.temlib.org/AtariForumWiki/index.php?action=history&amp;feed=atom&amp;title=Earxtutchap7"/>
	<link rel="alternate" type="text/html" href="https://www.temlib.org/AtariForumWiki/index.php?title=Earxtutchap7&amp;action=history"/>
	<updated>2026-05-13T17:35:20Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.39.2</generator>
	<entry>
		<id>https://www.temlib.org/AtariForumWiki/index.php?title=Earxtutchap7&amp;diff=13030&amp;oldid=prev</id>
		<title>&gt;Wongck at 15:31, 12 October 2011</title>
		<link rel="alternate" type="text/html" href="https://www.temlib.org/AtariForumWiki/index.php?title=Earxtutchap7&amp;diff=13030&amp;oldid=prev"/>
		<updated>2011-10-12T15:31:35Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 11:31, 12 October 2011&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l481&quot;&gt;Line 481:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 481:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Back to [[ASM_Tutorial]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Back to [[ASM_Tutorial]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Programming&lt;/del&gt;]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Making optimized assembly code by Earx&lt;/ins&gt;]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>&gt;Wongck</name></author>
	</entry>
	<entry>
		<id>https://www.temlib.org/AtariForumWiki/index.php?title=Earxtutchap7&amp;diff=13029&amp;oldid=prev</id>
		<title>&gt;Silver Surfer: Added category</title>
		<link rel="alternate" type="text/html" href="https://www.temlib.org/AtariForumWiki/index.php?title=Earxtutchap7&amp;diff=13029&amp;oldid=prev"/>
		<updated>2009-05-02T17:12:34Z</updated>

		<summary type="html">&lt;p&gt;Added category&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 13:12, 2 May 2009&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l481&quot;&gt;Line 481:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 481:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Back to [[ASM_Tutorial]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Back to [[ASM_Tutorial]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;[[Category:Programming]]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>&gt;Silver Surfer</name></author>
	</entry>
	<entry>
		<id>https://www.temlib.org/AtariForumWiki/index.php?title=Earxtutchap7&amp;diff=13028&amp;oldid=prev</id>
		<title>&gt;Zorro 2 at 08:11, 9 October 2006</title>
		<link rel="alternate" type="text/html" href="https://www.temlib.org/AtariForumWiki/index.php?title=Earxtutchap7&amp;diff=13028&amp;oldid=prev"/>
		<updated>2006-10-09T08:11:22Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 04:11, 9 October 2006&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l479&quot;&gt;Line 479:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 479:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;totally unpredictable and incompatible with newer CPU's.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;totally unpredictable and incompatible with newer CPU's.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Back to [[ASM_Tutorial]]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>&gt;Zorro 2</name></author>
	</entry>
	<entry>
		<id>https://www.temlib.org/AtariForumWiki/index.php?title=Earxtutchap7&amp;diff=13027&amp;oldid=prev</id>
		<title>&gt;Simonsunnyboy at 17:49, 6 October 2006</title>
		<link rel="alternate" type="text/html" href="https://www.temlib.org/AtariForumWiki/index.php?title=Earxtutchap7&amp;diff=13027&amp;oldid=prev"/>
		<updated>2006-10-06T17:49:09Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&amp;lt;pre&amp;gt;&lt;br /&gt;
                       //==/\=/\=/\=/\=/\=/\=/\=/\=/\==\\&lt;br /&gt;
                      &amp;lt;&amp;lt; CHAPTER 7 : EXTREME OPTIMISING &amp;gt;&amp;gt;&lt;br /&gt;
                       \\==\/=\/=\/=\/=\/=\/=\/=\/=\/==//&lt;br /&gt;
&lt;br /&gt;
This chapter is dedicated to 680x0 instruction optimising. Chapter 5 has&lt;br /&gt;
some information about higher level optimisations. This chapter is more like&lt;br /&gt;
a big appendix with a huge amount of small optimisation tips gathered from&lt;br /&gt;
here and there.&lt;br /&gt;
&lt;br /&gt;
Now there are four rules when it comes to optimising instructions:&lt;br /&gt;
1) Shorter is &amp;quot;mostly&amp;quot; faster. If an instruction takes up 10 bytes and can&lt;br /&gt;
   be replaced with one of only 2 bytes, the latter will be faster most of the&lt;br /&gt;
   time.&lt;br /&gt;
2) Simple instructions are often the fastest. Instructions like multiplies,&lt;br /&gt;
   subroutine jumps, traps, etc. have loads of functionality in them. Simple&lt;br /&gt;
   instructions like add, move, etc (much used stuff) are optimised to run&lt;br /&gt;
   in only a small number of cycles. (sometimes only 1 on a 68060).&lt;br /&gt;
3) On the 680x0 series, optimisation is mostly the case of getting the most&lt;br /&gt;
   used constant and variable data in the registers instead of ram or&lt;br /&gt;
   immediate data. This is a rule you must always stick to! The trick is to&lt;br /&gt;
   store all values in registers when initialising a loop.&lt;br /&gt;
4) A good strategy has always been precalculating, or precalcing as coders&lt;br /&gt;
   say. It is basically the same idea as preshifted sprites, but a more&lt;br /&gt;
   general term. Precalcing all kinds of bitmaps is fast because the heavy&lt;br /&gt;
   calculations are done only once and the CPU is faster at moving bitmaps&lt;br /&gt;
   than at calculating them in realtime.&lt;br /&gt;
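&lt;br /&gt;
As a minimal sketch of rule 3 (assuming a0 points to a buffer of words and d7&lt;br /&gt;
holds the number of words minus one; the labels and the value $1234 are only&lt;br /&gt;
examples), compare a loop that fetches immediate data on every pass with one&lt;br /&gt;
that keeps the constant in a register:&lt;br /&gt;
&lt;br /&gt;
slow:   addi.w  #$1234,(a0)+                    * Immediate word is fetched on every pass.&lt;br /&gt;
        dbra    d7,slow&lt;br /&gt;
        =&lt;br /&gt;
        move.w  #$1234,d1                       * Load the constant once, before the loop.&lt;br /&gt;
fast:   add.w   d1,(a0)+                        * The loop now only uses d1 and the buffer.&lt;br /&gt;
        dbra    d7,fast&lt;br /&gt;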
&lt;br /&gt;
Addressing:&lt;br /&gt;
&lt;br /&gt;
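When accessing the highmemory registers (videocontrol/colorpalette,&lt;br /&gt;
interrupt control) or lowmemory vars (the exceptionvectors or cookies), you&lt;br /&gt;
can use the word-based addressingmode instead of the long-based one. The CPU&lt;br /&gt;
sign-extends the word address to 32 bits, so this works for the lowest and&lt;br /&gt;
the highest 32KB of the address space:&lt;br /&gt;
&lt;br /&gt;
$0000xxxx.l -&amp;gt; $xxxx.w                  (xxxx = $0000 to $7fff)&lt;br /&gt;
&lt;br /&gt;
$ffffxxxx.l -&amp;gt; $xxxx.w                  (xxxx = $8000 to $ffff)&lt;br /&gt;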
&lt;br /&gt;
        move.l  a0,$00000000.l&lt;br /&gt;
        =&lt;br /&gt;
        move.l  a0,$0000.w&lt;br /&gt;
&lt;br /&gt;
        move.l  d0,$ffff8200.l&lt;br /&gt;
        =&lt;br /&gt;
        move.l  d0,$ffff8200.w&lt;br /&gt;
&lt;br /&gt;
Using programcounter-relative addressingmodes (pc-modes) can also be a bit&lt;br /&gt;
faster. Sometimes data can be nearby the programcounter. If it isn't more&lt;br /&gt;
than 32KB away you can use it. (I hate the word &amp;quot;pc&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
        move.w  localvar(pc),d0&lt;br /&gt;
&lt;br /&gt;
localvar:&lt;br /&gt;
        DC.W    0&lt;br /&gt;
&lt;br /&gt;
Branching:&lt;br /&gt;
&lt;br /&gt;
All branch-type instructions (bsr, bra, bcc) offer short versions. These&lt;br /&gt;
take up only two bytes instead of four. Of course the label you branch to&lt;br /&gt;
mustn't be more than about 128 bytes away! AND a shortbranch can never&lt;br /&gt;
branch 0 bytes away (to the instruction directly following it)!&lt;br /&gt;
&lt;br /&gt;
        bra.s   locallabel&lt;br /&gt;
        move.w  #56,d0&lt;br /&gt;
locallabel:&lt;br /&gt;
        move.w  d0,d1&lt;br /&gt;
&lt;br /&gt;
Moving:&lt;br /&gt;
&lt;br /&gt;
Moving a small immediate value into a whole dataregister (longword) can be&lt;br /&gt;
replaced by &amp;quot;moveq&amp;quot; or &amp;quot;move quick&amp;quot;. The value is sign-extended to a longword&lt;br /&gt;
and can only range from -128 to 127.&lt;br /&gt;
&lt;br /&gt;
        move.l  #-128,d0&lt;br /&gt;
        =&lt;br /&gt;
        moveq   #-128,d0&lt;br /&gt;
&lt;br /&gt;
A really nice trick to save up some bytes is:&lt;br /&gt;
&lt;br /&gt;
        move.l  #$10000000,d0&lt;br /&gt;
        =&lt;br /&gt;
        moveq   #1,d0&lt;br /&gt;
        ror.l   #4,d0&lt;br /&gt;
&lt;br /&gt;
Saves 2 bytes =). Not much, but every byte counts when making 128bytro's.&lt;br /&gt;
&lt;br /&gt;
When moving from memory into registers, the &amp;quot;movem&amp;quot; instruction is always&lt;br /&gt;
nice. If you're doing 3 or more word/long moves this is an ideal solution.&lt;br /&gt;
&lt;br /&gt;
        move.l  (a0)+,d0&lt;br /&gt;
        move.l  (a0)+,d1&lt;br /&gt;
        move.l  (a0)+,d2&lt;br /&gt;
        =&lt;br /&gt;
        movem.l (a0)+,d0-d2&lt;br /&gt;
&lt;br /&gt;
Of course this can also be done for word-sizes:&lt;br /&gt;
&lt;br /&gt;
        move.w  (a0)+,d0&lt;br /&gt;
        move.w  (a0)+,d1&lt;br /&gt;
        move.w  (a0)+,d2&lt;br /&gt;
        ext.l   d0&lt;br /&gt;
        ext.l   d1&lt;br /&gt;
        ext.l   d2&lt;br /&gt;
        =&lt;br /&gt;
        movem.w (a0)+,d0-d2&lt;br /&gt;
&lt;br /&gt;
This has the added advantage of automatically sign-extending the words to&lt;br /&gt;
longs!! Can be very handy in some cases!&lt;br /&gt;
&lt;br /&gt;
Add's:&lt;br /&gt;
&lt;br /&gt;
Adds and subs with small immediate values (1 to 8) can be replaced with&lt;br /&gt;
&amp;quot;addq&amp;quot; and &amp;quot;subq&amp;quot;. These mean &amp;quot;add quick&amp;quot; and &amp;quot;subtract quick&amp;quot; and&lt;br /&gt;
they're called that for a reason =)&lt;br /&gt;
&lt;br /&gt;
        add.l   #1,d0&lt;br /&gt;
        =&lt;br /&gt;
        addq.l  #1,d0&lt;br /&gt;
&lt;br /&gt;
        add.l   #7,d0&lt;br /&gt;
        =&lt;br /&gt;
        addq.l  #7,d0&lt;br /&gt;
&lt;br /&gt;
        sub.l   #1,d0&lt;br /&gt;
        =&lt;br /&gt;
        subq.l  #1,d0&lt;br /&gt;
&lt;br /&gt;
        sub.l   #7,d0&lt;br /&gt;
        =&lt;br /&gt;
        subq.l  #7,d0&lt;br /&gt;
&lt;br /&gt;
Note, &amp;quot;subq.w&amp;quot; and &amp;quot;addq.w&amp;quot; are just a bit faster than &amp;quot;subq.l&amp;quot; and &amp;quot;addq.l&amp;quot;&lt;br /&gt;
on a standard 68000.&lt;br /&gt;
&lt;br /&gt;
When using &amp;quot;addq&amp;quot; and &amp;quot;subq&amp;quot; with address registers there is no&lt;br /&gt;
speed difference. But note that there is a difference between adding a long&lt;br /&gt;
and adding a word!!&lt;br /&gt;
&lt;br /&gt;
        adda.l  d0,a0&lt;br /&gt;
is faster than:&lt;br /&gt;
        adda.w  d0,a0&lt;br /&gt;
&lt;br /&gt;
Also, when doing an address increment/decrement with immediate data it's the&lt;br /&gt;
best idea to use &amp;quot;lea&amp;quot; for this. Of course this is again faster.&lt;br /&gt;
&lt;br /&gt;
        adda.w  #3126,a0&lt;br /&gt;
        =&lt;br /&gt;
        lea     3126(a0),a0&lt;br /&gt;
&lt;br /&gt;
But this isn't the only good use of lea..&lt;br /&gt;
&lt;br /&gt;
        movea.l a0,a1&lt;br /&gt;
        adda.w  #3126,a1&lt;br /&gt;
        =&lt;br /&gt;
        lea     3126(a0),a1&lt;br /&gt;
&lt;br /&gt;
Cool, eh? A very fast and compact solution.&lt;br /&gt;
&lt;br /&gt;
Anding:&lt;br /&gt;
&lt;br /&gt;
When only having to modify the low word of a dataregister it might be worth&lt;br /&gt;
considering only using a word for it. Byte sizes won't do any good, since&lt;br /&gt;
the instruction will be as big as the wordwise &amp;quot;and&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
        andi.l  #$fffffff0,d0                   * Slow.&lt;br /&gt;
        =&lt;br /&gt;
        andi.w  #$fff0,d0                       * Faster.&lt;br /&gt;
        =&lt;br /&gt;
        andi.b  #$f0,d0                         * Not faster or smaller.&lt;br /&gt;
&lt;br /&gt;
Clearing:&lt;br /&gt;
&lt;br /&gt;
clr.l/clr.w/clr.b on a register is dead stupid. Motorola made a serious&lt;br /&gt;
error in these instructions, so that they aren't that fast anymore. They&lt;br /&gt;
work, but should be avoided, because they are slow.&lt;br /&gt;
&lt;br /&gt;
        clr.l   d0&lt;br /&gt;
        =&lt;br /&gt;
        moveq   #0,d0&lt;br /&gt;
&lt;br /&gt;
        movea.l #0,a0                           * Slow. (clr can't take an addressregister.)&lt;br /&gt;
        =&lt;br /&gt;
        suba.l  a0,a0                           * Faster.&lt;br /&gt;
        =&lt;br /&gt;
        movea.l d0,a0                           * Fastest, when d0 already contains 0.&lt;br /&gt;
&lt;br /&gt;
        clr.w   d0&lt;br /&gt;
        =&lt;br /&gt;
        sub.w   d0,d0&lt;br /&gt;
&lt;br /&gt;
        clr.b   d0&lt;br /&gt;
        =&lt;br /&gt;
        sub.b   d0,d0&lt;br /&gt;
&lt;br /&gt;
For clearing linear parts of memory the clr instruction can also be used.&lt;br /&gt;
It's fast enough for doing medium sized blocks and the big advantage is it&lt;br /&gt;
doesn't use up data registers. But, please note one thing..&lt;br /&gt;
&lt;br /&gt;
        clr.l   (a0)+&lt;br /&gt;
is actually slower than:&lt;br /&gt;
        clr.l   -(a0)&lt;br /&gt;
&lt;br /&gt;
When clearing big amounts of memory the movem.l command is indispensable.&lt;br /&gt;
Simply clear some data registers and movem.l them to the ram.&lt;br /&gt;
&lt;br /&gt;
* Clear all free registers.. Yes.. address registers too.&lt;br /&gt;
        moveq   #10-1,d7                        * Prepare to loop 10 times (10 movems per loop).&lt;br /&gt;
        adda.l  #100*13*4,a0                    * Set a0 to top of block (100 movems of 13 longs).&lt;br /&gt;
        moveq   #0,d0&lt;br /&gt;
        moveq   #0,d1&lt;br /&gt;
        moveq   #0,d2&lt;br /&gt;
        moveq   #0,d3&lt;br /&gt;
        moveq   #0,d4&lt;br /&gt;
        moveq   #0,d5&lt;br /&gt;
        moveq   #0,d6&lt;br /&gt;
        movea.l d0,a1&lt;br /&gt;
        movea.l d0,a2&lt;br /&gt;
        movea.l d0,a3&lt;br /&gt;
        movea.l d0,a4&lt;br /&gt;
        movea.l d0,a5&lt;br /&gt;
        movea.l d0,a6&lt;br /&gt;
&lt;br /&gt;
loop:   movem.l d0-d6/a1-a6,-(a0)               * Move 13 longs to mem.&lt;br /&gt;
        movem.l d0-d6/a1-a6,-(a0)               * etc.&lt;br /&gt;
        movem.l d0-d6/a1-a6,-(a0)               * etc.&lt;br /&gt;
        movem.l d0-d6/a1-a6,-(a0)&lt;br /&gt;
        movem.l d0-d6/a1-a6,-(a0)&lt;br /&gt;
        movem.l d0-d6/a1-a6,-(a0)&lt;br /&gt;
        movem.l d0-d6/a1-a6,-(a0)&lt;br /&gt;
        movem.l d0-d6/a1-a6,-(a0)&lt;br /&gt;
        movem.l d0-d6/a1-a6,-(a0)&lt;br /&gt;
        movem.l d0-d6/a1-a6,-(a0)&lt;br /&gt;
        dbra    d7,loop&lt;br /&gt;
&lt;br /&gt;
Why &amp;quot;movem.l ...,-(a0)&amp;quot; instead of &amp;quot;movem.l ...,(a0)+&amp;quot;? Well.. because the&lt;br /&gt;
(a0)+ doesn't exist! =)&lt;br /&gt;
&lt;br /&gt;
Testing:&lt;br /&gt;
&lt;br /&gt;
After almost every move-/calculation- instruction the statusregister (sr)&lt;br /&gt;
takes on a conditioncode. The result of the operation (what is written in&lt;br /&gt;
the destination-operand) is tested and bits are set in the sr.&lt;br /&gt;
&lt;br /&gt;
This means you don't always have to perform a test after an operation:&lt;br /&gt;
&lt;br /&gt;
        move.w  d0,d1                           * Copy d0.w to d1.w.&lt;br /&gt;
        tst.w   d1                              * This is redundant!&lt;br /&gt;
        =&lt;br /&gt;
        move.w  d0,d1                           * Automatically tests d1.w!&lt;br /&gt;
&lt;br /&gt;
The same goes for all operations to memory. However, when operating with&lt;br /&gt;
address-registers as the destination, automatic testing is not done! If you&lt;br /&gt;
want to test values, you HAVE TO TEST with a separate instruction.&lt;br /&gt;
&lt;br /&gt;
        move.l  a0,a1                           * Copy a0.l in a1.l!&lt;br /&gt;
        cmpa.l  #0,a1                           * Test must be done!&lt;br /&gt;
&lt;br /&gt;
Also note that the basic 68000 has no special &amp;quot;tst&amp;quot; instruction for the&lt;br /&gt;
address-registers. You have to do this with a compare to zero. The 68020 and&lt;br /&gt;
above however, do have a special &amp;quot;tst&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
The most important bit is that with dataregister tests a copy-operation to&lt;br /&gt;
itself is faster than a &amp;quot;tst&amp;quot;!!&lt;br /&gt;
&lt;br /&gt;
        move.l  d0,d0&lt;br /&gt;
is faster than:&lt;br /&gt;
        tst.l   d0&lt;br /&gt;
&lt;br /&gt;
Alignment:&lt;br /&gt;
&lt;br /&gt;
This memory moving-stuff brings us to alignment of longwords. With the&lt;br /&gt;
coming of the 68030 and its burstcache, it is advisable to put big longword&lt;br /&gt;
buffers on a longword edge or a 16 byte edge (16 bytes = size of one 68030&lt;br /&gt;
cacheline). A longword operation on an address on a word boundary is 30%&lt;br /&gt;
slower than on a longword edge.&lt;br /&gt;
&lt;br /&gt;
So when using large amounts of longs in ram, be sure to put all buffers on&lt;br /&gt;
16 byte edges by adding 15 to the address and ANDing its low word with&lt;br /&gt;
#$fff0. Reserve 15 extra bytes so the rounded-up buffer still fits.&lt;br /&gt;
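&lt;br /&gt;
A small sketch of such an alignment, assuming a hypothetical label &amp;quot;buffer&amp;quot;&lt;br /&gt;
that was reserved with 15 spare bytes (the size 4096 is only an example):&lt;br /&gt;
&lt;br /&gt;
        move.l  #buffer+15,d0                   * Round up: add 15 first..&lt;br /&gt;
        andi.w  #$fff0,d0                       * ..then clear the low 4 bits (highword is untouched).&lt;br /&gt;
        movea.l d0,a0                           * a0 = first 16 byte edge inside the buffer.&lt;br /&gt;
&lt;br /&gt;
buffer: DS.B    4096+15                         * Example size plus 15 bytes of slack.&lt;br /&gt;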
&lt;br /&gt;
Swaps:&lt;br /&gt;
&lt;br /&gt;
Swap is used to swap the highword and lowword of a dataregister. It is a&lt;br /&gt;
quite fast instruction. Or at least much faster than this:&lt;br /&gt;
&lt;br /&gt;
        moveq   #16,d0&lt;br /&gt;
        rol.l   d0,d1&lt;br /&gt;
        =&lt;br /&gt;
        swap    d1&lt;br /&gt;
&lt;br /&gt;
Multiplies:&lt;br /&gt;
&lt;br /&gt;
mulu/muls are costly instructions. Mostly ranging from 20 to 50 cycles (!)&lt;br /&gt;
&lt;br /&gt;
        mulu.w  #2,d0&lt;br /&gt;
        =&lt;br /&gt;
        add.w   d0,d0&lt;br /&gt;
&lt;br /&gt;
        mulu.w  #4,d0&lt;br /&gt;
        =&lt;br /&gt;
        lsl.w   #2,d0&lt;br /&gt;
&lt;br /&gt;
        mulu.w  #3,d0&lt;br /&gt;
        =&lt;br /&gt;
        move.w  d0,d1&lt;br /&gt;
        add.w   d0,d0&lt;br /&gt;
        add.w   d1,d0&lt;br /&gt;
&lt;br /&gt;
        mulu.w  #7,d0&lt;br /&gt;
        =&lt;br /&gt;
        move.w  d0,d1                           * Keep a copy of the original value.&lt;br /&gt;
        lsl.w   #3,d0                           * d0 = value*8&lt;br /&gt;
        sub.w   d1,d0                           * d0 = value*8 - value = value*7&lt;br /&gt;
&lt;br /&gt;
But beware.. Don't overdo this!! If you try to replace a multiplication by a&lt;br /&gt;
too complex number you'll get too many moves, adds or subs, and this will,&lt;br /&gt;
especially on 030 and above, cost more cycles than an actual multiply&lt;br /&gt;
instruction. The Pure C compiler often makes this mistake and the code&lt;br /&gt;
generated will be very big and also slower.&lt;br /&gt;
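&lt;br /&gt;
A case that should still pay off on a plain 68000, as a sketch (assuming the&lt;br /&gt;
result fits in a word and d1 is free to use), is a multiply by 10:&lt;br /&gt;
&lt;br /&gt;
        mulu.w  #10,d0&lt;br /&gt;
        =&lt;br /&gt;
        add.w   d0,d0                           * d0 = x*2&lt;br /&gt;
        move.w  d0,d1                           * d1 = x*2&lt;br /&gt;
        lsl.w   #2,d0                           * d0 = x*8&lt;br /&gt;
        add.w   d1,d0                           * d0 = x*8 + x*2 = x*10&lt;br /&gt;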
&lt;br /&gt;
Multiply instructions do however have one major advantage.. They automatically&lt;br /&gt;
extend words to longwords. Doing this in a move,add,sub combination costs&lt;br /&gt;
you an extra ext.l command.&lt;br /&gt;
&lt;br /&gt;
Divides:&lt;br /&gt;
&lt;br /&gt;
Divides are even more expensive than multiplies. They can sometimes be&lt;br /&gt;
replaced by simple shifts!! Whenever you divide with a power of 2&lt;br /&gt;
(2,4,8,16,32...) you can do this for instance.&lt;br /&gt;
&lt;br /&gt;
        divu.w  #8,d1&lt;br /&gt;
        =&lt;br /&gt;
        lsr.w   #3,d1&lt;br /&gt;
&lt;br /&gt;
When this is possible, please do so.. It saves up a tremendous amount of&lt;br /&gt;
cycles. When you really need an awful amount of divisions it's best to&lt;br /&gt;
prepare your data for shifting instead of divides.&lt;br /&gt;
&lt;br /&gt;
Divide instructions do also have one advantage. They automatically perform a&lt;br /&gt;
modulo function as well! The modulo (remainder) from a division is stored in&lt;br /&gt;
the highword of the destination register.&lt;br /&gt;
&lt;br /&gt;
        divu.w  #10,d0                          * Perform division and modulo.&lt;br /&gt;
        swap    d0                              * Get highword in lowword.&lt;br /&gt;
&lt;br /&gt;
Sometimes fixed-point divides can be replaced by multiplies. Instead of&lt;br /&gt;
first adjusting values for fixedpoint divisions and then doing expensive&lt;br /&gt;
divides, you can also use only one multiply!&lt;br /&gt;
&lt;br /&gt;
        swap    d0                              * Shift d0.w up 16 bits.&lt;br /&gt;
        divu.w  #3,d0                           * Divide it.&lt;br /&gt;
        =&lt;br /&gt;
        mulu.w  #$5555,d0                       * Multiply d0.w with 1/3.&lt;br /&gt;
&lt;br /&gt;
Modulo:&lt;br /&gt;
&lt;br /&gt;
Which brings us to our next topic. When trying to get the modulo of a&lt;br /&gt;
number, you can again make this a modulo function with a power of 2.&lt;br /&gt;
&lt;br /&gt;
        andi.w  #4-1,d0                         * = d0 MOD 4&lt;br /&gt;
&lt;br /&gt;
        andi.w  #8-1,d0                         * = d0 MOD 8&lt;br /&gt;
&lt;br /&gt;
An even better option is making this a modulo function with 256 or 65536.&lt;br /&gt;
Why? This is exactly the range of the byte/word unit.&lt;br /&gt;
&lt;br /&gt;
* Perform calc on d0 here...&lt;br /&gt;
        move.b  d0,d1                           * = d0 MOD 256&lt;br /&gt;
&lt;br /&gt;
* Perform calc on d0 here&lt;br /&gt;
        move.w  d0,d1                           * = d0 MOD 65536&lt;br /&gt;
&lt;br /&gt;
Shifting:&lt;br /&gt;
&lt;br /&gt;
Now you understand that a combination of shifts and moves is actually faster&lt;br /&gt;
than a multiply (except on 68060, which has razorfast multiplying). Shifting&lt;br /&gt;
left by one can be replaced with a simple &amp;quot;add&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
        lsl.l   #1,d0&lt;br /&gt;
        =&lt;br /&gt;
        add.l   d0,d0&lt;br /&gt;
&lt;br /&gt;
        asl.l   #1,d0&lt;br /&gt;
        =&lt;br /&gt;
        add.l   d0,d0&lt;br /&gt;
&lt;br /&gt;
On a 68000 up to 68030 the shift instruction takes up 8 cycles and an &amp;quot;add&amp;quot;&lt;br /&gt;
takes up 4 on 68000 and only 2 on 68030.&lt;br /&gt;
&lt;br /&gt;
With 680x0 coding you're often confronted with having to shift right or left&lt;br /&gt;
8 bits. This because you need a byte in the lowest part of the register for&lt;br /&gt;
instance. Sometimes this can be avoided with &amp;quot;addx&amp;quot; loops as shown in&lt;br /&gt;
chapter 6.&lt;br /&gt;
&lt;br /&gt;
In some other situations, where you need to split up a word into two&lt;br /&gt;
separate bytes spread over a longword (i.e. 0000xxyy -&amp;gt; 00xx00yy), you can&lt;br /&gt;
use the &amp;quot;movep&amp;quot;-instruction. A movep must have a data-register as one&lt;br /&gt;
operand and a memory-address as the other.&lt;br /&gt;
&lt;br /&gt;
For instance:&lt;br /&gt;
&lt;br /&gt;
        movep.w d0,1(a0)                        * d0 = 0000xxyy -&amp;gt; 00xx00yy in a cleared long at (a0)&lt;br /&gt;
or:&lt;br /&gt;
        movep.w 1(a0),d0                        * 00xx00yy at (a0) -&amp;gt; d0.w = xxyy&lt;br /&gt;
&lt;br /&gt;
The weird thing about this instruction is that it was originally meant as&lt;br /&gt;
an easier way to access the hardware-registers (which often use bytes instead&lt;br /&gt;
of words), but you really don't need it for that as often as you need it for&lt;br /&gt;
optimising bitplane copying.&lt;br /&gt;
&lt;br /&gt;
Offsets:&lt;br /&gt;
&lt;br /&gt;
The 680x0 offers the lovely (an,dn) addressingmode.&lt;br /&gt;
&lt;br /&gt;
        adda.l  d0,a0&lt;br /&gt;
        move.w  d1,(a0)&lt;br /&gt;
        suba.l  d0,a0&lt;br /&gt;
        =&lt;br /&gt;
        move.w  d1,0(a0,d0.l)&lt;br /&gt;
&lt;br /&gt;
Of course this is only useful in particular cases. When you need a different&lt;br /&gt;
offset for every move to an address, this offset addressingmode is&lt;br /&gt;
unbeatable.&lt;br /&gt;
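&lt;br /&gt;
For instance, a table lookup could look like this (a sketch, assuming a0 points&lt;br /&gt;
to a table of words and d0.w holds an index below 16384, so the doubled offset&lt;br /&gt;
stays positive):&lt;br /&gt;
&lt;br /&gt;
        add.w   d0,d0                           * Index -&amp;gt; byte offset (one word = 2 bytes).&lt;br /&gt;
        move.w  0(a0,d0.w),d1                   * d1 = table entry number &amp;quot;index&amp;quot;.&lt;br /&gt;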
&lt;br /&gt;
From the 68020 and on the scaled offset addressingmode is at your disposal.&lt;br /&gt;
This is absolutely gorgeous.&lt;br /&gt;
&lt;br /&gt;
        move.l  d0,d2&lt;br /&gt;
        lsl.l   #3,d0&lt;br /&gt;
        adda.l  d0,a0&lt;br /&gt;
        move.w  d1,(a0)&lt;br /&gt;
        suba.l  d0,a0&lt;br /&gt;
        move.l  d2,d0&lt;br /&gt;
        =&lt;br /&gt;
        move.w  d1,0(a0,d0.l*8)&lt;br /&gt;
&lt;br /&gt;
There is another big advantage to these addressingmodes.. They can be used&lt;br /&gt;
by &amp;quot;lea&amp;quot; as well!&lt;br /&gt;
&lt;br /&gt;
        move.l  d0,d1&lt;br /&gt;
        lsl.l   #3,d0&lt;br /&gt;
        movea.l a0,a1&lt;br /&gt;
        adda.l  d0,a1&lt;br /&gt;
        move.l  d1,d0&lt;br /&gt;
        =&lt;br /&gt;
        lea     (a0,d0.l*8),a1&lt;br /&gt;
&lt;br /&gt;
Stack moves:&lt;br /&gt;
&lt;br /&gt;
Like most demo-coders.. I hate the stack. But sometimes, when moving immediate&lt;br /&gt;
data onto it, you can optimise the lot.&lt;br /&gt;
&lt;br /&gt;
        move.w  #1,-(sp)&lt;br /&gt;
        move.w  #2,-(sp)&lt;br /&gt;
        =&lt;br /&gt;
        move.l  #$00020001,-(sp)&lt;br /&gt;
&lt;br /&gt;
        move.l  #$00001000,-(sp)&lt;br /&gt;
        =&lt;br /&gt;
        pea     $1000.w&lt;br /&gt;
&lt;br /&gt;
Selfmodifying code:&lt;br /&gt;
&lt;br /&gt;
Please people, avoid this technique. It will only work on a simple 68000 or on&lt;br /&gt;
68020 and above with the caches turned off. Still, on a basic 8MHz 68000 it&lt;br /&gt;
might be worth considering if you absolutely want to get everything out of&lt;br /&gt;
your machine.&lt;br /&gt;
&lt;br /&gt;
Selfmodifying code is mostly used in loops where you don't have enough&lt;br /&gt;
registers left (i.e. used up all 8 data and 7 address registers). Using&lt;br /&gt;
variables in RAM is a poor solution since this slows things down quite a lot&lt;br /&gt;
compared to using registers.&lt;br /&gt;
&lt;br /&gt;
The technique relies on immediate data instructions and modifying the&lt;br /&gt;
immediate data every time you need to change the variable. This means loading&lt;br /&gt;
the address of the immediate data field of the instruction and changing the&lt;br /&gt;
value.&lt;br /&gt;
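&lt;br /&gt;
A minimal sketch, meant for a plain 68000 only (the label and the value $1234&lt;br /&gt;
are made up for this example; d0.w is assumed to hold the new value of the&lt;br /&gt;
variable and d7 the loop count minus one):&lt;br /&gt;
&lt;br /&gt;
        move.w  d0,loop+2                       * Overwrite the immediate word of the addi below.&lt;br /&gt;
        moveq   #0,d1                           * Start the sum at zero.&lt;br /&gt;
loop:   addi.w  #$1234,d1                       * $1234 is only a placeholder, patched above.&lt;br /&gt;
        dbra    d7,loop&lt;br /&gt;
&lt;br /&gt;
The immediate word sits 2 bytes after the start of the addi instruction, which&lt;br /&gt;
is why the write goes to loop+2.&lt;br /&gt;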
&lt;br /&gt;
But selfmodifying code hasn't been used by most coders for quite a while now&lt;br /&gt;
and that's a good thing. It makes code horribly unreadable as well as&lt;br /&gt;
totally unpredictable and incompatible with newer CPUs.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>&gt;Simonsunnyboy</name></author>
	</entry>
</feed>