--------------------------------------------------------------------------------
                   CHUNK 2 PLANAR GRAPHICS - THE WAY TO DO IT
--------------------------------------------------------------------------------

introduction
--------------

Hi freaks... I'm Ray, the new coder of .tSCc. This is my first tutorial, so let
me explain some basic things:

First of all I have to tell you that all the examples will be plain 68000 asm
code, and sometimes it'll get quite machine-dependent. So my tutorials will
only be useful with the ST and its "brothers" - don't even get the idea of
taking the following descriptions as valid on a TT or something, but I think
that's clear anyway.

Second, I'm expecting you to understand the basics of 68k programming, because
I don't have the time to teach you 68k coding in all its sometimes 'bizarre'
details, like for example the different addressing modes (but watch out, 'cause
I'm going to release a 68k tutorial in one of the next magazines). And if
there's something you don't understand, just read it two or more times until
you've got it - if this doesn't help, just ask me.

So, if you wanna contact me, don't hesitate:

1) e-mail  : reimund.dratwa@freenet.de
2) homepage: http://rd-developments.de.gs (my page before I joined .tSCc.)
             (don't forget to grab the newest tutorial!)

let's get it on
-----------------

Let me say a few words about the purpose of this tutorial. Today I'll try to
explain how a technique called 'chunk 2 planar graphics' (c2p for short), which
is the secret of all those modern demos, can be realized on the ST. Now you
might ask, what the hell does chunk to planar mean?!? Since I think you know
how the ST's bitplane screen works (if that isn't the case, read the next
paragraph first) you'll probably know how hard it is to set a pixel and, even
worse, how slow.
So chunk 2 planar 'conversion' means that the bitplane bits are set according
to byte values in a so-called chunky buffer, where every byte represents the
color of one single plot on the screen (this buffer could be compared to the
PC's VGA memory of a 320x200x256 screen - don't get me wrong, Intel really
suckz! - I just wanted to give an example). First you set up your effect, or
whatever you want, in that chunky buffer by simply moving the according bytes
into it, and in a further step the chunky buffer is converted to bitplane data
(planar simply means bitplane graphics). How this works exactly will be
described later on...

a brief description of the bitplanes
--------------------------------------

Skip this paragraph if you already know how the ST's screen is arranged.

As already mentioned, it's hard to set pixels 'by hand' because the ST's screen
is split into bitplanes. OK, I'll be a bit more exact, but notice: I'm only
covering ST low rez, 320x200x16. In this resolution the screen is divided into
4 so-called 'bitplanes'; every four 'corresponding' bits of these bitplanes
indicate in which color one pixel on the screen should appear. So let's take a
look at the first 4 words of screen memory. The diagram should make it clear:

 screen
      0                                      319
      -------------------------------------------
  0   -*                                        -
      -                                         -
      -                                         -
      -                                         -
      -                                         -
  .
  .
  .
199   -                                         -
      -------------------------------------------

Assuming we wanted to set the pixel in the upper left corner of the screen
(0/0), marked with *, in color $D (=13; if you don't know what hexadecimal is,
just stop reading), this would mean we would have to manipulate the highest bit
of the first 4 words in screen memory.

$D (hex) = %1101 (bin)

The 1st word in screen memory belongs to bitplane 1, the 2nd to bitplane 2 and
so on. Bitplane 1 takes the lowest bit of the color number and bitplane 4 the
highest, so %1101 has to be read from right to left.

screen memory:

         bit  15                               7                           0
------------------------------------------------------------------------------
bitpl/word 1 | 1 |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
------------------------------------------------------------------------------
bitpl/word 2 | 0 |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
------------------------------------------------------------------------------
bitpl/word 3 | 1 |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
------------------------------------------------------------------------------
bitpl/word 4 | 1 |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
------------------------------------------------------------------------------

Got it? No? Then let's do some more examples: just imagine we wanted to set the
pixel right beside the one before (i.e. 1/0) in the same color. Then we'd just
manipulate bit 14 of the 4 words. But now be careful: what if we wanted to set
pixel 16/0 (the 17th on the screen)? The answer is as simple as this: since
every single one of the 4 words only holds 16 bits (=16 pixels) we just have to
skip over to the next 4 words in memory (5,6,7,8). I think you're getting the
idea, right? And I think you'll now see why setting pixels is really slow on
the ST as well. It's just because you have to 'bit set' 4 places in screen
memory for just one pixel - and I think you know that bit operations are kinda
hard for a byte/word/long aligned cpu.

Now guys, I think it's time for some code at the end of this paragraph (let's
go back to the first example, where we wanted to set the upper left pixel in
color $D):

 move.l  screen,A0       * assume the screen addr is stored in 'screen'
 clr.w   D0              * start at offset 0

 ori.w   #%1000000000000000,0(A0,D0.w)  * set bit 15, 1st word of bitpl 1
 andi.w  #%0111111111111111,2(A0,D0.w)  * clear bit 15, 1st word of bitpl 2
 ori.w   #%1000000000000000,4(A0,D0.w)  * set bit 15, 1st word of bitpl 3
 ori.w   #%1000000000000000,6(A0,D0.w)  * set bit 15, 1st word of bitpl 4

Hey - we've just set the first pixel!
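
The instructions above only work for pixel 0/0. As a rough sketch (this is not
taken from the demo source), the same idea can be wrapped into a small routine
that sets any pixel. The label 'plot', the register usage and the Devpac-style
local labels (.plane, .clear, .next) are assumptions made for this example. It
expects x in D1, y in D2, the color in D3 and the screen address in A0:

```asm
* plot - set one pixel in ST low rez (sketch, labels are hypothetical)
* in:      D1.w = x (0..319), D2.w = y (0..199), D3.w = color (0..15)
*          A0   = screen address
* trashes: D1-D5/A1

plot:   mulu.w  #160,D2         * y * 160 bytes per scanline
        move.w  D1,D4
        lsr.w   #4,D4           * x / 16 = number of the 16 pixel group
        lsl.w   #3,D4           * every group is 4 words = 8 bytes
        add.w   D4,D2           * D2 = offset of the group's 1st word
        lea     0(A0,D2.w),A1   * A1 -> word of bitplane 1
        not.w   D1
        and.w   #15,D1          * D1 = 15 - (x mod 16) = bit number
        moveq   #0,D4
        bset    D1,D4           * D4 = mask with only this bit set
        move.w  D4,D5
        not.w   D5              * D5 = mask with only this bit clear
        moveq   #4-1,D1         * 4 bitplanes to do
.plane: lsr.w   #1,D3           * lowest color bit -> carry
        bcc.s   .clear
        or.w    D4,(A1)+        * bit is 1: set the pixel in this plane
        bra.s   .next
.clear: and.w   D5,(A1)+        * bit is 0: clear the pixel in this plane
.next:  dbra    D1,.plane
        rts
```

Bitplane 1 (the first word) gets the lowest bit of the color number here.
Calling the routine with D1 = 0, D2 = 0 and D3 = $D touches the same 4 words as
the hand-written version.
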
Now go back to the lines that set the first pixel and try what happens if you,
for example, move #8 into D0 instead of clearing it...

the secret of c2p, how it works and why it's so quick
-------------------------------------------------------

The only significant thing I can mention at the top of this paragraph is that
the whole, let's call it, secret of the c2p conversion is the 68k's movep
instruction: what movep actually does is move word or long values 'byte per
word', i.e. it splits them up into bytes and writes these to every other byte
in memory. So let's take a look at a little example:

 move.l  #$FE23D2E0,D0
 movep.l D0,0(A0)        * this instruction kinda covers 64 bits, or in other
                         * words only the upper bytes of 4 following words in
                         * memory (already smelling what we'll do ? ;) )

after the instruction:   (A0) = $FE
                        2(A0) = $23
                        4(A0) = $D2
                        6(A0) = $E0

or if we start at an odd address (A0 itself still holds an even one):

 move.l  #$FE23D2E0,D0
 movep.l D0,1(A0)        * now movep will use the lower 8 bits of the data
                         * bus, or more simply: now it covers the lower
                         * bytes of 4 following words

after the instruction:  1(A0) = $FE
                        3(A0) = $23
                        5(A0) = $D2
                        7(A0) = $E0

if we didn't have movep we'd have to do:

 move.l  #$FE23D2E0,D0
 move.b  D0,6(A0)
 lsr.l   #8,D0
 move.b  D0,4(A0)
 lsr.l   #8,D0
 move.b  D0,2(A0)
 lsr.l   #8,D0
 move.b  D0,(A0)

I think then we could forget our chunk 2 planar graphics - thanx Motorola!

And now go on reading very carefully, because now I'll describe the core of the
c2p conversion:

Well, what we need to do a c2p conversion is, as mentioned above, 1st a 'chunky
buffer' holding the chunky byte data and 2nd a c2p table which is finally used
for the conversion.
Don't be afraid, I'll give you some more detail... First of all let's talk
about the chunky buffer: to get the conversion anywhere near reasonably quick
we need to double the pixels horizontally and vertically, which means that one
'chunky byte' in fact represents 2x2 pixels on the screen, and it means that
we'll get a virtual rez of 160x100 'chunky pixels'. That's why the chunky
buffer is sized 160x100 bytes = 16000 bytes (notice: the size of the chunky
buffer may vary, but to keep it simple I'm talking about the fullscreen
conversion).

Now some words on the c2p table and the actual conversion. Let's assume the
first 4 bytes of the chunky buffer have the values $03,$04,$0D,$02 (these 4
values represent 4 chunky pixels, or 4 double pixels on the screen). Because
only the lower 4 bits of these values are used (16 colors!) we're able to
'pack' those 4 bytes into one word the following way:

 lea     chunkybuffer,A0 * A0 now points to the chunky buffer
 moveq   #0,D0           * clear D0.l
 move.w  (A0)+,D0        * D0 = $0304 - 1st 2 values out of the chunky buffer
 lsl.w   #4,D0           * D0 = $3040 - shift them up one nybble
 or.w    (A0)+,D0        * D0 = $3D42 - that's the four 1st values in one word

But stop, values 2 and 3 have been swapped somehow - wouldn't it be simpler if
D0 contained $34D2, like in the chunky buffer?? It would, but sorting them
would cost extra instructions, so we leave D0 as it is.

That brings us to the next element of the c2p conversion - the c2p table (which
is, apart from the low resolution of c2p graphics, a further disadvantage
compared to planar graphics, i.e. the high memory consumption!). The chunk 2
planar table holds all the possible bitplane combinations of 4 double pixels on
the screen. And since every one of those pixels can have 16 colors, it's
16 * 16 * 16 * 16 * 4 bytes = 256 kb huge! (* 4 because we get longwords out of
it).
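
How the table gets filled isn't shown in this article, so here is a minimal
sketch of one possible precalc routine (it is not the one from the demo). It
assumes that the packed index has its nybbles in the order pixel 1, 3, 2, 4, as
in the $3D42 example, and that every table longword holds the bytes for
bitplane 1 to 4 from the highest to the lowest byte, which is the order movep.l
writes them in. The labels 'maketable', 'addpix' and 'spread' are made up for
this example, and the SECTION directive and the local label (.entry) follow
Devpac conventions:

```asm
* build the c2p table (sketch, labels are hypothetical)
* index nybbles: pixel 1, 3, 2, 4      entry bytes: bitplane 1, 2, 3, 4
* trashes: D0-D3/A1-A2

maketable:
        lea     c2p,A1          * table to fill
        lea     spread,A2       * color -> bit pairs, see below
        moveq   #0,D0           * D0.w = table index
.entry: move.w  D0,D1
        moveq   #0,D3           * D3 collects the 4 bitplane bytes
        rol.w   #4,D1           * pixel 1 into the lowest nybble
        bsr.s   addpix
        rol.w   #8,D1           * pixel 2
        bsr.s   addpix
        ror.w   #4,D1           * pixel 3
        bsr.s   addpix
        rol.w   #8,D1           * pixel 4
        bsr.s   addpix
        move.l  D3,(A1)+        * store the finished entry
        addq.w  #1,D0
        bne.s   .entry          * stop when the index wraps from $FFFF to 0
        rts

addpix: lsl.l   #2,D3           * make room for one double pixel
        moveq   #15,D2
        and.w   D1,D2           * D2 = color 0..15
        lsl.w   #2,D2           * * 4 = offset into 'spread'
        or.l    0(A2,D2.w),D3   * put its bits (doubled) into the 4 bytes
        rts

* one longword per color: bit 0 of the color goes to bitplane 1 (highest
* byte), bit 3 to bitplane 4 (lowest byte), every bit as a pair (%11)
spread: DC.L    $00000000,$03000000,$00030000,$03030000
        DC.L    $00000300,$03000300,$00030300,$03030300
        DC.L    $00000003,$03000003,$00030003,$03030003
        DC.L    $00000303,$03000303,$00030303,$03030303

        SECTION BSS
c2p:    DS.L    65536           * 16*16*16*16 longwords = 256 kb
```

After a 'bsr maketable' the entry for index $3D42 (chunky bytes $03,$04,$0D,$02)
should come out as $CCC33C0C, i.e. $CC, $C3, $3C and $0C for bitplanes 1 to 4.
The small 'spread' table is just one way of doing it; shifting the color bits
out one by one would work too, only slower.
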
But when we precalc this c2p table we need to remember that pixels 2 and 3 of
that 4 pixel 'quad' are swapped - because now we're gonna use the value of D0
as the offset:

 lsl.l   #2,D0           * get the longword alignment
 lea     c2p,A1          * set up the address of the c2p tbl
 move.l  screen,A2       * A2 points to our screen
 move.l  0(A1,D0.l),D0   * move the needed value into D0
 movep.l D0,0(A2)        * set the first 4 double pixels on the screen
 movep.l D0,160(A2)      * do the scanline below

Wow, this little program block has set 4 double pixels on the screen! It has
also managed to skip the lower bytes of the bitplane words by using the movep
instruction, meaning that the other byte of each bitplane word isn't affected.
What you do now is get the next four double pixels and convert them to the next
screen offset (the lower bytes, i.e. 1(A2) and 161(A2)), and so on. But notice
that you have to skip one scanline when you've done 160 double pixels, because
the pixels are also doubled vertically.

So, that's all. Did you get it? I hope so...

And now I think it's time to say a few words about the demo. What it does is:

1. setting up a pattern in the chunky buffer

CHUNKYBUFFER:
        REPT    1000            * 16000 bytes
        DC.L    $00010203
        DC.L    $04050607
        DC.L    $08090A0B
        DC.L    $0C0D0E0F
        ENDR

2. doing the c2p conversion (do this 100 times - once for every 2 scanlines)

        move.w  #20-1,D7        * convert one scanline
SETPIXEL:                       * = 20 * 8 double pixels
        moveq   #0,D0
        move.w  (A0)+,D0        * pack 4 chunky bytes into one word
        lsl.w   #4,D0
        or.w    (A0)+,D0
        lsl.l   #2,D0           * * 4 = offset into the c2p tbl
        move.l  0(A1,D0.l),D0
        movep.l D0,0(A2)        * upper bytes of the 4 bitplane words
        movep.l D0,160(A2)      * same on the scanline below

        moveq   #0,D0
        move.w  (A0)+,D0        * the next 4 chunky bytes
        lsl.w   #4,D0
        or.w    (A0)+,D0
        lsl.l   #2,D0
        move.l  0(A1,D0.l),D0
        movep.l D0,1(A2)        * lower bytes of the 4 bitplane words
        movep.l D0,161(A2)      * same on the scanline below

        addq.w  #8,A2           * next 4 bitplane words
        dbra    D7,SETPIXEL
        .
        .
        .

3. scrolling the chunky buffer

        lea     CHUNKYBUFFER,A3
        lea     CHUNKYBUFFER,A4
        addq.l  #1,A4           * source is one byte ahead of the destination
        move.w  #16000-1,D4
SCROLLLOOP:
        move.b  (A4)+,(A3)+     * move every byte one place to the left
        dbra    D4,SCROLLLOOP   * (the last pass reads 1 byte behind the buffer)

4. repeat steps 2 and 3 until space is pressed

The whole thing runs at about 8.5 fps on a plain ST - which is, I think, quite
fast for a fullscreen fine-scroller - imagine how you would code this with
planar graphics X(
But keep in mind this demo is very basic and not optimized, 'cause I wanted to
keep it simple.

last remarks
--------------

Some words on optimizing... One way to optimize this algo a little bit is to
preshift every color up by 2 bits (I mean your texture or image colors or
whatever) before you store them in the chunky buffer. That saves the
lsl.l #2,D0 when calculating the offset into the c2p table (but the lsl.w #4,D0
then has to become an lsl.l #4,D0, or the top 2 bits of the first value get
lost). There are even ways to get rid of the chunky buffer completely. But
believe me, it shouldn't be that hard to figure it out yourself, and by the way
it could be really good practice for you. OK, I'll give you a little hint: just
write a routine that directly sets a 4 double pixel 'quad' without getting the
data out of a chunky buffer... Another way of speeding it up is to simply
shrink the chunky buffer when your effect doesn't take up the whole screen.
Just keep coding, you'll get it ;)

What's for next time? Hmm... I'm thinking of doing something on raycasting, at
least on the basics, so keep looking for my tuts, he he...

hope to see you next time,

                                                                  .tSCc. Ray
-------------------------------------------------------------------------------