Saturday, September 15, 2012

Clock Tower: A Road to Dejapanesification [Part 2: Alone in the Dark]

All alone. Wandering down dark hallways in a dark world, laden with document after lengthy document, each full to overflowing with technical jargon and tables and numbers.

Where does one start to comprehend?


There is, contrary to common misconception, a vast difference between a n00b and a newb. n00bs are simply those dreamers with delusional aspirations, who are unwilling to put the effort into sticking at something until they 'get it'. So often you find them milling around on countless forums whining 'wot do I dooo? wea do I goooo??? HEEELLLLP', often followed up by someone giving them step by step instructions on how to use Google.
Newbs, on the other hand, are much smarter. They already know how to use Google, and best of all, they get started. They start actually doing things.

Starting is therefore the hardest step in almost any difficult task. But it's also the most important step, because if you haven't started, it's not likely that you're going to get much else done either, except the dreaming. But by starting, sooner or later you're going to learn enough to realize that you're steering down the wrong path, and after a couple more false starts you should, with enough luck, converge on the right one.

So here is how I started.

Having a little knowledge of game programming as well as computers and their inner workings, I knew that first off I needed to know a little bit of how the PS1 worked. The storage medium for the game data in this case is a CD-ROM. I had little idea of how data on CDs was read or arranged. After obtaining a cue/bin image of the Clock Tower CD, I mounted it using Daemon Tools, and had a look at the contents.

Figure 1. directory layout of Clock Tower cd.

Boom. Right off the bat we have a little assortment of folders all with relatively descriptive names. That's nice, obviously not something you'd get if you were hacking a GBA rom, for example, which is just one big hunk of data referenced by offset (address number).
From just looking at the directory names, we can make a couple of assumptions about what sort of files or data they hold. For example, the BG# folders probably have some sort of background graphics in them, and probably not background music since there is already a folder called "SOUND".
Figure 2. The Ritual Room
Figure 3. Director? Producer? ihavenoclue XP
Opening up one of the BG folders I found a heap of .TIM files, not a file extension I'd come across before. A quick search in Google revealed that it was the common extension for a type of image format widely used in PS1 games. I also found a link to a nifty little program called PSX MC that could view these image files, as well as play the .SEQ music files in the 'SOUND' folder. A quick flick through the image files revealed that most of the levels were made up of single images. While that didn't really have anything to do with the purposes of a translation hack, it was comforting to know that I wouldn't have to be rummaging around in some complex level format for pointers to text or something.
I also found out a couple of other interesting things, for example (hazarding a guess based on my rudimentary knowledge of Chinese characters) the credits appear to be images of text so translating them would simply require replacing the text in those images.
On a side note, in the folder called DA1, I found a suspicious music file called LIKE55.WAV and throughout the time I was listening to it I had plastered on my face an expression something like == so sincerely hoping it's not the credits song or something like that.

Anyway, we now have a basic idea of how some of the in-game media is stored. How about the dialogue? Perhaps the first thing most people would do is start opening up files in a hex editor and start looking for text from the dialogue. However, I faced a bit of a dilemma: I didn't know how to read, and therefore write, Japanese in order to search for it. If a standard text encoding had been used, it would have to be one of two things. One was some variant or subset of Unicode, which didn't seem very likely, since that encoding consists of more than 100000 characters and I was quite sure it wasn't a widely used standard in Japan for encoding Japanese. The other option was Shift JIS, which is a more commonly used encoding in Japan, if I have got my facts right.
Anyhow, there wasn't much point in trying to do any searches just yet. I needed to find out where they got their font from so that I could know how to change it to use English letters. There was little chance that the developers would have spent the time making an entire Japanese font for their game, what with all the Kanji they'd need, though there was a possibility they could have made a font for the subset of characters that they would use exclusively for the game. For the moment I was potentially ruling out that possibility because of the inconvenience that would have posed for the developers.
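For what it's worth, the Shift JIS search idea mentioned above could be sketched roughly like this in Python. This is only an illustration under a few assumptions: the function name `find_text` and the sample bytes are made up for the example, the search string is just a placeholder piece of Japanese, and nothing here says that the game actually stores its text this way.

```python
def find_text(data: bytes, text: str, encoding: str = "shift_jis") -> list:
    """Return every offset at which `text`, encoded with `encoding`, occurs in `data`."""
    needle = text.encode(encoding)
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets


# Hypothetical stand-in for a game file: padding bytes around one Shift JIS string.
sample = b"\x00" * 16 + "時計塔".encode("shift_jis") + b"\x00" * 16

print(find_text(sample, "時計塔"))               # [16]
print(find_text(sample, "時計塔", "utf-16-le"))  # [] (wrong encoding, so no match)
```

With a real file you would read its bytes with `open(path, "rb").read()` and try each candidate encoding in turn. An empty result for every encoding would hint that the text is stored in some custom format.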

What I needed to do then was to somehow trace the code that was used to access the needed characters, and hopefully that would lead me to the place where the font was stored. For this I would need some sort of debugger in order to log the machine instructions executed by the PSX (unofficial abbreviation for the PS1 console) at various times. Unsurprisingly I found a couple of debuggers built on top of the PCSX emulator. These would allow me to check for reads and writes to various places in memory. One of these, called Agemo, turned out to be an extremely nifty thing. Its main functionality includes the setting of breakpoints in the code when a certain area in memory (only 2MB of RAM!) is read or overwritten. For those that don't know, a breakpoint is something used when debugging a program: it pauses execution of the program when a specific place in the code is reached or a specific event happens. When the execution is paused, we can then dump the state of memory at that particular point in time, or we can 'step' through the following instructions of the process. The emulator is able to do this since it emulates a PSX processor, and can directly control its execution.
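To make the idea of a memory breakpoint a bit more concrete, here is a toy sketch in Python. It is purely an illustration of the concept: the class `WatchedMemory` and the watched address range are invented for the example, and this is not how Agemo or PCSX are actually implemented.

```python
class WatchedMemory:
    """Toy RAM that records accesses to watched address ranges."""

    def __init__(self, size):
        self.ram = bytearray(size)
        self.watches = []   # (start, end, kind), where kind is "r" or "w"
        self.hits = []      # (kind, address) for every triggered watch

    def watch(self, start, length, kind):
        self.watches.append((start, start + length, kind))

    def _check(self, address, kind):
        for start, end, watch_kind in self.watches:
            if watch_kind == kind and start <= address < end:
                # A real debugger would pause emulation at this point.
                self.hits.append((kind, address))

    def read(self, address):
        self._check(address, "r")
        return self.ram[address]

    def write(self, address, value):
        self._check(address, "w")
        self.ram[address] = value & 0xFF


mem = WatchedMemory(2 * 1024 * 1024)   # 2MB, like the PSX main RAM
mem.watch(0x1000, 0x100, "w")          # hypothetical region of interest
mem.write(0x0FFF, 0xAA)                # outside the range: no hit
mem.write(0x1004, 0xBB)                # inside the range: recorded
mem.read(0x1004)                       # reads are not being watched here
print(mem.hits)                        # [('w', 4100)]
```

The point is simply that an emulator sees every memory access go past, so it can compare each address against a list of watched ranges and stop when one matches.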
This was very cool and all, but it wasn't as if I could just start dumping logs of assembly code and start rifling through them. Assembly code is a sort of extremely watered down version of your general programming language. It's probably the lowest level language you'd come across before you get into the realms of binary code, which the average human wouldn't sensibly try to comprehend. By low level, I mean that the assembly 'instructions' or operations almost directly reflect the actions taken by the processor itself. I don't quite mean on an electronic level, but more a conceptual level. There are instructions for adding, subtracting, loading and storing data, and different ways of manipulating bit-patterns, which are sequences of binary digits. There are also instructions for 'jumping' or 'branching' to different locations in the code, and since we don't truly have this high level concept of functions, we need to be able to jump to specific addresses in memory in order to repeat tasks or as the result of making decisions.
To give you a taster of the assembly language used by the processor in the PSX:

800158c4 : BEQ     00000082 (v0), 00000000 (r0), 80015a14
800158c8 : ADDU    00000154 (s3), 00000000 (r0), 00000000 (r0)
800158cc : ANDI    00000000 (s5), 00000001 (a3), ffff (65535)
800158d0 : ADDIU   00000000 (s7), 801ffe48 (sp), 0010 (16)
800158d4 : LUI     00000000 (s6), 8004 (32772)

Yes, I was also utterly confuzzled when I first came across this. This is a sample of the logged output of the Agemo debugger. It nicely provides the address of the instruction on the left, followed by the name of the instruction, and its parameters and their values.
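If you'd rather let a script pick these lines apart, a small parser along the following lines might do. It is only a sketch that assumes every line follows exactly the layout of the sample above; the names `parse_line`, `LINE_RE` and `OPERAND_RE` are made up for the example.

```python
import re

LINE_RE = re.compile(r"^([0-9a-f]{8}) : (\w+)\s+(.*)$")
OPERAND_RE = re.compile(r"^([0-9a-f]+)(?: \((\w+)\))?$")


def parse_line(line):
    """Split one log line into (address, mnemonic, operands).

    Each operand is (value, label): value is the hex field as an int, and label
    is the text in parentheses (a register name or a decimal rendering), or None.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised log line: " + line)
    address, mnemonic, rest = match.groups()
    operands = []
    for field in rest.split(", "):
        operand = OPERAND_RE.match(field.strip())
        if operand is None:
            raise ValueError("unrecognised operand: " + field)
        value, label = operand.groups()
        operands.append((int(value, 16), label))
    return int(address, 16), mnemonic, operands


address, mnemonic, operands = parse_line(
    "800158c4 : BEQ     00000082 (v0), 00000000 (r0), 80015a14"
)
print(hex(address), mnemonic)
for value, label in operands:
    print(f"  {value:08x}", label)
```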
The values are all in hex, which is simply a base-16 counting system using the digits 0-9 and the letters A-F. A single hexadecimal digit corresponds to exactly 4 binary digits, so it's a very compact and nice way to express bit patterns and data values.
Reading the instructions in the log sample sequentially from the top, we have:
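As a quick illustration of the hex-to-binary relationship, here is a small Python snippet. It takes the value shown for sp in the log sample and prints each hex digit next to its four bits.

```python
value = 0x801FFE48  # the value shown for the sp register in the log sample

# Each hex digit corresponds to exactly one group of four bits.
for digit in f"{value:08x}":
    print(digit, "->", format(int(digit, 16), "04b"))

# The whole 32-bit pattern, and the round trip back to hex.
bits = format(value, "032b")
print(bits)
print(hex(int(bits, 2)))   # 0x801ffe48
```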
BEQ    v0    r0    80015a14
This is a conditional branch instruction (BEQ stands for 'branch if equal'). It checks whether the value in the register v0 is equal to that in r0, and jumps to the address 80015a14 in memory if it is. A register is simply a piece of memory in the processor that holds a bit pattern, and in the case of this processor, a 32-bit one. The jumping effect is actually delayed by a single instruction (so even if the equality is true, the next instruction down is executed, and only then the jump made), but I won't go any more into that.
In the log sample above, the values of each of the registers at the time of the execution of each instruction are also logged, which makes it possible to trace where specific values came from.
The next instruction, ADDU, adds the value in register r0 to that in r0 and stores the result in s3. On a MIPS processor r0 always holds zero, so this is simply a way of setting s3 to 0. The next instruction performs a bitwise AND operation on the value in a3 with the value FFFF. A bitwise AND compares the same order bits of each value and places a 1 in the same position of the result if both bits are 1s. For example, the value in a3 is a 1, which in 32-bit binary is essentially 00000000000000000000000000000001, and FFFF is 1111111111111111. So the result will have a 1 in the low order bit, and all the other bits 0, just like the value in a3 at the moment. The result is stored in s5.
The next instruction adds the hex value 0x10, which is 16 in decimal (I'll use the 0x prefix from now on to denote a hex value), to the value in the register sp. The result is stored in s7.
Finally the instruction LUI stores the 2-byte constant 0x8004 in the high order bytes of s6, and replaces the low order bytes with 0x0000. The resulting value passed to s6 is 0x80040000. For those who aren't familiar with it, a byte is an 8-bit binary sequence, which can be expressed using 2 digits in hex, e.g. the largest value able to be expressed by a single byte is the value 255 in the decimal system, which is 0xFF in hex and 11111111 in binary.
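To double check the walkthrough, the five instructions can be mimicked in a few lines of Python. This is just a sketch and not an emulator: it assumes the register values printed in the log are the ones held just before each instruction runs, and it uses a plain dictionary called `regs` to stand in for the processor's registers.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide

# Register values taken from the log sample (r0 always reads as zero on MIPS).
regs = {"r0": 0, "v0": 0x82, "a3": 0x1, "sp": 0x801FFE48,
        "s3": 0x154, "s5": 0, "s6": 0, "s7": 0}

# BEQ v0, r0, 80015a14: branch only if the two registers are equal.
branch_taken = regs["v0"] == regs["r0"]

# ADDU s3, r0, r0: 0 + 0, which clears s3.
regs["s3"] = (regs["r0"] + regs["r0"]) & MASK32

# ANDI s5, a3, 0xffff: keep only the low 16 bits of a3.
regs["s5"] = regs["a3"] & 0xFFFF

# ADDIU s7, sp, 0x10: add the immediate 0x10 (16 in decimal) to sp.
regs["s7"] = (regs["sp"] + 0x10) & MASK32

# LUI s6, 0x8004: the immediate fills the upper 16 bits, the lower 16 become zero.
regs["s6"] = (0x8004 << 16) & MASK32

print("branch taken:", branch_taken)   # branch taken: False
for name in ("s3", "s5", "s7", "s6"):
    print(name, f"{regs[name]:08x}")
```

Since v0 holds 0x82 and r0 holds 0, the branch is not taken here. The final values come out as s3 = 00000000, s5 = 00000001, s7 = 801ffe58 and s6 = 80040000.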

Anyway, I won't go too in depth about this because it's a topic complex enough to write a textbook about.
But suffice to say that I found out somewhere that the processor in the PSX was extremely similar to a MIPS processor, so that meant that if I learned how MIPS assembly language worked, I would be able to understand the output of the debugger. Consequently I spent about a week reading through half of the MIPS assembly language tutorial on this site:
http://chortle.ccsu.edu/AssemblyTutorial/TutorialContents.html
This tutorial is superb, and was actually all I needed in order to be able to read through the logged output of the debugger actually knowing what each line meant.
However, one does not simply 'read' through a log of assembly instructions from cover to cover. There are a lot of lines of code there, and as I found out, between the mere drawing of two frames, over 500000 instructions were executed.
I had to be a little more clever.

And that's it for Part 2. In the next part I'll apologize profusely for this cliffhanger ending, and I'll also go over how I ended up finding where the font was stored, and how that affected my sanity.

Stay tyoond ;)

Friday, September 14, 2012

Clock Tower: A Road to Dejapanesification [Part 1: A Presentiment]

Before I commence the insurmountable task of recording this epic novel, I would just like to point out that the Sunlight shining through the gap in my curtains is simply otherworldly. Its golden ethereality illuminates a drifting trail of dust, transmuting it into a lazily floating cloud of celestial glitter. Accompanied by the serene tones of music iconic to Balamb Garden in a world so far away, and behind that the busy and muffled vroom of the vacuum cleaner, this enchanting scene evokes a strikingly nostalgic moment transporting me somewhere far back in the unimaginable folds of time.

I only need to draw back the curtains a fraction more to utterly destroy this little magical setting and remind me that it is indeed 5 o'clock on a Thursday afternoon, and most disappointingly my dusty windows are actually the main cause of the lighting setup, artificially enhancing the colourizing effect of the weak pre-summer twilight on the general haze of dust particles pervading my room. All this and I have four assignments I'm meant to be working on. Great.

Ah, procrastination. Why art thou so enticing?

Casting a world-wearied glance outside, I then yank the curtains shut. It goes dark, and behind me, the music changes.
A low, uniform electronic tone.
My heart skips a beat. Then nothing.

It plays again, higher this time. Chilling me to the bone.

And again. Pertinent. Unforgiving.

And again.


Clock Tower.
I know it even before the piece suddenly splits into a series of steady shrieking tones evoking an image of being chased down dark hallways by an invisible psycho killer. At the end of each hallway, lies either your doom, or a hair's breadth escape, for the present.

I absolutely love this game. I discovered it a while back, while reading some obscure forum post discussing scary games. I'd played games like Amnesia: The Dark Descent and Call of Cthulhu: Dark Corners of the Earth, and was looking for something more. Something tastefully creepy. As such I was very interested on coming across a comment about an obscure 2D point-and-click horror game by a company called Human Entertainment for the SNES long ago that was considered by many to be the platform's scariest. On further research I learned that the game had never been localized for release in the West; however, the SNES version had received an English fan-translation.
For those who haven't heard of this under-discovered gem, the version I'm talking about is the very first one in the series, originally released for the SNES console way back in the dark ages of 1995.
And one dark game it is. The story follows the tribulations of a young orphan, Jennifer, after she is adopted into the Barrows family. On arriving at the Barrows mansion, she and her adoptee sisters discover a very unexpected fate waiting for them. For me, the icon of the game is the main antagonist, Bobby, a deranged and deformed killer child who stalks his prey throughout the mansion wielding a giant pair of shears. The scary scenes in the game are mainly made up of 'chase scenes' in which Bobby chases Jennifer while making snipping noises with his shears as she frantically tries to find a suitable place to hide. Although it is a very short game with a playthrough time averaging under an hour, this is actually one of its strengths. This is because of the multiple endings that are obtainable by taking different actions throughout the story. Playing the game through at least nine times in order to get all nine endings isn't as tedious as it sounds. Because the story is so short, the actions that you take differ quite a lot between different endings.
The game was apparently pretty popular in Japan because it got a port to the PlayStation 1 in 1997, and two more after that, for the WonderSwan and PC in 1999. More recently, a company called Artimatica, under the direction of a guy named Chris Darril, have taken it upon themselves to develop a tribute re-imagining of the original game, which they titled Remothered. I was initially very excited to hear this, but after a while I heard something that made me a little skeptical, since it seems like they're ditching the original point-and-click mechanic for a free-style 3D control system like that of Clock Tower 3 and Haunting Ground, both 3D horror games for the PS2 related to the series. They share vaguely similar gameplay mechanics to the original game. It's not that I'm old fashioned, it's just that I truly believe that the old system really works. Some people may think that point and click is something of the past, but for me it really made the game. Frantically clicking on things during a chase scene to find something to use in order to save your skin created tension in a unique way. Sure, it wasn't immersive in the sense that you could feel a direct connection with the player, but this slight break in the connection really helped keep you on edge in a cinematic sense. You are forced to watch helplessly as the main character pays for her mistakes that you somehow influenced her to make. And on the other hand, what really destroys the realism in 3D games is that it sometimes feels as if you have too much control. You can make the character do funny things. Like face-planting in walls. Spooky.

Anyway, I could go on forever singing Clock Tower's praises but that isn't quite what we're here for. For all my superfanboyism I admit I have only got around to playing the SNES version. The reason for this is simply because it has an English translation available. Only the SNES and PC versions got fan-translations, and while the PC version is rare, the SNES version is the most widely played. Upon hearing that the PS1 version (officially subtitled The First Fear) had some additional content, I was disappointed upon discovering that it had yet to have a translation hack made of it.

And so I've decided to take up the challenge of making a translation hack of it myself.

Ah, romhacking. Ever since I made a bet with my older sister when I was little that I would be able to find some way to change various stuff in games, I have been fascinated by the skills of people who have managed to hack games and even make whole new games out of existing ones. Way back I did get into some modding for a PC game called Age of Mythology, but absolutely nothing requiring too much brain power. So I think that perhaps this is a better time to start than never! So while I absolutely can't read a word of Japanese, I know someone who can, and that someone amazingly agreed to help with the hack, so bingo, that's the first step sorted.
Get help, Check.

Anyway, I think this will do nicely for an introduction, so basically what I'm going to do in each of these posts is describe the process of my learning how to make a translation hack of a PS1 game every step of the way, using Clock Tower: The First Fear as a kind of case study. I'll write exactly what my thoughts were at key points in the process and point out various helpful resources for those who might be interested in getting into the same thing. This will not, however, be a lesson in basic computer science, so just a word of warning for those not particularly interested in such things: this will get pretty technical pretty fast.

And for those of you in for the ride, hold on tight 8)