The answers to the DRM questions for Where in the World Is Carmen Sandiego? Enhanced (DOS, 1990) are, in no particular order:
23
Kent
dragon
calcium
1796
Warren
revenue
1792
Willard
1937
Crater
Tanzania
Hartford
Duluth
London
Gem
Silent
squeaker
@foone Garbage decompiler that doesn't choose suitable type for synthesized locals and vomits redundant integer type casts...
if ((0x80 >> ((byte)local_4 & 7) &
(int)(char)*(byte *)((int)((int *)param_1 + 1) + (local_4 >> 3))) != 0) {
COULD YOU USE SOME MORE CASTS MAYBE?
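With the casts stripped away, that expression looks like an MSB-first bit test on a byte array that starts one int past param_1. A minimal Python sketch of the same test, assuming that reading is right; `bit_is_set` and `bitmap` are made-up names, and skipping the leading int is left to the caller:

```python
def bit_is_set(bitmap: bytes, index: int) -> bool:
    """Test bit `index` of a packed bit array, most significant bit first."""
    mask = 0x80 >> (index & 7)          # bit 0 is the top bit of each byte
    return (bitmap[index >> 3] & mask) != 0


bits = bytes([0b10000001, 0b01000000])
print([i for i in range(16) if bit_is_set(bits, i)])
```

The (char) sign extension in the decompiled version changes nothing here, since the mask only ever touches the low 8 bits.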
@foone Visualization of memory contents is a vastly unexplored area. I suspect you could do automated statistical analysis of patterns with some way to display the output and seriously accelerate reverse engineering.
that might not be TOO hard to hack in, hmm.
I'm stepping through a high-level loading routine I don't understand yet, trying to figure out when it decompresses an image by watching the RAM it uses for file loading and decompression and spotting when the image appears
tool that'd really be handy right now:
a "live" version of binxelview, so I can step through the DOSBox-x debugger and see how memory is changing in real time, as an image.
and then 13 will be:
b'\x02asked questions about Shinto rituals\x00said\x81was researching an archipelago\x00'
so when it sets up a city that has hints to lead to Tokyo, it picks 3 of these sets of questions, then picks a question in each set.
So like, the 12 chunk for Tokyo says:
b'\x05asked about the exchange rate for yen\x00was practicing Japanese characters\x00said\x81planned to take photographs of Mount Fuji\x00asked about tours of the Imperial Palace\x00was interested in visiting Shinto shrines\x00'
So it picks from one of those 5 options
okay now that I can decode the chunks (well, most of them) I can identify a lot more of them:
00 Name and (some other info)
01 ???
02 Image
03 City descriptions
04 Items to steal
10 ???
11&up: Hints leading here
It starts with \x03 to indicate there's three strings: then it describes the city three times. at runtime it uses select_string function with a random input to select one of the three strings
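A rough sketch of that layout in Python, assuming the first byte is the string count and every string is NUL-terminated. `parse_string_set` and `pick_string` are hypothetical names standing in for the game's own routine, and latin-1 is used only so that bytes like \x81 survive decoding untouched:

```python
import random


def parse_string_set(chunk: bytes) -> list[str]:
    """Split a count-prefixed block of NUL-terminated strings."""
    count = chunk[0]
    parts = chunk[1:].split(b"\x00")
    # The trailing NUL leaves an empty final element; the count trims it.
    return [part.decode("latin-1") for part in parts[:count]]


def pick_string(chunk: bytes, rng=random) -> str:
    """Choose one of the strings at random, like the runtime selection."""
    return rng.choice(parse_string_set(chunk))


sample = b"\x03first\x00second\x00third\x00"
print(parse_string_set(sample))
print(pick_string(sample))
```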
it was a trivial off-by-one error.
I was doing saved_byte=input[3]
but while I needed the 3rd byte, that's at input[2]
yess!
C:\DOSBox-X\drive_c\carmen\py>python datfile.py cities.dat --dump=12803 --decompress
"\x03Sydney, with a population of more than 3.3 million people, is Australia's largest city. A well-known sight is Sydney's distinctively designed Opera House\x00An island continent, Australia is nearly as large as the United States but has only one-fifteenth the population\x00The capital of Australia is Canberra, located in the southeast corner of the country between Sydney and Melbourne\x00"
I somehow confused the dosbox-x debugger into not accepting letters anymore
that's supposed to read:
"\x03Lima is Peru's capital and largest city. A well-known landmark is the Archbishop's Palace, a reminder of Peru's colonial past\x00Peru is slightly smaller than Alaska and is bordered by Ecuador, Colombia, Brazil, Bolivia and Chile\x00Peru, once the center of the mighty Incan Empire, is a rugged land dominated by the Andes Mountains. Forests and jungles cover half its land area\x00"
"vs ses oa is isgit's tc eital and largest t u anhtA ttggh os nnotosnhrdsmarosogdn ss drte tishoth's isdhsceohtsnthminder of isgit's t nuorhdhtpast\x00 geru is slightltsn oaller than ndhd na and is o nnsgtgstbtst oa dotlalssaaolootbiaoht Sal gh, sonuhvia and sl ghh\x00isgit, ontvdn ss nhsiaalgarsnadlfnaatawlarst oadrlhrs i is a rugged land dooousr'casrbhe nrdsgs fountainsnht iah"
I mean, it's not 100% wrong, but it's not right either
but I've got the predefined table, an input file, an output file, and now I need to write some python code to replicate this, hopefully without crying
the predefined table starts with NUL, space, then:
aetonisrdlhugfcwypbmk,vSA.T'PMxBCIRGDWHqE-zNFKL0j:51YJ8\U?73Q;2!469
\r\nOVXZ()*+"#$%&<=>/@[]^_`
given that the most common symbols are near the beginning, this is presumably a sort of lazy huffman coding
it's some kind of shifting bit mask but it starts at encoding values in 4 bits, then it can increase (or decrease, I guess) based on the input stream.
then it has an output filter, where if the number specified wasn't 8 bits, it's actually an index into a predefined text table
I think this compression is specifically designed for ASCII text, which is annoying because they've also got compressed images... which probably use a DIFFERENT COMPRESSION!
it looks like this chunk has length 256, which means 253 usable bytes, and it expands to 374 bytes.
Not the greatest compression. a little better than just doing 6-bit ASCII.
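The arithmetic behind that comparison, as a quick check. It only uses the two sizes quoted above; the 6-bit figure is simply 374 characters packed at 6 bits each, not a measurement of anything in the game:

```python
import math

compressed_bytes = 253      # usable bytes in the chunk
expanded_bytes = 374        # size after decompression

# Plain 6-bit packing of the same text, rounded up to whole bytes.
six_bit_bytes = math.ceil(expanded_bytes * 6 / 8)

ratio = compressed_bytes / expanded_bytes
print(f"game: {compressed_bytes} bytes ({ratio:.1%} of original)")
print(f"6-bit packing: {six_bit_bytes} bytes (75.0% of original)")
```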
well I found the decompression method.
as always, I hate it. decompression routines are probably my least favorite thing to reverse engineer
oh it's because ghidra's near/far pointer support is shit.
I had param2 defined as a byte*32 and it was casting it to a byte* before using it
if I define it as byte* and let the calling convention implicitly define it as 32bit, it doesn't do the cast
@dalias @foone This is something I'd love to see more exploration of. Especially with changes over time. Scanlime had that "temporal hex dump" tool but that's only scratching the surface of what could be done UI wise.
I would love a tool integrating with ngscopeclient that lets you look at e.g. QSPI bus activity in a graphical manner to somehow visualize access patterns and understand how a boot image is constructed by dynamic analysis.
I think this might be the GUI system doing a screenshot of the image under a window, so it can restore it at the end. And it still does that here, even though we'll never need to restore that image: we're about to overwrite it
Here's what I want a tool to do:
I hit a breakpoint in the debugger, I turn it on, set another breakpoint, and hit go.
between those two breakpoints, every time a CALL instruction is hit, it dumps my selected memory region. If it's identical to the last dump, it's ignored.
At the end, each dump is rendered as an image, and the combined set are an animation I can scroll through.
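The "ignore identical dumps" part of that wishlist is the easy bit. A minimal sketch, assuming the dumps arrive as a sequence of bytes objects from some debugger hook that doesn't exist yet; `unique_frames` is a made-up name:

```python
def unique_frames(dumps):
    """Keep a dump only if it differs from the one kept just before it."""
    frames = []
    for dump in dumps:
        if not frames or dump != frames[-1]:
            frames.append(dump)
    return frames


captured = [b"\x00\x00", b"\x00\x00", b"\x04\x00", b"\x04\x00", b"\x00\x00"]
print(len(unique_frames(captured)))
```

Rendering each surviving frame as an image and scrubbing through them is the part that actually needs a tool.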
I'm in Paris, I look at work ram, I see the image of the Eiffel Tower. I head to Rome, and before I load the next image, I can see that the Eiffel Tower in workram now has the wrong stride.
That's odd, because it means it had to rewrite the image in memory, the image it's about to unload.
ooh, I'd also need to be able to watch multiple address ranges at once. that'd be sweet, multiple windows of visibility into RAM
sadly DOSBox-X's memory breakpoints don't let you set up a breakpoint that covers a whole 64k. you only get one byte. A shame.
@foone Nice?
well the good news is that I think I've found the decompress_image function. the bad news is that now I have to reverse engineer it :(
it's currently doing the obvious thing for a decompressor to do:
write the byte 04 every 69 bytes
wait no, the colors are wrong... I bet I'm seeing it decompress the binary, but that's using the full width of the bytes. it then gets expanded out to a 16-color image.
wait is this image format vertically interlaced!?
It loads the half-width version, then a few functions later, it's been replaced with a full-width version.
Strange!
I think they're just trying to keep their RAM usage down by not having both halves in memory at once
it's in a function I already found, temporarily named "blit_related".
I guess they don't decode the image until RIGHT before it needs to go up on the screen!
it definitely decompresses and then blits the image as two parts, which aren't evenly sized, and it starts from the bottom
GOT YOU, YOU SON OF A BITCH! I FOUND YOU.
I need a higher order debugger. I'm doing too much shit manually
worst thing that could happen just happened:
I just realized the portable Where in the World is Carmen Sandiego? is based on the same version I'm hacking, meaning it's in-scope for me to get this, dump the ROM, and compare.
That just increased the cost and complexity of this project by a bunch
being able to do a textual find-replace on VRAM is a weird but occasionally useful ability
I find-replaced the background from palette entry 0 to palette entry C:
Now I can confirm how big this image is. Previously it was set into a black background, which made it harder
I don't think there's any reason why this would support SVGA. It always uses 320x200 at a maximum of 256 colors. VGA is more than enough to handle it
PUSH ES
PUSH AX
RETF
why must you hurt me, carmen sandiego?
I'm confused by the graphics detection routines. I thought it was returning 0 for "no graphics" or something, but it turns out 0 means MCGA.
So the GraphicsMode enum goes:
0: MCGA
1: CGA
2: Hercules
3: EGA
4: Tandy
5: VGA
6: ???
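For scripting against these values, the list above maps directly onto an IntEnum. The member names are my own spelling, and entry 6 stays a placeholder because its meaning hasn't been identified:

```python
from enum import IntEnum


class GraphicsMode(IntEnum):
    MCGA = 0
    CGA = 1
    HERCULES = 2
    EGA = 3
    TANDY = 4
    VGA = 5
    UNKNOWN_6 = 6   # meaning not worked out yet


print(GraphicsMode(0).name)
```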
I think that says that it doesn't matter much. The biggest error is in the biggest distances, which are all saturated to the max of 7-hours anyway.
here's all 30 city locations: https://gist.github.com/foone/0992517879877e0e995259d08a0941a7
it's currently way too 6am to do more calculations, though. I'll do that tomorrow
Good news: @modulusshift did the calculations for me!
also, it's the 90s, I can afford a sqrt().
I should fix it up for my version.
or use a squared lookup table. you could do this REAL easy by making it a table search: there's only 6 possible results: 2,3,4,5,6,7. each entry in the lookup table contains the maximum squared distance that can generate that number of hours
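A sketch of that table search. It assumes a straight-line version would keep the game's scaling (distance divided by 40, plus 2, capped at 7), which is my guess and not something the game does; the names are invented. The sqrt version is included only to check the table against:

```python
import math

# (hours, largest squared distance that still yields that many hours)
MAX_D2_FOR_HOURS = [(h, (40 * (h - 1)) ** 2 - 1) for h in range(2, 7)]


def hours_from_table(d2: int) -> int:
    """Look up travel hours from a squared distance, no sqrt needed."""
    for hours, max_d2 in MAX_D2_FOR_HOURS:
        if d2 <= max_d2:
            return hours
    return 7    # everything further away saturates at 7


def hours_with_sqrt(d2: int) -> int:
    """Reference version using an integer square root."""
    return min(math.isqrt(d2) // 40 + 2, 7)


print(MAX_D2_FOR_HOURS)
```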
I finally figured out how it calculates travel times.
It's the difference in X coordinate between the two cities, plus the difference between the Y coordinate, plus one.
that quantity divided by 40, then has 2 added. if the result is over 7, it's set to 7.
Weird! that's not how you measure distance, Carmen.
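Written out, the calculation is a Manhattan distance with a fudge factor. A small sketch, assuming "difference" means absolute difference and that the divide is an integer divide; `travel_hours` is a made-up name:

```python
def travel_hours(x1: int, y1: int, x2: int, y2: int) -> int:
    """Travel time in hours between two cities' map coordinates."""
    total = abs(x1 - x2) + abs(y1 - y2) + 1
    return min(total // 40 + 2, 7)      # clamp to the 7-hour maximum


print(travel_hours(0, 0, 100, 50))
```

That means two cities on a diagonal cost more than the same straight-line distance along one axis.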
TODO: plot all the distances between all 30 cities and compare how inɐccurate this mess is
patching 0x148C9 in the EXE to 90 90 will stop the clock advancing, so you now have Infinite Time to catch the culprit
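A tiny helper for applying that kind of patch, assuming 0x148C9 is an offset into the EXE file itself. `nop_out` is a hypothetical name, and it works on bytes in memory so you can diff before writing anything back:

```python
NOP = 0x90


def nop_out(exe: bytes, offset: int, count: int = 2) -> bytes:
    """Return a copy of `exe` with `count` bytes at `offset` set to NOP."""
    if offset < 0 or offset + count > len(exe):
        raise ValueError("patch range is outside the file")
    patched = bytearray(exe)
    patched[offset:offset + count] = bytes([NOP]) * count
    return bytes(patched)


# Usage sketch (file name assumed):
#   data = open("CARMEN.EXE", "rb").read()
#   open("CARMEN_PATCHED.EXE", "wb").write(nop_out(data, 0x148C9))
print(nop_out(bytes(range(6)), 2).hex())
```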
why do they store the day of the week as a 16bit int?
future proofing in case the calendar gets updated and has more than 256 days in the week?
I accidentally applied a patch backwards and put the detective to sleep, forever.
They're in Rome and they've just slept through about two months of nothing
heh. I was checking different near-death animations by overriding the randomness, so I had to tell my debugger to set AX to 0
guess which animation that is? The one with the AXe.
I did a little looking into the contents of MIDISND.DAT
It's got 12 small tracks, and each of them is a valid MIDI file if you remove the first byte.
no, this should be working. Hmm. Maybe they just missed one of the four images?
nope! it's fully functional, based on the system date.
a fun kind of reverse engineering tactic that I practice probably more than I should is a version of The Scream Test (which is the principle that the easiest way to find who "owns" a server is to turn it off and see who screams): if you don't know what some code does, break it. and see what screams.
I think I may have found unused graphics for a feature that'd change the Acme Detective Agency at the beginning to be season-specific. There's summer, fall, winter, and spring variants, but the game seems to be hardcoded to summer
okay don't change that byte, GOT IT.
I think I failed to load the cursor, which caused it to corrupt the mouse cursor catastrophically
note to self:
maybe do it for everything MS-DOS.
it's like a Super Game Boy, but for your PC! Plug in this extra hardware, and now your system is compatible with a ton more software!
note to self: figure out how Ghidra fidb works, so I can apply it to MSC5.1 (which was sadly overlooked by the developers of ghidra)
I'm gonna build an m.2 addon that's just a drop in x86 coprocessor. I know a lot of computers that could use an x86 processor these days.
darn. Compiler Explorer doesn't support MS C Compiler 5.1 from 1988. Guess I gotta spin up an emulator again
the annoying thing is that MS C Compiler 5.1 is the most mundane-ass DOS application. If I had a 32bit windows install rather than 64bit, it would probably just run natively on my system
oh thank god, that was a bit of confusion from manually tracking stack frames.
it's actually LoadDatFile, which makes a HELL of a lot more sense
what the fuck do you mean that carmen.dat is opened on the first call to finish_draw_maybe()?
like, I know there's a "maybe" in that name, but it's not THAT big of a maybe.
ahh, now that I've looked it up, it seems I was wrong!
closing isn't 3D, that's 3E! 3D is open!
no wonder I couldn't remember it, I had it confused with another call
looking it up took less than 10 seconds, but that's 10 seconds I'll never get back.
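For next time, here are the INT 21h file calls that come up in this thread as a lookup table, keyed by the value in AH. The numbers are standard DOS; the dictionary and `describe` are just a convenience sketch:

```python
DOS_FILE_CALLS = {
    0x3D: "open existing file",
    0x3E: "close file",
    0x3F: "read from file or device",
    0x42: "seek (move file pointer)",
    0x6C: "extended open/create (DOS 4.0+)",
}


def describe(ah: int) -> str:
    """Describe an INT 21h function number, if it is in the table."""
    return DOS_FILE_CALLS.get(ah, "not in this table")


for ah in sorted(DOS_FILE_CALLS):
    print(f"int 21 ah={ah:02x}: {describe(ah)}")
```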
man, running on 4 hours of sleep is killing me.
I can't even remember the MS-DOS interrupt to open a file!
I know reading it is int 21 ah=3f, closing it is int 21 ah=3d, and I'll never forget that seeking is int 21 ah=42, but how do you open a file?
I mean, not the int 21 ax=6c00 way, that one is only for DOS 4.0+, and obviously a game released in 1990 isn't gonna use that.
The three fonts
font_alloc = malloc(local_a);
if (font_alloc == (void *)0x0) {
font_alloc = (void *)0x0;
}
Ahh yes. remember, if you get a null pointer back from malloc(), make sure to set that variable to NULL so it won't be left as... NULL?
The game loads the BoldFont first, then the SmallFont, then the NormalFont.
Annoyingly this isn't how they're laid out in memory:
It's SmallFont, then BoldFont, then NormalFont
Weirdly, swapping the NormalFont for the SmallFont causes the printer text to be VERTICAL, for reasons I do not remotely understand!
It has a surprisingly robust UI engine. I swapped from BoldFont to SmallFont and the menu adapted perfectly.
(the number of mountain climbing hints is 3, by the way)
I think I accidentally hacked my debugger
the only problem with using Ghidra to hack children's games instead of, like, Serious Things like firmwares or malware or whatever, is sometimes you have to make a label named NUM_MOUNTAIN_CLIMBING_HINTS
Ahh, it's using a different version of the DrawFont call: DrawFontN
You see this little About dialog box? Guess how many times the DrawText function is called?
Once! and just to draw "Where in the World is Carmen Sandiego?".
The rest of the text is drawn elsewhere, and I have no idea why.
correction: it calls it once to draw "Where in the World is Carmen Sandiego?" but that's unrelated to the one on screen WHAT?
I hate dealing with the internals of memory allocation systems. I prefer to leave that to smarter people than me
I think the memory allocation system here is that every malloc returns 2 extra bytes, which is a pointer to the previous block.
unless it's an odd number, in which case it's a free block. and pointer to the previous block, once you make it even again
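If that guess about the allocator is right, decoding the 2-byte header is just a matter of peeling off the low bit. A sketch under that assumption only; `decode_block_header` is a made-up name:

```python
def decode_block_header(word: int) -> tuple[int, bool]:
    """Split a 16-bit block header into (previous block pointer, is_free)."""
    is_free = bool(word & 1)        # odd value marks a free block
    prev_block = word & 0xFFFE      # make it even again to get the pointer
    return prev_block, is_free


print(decode_block_header(0x1235))
```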
yeah, doom did that too, but Doom was a 2.5D image that had to do pseudo-raycasting.
THIS GAME DOES NOT
it allocates a 1024 byte buffer, then makes a pointer to the end of it, minus -0x42?
why would you need a link to the end of a new, freshly cleared buffer, minus 62?
oh sweet jesus, that's the left two pixels of the image.
it's loading the image vertically!
at least it's top to bottom.
I did not realize they implemented a file browser in this program! I only found it by hiding all the DAT files from the EXE, to see if it'd ask me to put floppies in.
I found two different copies of the disk images, in different places.
both are imaged off a 3.5" disk version, which of course comes on only one (double density, 720kb) disk!
That version has no installer. Just the usual files (and a "DESKTOPD.CFG" file that I don't understand)
I should just check. I'm sure disk images can be tracked down in places.
the video and audio detection seems to be excellent, by the way. it just silently figures it out, without asking questions or requiring special arguments or configuration.
Perfect for a game aimed at the little childrens.
This game autodetects everything (video and audio modes) and you can install it by just doing "copy A:*.* C:\CARMEN" on each disk, so I don't think they would have needed a fancy installer.
I am currently, as in this very thread, reverse engineering Carmen Sandiego Enhanced (1990, DOS)!
I've seen the code that asks for you to put in the other disk! And it only asks for DISK1 and DISK2!
just looking at the files, not the code (and not having seen original disk images yet that I can recall), I bet the answer is that they put CITIES.DAT on DISK2.
the whole game - cities.dat is ~300kb, with cities.dat being 168kb.
They could do the whole game - carmen.dat and cities.dat in only 200kb, which'd give them 160kb (luxury!) for a fancy installer.
also, all this wondering about "how many disks does Carmen Sandiego Enhanced (1990, DOS) come on?" is even sillier because I ALREADY KNEW THE ANSWER, I JUST FORGOT I KNEW IT
is it gonna matter? not in the slightest (assuming there's no format-mismatching, which there shouldn't be: these are all the same density of disks, I think).
The PC doesn't check for a notch there, so it won't notice either.
It's just funny because this is, like, technically wrong? These aren't PC disks, but the difference doesn't matter, so why not?
It probably saved them a decent amount of money because of bulk discounts and inventory simplicity.
here's why they shipped it on a double-notched disk anyway:
Broderbund was releasing games on a bunch of other systems that DID have single-sided drives. For simplicity they just bought Xty-thousand double-notched disks
The answer for "what's wrong with these floppies?" is that they're double-notched. That's needed for double-sided disks... on systems which have single-sided drives!
The PC has basically always been double-sided, so they only need one notch, on the top/a side.
Secondly, the TurboGrafx-16 version didn't even THINK about using the same GUI!
I don't want to go through a million platforms but all the other ports of this game tweaked some art here and there or put in different location-photos, but all of them have the same basic tall-window-on-the-left, smaller-window-in-the-top-right, four-buttons-in-lower-right design
I happened to look at mobygames, and noticed two interesting things.
First, the Mac version is very similar to the DOS version, other than the expected changes you'd get from it being on a monochrome system with a GUI.
But wow, that's a completely different font! Is that built into macs or something? (EDIT: @amr confirms it is)
(also, the dialogue box is top-aligned. DOS bottom-aligns them)
ARG, they mislabeled this.
Admittedly, this isn't really their fault, this is confusing shit.
This is the 1992 Where in the World Is Carmen Sandiego? Deluxe, not the 1990 Where in the World Is Carmen Sandiego? Enhanced.
okay I finally found a boxed copy of the Enhanced 1990 DOS edition. (confusingly labeled the 1993 edition)
It comes on two 5.25" disks: presumably double-density, so that's 720kb in total.
Floppy Disk Pop Quiz: What's weird about these floppies, specifically given that this is the MS-DOS version?
The copy of this game I have is approximately 470kb! You could fit this on ONE disk!
WAIT HOLD ALL THE PHONES.
Here's a photo from a MS-DOS version. It does that thing some companies (like Sierra) did back in the day, and included both 3.5" and 5.25" disks in the package.
BUT WHY ARE THERE SO MANY DISKS?
9000:8006 9a d7 05 b7 1f CALLF SUB_2000_0147
Hey ghidra I can read the machine code. That's CALL FAR 1fb7:05d7, not CALL FAR 2000:0147! WHY ARE YOU CONFUSED BY THIS?
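Decoding those five bytes by hand: opcode 9A is followed by a little-endian offset and then a little-endian segment. A short sketch of the same decode; `decode_call_far` is a made-up helper name:

```python
import struct


def decode_call_far(code: bytes) -> tuple[int, int]:
    """Decode a 16-bit CALL FAR ptr16:16 (opcode 9A) into (segment, offset)."""
    if len(code) < 5 or code[0] != 0x9A:
        raise ValueError("not a 5-byte CALL FAR instruction")
    offset, segment = struct.unpack_from("<HH", code, 1)
    return segment, offset


segment, offset = decode_call_far(bytes.fromhex("9ad705b71f"))
print(f"CALL FAR {segment:04x}:{offset:04x}")
```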
I guess I could test it anyway. Move it to 9000:8000 and see what breaks. (like everything)
so the program has three main code segments, as it has approximately 111kb of code
The problem is that ghidra gets confused when the relative addresses are too big.
so the first one is at 1000:0000 and the second was at 1fb7:0009. I moved it to 5000:7000, and the second segment seems to be working fine now.
the problem is that I was only able to do that because the segment is only 82a7h long. the first segment, the 1000:0000 one, is FB79 long. So I can't just move it so it's in the middle of a segment, since it'll end up spanning into the next 64k chunk, which is where ghidra fucks up
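The constraint is plain 64k arithmetic: a block only stays inside one segment if its starting offset plus its length is no more than 0x10000. A quick check with the numbers above; `fits_in_64k` is a hypothetical helper, and it says nothing about what Ghidra does internally:

```python
def fits_in_64k(start_offset: int, length: int) -> bool:
    """True if a block starting at `start_offset` stays within one 64k segment."""
    return start_offset + length <= 0x10000


print(fits_in_64k(0x7000, 0x82A7))   # second segment moved to 5000:7000
print(fits_in_64k(0x7000, 0xFB79))   # first segment at the same offset
print(fits_in_64k(0x0000, 0xFB79))   # first segment at offset 0
```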
this isn't supposed to be possible but apparently it is
THE MEMORY ADDRESSES ARE OVERLAPPING AGAIN
okay I reverted back to my old mapping, then created a new memory mapping: I made up some bytes at 2000:xxxx where it incorrectly thinks it's going, and set up a JMP $CORRECT_ADDRESS there by editing the bytes, then telling Ghidra it's a thunk.
this is deeply stupid but it appears to mostly work
ugh. I pulled the thread to try and remap the memory to avoid ghidra disassembling it wrong, and it keeps getting worse. this is a mess.
looks like GameBlaster (GBLAST) has extra options, so you can do like GBLAST260 to set the IO addr
stdsnd can also be written as stdsnd! which does something different. What? I have no idea.
I would say "especially if they're on a network!" but... this program is from 1990. Not many schools had networks in '90.
the other argument you can pass is ROSTER=$FILENAME
This lets you reset which file it uses for the list of registered players, setting it to something other than the default ACME.DAT
Not mentioned in the manual, but I can see how that might be useful for schools and such
I'm an idiot, this isn't a driver check... it's an argv check!
you can pass "ega" or "vga" or whatever to carmen.exe to select those types.
Anyway it seems it doesn't have a VideoDetect function, it's a DriverDetect function, since it's used for sound too.
First it goes through the video drivers in the following order:
VGA, TGA, EGA, HGA, HERC, and CGA.
Then it goes into the audio drivers:
stdsnd, adlib, covox, gblast, ibmg, sblast, tandy.
stdsnd is pc speaker,
adlib is adlib, covox is the speech thing, gblast is game blaster, most likely, ibmg is... I'm not sure. The PS-1 Audio card?
sblast is soundblaster and tandy is tandy 3-voice
similar things in the test.com file. I moved stuff around in the memory map and it's not erroring now. I've probably created endless glitches elsewhere though
I think this saves one byte?
a call FAR absolute would be 5 bytes for the call, whereas push CS + call NEAR is 3+1 bytes
I might have to make a NASM test case. This could be Ghidra fucking up at decoding this one instruction
eww. They're using the NEAR version of CALL to call a FAR procedure.
You might say "wait, won't that break when it tries to do RETF?" and yes, it would, unless they manually do PUSH CS before they call it!
the +3 is because E8 5D F7 is 3 bytes, and it goes off the address of the next instruction
Ghidra even recognizes there's a function at 1fb7:000A! It's called VideoDetect
So I've got code at 17DA:08AA, which is E8 5D F7. DOSBox decodes that as CALL 000A.
Manually decoding it myself, it should be a relative jump, and it's a jump to $-0x8a3. following the jump it ends up at 17DA:000A.
BUT GHIDRA thinks this code is at 1fb7:08aa, and it decodes it as call SUB_2000_fb7a, which doesn't exist.
I'm not sure how (0x08aa+3)-0x8a3 = 2000:fb7a. Something weird is going on. Why is the number BIGGER?
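Here's the manual decode as code. A near CALL (E8) adds its 16-bit displacement to the address of the next instruction and wraps at 64k. One piece of arithmetic that happens to reproduce 2000:fb7a is doing the same add on the linear address without the wrap; whether that's really what Ghidra is doing is a guess. Both function names are made up:

```python
def near_call_target(code: bytes, ip: int) -> int:
    """Target offset of an E8 rel16 call located at offset `ip`."""
    if len(code) < 3 or code[0] != 0xE8:
        raise ValueError("not a 3-byte near CALL")
    disp = int.from_bytes(code[1:3], "little")   # 0xF75D, i.e. -0x8A3 signed
    return (ip + 3 + disp) & 0xFFFF              # wrap within the segment


def unwrapped_linear(segment: int, code: bytes, ip: int) -> int:
    """Same add on the linear address, with no 16-bit wraparound."""
    disp = int.from_bytes(code[1:3], "little")
    return (segment << 4) + ip + 3 + disp


call = bytes.fromhex("e85df7")
print(f"{near_call_target(call, 0x08AA):04x}")
print(f"{unwrapped_linear(0x1FB7, call, 0x08AA):05x}")
```

0x08AD + 0xF75D is 0x1000A: wrapped to 16 bits that's 000A, but added to the segment base 0x1FB70 it lands on linear 0x2FB7A, which is 2000:fb7a.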