15. July 2002

ZZIP-EXT/IO

Customizing the file access

The EXT/IO calls

There were quite some requests from game developers and graphics-apps developers who wanted various extensions to be included into the zziplib library, but most of them were only of specific usage. After some discussions we came up with a model to customize the zziplib library calls in a number of ways - and adding two or three arguments to the zzip_* calls. The standard zziplib library will actually call these *_ext_io functions with these extra arguments to be set to zero.

The EXT feature describes a way to customize the extensions used in the magic wrapper to find a .ZIP file. It turned out that there are quite a number of applications that did chose the zip file format as their native document format and where just the file extension had been changed. This includes file types like Quake3 ".PK3" files from ID Software, the Java library files called ".JAR", and lately the OpenOffice-6 (resp. StarOffice-6) documents which carry xml-files along. Just build a zero-termined string-list of file-extensions and submit it to the _ext_io calls to let the zziplib find parts of those zip-documents automagically.

In quite some of these cases, it is very benefical to make use of the o_modes functionality that allows to submit extra bit-options into the zziplib - this includes options like ZZIP_PREFERZIP or even ZZIP_ONLYZIP which modifies the default behaviour of looking for real files first instead of some within a zipped directory tree. Other bit-options include ZZIP_CASELESS to imitate win32-like filematching for a zipped filetree.

Other wishes on zziplib circulated around obfuscation or access to zip-files wrapped in other data areas including encrpyted resources from other applications. This has been adressed with the IO-handlers that you can explicitly submit to the *_ext_io functions - the default will be posix-IO open/read/write and seek/tell. An application using zziplib can divert these to its own set of these calls - and it only needs to declare them on opening a zipped file.

The EXT stringlist

Declaring an EXT stringlist is very simple as it is simply a list of strings, the zziplib provides you with a double-const zzip_strings_t type to help you move a global declaration into the writeonly segment of your app - it turned out that about all developers wanted just some extensions on the default and they were fine with having them global-const for their application, nothing like dynamically modifying them. Well, you are still allowed to make it fully dynamic... if you find a use case for that.

Extending the magic zip-extensions is just done by adding the additional extensions to be recognized - just remember to add the uppercased variants too since those will be needed on (unx-like) filesystems that are case-sensitive. In the internet age, quite some downloaded will appear in uppercased format since the other side declared it as that and that other end was happy with it as being a (w32-like) case-insensitive server. Therefore, it should look like

     static zzip_strings_t my_ext[] = { ".zip", ".ZIP", ".jar", ".JAR", 0 };
  

There is one frequently asked question in this area - how to open a zipped file as "test.zip/README" instead of "test/README". Other than some people might expect, the library will not find it - if you want to have that one needs a fileext list that contains the empty string - not the zero string, an empty string that is. It looks like

     static zzip_strings_t my_ext[] = { ".zip", ".ZIP", "", 0 };
  

And last not least, people want to tell the libary to not try to open a real file that lives side by side with the same path as the file path that can be matched by the zziplib. Actually, the magic wrappers were never meant to be used like - the developer should have used zzip_dir_* functions to open a zip-file and the zzip_file_* functions to read entries from that zip-file. However, the magic-wrappers look rather more familiar, and so you will find now a bit-option ZZIP_ONLYZIP that can be passed down to the _ext_io variants of the magic-wrapper calls, and a real-file will never get tested for existance. Actually, I would rather recommend that for application data the option ZZIP_PREFERZIP, so that one can enter debugging mode by unpacking the zip-file as a real directory tree in the place of the original zip.

The IO handlers

While you will find the zzip_plugin_io_t declared in the zziplib headers, you are not advised to make much assumptions about their structure. Still we gone the path of simplicity, so you can use a global static for this struct too just like one can do for the EXT-list. This again mimics the internals of zziplib. There is even a helper function zzip_init_io that will copy the zziplib internal handlers to your own handlers-set. Actually, this is barely needed since the zziplib library will not check for nulls in the plugin_io structure, all handlers must be filled, and the zziplib routines call them unconditionally - that's simply because a conditional-call will be ten times slower than an unconditional call which adds mostly just one or two cpu cycles in the place so you won't ever notice zziplib to be anywhat slower than before adding IO-handlers.

However, you better instantiate your handlers in your application and call that zzip_init_io on that instance to have everything filled, only then modify the entry you actually wish to have modified. For obfuscation this will mostly be just the read() routine. But one can also use IO-handlers to wrap zip-files into another data part for which one (also) wants to modify the open/close routines as well.

Therefore, you can modify your normal stdio code to start using zipped files by exchaning the fopen/fread/fclose calls by their magic counterparts, i.e.

    // FILE* file = fopen ("test/README", "rb");
    ZZIP_FILE* file = zzip_fopen ("test/README", "rb");
    // while (0 < fread (buffer, 1, buflen, file))) 
    while (0 < zzip_fread (buffer, 1, buflen, file)))
       { do something }
    // fclose (file);
    zzip_fclose (file);
  

and you then need to prefix this code with some additional code to support your own EXT/IO set, so the code will finally look like

    /* use .DAT extension to find some files */
    static zzip_strings_t ext[] = { ".dat", ".DAT", "", 0 }; 
    /* add obfuscation routine - see zzxorcat.c examples */
    static zzip_plugin_io_t io;
    zzip_init_io (& io, 0);
    io.read = xor_read; 
    /* and the rest of the code, just as above, but with ext/io */
    ZZIP_FILE* file = zzip_open_ext_io ("test/README", O_RDONLY|O_BINARY,
                                        ZZIP_ONLYZIP|ZZIP_CASELESS, ext, io);
    while (0 < zzip_fread (buffer, 1, buflen, file)))
        { do something }
    zzip_fclose (file);
  

Finally

What's more to it? Well, if you have some ideas then please mail me about it - don't worry, I'll probably reject it to be part of the standard zziplib dll, but perhaps it is worth to be added as a configure option and can help others later, and even more perhaps it can be somehow generalized just as the ext/io features have been generalized now. In most respects, this ext/io did not add much code to the zziplib - the posix-calls in the implemenation, like "read(file)" were simply exchanged with "zip->io->read(file)", and the old "zzip_open(name,mode)" call is split up - the old entry still persists but directly calls "zzip_open_ext_io(name,mode,0,0,0)" which has the old implementation code with just one addition: when the ZIP_FILE handle is created, it uses the transferred io-handlers (or the default ones if io==0), and initialized the io-member of that structure for usage within the zzip_read calls.

This adds just a few bytes to the libs and just consumes additional cpu cycles that can be rightfully called to be negligable (unlike most commerical vendors will tell you when they indeed want to tell you that for soooo many new features you have to pay a price). It makes for greater variability without adding fatness to the core in the default case, this is truly efficient I'd say. Well, call this a German desease :-)=) ... and again, if you have another idea, write today... or next week.