There are various descriptions of the CPI file format around the web; this is my attempt at one. The structure names and definitions used are based on those in Andries Brouwer's format documentation. These, in turn, appear to originate from the MS-DOS Programmer's Reference (my copy is for MS-DOS 5: ISBN 1-55615-329-5).
CPI files are used to store fonts allowing devices to display in multiple codepages. They can refer either to screen fonts, or printer fonts. Screen CPI files can hold one or more fonts per codepage - usually, at 8x16, 8x14 and 8x8 sizes. DRDOS screen codepage files also contain an 8x6 font (actually 6x6, but the file headers all say 8x6) which is used by ViewMAX screen drivers.
According to this blog comment by Larry Osterman, one of the developers of MSDOS, NLS functions were ported to PC-DOS by IBM from their mainframe systems. Presumably this included codepages, in which case the CPI file format may be derived from a mainframe file format.
There are three main CPI format variants -- FONT (used by MSDOS, PCDOS and Windows 9x), FONT.NT (used by Windows NT and its successors) and DRFONT (used by DRDOS screen fonts). There is a file format specification in the MS-DOS Programmer's Reference which covers FONT; I know of no formal specification for FONT.NT or DRFONT. Even in the case of FONT, a bit of expansion and clarification wouldn't come amiss in some places.
In this document (on the principle of being conservative in what you generate and liberal in what you accept) emphasized text indicates restrictions on the file format that you should try to follow when generating a CPI file, but which you shouldn't rely on when reading. It is sometimes followed by a footnote [0] saying which utility has this restriction.
Here's one, for instance: CPI files in FONT format should not exceed 64k in size - use FONT.NT or DRFONT if you need to get more codepages in a file than will fit in 64k [1]. If you know that your CPI file will only be parsed by utilities that understand 32-bit file offsets, you can write CPI files bigger than 64k. Just don't try to use them with, in this case, the PC-DOS 3.3 DISPLAY.SYS. And don't assume that all FONT-format CPI files will be 64k or less.
The principal programs which have to parse CPI files - and on which I've based this specification - are:
All numbers are stored in little-endian format. 'short' is 2 bytes, 'long' is 4 bytes.
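As a minimal sketch of that byte-order rule, assuming the file has already been read into an unsigned char buffer, a pair of helpers might look like this. The names cpi_rd16 and cpi_rd32 are my own invention, not taken from any specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 2-byte little-endian 'short' from a byte buffer. */
uint16_t cpi_rd16(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 4-byte little-endian 'long' from a byte buffer. */
uint32_t cpi_rd32(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Reading byte by byte like this avoids depending on the byte order or structure padding of the machine doing the parsing.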
Overall structure of a FONT or FONT.NT file:

FontFileHeader
FontInfoHeader
CodePageEntryHeader
|     |
|     |   either
|     +---> CodePageInfoHeader  }
|     .     ScreenFontHeader    }
|     .     Screen font bitmaps } Code page body
|     .     ScreenFontHeader    }
|     .     Screen font bitmaps }
|     .     ...
|     .   or
|     +---> CodePageInfoHeader  }
|           PrinterFontHeader   } Code page body
|           Printer font data   }
v
CodePageEntryHeader
|     |
|     +---> Code page body
...   ...
Overall structure of a DRFONT file:

FontFileHeader
DRDOSExtendedFontFileHeader                         |
FontInfoHeader                                      |
CodePageEntryHeader                                 |
|     |                                             |
|     +---> CodePageInfoHeader    }                 |
|           ScreenFontHeader      }                 |
|           ScreenFontHeader      } Code page body  |
|           ...                   }                 |
|           Character index table }                 |
v                                                   |
CodePageEntryHeader                                 |
|     |                                             |
|     +---> Code page body                          |
...   ...                                           v
                                          Screen font bitmaps
A CPI file begins with a fixed header. In theory its size could range from 18 bytes to just over 320k, but in practice its length is always 23 bytes, for two reasons:
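struct
{
    char  id0;
    char  id[7];
    char  reserved[8];
    short pnum;         /* number of pointers; 1 in practice */
    char  ptyp;         /* pointer type; always 1 */
    long  fih_offset;   /* offset of the FontInfoHeader */
} FontFileHeader;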
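The size range quoted above follows from the layout: 18 fixed bytes, plus a 1-byte type and a 4-byte offset for each pointer, and pnum is a 16-bit count. Here is a sketch of that arithmetic and of decoding the usual 23-byte header. The CpiFileHeader type and both function names are hypothetical, and rejecting any pnum other than 1 is my own choice, made because the layout for other values is not known.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 18 fixed bytes plus 5 bytes (type + offset) per pointer. */
unsigned long cpi_file_header_size(unsigned pnum)
{
    return 18UL + 5UL * pnum;
}

typedef struct {
    uint8_t  id0;
    char     id[8];       /* the 7 bytes from the file, NUL-terminated */
    uint16_t pnum;
    uint8_t  ptyp;
    uint32_t fih_offset;
} CpiFileHeader;          /* hypothetical in-memory form */

/* Decode the usual 23-byte header. Returns 0 on success, or -1 if the
 * buffer is too short or pnum is not 1. */
int cpi_parse_file_header(const unsigned char *buf, size_t len,
                          CpiFileHeader *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 23)
        return -1;
    out->id0 = buf[0];
    memcpy(out->id, buf + 1, 7);
    out->id[7] = '\0';
    /* buf[8..15] is the reserved field */
    out->pnum = (uint16_t)(buf[16] | (buf[17] << 8));
    out->ptyp = buf[18];
    out->fih_offset = (uint32_t)buf[19] | ((uint32_t)buf[20] << 8) |
                      ((uint32_t)buf[21] << 16) | ((uint32_t)buf[22] << 24);
    return out->pnum == 1 ? 0 : -1;
}
```

With pnum at 0 the header is 18 bytes, with pnum at 1 it is 23, and with pnum at its maximum of 65535 it is 327693 bytes, which is just over 320k.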
In a DRFONT file, the DRDOSExtendedFontFileHeader immediately follows the FontFileHeader.
struct
{
    char num_fonts_per_codepage;   /* N */
    char font_cellsize[N];
    long dfd_offset[N];            /* offset of the bitmap table for each size */
} DRDOSExtendedFontFileHeader;
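Because both arrays have N entries, this header cannot be mapped onto a fixed C structure; it occupies 1 + N + 4*N bytes. A sketch of decoding it follows. The DrfontExtHeader type, the function name and the cap of 16 sizes are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DRFONT_MAX_SIZES 16   /* arbitrary cap chosen for this sketch */

typedef struct {
    uint8_t  num_fonts_per_codepage;
    uint8_t  font_cellsize[DRFONT_MAX_SIZES];
    uint32_t dfd_offset[DRFONT_MAX_SIZES];
} DrfontExtHeader;            /* hypothetical in-memory form */

/* Decode the header from buf. Returns the bytes consumed (1 + 5*N),
 * or 0 if the buffer is too short or N exceeds the cap. */
size_t drfont_parse_ext_header(const unsigned char *buf, size_t len,
                               DrfontExtHeader *out)
{
    size_t n, i;

    assert(buf != NULL && out != NULL);
    if (len < 1)
        return 0;
    n = buf[0];
    if (n > DRFONT_MAX_SIZES || len < 1 + 5 * n)
        return 0;
    out->num_fonts_per_codepage = (uint8_t)n;
    for (i = 0; i < n; i++) {
        /* The N cell sizes come first, then the N 4-byte offsets. */
        const unsigned char *p = buf + 1 + n + 4 * i;
        out->font_cellsize[i] = buf[1 + i];
        out->dfd_offset[i] = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                             ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    }
    return 1 + 5 * n;
}
```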
struct
{
    short num_codepages;   /* number of codepages in the file */
} FontInfoHeader;
The FontInfoHeader should immediately follow the FontFileHeader or DRDOSExtendedFontFileHeader [2].
The FontInfoHeader is immediately followed by the first CodePageEntryHeader; these form a linked list of codepages that the CPI file implements.
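struct
{
    short cpeh_size;          /* size of this header */
    long  next_cpeh_offset;   /* offset of the next CodePageEntryHeader */
    short device_type;
    char  device_name[8];     /* e.g. "EGA", "LCD", "4201" */
    short codepage;           /* codepage number */
    char  reserved[6];
    long  cpih_offset;        /* offset of the CodePageInfoHeader */
} CodePageEntryHeader;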
The CodePageInfoHeader for a codepage should immediately follow its CodePageEntryHeader, rather than (for example) all the CodePageEntryHeaders coming together at the start, followed by all the CodePageInfoHeaders with their fonts [3]. This is particularly important in a DRFONT file [4].
The fields next_cpeh_offset and cpih_offset should not point to addresses earlier in the file than this CodePageEntryHeader, for the same reason.
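Walking the linked list can be sketched as follows. This assumes a FONT-format file held entirely in memory, with next_cpeh_offset treated as an offset from the start of the file; the function name and the local rd16/rd32 helpers are made up for the example. It is driven by num_codepages rather than by any end-of-list marker, and it deliberately does not insist that the pointers run forwards, since that is a rule for writers rather than readers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPEH_SIZE 28   /* 2 + 4 + 2 + 8 + 2 + 6 + 4 bytes on disk */

static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Collect up to 'max' codepage numbers, starting from the FontInfoHeader
 * at fih_offset. Returns the number stored, or -1 if a header would run
 * past the end of the buffer. */
int cpi_list_codepages(const unsigned char *file, size_t len,
                       size_t fih_offset, uint16_t *codepages, int max)
{
    size_t pos;
    unsigned count, i;
    int n = 0;

    assert(file != NULL && codepages != NULL);
    if (fih_offset > len || len - fih_offset < 2)
        return -1;
    count = rd16(file + fih_offset);
    pos = fih_offset + 2;          /* first entry follows the FontInfoHeader */
    for (i = 0; i < count; i++) {
        if (pos > len || len - pos < CPEH_SIZE)
            return -1;
        if (n < max)
            codepages[n++] = rd16(file + pos + 16);   /* codepage field */
        pos = rd32(file + pos + 2);                   /* next_cpeh_offset */
    }
    return n;
}
```

Since the loop runs at most num_codepages times, a file whose pointers form a cycle cannot make it spin forever.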
At the start of the data block for each codepage is a CodePageInfoHeader:
struct
{
    short version;     /* 1 = FONT, 2 = DRFONT */
    short num_fonts;
    short size;
} CodePageInfoHeader;
The version field is 1 if the following codepage is in FONT format, 2 if it is in DRFONT format. Putting a DRFONT codepage in a FONT-format file will not work. You shouldn't put a FONT codepage in a DRFONT-format file either [5].
LCD.CPI from Toshiba MS-DOS 3.30 sets this field to 0, which should be treated as 1.
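A reader can fold the Toshiba case into the decoding step. The sketch below uses a hypothetical CpiInfoHeader type and function name; treating any version other than 0, 1 or 2 as an error is my own choice rather than something the format documents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t version;     /* after normalisation: 1 = FONT, 2 = DRFONT */
    uint16_t num_fonts;
    uint16_t size;
} CpiInfoHeader;          /* hypothetical in-memory form */

/* Decode the 6-byte CodePageInfoHeader, mapping version 0 to 1.
 * Returns 0 on success, -1 on a short buffer or unknown version. */
int cpi_parse_info_header(const unsigned char *buf, size_t len,
                          CpiInfoHeader *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 6)
        return -1;
    out->version   = (uint16_t)(buf[0] | (buf[1] << 8));
    out->num_fonts = (uint16_t)(buf[2] | (buf[3] << 8));
    out->size      = (uint16_t)(buf[4] | (buf[5] << 8));
    if (out->version == 0)
        out->version = 1;     /* the Toshiba LCD.CPI case */
    return (out->version == 1 || out->version == 2) ? 0 : -1;
}
```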
If the CPI is for a printer, the CodePageInfoHeader is followed by:
struct
{
    short printer_type;
    short escape_length;
} PrinterFontHeader;
This structure is in turn followed by the printer data. If printer_type is 1, there are two escape sequences; if printer_type is 2, there is one. The first escape sequence selects the builtin code page; the second selects the downloaded codepage. An escape sequence is stored as a Pascal string (the first byte is the length). After the escape sequence(s), any remaining data up to the size given in CodePageInfoHeader are the definition of the font, to be downloaded to the printer.
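The escape sequences can be picked apart along these lines. The CpiEscape type and the function name are hypothetical, and the sketch relies only on printer_type and the Pascal-string length bytes; it does not use escape_length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const unsigned char *bytes;   /* points into the caller's buffer */
    uint8_t len;
} CpiEscape;                      /* hypothetical */

/* buf points at the PrinterFontHeader. Returns the number of bytes taken
 * up by the header and escape sequences (so the font definition starts at
 * buf + result), or 0 on error. On success *nseq is 1 or 2. */
size_t cpi_parse_printer_escapes(const unsigned char *buf, size_t len,
                                 CpiEscape seq[2], int *nseq)
{
    size_t pos = 4;               /* skip printer_type and escape_length */
    unsigned type;
    int want, i;

    assert(buf != NULL && seq != NULL && nseq != NULL);
    if (len < 4)
        return 0;
    type = (unsigned)(buf[0] | (buf[1] << 8));
    if (type == 1)
        want = 2;
    else if (type == 2)
        want = 1;
    else
        return 0;
    for (i = 0; i < want; i++) {
        if (pos >= len)
            return 0;
        seq[i].len = buf[pos];                 /* Pascal string length byte */
        if (len - pos - 1 < seq[i].len)
            return 0;
        seq[i].bytes = buf + pos + 1;
        pos += 1 + (size_t)seq[i].len;
    }
    *nseq = want;
    return pos;
}
```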
If the CPI is for the screen, the CodePageInfoHeader is followed by screen font definitions for each size. In a FONT or FONT.NT file, each entry consists of a ScreenFontHeader followed by the font bitmap; in a DRFONT, just the ScreenFontHeader is provided.
struct
{
    char  height;      /* character height */
    char  width;       /* character width */
    char  yaspect;
    char  xaspect;
    short num_chars;   /* number of characters in the font */
} ScreenFontHeader;
Except in DRFONT fonts, the bitmap follows the ScreenFontHeader; its length is num_chars * height * ((width+7)/8) bytes, and it contains the glyphs for each character in increasing order. Some loaders calculate the size simply as height * num_chars, and so will miscalculate if the width is greater than 8.
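The two size calculations can be put side by side. This is only a sketch; both function names are invented for the comparison.

```c
#include <assert.h>

/* Bitmap length in bytes: each row of a glyph takes (width+7)/8 bytes. */
unsigned long cpi_bitmap_bytes(unsigned num_chars, unsigned height,
                               unsigned width)
{
    assert(height > 0 && width > 0);
    return (unsigned long)num_chars * height * ((width + 7) / 8);
}

/* The shortcut some loaders take: one byte per row, whatever the width. */
unsigned long cpi_bitmap_bytes_naive(unsigned num_chars, unsigned height)
{
    return (unsigned long)num_chars * height;
}
```

For a 256-character 8x16 font both give 4096 bytes. For a hypothetical 9-pixel-wide font of the same height the first gives 8192 and the shortcut still gives 4096, so a loader using the shortcut would look for the next ScreenFontHeader in the middle of the bitmap.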
In a DRFONT file, the ScreenFontHeaders for a codepage are followed by a character index table:
struct
{
    short FontIndex[256];
} CharacterIndexTable;
The DRDOS utilities assume that there are always 256 entries in this table; so the character count in a DRFONT ScreenFontHeader should always be 256 [9].
Each entry in FontIndex describes the number of the bitmap for the corresponding character in the bitmap tables pointed to by the DRDOSExtendedFontFileHeader. To find the bitmap for a particular letter, take the FontIndex entry, multiply it by the character length in bytes, and add the dfd_offset for the size in question.
To determine the number of characters in bitmap tables in a DRFONT, a program therefore has to walk all FontIndex entries in the file and take the highest value.
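Both steps are short enough to sketch. The function names are hypothetical, the caller is assumed to have decoded the index tables into arrays of 16-bit values, and bytes_per_char is whatever the character length is for the size in question. If bitmap numbers start at zero, which I am assuming here, the number of bitmaps is one more than the highest index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* File offset of the bitmap for character 'ch' at one font size. */
uint32_t drfont_glyph_offset(const uint16_t font_index[256], unsigned char ch,
                             uint32_t dfd_offset, unsigned bytes_per_char)
{
    assert(font_index != NULL);
    return dfd_offset + (uint32_t)font_index[ch] * bytes_per_char;
}

/* Highest FontIndex value among 'n' entries gathered from every
 * CharacterIndexTable in the file (256 entries per codepage). */
unsigned drfont_highest_index(const uint16_t *entries, size_t n)
{
    unsigned highest = 0;
    size_t i;

    assert(entries != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (entries[i] > highest)
            highest = entries[i];
    return highest;
}
```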
Some CPI files don't end immediately after the last font. Usually, what follows is a copyright message (possibly terminated by 0x1A) and/or some zero bytes. The MS-DOS 5 Programmer's Reference says that a CPI file 'always ends with a copyright notice' and that this is at most 0x150 bytes long.
Among the things that the format seems to support but some or all utilities do not, we find:
If pnum were to be greater than 1, there are two possibilities for how the extra data would be stored:
struct                          struct
{                               {
    char  id0;                      char  id0;
    char  id[7];                    char  id[7];
    char  reserved[8];              char  reserved[8];
    short pnum;                     short pnum;
    char  ptyp[N];                  struct {
    long  fih_offset[N];                char ptyp;
                                        long fih_offset;
                                    } pointers[N];
} FontFileHeader;               } FontFileHeader;
-- that is, either all the types come first and then all the pointers, or
types and pointers alternate. The second is backward-compatible, in that
programs which only understood the 1-pointer format would be able to
follow the first pointer as usual.
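The compatibility argument can be checked with a little arithmetic on where the k-th fih_offset would start in each hypothetical layout (k counted from zero, N = pnum). Neither layout is known to exist in any real file, and the function names are mine.

```c
#include <assert.h>

/* Layout 1: all N types first, then all N offsets. */
unsigned long ffh_offset_pos_split(unsigned n, unsigned k)
{
    assert(k < n);
    return 18UL + n + 4UL * k;
}

/* Layout 2: N (type, offset) pairs of 5 bytes each. */
unsigned long ffh_offset_pos_paired(unsigned n, unsigned k)
{
    assert(k < n);
    return 18UL + 5UL * k + 1UL;
}
```

With N = 1 both layouts put the offset at byte 19, as in today's files. With N = 3 the first offset moves to byte 21 in layout 1 but stays at byte 19 in layout 2, which is why a program that only knows the 1-pointer format would still find it there.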
ptyp is always 1. What might other values mean?
Technically, there's no reason why a CPI file shouldn't hold codepages for multiple devices (eg, each codepage appears three times: once for "EGA", once for "LCD", and once for the "4201" printer). How would utilities handle this?
Even if a CPI file can't be streamed because of the order of the records, all the pointers in it will almost certainly point forwards - that is, to bytes further from the start of the file than where the pointer is. What happens if the blocks are so perversely arranged that this is not the case?
In this situation, a FONT.NT file would actually have negative values in its offset fields, and this might cause trouble on systems that treated them as unsigned.
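To illustrate the trouble, here is a sketch that assumes FONT.NT offsets are measured from the header that contains them (the function names and the example positions are made up). A header at byte 1000 pointing back to byte 200 would store -800, and the two interpretations of that field give very different targets.

```c
#include <assert.h>
#include <stdint.h>

/* Treat the stored 32-bit value as signed (two's complement). */
int64_t fontnt_target_signed(uint32_t header_pos, uint32_t raw)
{
    int64_t rel = (raw & 0x80000000u) ? (int64_t)raw - 4294967296LL
                                      : (int64_t)raw;
    return (int64_t)header_pos + rel;
}

/* Treat the stored 32-bit value as unsigned. */
int64_t fontnt_target_unsigned(uint32_t header_pos, uint32_t raw)
{
    return (int64_t)header_pos + (int64_t)raw;
}

/* Worked example: header at 1000, target at 200. */
void fontnt_offset_demo(void)
{
    uint32_t raw = (uint32_t)(200 - 1000);   /* stored as 0xFFFFFCE0 */

    assert(fontnt_target_signed(1000, raw) == 200);
    assert(fontnt_target_unsigned(1000, raw) == 4294967496LL);
}
```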
How should utilities handle the case of the same codepage appearing multiple times for the same device, or the same font size appearing multiple times within a codepage?
What was the aspect ratio intended for? Can the same font size appear multiple times in a codepage if the aspect ratio is different?
These explain the reasons for particular recommendations.
EXEC-NW.CPI Version E3
437 850 860 861 865
Copyright (c) 1991, AST Europe Ltd. All rights reserved.
Therefore "EGA.ICE" is not the original filename, and the file was not created by Microsoft.