User:AlexanderRichardson/Structure Definitions

From KDE UserBase Wiki
Revision as of 15:06, 22 July 2013 by AlexanderRichardson (talk | contribs) (Property encoding: string: fix external links)

Note

This information is valid for Okteta 0.10 (released with KDE SC 4.10). Some functionality may not be available in older versions.

It is possible to define structures using either XML or JavaScript. Errors in the definition are viewable by opening the script console in Okteta.

Directory layout

Each structure definition consists of a folder containing two files. Inside this folder there must be a .desktop file (recommended name is metadata.desktop) for the metadata.

If you decide to use XML, you will need an <id>.osd file, where <id> is the value of the X-KDE-PluginInfo-Name entry in the metadata file.

Note

As of Okteta 0.11 (released with KDE SC 4.11) the filename main.osd will be checked first. If it doesn't exist, <id>.osd will be used.

If you use JavaScript instead, you will need a file named main.js.

The metadata file

The metadata file is a standard .desktop file. It has the following entries in the [Desktop Entry] section:

Entry | Details
Encoding | (required) Always use UTF-8 here.
Icon | (optional) The icon that will be displayed in the configuration UI. You can use any icon name known to KDE, or alternatively an absolute filesystem path.
Type | (required) Always use Service here.
ServiceTypes | (required) Always use KPluginInfo here.
Name | (required) The name of this structure. Will be displayed in the selection UI.
Comment | (optional) A short description of this structure. Will be displayed in the selection UI.
X-KDE-PluginInfo-Author | (optional) Your name. Will be displayed in the selection UI.
X-KDE-PluginInfo-Email | (optional) Your email address. Will be displayed in the selection UI.
X-KDE-PluginInfo-Name | (required) This entry will be used as the ID of this structure.
X-KDE-PluginInfo-Version | (required) A version number for your structure.
X-KDE-PluginInfo-Website | (optional) A website for this structure, e.g. http://kde-files.org/content/show.php/rpm+structure+definition?content=147699
X-KDE-PluginInfo-Category | (required) The value here must be either structure if you are writing an XML structure or structure/js if you are writing a JavaScript structure.
X-KDE-PluginInfo-License | (optional) A license, e.g. GPLv3.


A valid sample metadata.desktop could look as follows:

[Desktop Entry]
Encoding=UTF-8
Icon=application-zip
Type=Service
ServiceTypes=KPluginInfo

Name=Foo files
Comment=My own custom compression format (.foo)

X-KDE-PluginInfo-Author=Foo Bar
X-KDE-PluginInfo-Email=[email protected]
X-KDE-PluginInfo-Name=compressed-file
X-KDE-PluginInfo-Version=0.1
X-KDE-PluginInfo-Website=http://www.example.org/
X-KDE-PluginInfo-Category=structure/js
X-KDE-PluginInfo-License=GPLv3

Available datatypes and their properties

Structures

A structure is a container type in which the children are read sequentially. This is analogous to the C/C++ struct type.

Property fields: list of datatypes

This property holds a list with the children of this struct. They will be read in the order they are defined. This property is also available at runtime to allow modifying the field.

Property childCount (runtime only): unsigned integer

This read-only property holds the number of fields.

Unions

A union is a container type in which the children are read sequentially, but always from the same starting offset. This is analogous to the C/C++ union type.

Unions have the same properties as structures.
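To make the difference between the two container types concrete, the following sketch reads the same four bytes once struct-style and once union-style. It runs outside of Okteta and uses the standard JavaScript DataView class purely for illustration; the helper names and the byte values are made up for this example.

```javascript
// Four sample bytes (hypothetical data, not taken from a real file format)
var bytes = new Uint8Array([0x01, 0x02, 0x03, 0x04]);
var view = new DataView(bytes.buffer);

// struct-like: each child starts where the previous one ended
function readAsStruct(view, start) {
  var offset = start;
  var first = view.getUint16(offset, true);   // little endian
  offset += 2;
  var second = view.getUint16(offset, true);
  return { first: first, second: second };
}

// union-like: every child starts at the same offset
function readAsUnion(view, start) {
  return {
    asUint16: view.getUint16(start, true),
    asUint32: view.getUint32(start, true)
  };
}

var s = readAsStruct(view, 0);
var u = readAsUnion(view, 0);
```

Here s.first and u.asUint16 cover the same two bytes, while s.second only exists because the struct advanced the offset.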

Tagged unions

This datatype exists to simplify using structures that have a different layout depending on one key field. This is intended for C/C++ structures like this (example from Wikipedia):

enum ShapeKind { Square, Rectangle, Circle };
 
struct Shape {
  int centerx;
  int centery;
  enum ShapeKind kind;
  union {
    struct { int side; }           squareData;
    struct { int length, height; } rectangleData;
    struct { int radius; }         circleData;
  } shapeKindData;
};

In the structures view only the relevant type will be displayed. E.g. if kind has the value Square the structures view will display a struct Square with the fields centerx, centery, kind and side.

Property fields: list of datatypes

This is the list of fields that are common to each type. In the example this would be

  • int centerx
  • int centery
  • enum ShapeKind kind

Property alternatives: list of object

This is a list with the alternatives that exist for this tagged union. Each alternative consists of three properties:

  • fields: The list of fields that this alternative has
  • selectIf: A function that returns true if this alternative should be selected. If the fields property of the tagged union contains only one field, an integer value is also permitted. In that case this alternative will be selected whenever that field has this integer value.
  • structName: The name that should be displayed for the whole structure if this alternative is selected.


The value of alternatives for the example:

fields | selectIf | structName
int side | function() { return this.kind == Square; } | "Square"
int length, int height | function() { return this.kind == Rectangle; } | "Rectangle"
int radius | function() { return this.kind == Circle; } | "Circle"

Property defaultFields: list of datatypes

This property defines which fields should be displayed if none of the alternatives matched. By default this will be none.
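The interplay of fields, alternatives and defaultFields can be modelled in a few lines of plain JavaScript. This is only a simplified model for illustration, not Okteta's implementation: it assumes that the first alternative whose selectIf returns true wins and that the chosen fields are appended to the common ones. The function visibleFields and the mock element objects are hypothetical.

```javascript
// Simplified model (assumption): the first matching alternative is used,
// otherwise defaultFields are shown after the common fields.
function visibleFields(element, commonFields, alternatives, defaultFields) {
  for (var i = 0; i < alternatives.length; i++) {
    if (alternatives[i].selectIf.call(element)) {
      return {
        structName: alternatives[i].structName,
        fields: commonFields.concat(alternatives[i].fields)
      };
    }
  }
  return { structName: null, fields: commonFields.concat(defaultFields || []) };
}

var common = ["centerx", "centery", "kind"];
var alternatives = [
  { selectIf: function() { return this.kind.value == 0; }, fields: ["side"], structName: "Square" },
  { selectIf: function() { return this.kind.value == 2; }, fields: ["radius"], structName: "Circle" }
];

// Mock elements standing in for the runtime objects
var shown = visibleFields({ kind: { value: 2 } }, common, alternatives, []);
var fallback = visibleFields({ kind: { value: 99 } }, common, alternatives, []);
```

With kind equal to 2 the model yields a "Circle" with the fields centerx, centery, kind and radius; with an unknown kind only the common fields remain.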


Arrays

A collection of elements which have the same type. This is analogous to the C/C++ array concept.

Note

Array length is limited to 10000, since larger arrays would use too much memory. If this is a problem for your file format, please file a bug report.


Property type: datatype

The type of this array. Can be any other element, even another array.

Property length: unsigned integer or function

Holds the length of the array. Can be either a fixed number or a JavaScript function that returns a number. If set to a JavaScript function, this function will be called every time before the array is read, and the array length will be set to its return value. Since arrays are limited to 10000 elements, return values larger than that are reduced to the maximum. Example:

function() { return this.parent.datalen.value }

A shorthand is also available: you can specify the name of another element (which must be a primitive type such as an integer, pointer, enum or flags).

Note

Referencing the name of another element is only available in JavaScript as of Okteta 0.11; XML has always supported it. For older versions the function() { return ... } syntax must be used.


When reading at runtime it will always return the current length as a number. There is no way to obtain the length function dynamically at runtime.
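A length function can also do a small computation instead of returning a sibling value directly. The sketch below assumes a hypothetical sibling field totalSize that counts an 8-byte header as well; both the field name and the header size are invented for this example. The mock objects only exist so that the function can be tried outside of Okteta.

```javascript
// Hypothetical length function: "totalSize" includes an 8-byte header,
// so the header size is subtracted to get the number of array elements.
function payloadLength() {
  var total = this.parent.totalSize.value;
  return Math.max(0, total - 8); // never return a negative length
}

// Mock of the runtime objects, only for trying the function outside Okteta
var mockArray = { parent: { totalSize: { value: 24 } } };
var length = payloadLength.call(mockArray);

var mockTruncated = { parent: { totalSize: { value: 3 } } };
var clamped = payloadLength.call(mockTruncated);
```

In a definition the function would be passed as the length, for example array(uint8(), payloadLength).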

Warning

When writing to this property currently only unsigned integers are accepted. The only way to change this function is to replace the array with a new array with a different function. This will be fixed in Okteta 0.11.


Strings

Represents a string with a specified encoding. By default strings will be C-style null terminated strings.

Property encoding: string

This property can be any of the following.

  • ascii for a US-ASCII encoded string
  • latin1 for an ISO 8859-1 encoded string
  • utf-8 for a UTF-8 encoded string
  • utf-16 or utf-16-le for a UTF-16 little endian encoded string
  • utf-16-be for a UTF-16 big endian encoded string
  • utf-32 or utf-32-le for a UTF-32 little endian encoded string
  • utf-32-be for a UTF-32 big endian encoded string

Note

The hyphens may be omitted, i.e. utf16le is the same as utf-16-le.


Property terminatedBy: unsigned integer

This property determines the length of the string. The string extends until the current code point is equal to terminatedBy. For C-style null terminated strings set this property to zero.

Property maxByteCount: unsigned integer

Set the maximum number of bytes in this string.

Note

For UTF-16, maxByteCount is not necessarily equal to 2*maxCharCount, since characters outside the Basic Multilingual Plane are stored as surrogate pairs.

Property maxCharCount: unsigned integer

Set the maximum number of characters (code points) in this string.

Property byteCount (runtime only): unsigned integer

This read-only property holds the number of bytes this string contains.

Property charCount (runtime only): unsigned integer

This read-only property holds the number of code points in this string.
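The difference between byte counts and character counts for UTF-16 can be checked with plain JavaScript, whose strings are sequences of UTF-16 code units. The helper countCodePoints below is written for this example only and is not part of Okteta.

```javascript
// Count code points in a JavaScript (UTF-16) string.
function countCodePoints(text) {
  var count = 0;
  for (var i = 0; i < text.length; i++) {
    var unit = text.charCodeAt(i);
    // a high surrogate followed by a low surrogate forms one code point
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < text.length) {
      var next = text.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // skip the low surrogate
      }
    }
    count++;
  }
  return count;
}

var plain = "abc";
var withSurrogates = "a\uD83D\uDE00"; // "a" followed by U+1F600

var plainChars = countCodePoints(plain);
var plainBytes = plain.length * 2;            // 2 bytes per UTF-16 code unit
var surrogateChars = countCodePoints(withSurrogates);
var surrogateBytes = withSurrogates.length * 2;
```

Both strings occupy 6 bytes in UTF-16, but the first has 3 code points and the second only 2, which is why a limit in bytes and a limit in characters are not interchangeable.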

Primitive data types

The following primitive data types are available:

  • int8, int16, int32, int64: Signed integers with 8, 16, 32 or 64 bits precision
  • uint8, uint16, uint32, uint64: unsigned integers with 8, 16, 32 or 64 bits precision
  • bool8, bool16, bool32, bool64: A boolean value (0 is false, any other value is true)
  • float: a 32 bit IEEE754 floating point number
  • double: a 64 bit IEEE754 floating point number
  • char: a single ASCII character (8 bits, although only values below 0x80 are valid)

Property value (runtime only): number or string

Holds the value of this element.

Warning

Due to JavaScript limitations (every number is stored as a 64 bit floating point number), integers with a magnitude above 2^53 cannot be represented exactly. Therefore decimal strings are used instead for all 64 bit values. This means you should always use strings in comparisons with 64 bit values, e.g.: if (this.value == "9007199254740993")
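The precision problem is easy to reproduce in any JavaScript engine. The following lines are a stand-alone illustration with made-up variable names; they do not use any Okteta API.

```javascript
// 9007199254740993 is 2^53 + 1 and has no exact double representation,
// so the number literal is silently rounded to a neighbouring value.
var asNumber = 9007199254740993;
var roundedAway = (asNumber === 9007199254740992);

// A decimal string keeps every digit, so string comparison stays exact.
var asString = "9007199254740993";
var exactMatch = (asString == "9007199254740993");
var differsFromNeighbour = (asString != "9007199254740992");
```

This is why a comparison against a number literal could match the wrong value, while a comparison against a string cannot.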


Bitfields

TODO

TODO


Pointers

Pointers are primitive data types (the value property is also available) that also act as containers. The children of a pointer will be read at the offset equal to the value of the pointer.

Note

At the moment (Okteta 0.10) only absolute pointers are supported. Relative pointers will be available in Okteta 0.11.

Property type: datatype

The underlying primitive type of this pointer. Must be one of uint8, uint16, uint32 or uint64.

Property target: datatype

This property holds the type that is being pointed to. Can be any other element, even another pointer.
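What an absolute pointer means for reading can be sketched with the standard DataView class. This is an illustration outside of Okteta with invented offsets and values: a uint32 pointer stored at offset 0 holds the absolute offset at which its target, here a uint16, is read.

```javascript
// Build a small buffer: pointer at offset 0, target value at offset 12
var bytes = new Uint8Array(16);
var view = new DataView(bytes.buffer);
view.setUint32(0, 12, true);       // the pointer value (little endian)
view.setUint16(12, 0xBEEF, true);  // the data being pointed to

// Reading: first the pointer itself, then the target at that absolute offset
var pointerValue = view.getUint32(0, true);
var target = view.getUint16(pointerValue, true);
```

In a definition this would correspond to something like pointer(uint32(), uint16()).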

Enumerations

Enumerations are a primitive type where the textual value will be displayed instead of the numeric value. This is analogous to the C/C++ enum type. Since enumerations are primitive types they have the same properties as all primitive data types.

Note

The value property holds the numeric value, not the textual one.

Property type: string

This property must hold one of the strings int8, int16, int32, int64, uint8, uint16, uint32, uint64, since only integer type enumerations are allowed. A workaround for floating-point enumerations is interpreting the bit pattern of the corresponding floating-point value as an integer and using that for the enumeration value.
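The bit pattern needed for this workaround can be computed with typed arrays. The helper float32Bits and the enum value names below are hypothetical and only serve as an illustration; the resulting map would be used with an enumeration whose type is uint32.

```javascript
// Reinterpret the bits of a 32 bit IEEE 754 float as an unsigned integer.
function float32Bits(x) {
  var buffer = new ArrayBuffer(4);
  new Float32Array(buffer)[0] = x;   // store as float
  return new Uint32Array(buffer)[0]; // read the same 4 bytes as uint32
}

// Hypothetical enum values for a field that stores 0.5, 1.0 or 2.0 as float
var scaleValues = {
  HALF: float32Bits(0.5),
  NORMAL: float32Bits(1.0),
  DOUBLE: float32Bits(2.0)
};
```

For example, 1.0 has the bit pattern 0x3f800000, so an enumeration entry with that integer value is displayed whenever the underlying bytes encode the float 1.0.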

Property enumName: string

Contains the name of the underlying enumeration. This property exists so that the type column can display the name of the enumeration referred to instead of simply the string enum.

Property enumValues: map

A list of key-value pairs which is used to perform the integer value to text translation.

Bitflags

Bitflags are very similar to enumerations, except that a bitwise-or of the appropriate textual values will be displayed. For flags the enumerated values will usually be single set bits (i.e. numbers that are powers of 2), but any other value is also supported. If there are enum values that completely contain the bits of other values (e.g. 7 contains 2 and 4), only that value will be displayed.

Example: Assuming you have an enumeration representing UNIX file access rights: R = 4, W = 2, X = 1. Then the value 7 will be displayed as R | W | X. If you add another value ALL_RIGHTS = 7 to your enumeration, the value 7 will be displayed as ALL_RIGHTS instead. This can be useful if you want to have a shorter or more readable string displayed.

Properties are the same as the properties for enumerations.
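The display rule from the file access rights example can be approximated in plain JavaScript. The function describeFlags is a hypothetical sketch, not Okteta's actual code, and it only handles values that fit into 31 bits. It checks larger values first so that a combined value such as ALL_RIGHTS takes precedence over its parts.

```javascript
// Sketch of the display rule: combined values win over the bits they contain.
function describeFlags(value, names) {
  var entries = Object.keys(names).map(function(name) {
    return { name: name, bits: names[name] };
  });
  // larger values first, so a value containing other values is tried earlier
  entries.sort(function(a, b) { return b.bits - a.bits; });

  var remaining = value;
  var parts = [];
  entries.forEach(function(entry) {
    if (entry.bits !== 0 && (remaining & entry.bits) === entry.bits) {
      parts.push(entry.name);
      remaining &= ~entry.bits; // these bits are now accounted for
    }
  });
  if (remaining !== 0) {
    parts.push("0x" + remaining.toString(16)); // bits without a name
  }
  return parts.join(" | ");
}

var rights = { R: 4, W: 2, X: 1 };
var rightsWithAll = { R: 4, W: 2, X: 1, ALL_RIGHTS: 7 };

var plainSeven = describeFlags(7, rights);
var namedSeven = describeFlags(7, rightsWithAll);
var six = describeFlags(6, rightsWithAll);
```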

Properties common to all types

Property defaultLockOffset: number

Since Okteta 0.11

Setting this property allows you to ensure that the structure is locked at that offset whenever a new file is opened. Otherwise it will be read from the cursor position. When this property is set you can of course still unlock the structure manually from the UI.

This is mainly intended for e.g. file headers, which will usually start at offset 0. This way you no longer have to move the cursor to offset 0, select the structure and then press the lock button.

The property is only useful for the root element. If you put it on any other element it will be ignored. For an example look here (XML) or here (JavaScript).

Property name: string

The name of this element in the resulting structure. Note that if you give your element the same name as a property, you cannot access it with the normal syntax in script code. If you have an element named byteOrder you have to write parent.child("byteOrder") instead of parent.byteOrder, since the latter will return the value of the byteOrder property of parent instead of the child element.

Property byteOrder: string

Set the endianness. The following values are possible:

  • big-endian: Always use big endian
  • little-endian: Always use little endian
  • from-settings: Always use the value specified in the settings page
  • inherit: Use the value of the parent element. For the root element this is equivalent to from-settings

The default value is inherit.
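The effect of the byte order on a multi-byte value can be seen with the standard DataView class. The snippet is an illustration outside of Okteta; the two bytes are arbitrary sample data.

```javascript
// The same two bytes interpreted with both byte orders
var bytes = new Uint8Array([0x12, 0x34]);
var view = new DataView(bytes.buffer);

var asBigEndian = view.getUint16(0, false);   // most significant byte first
var asLittleEndian = view.getUint16(0, true); // least significant byte first
```

A uint16 element over these bytes would therefore show 0x1234 with big-endian and 0x3412 with little-endian.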

Property updateFunc: function

A JavaScript function which gets called every time this element is read. Allows you to modify this element and its children. This allows your structure to dynamically change its visualization depending on the data. Since this function gets called before the data for this element is read, you can only read the values of elements that come before it. Specifically, accessing this.value will not work. See #Update_and_validation_functions.

Property validationFunc: function

A JavaScript function which gets called whenever the user presses the Validate button. This function should return a boolean value or a string. If a string is returned, that string will be displayed as the validation failure message. Returning true will mark the element as successfully validated; returning false will display a validation error without a message.

Examples:

function() { return this.value == 0x42 }
function() { if (this.value >= 0x80) return "Invalid ASCII character"; else return true; }

Warning

When writing definitions in XML you have to escape some characters (e.g. < as &lt; and & as &amp;), since otherwise the document may be malformed.

Property parent (runtime only): datatype

This read-only property can be used at runtime to access the parent element. Should not be read in the root element.

Property wasAbleToRead (runtime only): boolean

This property holds the value true if the value could be read, or false if end of file was reached.
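A validation function can use this property to report truncated files separately from invalid values. The following is a minimal sketch: the limit of 1024 and the messages are invented for the example, and the mock objects only stand in for the runtime element so the function can be tried outside of Okteta.

```javascript
// Hypothetical validation function for a length field
function validateLength() {
  if (!this.wasAbleToRead) {
    return "Unexpected end of file";
  }
  if (this.value > 1024) {
    return "Length must not exceed 1024";
  }
  return true;
}

// Mock elements, only for trying the function outside Okteta
var okResult = validateLength.call({ wasAbleToRead: true, value: 100 });
var tooLarge = validateLength.call({ wasAbleToRead: true, value: 5000 });
var truncated = validateLength.call({ wasAbleToRead: false, value: 0 });
```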

Property validationError (runtime only): string

This property can be written to inside a validation function. It is useful if you want to validate more than one child without writing a validation function for each of them. Example:

function() {
  //ensure that the magic values hold 0xdeadbeef
  //note: the elements are compared via their value property
  var valid = true;
  if (this.magic[0].value != 0xde) {
    this.magic[0].validationError = "This byte must have the value 0xde";
    valid = false;
  } else if (this.magic[1].value != 0xad) {
    this.magic[1].validationError = "This byte must have the value 0xad";
    valid = false;
  } else if (this.magic[2].value != 0xbe) {
    this.magic[2].validationError = "This byte must have the value 0xbe";
    valid = false;
  } else if (this.magic[3].value != 0xef) {
    this.magic[3].validationError = "This byte must have the value 0xef";
    valid = false;
  }
  return valid;
}

Obviously this example does not make that much sense since it would be easier to simply write

function() {
  //ensure that the magic values hold 0xdeadbeef
  if (this.magic[0].value == 0xde && this.magic[1].value == 0xad && this.magic[2].value == 0xbe && this.magic[3].value == 0xef) {
    return true;
  } else {
    return "Magic bytes must be equal to 0xde, 0xad, 0xbe, 0xef";
  }
}

but there may be cases where this makes sense.

XML structures

An .osd file is an XML file with <data> as the root element. A sample may look as follows:

<?xml version="1.0" encoding="UTF-8"?>
<data>
  <struct name="pngHeader">
    <array name="signature" length="12">
      <primitive name="val" type="char" />
    </array>
    <struct name="ImageHeader">
      <array name="signature" length="4">
        <primitive name="val" type="char" />
      </array>
      <primitive name="width" type="uint32" />
      <primitive name="height" type="uint32" />
      <primitive name="bitDepth" type="uint8" />
      <primitive name="colourType" type="uint8" />
      <primitive name="compressionMethod" type="uint8" />
      <primitive name="filterMethod" type="uint8" />
      <primitive name="interlaceMethod" type="uint8" />
    </struct>
  </struct>
</data>

Type elements

The types are represented in .osd files with the following XML elements:

Properties may be specified either as an XML attribute or as an XML child element (mostly useful for properties which contain longer text and linebreaks, e.g. updateFunc). The following two declarations are equivalent:

<primitive name="foo" type="uint32" />
<primitive>
  <name>foo</name>
  <type>uint32</type>
</primitive>

Obviously properties which require a list or a datatype cannot be used as an XML attribute, but must use XML elements instead.

Special rules

In general the properties in the datatypes section will map one-to-one to XML attributes/elements. The following exceptions exist:

<struct>, <union>, <taggedUnion>

All subelements that are valid type elements are added to the fields property. A <fields> element will be ignored.

<array>

To reduce verbosity the type attribute may be omitted. If not specified the first child element with a valid type will be used instead.

I.e. the following two declarations are equivalent:

<array name="signature" length="4">
  <primitive name="val" type="char" />
</array>

and

<array name="signature" length="4">
  <type>
    <primitive name="val" type="char" />
  </type>
</array>

As of Okteta 0.11 it is possible to write e.g. <array type="uint8"> as a further shorthand. This works for all primitive type strings.

<pointer>

A shorthand for using the type property exists. Instead of needing a child <type> element, it is also possible to write <pointer type="uint32" />. Remember that only unsigned integer types are allowed.

To reduce verbosity the target attribute may be omitted. If no <target> element exists, the first child element with a valid type will be used instead.

<enum>, <flags>

The enumName and enumValues properties do not map directly to XML. Instead there is an XML attribute enum which references an <enumDef> element defined somewhere below the <data> element. This is so that the enum values can be reused by other <enum> elements.

The enumName property maps to the attribute name of the <enumDef>.

The enumValues property maps to the list of <entry> elements in the <enumDef>.

Example:

<data>
  <enumDef name="numbers" type="uint8">
    <entry name="ONE" value="1" />
    <entry name="TWO" value="2" />
    <entry name="THREE" value="3" />
    <entry name="FOUR" value="4" />
  </enumDef>
  <array length="4" name="enums">
    <enum name="number" enum="numbers" type="uint8" />
  </array>
</data>

Note

The type attribute on the <enumDef> element is needed so that it can be verified that all values are within the representable range for that type. This information is redundant, but required due to the way the code was originally written.


<primitive>

As of Okteta 0.11 it is possible to write e.g. <uint8 />, <float /> instead of <primitive type="..." />.

JavaScript structures

A JavaScript structure definition must contain a function called init which returns an object that can be converted to one of the datatypes.

TODO

0.11: list of objects


The sample from the #XML_structures section would look as follows in JavaScript:

function init() {
  var obj = struct({
    signature: array(char(), 12),
    ImageHeader: struct({
      signature: array(char(), 4),
      width: uint32(),
      height: uint32(),
      bitDepth: uint8(),
      colourType: uint8(),
      compressionMethod: uint8(),
      filterMethod: uint8(),
      interlaceMethod: uint8()
    })
  });
  obj.name = "pngHeader";
  return obj;
}

Type functions

The following functions are available to create data types:

struct(fields)

for #Structures

union(fields)

for #Unions

taggedUnion(fields, alternatives, defaultFields)

for #Tagged_unions

The parameters fields and defaultFields are the same as the fields parameter of struct() or union(). The parameter alternatives is a JavaScript list of objects that have the properties fields, selectIf and optionally structName. To simplify creating these objects a function alternative(selectIf, fields, structName) exists (the 3rd parameter is optional).

The example from the #Tagged_unions section would look as follows in JavaScript:

var shapeKinds = { Square: 0, Rectangle: 1, Circle: 2 };
var shapes = taggedUnion(
  {centerx: int32(), centery: int32(), kind: enumeration("ShapeKind", int32(), shapeKinds)},
  [alternative(function() { return this.kind.value == shapeKinds.Square; }, { side: int32() }, "Square"),
   alternative(function() { return this.kind.value == shapeKinds.Rectangle; }, { length: int32(), height: int32() }, "Rectangle"),
   alternative(function() { return this.kind.value == shapeKinds.Circle; }, { radius: int32() }, "Circle")]);

array(type, length)

for #Arrays

string(encoding)

for #Strings

primitive types

To create each of the #Primitive_data_types a function with the same name exists. E.g. uint8(), int32(), float(), char(), etc.

bitfield(type, width)

for #Bitfields

pointer(type, target)

for #Pointers

enumeration(enumName, type, enumValues)

for #Enumerations. Using enum is not possible since this is a JavaScript reserved keyword. Enum values can be any JavaScript object that can be interpreted as a string-number map. For example:

var enumValues = { RED: 1, GREEN: "2", BLUE: "0xffeeddccbbaa9988" };

Note

64 bit values must be represented as a string. If the string starts with "0x" it will be interpreted as a hexadecimal number, otherwise as a decimal number.

flags(enumName, type, enumValues)

for #Bitflags


Setting properties

To set the additional properties that are not set by these functions you can either set them using standard JavaScript property assignment or alternatively there is also a set() function defined on all the returned objects:

  var foo = string("utf8");
  foo.maxByteCount = 12;
  foo.validationFunc = function() { ... };
  //also possible like this:
  var foo2 = string("utf8").set({maxByteCount: 12, validationFunc: function() { ... }});
  //this syntax is mainly useful in inline expressions (the more deeply nested your structure is, the more useful)
  var something = struct({x: uint8(), y: uint8(), name: string("utf8").set({maxByteCount: 12})});
  //without the set function this would look like this:
  var something2 = struct({x: uint8(), y: uint8(), name: string("utf8")});
  something2.fields.name.maxByteCount = 12;

There are also setUpdate(func) and setValidation(func) functions defined for all those objects to save a bit of typing:

  //these 3 objects are equivalent
  var x1 = array(uint8(), 12).setUpdate(function() { ... });
  var x2 = array(uint8(), 12).set({updateFunc: function() { ... }});
  var x3 = array(uint8(), 12);
  x3.updateFunc = function() { ... };
  //the same here
  var y1 = array(uint8(), 12).setValidation(function() { ... });
  var y2 = array(uint8(), 12).set({validationFunc: function() { ... }});
  var y3 = array(uint8(), 12);
  y3.validationFunc = function() { ... };

Update and validation functions

At runtime (during evaluation of updateFunc or validationFunc) accessing properties is slightly different than within the init() function, or the XML definition. This is due to the fact that at runtime you are accessing a JavaScript proxy to a C++ object, and at definition time it is just a plain JavaScript object.

An update function looks as follows:

function(root) {
   //modify this object
}

The argument root is optional; if you do not need it, it can be omitted. It holds the root element of the current structure so that you don't have to write this.parent.parent.... to access it. Within that function this refers to the current element. Examples of what is possible in an updateFunc:

function init() {
  var obj = struct({foo: uint32(), numbers: array(int16(), 10), childCount: int8() });
  //note that field name childCount is also the name of a property of struct
  obj.updateFunc = updateMyStruct;  //does not have to be defined inline, can be anywhere in the file
  return obj;
}
function updateMyStruct(root) {
  var fooValue = this.foo.value; //reads the value of field foo
  var arrayLen = this.numbers.length; //10
  var wrongChildCountValue = this.childCount.value; //ERROR: cannot access childCount like this
  //this.childCount will return the number 3, which has no property named value
  //the correct way to access it is by writing
  var childCountValue = this.child("childCount").value;
  //to access fields whose name matches that of a property you must always use the .child("...") syntax
  //children of an array can be accessed using standard array syntax (starting at 0):
  var firstNumber = this.numbers[0];
  this.name = "updatedStruct";
}

Properties can be written to in the same way as within the init() function. However, an error will be logged every time you try to access a property that does not exist. This is not possible in the init function because there these objects are plain JavaScript objects. Every property read or write access in the update functions is handled by C++ code, which makes the logging of errors possible.
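As a further illustration of the root argument, the sketch below shows an update function for a nested element that takes an array length from a header field at the top of the structure. All field names (header, payloadSize, data) are hypothetical, and the mock objects only imitate the runtime elements so that the function can be run outside of Okteta.

```javascript
// Hypothetical update function for a nested "payload" struct
function updatePayload(root) {
  // root is the top-level element, so no this.parent.parent... chain is needed
  var size = root.header.payloadSize.value;
  this.data.length = size; // resize the child array before it is read
  this.name = (size === 0) ? "emptyPayload" : "payload";
}

// Mock objects standing in for the runtime elements
var mockRoot = { header: { payloadSize: { value: 5 } } };
var mockPayload = { name: "unnamed", data: { length: 0 } };
updatePayload.call(mockPayload, mockRoot);
```

This only works if the header is located before the payload in the structure, since an update function can only read values of elements that have already been read.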

Examples

There are some examples in Git.

TODO

Commented Examples on wiki