Thursday, December 30, 2010

x86 encoding

Dammit, x86. There are FPU escape codes; 1-byte, 2-byte and 3-byte opcodes; and prefixes that select which instruction a given opcode refers to. Sometimes an opcode defines a whole group of instructions, and the specific instruction is determined by the reg/opc field of the ModR/M byte (so these are only instructions with one argument, or those that take an immediate as the second argument). But this isn't the end: since the first argument can refer either to memory or to a register, some instructions sharing the same opcode and reg/opc value are distinguished by that information (the mod field of ModR/M). And the hell of it is that the regular opcode tables contain two pairs of operations that can be separated only by the type (memory or register) of the second argument:

0x0f 0x12  movlps VqMq   or  movhlps VqUq
0x0f 0x16  movhps VqMq   or  movlhps VqUq
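
A minimal sketch of how a decoder might tell these pairs apart, assuming the unprefixed forms of the two opcodes and looking only at the mod field of the ModR/M byte. The function name decode_0f_12_16 and the lookup table are made up for illustration and do not come from any real disassembler.

```python
def decode_0f_12_16(opcode: int, modrm: int) -> str:
    """Pick the mnemonic for the unprefixed two-byte opcodes 0F 12 and 0F 16.

    Hypothetical helper: `opcode` is the byte after 0x0F, `modrm` is the
    ModR/M byte that follows it.
    """
    # The mod field is the top two bits of ModR/M.
    mod = (modrm >> 6) & 0b11
    # mod == 3 means the r/m field names a register; anything else is memory.
    second_operand_is_register = (mod == 0b11)

    # (memory form, register form) for each opcode, as listed in the table above.
    table = {
        0x12: ("movlps", "movhlps"),
        0x16: ("movhps", "movlhps"),
    }
    memory_form, register_form = table[opcode]
    return register_form if second_operand_is_register else memory_form


if __name__ == "__main__":
    print(decode_0f_12_16(0x12, 0xC1))  # register operand
    print(decode_0f_12_16(0x12, 0x01))  # memory operand
```

For example, the bytes 0f 12 c1 have mod = 3 and so select movhlps, while 0f 12 01 have mod = 0 and select movlps.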

arm carry flag vs x86 carry flag

For the add operation, arm_cf and x86_cf are the same. For sub they differ. This is because on ARM sub rd, rn, rm is equivalent to add rd, rn, -rm. The carry flag is set to 1 after an add if the sum of the operands is greater than or equal to 2^32. So on ARM the carry flag after sub is set if rn + (2^32 - rm) >= 2^32, or equivalently rn >= rm. But on x86 the carry flag after sub is set if rn < rm, which is exactly the opposite.
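
The difference can be checked with a small model of both flags for 32-bit operands. This is only a sketch: the function names are invented here, and the ARM side is modeled as rn + NOT(rm) + 1, which is the usual two's complement way of writing rn + (-rm).

```python
MASK32 = 0xFFFFFFFF


def arm_sub_carry(rn: int, rm: int) -> int:
    """Carry flag after an ARM flag-setting subtract, modeled as rn + NOT(rm) + 1."""
    total = (rn & MASK32) + (~rm & MASK32) + 1
    # The carry is bit 32 of the unsigned sum, i.e. the carry out of bit 31.
    return (total >> 32) & 1


def x86_sub_carry(a: int, b: int) -> int:
    """Carry flag after x86 sub a, b: set when an unsigned borrow is needed."""
    return 1 if (a & MASK32) < (b & MASK32) else 0


if __name__ == "__main__":
    for rn, rm in [(5, 3), (3, 5), (7, 7), (0, 0), (0, MASK32)]:
        print(rn, rm, arm_sub_carry(rn, rm), x86_sub_carry(rn, rm))
```

For every pair of operands the two functions return opposite values: the ARM flag means "no borrow", the x86 flag means "borrow".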

intel x86 far ret

If the stack size after the return is 16 bits, then esp[31:16] (the upper half of esp) is not touched.
If the stack size after the return is 32 bits and the operand size before the return is 16 bits, then the 16-bit sp that was read is zero-extended to the 32-bit esp.
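
The two rules can be written down as a tiny model. This is a sketch of the notes above only, not of the full far ret semantics: the function name and parameters are hypothetical, and the last branch (32-bit stack with 32-bit operand size) is an assumption that is not covered by the notes.

```python
def esp_after_far_ret(old_esp: int, popped: int,
                      new_stack_bits: int, operand_bits: int) -> int:
    """Model how esp is updated from the stack pointer value read by far ret.

    old_esp        -- esp before the new value is loaded
    popped         -- stack pointer value read from the stack
    new_stack_bits -- stack size after the return (16 or 32)
    operand_bits   -- operand size before the return (16 or 32)
    """
    if new_stack_bits == 16:
        # 16-bit stack: only sp is written, esp[31:16] keeps its old value.
        return (old_esp & 0xFFFF0000) | (popped & 0xFFFF)
    if operand_bits == 16:
        # 32-bit stack, 16-bit operand size: the 16-bit sp is zero-extended.
        return popped & 0xFFFF
    # Assumption, not from the notes: a full 32-bit value is loaded as-is.
    return popped & 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(esp_after_far_ret(0x12345678, 0xABCD, 16, 16)))
    print(hex(esp_after_far_ret(0x12345678, 0xABCD, 32, 16)))
```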

Saturday, December 25, 2010

Understanding CRC

There are several documents on the internet describing CRC32 and related algorithms. A Painless Guide to CRC Error Detection Algorithms by Ross Williams explains CRC the way an engineer would, explicitly describing what happens to the bits. It also covers the reasons for the great variety of CRC types. A mathematician would probably be happier reading the article The iSCSI CRC32C Digest and the Simultaneous Multiply and Divide Algorithm by Luben Tuikov and Vicente Cavanna, where CRC is presented rigorously as polynomial division over the field F_2. The second article can lead to an idea of how to split a CRC calculation across different threads. The same thoughts appear in the article Fast Parallel CRC Algorithm and Implementation on a Configurable Processor by H. Michael Ji and Earl Killian.
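
To make the "division over F_2" view concrete, here is a minimal bit-at-a-time sketch. It assumes the common CRC-32 variant used by zlib and Ethernet (reflected polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF), not the CRC32C variant from the iSCSI article, and it is not taken from any of the papers above. Each step of the inner loop is one step of long division where subtraction is XOR.

```python
import zlib

# CRC-32 generator polynomial in bit-reflected form (assumed variant: zlib/Ethernet).
POLY_REFLECTED = 0xEDB88320


def crc32_bitwise(data: bytes) -> int:
    """Compute CRC-32 one bit at a time, as polynomial long division over F_2."""
    crc = 0xFFFFFFFF  # initial register value
    for byte in data:
        crc ^= byte  # bring the next 8 message bits into the remainder
        for _ in range(8):
            if crc & 1:
                # The leading term is set: "subtract" (XOR) the generator.
                crc = (crc >> 1) ^ POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # final XOR


if __name__ == "__main__":
    message = b"123456789"
    print(hex(crc32_bitwise(message)), hex(zlib.crc32(message)))
```

The result can be compared against zlib.crc32 from the Python standard library, which implements the same variant with lookup tables.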