OSU CSE 2421
x86-64 Assembly Language: Special Arithmetic Operations (Multiplication)
J. E. Jones
Recall that when multiplying any 2 numbers represented with w bits, we must store the result in 2w bits to avoid overflow.
x86-64 has a way for us to accomplish this even when using 8-byte registers.
There are 2 different instructions that perform multiplication: imulX for signed interpretations and mulX for unsigned interpretations.
Signed – 3 alternatives
◦ imulX Src             # Result in 2w bits relative to suffix
◦ imulX Src, Dest       # Result in w bits equivalent to suffix
◦ imulX Imm, Src, Dest  # Result in w bits equivalent to suffix
Unsigned – only one choice
◦ mulX Src #Result in 2w bits relative to suffix
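where X can be q, l, w, or b (the b suffix is available only for the one-operand forms; the two- and three-operand imulX forms have no 1-byte variant)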
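To make the difference between a w-bit and a 2w-bit result concrete, here is a minimal C sketch for w = 32. It is not from the slides, and the function names mul_low32 and mul_full32 are hypothetical.

```c
#include <stdint.h>

/* w-bit result: keeps only the low 32 bits, as the two- and
   three-operand imul forms do for the l suffix. */
uint32_t mul_low32(uint32_t a, uint32_t b) {
    return a * b;             /* wraps modulo 2^32 */
}

/* 2w-bit result: widen one operand first so the full 64-bit
   product is kept, as the one-operand mul form does. */
uint64_t mul_full32(uint32_t a, uint32_t b) {
    return (uint64_t)a * b;
}
```

With a = b = 0x10000, mul_low32 returns 0 while mul_full32 returns 0x100000000: the w-bit form silently drops the overflow.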
<64-bit value> * <64-bit value> = <64-bit value> or <128-bit value>, depending upon multiply instruction syntax
Let’s look at 128-bit result first
x86-64 naming conventions:
◦ Byte – 1 byte
◦ Word – 2 bytes
◦ Double Word – 4 bytes
◦ Quad Word – 8 bytes
◦ Oct Word – 16 bytes
The ALU uses two 8-byte registers to hold a 16-byte result:
    %rdx – high 8 bytes
    %rax – lower 8 bytes
This ONLY works with the %rdx/%rax register pair. No other registers can be paired in this manner.
One of the 8-byte values (multiplier or multiplicand) *MUST* be in %rax for this to work.
Either the mulq (unsigned) or the imulq (signed) instruction can be used.
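Example using unsigned multiply (C99 or above; the 128-bit type is a GCC/Clang extension, and uint64_t comes from <stdint.h>):
typedef unsigned __int128 uint128_t;
void store_uprod(uint128_t *dest, uint64_t x, uint64_t y) {
    *dest = x * (uint128_t) y;
}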
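The signed counterpart differs only in the types. The following is a minimal sketch, assuming GCC or Clang on x86-64: the __int128 types are compiler extensions, and store_sprod and high64 are hypothetical names that do not appear in the slides. Compilers typically emit the one-operand mulq or imulq for these functions, but the exact instructions generated are not guaranteed.

```c
#include <stdint.h>

typedef unsigned __int128 uint128_t;  /* compiler extension, not standard C */
typedef __int128 int128_t;            /* compiler extension, not standard C */

/* Unsigned 64 x 64 -> 128 */
void store_uprod(uint128_t *dest, uint64_t x, uint64_t y) {
    *dest = x * (uint128_t)y;
}

/* Signed 64 x 64 -> 128 */
void store_sprod(int128_t *dest, int64_t x, int64_t y) {
    *dest = x * (int128_t)y;
}

/* Upper 8 bytes of a 128-bit value (the half that lands in %rdx). */
uint64_t high64(uint128_t v) {
    return (uint64_t)(v >> 64);
}
```

The bit pattern 0xFFFFFFFFFFFFFFFF times 2 has a high half of 1 when treated as unsigned and a high half of all ones when treated as signed (-1 * 2 = -2), which is why mulq and imulq are separate instructions for the 2w-bit result.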
void store_uprod(uint128_t *dest, uint64_t x, uint64_t y)
# dest in %rdi, x in %rsi, y in %rdx
store_uprod:
    movq %rsi, %rax    # Copy x to multiplicand

Registers after movq:
    %rdx = y    (3rd parameter / multiplier)
    %rax = x    (multiplicand)
Memory: the 8 bytes at the address in %rdi and the 8 bytes at %rdi+8 have not been written yet.
void store_uprod(uint128_t *dest, uint64_t x, uint64_t y)
# dest in %rdi, x in %rsi, y in %rdx
store_uprod:
    movq %rsi, %rax    # Copy x to multiplicand
    mulq %rdx          # Multiply by y (y can be in any other 8-byte reg)
                       # Note mulq has only one operand

Registers after mulq:
    %rdx = product high 8 bytes of x*y
    %rax = product lower 8 bytes of x*y
Memory: the 8 bytes at the address in %rdi and the 8 bytes at %rdi+8 have still not been written.
void store_uprod(uint128_t *dest, uint64_t x, uint64_t y)
# dest in %rdi, x in %rsi, y in %rdx
store_uprod:
    movq %rsi, %rax       # Copy x to multiplicand
    mulq %rdx             # Multiply by y (y can be in any other 8-byte reg)
                          # Note mulq has only one operand
                          # %rdx = product high 8 bytes, %rax = product lower 8 bytes
    movq %rax, (%rdi)     # Store lower 8 bytes at dest
    movq %rdx, 8(%rdi)    # Store upper 8 bytes at dest+8
    ret
# Little Endian means lower order bytes
# stored at lower address
Memory after movq %rax, (%rdi):
    Address in %rdi      %rax (LSB)
    . . .
    Address in %rdi+7    %rax (MSB)
    Address in %rdi+8    (not yet written)
Memory after movq %rdx, 8(%rdi):
    Address in %rdi      %rax (LSB)
    . . .
    Address in %rdi+7    %rax (MSB)
    Address in %rdi+8    %rdx (LSB)
    . . .
    Address in %rdi+15   %rdx (MSB)
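The instruction sequence can be mirrored step by step in C. This is only an illustrative model, assuming GCC or Clang for the unsigned __int128 extension; the function name store_uprod_halves and the two-element array standing in for the 16 bytes at dest are hypothetical.

```c
#include <stdint.h>

/* dest[0] plays the role of the 8 bytes at (%rdi),
   dest[1] the 8 bytes at 8(%rdi). */
void store_uprod_halves(uint64_t dest[2], uint64_t x, uint64_t y) {
    unsigned __int128 product = (unsigned __int128)x * y;  /* mulq: %rdx:%rax */
    uint64_t rax = (uint64_t)product;          /* product lower 8 bytes */
    uint64_t rdx = (uint64_t)(product >> 64);  /* product high 8 bytes  */
    dest[0] = rax;   /* movq %rax, (%rdi)  */
    dest[1] = rdx;   /* movq %rdx, 8(%rdi) */
}
```

Storing the lower half at the lower address is what makes the 16 bytes read back as one little-endian 128-bit value.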
void store_uprod(uint128_t *dest, uint64_t x, uint64_t y)
# dest in %rdi, x = 0x6261600000000000 in %rsi, y = 0x10 in %rdx
store_uprod:
    movq %rsi, %rax    # Copy x to multiplicand

Registers after movq:
    %rdx = 0x10 (16 decimal)       (3rd parameter / multiplier)
    %rax = 0x6261600000000000      (multiplicand)
Memory: the 8 bytes at the address in %rdi and the 8 bytes at %rdi+8 have not been written yet.
void store_uprod(uint128_t *dest, uint64_t x, uint64_t y)
# dest in %rdi, x = 0x6261600000000000 in %rsi, y = 0x10 in %rdx
store_uprod:
    movq %rsi, %rax       # Copy x to multiplicand
                          #   %rdx = 0x10 (16 decimal), %rax = 0x6261600000000000
    mulq %rdx             # Multiply by y (y can be in any other 8-byte reg)
                          # Note mulq has only one operand
                          #   %rdx = 0x0000000000000006 (product high 8 bytes)
                          #   %rax = 0x2616000000000000 (product lower 8 bytes)
    movq %rax, (%rdi)     # Store lower 8 bytes at dest
    movq %rdx, 8(%rdi)    # Store upper 8 bytes at dest+8
    ret
# Little Endian means lower order bytes
# stored at lower address
NOTE: multiplying by 0x10 is equivalent to shifting the full 16-byte value left by 4 bits; a lone shlq $4, %rax would keep only the lower 8 bytes and lose the 0x6.
NOTE: Stack frame construction is omitted from this program.
Memory after movq %rax, (%rdi), with %rax = 0x2616000000000000:
    Address      Byte
    %rdi+0       0x00
    %rdi+1       0x00
    %rdi+2       0x00
    %rdi+3       0x00
    %rdi+4       0x00
    %rdi+5       0x00
    %rdi+6       0x16
    %rdi+7       0x26
    %rdi+8 through %rdi+15: not yet written
Memory after movq %rdx, 8(%rdi), with %rdx = 0x0000000000000006 and %rax = 0x2616000000000000:
    Address      Byte
    %rdi+0       0x00
    %rdi+1       0x00
    %rdi+2       0x00
    %rdi+3       0x00
    %rdi+4       0x00
    %rdi+5       0x00
    %rdi+6       0x16
    %rdi+7       0x26
    %rdi+8       0x06
    %rdi+9       0x00
    %rdi+10      0x00
    %rdi+11      0x00
    %rdi+12      0x00
    %rdi+13      0x00
    %rdi+14      0x00
    %rdi+15      0x00
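The byte layout can be checked from C by copying the 16-byte product into a byte array. This is a minimal sketch, assuming a little-endian x86-64 machine and GCC or Clang for unsigned __int128; the function name product_bytes is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Copy the 16 bytes of x*y into out[] in the order they sit in memory:
   out[0] is the byte at the lowest address. */
void product_bytes(unsigned char out[16], uint64_t x, uint64_t y) {
    unsigned __int128 product = (unsigned __int128)x * y;
    memcpy(out, &product, sizeof product);
}
```

Calling product_bytes with x = 0x6261600000000000 and y = 0x10 on such a machine gives out[6] = 0x16, out[7] = 0x26, out[8] = 0x06, and 0x00 in every other position, matching the layout of %rax at dest and %rdx at dest+8.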
This works not only for two 64-bit operands with a 128-bit product (both signed and unsigned), but also when:
1. We have two 32-bit operands and want a 64-bit product (both signed and unsigned)
2. We have two 16-bit operands and want a 32-bit product (both signed and unsigned)
3. We have two 8-bit operands and want a 16-bit product (both signed and unsigned)
When using mulX or imulX instructions in this manner, the instruction takes only 1 operand, and the other operand must be in the matching portion of the %rax register (%rax, %eax, %ax, or %al).
The more significant half of the product will be in the matching portion of %rdx and the less significant half in the matching portion of %rax… usually. The exception is the 1-byte form (mulb/imulb), which leaves the whole 16-bit product in %ax and does not touch %rdx. Always TEST it before you submit code to the system test group.
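The narrower cases can be written in C as widening multiplies. This is a hedged sketch rather than anything from the slides: the function names are hypothetical, and a compiler is free to implement them with instructions other than the one-operand mulX/imulX forms.

```c
#include <stdint.h>

/* 16 x 16 -> 32, unsigned (as mulw): cast before multiplying, otherwise
   both operands are promoted to int and 0xFFFF * 0xFFFF overflows it. */
uint32_t umul16_wide(uint16_t a, uint16_t b) {
    return (uint32_t)a * b;
}

/* 16 x 16 -> 32, signed (as imulw) */
int32_t smul16_wide(int16_t a, int16_t b) {
    return (int32_t)a * b;
}

/* 8 x 8 -> 16, unsigned (as mulb): at most 255 * 255, which fits in int */
uint16_t umul8_wide(uint8_t a, uint8_t b) {
    return (uint16_t)(a * b);
}

/* 8 x 8 -> 16, signed (as imulb): range -16256..16384 fits in 16 bits */
int16_t smul8_wide(int8_t a, int8_t b) {
    return (int16_t)(a * b);
}
```

The byte 0xFF multiplied by 2 gives 0x01FE when unsigned (255 * 2) but -2, that is 0xFFFE, when signed (-1 * 2), so the signed and unsigned instructions produce different upper halves.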
# multiplier in %ecx, multiplicand in %eax
# if %ecx = 0x00001000 and %eax = 0x75747372

Before:  %rdx = some value            %rax = 0x0000000075747372
    mull %ecx    # OR imull %ecx      Note: 1 operand!
After:   %rdx = 0x0000000000000757    %rax = 0x0000000047372000
# 8-byte result is 0x00000757 47372000

Note that the value in %rdx is changed!
Recall from Lab 5 that when a 4-byte value is written to a 4-byte register, the upper 4 bytes are set to zero.
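The register effects of the 4-byte multiply can be modeled in C. This is an illustrative sketch only: struct regs and model_mull are hypothetical names, and the model covers just the two registers involved.

```c
#include <stdint.h>

/* Hypothetical model of the two registers involved. */
struct regs {
    uint64_t rax;
    uint64_t rdx;
};

/* Model of `mull src`: multiply %eax by a 4-byte source and write the
   8-byte product to %edx:%eax. Writing a 4-byte register zeroes the
   upper 4 bytes of the full 8-byte register. */
void model_mull(struct regs *r, uint32_t src) {
    uint64_t product = (uint64_t)(uint32_t)r->rax * src;
    r->rax = (uint32_t)product;          /* low 4 bytes, upper 4 zeroed  */
    r->rdx = (uint32_t)(product >> 32);  /* high 4 bytes, upper 4 zeroed */
}
```

Starting from %rax = 0x0000000075747372 and any value in %rdx, model_mull with src = 0x00001000 leaves rax = 0x0000000047372000 and rdx = 0x0000000000000757, the same values shown above.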
%eax: 0x47372000
%edx: 0x00000757
want 8-byte result: 0x0000075747372000

    shlq $32, %rdx    # %rdx = 0x0000075700000000, %rax = 0x0000000047372000
    orq %rdx, %rax    # or: addq %rdx, %rax

      0x0000075700000000
    | 0x0000000047372000
    = 0x0000075747372000
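Both ways of combining the halves can be sketched in C. The function names are hypothetical, and the sketch assumes the upper 4 bytes of the %rax value are already zero, which holds after a 4-byte multiply.

```c
#include <stdint.h>

/* shlq $32, %rdx ; orq %rdx, %rax */
uint64_t combine32_or(uint64_t rdx, uint64_t rax) {
    rdx <<= 32;
    return rax | rdx;
}

/* shlq $32, %rdx ; addq %rdx, %rax */
uint64_t combine32_add(uint64_t rdx, uint64_t rax) {
    rdx <<= 32;
    return rax + rdx;
}
```

After the shift the two values have no set bits in common, so OR and ADD give the same result: both return 0x0000075747372000 for rdx = 0x757 and rax = 0x47372000.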
If we have two 2-byte values that we multiply together, using mulw %cx, giving us a 4-byte result where the least significant 2 bytes are in %ax and the most significant 2 bytes in %dx, then:
◦ Garbage can be in the 6 upper bytes in both the %rax and %rdx registers
◦ We must combine the 2-byte values in %ax and %dx so that we have a 4-byte value we can use
# multiplier in %cx, multiplicand in %ax
# if %cx = 0x0100 and %ax = 0x7574

Before:  %rdx = some value            %rax = 0x????????????7574
    mulw %cx    # OR imulw %cx        Note: 1 operand!
After:   %rdx = 0x????????????0075    %rax = 0x????????????7400
# 4-byte result is 0x0075 7400

Note that only the low 2 bytes of each register (%dx and %ax) are changed!
Recall from Lab 5 that when a 2-byte value is written to a 2-byte register, the upper 6 bytes do not change.
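The way a 2-byte multiply leaves the upper 6 bytes alone can be modeled in C. As before, this is only a sketch: struct regs and model_mulw are hypothetical names, and the starting register values in the usage note are arbitrary.

```c
#include <stdint.h>

/* Hypothetical model of the two registers involved. */
struct regs {
    uint64_t rax;
    uint64_t rdx;
};

/* Model of `mulw src`: multiply %ax by a 2-byte source and write the
   4-byte product to %dx:%ax. Only the low 2 bytes of each register are
   replaced; the upper 6 bytes keep whatever they held before. */
void model_mulw(struct regs *r, uint16_t src) {
    uint32_t product = (uint32_t)(uint16_t)r->rax * src;
    r->rax = (r->rax & ~0xFFFFULL) | (product & 0xFFFFu);
    r->rdx = (r->rdx & ~0xFFFFULL) | (product >> 16);
}
```

Starting from rax = 0x1111222233337574 and rdx = 0xAAAABBBBCCCCDDDD, model_mulw with src = 0x0100 leaves rax = 0x1111222233337400 and rdx = 0xAAAABBBBCCCC0075: the product halves 0x7400 and 0x0075 sit below untouched garbage.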
%ax: 0x1234, assume %eax = 0x784f1234
%dx: 0x0056, assume %edx = 0xab330056
want 4-byte result: 0x00561234

    shll $16, %edx      # %edx = 0x00560000
    movzwl %ax, %eax    # %eax = 0x00001234; or: andl $0x0000ffff, %eax
    orl %edx, %eax      # or: addl %edx, %eax

      0x00560000
    | 0x00001234
    = 0x00561234
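The same three steps in C, as a minimal sketch (the function name combine16 is hypothetical):

```c
#include <stdint.h>

/* Combine the 2-byte halves of a product into one 4-byte value,
   discarding whatever garbage sits above %dx and %ax. */
uint32_t combine16(uint32_t edx, uint32_t eax) {
    edx <<= 16;           /* shll $16, %edx: garbage shifts out the top */
    eax &= 0x0000FFFFu;   /* andl $0x0000ffff, %eax (or movzwl %ax, %eax) */
    return eax | edx;     /* orl %edx, %eax */
}
```

With edx = 0xab330056 and eax = 0x784f1234 this returns 0x00561234. Skipping the masking step would instead give 0x785f1234, because the garbage in the upper 2 bytes of %eax would be ORed into the result.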
When in doubt, write a test program.
The 15 minutes it takes to write it and evaluate the results can save you days of debugging.