Can I move a 256-bit memory location into the ymm registers? If I want to fill an xmm register, I use this in inline asm in gcc:
"movlpd mytest_1(%rip),%xmm1 \n\t"
"movhpd mytest_1+8(%rip),%xmm1 \n\t"
I guess this can be made easier?
Furthermore: is there a similar procedure to move 4 quadwords, aligned or not, into ymm0 in one step? I mean the reverse of vmovdqa ymm1, mem256 with source -> destination.
"movlpd mytest_1(%rip),%xmm1 \n\t"
"movhpd mytest_1+8(%rip),%xmm1 \n\t"
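These 2 instructions can be combined into 1 movdqu/movdqa, because x86 is a little-endian architecture: the quadword at mytest_1 ends up in the low half of the register and the one at mytest_1+8 in the high half.
"movdqu mytest_1(%rip),%xmm1 \n\t" // 16-byte unaligned
or
"movdqa mytest_1(%rip),%xmm1 \n\t" // 16-byte aligned 'mytest_1'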
Both can also be used for AVX 32-byte memory transfers (vmovdqu/vmovdqa):
"vmovdqu mytest_1(%rip),%ymm1 \n\t" // 32-byte unaligned
or
"vmovdqa mytest_1(%rip),%ymm1 \n\t" // 32-byte aligned 'mytest_1'
Regarding the second part of the question:
"I mean the reverse of vmovdqa ymm1, mem256 with source -> destination."
This works in both directions, e.g. the possible instructions for vmovdqa are (Intel syntax, destination operand first):
vmovdqa ymm1, ymm2/m256   RM   V/V   AVX   Move aligned packed integer values from ymm2/mem to ymm1.
vmovdqa ymm2/m256, ymm1   MR   V/V   AVX   Move aligned packed integer values from ymm1 to ymm2/mem.
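To make the points above concrete, here is a minimal sketch in C with GCC extended asm (AT&T syntax, so the source operand comes first). It is only an illustration: the helper names load_two_halves, load_one_step and copy256_aligned are made up for this example, and it uses "m" operand constraints instead of the mytest_1(%rip) symbol so that it does not depend on how the data is declared. The AVX helper falls back to memcpy when the CPU reports no AVX support.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load 16 bytes as two 8-byte halves
   (movlpd + movhpd), then store the register to out. */
void load_two_halves(const uint64_t src[2], uint64_t out[2])
{
    __asm__ volatile(
        "movlpd %1, %%xmm1 \n\t"   /* low quadword  <- src[0] */
        "movhpd %2, %%xmm1 \n\t"   /* high quadword <- src[1] */
        "movdqu %%xmm1, %0 \n\t"   /* store all 16 bytes      */
        : "=m"(*(uint64_t (*)[2])out)
        : "m"(src[0]), "m"(src[1])
        : "xmm1");
}

/* Hypothetical helper: the same load done with a single movdqu. */
void load_one_step(const uint64_t src[2], uint64_t out[2])
{
    __asm__ volatile(
        "movdqu %1, %%xmm1 \n\t"   /* 16-byte unaligned load  */
        "movdqu %%xmm1, %0 \n\t"   /* 16-byte unaligned store */
        : "=m"(*(uint64_t (*)[2])out)
        : "m"(*(const uint64_t (*)[2])src)
        : "xmm1");
}

/* Hypothetical helper: move 4 quadwords through ymm1 with vmovdqa,
   once as a load (mem -> ymm1) and once as a store (ymm1 -> mem).
   Both pointers must be 32-byte aligned, otherwise vmovdqa faults. */
void copy256_aligned(const uint64_t src[4], uint64_t dst[4])
{
    assert((uintptr_t)src % 32 == 0 && (uintptr_t)dst % 32 == 0);

    if (!__builtin_cpu_supports("avx")) {
        memcpy(dst, src, 32);      /* fallback for CPUs without AVX */
        return;
    }
    __asm__ volatile(
        "vmovdqa %1, %%ymm1 \n\t"  /* load:  mem256 -> ymm1 */
        "vmovdqa %%ymm1, %0 \n\t"  /* store: ymm1 -> mem256 */
        : "=m"(*(uint64_t (*)[4])dst)
        : "m"(*(const uint64_t (*)[4])src)
        : "xmm1");                 /* xmm1 is the low half of ymm1 */
}
```

If the buffers are not guaranteed to be 32-byte aligned, replacing vmovdqa with vmovdqu in copy256_aligned (and dropping the alignment assert) should behave the same apart from the alignment requirement.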