x86 instruction listings
From Wikipedia, the free encyclopedia
The x86 instruction set has undergone numerous changes over time. Most of them were to add new functionality to the instruction set.
[edit] x86 integer instructions
This is the full 8086/8088 instruction set, but most, if not all of these instructions are available in 32-bit mode, they just operate on 32-bit registers (eax, ebx, etc) and values instead of their 16-bit (ax, bx, etc) counterparts. See also x86 assembly language for a quick tutorial for this chip.
[edit] Original 8086/8088 instructions
| Instruction | Meaning | Notes | 
|---|---|---|
| AAA | ASCII adjust AL after addition | used with unpacked binary coded decimal | 
| AAD | ASCII adjust AX before division | buggy in the original instruction set, but "fixed" in the NEC V20, causing a number of incompatibilities | 
| AAM | ASCII adjust AX after multiplication | |
| AAS | ASCII adjust AL after subtraction | |
| ADC | Add with carry | |
| ADD | Add | |
| AND | Logical AND | |
| CALL | Call procedure | |
| CBW | Convert byte to word | |
| CLC | Clear carry flag | |
| CLD | Clear direction flag | |
| CLI | Clear interrupt flag | |
| CMC | Complement carry flag | |
| CMP | Compare operands | |
| CMPSB | Compare bytes in memory | |
| CMPSW | Compare words | |
| CWD | Convert word to doubleword | |
| DAA | Decimal adjust AL after addition | (used with packed binary coded decimal) | 
| DAS | Decimal adjust AL after subtraction | |
| DEC | Decrement by 1 | |
| DIV | Unsigned divide | |
| ESC | Used with floating-point unit | |
| HLT | Enter halt state | |
| IDIV | Signed divide | |
| IMUL | Signed multiply | |
| IN | Input from port | |
| INC | Increment by 1 | |
| INT | Call to interrupt | |
| INTO | Call to interrupt if overflow | |
| IRET | Return from interrupt | |
| Jxx | Jump if condition | (JA, JAE, JB, JBE, JC, JCXZ, JE, JG, JGE, JL, JLE, JNA, JNAE, JNB, JNBE, JNC, JNE, JNG, JNGE, JNL, JNLE, JNO, JNP, JNS, JNZ, JO, JP, JPE, JPO, JS, JZ) | 
| JMP | Jump | |
| LAHF | Load flags into AH register | |
| LDS | Load pointer using DS | |
| LEA | Load Effective Address | |
| LES | Load ES with pointer | |
| LOCK | Assert BUS LOCK# signal | (for multiprocessing) | 
| LODSB | Load byte | |
| LODSW | Load word | |
| LOOP/LOOPx | Loop control | (LOOPE, LOOPNE, LOOPNZ, LOOPZ) | 
| MOV | Move | |
| MOVSB | Move byte from string to string | |
| MOVSW | Move word from string to string | |
| MUL | Unsigned multiply | |
| NEG | Two's complement negation | |
| NOP | No operation | opcode (0x90) equivalent to XCHG EAX, EAX | 
| NOT | Negate the operand, logical NOT | |
| OR | Logical OR | |
| OUT | Output to port | |
| POP | Pop data from stack | (Only works with register CS on 8086/8088.) | 
| POPF | Pop data into flags register | |
| PUSH | Push data onto stack | |
| PUSHF | Push flags onto stack | |
| RCL | Rotate left (with carry) | |
| RCR | Rotate right (with carry) | |
| REPxx | Repeat CMPS/MOVS/SCAS/STOS | (REP, REPE, REPNE, REPNZ, REPZ) | 
| RET | Return from procedure | |
| RETN | Return from near procedure | |
| RETF | Return from far procedure | |
| ROL | Rotate left | |
| ROR | Rotate right | |
| SAHF | Store AH into flags | |
| SAL | Shift Arithmetically left (signed shift left) | |
| SAR | Shift Arithmetically right (signed shift right) | |
| SBB | Subtraction with borrow | |
| SCASB | Compare byte string | |
| SCASW | Compare word string | |
| SHL | Shift left (unsigned shift left) | |
| SHR | Shift right (unsigned shift right) | |
| STC | Set carry flag | |
| STD | Set direction flag | |
| STI | Set interrupt flag | |
| STOSB | Store byte in string | |
| STOSW | Store word in string | |
| SUB | Subtraction | |
| TEST | Logical compare (AND) | |
| WAIT | Wait until not busy | Waits until BUSY# pin is inactive (used with floating-point unit) | 
| XCHG | Exchange data | |
| XLAT | Table look-up translation | |
| XOR | Exclusive OR | 
[edit] Added in specific processors
[edit] Added with 80186/80188
| Instruction | Meaning | Notes | 
|---|---|---|
| BOUND | Check array index against bounds | raises software interrupt 5 if test fails | 
| ENTER | Enter stack frame | equivalent to PUSH BP MOV BP, SP | 
| INS | Input from port to string | equivalent to IN (E)AX, DX MOV ES:[(E)DI], (E)AX | 
| LEAVE | Leave stack frame | equivalent to MOV SP, BP POP BP | 
| OUTS | Output string to port | equivalent to MOV (E)AX, DS:[(E)SI] OUT DX, (E)AX | 
| POPA | Pop all general purpose registers from stack | equivalent to POP DI, SI, BP, SP, BX, DX, CX, AX | 
| PUSHA | Push all general purpose registers onto stack | equivalent to PUSH DI, SI, BP, SP, BX, DX, CX, AX | 
[edit] Added with 80286
| Instruction | Meaning | Notes | 
|---|---|---|
| ARPL | Adjust RPL field of selector | |
| CLTS | Clear task-switched flag in register CR0 | |
| LAR | Load access rights byte | |
| LGDT | Load global descriptor table | |
| LIDT | Load interrupt descriptor table | |
| LLDT | Load local descriptor table | |
| LMSW | Load machine status word | |
| LOADALL | Load all CPU registers, including internal ones such as GDT | Undocumented, (80)286 and 386 only | 
| LSL | Load segment limit | |
| LTR | Load task register | |
| SGDT | Store global descriptor table | |
| SIDT | Store interrupt descriptor table | |
| SLDT | Store local descriptor table | |
| SMSW | Store machine status word | |
| STR | Store task register | |
| VERR | Verify a segment for reading | |
| VERW | Verify a segment for writing | 
[edit] Added with 80386
| Instruction | Meaning | Notes | 
|---|---|---|
| BSF | Bit scan forward | |
| BSR | Bit scan reverse | |
| BT | Bit test | |
| BTC | Bit test and complement | |
| BTR | Bit test and reset | |
| BTS | Bit test and set | |
| CDQ | Convert double-word to quad-word | Sign-extends EAX into EDX, forming the quad-word EDX:EAX. Since (I)DIV uses EDX:EAX as its input, CDQ must be called after setting EAX if EDX is not manually initialized (as in 64/32 division) before (I)DIV. | 
| CMPSD | Compare string double-word | Compares ES:[(E)DI] with DS:[SI] | 
| CWDE | Convert word to double-word | Unlike CWD, CWDE sign-extends AX to EAX instead of AX to DX:AX | 
| INSB, INSW, INSD | Input from port to string with explicit size | same as INS | 
| IRETx | Interrupt return; D suffix means 32-bit return, F suffix means do not generate epilogue code (i.e. LEAVE instruction) | Use IRETD rather than IRET in 32-bit situations | 
| JCXZ, JECXZ | Jump if register (E)CX is zero | |
| LFS, LGS | Load far pointer | |
| LSS | Load stack segment | |
| LODSW, LODSD | Load string | can be prefixed with REP | 
| LOOPW, LOOPD | Loop | Loop; counter register is (E)CX | 
| LOOPEW, LOOPED | Loop while equal | |
| LOOPZW, LOOPZD | Loop while zero | |
| LOOPNEW, LOOPNED | Loop while not equal | |
| LOOPNZW, LOOPNZD | Loop while not zero | |
| MOVSW, MOVSD | Move data from string to string | |
| MOVSX | Move with sign-extend | |
| MOVZX | Move with zero-extend | |
| POPAD | Pop all double-word (32-bit) registers from stack | Does not pop register ESP off of stack | 
| POPFD | Pop data into EFLAGS register | |
| PUSHAD | Push all double-word (32-bit registers) onto stack | |
| PUSHFD | Push EFLAGS register onto stack | |
| SCASD | Scan string data double-word | |
| SETA, SETAE, SETB, SETBE, SETC, SETE, SETG, SETGE, SETL, SETLE, SETNA, SETNAE, SETNB, SETNBE, SETNC, SETNE, SETNG, SETNGE, SETNL, SETNLE, SETNO, SETNP, SETNS, SETNZ, SETO, SETP, SETPE, SETPO, SETS, SETZ | Set byte to one on condition | |
| SHLD | Shift left double-word | |
| SHRD | Shift right double-word | |
| STOSx | Store string | 
[edit] Added with 80486
| Instruction | Meaning | Notes | 
|---|---|---|
| BSWAP | Byte Swap | Only works for 32 bit registers. | 
| CMPXCHG | CoMPare and eXCHanGe | |
| INVD | Invalidate Internal Caches | |
| INVLPG | Invalidate TLB Entry | |
| WBINVD | Write Back and Invalidate Cache | |
| XADD | Exchange and Add | 
[edit] Added with Pentium
| Instruction | Meaning | Notes | 
|---|---|---|
| CPUID | CPU IDentification | See note below | 
| CMPXCHG8B | CoMPare and eXCHanGe 8 bytes | |
| RDMSR | ReaD from Model-Specific Register | |
| RDTSC | ReaD Time Stamp Counter | |
| WRMSR | WRite to Model-Specific Register | |
| RSM | Resume operation of interrupted program | SMM [System Management Mode] | 
The CPUID instruction was fully introduced with the Pentium processor. It was also added to later 80486 processors.
[edit] Added with Pentium MMX
| Instruction | Meaning | Notes | 
|---|---|---|
| RDPMC | Read the PMC [Performance Monitoring Counter] | Specified in the ECX register into registers EDX:EAX | 
[edit] Added with Pentium Pro
Conditional MOV: CMOVA, CMOVAE, CMOVB, CMOVBE, CMOVC, CMOVE, CMOVG, CMOVGE, CMOVL, CMOVLE, CMOVNA, CMOVNAE, CMOVNB, CMOVNBE, CMOVNC, CMOVNE, CMOVNG, CMOVNGE, CMOVNL, CMOVNLE, CMOVNO, CMOVNP, CMOVNS, CMOVNZ, CMOVO, CMOVP, CMOVPE, CMOVPO, CMOVS, CMOVZ, SYSENTER (SYStem call ENTER), SYSEXIT (SYStem call EXIT), RDPMC*, UD2
- RDPMC was introduced in the Pentium Pro processor and the Pentium processor with MMX technology.
[edit] Added with AMD K6-2
SYSCALL, SYSRET (functionally equivalent to SYSENTER and SYSEXIT)
[edit] Added with SSE
MASKMOVQ, MOVNTPS, MOVNTQ, PREFETCH0, PREFETCH1, PREFETCH2, PREFETCHNTA, SFENCE (for Cacheability and Memory Ordering)
[edit] Added with SSE2
CLFLUSH, LFENCE, MASKMOVDQU, MFENCE, MOVNTDQ, MOVNTI, MOVNTPD, PAUSE (for Cacheability)
[edit] Added with SSE3
LDDQU (for Video Encoding)
MONITOR, MWAIT (for thread synchronization; only on processors supporting Hyper-threading and some dual-core processors like Core 2, Phenom and others)
[edit] Added with Intel VT
VMPTRLD, VMPTRST, VMCLEAR, VMREAD, VMWRITE, VMCALL, VMLAUNCH, VMRESUME, VMXOFF, VMXON
[edit] Added with AMD-V
CLGI, SKINIT, STGI, VMLOAD, VMMCALL, VMRUN, VMSAVE (SVM instructions of AMD-V)
[edit] Added with x86-64
CMPXCHG16B (CoMPaRe and eXCHanGe 16 bytes), RDTSCP (ReaD Time Stamp Counter and Processor ID)
[edit] Added with SSE4a
LZCNT, POPCNT (POPulation CouNT) - advanced bit manipulation
[edit] x87 floating-point instructions
[edit] Original 8087 instructions
F2XM1, FABS, FADD, FADDP, FBLD, FBSTP, FCHS, FCLEX, FCOM, FCOMP, FCOMPP, FDECSTP, FDISI, FDIV, FDIVP, FDIVR, FDIVRP, FENI, FFREE, FIADD, FICOM, FICOMP, FIDIV, FIDIVR, FILD, FIMUL, FINCSTP, FINIT, FIST, FISTP, FISUB, FISUBR, FLD, FLD1, FLDCW, FLDENV, FLDENVW, FLDL2E, FLDL2T, FLDLG2, FLDLN2, FLDPI, FLDZ, FMUL, FMULP, FNCLEX, FNDISI, FNENI, FNINIT, FNOP, FNSAVE, FNSAVEW, FNSTCW, FNSTENV, FNSTENVW, FNSTSW, FPATAN, FPREM, FPTAN, FRNDINT, FRSTOR, FRSTORW, FSAVE, FSAVEW, FSCALE, FSQRT, FST, FSTCW, FSTENV, FSTENVW, FSTP, FSTSW, FSUB, FSUBP, FSUBR, FSUBRP, FTST, FWAIT, FXAM, FXCH, FXTRACT, FYL2X, FYL2XP1
[edit] Added in specific processors
[edit] Added with 80287
FSETPM
[edit] Added with 80387
FCOS, FLDENVD, FNSAVED, FNSTENVD, FPREM1, FRSTORD, FSAVED, FSIN, FSINCOS, FSTENVD, FUCOM, FUCOMP, FUCOMPP
[edit] Added with Pentium Pro
- FCMOV variants: FCMOVB, FCMOVBE, FCMOVE, FCMOVNB, FCMOVNBE, FCMOVNE, FCMOVNU, FCMOVU
- FCOMI variants: FCOMI, FCOMIP, FUCOMI, FUCOMIP
[edit] Added with SSE
- FXRSTOR*, FXSAVE*
- Also supported on later Pentium IIs, though they do not contain SSE support
[edit] Added with SSE3
FISTTP (x87 to integer conversion)
[edit] Undocumented instructions
FFREEP performs FFREE ST(i) and pop stack
[edit] SIMD instructions
[edit] MMX instructions
added with Pentium MMX EMMS, MOVD, MOVQ, PACKSSDW, PACKSSWB, PACKUSWB, PADDB, PADDD, PADDSB, PADDSW, PADDUSB, PADDUSW, PADDW, PAND, PANDN, PCMPEQB, PCMPEQD, PCMPEQW, PCMPGTB, PCMPGTD, PCMPGTW, PMADDWD, PMULHW, PMULLW, POR, PSLLD, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLQ, PSRLW, PSUBB, PSUBD, PSUBSB, PSUBSW, PSUBUSB, PSUBUSW, PSUBW, PUNPCKHBW, PUNPCKHDQ, PUNPCKHWD, PUNPCKLBW, PUNPCKLDQ, PUNPCKLWD, PXOR
[edit] MMX+ instructions
[edit] added with Athlon
Same as the SSE SIMD Integer Instructions which operated on MMX registers.
[edit] EMMX instructions
[edit] added with 6x86MX from Cyrix, deprecated now
PAVEB, PADDSIW, PMAGW, PDISTIB, PSUBSIW, PMVZB, PMULHRW, PMVNZB, PMVLZB, PMVGEZB, PMULHRIW, PMACHRIW
[edit] 3DNow! instructions
[edit] added with K6-2
FEMMS, PAVGUSB, PF2ID, PFACC, PFADD, PFCMPEQ, PFCMPGE, PFCMPGT, PFMAX, PFMIN, PFMUL, PFRCP, PFRCPIT1, PFRCPIT2, PFRSQIT1, PFRSQRT, PFSUB, PFSUBR, PI2FD, PMULHRW, PREFETCH, PREFETCHW
[edit] 3DNow!+ instructions
[edit] added with Athlon
PF2IW, PFNACC, PFPNACC, PI2FW, PSWAPD
[edit] added with Geode GX
PFRSQRTV, PFRCPV
[edit] SSE instructions
added with Pentium III also see integer instruction added with Pentium III
[edit] SSE SIMD Floating-Point Instructions
ADDPS, ADDSS, CMPPS, CMPSS, COMISS, CVTPI2PS, CVTPS2PI, CVTSI2SS, CVTSS2SI, CVTTPS2PI, CVTTSS2SI, DIVPS, DIVSS, LDMXCSR, MAXPS, MAXSS, MINPS, MINSS, MOVAPS, MOVHLPS, MOVHPS, MOVLHPS, MOVLPS, MOVMSKPS, MOVNTPS, MOVSS, MOVUPS, MULPS, MULSS, RCPPS, RCPSS, RSQRTPS, RSQRTSS, SHUFPS, SQRTPS, SQRTSS, STMXCSR, SUBPS, SUBSS, UCOMISS, UNPCKHPS, UNPCKLPS
[edit] SSE SIMD Integer Instructions
ANDNPS, ANDPS, ORPS, PAVGB, PAVGW, PEXTRW, PINSRW, PMAXSW, PMAXUB, PMINSW, PMINUB, PMOVMSKB, PMULHUW, PSADBW, PSHUFW, XORPS
| Instruction | Opcode | Meaning | Notes | 
|---|---|---|---|
| MOVUPS xmm1, xmm2/m128 | 0F 10 /r | Move Unaligned Packed Single-Precision Floating-Point Values | |
| MOVSS xmm1, xmm2/m32 | F3 0F 10 /r | Move Scalar Single-Precision Floating-Point Values | |
| MOVUPS xmm2/m128, xmm1 | 0F 11 /r | Move Unaligned Packed Single-Precision Floating-Point Values | |
| MOVSS xmm2/m32, xmm1 | F3 0F 11 /r | Move Scalar Single-Precision Floating-Point Values | |
| MOVLPS xmm, m64 | 0F 12 /r | Move Low Packed Single-Precision Floating-Point Values | |
| MOVHLPS xmm1, xmm2 | 0F 12 /r | Move Packed Single-Precision Floating-Point Values High to Low | |
| MOVLPS m64, xmm | 0F 13 /r | Move Low Packed Single-Precision Floating-Point Values | |
| UNPCKLPS xmm1, xmm2/m128 | 0F 14 /r | Unpack and Interleave Low Packed Single-Precision Floating-Point Values | |
| UNPCKHPS xmm1, xmm2/m128 | 0F 15 /r | Unpack and Interleave High Packed Single-Precision Floating-Point Values | |
| MOVHPS xmm, m64 | 0F 16 /r | Move High Packed Single-Precision Floating-Point Values | |
| MOVLHPS xmm1, xmm2 | 0F 16 /r | Move Packed Single-Precision Floating-Point Values Low to High | |
| MOVHPS m64, xmm | 0F 17 /r | Move High Packed Single-Precision Floating-Point Values | |
| PREFETCHNTA | 0F 18 /0 | Prefetch Data Into Caches (non-temporal data with respect to all cache levels) | |
| PREFETCH0 | 0F 18 /1 | Prefetch Data Into Caches (temporal data) | |
| PREFETCH1 | 0F 18 /2 | Prefetch Data Into Caches (temporal data with respect to first level cache) | |
| PREFETCH2 | 0F 18 /3 | Prefetch Data Into Caches (temporal data with respect to second level cache) | |
| NOP | 0F 1F /0 | No Operation | |
| MOVAPS xmm1, xmm2/m128 | 0F 28 /r | Move Aligned Packed Single-Precision Floating-Point Values | |
| MOVAPS xmm2/m128, xmm1 | 0F 29 /r | Move Aligned Packed Single-Precision Floating-Point Values | |
| CVTPI2PS xmm, mm/m64 | 0F 2A /r | Convert Packed Dword Integers to Packed Single-Precision FP Values | |
| CVTSI2SS xmm, r/m32 | F3 0F 2A /r | Convert Dword Integer to Scalar Single-Precision FP Value | |
| MOVNTPS m128, xmm | 0F 2B /r | Store Packed Single-Precision Floating-Point Values Using Non-Temporal Hint | |
| CVTTPS2PI mm, xmm/m64 | 0F 2C /r | Convert with Truncation Packed Single-Precision FP Values to Packed Dword Integers | |
| CVTTSS2SI r32, xmm/m32 | F3 0F 2C /r | Convert with Truncation Scalar Single-Precision FP Value to Dword Integer | |
| CVTPS2PI mm, xmm/m64 | 0F 2D /r | Convert Packed Single-Precision FP Values to Packed Dword Integers | |
| CVTSS2SI r32, xmm/m32 | F3 0F 2D /r | Convert Scalar Single-Precision FP Value to Dword Integer | |
| UCOMISS xmm1, xmm2/m32 | 0F 2E /r | Unordered Compare Scalar Single-Precision Floating-Point Values and Set EFLAGS | |
| COMISS xmm1, xmm2/m32 | 0F 2F /r | Compare Scalar Ordered Single-Precision Floating-Point Values and Set EFLAGS | |
| SQRTPS xmm1, xmm2/m128 | 0F 51 /r | Compute Square Roots of Packed Single-Precision Floating-Point Values | |
| SQRTSS xmm1, xmm2/m32 | F3 0F 51 /r | Compute Square Root of Scalar Single-Precision Floating-Point Value | |
| RSQRTPS xmm1, xmm2/m128 | 0F 52 /r | Compute Reciprocal of Square Root of Packed Single-Precision Floating-Point Value | |
| RSQRTSS xmm1, xmm2/m32 | F3 0F 52 /r | Compute Reciprocal of Square Root of Scalar Single-Precision Floating-Point Value | |
| RCPPS xmm1, xmm2/m128 | 0F 53 /r | Compute Reciprocal of Packed Single-Precision Floating-Point Values | |
| RCPSS xmm1, xmm2/m32 | F3 0F 53 /r | Compute Reciprocal of Scalar Single-Precision Floating-Point Values | |
| ANDPS xmm1, xmm2/m128 | 0F 54 /r | Bitwise Logical AND of Packed Single-Precision Floating-Point Values | |
| ANDNPS xmm1, xmm2/m128 | 0F 55 /r | Bitwise Logical AND NOT of Packed Single-Precision Floating-Point Values | |
| ORPS xmm1, xmm2/m128 | 0F 56 /r | Bitwise Logical OR of Single-Precision Floating-Point Values | |
| XORPS xmm1, xmm2/m128 | 0F 57 /r | Bitwise Logical XOR for Single-Precision Floating-Point Values | |
| ADDPS xmm1, xmm2/m128 | 0F 58 /r | Add Packed Single-Precision Floating-Point Values | |
| ADDSS xmm1, xmm2/m32 | F3 0F 58 /r | Add Scalar Single-Precision Floating-Point Values | |
| MULPS xmm1, xmm2/m128 | 0F 59 /r | Multiply Packed Single-Precision Floating-Point Values | |
| MULSS xmm1, xmm2/m32 | F3 0F 59 /r | Multiply Scalar Single-Precision Floating-Point Values | |
| SUBPS xmm1, xmm2/m128 | 0F 5C /r | Subtract Packed Single-Precision Floating-Point Values | |
| SUBSS xmm1, xmm2/m32 | F3 0F 5C /r | Subtract Scalar Single-Precision Floating-Point Values | |
| MINPS xmm1, xmm2/m128 | 0F 5D /r | Return Minimum Packed Single-Precision Floating-Point Values | |
| MINSS xmm1, xmm2/m32 | F3 0F 5D /r | Return Minimum Scalar Single-Precision Floating-Point Values | |
| DIVPS xmm1, xmm2/m128 | 0F 5E /r | Divide Packed Single-Precision Floating-Point Values | |
| DIVSS xmm1, xmm2/m32 | F3 0F 5E /r | Divide Scalar Single-Precision Floating-Point Values | |
| MAXPS xmm1, xmm2/m128 | 0F 5F /r | Return Maximum Packed Single-Precision Floating-Point Values | |
| MAXSS xmm1, xmm2/m32 | F3 0F 5F /r | Return Maximum Scalar Single-Precision Floating-Point Values | |
| PSHUFW mm1, mm2/m64, imm8 | 0F 70 /r ib | Shuffle Packed Words | |
| LDMXCSR m32 | 0F AE /2 | Load MXCSR Register State | |
| STMXCSR m32 | 0F AE /3 | Store MXCSR Register State | |
| SFENCE | 0F AE /7 | Store Fence | |
| CMPPS xmm1, xmm2/m128, imm8 | 0F C2 /r ib | Compare Packed Single-Precision Floating-Point Values | |
| CMPSS xmm1, xmm2/m32, imm8 | F3 0F C2 /r ib | Compare Scalar Single-Precision Floating-Point Values | |
| PINSRW mm, r32/m16, imm8 | 0F C4 /r | Insert Word | |
| PEXTRW r32, mm, imm8 | 0F C5 /r | Extract Word | |
| SHUFPS xmm1, xmm2/m128, imm8 | 0F C6 /r ib | Shuffle Packed Single-Precision Floating-Point Values | |
| PMOVMSKB r32, mm | 0F D7 /r | Move Byte Mask | |
| PMINUB mm1, mm2/m64 | 0F DA /r | Minimum of Packed Unsigned Byte Integers | |
| PMAXUB mm1, mm2/m64 | 0F DE /r | Maximum of Packed Unsigned Byte Integers | |
| PAVGB mm1, mm2/m64 | 0F E0 /r | Average Packed Integers | |
| PAVGW mm1, mm2/m64 | 0F E3 /r | Average Packed Integers | |
| PMULHUW mm1, mm2/m64 | 0F E4 /r | Multiply Packed Unsigned Integers and Store High Result | |
| MOVNTQ m64, mm | 0F E7 /r | Store of Quadword Using Non-Temporal Hint | |
| PMINSW mm1, mm2/m64 | 0F EA /r | Minimum of Packed Signed Word Integers | |
| PMAXSW mm1, mm2/m64 | 0F EE /r | Maximum of Packed Signed Word Integers | |
| PSADBW mm1, mm2/m64 | 0F F6 /r | Compute Sum of Absolute Differences | |
| MASKMOVQ mm1, mm2 | 0F F7 /r | Store Selected Bytes of Quadword | 
[edit] SSE2 instructions
added with Pentium 4 also see integer instructions added with Pentium 4
[edit] SSE2 SIMD Floating-Point Instructions
ADDPD, ADDSD, ANDNPD, ANDPD, CMPPD, CMPSD*, COMISD, CVTDQ2PD, CVTDQ2PS, CVTPD2DQ, CVTPD2PI, CVTPD2PS, CVTPI2PD, CVTPS2DQ, CVTPS2PD, CVTSD2SI, CVTSD2SS, CVTSI2SD, CVTSS2SD, CVTTPD2DQ, CVTTPD2PI, CVTPS2DQ, CVTTSD2SI, DIVPD, DIVSD, MAXPD, MAXSD, MINPD, MINSD, MOVAPD, MOVHPD, MOVLPD, MOVMSKPD, MOVSD*, MOVUPD, MULPD, MULSD, ORPD, SHUFPD, SQRTPD, SQRTSD, SUBPD, SUBSD, UCOMISD, UNPCKHPD, UNPCKLPD, XORPD
- CMPSD and MOVSD have the same name as the string instruction mnemonics CMPSD (CMPS) and MOVSD (MOVS), however, the former refer to scalar double-precision floating-points whereas the latters refer to doubleword strings.
[edit] SSE2 SIMD Integer Instructions
MOVDQ2Q, MOVDQA, MOVDQU, MOVQ2DQ, PADDQ, PSUBQ, PMULUDQ, PSHUFHW, PSHUFLW, PSHUFD, PSLLDQ, PSRLDQ, PUNPCKHQDQ, PUNPCKLQDQ
[edit] SSE3 instructions
added with Pentium 4 supporting SSE3 also see integer and floating-point instructions added with Pentium 4 SSE3
[edit] SSE3 SIMD Floating-Point Instructions
- ADDSUBPD, ADDSUBPS (for Complex Arithmetic)
- HADDPD, HADDPS, HSUBPD, HSUBPS (for Graphics)
- MOVDDUP, MOVSHDUP, MOVSLDUP (for Complex Arithmetic)
[edit] SSSE3 instructions
added with Xeon 5100 series and initial Core 2
- PSIGNW, PSIGND, PSIGNB
- PSHUFB
- PMULHRSW, PMADDUBSW
- PHSUBW, PHSUBSW, PHSUBD
- PHADDW, PHADDSW, PHADDD
- PALIGNR
- PABSW, PABSD, PABSB
[edit] SSE4 instructions
[edit] SSE4.1
added with Core 2 x9000 series
- MPSADBW
- PHMINPOSUW
- PMULLD, PMULDQ
- DPPS, DPPD
- BLENDPS, BLENDPD, BLENDVPS, BLENDVPD, PBLENDVB, PBLENDW
- PMINSB, PMAXSB, PMINUW, PMAXUW, PMINUD, PMAXUD, PMINSD, PMAXSD
- ROUNDPS, ROUNDSS, ROUNDPD, ROUNDSD
- INSERTPS, PINSRB, PINSRD/PINSRQ, EXTRACTPS, PEXTRB, PEXTRW, PEXTRD/PEXTRQ
- PMOVSXBW, PMOVZXBW, PMOVSXBD, PMOVZXBD, PMOVSXBQ, PMOVZXBQ, PMOVSXWD, PMOVZXWD, PMOVSXWQ, PMOVZXWQ, PMOVSXDQ, *PMOVZXDQ
- PTEST
- PCMPEQQ
- PACKUSDW
- MOVNTDQA
[edit] SSE4a
added with Phenom processors
- EXTRQ/INSERTQ
- MOVNTSD/MOVNTSS
[edit] SSE4.2
to be added with Nehalem processors
- CRC32
- PCMPESTRI
- PCMPESTRM
- PCMPISTRI
- PCMPISTRM
- PCMPGTQ
[edit] FMA instructions
| Instruction | Opcode | Meaning | Notes | 
|---|---|---|---|
| VFMADDPD xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 69 /r /is4 | Fused Multiply-Add of Packed Double-Precision Floating-Point Values | |
| VFMADDPS xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 68 /r /is4 | Fused Multiply-Add of Packed Single-Precision Floating-Point Values | |
| VFMADDSD xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 6B /r /is4 | Fused Multiply-Add of Scalar Double-Precision Floating-Point Values | |
| VFMADDSS xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 6A /r /is4 | Fused Multiply-Add of Scalar Single-Precision Floating-Point Values | |
| VFMADDSUBPD xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 5D /r /is4 | Fused Multiply-Alternating Add/Subtract of Packed Double-Precision Floating-Point Values | |
| VFMADDSUBPS xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 5C /r /is4 | Fused Multiply-Alternating Add/Subtract of Packed Single-Precision Floating-Point Values | |
| VFMSUBADDPD xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 5F /r /is4 | Fused Multiply-Alternating Subtract/Add of Packed Double-Precision Floating-Point Values | |
| VFMSUBADDPS xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 5E /r /is4 | Fused Multiply-Alternating Subtract/Add of Packed Single-Precision Floating-Point Values | |
| VFMSUBPD xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 6D /r /is4 | Fused Multiply-Subtract of Packed Double-Precision Floating-Point Values | |
| VFMSUBPS xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 6C /r /is4 | Fused Multiply-Subtract of Packed Single-Precision Floating-Point Values | |
| VFMSUBSD xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 6F /r /is4 | Fused Multiply-Subtract of Scalar Double-Precision Floating-Point Values | |
| VFMSUBSS xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 6E /r /is4 | Fused Multiply-Subtract of Scalar Single-Precision Floating-Point Values | |
| VFNMADDPD xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 79 /r /is4 | Fused Negative Multiply-Add of Packed Double-Precision Floating-Point Values | |
| VFNMADDPS xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 78 /r /is4 | Fused Negative Multiply-Add of Packed Single-Precision Floating-Point Values | |
| VFNMADDSD xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 7B /r /is4 | Fused Negative Multiply-Add of Scalar Double-Precision Floating-Point Values | |
| VFNMADDSS xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 7A /r /is4 | Fused Negative Multiply-Add of Scalar Single-Precision Floating-Point Values | |
| VFNMSUBPD xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 7D /r /is4 | Fused Negative Multiply-Subtract of Packed Double-Precision Floating-Point Values | |
| VFNMSUBPS xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 7C /r /is4 | Fused Negative Multiply-Subtract of Packed Single-Precision Floating-Point Values | |
| VFNMSUBSD xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 7F /r /is4 | Fused Negative Multiply-Subtract of Scalar Double-Precision Floating-Point Values | |
| VFNMSUBSS xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 7E /r /is4 | Fused Negative Multiply-Subtract of Scalar Single-Precision Floating-Point Values | 
[edit] Undocumented instructions
The x86 CPUs contain undocumented instructions which are implemented on the chips but never listed in any official available document.
| mnemonic | opcode | description | undoc status | 
|---|---|---|---|
| AAM imm8 | D4 imm8 | Divide AL by imm8, put the quotient in AH, and the remainder in AL | Available beginning with 8086, documented since Pentium (earlier documentation lists no arguments) | 
| AAD imm8 | D5 imm8 | Multiplication counterpart of AAM | Available beginning with 8086, documented since Pentium (earlier documentation lists no arguments) | 
| SALC | D6 | Set AL depending on the value of the Carry Flag | Available beginning with 8086, but only documented since Pentium Pro. | 
| ICEBP | F1 | Single byte single-step exception / Invoke ICE | Available beginning with 80386, documented (as INT1) since Pentium Pro | 
| LOADALL | 0F 05 | Loads All Registers from Memory Address 0x000800H | Only available on 80286 | 
| LOADALLD | 0F 07 | Loads All Registers from Memory Address ES:EDI | Only available on 80386 | 
| POP CS | 0F | Pop top of the stack into CS Segment register | Only available on 8086. Beginning with 80286 this opcode is used as a prefix for 2-Byte-Instructions | 
[edit] References
- Intel Architecture Software Developer's Manual - Volume 2 (Pentium Pro edition)
[edit] External links
|  | The Wikibook X86 Assembly has a page on the topic of | 

