MIPS architecture

A MIPS R4400 microprocessor made by Toshiba.

MIPS (originally an acronym for Microprocessor without Interlocked Pipeline Stages) is a reduced instruction set computing (RISC) instruction set architecture (ISA) developed by MIPS Computer Systems (now MIPS Technologies). In the mid to late 1990s, it was estimated that one in three RISC microprocessors produced were MIPS implementations.^{[citation needed]}

MIPS implementations are now used primarily in embedded systems such as the Series2 TiVo, Windows CE devices, Cisco routers, residential gateways, Foneras, Avaya, and video game consoles like the Nintendo 64 and the Sony PlayStation, PlayStation 2, and PlayStation Portable handheld system. Until late 2006 they were also used in many of SGI's computer products. MIPS implementations were also used by Digital Equipment Corporation, NEC, Siemens Nixdorf, Tandem and others during the late 1980s and 1990s.

The early MIPS architectures were 32-bit (generally 32-bit wide registers and data paths), while later versions were 64-bit. Multiple revisions of the MIPS instruction set exist, including MIPS I, MIPS II, MIPS III, MIPS IV, MIPS V, MIPS32, and MIPS64. The current revisions are MIPS32 (for 32-bit implementations) and MIPS64 (for 64-bit implementations). MIPS32 and MIPS64 define a control register set as well as the instruction set. Several "add-on" extensions are also available, including MIPS-3D, which is a simple set of floating-point SIMD instructions dedicated to common 3D tasks; MDMX (MaDMaX), which is a more extensive integer SIMD instruction set using the 64-bit floating-point registers; MIPS16e, which adds compression to the instruction stream to make programs take up less room (allegedly a response to the Thumb encoding in the ARM architecture); and the recent addition of MIPS MT, new multithreading additions to the system similar to Hyper-Threading in Intel's Pentium 4 processors.

Computer architecture courses in universities and technical schools often study the MIPS architecture. The architecture greatly influenced later RISC architectures such as Alpha (previously Alpha AXP).

History

RISC Pioneer

A MIPS microprocessor Orion R4600 made by IDT.

In 1981, a team led by John L. Hennessy at Stanford University started work on what would become the first MIPS processor. The basic concept was to increase performance through the use of deep instruction pipelines. Pipelining as a basic technique was well known before (see IBM 801 for instance), but had not been developed to its full potential. CPUs are built up from a number of dedicated sub-units such as instruction decoders, ALUs (integer arithmetic and logic), load/store units (handling memory), and so on. In a traditional non-optimized design, a particular instruction in a program sequence must be (almost) completed before the next can start to "flow" from one unit to another; in a pipelined architecture, successive instructions instead overlap in execution. For instance, at the same time a math instruction is fed into the floating point unit, the load/store unit can fetch the next instruction.
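The benefit of overlapping instructions can be sketched with an idealized cycle count. The model below is only an illustration: it assumes every stage takes exactly one cycle and that nothing ever stalls, and the stage count of five used in the example is an assumption, not a figure from the Stanford design.

```python
def cycles_unpipelined(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions: int, n_stages: int) -> int:
    """Idealized pipeline: one instruction enters per cycle and nothing stalls."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles; each later one finishes
    # one cycle after its predecessor.
    return n_stages + (n_instructions - 1)


if __name__ == "__main__":
    for n in (1, 5, 100):
        print(n, cycles_unpipelined(n, 5), cycles_pipelined(n, 5))
```

For long instruction sequences the pipelined count approaches one cycle per instruction, which is the single-cycle throughput the design aimed for.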

One major barrier to pipelining was that some instructions, like division, take longer to complete and the CPU therefore has to wait before passing the next instruction into the pipeline. One solution to this problem is to use a series of interlocks that allows stages to indicate that they are busy, pausing the other stages upstream. Hennessy's team viewed these interlocks as a major performance barrier since they had to communicate with all the modules in the CPU, which takes time, and appeared to limit the clock speed. A major aspect of the MIPS design was to fit every sub-phase of every instruction, including cache access, into one cycle, thereby removing any need for interlocking and permitting single-cycle throughput.

Although this design eliminated a number of useful instructions such as multiply and divide, it was felt that the overall performance of the system would be dramatically improved because the chips could run at much higher clock rates. This ramping of the speed would be difficult with interlocking involved, as the time needed to set up locks is as much a function of die size as clock rate. The elimination of these instructions became a contentious point.

The other difference between the MIPS design and the competing Berkeley RISC involved the handling of subroutine calls. RISC used a technique called register windows to improve performance of these very common tasks, but this limited the maximum depth of multi-level calls. Each subroutine call required its own set of registers, which in turn required more real estate on the CPU and more complexity in its design. Hennessy felt that a careful compiler could find free registers without resorting to a hardware implementation, and that simply increasing the number of registers would not only make this simple, but increase the performance of all tasks.

In other ways the MIPS design was very much a typical RISC design. To save bits in the instruction word, RISC designs reduce the number of instructions to encode. The MIPS design uses 6 bits of the 32-bit word for the basic opcode;^[1] the rest may contain a single 26-bit jump address, or it may have up to four 5-bit fields specifying up to three registers plus a shift value, combined with another 6 bits of opcode; another format, among several, specifies two registers combined with a 16-bit immediate value. This allowed the CPU to load the instruction and the data it needed in a single cycle, whereas an older non-RISC design, such as the MOS Technology 6502, required separate cycles to load the opcode and the data. This was one of the major performance improvements that RISC offered. However, modern non-RISC designs achieve this speed by other means (such as queues in the CPU).

In 1984 Hennessy was convinced of the future commercial potential of the design, and left Stanford to form MIPS Computer Systems. They released their first design, the R2000, in 1985, improving the design as the R3000 in 1988. These 32-bit CPUs formed the basis of their company through the 1980s, used primarily in SGI's series of workstations. These commercial designs deviated from the Stanford academic research by implementing most of the interlocks in hardware, supplying full multiply and divide instructions (among others).

In 1991 MIPS released the first 64-bit microprocessor, the R4000. The R4000 has an advanced TLB in which each entry contains not just the virtual address but also the virtual address space ID. Such a buffer eliminates the major performance problems of microkernels,^[2] which are slow on competing architectures (Pentium, PowerPC, Alpha) because of the need to flush the TLB on frequent context switches. However, MIPS had financial difficulties while bringing it to market. The design was so important to SGI, at the time one of MIPS' few major customers, that SGI bought the company outright in 1992 in order to guarantee the design would not be lost. As a subsidiary of SGI, the company became known as MIPS Technologies.
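Why tagging TLB entries with an address space ID avoids flushes can be sketched with a toy lookup table. This is a minimal illustration of the idea only; the class and method names are hypothetical, and it does not model the R4000's actual TLB size, page sizes, or replacement policy.

```python
class TaggedTLB:
    """Toy TLB whose entries are keyed by (address-space ID, virtual page)."""

    def __init__(self):
        self.entries = {}      # (asid, virtual_page) -> physical_page
        self.current_asid = 0

    def context_switch(self, asid):
        # No flush: entries that belong to other address spaces stay in the
        # table and simply stop matching until their ASID is current again.
        self.current_asid = asid

    def insert(self, virtual_page, physical_page):
        self.entries[(self.current_asid, virtual_page)] = physical_page

    def lookup(self, virtual_page):
        # Returns None on a miss.
        return self.entries.get((self.current_asid, virtual_page))


if __name__ == "__main__":
    tlb = TaggedTLB()
    tlb.context_switch(1)
    tlb.insert(0x10, 0xA0)
    tlb.context_switch(2)
    print(tlb.lookup(0x10))    # miss: the entry belongs to address space 1
    tlb.context_switch(1)
    print(tlb.lookup(0x10))    # hit again without any refill
```

Without the tag, the same virtual page number in two address spaces would be ambiguous, so the whole table would have to be emptied on every switch.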

Licensable Architecture

In the early 1990s MIPS started licensing their designs to third-party vendors. This proved fairly successful due to the simplicity of the core, which allowed it to be used in a number of applications that would have formerly used much less capable CISC designs of similar gate count and price (the two are strongly related, since the price of a CPU is generally related to the number of gates and the number of external pins). Sun Microsystems attempted to enjoy similar success by licensing their SPARC core but was not nearly as successful. By the late 1990s MIPS was a powerhouse in the embedded processor field, and in 1997 the 48-millionth MIPS-based CPU shipped, making it the first RISC CPU to outship the famous 68k family. MIPS was so successful that SGI spun off MIPS Technologies in 1998. Fully half of MIPS' income today comes from licensing their designs, while much of the rest comes from contract design work on cores that will then be produced by third parties.

In 1999 MIPS formalized their licensing system around two basic designs, the 32-bit MIPS32 (based on MIPS II with some additional features from MIPS III, MIPS IV, and MIPS V) and the 64-bit MIPS64 (based on MIPS V). NEC, Toshiba and SiByte (later acquired by Broadcom) each obtained licenses for the MIPS64 as soon as it was announced. Philips, LSI Logic and IDT have since joined them. Success followed success, and today MIPS cores are among the most-used "heavyweight" cores in the marketplace for computer-like devices (hand-held computers, set-top boxes, etc.), with other designers fighting it out for other niches. Some indication of their success is the fact that Freescale (spun off by Motorola) uses MIPS cores in their set-top box designs, instead of their own PowerPC-based cores.

Since the MIPS architecture is licensable, it has attracted several processor start-up companies over the years. One of the first start-ups to design MIPS processors was Quantum Effect Devices (see next section). The MIPS design team that designed the R4300 started the company SandCraft, which designed the R5432 for NEC and later produced the SR71000, one of the first out-of-order execution processors for the embedded market. The original DEC StrongARM team eventually split into two MIPS-based start-ups: SiByte, which produced the SB-1250, one of the first high-performance MIPS-based systems-on-a-chip (SoC), and Alchemy Semiconductor (later acquired by AMD), which produced the Au-1000 SoC for low-power applications. Lexra used a MIPS-like architecture and added DSP extensions for the audio chip market and multithreading support for the networking market. Because Lexra did not license the architecture, two lawsuits were started between the two companies. The first was quickly resolved when Lexra promised not to advertise their processors as MIPS-compatible. The second (about MIPS patent 4814976 for handling unaligned memory access) was protracted, hurt both companies' business, and culminated in MIPS Technologies giving Lexra a free license and a large cash payment.

Two companies have emerged that specialize in building multi-core devices using the MIPS architecture. Raza Microelectronics, Inc. purchased the product line from the failing SandCraft and later produced devices that contained eight cores and were targeted at the telecommunications and networking markets. Cavium Networks, originally a security processor vendor, also produced devices with eight CPU cores for the same markets. Both of these companies designed their cores in-house, licensing only the architecture instead of purchasing cores from MIPS.

Losing the Desktop

Among the manufacturers which have made computer workstation systems using MIPS processors are SGI, MIPS Computer Systems, Inc., Whitechapel Workstations, Olivetti, Siemens-Nixdorf, Acer, Digital Equipment Corporation, NEC, and DeskStation. Operating systems ported to the architecture include SGI's IRIX, Microsoft's Windows NT (until v4.0), Windows CE, Linux, BSD, UNIX System V, SINIX and MIPS Computer Systems' own RISC/os.

There was speculation in the early 1990s that MIPS and other powerful RISC processors would overtake the Intel IA32 architecture. This was encouraged by the support of the first two versions of Microsoft's Windows NT for DEC Alpha, MIPS and PowerPC - and to a lesser extent the Clipper architecture and SPARC. However, as Intel quickly released faster versions of their Pentium class CPUs, Microsoft Windows NT v4.0 dropped support for anything but Intel and Alpha. With SGI's decision to transition to the Itanium and IA32 architectures, use of MIPS processors on the desktop has now disappeared almost completely^[3].

See main article Advanced Computing Environment.

Embedded markets

Through the 1990s, the MIPS architecture was widely adopted by the embedded market, including for use in computer networking/telecommunications, video arcade games, home video game consoles, computer printers, digital set-top boxes, digital televisions, DSL and cable modems, and personal digital assistants.

The low power consumption and heat characteristics of embedded MIPS implementations, the wide availability of embedded development tools, and widespread knowledge about the architecture mean that use of MIPS microprocessors in embedded roles is likely to remain common.

Synthesizable Cores for Embedded Markets

In recent years most of the technology used in the various MIPS generations has been offered as IP-cores (building-blocks) for embedded processor designs. Both 32-bit and 64-bit basic cores are offered, known as the 4K and 5K respectively, and the design itself can be licensed as MIPS32 and MIPS64. These cores can be mixed with add-in units such as FPUs, SIMD systems, various input/output devices, etc.

MIPS cores have been commercially successful, now being used in many consumer and industrial applications. MIPS cores can be found in newer Cisco, Linksys and Mikrotik's routerboard routers, cable modems and ADSL modems, smartcards, laser printer engines, set-top boxes, robots, handheld computers, Sony PlayStation 2 and Sony PlayStation Portable. In cellphone/PDA applications, the MIPS core has been unable to displace the incumbent, competing ARM core.

MIPS architecture processors include: IDT RC32438; ATI Xilleon; Alchemy Au1000, 1100, 1200; Broadcom Sentry5; RMI XLR7xx; Cavium Octeon CN30xx, CN31xx, CN36xx, CN38xx and CN5xxx; Infineon Technologies EasyPort, Amazon, Danube, ADM5120, WildPass, INCA-IP, INCA-IP2; NEC EMMA and EMMA2, NEC VR4181A, VR4121, VR4122, VR5432, VR5500; Oak Technologies Generation; PMC-Sierra RM11200; QuickLogic QuickMIPS ESP; Toshiba "Donau", Toshiba TMPR492x, TX4925, TX9956, TX7901.

MIPS-based Supercomputers

One of the more interesting applications of the MIPS architecture is its use in supercomputers with very large processor counts. Silicon Graphics (SGI) refocused its business from desktop graphics workstations to the high performance computing (HPC) market in the early 1990s. The success of the company's first foray into server systems, the Challenge series based on the R4400 and R8000, and later R10000, motivated SGI to create a vastly more powerful system. The introduction of the integrated R10000 allowed SGI to produce a system, the Origin 2000, eventually scalable to 1024 CPUs using its NUMAlink cc-NUMA interconnect. The Origin 2000 begat the Origin 3000 series, which topped out at the same maximum of 1024 CPUs but used the R14000 and R16000 chips at up to 700 MHz. Its MIPS-based supercomputers were withdrawn in 2005 when SGI made the strategic decision to move to Intel's IA-64 architecture.

An HPC startup introduced a radical MIPS-based supercomputer in 2007. SiCortex, Inc. created a tightly integrated Linux cluster supercomputer based on the MIPS64 architecture and a high performance interconnect based on the Kautz digraph topology. The system is very power efficient and computationally powerful. The most distinctive aspect of the system is its multicore processing node, which integrates six MIPS64 cores, a crossbar memory controller, an interconnect DMA engine, and Gigabit Ethernet and PCI Express controllers on a single chip that consumes only 10 watts of power, yet has a peak floating-point performance of 6 GFLOPS. The most powerful configuration, the SC5832, is a single-cabinet supercomputer consisting of 972 such node chips for a total of 5832 MIPS64 processor cores and 8.2 teraFLOPS of peak performance.
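The figures quoted for the SC5832 can be cross-checked with a few lines of arithmetic. The snippet uses only the numbers given in the paragraph above; the variable names are illustrative. The core count follows directly from the node count, while the quoted 6 GFLOPS per node multiplies out to about 5.8 teraFLOPS, so the 8.2 teraFLOPS total would correspond to roughly 8.4 GFLOPS per node. One possible reading, which is an assumption and not stated in the text, is that the two peak figures refer to different clock speeds or product revisions.

```python
NODES = 972                   # node chips in the SC5832, as quoted
CORES_PER_NODE = 6            # MIPS64 cores per node chip, as quoted
PEAK_GFLOPS_PER_NODE = 6.0    # per-node peak, as quoted
QUOTED_PEAK_TFLOPS = 8.2      # system peak, as quoted

total_cores = NODES * CORES_PER_NODE
peak_tflops_from_node_figure = NODES * PEAK_GFLOPS_PER_NODE / 1000.0
implied_gflops_per_node = QUOTED_PEAK_TFLOPS * 1000.0 / NODES

if __name__ == "__main__":
    print(total_cores)
    print(round(peak_tflops_from_node_figure, 3))
    print(round(implied_gflops_per_node, 2))
```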

CPU family

Pipelined MIPS, showing the five stages (instruction fetch, instruction decode, execute, memory access and write back)

The first commercial MIPS CPU model, the R2000, was announced in 1985. It added multiple-cycle multiply and divide instructions in a somewhat independent on-chip unit. New instructions were added to retrieve the results from this unit back to the execution core; these result-retrieving instructions were interlocked.

The R2000 could be booted either big-endian or little-endian. It had thirty-two 32-bit general purpose registers, but no condition code register (the designers considered it a potential bottleneck), a feature it shares with the AMD 29000 and the Alpha. Unlike other registers, the program counter is not directly accessible.
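What booting big-endian or little-endian changes is the order in which the bytes of a word are laid out in memory. The following sketch uses Python's standard `struct` module to show the two layouts of the same 32-bit value; the helper name `word_bytes` is hypothetical and the example says nothing about how the R2000 selects the mode.

```python
import struct


def word_bytes(word: int, big_endian: bool) -> bytes:
    """Return the four bytes of a 32-bit word in memory order."""
    fmt = ">I" if big_endian else "<I"
    return struct.pack(fmt, word & 0xFFFFFFFF)


if __name__ == "__main__":
    print(word_bytes(0x12345678, big_endian=True).hex())
    print(word_bytes(0x12345678, big_endian=False).hex())
```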

The R2000 also had support for up to four co-processors, one of which was built into the main CPU and handled exceptions, traps and memory management, while the other three were left for other uses. One of these could be filled by the optional R2010 FPU, which had thirty-two 32-bit registers that could be used as sixteen 64-bit registers for double-precision.

The R3000 succeeded the R2000 in 1988, adding 32 KB (soon increased to 64 KB) caches for instructions and data, along with cache coherency support for multiprocessor use. While there were flaws in the R3000's multiprocessor support, it still managed to be a part of several successful multiprocessor designs. The R3000 also included a built-in MMU, a common feature on CPUs of the era. The R3000, like the R2000, could be paired with an R3010 FPU. The R3000 was the first successful MIPS design in the marketplace, and eventually over one million were made. A speed-bumped version of the R3000 running at up to 40 MHz, the R3000A, delivered a performance of 32 VUPs (VAX Units of Performance). The R3000A was the processor used in the extremely successful Sony PlayStation. Third-party designs include Performance Semiconductor's R3400 and IDT's R3500, both of which were R3000As with an integrated R3010 FPU. Toshiba's R3900 was virtually the first SoC for the early handheld PCs running Windows CE. A radiation-hardened variant for space applications, the Mongoose-V, is an R3000 with an integrated R3010 FPU.

The R4000 series, released in 1991, extended the MIPS instruction set to a full 64-bit architecture, moved the FPU onto the main die to create a single-chip microprocessor, and operated at a radically high internal clock speed (it was introduced at 100 MHz). However, in order to achieve the clock speed the caches were reduced to 8 KB each and they took three cycles to access. The high operating frequencies were achieved through the technique of deep pipelining (called super-pipelining at the time). With the introduction of the R4000 a number of improved versions soon followed, including the R4400 (1993) which included 16 KB caches, largely bug-free 64-bit operation, and support for a larger external level 2 cache.

MIPS, now a division of SGI called MTI, designed the lower-cost R4200, and later the even lower-cost R4300, which was the R4200 with a 32-bit external bus. The Nintendo 64 used an NEC VR4300 CPU that was based upon the low-cost MIPS R4300i.^[4]

Bottom-side view of the R4700 Orion package with the exposed silicon chip, fabricated by IDT and designed by Quantum Effect Devices.

Top-side view of the R4700 Orion package.

Quantum Effect Devices (QED), a separate company started by former MIPS employees, designed the R4600 "Orion", the R4700 "Orion", the R4650 and the R5000. Where the R4000 had pushed clock frequency and sacrificed cache capacity, the QED designs emphasized large caches which could be accessed in just two cycles and efficient use of silicon area. The R4600 and R4700 were used in low-cost versions of the SGI Indy workstation as well as the first MIPS-based Cisco routers, such as the 36x0 and 7x00-series routers. The R4650 was used in the original WebTV set-top boxes (now Microsoft TV). The R5000 FPU had more flexible single-precision floating-point scheduling than the R4000, and as a result, R5000-based SGI Indys had much better graphics performance than similarly clocked R4400 Indys with the same graphics hardware. SGI gave the old graphics board a new name when it was combined with the R5000 in order to emphasize the improvement. QED later designed the RM7000 and RM9000 family of devices for embedded markets like networking and laser printers. QED was acquired by the semiconductor manufacturer PMC-Sierra in August 2000, and the latter company continued to invest in the MIPS architecture. The RM7000 included an on-board 256 KB level 2 cache and a controller for an optional level 3 cache. The RM9xx0 were a family of SoC devices which included northbridge peripherals such as a memory controller, PCI controller and Gigabit Ethernet controller, and fast I/O such as a HyperTransport port.

The R8000 (1994) was the first superscalar MIPS design, able to execute two integer or floating point and two memory instructions per cycle. The design was spread over six chips: an integer unit (with 16 KB instruction and 16 KB data caches), a floating-point unit, three full-custom secondary cache tag RAMs (two for secondary cache accesses, one for bus snooping), and a cache controller ASIC. The design had two fully pipelined double precision multiply-add units, which could stream data from the 4 MB off-chip secondary cache. The R8000 powered SGI's POWER Challenge servers in the mid 1990s and later became available in the POWER Indigo2 workstation. Although its FPU performance fit scientific users quite well, its limited integer performance and high cost dampened appeal for most users, and the R8000 was in the marketplace for only a year and remains fairly rare.

In 1995, the R10000 was released. This processor was a single-chip design, ran at a faster clock speed than the R8000, and had larger 32 KB primary instruction and data caches. It was also superscalar, but its major innovation was out-of-order execution. Even with a single memory pipeline and simpler FPU, the vastly improved integer performance, lower price, and higher density made the R10000 preferable for most customers.

Recent designs have all been based upon the R10000 core. The R12000 used a 0.25 micrometre process to shrink the chip and achieve higher clock rates. The revised R14000 allowed higher clock rates with additional support for DDR SRAM in the off-chip cache, and a faster system bus clocked at 200 MHz for better throughput. Later iterations are named the R16000 and the R16000A and feature increased clock speed, additional L1 cache, and a smaller manufacturing process than their predecessors.

Other members of the MIPS family include the R6000, an ECL implementation of the MIPS architecture which was produced by Bipolar Integrated Technology. The R6000 microprocessor introduced the MIPS II instruction set. Its TLB and cache architecture are different from all other members of the MIPS family. The R6000 did not deliver the promised performance benefits, and although it saw some use in Control Data machines, it quickly disappeared from the mainstream market.

**MIPS Microprocessors**
Model	Frequency (MHz)	Year	Process (µm)	Transistors (Millions)	Die Size (mm²)	Pin Count	Power (W)	Voltage	Dcache (KB)	Icache (KB)	L2 Cache	L3 Cache
R2000	8-16.67	1985	2.0	0.11	?	?	?	?	32	64	None	None
R3000	12-40	1988	1.2	0.11	66.12	145	4	?	64	64	0-256 KB External	None
R4000	100	1991	0.8	1.35	213	179	15	5	8	8	1 MB External	None
R4400	100-250	1992	0.6	2.3	186	179	15	5	16	16	1-4 MB External	None
R4600	100-133	1994	0.64	2.2	77	179	4.6	5	16	16	512 KB External	None
R5000	150-200	1996	0.35	3.7	84	223	10	3.3	32	32	1 MB External	None
R8000	75-90	1994	0.7	2.6	299	591+591	30	3.3	16	16	4 MB External	None
R10000	150-250	1996	0.35, 0.25	6.7	299	599	30	3.3	32	32	1-4 MB External	None
R12000	270-400	1998	0.25, 0.18	6.9	204	600	20	4	32	32	2-8 MB External	None
RM7000	250-600	1998	0.25, 0.18, 0.13	18	91	304	10, 6, 3	3.3, 2.5, 1.5	16	16	256 KB Internal	1 MB External
R14000	500-600	2001	0.13	7.2	204	527	17	?	32	32	2-4 MB External	None
R16000	700-1000	2002	0.11	?	?	?	20	?	64	64	4-16 MB External	None

Note: These specifications are for common processor models. Variations exist, especially in Level 2 cache.

Note: The R8000 has a unique cache hierarchy named 'Data Streaming Cache' where there is 16 KB of L1 data cache for the integer chip with an external 4 MB L2 cache that served as the secondary unified cache for the integer chip but as the L1 data cache for the floating point chip.

Summary of R3000 instruction set Opcodes

Instructions are divided into three types: R, I and J. Every instruction starts with a 6-bit opcode. In addition to the opcode, R-type instructions specify three registers, a shift amount field, and a function field; I-type instructions specify two registers and a 16-bit immediate value; J-type instructions follow the opcode with a 26-bit jump target.^[5]^[6]

The following are the three formats used for the core instruction set:

Type	-31- format (bits) -0-
R	opcode (6)	rs (5)	rt (5)	rd (5)	shamt (5)	funct (6)
I	opcode (6)	rs (5)	rt (5)	immediate (16)
J	opcode (6)	address (26)
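The three formats can be sketched as plain bit-packing. The helper names below are hypothetical; the field widths and positions are those of the table above. The example values (funct 20 hex for add, opcode 8 for addi, opcode 2 for j) are the standard MIPS encodings for those instructions.

```python
def encode_r(opcode, rs, rt, rd, shamt, funct):
    """R-type: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)


def encode_i(opcode, rs, rt, immediate):
    """I-type: opcode(6) rs(5) rt(5) immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


def encode_j(opcode, address):
    """J-type: opcode(6) address(26)."""
    return (opcode << 26) | (address & 0x03FFFFFF)


def decode(word):
    """Extract every field; which ones are meaningful depends on the format."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
        "immediate": word & 0xFFFF,
        "address": word & 0x03FFFFFF,
    }


if __name__ == "__main__":
    print(hex(encode_r(0, 1, 2, 3, 0, 0x20)))   # add  $3,$1,$2
    print(hex(encode_i(8, 0, 1, 100)))          # addi $1,$0,100
    print(hex(encode_j(2, 0x100)))              # j    0x100 (word address)
```

Because the opcode always occupies the top six bits, a decoder can read it first and only then decide how to interpret the remaining 26 bits.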

MIPS Assembly Language

These are assembly language instructions that have direct hardware implementation, as opposed to pseudoinstructions which are translated into multiple real instructions before being assembled.

In the following, the register letters d, t, and s are placeholders for (register) numbers or register names.
"C" denotes a constant ("immediate").
All the following instructions are native instructions.
Opcodes and funct codes are in hexadecimal.
The MIPS32 Instruction Set states that the word "unsigned" in the names of the Add and Subtract instructions is a misnomer. The difference between the signed and unsigned versions of these instructions is not sign extension (or lack thereof) of the operands, but whether a trap is executed on overflow (e.g. Add) or the overflow is ignored (Add unsigned). An immediate operand C to these instructions is always sign-extended.
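The distinction can be sketched by modeling 32-bit registers as Python integers. This is a minimal illustration with hypothetical helper names; the trap is represented by a Python exception, which is an assumption of the sketch and not how hardware reports it.

```python
MASK32 = 0xFFFFFFFF


class OverflowTrap(Exception):
    """Stands in for the hardware overflow trap."""


def to_signed32(value):
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def sign_extend16(immediate):
    immediate &= 0xFFFF
    return immediate - 0x10000 if immediate & 0x8000 else immediate


def addu(s, t):
    """Add unsigned: same bit pattern as add, overflow is ignored."""
    return (s + t) & MASK32


def add(s, t):
    """Add: traps if the signed result does not fit in 32 bits."""
    result = to_signed32(s) + to_signed32(t)
    if not -(1 << 31) <= result < (1 << 31):
        raise OverflowTrap("signed overflow in add")
    return result & MASK32


def addiu(s, immediate):
    """Add immediate unsigned: the immediate is still sign-extended."""
    return addu(s, sign_extend16(immediate) & MASK32)


if __name__ == "__main__":
    print(hex(addu(0x7FFFFFFF, 1)))   # wraps silently
    print(addiu(5, 0xFFFF))           # 0xFFFF is treated as -1
    try:
        add(0x7FFFFFFF, 1)
    except OverflowTrap as trap:
        print("trap:", trap)
```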

Integer

MIPS has 32 integer ("fast") registers. Data must be in registers to perform arithmetic. Register $0 always holds 0 and register $1 is normally reserved for the assembler (for handling pseudo instructions and large constants).

The encoding shows which bits correspond to which parts of the instruction. A hyphen (-) is used to indicate don't cares.

Category	Name	Instruction syntax	Meaning	Format	Opcode (hex)	Funct (hex)	Notes/Encoding
Arithmetic	Add	add $d,$s,$t	$d = $s + $t	R	0	20	adds two registers, executes a trap on overflow 000000ss sssttttt ddddd--- --100000
	Add unsigned	addu $d,$s,$t	$d = $s + $t	R	0	21	as above but ignores an overflow 000000ss sssttttt ddddd--- --100001
	Subtract	sub $d,$s,$t	$d = $s - $t	R	0	22	subtracts two registers, executes a trap on overflow 000000ss sssttttt ddddd--- --100010
	Subtract unsigned	subu $d,$s,$t	$d = $s - $t	R	0	23	as above but ignores an overflow 000000ss sssttttt ddddd--- --100011
	Add immediate	addi $t,$s,C	$t = $s + C (signed)	I	8	-	Used to add sign-extended constants (and also to copy one register to another "addi $1, $2, 0"), executes a trap on overflow 001000ss sssttttt CCCCCCCC CCCCCCCC
	Add immediate unsigned	addiu $t,$s,C	$t = $s + C (unsigned)	I	9	-	as above but ignores an overflow, C still sign-extended 001001ss sssttttt CCCCCCCC CCCCCCCC
	Multiply	mult $x,$y	LO = (($x * $y) << 32) >> 32; HI = ($x * $y) >> 32;	R	0	18	Multiplies two registers and puts the 64-bit result in two special registers, LO and HI. Alternatively, one could say the result of this operation is: (int HI,int LO) = (64-bit) $x * $y . See mfhi and mflo for accessing the LO and HI registers.
	Divide	div $x, $y	LO = $x / $y HI = $x % $y	R	0	1A	Divides two registers and puts the 32-bit integer result in LO and the remainder in HI.^[5]
	Divide unsigned	divu $x, $y	LO = $x / $y HI = $x % $y	R	0	1B	As above, but treats both operands as unsigned.
Data Transfer	Load double word	ld $x,C($y)	$x = Memory[$y + C]	I	23	-	loads the word stored at MEM[$y+C] and the following 7 bytes into $x and the next register.
	Load word	lw $x,C($y)	$x = Memory[$y + C]	I	23	-	loads the word stored at MEM[$y+C] and the following 3 bytes.
	Load halfword	lh $x,C($y)	$x = Memory[$y + C] (signed)	I	21	-	loads the halfword stored at MEM[$y+C] and the following byte. Sign is extended to the width of the register.
	Load halfword unsigned	lhu $x,C($y)	$x = Memory[$y + C] (unsigned)	I	25	-	As above without sign extension.
	Load byte	lb $x,C($y)	$x = Memory[$y + C] (signed)	I	20	-	loads the byte stored at MEM[$y+C]. Sign is extended to the width of the register.
	Load byte unsigned	lbu $x,C($y)	$x = Memory[$y + C] (unsigned)	I	24	-	As above without sign extension.
	Store double word	sd $x,C($y)	Memory[$y + C] = $x	I		-	stores two words from $x and the next register into MEM[$y+C] and the following 7 bytes. The order of the operands is a large source of confusion.
	Store word	sw $x,C($y)	Memory[$y + C] = $x	I	2B	-	stores a word into MEM[$y+C] and the following 3 bytes. The order of the operands is a large source of confusion.
	Store half	sh $x,C($y)	Memory[$y + C] = $x	I	29	-	stores the first half of a register (a halfword) into MEM[$y+C] and the following byte.
	Store byte	sb $x,C($y)	Memory[$y + C] = $x	I	28	-	stores the first fourth of a register (a byte) into MEM[$y+C].
	Load upper immediate	lui $x,C	$x = C << 16	I	F	-	loads a 16-bit immediate operand into the upper 16 bits of the register specified. Maximum value of the constant is 2¹⁶-1
	Move from high	mfhi $x	$x = HI	R	0	10	Moves a value from HI to a register. Do not use a multiply or a divide instruction within two instructions of mfhi (that action is undefined because of the MIPS pipeline).
	Move from low	mflo $x	$x = LO	R	0	12	Moves a value from LO to a register. Do not use a multiply or a divide instruction within two instructions of mflo (that action is undefined because of the MIPS pipeline).
	Move from Control Register	mfcZ $x, $y	$x = Coprocessor[Z].ControlRegister[$y]	R	0		Moves a 4 byte value from a Coprocessor Z Control register to a general purpose register. Sign extension.
	Move to Control Register	mtcZ $x, $y	Coprocessor[Z].ControlRegister[$y] = $x	R	0		Moves a 4 byte value from a general purpose register to a Coprocessor Z Control register. Sign extension.
Logical	And	and $d,$s,$t	$d = $s & $t	R	0	24	Bitwise and 000000ss sssttttt ddddd--- --100100
	And immediate	andi $t,$s,C	$t = $s & C	I	C	-	001100ss sssttttt CCCCCCCC CCCCCCCC
	Or	or $x,$y,$z	$x = $y \| $z	R	0	25	Bitwise or
	Or immediate	ori $x,$y,C	$x = $y \| C	I	D	-
	Exclusive or	xor $x,$y,$z	$x = $y ^ $z	R	0	26
	Nor	nor $x,$y,$z	$x = ~ ($y \| $z)	R	0	27	Bitwise nor
	Set on less than	slt $x,$y,$z	$x = ($y < $z)	R	0	2A	Tests if one register is less than another.
	Set on less than immediate	slti $x,$y,C	$x = ($y < C)	I	A	-	Tests if one register is less than a constant.
Bitwise Shift	Shift left logical	sll $x,$y,C	$x = $y << C	R	0	0	shifts C number of bits to the left (multiplies by $2^{C}$ )
	Shift right logical	srl $x,$y,C	$x = $y >> C	R	0	2	shifts C number of bits to the right - zeros are shifted in (divides by $2^{C}$ ). Note that this instruction only works as division of a two's complement number if the value is positive.
	Shift right arithmetic	sra $x,$y,C	$\$x = (\$y \gg C) + \Big(\sum_{n=1}^{C} 2^{32-n}\Big)\cdot(\$y \gg 31)$, with $\gg$ taken as a logical shift	R	0	3	shifts C number of bits to the right - the sign bit is shifted in (divides a 2's complement number by $2^{C}$ )
Conditional branch	Branch on equal	beq $s,$t,C	if ($s == $t) go to PC+4+4*C	I	4	-	Goes to the instruction at the specified address if two registers are equal. 000100ss sssttttt CCCCCCCC CCCCCCCC
	Branch on not equal	bne $x,$y,C	if ($x != $y) go to PC+4+4*C	I	5	-	Goes to the instruction at the specified address if two registers are not equal.
Unconditional jump	Jump	j C	PC = PC+4[31:28] . C*4	J	2	-	Unconditionally jumps to the instruction at the specified address.
	Jump register	jr $x	goto address $x	R	0	8	Jumps to the address contained in the specified register
	Jump and link	jal C	$31 = PC + 8; PC = PC+4[31:28] . C*4	J	3	-	For procedure call - used to call a subroutine, $31 holds the return address; returning from a subroutine is done by: jr $31. The return address is PC + 8, not PC + 4, due to the use of a branch delay slot which forces the instruction after the jump to be executed
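The Meaning column for the multiply, divide, and shift rows can be checked with a small model that treats registers as 32-bit values. The helper names are hypothetical, and the sketch deliberately leaves out what happens on division by zero. The function `sra_by_formula` spells out the arithmetic shift as a logical shift plus a fill term that sets the top C bits when the sign bit is 1.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(value):
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mult(x, y):
    """Signed 32x32 multiply; returns (HI, LO) of the 64-bit product."""
    product = (to_signed32(x) * to_signed32(y)) & MASK64
    return (product >> 32) & MASK32, product & MASK32


def divu(x, y):
    """Unsigned divide; returns (HI, LO) = (remainder, quotient).

    Division by zero is not modeled here.
    """
    x &= MASK32
    y &= MASK32
    return x % y, x // y


def srl(y, c):
    """Shift right logical: zeros are shifted in."""
    return (y & MASK32) >> c


def sra(y, c):
    """Shift right arithmetic: copies of the sign bit are shifted in."""
    return (to_signed32(y) >> c) & MASK32


def sra_by_formula(y, c):
    """Logical shift plus a fill of the top c bits, scaled by the sign bit."""
    fill = sum(2 ** (32 - n) for n in range(1, c + 1))
    return srl(y, c) + fill * srl(y, 31)


if __name__ == "__main__":
    print([hex(v) for v in mult(0xFFFFFFFF, 2)])   # -1 * 2
    print(divu(7, 2))
    print(hex(srl(0x80000000, 4)), hex(sra(0x80000000, 4)))
```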

NOTE: In MIPS assembler code, the offset for branching instructions can be represented by a label elsewhere in the code.
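
As a rough illustration of what the assembler does with such a label, the sketch below derives the 16-bit field C from the "PC+4+4*C" rule in the table, and the jump target from the "(PC+4)[31:28] . C*4" rule. The function names and the example addresses are assumptions made for this example.

```python
def branch_offset(branch_pc, label_addr):
    """Return the 16-bit field C such that branch_pc + 4 + 4*C == label_addr."""
    delta = label_addr - (branch_pc + 4)
    if delta % 4 != 0:
        raise ValueError("branch target must be word aligned")
    c = delta // 4
    if not -(1 << 15) <= c < (1 << 15):
        raise ValueError("target is out of range for a 16-bit offset")
    return c & 0xFFFF  # two's complement encoding of the field

def jump_target(jump_pc, c):
    """Target of j/jal: upper 4 bits of PC+4 joined with the 26-bit field C times 4."""
    return ((jump_pc + 4) & 0xF0000000) | ((c & 0x03FFFFFF) << 2)

# A branch at 0x00400010 that loops back to a label at 0x00400000.
print(hex(branch_offset(0x00400010, 0x00400000)))  # 0xfffb, i.e. C = -5
print(hex(jump_target(0x00400010, 0x0100000)))     # 0x400000
```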

NOTE: There is no corresponding "load lower immediate" instruction; this can be done by using addi (add immediate) or ori (or immediate) with the register $0 (whose value is always zero). For example, both addi $1, $0, 100 and ori $1, $0, 100 load the decimal value 100 into register $1.
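
The two forms agree for 100 but not for every 16-bit constant, because addi sign-extends its immediate while ori zero-extends it. A minimal Python sketch of that difference, with hypothetical helper names:

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm):
    """Sign-extend a 16-bit immediate to a Python integer."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def addi_from_zero(imm):
    """Model of addi $1, $0, imm: the immediate is sign-extended before the add."""
    return (0 + sign_extend16(imm)) & MASK32

def ori_from_zero(imm):
    """Model of ori $1, $0, imm: the immediate is zero-extended before the or."""
    return 0 | (imm & 0xFFFF)

print(addi_from_zero(100), ori_from_zero(100))                    # 100 100
print(hex(addi_from_zero(0x8000)), hex(ori_from_zero(0x8000)))    # 0xffff8000 0x8000
```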

NOTE: Subtracting an immediate can be done by adding the negation of that value as the immediate.

Floating point

MIPS has 32 floating-point registers. Registers are paired for double-precision numbers. Odd-numbered registers cannot be used for arithmetic or branching, only for transferring the right "half" of a double-precision register pair.

Category	Name	Instruction syntax	Meaning	Format/opcode/funct	Notes/Encoding
Arithmetic	FP add single	add.s $x,$y,$z	$x = $y + $z		Floating-Point add (single precision)
	FP subtract single	sub.s $x,$y,$z	$x = $y - $z		Floating-Point subtract (single precision)
	FP multiply single	mul.s $x,$y,$z	$x = $y * $z		Floating-Point multiply (single precision)
	FP divide single	div.s $x,$y,$z	$x = $y / $z		Floating-Point divide (single precision)
	FP add double	add.d $x,$y,$z	$x = $y + $z		Floating-Point add (double precision)
	FP subtract double	sub.d $x,$y,$z	$x = $y - $z		Floating-Point subtract (double precision)
	FP multiply double	mul.d $x,$y,$z	$x = $y * $z		Floating-Point multiply (double precision)
	FP divide double	div.d $x,$y,$z	$x = $y / $z		Floating-Point divide (double precision)
Data Transfer	Load word coprocessor	lwcZ $x,CONST ($y)	Coprocessor[Z].DataRegister[$x] = Memory[$y + CONST]	I	Loads the 4-byte word stored at MEM[$y+CONST] into a coprocessor data register. The offset CONST is sign-extended.
Data Transfer	Store word coprocessor	swcZ $x,CONST ($y)	Memory[$y + CONST] = Coprocessor[Z].DataRegister[$x]	I	Stores the 4-byte word held by a coprocessor data register to MEM[$y+CONST]. The offset CONST is sign-extended.
Logical	FP compare single (eq,ne,lt,le,gt,ge)	c.lt.s $f2,$f4	if ($f2 < $f4) cond=1; else cond=0		Floating-point compare less than single precision
Logical	FP compare double (eq,ne,lt,le,gt,ge)	c.lt.d $f2,$f4	if ($f2 < $f4) cond=1; else cond=0		Floating-point compare less than double precision
Branch	Branch on FP true	bc1t 100	if (cond == 1) go to PC+4+100		PC-relative branch if the FP condition is true
Branch	Branch on FP false	bc1f 100	if (cond == 0) go to PC+4+100		PC-relative branch if the FP condition is false

Pseudo instructions

These instructions are accepted by the MIPS assembler, but they are not real instructions within the MIPS instruction set. Instead, the assembler translates them into sequences of real instructions.

Name	Instruction syntax	Real instruction translation	Meaning
Load Address	la $1, LabelAddr	lui $1, LabelAddr[31:16]; ori $1,$1, LabelAddr[15:0]	$1 = Label Address
Load Immediate	li $1, IMMED[31:0]	lui $1, IMMED[31:16]; ori $1,$1, IMMED[15:0]	$1 = 32 bit Immediate value
Branch if greater than	bgt $rs,$rt,Label	slt $at,$rt,$rs; bne $at,$zero,Label	if(R[rs]>R[rt]) PC=Label
Branch if less than	blt $rs,$rt,Label	slt $at,$rs,$rt; bne $at,$zero,Label	if(R[rs]<R[rt]) PC=Label
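Branch if greater than or equal	bge $rs,$rt,Label	slt $at,$rs,$rt; beq $at,$zero,Label	if(R[rs]>=R[rt]) PC=Label
Branch if less than or equal	ble $rs,$rt,Label	slt $at,$rt,$rs; beq $at,$zero,Label	if(R[rs]<=R[rt]) PC=Label
Branch if greater than unsigned	bgtu $rs,$rt,Label	sltu $at,$rt,$rs; bne $at,$zero,Label	if(R[rs]>R[rt]) PC=Label (unsigned comparison)
Branch if greater than zero	bgtz $rs,Label	(implemented directly in hardware; no translation needed)	if(R[rs]>0) PC=Label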
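
The translations in the table follow a regular pattern, which the Python sketch below reproduces for li, la and the signed comparison branches. The `expand` function is purely illustrative; the bge and ble expansions are assumed to mirror the bgt and blt rows with beq in place of bne, and a real assembler may emit a shorter sequence when a constant fits in 16 bits.

```python
def expand(pseudo, *ops):
    """Expand a few pseudo-instructions into lists of real MIPS instructions."""
    if pseudo in ("li", "la"):
        reg, value = ops  # for la, value is the label's resolved 32-bit address
        upper, lower = (value >> 16) & 0xFFFF, value & 0xFFFF
        return [f"lui {reg}, 0x{upper:04x}", f"ori {reg}, {reg}, 0x{lower:04x}"]
    # pseudo -> (swap slt operands?, branch that tests $at)
    branches = {
        "blt": (False, "bne"),  # rs <  rt: slt rs,rt yields 1
        "bgt": (True, "bne"),   # rs >  rt: slt rt,rs yields 1
        "bge": (False, "beq"),  # rs >= rt: slt rs,rt yields 0
        "ble": (True, "beq"),   # rs <= rt: slt rt,rs yields 0
    }
    swap, branch = branches[pseudo]
    rs, rt, label = ops
    first, second = (rt, rs) if swap else (rs, rt)
    return [f"slt $at, {first}, {second}", f"{branch} $at, $zero, {label}"]

print(expand("li", "$t0", 0x12345678))
# ['lui $t0, 0x1234', 'ori $t0, $t0, 0x5678']
print(expand("bge", "$s0", "$s1", "done"))
# ['slt $at, $s0, $s1', 'beq $at, $zero, done']
```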

Some other important instructions

NOP (no operation) (machine code 0x00000000, interpreted by CPU as sll $0,$0,0)
break (breaks the program, used by debuggers)
syscall (used for system calls to the operating system)
a full set of Floating point instructions for both single precision and double precision operands
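
To see why the all-zero word decodes as sll $0,$0,0, the sketch below packs the six R-format fields (opcode, rs, rt, rd, shamt, funct, from most to least significant bit) into a 32-bit word. The function names are made up for this example.

```python
def encode_r(opcode, rs, rt, rd, shamt, funct):
    """Pack the R-format fields (6, 5, 5, 5, 5 and 6 bits wide) into one word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_sll(rd, rt, shamt):
    """sll rd, rt, shamt uses opcode 0 and funct 0; the rs field is unused."""
    return encode_r(0, 0, rt, rd, shamt, 0)

print(f"{encode_sll(0, 0, 0):08x}")  # 00000000, the NOP encoding
print(f"{encode_sll(8, 9, 2):08x}")  # 00094080, sll $8, $9, 2
```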

Compiler Register Usage

Main article: calling convention#MIPS

The hardware architecture specifies that:

General purpose register $0 always returns a value of 0.
General purpose register $31 is used as the link register for jump and link instructions.
HI and LO hold the multiplier/divider results and are read with the mfhi (move from HI) and mflo (move from LO) instructions.

These are the only hardware restrictions on the usage of the general purpose registers.
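
As a rough model of how results land in HI and LO, the Python sketch below assumes the usual behaviour of the mult and divu instructions: a multiply leaves the upper and lower 32 bits of the 64-bit product in HI and LO, and a divide leaves the remainder in HI and the quotient in LO. The function names are illustrative.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(value):
    """Interpret a 32-bit pattern as a two's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def mult(rs, rt):
    """Signed 32x32 multiply; returns (HI, LO) as 32-bit patterns."""
    product = (to_signed32(rs) * to_signed32(rt)) & MASK64
    return product >> 32, product & MASK32

def divu(rs, rt):
    """Unsigned divide; returns (HI, LO) = (remainder, quotient)."""
    rs, rt = rs & MASK32, rt & MASK32
    return rs % rt, rs // rt

hi, lo = mult(0x00010000, 0x00010000)  # 65536 * 65536 = 2**32
print(hex(hi), hex(lo))                # 0x1 0x0
hi, lo = divu(17, 5)
print(hi, lo)                          # 2 3
```

A program would then copy the halves it needs into general purpose registers with mfhi and mflo.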

The various MIPS tool-chains implement specific calling conventions that further restrict how the registers are used. These calling conventions are maintained entirely by the tool-chain software and are not required by the hardware.

**Registers**
Name	Number	Use	Callee must preserve?
$zero	$0	constant 0	N/A
$at	$1	assembler temporary	no
$v0–$v1	$2–$3	Values for function returns and expression evaluation	no
$a0–$a3	$4–$7	function arguments	no
$t0–$t7	$8–$15	temporaries	no
$s0–$s7	$16–$23	saved temporaries	yes
$t8–$t9	$24–$25	temporaries	no
$k0–$k1	$26–$27	reserved for OS kernel	no
$gp	$28	global pointer	yes
$sp	$29	stack pointer	yes
$fp	$30	frame pointer	yes
$ra	$31	return address	N/A

Registers that are preserved across a call are registers that (by convention) will not be changed by a system call or procedure (function) call. For example, a procedure that needs to use $s-registers must first save them to the stack and restore them before returning, and $sp and $fp are always adjusted by constants on entry and restored once the procedure is done with them (and with the memory they point to). By contrast, $ra is changed automatically by any normal function call (one that uses jal), and a program must save any $t-registers whose values it needs after a call before making that call.
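
The "Callee must preserve?" column can be turned into a small lookup, sketched below in Python. The set contents follow the table above; the function name and the example register list are hypothetical.

```python
# Registers the table marks "yes" under "Callee must preserve?".
CALLEE_SAVED = {f"$s{i}" for i in range(8)} | {"$gp", "$sp", "$fp"}

# Registers the table marks "no" that ordinary code uses freely.
CALLER_SAVED = ({f"$t{i}" for i in range(10)} | {f"$a{i}" for i in range(4)}
                | {"$v0", "$v1", "$at"})

def split_by_responsibility(used):
    """Split the registers a procedure writes into (callee-saved, caller-saved)."""
    used = set(used)
    return sorted(used & CALLEE_SAVED), sorted(used & CALLER_SAVED)

callee, caller = split_by_responsibility(["$t0", "$s1", "$s0", "$v0"])
print(callee)  # ['$s0', '$s1']: the procedure must save and restore these
print(caller)  # ['$t0', '$v0']: the caller saves these if it still needs them
```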

Simulators

Open Virtual Platforms (OVP) [1] includes the freely available simulator OVPsim, a library of models of processors, peripherals and platforms, and APIs that let users develop their own models. The models in the library are open source, written in C, and include the MIPS 4K, 24K and 34K cores. They are created and maintained by Imperas [2] and, in partnership with MIPS Technologies, have been tested and assigned the MIPS-Verified(tm) mark. The OVP site also includes models of ARM, Tensilica and OpenCores/OpenRISC processors. Sample MIPS-based platforms include both bare-metal environments and platforms for booting unmodified Linux binary images; they are available as source or binaries. OVPsim, also developed and maintained by Imperas, runs at hundreds of millions of instructions per second and is built to handle multicore architectures. The MIPS OVPsim simulators/emulators can be downloaded from [3].

SPIM is a freely available MIPS32 simulator (earlier versions simulated only the R2000/R3000) that runs on several operating systems (specifically Unix or GNU/Linux; Mac OS X; MS Windows 95, 98, NT, 2000, XP; and DOS). It is well suited to learning MIPS assembly language programming and the general concepts of RISC assembly language programming: http://www.cs.wisc.edu/~larus/spim.html

EduMIPS64 is a GPL-licensed, cross-platform graphical MIPS64 CPU simulator written in Java/Swing. It supports a wide subset of the MIPS64 ISA and lets the user see graphically what happens in the pipeline when the CPU runs an assembly program. It is intended for education and is used in some computer architecture courses at universities around the world. More information is available at http://www.edumips.org

MARS is another GUI-based MIPS emulator designed for use in education, specifically with Patterson and Hennessy's Computer Organization and Design. More information is available at http://courses.missouristate.edu/KenVollmar/MARS/

More advanced free MIPS emulators are available from the GXemul (formerly known as the mips64emul project) and QEMU projects, which emulate not only the various MIPS III and higher microprocessors (from the R4000 through the R10000), but also entire computer systems which use the microprocessors. For example, GXemul can emulate both a DECstation with a MIPS R4400 CPU (and boot to Ultrix), and an SGI O2 with a MIPS R10000 CPU (although the ability to boot Irix is limited), among others, as well as the various framebuffers, SCSI controllers, and the like which comprise those systems.

Commercial simulators are available, especially for embedded use of MIPS processors, for example Virtutech Simics (MIPS 4Kc and 5Kc, PMC RM9000, QED RM7000), VaST Systems (R3000, R4000), and CoWare (MIPS4KE, MIPS24K, MIPS25Kf and MIPS34K).

Trivia

"Mips" the rabbit in Super Mario 64 is named after the technology, which was used by the Nintendo 64.

Notes

^ David A. Patterson & John L. Hennessy. Computer Organization and Design, 3rd edition. Morgan Kaufmann Publishers. ISBN 1-55860-604-1. Page 63.
^ Jochen Liedtke (1995). On micro kernel construction. 15th Symposium on Operating Systems Principles, Copper Mountain Resort, Colorado.
^ SGI announcing the end of MIPS
^ NEC Offers Two High Cost Performance 64-bit RISC Microprocessors
^ MIPS R3000 Instruction Set Summary
^ MIPS Instruction Reference

Further reading

Patterson, David A; John L. Hennessy. Computer Organization and Design: The Hardware/Software Interface. Morgan Kaufmann Publishers. ISBN 1-55860-604-1.
Sweetman, Dominic. See MIPS Run, 2nd edition. Morgan Kaufmann Publishers. ISBN 0-12088-421-6.
Sweetman, Dominic. See MIPS Run. Morgan Kaufmann Publishers. ISBN 1-55860-410-3.
Farquhar, Erin; Philip Bunce. MIPS Programmer's Handbook. Morgan Kaufmann Publishers. ISBN 1-55860-297-6.

See also

DLX, a very similar architecture designed by John L. Hennessy (creator of MIPS) for teaching purposes
Loongson, a MIPS-like processor architecture developed at Chinese Academy of Sciences
MIPS-X, developed as a follow-on project to the MIPS architecture
Mongoose-V, a radiation-hardened version of the MIPS R3000 used in spacecraft
SPIM, a MIPS processor simulator

External links

Wikibooks has a book on the topic of MIPS Assembly
