The Graphics Card produces all the signals that your monitor needs to display.
The two most important parts of any Graphics Card are the framebuffer and the graphics processing unit.
Framebuffer
Framebuffer memory is a linear address space (physically: fast RAM) containing the color values of the individual pixels for at least one complete image (i.e. one refresh).
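Because the framebuffer is linear, the position of a pixel in memory follows directly from its coordinates. The following is a minimal sketch of that calculation, not the layout of any particular card: the function names, the packed 24-bit RGB format and the optional `pitch` parameter (bytes per row, assumed here to equal width times bytes per pixel when not given) are all assumptions made for illustration.

```python
def pixel_offset(x, y, width, bytes_per_pixel, pitch=None):
    """Byte offset of pixel (x, y) in a linear framebuffer."""
    if pitch is None:
        # Assumption: rows are stored back to back with no padding.
        pitch = width * bytes_per_pixel
    if not (0 <= x < width) or y < 0:
        raise ValueError("pixel coordinates out of range")
    return y * pitch + x * bytes_per_pixel


def put_pixel(framebuffer, x, y, width, rgb):
    """Write one packed 24-bit RGB pixel into a bytearray framebuffer."""
    offset = pixel_offset(x, y, width, 3)
    framebuffer[offset:offset + 3] = bytes(rgb)


# A 640x480 image at 3 bytes per pixel, held in ordinary memory.
fb = bytearray(640 * 480 * 3)
put_pixel(fb, 10, 2, 640, (255, 0, 0))
```

Real hardware often pads each row to an alignment boundary, which is why the sketch keeps the row pitch separate from the visible width.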
Types in use
Graphics cards typically do not use the standard DDR, DDR2 or DDR3 memory used by the main system; they typically use GDDR2, GDDR3, GDDR4 or GDDR5.
DDR2 or DDR3 is currently only used by integrated graphics (shared with the main system memory) or on very low end graphics cards.
GDDR, or Graphics Double Data Rate memory, is a class of memory that was introduced to run faster and have more bandwidth than typical DDR memory, making it more suitable for high performance graphics.
GDDR2 is not really 'GDDR' in the sense of the later versions, but more of a bridge point between DDR2 and GDDR3. It suffered from heat issues due to inappropriate voltages.
ATI Technologies, together with JEDEC, improved upon the GDDR2 design and created GDDR3. It has a similar technology base to GDDR2, but with enhanced power efficiency, allowing it to run cooler and faster. Internal technologies allow it to run faster than DDR2, better facilitating graphics related functions.
GDDR4 was introduced in 2006 with ATI's Radeon X1950 XTX. It is another version of GDDR specified by JEDEC, designed to be faster than all prior GDDR versions. Like GDDR3, it has no relationship to DDR3. GDDR4 SDRAM introduced DBI (Data Bus Inversion) and Multi-Preamble to reduce data transmission delay. Prefetch was increased from 4 to 8 bits, and the maximum number of memory banks was increased to 8. To achieve the same bandwidth as GDDR3 SDRAM, the GDDR4 core runs at half the speed of a GDDR3 core. Core voltage was decreased to 1.5 V.
On the signaling front, GDDR4 expands the chip I/O buffer to 8 bits per two cycles, allowing for greater sustained bandwidth during burst transmission, but at the expense of significantly increased CAS latency (CL). The extra latency is caused mainly by the halved number of address/command pins and by the half-clocked DRAM cells, compared to GDDR3. The number of addressing pins was reduced to half that of the GDDR3 core, and the freed pins were used for power and ground, which also increases latency. Another advantage of GDDR4 is power efficiency: running at 2.4 Gbit/s, it uses 45% less power than GDDR3 chips running at 2.0 Gbit/s.
In Samsung's GDDR4 SDRAM datasheet, it is referred to as 'GDDR4 SGRAM', or 'Graphics Double Data Rate version 4 Synchronous Graphics RAM'. However, the essential block write feature is not available, so it is not classified as SGRAM.
GDDR4 is presently in use in high end Radeon cards, such as the HD3870.
GDDR5 was first introduced in 2008 by ATI Technologies with their Radeon HD4870. GDDR5 is the successor to GDDR4 and, unlike all previous versions, has two parallel links providing a doubled data rate, essentially 'quad pumping' the bus. The 'memory clock' (such as 900MHz on the Radeon HD4870) is therefore quadrupled to give an effective rating (for example 900MHz x 4 = 3,600MHz). This has enabled GDDR5 memory to reach a level of performance greater than previous generations.
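The quad-pumping arithmetic can be taken one step further to estimate peak memory bandwidth. This is only a rough sketch: the function names are made up for illustration, and the 256-bit bus width is an assumption that does not come from the text above.

```python
def effective_rate_mhz(memory_clock_mhz, transfers_per_clock=4):
    """Effective data rate of a 'quad pumped' memory bus."""
    return memory_clock_mhz * transfers_per_clock


def peak_bandwidth_gb_s(effective_mhz, bus_width_bits):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bits_per_second = effective_mhz * 1e6 * bus_width_bits
    return bits_per_second / 8 / 1e9


rate = effective_rate_mhz(900)              # 900MHz x 4, as in the text
bandwidth = peak_bandwidth_gb_s(rate, 256)  # assumption: 256-bit memory bus
print(rate, "MHz effective,", bandwidth, "GB/s peak")
```

The result is a theoretical ceiling; sustained bandwidth in real workloads is lower.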
NVIDIA has recently acknowledged the capabilities of GDDR5 and will be moving to GDDR5 in the near future. It is speculated this will occur with the GT200b 55nm core (a die shrink and improved version of the existing GT200 core in use with the GTX 200 series) or with an upcoming generation codenamed D12U.
Graphics Processing Unit
The graphics processing unit is the central time base logic. It produces the V-Sync and H-Sync signals that move the electron beam on the monitor, and it simultaneously produces the address needed to read the current pixel out of the frame buffer.
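The time base role can be modelled as two counters driven by the pixel clock. The sketch below is a software model, not a description of any real chip: `Timing` and `scanout_state` are hypothetical names, sync polarity is ignored, and the porch and sync widths are the commonly quoted values for 640x480 at 60Hz, which are an assumption not taken from this article.

```python
from collections import namedtuple

# Lengths are in pixels for horizontal timing and in lines for vertical timing.
Timing = namedtuple("Timing", "visible front_porch sync back_porch")


def total(t):
    """Full length of one line (or frame), including blanking."""
    return t.visible + t.front_porch + t.sync + t.back_porch


def in_sync(position, t):
    """True while the counter is inside the sync pulse."""
    start = t.visible + t.front_porch
    return start <= position < start + t.sync


def scanout_state(tick, h, v):
    """Return (hsync, vsync, pixel_address) for a given pixel-clock tick.

    pixel_address is an index into the framebuffer in pixels, or None
    while the beam is in a blanking interval.
    """
    x = tick % total(h)
    y = (tick // total(h)) % total(v)
    visible = x < h.visible and y < v.visible
    address = y * h.visible + x if visible else None
    return in_sync(x, h), in_sync(y, v), address


H_640 = Timing(visible=640, front_porch=16, sync=96, back_porch=48)
V_480 = Timing(visible=480, front_porch=10, sync=2, back_porch=33)
```

Calling `scanout_state(tick, H_640, V_480)` for successive ticks walks through one frame; the address is only valid while both counters are inside the visible area.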
Difference from CPUs
Graphics processing units typically have a much higher computational ability than a central processing unit. They are somewhat similar in that both perform calculations on the information they are given. However, the architecture used in GPUs differs greatly from that of CPUs.
Since the Xbox 360's Xenos (or R500), Radeon R600 (HD2K) and GeForce 8K series, GPUs typically consist of a large number of stream processors (also called Unified Shaders in GPUs specifically), which are essentially programmable mini-cores. This provides more flexibility in the computational ability of a graphics card, whereas previous generations had separate types of shaders (geometry, vertex and pixel).
Recently, with the advent of NVIDIA's CUDA, which allows calculations normally done on the CPU to run on stream processors, the company has begun to call its shaders 'cores', which may be slightly misleading.
Number of 'shaders'
Typically, as mentioned prior, a GPU consists of stream processors. The number varies, usually with higher performance GPU models having a larger number of shaders. The top of the line NVIDIA GTX 280, for example, has 240, while the low end has a mere 32.
ATI's flagship solo-processor HD4870 has, on the other hand, a whopping 800 shaders.
However, one should not think that the ATI solution is automatically superior, as NVIDIA's shaders are different from ATI's. An individual shader from an NVIDIA solution is frequently more powerful than an individual shader from ATI.
ATI's shaders share the same clock speed as the core (such as 750MHz on the HD4870) while NVIDIA's have their own separate clock (such as 1296MHz on the GTX 280, whereas the core is clocked at 602MHz).
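A rough way to compare the two designs is to multiply shader count by shader clock. The sketch below does this under a loud assumption: every shader is taken to complete one multiply-add (2 floating-point operations) per clock. That figure is not from the text, and real throughput depends heavily on the architecture and the workload. The function name is hypothetical.

```python
def peak_gflops(shader_count, shader_clock_mhz, flops_per_clock=2):
    """Theoretical peak in GFLOPS.

    flops_per_clock=2 is an assumption (one multiply-add per shader
    per clock), not a figure taken from the article.
    """
    return shader_count * shader_clock_mhz * flops_per_clock / 1000.0


cards = {
    "HD4870": (800, 750),    # shaders run at the core clock
    "GTX 280": (240, 1296),  # shaders have their own, higher clock
}

for name, (shaders, clock_mhz) in cards.items():
    peak = peak_gflops(shaders, clock_mhz)
    print(name, "total:", peak, "GFLOPS, per shader:", peak / shaders)
```

Under this assumption the HD4870 has the higher total, while each GTX 280 shader contributes more individually because of its higher clock.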
Resolutions
But, how many pixels are on your screen? Well, here is a brief list:
- 640x480 gives you 307,200 Pixels
- 800x600 gives you 480,000 Pixels
- 1024x768 gives you 786,432 Pixels
- 1280x1024 gives you 1,310,720 Pixels
- 1920x1200 gives you 2,304,000 Pixels
You then have what is called the colour depth. 2 Colour, or Monochrome, uses just black and white. 16 Colour uses a fixed palette of just 16 colours. 256 colours gives you slightly more choice, but older systems using this just look weird. 16-Bit Colour uses 65,536 colours, the maximum that can be represented in 16 bits. 24-Bit Colour uses 16,777,216 colours, and a 32-Bit pixel can take 4,294,967,296 distinct values (in practice usually 24 bits of colour plus 8 bits of alpha or padding). 24-Bit and 32-Bit are referred to as 'True Colour'.
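The number of colours is simply two raised to the number of bits per pixel, and the same bit count determines how much framebuffer memory one image needs. A small sketch, assuming tightly packed pixels with no row padding (the function names are illustrative only):

```python
def colour_count(bits_per_pixel):
    """Number of distinct values a pixel of the given depth can hold."""
    return 2 ** bits_per_pixel


def framebuffer_bytes(width, height, bits_per_pixel):
    """Bytes needed for one image, rounded up to a whole byte.

    Assumption: pixels are packed with no padding between rows.
    """
    return (width * height * bits_per_pixel + 7) // 8


for bits in (1, 4, 8, 16, 24, 32):
    size = framebuffer_bytes(1024, 768, bits)
    print(bits, "bpp:", colour_count(bits), "values,", size, "bytes at 1024x768")
```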
But, what does refresh rate mean? Well, typically, the contents of the framebuffer change rapidly, and the faster the refresh rate, the faster those changes are relayed to the monitor.
If a Graphics Card transmits a resolution of 1280x1024 at a refresh rate of 60Hz, 78,643,200 individual Pixel refreshes occur each second (1280 x 1024 x 60).
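Both the pixel counts and the refresh figure are plain products, which the following sketch reproduces. The function names are made up for illustration, and blanking intervals are ignored, so this counts visible pixels only.

```python
def pixel_count(width, height):
    """Number of visible pixels in one image."""
    return width * height


def pixels_per_second(width, height, refresh_hz):
    """Visible pixel refreshes per second (blanking intervals ignored)."""
    return pixel_count(width, height) * refresh_hz


for width, height in [(640, 480), (800, 600), (1024, 768),
                      (1280, 1024), (1920, 1200)]:
    print(f"{width}x{height}: {pixel_count(width, height):,} pixels")

print(f"1280x1024 at 60Hz: {pixels_per_second(1280, 1024, 60):,} per second")
```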
Graphics Processor Form Factors
Currently, graphics solutions come in various forms.
The most common form, 'graphics cards', 'video cards' or 'dedicated graphics', refers to a GPU with its own printed circuit board and RAM (the framebuffer). These come in the form of add-on cards which typically insert into a high speed interface slot on the motherboard. Early graphics solutions used the typical PCI slot of the time, before progressing to the Accelerated Graphics Port. The latest interface in use is the PCI-Express x16 slot. Dedicated graphics make up much of the graphics hierarchy, ranging from low to high end. Low end cards include the GeForce 9500GT or Radeon HD3450. High end cards are extremely powerful, with some being capable of one teraFLOPS, such as the NVIDIA 9800GX2 or ATI Radeon HD4870. It is worth noting, however, that a single high end card is frequently extremely expensive, possibly costing even more than a complete budget system.
The other popular form of graphics is the integrated sort, which is 'integrated' or permanently soldered on the motherboard. It can also be 'integrated' in the Northbridge or Southbridge. Such solutions have also been called IGPs (Integrated Graphics Processors). Integrated graphics set aside an amount of the computer's system RAM for use as their frame buffer, which typically incurs a performance hit. Integrated graphics are typically rather low in performance due to their rather small size and possible heat issues. However, they are much cheaper than dedicated solutions. Integrated graphics were previously considered unfit for 3D applications or intensive 2D programs (such as Adobe Flash), but recent models have been capable, such as the ATI Radeon HD3300 (integrated in the AMD 780GX chipset), the Intel Graphics Media Accelerator X4500HD (Intel G45) and the NVIDIA GeForce 8200 (NVIDIA 730a).
Another newer type takes the form of 'Hybrid Graphics', which is typically a low end dedicated graphics card (such as ATI's HD2400). These cards come with a small framebuffer memory. Through technologies within the PCI-Express bus, hybrid graphics cards are able to access the system's RAM to boost the size of their frame buffer. This implementation has been named TurboCache and HyperMemory by NVIDIA and ATI respectively. Some of these solutions can be advertised with "up to 768MB of memory" which is actually the amount it can use in total, with the card's dedicated memory being as little as 128MB.
Hybrid CrossFireX and Hybrid SLI
These are not the same as the Hybrid Graphics described above.
Hybrid CrossFireX and Hybrid SLI follow the same concept of pairing a low end graphics card (such as the GeForce 8400GS or the Radeon HD3470) with a motherboard with supporting integrated graphics (such as a GeForce 8300 or AMD 790G board).
This is to improve performance while costing relatively less than most dedicated solutions do.
NVIDIA's Hybrid Power does not work in the same way (combining IGP and dedicated GPU) but instead turns off or disables the Hybrid Power supporting card (such as a GTX 280/260 or a 55nm GeForce 9) during simple applications such as word processing or desktop operations. This enables a high level of power saving, as an IGP often draws much less power than a dedicated card would.
Another new application for GPUs is stream processing. This concept uses the massively parallel, yet dedicated, floating-point computational power of a modern graphics accelerator's shader pipeline. In certain applications requiring massive vector operations, this can yield several orders of magnitude higher performance than a conventional CPU. The two largest GPU designers, ATI and NVIDIA, are firmly pursuing this new market.
Recently NVIDIA began releasing cards supporting an API extension to the C programming language called CUDA ("Compute Unified Device Architecture"), which allows specified functions from a normal C program to run on the GPU's stream processors. This makes C programs capable of taking advantage of a GPU's ability to operate on large matrices in parallel, while still making use of the CPU where appropriate. CUDA is also the first API to allow CPU-based applications to access directly the resources of a GPU for more general purpose computing without the limitations of using a graphics API.
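The programming model described here, where one function body is run by many stream processors at once with each instance handling its own data element, can be illustrated without a GPU. The sketch below is a sequential CPU-side emulation and is not CUDA: `launch` and `saxpy_kernel` are hypothetical names, and the loops stand in for threads that a real GPU would run concurrently.

```python
def saxpy_kernel(thread_index, a, x, y, out):
    """Body run once per thread; each thread handles one element."""
    # Guard: the launch may create more threads than there are elements.
    if thread_index < len(x):
        out[thread_index] = a * x[thread_index] + y[thread_index]


def launch(kernel, blocks, threads_per_block, *args):
    """Sequential stand-in for a parallel launch over blocks of threads."""
    for block in range(blocks):
        for thread in range(threads_per_block):
            kernel(block * threads_per_block + thread, *args)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0] * len(x)
out = [0.0] * len(x)

threads_per_block = 4
# Round up so that every element is covered by some thread.
blocks = (len(x) + threads_per_block - 1) // threads_per_block
launch(saxpy_kernel, blocks, threads_per_block, 2.0, x, y, out)
print(out)
```

Because no thread depends on another thread's result, the iterations can be executed in any order or all at once, which is what makes this kind of work suitable for stream processors.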
A new concept is to use a modified form of a stream processor to allow a general purpose graphics processing unit. This concept turns the massive floating-point computational power of a modern graphics accelerator's shader pipeline into general-purpose computing power, as opposed to being hard wired solely to do graphical operations. This means that the graphics card can now perform various forms of calculations, such as those typically done by a CPU. In certain applications requiring massive vector operations, this can yield several orders of magnitude higher performance than a conventional CPU. The two largest discrete (see "Dedicated graphics cards" above) GPU designers, ATI and NVIDIA, are beginning to pursue this new market with an array of applications. ATI has teamed with Stanford University to create a GPU-based client for its Folding@Home distributed computing project (for protein folding calculations) that in certain circumstances yields results forty times faster than the conventional CPUs traditionally used in such applications.
Since 2005 there has been interest in using the speed offered by GPUs for evolutionary computation in general and for speeding up the fitness evaluation in genetic programming in particular. There is a short introduction on pages 90-92 of A Field Guide To Genetic Programming. Most approaches compile linear or tree programs on the host PC and transfer the executable to the GPU to run. Typically the speed advantage is only obtained by running the single active program simultaneously on many example problems in parallel using the GPU's SIMD architecture. However, substantial speed up can also be obtained by not compiling the programs but instead transferring them to the GPU and interpreting them there. Speedup can then be obtained by interpreting multiple programs simultaneously, by simultaneously running multiple example problems, or by combinations of both. A modern GPU (e.g. 9800 GTX) can readily interpret hundreds of thousands of very small programs simultaneously, faster than any CPU can.
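The interpreted approach can be sketched on the CPU to show where the parallelism lies. This is an illustrative model only, not the method of any particular paper: programs are assumed to be short reverse Polish token lists over a single variable `x`, the fitness measure (sum of absolute errors) is an arbitrary choice, and all names are hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def interpret(program, x):
    """Evaluate one reverse Polish program on one input value."""
    stack = []
    for token in program:
        if token in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        elif token == "x":
            stack.append(x)
        else:
            stack.append(float(token))  # numeric constant
    return stack[-1]


def fitness(program, cases):
    """Sum of absolute errors over all cases; lower is better."""
    return sum(abs(interpret(program, x) - target) for x, target in cases)


def evaluate_population(population, cases):
    # Every (program, case) pair is independent, so on a GPU each pair
    # could be interpreted by its own thread.
    return [fitness(program, cases) for program in population]


# Example problems: learn f(x) = x*x + 1 from four sample points.
cases = [(x, x * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
population = [
    ["x", "x", "*", "1", "+"],  # x*x + 1
    ["x", "1", "+"],            # x + 1
    ["x", "2", "*"],            # 2*x
]
scores = evaluate_population(population, cases)
print(scores)
```

The two nested levels of independence, over programs and over example problems, correspond to the two sources of speedup mentioned above.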