Lecture-1-Slides.txt

Card Set Information

Author:
mlbbolls
ID:
62844
Filename:
Lecture-1-Slides.txt
Updated:
2011-01-29 20:01:52
Tags:
lecture
Folders:

Description:
lecture notes

The flashcards below were created by user mlbbolls on FreezingBlue Flashcards.


  1. Lecture 1: Measuring Performance
  2. Topics (Sections 1.1, 1.4, 1.5, 1.8):
    • Technology trends
    • Performance summaries
    • Performance equations
  4. Historical Microprocessor Performance
    • 15x performance growth can be attributed to architectural innovations
  5. Processor Technology Trends
    • Shrinking of transistor sizes: 250nm (1997) → 130nm (2002) → 65nm (2007) → 32nm (2010)
    • Transistor density increases by 35% per year and die size increases by 10-20% per year, which means more cores!
    • Transistor speed improves linearly with size (complex equation involving voltages, resistances, capacitances), which can lead to clock speed improvements!
    • Wire delays do not scale down at the same rate as logic delays
  7. Power Consumption Trends
    • Dynamic power ∝ activity x capacitance x voltage^2 x frequency
    • Capacitance per transistor and voltage are decreasing, but the number of transistors is increasing at a faster rate; hence clock frequency must be kept steady
    • Leakage power is also rising
    • Power consumption is already between 100-150W in high-performance processors today
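The proportionality above can be explored with a small sketch. The function name `relative_dynamic_power` and the scaling factors are hypothetical, chosen only to illustrate how the voltage term dominates; each argument is a ratio relative to some baseline design, not an absolute value.

```python
def relative_dynamic_power(activity=1.0, capacitance=1.0, voltage=1.0, frequency=1.0):
    """Dynamic power relative to a baseline, given each factor's ratio to that baseline."""
    # Dynamic power is proportional to activity x capacitance x voltage^2 x frequency.
    return activity * capacitance * voltage ** 2 * frequency

# Hypothetical case: voltage lowered by 10%, everything else unchanged.
lower_voltage = relative_dynamic_power(voltage=0.9)

# Hypothetical case: frequency doubled at the same voltage.
double_freq = relative_dynamic_power(frequency=2.0)

print(f"10% lower voltage: {lower_voltage:.2f}x baseline dynamic power")
print(f"2x frequency:      {double_freq:.2f}x baseline dynamic power")
```

Because voltage enters squared, a modest voltage reduction saves more dynamic power than the same relative reduction in frequency.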
  8. Where Are We Headed?
    • Modern trends:
    • Clock speed improvements are slowing (power constraints; already doing less work per stage)
    • Difficult to further optimize a single core for performance
    • Multi-cores: each new processor generation will accommodate more cores
  11. Recent Microprocessor Trends (figure covering 2004 to 2010; source: Micron University Symp.)
    • Transistors: 1.43x / year
    • Cores: 1.2-1.4x
    • Performance: 1.15x
    • Frequency: 1.05x
    • Power: 1.04x
  15. Modern Processor Today
    • Intel Core i7:
    • Clock frequency: 3.2-3.33 GHz
    • 45nm and 32nm products
    • Cores: 4-6
    • Power: 95-130 W
    • Two threads per core
    • 3-level cache, 12 MB L3 cache
    • Price: $300-$1000
  16. Other Technology Trends
    • DRAM density increases by 40-60% per year, latency has reduced by 33% in 10 years (the memory wall!), bandwidth improves twice as fast as latency decreases
    • Disk density improves by 100% every year, latency improvement similar to DRAM
    • Emergence of NVRAM technologies that can provide a bridge between DRAM and hard disk drives
  17. Measuring Performance
    • Two primary metrics: wall clock time (response time for a program) and throughput (jobs performed in unit time)
    • To optimize throughput, must ensure that there is minimal waste of resources
    • Performance is measured with benchmark suites: a collection of programs that are likely relevant to the user
    • SPEC CPU 2006: CPU-oriented programs (for desktops)
    • SPECweb, TPC: throughput-oriented (for servers)
    • EEMBC: for embedded processors/workloads
  19. Summarizing Performance
    • Consider 25 programs from a benchmark set: how do we capture the behavior of all 25 programs with a single number?
    • Example execution times for three programs on three systems:
               P1   P2   P3
        Sys-A  10    8   25
        Sys-B  12    9   20
        Sys-C   8    8   30
    • Total (average) execution time
    • Total (average) weighted execution time, or average of normalized execution times
    • Geometric mean of normalized execution times
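As a rough illustration, the three summaries can be computed for the Sys-A/Sys-B/Sys-C table. The slide does not say which system is the reference for normalization, so the sketch below assumes Sys-A; the variable names are illustrative only.

```python
from statistics import mean, geometric_mean

# Execution times for P1, P2, P3 taken from the table above.
times = {
    "Sys-A": [10, 8, 25],
    "Sys-B": [12, 9, 20],
    "Sys-C": [8, 8, 30],
}
reference = "Sys-A"  # assumption: the slide does not name a reference machine


def normalized(system):
    """Per-program execution times divided by the reference machine's times."""
    return [t / r for t, r in zip(times[system], times[reference])]


summary = {
    name: {
        "total": sum(ts),
        "am_norm": mean(normalized(name)),
        "gm_norm": geometric_mean(normalized(name)),
    }
    for name, ts in times.items()
}

for name, s in summary.items():
    print(f"{name}: total={s['total']}, "
          f"AM(norm)={s['am_norm']:.3f}, GM(norm)={s['gm_norm']:.3f}")
```

Under this assumption the three summaries disagree on the ranking: Sys-B has the lowest total time, Sys-A and Sys-C tie on the AM of normalized times, and Sys-C has the lowest GM.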
  22. AM Example
    • We fixed a reference machine X and ran 4 programs A, B, C, D on it such that each program ran for 1 second
    • The exact same workload (the four programs execute the same number of instructions that they did on machine X) is run on a new machine Y and the execution times for each program are 0.8, 1.1, 0.5, 2
    • With AM of normalized execution times, we can conclude that Y is 1.1 times slower than X; perhaps not for all workloads, but definitely for one specific workload (where all programs run on the ref-machine for an equal number of cycles)
    • With GM, you may find inconsistencies
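The arithmetic behind this example can be checked with a short sketch that uses the slide's numbers; the variable names are illustrative.

```python
from statistics import mean, geometric_mean

x_times = [1.0, 1.0, 1.0, 1.0]   # programs A-D on reference machine X (seconds)
y_times = [0.8, 1.1, 0.5, 2.0]   # the same programs on new machine Y (seconds)

# Normalize each program's time on Y to its time on X.
normalized = [y / x for x, y in zip(x_times, y_times)]

am = mean(normalized)            # matches total time on Y / total time on X here
gm = geometric_mean(normalized)

print(f"AM of normalized times: {am:.3f}")
print(f"GM of normalized times: {gm:.3f}")
```

The AM comes out to 1.1, matching the ratio of total execution times (4.4 s versus 4 s), while the GM comes out below 1 and would suggest that Y is faster, which is the kind of inconsistency the last bullet refers to.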
  23. GM Example
               Computer-A   Computer-B   Computer-C
        P1     1 sec        10 secs      20 secs
        P2     1000 secs    100 secs     20 secs
    • Conclusion with GMs: (i) A = B; (ii) C is ~1.6 times faster
    • For (i) to be true, P1 must occur 100 times for every occurrence of P2
    • With the above assumption, (ii) is no longer true. Hence, GM can lead to inconsistencies
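A short sketch makes the inconsistency concrete. It computes the GMs from the table and then the total time for the workload implied by conclusion (i), where P1 runs 100 times per run of P2; the names are illustrative.

```python
from statistics import geometric_mean

# Execution times in seconds, from the table above.
times = {
    "A": {"P1": 1, "P2": 1000},
    "B": {"P1": 10, "P2": 100},
    "C": {"P1": 20, "P2": 20},
}

# Geometric mean of execution times per computer.
gm = {m: geometric_mean(list(t.values())) for m, t in times.items()}

# Workload under which A and B really are equal: 100 runs of P1 per run of P2.
workload = {"P1": 100, "P2": 1}
total = {
    m: sum(count * t[prog] for prog, count in workload.items())
    for m, t in times.items()
}

for m in times:
    print(f"Computer-{m}: GM={gm[m]:.2f} s, workload total={total[m]} s")
```

The GMs of A and B are equal and about 1.6 times that of C, yet on the workload that makes A and B equal, C takes 2020 seconds against 1100 seconds for A and B, so C is slower rather than faster.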
  24. Summarizing Performance
    • GM: does not require a reference machine, but does not predict performance very well
    • So we multiplied execution times and determined that sys-A is 1.2x faster... but on what workload?
    • AM: does predict performance for a specific workload, but that workload was determined by executing programs on a reference machine
    • Every year or so, the reference machine will have to be updated
  29. Normalized Execution Times
    • Advantage of GM: no reference machine required
    • Disadvantage of GM: does not represent any "real entity" and may not accurately predict performance
    • Disadvantage of AM of normalized: need weights (which may change over time)
    • Advantage of AM of normalized: can represent a real workload
  32. CPU Performance Equation
    • Clock cycle time = 1 / clock speed
    • CPU time = clock cycle time x cycles per instruction x number of instructions
    • Influencing factors for each:
    • clock cycle time: technology and pipeline
    • CPI: architecture and instruction set design
    • instruction count: instruction set design and compiler
    • CPI (cycles per instruction) or IPC (instructions per cycle) cannot be accurately estimated analytically
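The CPU performance equation can be written as a small function. This is a minimal sketch: the function name and the example inputs (2 billion instructions, CPI of 1.5, 3.2 GHz clock) are hypothetical and not taken from the slides.

```python
def cpu_time_seconds(instruction_count, cpi, clock_hz):
    """CPU time = clock cycle time x cycles per instruction x number of instructions."""
    clock_cycle_time = 1.0 / clock_hz  # seconds per cycle
    return clock_cycle_time * cpi * instruction_count


# Hypothetical program: 2 billion instructions, CPI of 1.5, 3.2 GHz clock.
example = cpu_time_seconds(instruction_count=2_000_000_000, cpi=1.5, clock_hz=3.2e9)
print(f"CPU time: {example:.4f} s")
```

In practice the CPI fed into such a function would come from measurement or simulation, since the slide notes it cannot be accurately estimated analytically.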
  34. Measuring System CPI
    • Assume that an architectural innovation only affects CPI
    • For 3 programs, base CPIs: 1.2, 1.8, 2.5; CPIs for proposed model: 1.4, 1.9, 2.3
    • What is the best way to summarize performance with a single number? AM, HM, or GM of CPIs?
  35. Example
    • AM of CPI for base case = (1.2 cyc/instr + 1.8 cyc/instr + 2.5 cyc/instr) / 3
    • 5.5 cycles is the execution time if each program ran for one instruction; therefore, AM of CPI defines a workload where every program runs for an equal number of instructions
    • HM of CPI = 1 / AM of IPC; defines a workload where every program runs for an equal number of cycles
    • GM of CPI: warm fuzzy number, not necessarily representing any workload
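The three candidate summaries can be computed for the base and proposed CPIs from the previous slide. The sketch below uses Python's `statistics` module and also checks the identity HM of CPI = 1 / AM of IPC; the variable names are illustrative.

```python
from statistics import mean, harmonic_mean, geometric_mean

base = [1.2, 1.8, 2.5]       # base CPIs for the 3 programs
proposed = [1.4, 1.9, 2.3]   # CPIs for the proposed model

results = {}
for label, cpis in (("base", base), ("proposed", proposed)):
    results[label] = {
        "AM": mean(cpis),            # equal instruction counts per program
        "HM": harmonic_mean(cpis),   # equal cycle counts per program
        "GM": geometric_mean(cpis),  # no corresponding workload
    }

# HM of CPI equals 1 / AM of IPC, where IPC = 1 / CPI.
hm_via_ipc = 1.0 / mean([1.0 / c for c in base])

for label, r in results.items():
    print(f"{label}: AM={r['AM']:.3f}, HM={r['HM']:.3f}, GM={r['GM']:.3f}")
print(f"1 / AM of IPC (base): {hm_via_ipc:.3f}")
```

For these numbers all three means report a higher CPI for the proposed model, but by different margins, which is why the choice of mean (and the workload it implies) matters.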
  37. Speedup vs. Percentage
    • "Speedup" is a ratio
    • "Improvement", "Increase", "Decrease" usually refer to percentage relative to the baseline
    • A program ran in 100 seconds on my old laptop and in 70 seconds on my new laptop
    • What is the speedup?
    • What is the percentage increase in performance?
    • What is the reduction in execution time?
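One way to work through the three questions is sketched below. It assumes the conventional definition that performance is the reciprocal of execution time, which the slide implies but does not state; the variable names are illustrative.

```python
old_time = 100.0  # seconds on the old laptop
new_time = 70.0   # seconds on the new laptop

# Speedup is the ratio of old execution time to new execution time.
speedup = old_time / new_time

# Assuming performance = 1 / execution time, the percentage increase in
# performance is (speedup - 1) expressed as a percentage.
perf_increase_pct = (speedup - 1.0) * 100.0

# Reduction in execution time, as a percentage of the baseline time.
time_reduction_pct = (old_time - new_time) / old_time * 100.0

print(f"Speedup: {speedup:.2f}x")
print(f"Performance increase: {perf_increase_pct:.1f}%")
print(f"Execution time reduction: {time_reduction_pct:.1f}%")
```

The same measurement therefore gives three different numbers (about 1.43x, about 43%, and 30%), so it matters which quantity a claim refers to.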