Ex parte Faith et al., Application No. 13/836,110 (P.T.A.B. Nov. 20, 2018)

UNITED STATES PATENT AND TRADEMARK OFFICE
United States Department of Commerce, Commissioner for Patents, P.O. Box 1450, Alexandria, Virginia 22313-1450, www.uspto.gov

Application No.: 13/836,110. Filing date: 03/15/2013. First named inventor: John Newman Faith. Attorney Docket No.: 034250-0437044. Confirmation No.: 1106. Examiner: JOHNSON, ROBERT C. Art Unit: 3682. Notification date: 12/06/2018. Delivery mode: electronic (docket_ip@pillsburylaw.com). Correspondence: Pillsbury Winthrop Shaw Pittman, LLP, PO Box 10500, McLean, VA 22102.

UNITED STATES PATENT AND TRADEMARK OFFICE
BEFORE THE PATENT TRIAL AND APPEAL BOARD

Ex parte JOHN NEWMAN FAITH, JAGJIT SINGH BATH, and EITHAN ZILKHA

Appeal 2017-002791 1
Application 13/836,110
Technology Center 3600

Before ANTON W. PETTING, MICHAEL C. ASTORINO, and ROBERT J. SILVERMAN, Administrative Patent Judges.

PETTING, Administrative Patent Judge.

DECISION ON APPEAL

STATEMENT OF THE CASE 2

John Newman Faith, Jagjit Singh Bath, and Eithan Zilkha (Appellants) seek review under 35 U.S.C. § 134 of the Examiner's Final rejection of claims 1, 2, 5, 9, and 21-38, the only claims pending in the application on appeal. We have jurisdiction over the appeal pursuant to 35 U.S.C. § 6(b).

We AFFIRM.

The Appellants invented forms of offer-discovery systems. Specification para. 2. An understanding of the invention can be derived from a reading of exemplary claim 1, which is reproduced below (bracketed matter and some paragraphing added).

1 Appellants identify RetailMeNot, Inc. as the real party in interest. App. Br. 2.
2 Our decision will make reference to the Appellants' Appeal Brief filed September 1, 2016 ("App. Br."); Reply Brief filed December 7, 2016 ("Reply Br."); the Examiner's Answer mailed October 7, 2016 ("Ans."); and Final Action mailed October 23, 2015 ("Final Act.").
1. A computer-implemented method, comprising:

[1] accessing an offers engine user profile associated with a user and an offers engine, the offers engine user profile comprising a plurality of attributes associated with customization of an offers interface, the plurality of attributes including a collection of favorite offers previously identified by a corresponding user, the offers interface being configured to provide a plurality of merchant offers displayed concurrently;

[2] receiving over a network from a first user device and a first session of the offers interface a modification to an attribute of the plurality of attributes of the offers-engine user profile, the modification including the addition of a given offer to the collection of favorite offers;

[3] storing the modified attribute in the offers-engine user profile;

[4] receiving over a network a request to access the offers interface in a second session from a second user device;

[5] constructing, by one or more processors, the offers interface based on the modified attribute to produce a customized offers interface, and wherein constructing the offers interface comprises:

[5.1] sending a query from a controller of the offers engine to a cache server of the offers engine, wherein the cache server maintains a cache of offers stored in random access memory of the cache server, the cache of offers having a subset of all of the offers accessible to the offers engine, wherein the subset of offers are each associated with a hash key value calculated based on a respective parameter of the respective offer among the subset and paired with a respective addresses of the respective offer in the random access memory; and

[5.2] identifying responsive offers with the cache server from among the subset of offers based on the query;

[5.3] sending at least some of the responsive offers from the cache server of the offers engine to the controller of the offers engine;

[5.4] obtaining at least some of a selected plurality of offers for the second session that is with the second user device and from among the responsive offers based on the given offer added to the collection of favorite offers in the first session that is with the first user device;

[5.5] including the at least some of the selected plurality of offers in the customized offers interface;

[6] sending over a network the customized offers interface to the second user device for use in the second session.

Claims 1, 2, 5, 9, and 21-38 stand rejected under 35 U.S.C. § 101 as directed to a judicial exception without significantly more. 3

3 Claim rejections under 35 U.S.C. § 103 (Final Act. 5-20) are withdrawn. Ans. 11.

ISSUES

The issues of eligible subject matter turn primarily on whether the claims recite more than abstract conceptual advice of what a computer is to provide without implementation details.

FACTS PERTINENT TO THE ISSUES

The following enumerated Findings of Fact (FF) are believed to be supported by a preponderance of the evidence.

Facts Related to Appellants' Disclosure

01. The cache server 23 stores a subset of the data in the data store 24 that is among the more likely data to be accessed in the near future. To facilitate relatively fast access, the cache server 23 may store cached data in relatively high speed memory, such as random access memory or a solid-state drive. The cached data may include offers entered into the offers engine 12 within a threshold period of time, such as offers that are newer than one day. In another example, the cache data may include offers that are accessed with greater than a threshold frequency, such as offers that are accessed more than once a day, or offers accessed within the threshold, such as offers accessed within the previous day. Caching such offer data is expected to facilitate faster access to offer data than systems that do not cache offer data. Spec. para. 41.
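For concreteness, the admission rule described in FF 01 can be sketched in a few lines of Python. The names (Offer, should_cache) and the thresholds are illustrative assumptions, not Appellants' disclosed implementation:

```python
from dataclasses import dataclass
import time

ONE_DAY = 24 * 60 * 60  # threshold period, per the one-day example in Spec. para. 41

@dataclass
class Offer:
    offer_id: str
    created_at: float        # epoch seconds when entered into the offers engine
    daily_access_count: int  # accesses observed in the previous day

def should_cache(offer: Offer, now: float, min_daily_accesses: int = 1) -> bool:
    """Admit an offer to the RAM cache if it is newer than one day
    or accessed more than a threshold number of times per day."""
    is_recent = (now - offer.created_at) < ONE_DAY
    is_hot = offer.daily_access_count > min_daily_accesses
    return is_recent or is_hot

# Example: a two-day-old offer accessed twice yesterday stays cached
# (hot, though no longer recent).
offer = Offer("o-123", created_at=time.time() - 2 * ONE_DAY, daily_access_count=2)
print(should_cache(offer, now=time.time()))  # True
```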
02. Next, in the present embodiment, the received offer data is stored in an offer data store, as indicated by block 68. Storing the offer data in the offer data store may include identifying a merchant to which the offer pertains and storing the offer in a merchant-offer record associated with that merchant. Further, some embodiments may include inserting the offer in order in a sorted list of offers for relatively fast retrieval of offers using a binary search algorithm or other techniques to facilitate relatively quick access to data that has been preprocessed (e.g., using a prefix trie [sic]). In some embodiments, storing the received offer may further include updating hash tables by which the offer may be retrieved according to various parameters, each hash table being associated with one parameter and including a hash key value calculated based on the parameter and paired with an address of the offer. Such hash tables are expected to facilitate relatively fast access to a given offer as the need to iterate through potentially all offers meeting certain criteria may be potentially avoided. Spec. para. 46.
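The per-parameter hash tables described in FF 02 amount to the following minimal sketch; the names (hash_key, offers_by_address, lookup) are assumed for illustration and are not drawn from the Specification:

```python
import hashlib

def hash_key(parameter_value: str) -> str:
    # A hash key value calculated based on the parameter (FF 02).
    return hashlib.sha1(parameter_value.encode()).hexdigest()

# Offers stored at (illustrative) addresses in memory.
offers_by_address = {
    0x10: {"merchant": "acme", "category": "shoes", "title": "20% off"},
    0x20: {"merchant": "acme", "category": "hats", "title": "BOGO"},
}

# One hash table per parameter, each pairing a hash key with offer addresses.
tables = {"merchant": {}, "category": {}}
for address, offer in offers_by_address.items():
    for param in tables:
        tables[param].setdefault(hash_key(offer[param]), []).append(address)

def lookup(param: str, value: str) -> list:
    """Retrieve offers by parameter without iterating through all offers."""
    return [offers_by_address[a] for a in tables[param].get(hash_key(value), [])]

print(lookup("merchant", "acme"))  # both offers, found via the merchant table
```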
ANALYSIS

Method claim 1 recites accessing a profile, receiving and storing an attribute modification, receiving a request for access, constructing an interface, and sending the interface. The claim further recites, within the operation of constructing the interface, sending a query and identifying, sending offers, obtaining, and including offers in the interface. Thus, claim 1 recites receiving, querying, analyzing, modifying, and transmitting data. None of the limitations recite implementation details for any of these steps, but instead recite functional results to be achieved by any and all possible means. Data reception, querying, analysis, modification, and transmission are all generic, conventional data processing operations to the point they are themselves concepts awaiting implementation details. The sequence of data receiving-querying-analyzing-modifying-transmitting is equally generic and conventional. The ordering of the steps is therefore ordinary and conventional. That the data constructed and transmitted is labelled as an interface is unhelpful to Appellants, as this is no more than the data for a conventional data entry and retrieval screen. The remaining claims merely describe offer and interface parameters, generic devices, and shuffling the offer data, with no implementation details.

The Supreme Court set forth a framework for distinguishing patents that claim laws of nature, natural phenomena, and abstract ideas from those that claim patent-eligible applications of those concepts:

First, [] determine whether the claims at issue are directed to one of those patent-ineligible concepts. [] If so, we then ask, "[w]hat else is there in the claims before us?" [] To answer that question, [] consider the elements of each claim both individually and "as an ordered combination" to determine whether the additional elements "transform the nature of the claim" into a patent-eligible application. [The Court] described step two of this analysis as a search for an "inventive concept," i.e., an element or combination of elements that is "sufficient to ensure that the patent in practice amounts to significantly more than a patent upon the [ineligible concept] itself."

Alice Corp. Pty. Ltd. v. CLS Bank Int'l, 134 S. Ct. 2347, 2355 (2014) (citing Mayo Collaborative Servs. v. Prometheus Labs., Inc., 566 U.S. 66 (2012)).

To perform this test, we must first determine whether the claims at issue are directed to a patent-ineligible concept. The Examiner finds the claims directed to accessing an offers engine user profile including a plurality of attributes, receiving and storing a modification to an attribute in the plurality of attributes, receiving a request to access the offers interface, constructing the offers interface based on the modified attribute, and sending the offers interface to a user device.

The courts have noted that comparing new and stored information and using rules to identify options is an idea of . . . judicial exceptions such as comparing new and stored information and using rules to identify options (e.g., comparing the stored user profile including the collection of favorite offers, to the newly received modification of attributes, and using the modified attributes to construct a customized offers interface). The process of accessing an offers engine user profile, receiving a modification to an attribute over a network, storing the modified attribute, receiving a request to access the offers interface, and constructing the offers interface based on the modified attribute are all steps that describe the abstract idea and are mere instructions to implement an abstract idea on a computer.

Final Act. 2-3.

Although the Court in Alice made a determination as to what the claims were directed to, we find that this case's claims themselves and the Specification provide enough information to inform one as to what they are directed to. The preamble to claim 1 does not recite what it is directed to, but the steps in claim 1 result in creating and sending a customized offers interface over a network. The Specification at paragraph 2 recites that the invention relates to offer-discovery systems. Thus, all this evidence shows that claim 1 is directed to sending a way for a customer to review promotional offers, i.e., marketing promotion. This is consistent with the Examiner's finding.

It follows from prior Supreme Court cases, and Bilski (Bilski v. Kappos, 561 U.S. 593 (2010)) in particular, that the claims at issue here are directed to an abstract idea. Like the risk hedging in Bilski, the concept of marketing promotion is a fundamental business practice long prevalent in our system of commerce. The use of marketing promotion is also a building block of ingenuity in marketing. Thus, marketing promotion, like hedging, is an "abstract idea" beyond the scope of 35 U.S.C. § 101. See Alice, 134 S. Ct. at 2356. As in Alice, we need not labor to delimit the precise contours of the "abstract ideas" category in this case.
It is enough to recognize that there is no meaningful distinction in the level of abstraction between the concept of risk hedging in Bilski and the concept of marketing promotion at issue here. Both are squarely within the realm of "abstract ideas" as the Court has used that term. See Alice, 134 S. Ct. at 2357.

Further, claims involving only data collection, analysis, and display are directed to an abstract idea. Elec. Power Grp., LLC v. Alstom S.A., 830 F.3d 1350, 1353 (Fed. Cir. 2016) (holding that "collecting information, analyzing it, and displaying certain results of the collection and analysis" are "a familiar class of claims 'directed to' a patent ineligible concept"); see also In re TLI Commc'ns LLC Patent Litig., 823 F.3d 607, 611 (Fed. Cir. 2016); FairWarning IP, LLC v. Iatric Sys., Inc., 839 F.3d 1089, 1093-94 (Fed. Cir. 2016).

Claim 1, unlike the claims found non-abstract in prior cases, uses generic computer technology to perform data reception, querying, analysis, modification, and transmission and does not recite an improvement to a particular computer technology. See, e.g., McRO, Inc. v. Bandai Namco Games Am. Inc., 837 F.3d 1299, 1314-15 (Fed. Cir. 2016) (finding claims not abstract because they "focused on a specific asserted improvement in computer animation"). As such, claim 1 is directed to the abstract idea of receiving, querying, analyzing, modifying, and transmitting data. The remaining claims merely describe offer and interface parameters, generic devices, and shuffling the offer data. We conclude that the claims at issue are directed to a patent-ineligible concept.

The introduction of a computer into the claims does not alter the analysis at Mayo step two:

the mere recitation of a generic computer cannot transform a patent-ineligible abstract idea into a patent-eligible invention. Stating an abstract idea "while adding the words 'apply it'" is not enough for patent eligibility. Nor is limiting the use of an abstract idea "to a particular technological environment." Stating an abstract idea while adding the words "apply it with a computer" simply combines those two steps, with the same deficient result. Thus, if a patent's recitation of a computer amounts to a mere instruction to "implement" an abstract idea "on . . . a computer," that addition cannot impart patent eligibility. This conclusion accords with the preemption concern that undergirds our § 101 jurisprudence. Given the ubiquity of computers, wholly generic computer implementation is not generally the sort of "additional feature" that provides any "practical assurance that the process is more than a drafting effort designed to monopolize the [abstract idea] itself."

Alice Corp. Pty. Ltd., 134 S. Ct. at 2358 (citations omitted).

"[T]he relevant question is whether the claims here do more than simply instruct the practitioner to implement the abstract idea . . . on a generic computer." Alice, 134 S. Ct. at 2359. They do not.

Taking the claim elements separately, the function performed by the computer at each step of the process is purely conventional. Using a computer for receiving, querying, analyzing, modifying, and transmitting data amounts to electronic data query and retrieval, one of the most basic functions of a computer. All of these computer functions are well-understood, routine, conventional activities previously known to the industry. See Electric, supra.
See also In re Katz Interactive Call Processing Patent Litig., 639 F.3d 1303, 1316 (Fed. Cir. 2011) ("Absent a possible narrower construction of the terms 'processing,' 'receiving,' and 'storing,' . . . those functions can be achieved by any general purpose computer without special programming."). In short, each step does no more than require a generic computer to perform generic computer functions. As to the data operated upon, "even if a process of collecting and analyzing information is 'limited to particular content' or a particular 'source,' that limitation does not make the collection and analysis other than abstract." SAP Am., Inc. v. InvestPic, LLC, 898 F.3d 1161, 1168 (Fed. Cir. 2018).

It is proper at this point to look particularly at limitation 5.1, describing a cache server that maintains a cache of offers stored in random access memory of the cache server, the cache of offers having a subset of all of the offers accessible to the offers engine, wherein the subset of offers are each associated with a hash key value calculated based on a respective parameter of the respective offer among the subset and paired with a respective address of the respective offer in the random access memory. At first, this appears to be the sort of technological implementation detail that would be more than an abstraction. After all, caching and hashing are two technical implementations most anyone with programming experience is aware of for speeding up processing. The problem is they are too well known, conventional, and generic. Appellants do not describe any particular implementation details for practicing these techniques, nor do they describe how any aspect of the invention that makes use of these techniques provides more than the well-known speed. 4 Appellants do not claim to have invented or improved upon these techniques. The support for this limitation found at Specification paragraphs 41 and 47 only describes faster access than without these generic techniques. This recitation is then no more than abstract conceptual advice to apply generic tools for their intended use in a known context. The use of generic computer equipment to produce their known expected results is of no avail. These techniques are not part of achieving what the claims are directed to, only generic parts of speeding up the process, whatever it may be.

4 See John P. Hayes, Computer Architecture and Organization, Chapter 5, Memory Organization, Section 5.2.5 File Organization, describing hashing pp. 381-382, and Section 5.3.2 Cache Memories pp. 385-388 (1978). It was a college textbook (see id. at xi) and therefore disseminated widely and its contents widely practiced.
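The conventionality of these techniques is readily illustrated: in a mainstream language, both the hash table and the cache are off-the-shelf primitives, as in this hypothetical Python fragment (all names are invented for illustration):

```python
from functools import lru_cache

# Hashing: the built-in dict is a hash table; the language computes a hash
# key from the lookup parameter and pairs it with the stored value.
offers_by_merchant = {"acme": ["20% off shoes"], "globex": ["free shipping"]}

# Caching: memoizing a slow lookup takes one standard-library decorator.
@lru_cache(maxsize=1024)
def find_offers(merchant: str) -> tuple:
    # Stand-in for a slow data-store query; results are kept in memory.
    return tuple(offers_by_merchant.get(merchant, ()))

print(find_offers("acme"))  # first call computed; later calls served from cache
```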
Because the claims are directed to an abstract idea, the claims must include an "inventive concept" in order to be patent-eligible. No such inventive concept is present here. Instead, the claims "add" only generic computer components such as an "interface," "network," and "database." These generic computer components do not satisfy the inventive concept requirement.

Mortgage Grader, Inc. v. First Choice Loan Servs. Inc., 811 F.3d 1314, 1324-25 (Fed. Cir. 2016) (citations and quotation marks omitted).

The only limitations on the breadth of the result-focused, functional claims in this case are (1) that the application used by the cellular telephone must be wirelessly downloadable, and (2) that the cellular telephone must have a graphical user interface display that allows the user to select the regional broadcasting channel. Those additional limitations describe purely conventional features of cellular telephones and the applications that enable them to perform particular functions. They therefore do not meaningfully limit the scope of the claims.

Affinity Labs of Tex., LLC v. DIRECTV, LLC, 838 F.3d 1253, 1265 (Fed. Cir. 2016).

Dependent claims 10 and 11 respectively recite an "image analysis unit for determining quality of the digital images" and a "control unit for controlling resolution of digital images." These components purportedly analyze the image data sent from the telephone unit to determine the quality of the image sent, and if certain criteria are met, instruct the telephone unit to resend the image. While these units purport to add additional functionality to the server, '295 patent, col. 5, ll. 14-32, the [S]pecification limits its discussion of these components to abstract functional descriptions devoid of technical explanation as to how to implement the invention. For example, the "image analysis unit" predictably analyzes the digital images to "determine[] the quality of the digital image provided to the server." Id. at col. 5, ll. 14-16; see also id. at col. 8, ll. 24-26. And, the "control unit" predictably "controls" various aspects of the claimed functionality. It "controls the image resolution of the digital images" using known image compression techniques, id. at col. 5, ll. 21-24, and it "controls the transmission rate during transmission of the data via the transmission system," id. at col. 5, ll. 30-33. Such vague, functional descriptions of server components are insufficient to transform the abstract idea into a patent-eligible invention.

In re TLI Commc'ns LLC Patent Litig., 823 F.3d 607, 614-15 (Fed. Cir. 2016).

Considered as an ordered combination, the computer components of Appellants' method add nothing that is not already present when the steps are considered separately. The sequence of data receiving-querying-analyzing-modifying-transmitting is equally generic and conventional or otherwise held to be abstract. See Ultramercial, Inc. v. Hulu, LLC, 772 F.3d 709, 715 (Fed. Cir. 2014) (sequence of receiving, selecting, offering for exchange, display, allowing access, and receiving payment recited an abstraction); Inventor Holdings, LLC v. Bed Bath & Beyond, Inc., 876 F.3d 1372, 1378 (Fed. Cir. 2017) (sequence of data retrieval, analysis, modification, generation, display, and transmission); Two-Way Media Ltd. v. Comcast Cable Commc'ns, LLC, 874 F.3d 1329, 1339 (Fed. Cir. 2017) (sequence of processing, routing, controlling, and monitoring). The ordering of the steps is therefore ordinary and conventional.

Viewed as a whole, Appellants' method claims simply recite the concept of marketing promotion as performed by a generic computer. To be sure, the claims recite doing so by advising one to package promotional offers in an interface for transmission.
But this is no more than abstract conceptual advice on the parameters for such marketing promotion and the generic computer processes necessary to process those parameters, and does not recite any particular implementation. It is little more than computerization of former mass coupon mailing, albeit customized to the recipient by virtue of the generic abilities of computers. The method claims do not, for example, purport to improve the functioning of the computer itself. Nor do they effect an improvement in any other technology or technical field. The 30 pages of specification spell out different generic equipment and parameters that might be applied using this concept and the particular steps such conventional processing would entail based on the concept of marketing promotion under different scenarios. 5 They do not describe any particular improvement in the manner a computer functions. Instead, the claims at issue amount to nothing significantly more than an instruction to apply the abstract idea of marketing promotion using some unspecified, generic computer. Under our precedents, that is not enough to transform an abstract idea into a patent-eligible invention. See Alice, 134 S. Ct. at 2360.

As to the structural claims, they are no different from the method claims in substance. The method claims recite the abstract idea implemented on a generic computer; the system claims recite a handful of generic computer components configured to implement the same idea. This Court has long "warn[ed] . . . against" interpreting § 101 "in ways that make patent eligibility 'depend simply on the draftsman's art.'" Alice, 134 S. Ct. at 2360.

As to Appellants' Appeal Brief arguments, we adopt the Examiner's determinations from Final Action 2-4 and Answer 3-10 and reach similar legal conclusions. We now turn to the Reply Brief.

We are not persuaded by Appellants' argument that "[t]he rejection does not correctly identify the subject matter to which the claims are directed because the Examiner has analyzed a concept merely involved in the claims, not the subject matter to which the claims as a whole are directed." Reply Br. 2-3 (emphasis omitted). We find sufficient evidence supra to determine that the claims are directed to sending a way for a customer to review promotional offers, i.e., marketing promotion.

5 The Specification describes a special purpose computer or a similar special purpose electronic processing/computing device. In the context of this specification, a special purpose computer or a similar special purpose electronic processing or computing device is capable of manipulating or transforming signals, for instance signals represented as physical electronic, optical, or magnetic quantities within memories, registers, or other information storage devices, transmission devices, or display devices of the special purpose computer or similar special purpose processing or computing device. Spec. para. 95. Although this description uses the phrase "special purpose computer," the Specification does not further describe what distinguishes such from general purpose computers, and the structure described is that of a general purpose computer. Thus, there is no apparent physical structural distinction between the "special purpose computer" described in the Specification and a general purpose computer.
Although we agree with Appellants that "[t]he Examiner is not permitted to ignore the primary thrust of the claim when selecting the alleged abstract idea" (id. at 3), both the claim itself and the Specification support this determination.

We are not persuaded by Appellants' argument that "[t]he claims are directed to a patent-eligible improvement in computer functionality because the claimed architecture and data structures related to the cache server and hash keys improve upon the operation of traditional computer systems in the context of computer systems used in online coupon publishing." Reply Br. 4-5 (emphasis omitted). Appellants simply recite well-trodden memory architectures routinely used in general purpose computers. Just as using a computer to speed processing is insufficient, so is using conventional generic structures within or attached to computers.

We are not persuaded by Appellants' argument that "[t]he claims recite an ordered combination that the Examiner's analysis fails to address because the Examiner merely labels claim elements individually as generic computer operations without analyzing the ordered combination for an inventive concept." Id. at 5-6 (emphasis omitted). We determine supra that, based on similar ordered-sequence facts in precedential cases, the ordered combination here is conventional.

We are not persuaded by Appellants' argument that "[t]he claimed subject matter is not similar to the 'idea of itself' court cases upon which the Examiner relies because those cases are limited to activities historically performed in a person's mind." Id. at 6. As the claims recite no technological implementation details aside from conventional generic processing, the purely functional nature of the claims is essentially no more than abstract conceptual advice, an idea unto itself.

CONCLUSIONS OF LAW

The rejection of claims 1, 2, 5, 9, and 21-38 under 35 U.S.C. § 101 as directed to a judicial exception without significantly more is proper.

DECISION

The rejection of claims 1, 2, 5, 9, and 21-38 is affirmed.

No time period for taking any subsequent action in connection with this appeal may be extended under 37 C.F.R. § 1.136(a). See 37 C.F.R. § 1.136(a)(1)(iv) (2011).

AFFIRMED

* *

Notice of References Cited (PTO-892). Appeal No. 2017-002791; Application/Control No. 13/836,110; Art Unit 3682.

Non-Patent Documents:
U. John P. Hayes, Computer Architecture and Organization, Chapter 5, Memory Organization, Section 5.2.5 File Organization, describing hashing pp. 381-382, and Section 5.3.2 Cache Memories pp. 385-388 (1978).

Computer Architecture and Organization

Copyright © 1978 by McGraw-Hill, Inc. All rights reserved.
Printed in the United States of America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher.

This book was set in Times Roman by A Graphic Method Inc. The editors were Peter D. Nalle and J. W. Maisel; the production supervisor was Dennis J. Conroy. The drawings were done by ANCO/BOSTON.

Hayes, John P. Computer architecture and organization. (McGraw-Hill computer science series) Includes bibliographies and index. 1. Computer architecture. 2. Electronic digital computers-Design and construction. I. Title. QA76.9.A73H39 621.3819'52 77-18898 ISBN 0-07-027363-4

To my father Patrick J. Hayes (1910-1968) IN MEMORIAM

Contents (fragment):
6 System Organization
6.1 Communication: 6.1.1 Local communication; 6.1.2 Long-distance communication; 6.1.3 Interconnection structure; 6.1.4 Bus control
6.2 Input-output systems: 6.2.1 Programmed IO; 6.2.2 DMA and interrupts; 6.2.3 IO processors; 6.2.4 CPU-IO interaction
6.3 Multiple CPU systems: 6.3.1 Multiprocessors; 6.3.2 Fault-tolerant computers; 6.3.3 Computer networks
6.4 Summary; Problems; References

PREFACE

Computer architecture can be loosely defined as the study of the structure, behavior, and design of computers. It has emerged as a separate discipline in recent years mainly as a result of the proliferation of computers and advances in computer technology. The development of medium- and large-scale integrated circuits in the 1970s has provided a variety of powerful but inexpensive components for computer design. Perhaps the most significant of these developments is the appearance of microprocessors and microcomputers, devices that in a remarkably short time have resulted in the use of computer systems in many applications that previously used special-purpose logic circuits. Consequently, the concepts of computer architecture have become increasingly relevant to a large number of engineers and scientists.

This book is intended primarily as a text for electrical engineering and computer science courses in computer architecture and organization at the advanced undergraduate and beginning graduate levels. Its emphasis is on computer hardware, but the relevant aspects of software are also treated. The book assumes that the reader has a good understanding of computer programming and is familiar with the workings of at least one simple computer. Some familiarity with logic design is also assumed. There are no special mathematical prerequisites; however, occasional use is made of elementary calculus and probability theory.

The aim of the book is to provide a comprehensive and self-contained view of computer architecture. The topics covered are broadly consistent with the recommendations of the IEEE Computer Society Task Force on Computer Architecture.* Underlying design principles and performance evaluation are stressed, and an attempt has been made to use a uniform terminology and notation throughout. About a hundred problems are included to provide the student . . .

* . . ., "A Course of Study in Computer Hardware Architecture," Computer, vol. 8, December 1975.

CHAPTER FIVE

MEMORY ORGANIZATION

This chapter considers the architecture of a computer's memory system.
The characteristics of the most important storage-device technologies are surveyed. Hierarchical memory systems including virtual memory are studied, as well as some high-speed memory organizations.

5.1 MEMORY TECHNOLOGY

Every computer system contains a variety of devices to store the instructions and data required for its operation. These storage devices plus the algorithms (either implemented by hardware or software) needed to control or manage the stored information constitute the memory system of the computer. In general, it is desirable that processors should have immediate and uninterrupted access to memory, so the time required to transfer information between a processor and memory should be such that the processor can operate at, or close to, its maximum speed. Unfortunately, memories that operate at speeds comparable to processor speeds are very costly. It is not feasible (except for very small systems) to employ a single memory using just one type of technology. Instead the stored information is distributed in complex fashion over a variety of different memory units with very different physical characteristics.

The memory components of a computer system can be divided into three main groups:

1. Internal processor memory. This usually comprises a small set of high-speed registers used as working registers for temporary storage of instructions and data.
2. Main memory (also called primary memory). This is a relatively large fast memory used for program and data storage during computer operation. It is characterized by the fact that locations in main memory can be directly accessed by the CPU instruction set. The principal technologies used for main memory are semiconductor integrated circuits (ICs) and ferrite cores.
3. Secondary memory (also called auxiliary or backing memory). This is generally much larger in capacity but also much slower than main memory. It is used for storing system programs and large data files and the like which are not continually required by the CPU; it also serves as an "overflow" memory when the capacity of the main memory is exceeded. Information in secondary storage is usually accessed indirectly via special programs that first transfer the required information to main memory. Representative technologies used for secondary memory are magnetic disks and tapes.

The major objective in designing any memory system is to provide adequate storage capacity with an acceptable level of performance at a reasonable cost. Four important interrelated ways of approaching this goal can be identified:

1. The use of a number of different memory devices with different cost/performance ratios organized to provide a high average performance at a low average cost per bit. The individual memories form a hierarchy of storage devices.
2. The development of automatic space-allocation methods to make more efficient use of the available memory space.
3. The development of virtual-memory concepts to free the ordinary user from memory management and make programs largely independent of the physical memory configurations used.
4. The design of communication links to the memory system so that all processors connected to it can operate at or near their maximum rates. This involves increasing the effective memory processor bandwidth and also providing protection mechanisms to prevent programs from accessing or altering one another's storage areas.
5.1.1 Memory-Device Characteristics

The computer architect is faced with a bewildering variety of different memory devices to use. However, all memories are based on a relatively small number of physical phenomena and employ relatively few organizational principles. In this section we examine the functional characteristics that are common to the devices used to build main and secondary computer memories. A knowledge of these general properties is essential in evaluating any memory technology. The characteristics and underlying physical principles of some specific representative technologies are also discussed.

Cost. The cost of a memory unit is most meaningfully measured by the purchase or lease price to the user of the complete unit. The price should include not only the cost of the information storage cells themselves but also the cost of the peripheral equipment or access circuitry essential to the operation of the memory. Let C be the price in dollars of a complete memory system with S bits of storage capacity. We define the cost c of the memory as c = C/S dollars/bit.

Access time and access rate. The performance of a memory device is primarily determined by the rate at which information can be read from or written into the memory. A convenient performance measure is the average time required to read a fixed amount of information, e.g., one word, from the memory. This is termed the read access time or, more commonly, the access time of the memory and is denoted by t_A. (The write access time is defined similarly; it is typically, but not always, equal to the read access time.) Access time depends on the physical characteristics of the storage medium, and also on the type of access mechanism used; a precise general definition of t_A is difficult. It is usually calculated from the time a read request is received by the memory unit to the time at which all the requested information has been made available at the memory output terminals. The access rate b_A of the memory, defined as 1/t_A and measured in words per second, is another widely used performance measure for memory devices.

Clearly low cost and high access rates are desirable memory characteristics; unfortunately they appear to be largely incompatible. Memory units with high access rates are generally expensive, while low-cost memories are relatively slow. Figure 5.1 shows the relation between cost c and access rate b_A for some representative current (1977) technologies. This relation can be approximated by the straight line AB. If we write b_A = 10^y and c = 10^x, then y = mx + k, where m denotes the slope of AB. Hence b_A = 10^(mx + k) = k'c^m, where k and k' are constants. From the data provided in Fig. 5.1 it can be concluded that m is approximately 2; hence b_A is approximately k'c^2. Improvements in manufacturing techniques are continually reducing c and increasing b_A for almost all memory technologies. Most forecasters expect the parameter m to remain fairly constant [25]. Note that the technologies shown in Fig. 5.1 fall into two groups, with a significant gap between them. Several recent technologies, notably magnetic-bubble memories and memories built from charge-coupled devices (CCDs), appear likely to fill this gap.

[Figure 5.1: Access rate b_A versus cost c (in $/bit) for various memory technologies, including semiconductor bipolar devices, semiconductor MOS devices, magnetic drums and disks, and punched cards and paper tape.]
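A short worked example makes the two definitions concrete; the price, capacity, and timing values below are invented for illustration and are not taken from Fig. 5.1:

```python
# Illustrative check of the cost and access-rate definitions above.

C = 5_000.0        # price of a complete memory unit, dollars (assumed)
S = 8 * 2**20      # capacity, bits (1 megabyte, assumed)
c = C / S          # cost per bit, c = C/S
print(f"c   = {c:.2e} $/bit")       # ~5.96e-04 $/bit

t_A = 500e-9       # read access time, seconds (assumed)
b_A = 1 / t_A      # access rate, words per second
print(f"b_A = {b_A:.2e} words/s")   # 2.00e+06 words/s

# The empirical trend b_A ~ k'*c**2 then fixes k' for this technology point.
k_prime = b_A / c**2
print(f"k'  = {k_prime:.2e}")
```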
Access modes: random and serial. An important property of a memory device is the order or sequence in which information can be accessed. If locations may be accessed in any order and the access time is independent of the location being accessed, the memory is termed a random access memory. Ferrite-core and semiconductor memories are usually of this type. Memories where storage locations can be accessed only in certain predetermined sequences are called serial access memories. Magnetic-tape units and magnetic-bubble memories employ serial access methods.

In a random access memory each storage location can be accessed independently of the other locations. There is, in effect, a separate access mechanism, or read-write head, for every location, as illustrated in Fig. 5.2. In serial memories, on the other hand, the access mechanism is shared among different locations. It must be assigned to different locations at different times. This is accomplished by moving the stored information, the read-write head, or both. Many serial access memories operate by continually moving the storage locations.

[Figure 5.2: Conceptual model of a random access memory, with a read-write head selector and a read-write head for each storage location.]

Over a period of time, a stored charge tends to leak away, causing a loss of information unless the charge is restored. The process of restoring is called refreshing. Memories which require periodic refreshing are called dynamic memories, as opposed to static memories, which require no refreshing. (Note that the terms dynamic and static in this context do not refer to the presence or absence of physical motion in the storage device.) Most memories using magnetic storage techniques are static. Refreshing in dynamic memories can be carried out in the same way data is restored in a DRO memory. The contents of every location are transferred systematically to buffer registers and then returned, in suitably amplified form, to their original locations.

Another physical process that can destroy the contents of a memory is the failure of its power supply. A memory is said to be volatile if the stored information can be destroyed by a power failure. Most semiconductor memories are volatile, while most magnetic memories are nonvolatile.

Cycle time and data-transfer rate. We defined the access time t_A of a memory as the time between the receipt of a read request by the memory and the delivery of the requested information to its external output terminals. In DRO and dynamic memories, it may not be possible to initiate another memory access until a restore or refresh operation has been carried out. This means that the minimum time that must elapse between the initiation of two different accesses by the memory can be greater than t_A; this rather loosely defined time is called the cycle time t_M of the memory. It is generally convenient to assume that t_M is the time needed to complete any read or write operation in the memory.
Hence the maximum amount of information that can be transferred to or from the memory every second is 1/t_M; this quantity is called the data-transfer rate or bandwidth b_M. The data-transfer rate is frequently measured in bits per second (bauds) or words per second. A factor limiting memory bandwidth is the memory bus width w, which is the number of bits that can be transferred simultaneously over the memory bus; w is generally, but not necessarily, the same as the standard memory word size. Clearly b_M = w/t_M bauds. In cases where t_A differs from t_M, both are used to measure memory performance. The access time may be more important in measuring overall computer-system performance since it determines the length of time a processor must wait after initiating a memory access request; during the remainder of the memory cycle, both processor and memory can operate simultaneously. On the other hand, if t_A differs from t_M, then b_A differs from b_M, and b_A does not represent the actual number of accesses that can be carried out per second, whereas b_M does.
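A worked example of the bandwidth formula, with an assumed bus width and cycle time:

```python
# Worked example of b_M = w / t_M (values are illustrative).

w   = 16      # memory bus width, bits transferred per cycle
t_M = 1e-6    # cycle time, seconds

b_M = w / t_M # data-transfer rate in bits per second (bauds)
print(f"b_M = {b_M:.0f} bits/s")       # 16,000,000 bits/s
print(f"    = {b_M / w:.0f} words/s")  # one 16-bit word per cycle
```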
Physical storage media. Many different physical properties of matter are used for information storage. The more important properties used for this purpose can be classified as electronic, magnetic, mechanical, and optical. A basic requirement for a storage medium is that it have two well-defined physical states that can be used to represent the logical 0 and 1 values. The access rate of a particular memory device depends on the rate at which its physical states can be measured and altered. If a memory and the processor connected to it use the same physical storage media, they are said to be compatible. If they are not compatible, special interface devices, called transducers, are needed; they can add significantly to both memory cost and access time. Most processors employ electronic (semiconductor) technologies; hence only memories using similar technologies are compatible. Figure 5.5 contains a table showing representative physical characteristics of some major modern memory technologies.

Miscellaneous physical characteristics. There are several other attributes of memories that do not directly involve functional behavior but nevertheless significantly affect the cost of a memory technology in a particular application. A factor determining the physical size of a memory unit is the storage density, measured, perhaps, in bits per unit volume. The physical size also determines the portability of the memory. The energy consumption of the memory units may contribute significantly to the running costs of a computer system. Large energy consumption combined with high storage density may require expensive cooling equipment. Finally, some mention should be made of reliability. This can be measured by the mean time before failure (MTBF). In general, memories with no moving parts have much higher reliability than memories such as magnetic disks which involve considerable mechanical motion. Even in memories that involve no moving parts, reliability problems arise, particularly when very high storage densities or high data-transfer rates are used. The reliability of any memory can be increased by using error-detecting and error-correcting codes (see Sec. 3.1.2).

Figure 5.5 Characteristics of representative memory technologies:

Technology                        | Cost c, $/bit | Access time t_A, s | Access mode           | Alterability | Permanence            | Storage medium
Bipolar semiconductor             | 10^-1         | 10^-8              | Random                | Read/write   | NDRO, volatile        | Electronic
Metal-oxide-semiconductor (MOS)   | 10^-2         | 10^-7              | Random                | Read/write   | DRO or NDRO, volatile | Electronic
Ferrite cores                     | 10^-2         | 10^-6              | Random                | Read/write   | DRO, nonvolatile      | Magnetic
Magnetic disks and drums          | 10^-4         | 10^-2              | Random or semi-random | Read/write   | NDRO, nonvolatile     | Magnetic
Magnetic tapes                    | 10^-5         | 10^-1              | Serial                | Read/write   | NDRO, nonvolatile     | Magnetic
Punched cards and paper tape      | 10^-6         | 10                 | Serial                | Read only    | NDRO, nonvolatile     | Mechanical

5.1.2 Random Access Memories

Random access memories are characterized by the fact that every location can be accessed independently. The access and cycle times for every location are constant and independent of its position. Figure 5.6 shows the main components of a random access memory unit. The storage cell unit comprises N cells, each of which can store one bit of information. The memory operates as follows. The address of the required location (a set of w >= 1 cells) is transferred via the address bus to the memory address register. The address is then processed by the address decoder, which selects the required location in the storage cell unit. A read-write select control line specifies the type of access to be performed. If read is requested, the contents of the selected location are transferred to the output data register. If write is requested, the word to be written is first placed in the memory input data register and then transferred to the selected cell. Since it is not usually desirable to permit simultaneous reading and writing, the input and output data registers are frequently combined to form a single data register (also called the memory buffer register).

Each storage cell has a number of lines connected to it. Figure 5.7 shows an idealized model of a cell and its external connections. The address lines are used to select the cell for either reading or writing, as determined by the read-write control lines. A set of data lines is used for transferring data to and from the memory. The actual number of physical lines connected to a storage cell is . . .

[Figure 5.6: Main components of a random access memory unit: address bus, memory address register, address decoder, address drivers, storage cell unit, and input/output data registers.]
[Figure 5.7: Idealized model of a storage cell and its external connections.]
[Figures 5.12-5.14: A 4 x 2-bit memory circuit, its symbol, and a 16 x 4-bit memory array built from such circuits.]
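The read and write cycles just described can be modeled behaviorally in a few lines; the following Python sketch uses invented sizes and names and is only a schematic of Fig. 5.6, not circuit-level detail:

```python
# Behavioral sketch of a random access memory cycle:
# address register -> decoder -> storage cell unit -> data register.

class RAM:
    def __init__(self, num_words: int, width: int):
        self.width = width
        self.cells = [0] * num_words       # storage cell unit, one word per location

    def cycle(self, address: int, read_write: str, data_in: int = 0) -> int:
        mar = address                      # memory address register
        assert 0 <= mar < len(self.cells)  # address decoder selects the location
        if read_write == "read":
            mbr = self.cells[mar]          # contents -> memory buffer register
        else:
            mbr = data_in % (1 << self.width)
            self.cells[mar] = mbr          # buffer register -> selected cell
        return mbr

ram = RAM(num_words=16, width=4)
ram.cycle(3, "write", 0b1010)
print(bin(ram.cycle(3, "read")))  # 0b1010; access time independent of location
```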
The address of the information to be accessed is loaded into the memory address register. The output of the address decoder determines the track to be used (the track address) and the location of the desired block of information within the track (the block address). The track address determines the particular read-write head to be selected. Then, if necessary, the selected head is moved into position to transfer data to or from the target track. The desired block cannot be accessed until it reaches the selected head. To determine when this occurs, some type of track position indicator is needed which generates the address of the block that is currently passing the read-write head. The generated address is compared with the block address produced by the address decoder. When they match, the selected head is enabled and data transfer between the storage track and the memory data buffer registers begins. The read-write head is disabled when a complete block of information has been transferred. The memory input and output data registers are generally shift registers of the parallel-in/series-out and series-in/parallel-out types, respectively.

[Figure 5.18: A serial access memory unit, showing storage tracks, start- and end-of-block detectors, a comparator, head enable circuits, read-write heads, an address decoder, timing and control logic, and the address, control, and data buses.]

Classification by physical access mechanism. The number of different types of storage media and access mechanisms used to construct serial access memories is very large; they can be classified in many ways. A possible classification, which we will use here, distinguishes memories on the basis of the type of physical access mechanism used:

1. Memories with dynamic access mechanisms. The read-write heads and/or the storage locations are moved through space, usually by electromechanical devices such as electric motors, in order to perform an access. The most widely used group of secondary memory devices, magnetic-disk and -tape units, fall into this category.
2. Memories with static access mechanisms. No mechanical motion is required to access the memory. The now obsolete mercury delay line memory was of this type, as are magnetic bubble and CCD memories.

Magnetic memories with electromechanical access have had many years of development. The storage media (disks and tapes) used are inexpensive and also portable. However, electromechanical equipment is inherently unreliable and a major source of computer-system failures. Memories with no moving parts are therefore very attractive from the point of view of reliability. We will discuss representative examples of both types of serial access memory.

Magnetic-surface recording. In magnetic-drum, -disk, and -tape memories, information is stored in tracks on the surface of a magnetic medium, usually ferric oxide. Each cell in a track has two stable magnetic states that represent the logical 0 and 1 values. The magnetic states are typically defined by the direction or magnitude of the magnetic flux in the cell (many different methods of encoding information in magnetic states are used). Each cell is therefore similar in principle to a ferrite core. As in the case of ferrite cores, electric currents are used for altering and sensing the magnetic state. In surface magnetic storage, however, an external read-write head of the type shown in Fig. 5.19 is employed. The read and write currents pass through coils around a ring of soft magnetic material. There is a gap in this ring which permits magnetic flux to pass into the surrounding air. A very narrow space separates the ring gap from a cell on the storage track, so that their respective magnetic fields can interact.
This interaction permits information transfer between the read-write head and the storage medium.

[Figure 5.19: Magnetic-surface recording mechanism, showing the write driver and data input, the sense amplifier and data output, the read-write head over a track on the moving magnetic medium, and the substrate.]

In order to write information, the addressed cell is moved under the read-write gap. A pulse of current is then transmitted through the write coil which alters the magnetic field at the ring gap; this in turn alters the state of magnetization of the cell under the gap. The direction or magnitude of the write current determines the resulting cell state. To read a cell, it is moved past the read-write head. The magnetic field of the cell induces a magnetic field in the core material of the read-write head. Since the cell is in motion, this magnetic field varies and so induces an electric voltage pulse in the read coil. This voltage, which is then fed to a sense amplifier, identifies the state of the cell. The readout process is nondestructive. In addition, magnetic-surface storage is nonvolatile.

Electromechanically accessed magnetic memories are distinguished by the shapes of the surfaces in which the storage tracks are embedded. In drum memories, the tracks are parallel circular paths around the surface of a cylindrical drum. In disk memories, the tracks form concentric circles on the surface of a disk. In tape memories, the tracks form parallel lines on the surface of a long narrow plastic tape. Magnetic drums will not be discussed explicitly, since they are seldom used nowadays and their operation resembles that of magnetic disks.

Magnetic-disk memories. A magnetic disk is similar in size and appearance to a phonograph record. It may be made of aluminum or plastic with a thin coating of magnetic material on its surface. On each surface of the disk there are up to several hundred tracks which are arranged in concentric circles as shown in Fig. 5.20a. Typically several disks are attached to a common spindle to form a disk pack that can be easily removed. During operation of a disk memory, the disks are rotated continuously at a constant speed. In modern disk units, each recording surface is supplied with at least one read-write head.

[Figure 5.20: (a) Top view and (b) side view of a magnetic disk pack and its read-write arm.]

The read-write heads may be connected to form a read-write "arm," as shown in Fig. 5.20b, so that all heads move in unison. This arm may then be moved in a fixed linear path to select a particular set of tracks. Disk memories have also been designed with one head per track, thus eliminating the need for a moving read-write arm and effectively reducing the seek time to zero.

Example 5.3: The IBM 3330 disk storage unit. This is a moving-arm disk memory designed for the IBM System/370 computer series. Each disk drive accommodates a removable disk pack, the IBM 3336. Figure 5.21 shows some of the principal characteristics of this device. The basic unit of storage used is 1 byte = 8 bits.
Number of disks per disk pack      12
Number of recording surfaces       19
Number of tracks per surface       404
Track storage capacity             13,030 bytes
Disk pack storage capacity         approximately 10^8 bytes
Disk rotation speed                3600 r/min
Average seek time                  30 ms
Average latency                    8.4 ms
Data transfer rate                 approximately 8 x 10^5 bytes/s

Figure 5.21 Characteristics of the IBM 3330 disk storage unit.

Magnetic-tape memories Magnetic-tape memories are essentially similar to domestic tape recorders; instead of storing analog information, they store binary digital information. The storage medium has as its substrate a flexible plastic tape. Information is generally stored in nine parallel longitudinal tracks. A read-write head that can simultaneously access all nine tracks is used; hence the basic memory "word" is 9 bits. This usually comprises 8 bits (1 byte) of information and 1 parity check bit. Magnetic tapes are stored on reels and provide a compact, inexpensive, and portable medium for storing large information files.

Figure 5.22 shows the main components of a tape drive unit. Two reels are used to store the tape. Unlike disk or drum memories, the storage medium is not in continuous motion. When an access request is received, the tape is moved forward or backward to the desired location; it is stopped at the end of the data transfer. In order to permit rapid starting and stopping, two large loops of tape are permitted to hang freely on each side of the read-write head. The rotating drums (called capstans) that pull the tape past the read-write head are there only to accelerate the tape in these loops while the reels themselves are brought up to speed. This procedure reduces the influence of the inertia of the reels. After its initial acceleration, the tape moves at a constant velocity, called the tape speed.

Figure 5.22 A magnetic-tape drive unit.

Data transfer takes place only when the tape is moving at constant velocity; hence the data-transfer rate of a particular tape unit is determined by the storage density and the tape speed. For example, if the tape storage density is 1600 bytes/in and the tape speed is 18.75 in/s (these figures are representative of commonly used low-speed tape systems), the data-transfer rate is 1600 x 18.75 = 30,000 bytes/s. Information stored on magnetic tapes is organized into blocks of various sizes. Relatively large gaps must be inserted at the end of each block to permit the tape to start and stop between blocks. Note that the time required to rewind an entire tape is of the order of 1 min.
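These throughput figures follow directly from the physical parameters. The short C sketch below, an illustrative calculation rather than part of the original text, recomputes the tape transfer rate quoted above and two of the IBM 3330 figures of Fig. 5.21 (rotational latency as half a revolution, transfer rate as one track per revolution).

    /* Recompute representative serial-memory throughput figures. */
    #include <stdio.h>

    int main(void)
    {
        /* Tape: rate = recording density * tape speed. */
        double density = 1600.0;             /* bytes per inch            */
        double speed   = 18.75;              /* inches per second         */
        printf("tape rate = %.0f bytes/s\n", density * speed); /* 30000   */

        /* IBM 3330 disk (Fig. 5.21): 3600 r/min, 13,030 bytes per track. */
        double rev_per_s = 3600.0 / 60.0;    /* 60 revolutions per second */
        double t_rot     = 1.0 / rev_per_s;  /* one revolution: 16.7 ms   */
        printf("avg latency = %.1f ms\n", 1000.0 * t_rot / 2.0); /* 8.3   */
        printf("disk rate = %.0f bytes/s\n", 13030.0 * rev_per_s); /* ~8e5 */
        return 0;
    }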
Magnetic-bubble memories In thin plates of certain materials, e.g., orthoferrites and garnets, the natural directions of magnetization are perpendicular to the surface of the plates. When no external magnetic field is present, serpentine areas called domains form spontaneously in the plate; see Fig. 5.23a. The material within each domain is magnetized in one of the two natural directions. As an external bias field Hb perpendicular to the plate is increased, the domains whose direction of magnetization is opposite to that of Hb contract in size until eventually they are cylindrical in shape. These cylindrical domains are called magnetic bubbles. Typical bubble diameters are 1 to 10 um.

Figure 5.23 The formation of magnetic bubbles.

Bubbles can be moved at high velocity through the plate if an additional external magnetic field called a drive field Hd is applied whose direction is parallel to the plate surface. Usually the drive field is rotated at a fixed rate. This rotating field is generated by an electromagnet, so no mechanical motion is involved. By depositing linear tracks of a soft magnetic material (permalloy) on the plate surface, the bubbles can be moved along predetermined paths. The permalloy tracks are designed so that they constrain the bubbles to remain under the tracks. They also convert the drive field into magnetic fields that force the bubbles trapped under the track to move continuously in a fixed direction.

Figure 5.24 shows a T-bar track, which consists of T- and bar-shaped permalloy elements in an alternating linear sequence. The rotating drive field Hd induces a magnetic field in the permalloy elements parallel to Hd. Thus, depending on the orientation of Hd, the extremities of the T's and bars and the junctions of the T's become north (N) or south (S) magnetic poles at various times. Each magnetic bubble is like a small magnet, one of whose poles (the S pole in Fig. 5.24) is at the surface on which the permalloy track has been laid. According to the classic law of magnetism (like poles repel, unlike poles attract), a bubble is attracted to the nearest N pole in either a T or a bar. Figure 5.24 shows how one revolution of Hd causes a bubble to move in a straight line from one T to the corresponding position under a neighboring T. Many other equally ingenious permalloy track designs exist for bubble propagation [18].

Figure 5.24 Propagation of magnetic bubbles along a T-bar track.

Bubble-memory devices are constructed by forming closed tracks around which bubbles can be circulated continuously at a fixed rate. Each track is then a shift register. In the case of the T-bar track of Fig. 5.24, each pair of adjacent T's and bars constitutes a cell. The presence (absence) of a bubble in a cell denotes the logical 1 (0) state. In order to be able to write data into a cell, two special devices are needed: a bubble generator to introduce bubbles into a track and a bubble annihilator to remove bubbles. These devices also act as transducers, since the input data is invariably in the form of electrical pulses. For reading purposes, a bubble detector is required that can produce an electrical signal indicating the presence or absence of a bubble.

Figure 5.25 Basic bubble-memory organization.

The structure of a bubble memory chip organized as a single N-bit shift register is shown in Fig. 5.25. The average access time for 1 bit (which in this case is the same as the latency) is the time required to propagate a bubble through N/2 cells. The access time can be decreased by a factor of k by the common arrangement of using k independent bubble shift registers that can be accessed in parallel. An example of a relatively fast bubble-memory system is the major-minor loop organization shown in Fig. 5.26. Data is stored in k bubble shift registers called the minor loops, which are rotated in synchronism by a common magnetic-drive field. Data is transferred to or from the minor loops via another bubble shift register called the major loop.
Each minor loop is connected to the major loop by a device called a transfer gate. An external control signal applied to the transfer gates causes one bit of information to be transferred between the major loop and each of the minor loops. The major loop is attached to the input-output circuitry (bubble annihilators, generators, and detectors) and acts as a communication link between the minor storage loops and the outside world.

Figure 5.26 A major-minor loop bubble-memory organization.

Magnetic bubble memories have only recently become available; they are not yet widely used in computer systems. Integrated-circuit manufacturing techniques are used that require few steps, which should eventually lead to very low cost per bit. Storage capacities of 10^6 bits per chip have been reported. The lack of moving parts promises high reliability and suggests that bubble memories may ultimately replace "those marvelous mechanical whirling dervishes" [3], magnetic disk and tape memories, as the main technology for secondary or low-speed computer memories. Nonvolatility is achieved in bubble memories by using a permanent magnet to generate the bias field. On the negative side, bubble devices are incompatible with processor technologies, so that interfacing can be difficult. However, it is possible to design bubble circuits that perform logic functions, which may minimize this difficulty. Indeed the construction of an entire computer using magnetic-bubble technology has been suggested [19].

CCD memories The charge-coupled device (CCD), announced by Bell Telephone Laboratories in 1970, is a shift register whose cells are constructed from MOS capacitors [8, 18]. Information is represented by packets of electrically charged particles, e.g., electrons, which circulate continuously through the shift register under the influence of an electric "drive" field. The presence (absence) of a charge packet in a cell denotes a logical 1 (0). A CCD memory is in many ways analogous to a magnetic bubble memory with charge packets replacing magnetic bubbles. The small size of the MOS capacitors makes it possible to manufacture one-chip memories with very large storage capacity, e.g., 64K bits.

Figure 5.27a shows the structure of a representative CCD. It consists of a linear array of closely spaced MOS capacitors which form the basic memory cells. The drive field is provided by a three-phase voltage source. Every third cell in the array is connected to the same phase. The waveforms of the drive voltages are shown in Fig. 5.27b.

Figure 5.27 A three-phase CCD shift register: (a) physical structure; (b) three-phase drive waveforms.

Figure 5.28 illustrates the charge-transfer process. When a nonzero voltage is applied to any cell, a potential well forms in the cell substrate which can store a charge packet whose sign (positive or negative) is opposite to that of the applied voltage. The well "depth" is determined by the voltage magnitude. Figure 5.28 shows the potential well shape at the three points of time defined in Fig. 5.27b. At t1 a potential well is associated with the V3 gates only.
A charge packet previously injected from an external source can be stored in any of these wells; one such packet is shown in the figure. At t2 a potential well forms under the V1 gates, while the wells under the V3 gates begin to shrink. As a result, the charge packet begins to flow from the V3 gate to the neighboring V1 gate. At time t3 the charge transfer is complete. It is easily seen that the three voltages interact in such a way that they effectively cause the potential well containing the charge packet to move linearly along the cell array. The period of the drive voltages determines the basic shift period and hence the speed of operation of the CCD memory. Three adjacent cells are needed to store one bit of information because the charge packet in any cell must be isolated from other charge packets by its two neighboring cells.

Figure 5.28 Charge-packet transfer in a CCD memory.

CCDs may be organized to form computer memory units in various ways. The simplest organization is that of a single closed shift register around which charge packets circulate continuously. Refresh amplifiers must be included to regenerate the charge packets periodically. Access time can be decreased by using organizations similar to the major-minor loop organization for bubble memories shown in Fig. 5.26. Unlike bubble devices, CCDs are volatile. On the other hand, CCDs can be more easily interfaced with conventional semiconductor logic circuits.

5.2 VIRTUAL MEMORY

Virtual memory loosely describes a hierarchical storage system of at least two levels, which is managed by an operating system to appear to a user like a single large directly addressable main memory. There are three main reasons for using virtual memory:

1. To free users from the need to carry out storage allocation
2. To make programs independent of the configuration and capacity of the memory systems used during their execution
3. To permit efficient sharing of memory space among different users and achieve the high access rates and low cost per bit that is possible with a memory hierarchy

Most virtual memory systems employ a two-level hierarchy comprising a main memory M1 of relatively small capacity S1 and a much larger secondary memory M2 of capacity S2. The ordinary user, who may program in high-level languages only, views the system as a single virtual or logical memory of unlimited capacity (in fact, its capacity is bounded by S1 + S2). The virtual memory is addressed by the set L of logical addresses or identifiers explicitly or implicitly specified in the user's program. The fixed physical storage locations in the memory units are identified by a set of physical addresses P. Virtual memory systems are implemented by providing an automatic mechanism for the address mapping f: L -> P.

5.2.1 Memory Hierarchies

The various major units in a typical memory system can be viewed as forming a hierarchy of memories (M1, M2, ..., Mn) in which each member Mi is in some sense subordinate to the next highest member Mi-1 of the hierarchy. The CPU and other processors communicate directly with the first member of the hierarchy M1; M1 can communicate directly with M2, and so on.

Figure 5.29 Common memory hierarchies: (a) two-level; (b) three-level.
Let ci, tAi, and Si denote the cost per bit, access time, and storage capacity, respectively, of Mi. The following relations normally hold between the memory levels Mi and Mi+1:

    ci > ci+1        tAi < tAi+1        Si < Si+1

Figure 5.29 shows two of the most common memory hierarchies. During the execution of programs, the CPU generates a continuous stream of logical memory addresses. At any time these addresses are distributed in some fashion throughout the memory hierarchy. If an address is generated which is currently assigned to Mi where i ≠ 1, the address must be reassigned to M1, the only level of the memory hierarchy that the CPU can access directly. This relocation of logical addresses generally requires the transfer of information between levels Mi and M1, a relatively slow process. In order for a memory hierarchy to work efficiently, the addresses generated by the CPU should be found in M1 as often as possible. This requires that future addresses be to some extent predictable, so that information can be transferred to M1 before it is actually referenced by the CPU. If the desired information cannot be found in M1, then execution of the program originating the memory request must be suspended until an appropriate reallocation of storage is made.

Locality of reference The predictability of logical memory addresses which is essential to the successful operation of a memory hierarchy is based on a common characteristic of computer programs called locality of reference. This describes the fact that over the short term, the addresses generated by a typical program tend to be confined to small regions of its logical address space, as illustrated in Fig. 5.30. The items of information whose addresses are referenced during the time interval from t - T to t, denoted (t - T, t), constitute the working set W(t, T) [10]. It has been found that W(t, T) tends to change rather slowly; hence by maintaining all of W(t, T) in the fastest level of memory M1, the number of references to M1 can be made considerably greater than the number of references made to the other levels of the memory hierarchy.

Figure 5.30 Typical nonuniform distribution of address references.

One reason for locality of reference is that instructions and, to a lesser extent, data are written down and subsequently stored in the computer's memory in approximately the order in which they are needed during program execution. Suppose that a request is made for an address A containing an instruction I, and this address is currently assigned to Mi ≠ M1. The next most likely instruction to be required by the CPU is the instruction immediately following I, whose address is A + 1. Thus instead of simply transferring the instruction I to M1, it is desirable to transfer a block of consecutive words containing I. A common way of implementing this is by subdividing the information stored in Mi into pages, each containing Sp,i consecutive words. Information is then transferred one page, or Sp,i words, at a time between levels Mi and Mi+1. Thus if the CPU requests word I in level Mi, the page of length Sp,i-1 in Mi containing I is transferred to Mi-1, then the page of length Sp,i-2 containing I is transferred to Mi-2, and so on. Finally the page P of length Sp,1 containing I is transferred to M1 where it can be accessed by the CPU. Subsequent memory references are likely to refer to addresses in P, so that the single page transfer to M1 anticipates future memory requests by the CPU.
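The working-set idea lends itself to direct measurement. The following C sketch, an illustration only (the window size T, the page-space bound, and the sample trace are all assumptions), computes the size of W(t, T) at one point t of a page-address trace by counting the distinct pages referenced in the last T references.

    /* Count the distinct pages referenced in the window (t - T, t):
       the size of the working set W(t, T). Illustrative sketch only. */
    #include <stdio.h>

    #define NPAGES 64          /* assumed size of the page-address space */

    int working_set_size(const int trace[], int t, int T)
    {
        char seen[NPAGES] = {0};
        int size = 0;
        for (int i = (t - T < 0 ? 0 : t - T); i < t; i++)
            if (!seen[trace[i]]) { seen[trace[i]] = 1; size++; }
        return size;
    }

    int main(void)
    {
        /* A short, locality-exhibiting page trace (hypothetical). */
        int trace[] = {2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2};
        int n = sizeof trace / sizeof trace[0];
        printf("|W(t=%d, T=4)| = %d\n", n, working_set_size(trace, n, 4));
        return 0;
    }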
A second factor in locality of reference is the presence of program loops. Statements within a loop may be executed repeatedly, resulting in a high frequency of reference to their addresses. When a loop is being executed, it is clearly desirable to store the entire loop in M1 if possible.

Design objectives The goal in memory hierarchy design is to achieve a performance close to that of the fastest device M1 and a cost per bit close to that of the cheapest device Mn. The performance of a memory hierarchy depends on a variety of factors, which are related in a complex manner. The more important of these are the following.

1. The address reference statistics, i.e., the order and frequency of the logical addresses generated by programs that use the memory hierarchy
2. The access time tAi of each level Mi relative to the CPU
3. The storage capacity of each level
4. The size of the blocks of information transferred between successive levels
5. The strategy, called the allocation algorithm, used for determining the regions of memory to which blocks of information are transferred by the swapping process

These design factors interact in a complex manner which is by no means fully understood. A number of simple analytic models exist, however, which reveal the general way in which some of these factors are related. Some representative models of this kind are discussed in the present chapter. It should be emphasized that simulation is still the major tool for memory-system design. Simulation is used for determining such program-dependent design parameters as address reference frequencies. It is also the main technique for evaluating memory-system performance.

Cost and performance For simplicity we restrict our attention to the most common form of memory hierarchy, a two-level hierarchy (M1, M2). It is not difficult to generalize the cost and performance measures discussed here to n-level hierarchies.

Cost The average cost per bit of memory is given by

    c = (c1 S1 + c2 S2)/(S1 + S2)    (5.1)

where ci denotes the cost per bit of Mi and Si denotes the storage capacity in bits of Mi. To achieve the goal of making c approach c2, S1 must be very small compared with S2.

Hit ratio The performance of a two-level memory hierarchy is frequently measured in terms of the hit ratio H, which is defined as the probability that a logical address generated by the CPU refers to information stored in M1. Hit ratios are generally determined experimentally as follows. A set of representative programs is executed or simulated. The number of address references to M1 and M2, denoted by N1 and N2, respectively, are recorded. H is then given by the equation

    H = N1/(N1 + N2)    (5.2)

Clearly H is highly program dependent. The quantity 1 - H is called the miss ratio.
Access time Let tA1 and tA2 denote the access times of M1 and M2, respectively, relative to the CPU. The average time tA for the CPU to access a word in the memory system is given by the equation

    tA = H tA1 + (1 - H) tA2    (5.3)

In most two-level hierarchies, a request for a word not in main memory causes a block of information containing the requested word to be transferred to main memory. When the block transfer has been completed, the requested word is accessed in main memory. The time tB required for the block transfer is called the block-replacement, or block-transfer, time. Hence we have tA2 = tA1 + tB. Substituting into Eq. (5.3) yields

    tA = tA1 + (1 - H) tB    (5.4)

Block transfer usually requires a relatively slow IO operation; therefore tB is usually much greater than tA1. Hence tA2 ≈ tB. Let r = tA2/tA1 denote the access-time ratio of the two levels of memory, and let e = tA1/tA, which is the factor by which tA differs from its minimum possible value; e is called the access efficiency of the virtual memory. From Eq. (5.3) we obtain

    e = 1/(r + (1 - r)H)    (5.5)

In Fig. 5.31, e is plotted as a function of H. This graph shows the importance of achieving high values of H in order to make e ≈ 1, that is, tA ≈ tA1. For example, suppose that r = 100. In order to make e > 0.9, we must have H > 0.999.

Figure 5.31 The access efficiency e of a two-level memory as a function of hit ratio H for various values of r = tA2/tA1.
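Equation (5.5) is easy to evaluate numerically. The sketch below, illustrative and not from the original text, computes e for a range of hit ratios at r = 100, confirming that H must be extremely close to 1 before the hierarchy approaches the speed of M1.

    /* Access efficiency e = 1/(r + (1 - r)H) of a two-level
       memory hierarchy, Eq. (5.5). */
    #include <stdio.h>

    double efficiency(double r, double H)
    {
        return 1.0 / (r + (1.0 - r) * H);
    }

    int main(void)
    {
        double r = 100.0;               /* access-time ratio tA2/tA1 */
        double hs[] = {0.90, 0.99, 0.998, 0.999, 0.9999};
        for (int i = 0; i < 5; i++)
            printf("H = %.4f  ->  e = %.3f\n", hs[i], efficiency(r, hs[i]));
        /* H = 0.9990 -> e = 0.910: e > 0.9 indeed requires H > 0.999. */
        return 0;
    }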
Memory utilization Memory capacity is limited by cost considerations; it is therefore desirable that as little memory space as possible be wasted. The efficiency with which space is being used at any time can be loosely defined as the ratio of the memory space Su occupied by "active" parts of user programs to the total amount of memory space available S. We call this the space utilization u and write

    u = Su/S

Since main-memory space is more valuable than secondary-memory space, it is useful to restrict u to measuring main-memory space utilization. In that case, the S - Su words of M1 which represent "wasted" space can be attributed to several sources.

1. Empty regions. The blocks of instructions and data occupying M1 at any time are generally of different lengths. As the contents of M1 are changed, unoccupied regions or holes of various sizes tend to appear between successive blocks. This phenomenon is called fragmentation.
2. Regions occupied by the memory management system. A certain amount of main-memory space is required to store memory management routines and memory maps.
3. Regions occupied by inactive user information. Certain words may be transferred to M1, for example, as part of a page, and may be subsequently transferred back to M2 without ever being referenced by a processor. Some superfluous transfers of this kind are unavoidable, since exact address references are unpredictable. However, superfluous transfers can also be caused by an inefficient memory-allocation strategy.

The efficiency with which a given program Q utilizes main memory can be measured by its memory space-time function defined as

    q = ∫₀ᵀ S(t) dt

where (0, T) is a real-time interval and S(t) is the amount of main-memory space assigned to Q at time t. (0, T) includes time spent actively executing Q, as well as waiting time while memory swapping and other IO operations take place.

Address mapping The set of all abstract locations that can be referenced by a program is loosely defined as its logical address space L. Logical addresses may be explicitly named by identifiers assigned by the programmer. Many addresses are implicit or relative to other addresses. In order to execute the program on a particular machine, the logical addresses must be mapped onto the physical address space P of the machine's main memory. This process is called address mapping or, occasionally, address binding. The physical address space is represented by a linear sequence of numbers 0, 1, 2, ..., n - 1. Main memory is therefore a one-dimensional array of word locations. The mathematical structure of a program represented by L is usually more complex; it can include multidimensional arrays, trees, and other nonlinear structures. Before the program can be executed, it and its data sets must be "linearized," which means, in effect, transforming them into a set of contiguous word sequences, each of which can fit in main memory.

Address mapping can be viewed abstractly as a function f: L -> P. This function is not easily determined, since address mapping can be carried out wholly or in part at various stages in the life of a program, specifically:

1. By the programmer while writing the program
2. By the compiler during program compilation
3. By the loader at initial program load time
4. By the operating system while the program is being executed

Specification of physical addresses by the user was necessary in the earliest computers, which had neither operating systems nor the facilities to support any programming languages except machine language. It is rarely used nowadays. It is generally not permitted in systems where memory space is shared by different users. In modern computer systems, the user is limited to specifying relative addresses within the program. The final physical addresses are determined by the compiler, the loader, or the operating system.

The compiler transforms all user identifiers into binary addresses. If the program is sufficiently simple, the compiler may be able to make a complete transformation of logical to physical addresses, especially if the program in question contains no concurrent or recursive procedures, i.e., it is strictly sequential and nonrecursive. In general, the output of the compiler is a set of program and data blocks, each of which is a sequence of contiguous words. (Pages and segments are special types of blocks, which are discussed in more detail later.) Each word within a block can be identified by a logical address which comprises a base address and relative address (also called a displacement or index), as shown in Fig. 5.32.

Address mapping can be completed when the program is first loaded by assigning fixed values to the base address of each block. This is called static allocation, since the physical address space of the program is fixed for the duration of its execution. In systems that support recursive or concurrent procedures, the logical space of a program may vary dynamically during execution. For example, a recursive procedure is typically controlled by a stack containing the linkage between successive calls to the procedure. The size of this stack cannot be predicted before execution, because it depends on the number of times the ...

... the times tK at which the next reference to each block K is to occur; the K to be replaced is the one for which tK has the maximum value. This ideal strategy has been called OPT [17]. OPT can be implemented by making two passes through the program. The first is a simulation run to determine the sequence SB of distinct logical block addresses generated by the program; the sequence is called the block address stream or block address trace.
The values of tK at each point in time can be computed from SB and used to construct the optimal sequence S_OPT of blocks to be replaced. The second run is the execution run, which uses S_OPT to specify the blocks to be replaced. OPT is not a practical replacement policy because of the cost of the simulation run and the fact that SB may be extremely long, making S_OPT very expensive to compute. A practical replacement policy attempts to estimate tK using statistics it gathers on the past references to all blocks currently in main memory.

Two of the most commonly implemented replacement policies are first-in first-out (FIFO) and least recently used (LRU). FIFO selects for replacement the block least recently loaded into main memory. FIFO has the advantage that it is easily implemented. A loading sequence number is associated with each block in the occupied space list. Each time a block is transferred to or from main memory, the loading sequence numbers are updated. By inspecting these numbers, the operating system can easily determine the oldest (first-in) block. FIFO has the defect, however, that a frequently used block, e.g., one containing a program loop, may be replaced because it is the oldest block.

The LRU policy selects for replacement the block that was least recently accessed by the processor. It is based on the very reasonable assumption that the least recently used block is the one least likely to be referenced in the future. The LRU policy avoids the replacement of frequently used blocks which can occur with FIFO. It is more difficult to implement than FIFO, however, since the operating system must maintain statistics on the number of references to all blocks in main memory. LRU can be implemented by associating a hardware or software counter, called an age register, with every block in main memory. Whenever a block is referenced, its age register is set to a predetermined positive number. At fixed intervals of time, the age registers of all the blocks are decremented by a fixed amount. The least recently used block at any time is the one whose age register contains the smallest number.
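The age-register scheme is simple enough to express directly in code. The following C sketch, an illustration under assumed parameters rather than a reproduction of any particular system, shows the two operations involved: setting a block's age register on reference and decrementing all registers periodically; the replacement victim is the block with the smallest register.

    /* Age-register approximation of LRU for NBLOCKS resident blocks. */
    #include <stdio.h>

    #define NBLOCKS   8
    #define AGE_MAX   100      /* value loaded on each reference (assumed) */
    #define AGE_DECAY 1        /* amount subtracted at each clock tick     */

    static int age[NBLOCKS];   /* one age register per resident block */

    void on_reference(int block)   /* called whenever a block is accessed */
    {
        age[block] = AGE_MAX;
    }

    void on_clock_tick(void)       /* called at fixed time intervals */
    {
        for (int b = 0; b < NBLOCKS; b++)
            if (age[b] > 0)
                age[b] -= AGE_DECAY;
    }

    int choose_victim(void)        /* least recently used = smallest age */
    {
        int victim = 0;
        for (int b = 1; b < NBLOCKS; b++)
            if (age[b] < age[victim])
                victim = b;
        return victim;
    }

    int main(void)
    {
        on_reference(3); on_clock_tick();
        on_reference(5); on_clock_tick();
        printf("victim = block %d\n", choose_victim()); /* an unreferenced block */
        return 0;
    }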
Block hit ratio The performance of a replacement policy in a given memory organization can be analyzed using the block address stream generated by a set of representative computations. Let N1* and N2* denote the number of references to M1 and M2, respectively, in the block address stream. The block hit ratio H* is defined by

    H* = N1*/(N1* + N2*)

which is analogous to the (word) hit ratio H defined by Eq. (5.2). Let n* denote the average number of consecutive word address references within each block. H can be estimated from H* using the following relation:

    H = 1 - (1 - H*)/n*

In a paging system, H* is the page hit ratio. 1 - H*, the page miss ratio, is usually called the page fault probability.

Example 5.5: Comparison of replacement policies Consider a paging system in which main memory has a capacity of three pages. The execution of a program Q requires reference to five distinct pages Pi, where i = 1, 2, 3, 4, 5, and i is the page address. The page address stream formed by executing Q is

    2 3 2 1 5 2 4 5 3 2 5 2

which means that the first page referenced is P2, the second is P3, etc. Figure 5.38 shows the manner in which the pages are assigned to main memory using FIFO, LRU, and the ideal OPT replacement policies. The next block to be selected for replacement is marked by an asterisk in the FIFO and LRU cases. It will be observed that LRU recognizes that P2 and P5 are referenced more frequently than other pages, whereas FIFO does not. Thus FIFO replaces P2 twice but LRU does so only once. The highest page hit ratio is achieved by OPT, the lowest by FIFO. The page hit ratio of LRU is quite close to that of OPT, a property which seems to be generally true.

Figure 5.38 Action of three replacement policies on a common address stream.

Stack replacement policies As discussed in Sec. 5.2.1, the cost and performance of a memory hierarchy can be measured by average cost per bit c and average access time tA. Equations (5.1) and (5.4) are convenient expressions for c and tA:

    c = (c1 S1 + c2 S2)/(S1 + S2)    (5.1)
    tA = tA1 + (1 - H) tB    (5.4)

The quantities c, tA1, and tB are determined primarily by the memory-device technologies used for M1 and M2. Once these have been chosen, the hit ratio H must be computed for various possible system configurations. The major variables on which H depends are

1. The types of address streams encountered
2. The average block size
3. The capacity of main memory
4. The replacement policy used

Simulation is perhaps the most practical technique used for evaluating different memory system designs. H is determined for a representative sample of address streams, memory technologies, block sizes, memory capacities, and replacement policies. Figure 5.38 shows a sample point in this simulation process. In this example, the block address stream, block size, and main-memory capacity are fixed, and three different replacement strategies are being tested. Due to the large number of alternatives that exist, the amount of simulation required to optimize the design of a virtual memory system can be very great. A number of analytic models for optimizing memory design have been proposed. Notable among these is a technique called stack processing, which is applicable to paging systems that use a class of replacement algorithms called stack algorithms [17].

Let A be any page address stream of length L to be processed using a replacement policy R. Let t denote the point in time when the first t pages of A have been processed. Let n be a variable denoting the page capacity of M1. Bt(n) denotes the set of pages in M1 at time t, and Lt denotes the number of distinct pages that have been encountered at time t. R is called a stack algorithm if it has the following inclusion property:

    Bt(n) ⊂ Bt(n + 1)    if n < Lt
    Bt(n) = Bt(n + 1)    if n ≥ Lt

LRU retains in M1 the n most recently used pages. Since these are always included in the n + 1 most recently used pages, it can be immediately concluded that LRU is a stack algorithm. Many other replacement policies are also of this type. FIFO is a notable exception, however. Consider the following page address stream:

    A = 1 2 3 4 1 2 5 1 2 3 4 5

Figure 5.39 shows how this address stream is processed using FIFO and main-memory capacities of three and four pages.
It can be seen that at various points of time the conditions for the inclusion property are not satisfied. For example, when t = 7, B7(3) = {1, 2, 5} and B7(4) = {2, 3, 4, 5}; therefore B7(3) ⊄ B7(4). ...

Figure 5.41 shows a plot of H* against n. It is a general characteristic of stack-replacement algorithms that the hit ratio increases monotonically with the available capacity n of M1. This is a direct consequence of the inclusion property. If the next page address x is in Bt(n), it must also be in Bt(n + 1), since Bt(n) ⊆ Bt(n + 1). Hence if a hit occurs with M1 of capacity n, a hit also occurs when the capacity is increased to n + 1. It might be expected that this is true for all replacement policies, but such is not the case. As the example in Fig. 5.39 shows, increasing n from three to four pages in a system using FIFO replacement actually reduces the page hit ratio in this particular case from 0.25 to 0.17. This appears to be a relatively rare phenomenon, not occurring for most address streams.

Figure 5.41 Page hit ratio versus page capacity for Example 5.6.

5.2.4 Segments and Pages

The smallest or most primitive elements of a computation are instruction and data words. A well-written program, however, exhibits a substantial amount of high-level structure. A sequence of instructions may thus be defined as a subroutine or procedure and be given an appropriate name. Similarly, primitive data items may be grouped into lists, arrays, and the like. We will refer to groups of instructions and data of this general type as modules. A well-structured program has clearly defined modules whose relations with one another are also clearly defined. Programming languages such as ALGOL which are "block-structured" have a syntax that yields a high degree of modularity.

A module may be loosely regarded as a named sequence of statements that are usually processed in the sequence in which they are written by the programmer. Compilation transforms this into a machine-executable module with essentially the same sequential ordering as the original module; this type of module is termed a segment. Formally, a segment is a set of logically related contiguous words generated by a compiler or a programmer. It is therefore a special type of block in the sense defined in Sec. 5.2.1. A word in a segment is referred to by specifying a base address (the segment address) and a relative address or displacement within the segment. The relative addresses are derived from program identifiers at compilation time. Segment addresses may be assigned during program execution by the operating system.

Segmentation A program and its data sets can be viewed as a collection of linked segments. The links arise from the fact that a program segment may use, or "call," another program or data segment. Since segments contain logically related words, it seems reasonable to maintain complete segments in main memory. A memory management technique that allocates main memory by segments is called segmentation. When a segment not currently resident in main memory is required, the entire segment is transferred from secondary memory. The physical addresses assigned to the segments are maintained in a memory map called a segment table, which may itself be a relocatable segment.
Example 5.7: Segmentation in the Burroughs B5500 [5] The B5000 family of computers uses segmentation extensively. Each main program has associated with it a segment called its program reference table (PRT). The PRT is used in part as a segment table. Each segment associated with a program is defined by a word called a descriptor in the corresponding PRT. A descriptor contains the following information:

1. A "presence" bit P which is set to 1 if the segment in question is currently assigned to main memory, and to 0 otherwise
2. A size field Z, which is the number of words in the segment
3. An address field S, which is the segment address in main memory if P = 1, or in secondary memory if P = 0

A program refers to a word within a segment by specifying the segment descriptor word W in its PRT and the displacement D (called an index in Burroughs literature). W is fetched and examined by the central processor. If the presence bit P = 0, then an interrupt occurs, and execution of the requesting program is suspended while the operating system transfers the required segment from secondary to main memory. If P = 1, the CPU compares the displacement D to the segment size field Z in the descriptor. If D > Z, D is invalid and an interrupt occurs. If D ≤ Z, the address field S of the descriptor is added to the displacement D. The result, S + D, is the absolute address of the required word in main memory, which may then be accessed.
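The descriptor-based translation of Example 5.7 maps almost line for line into code. The C sketch below is a schematic rendering only; the field widths, the struct layout, and the boolean return in place of a hardware interrupt are simplifying assumptions, not B5500 specifics.

    /* Schematic B5500-style segment descriptor translation (Example 5.7). */
    #include <stdbool.h>
    #include <stdio.h>

    struct descriptor {
        bool     p;   /* presence bit: segment in main memory?      */
        unsigned z;   /* size field: number of words in the segment */
        unsigned s;   /* address field: base in main (or secondary) */
    };

    /* Returns true and stores the absolute address in *addr, or returns
       false where the real machine would raise an interrupt. */
    bool translate(const struct descriptor *w, unsigned d, unsigned *addr)
    {
        if (!w->p)          /* segment absent: presence interrupt,  */
            return false;   /* operating system must fetch it first */
        if (d > w->z)       /* displacement exceeds segment size    */
            return false;   /* invalid-index interrupt              */
        *addr = w->s + d;   /* base + displacement = absolute addr  */
        return true;
    }

    int main(void)
    {
        struct descriptor d = { true, 100, 5000 };
        unsigned a;
        if (translate(&d, 87, &a))
            printf("absolute address = %u\n", a);   /* 5087 */
        return 0;
    }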
The change bit C is used to specify whether or not the page hns been changed since being loaded into main memory. lf a change has oc- curred. indicated by C = I, the page must be copied onto secondary memory when it is preempted. The page table also contains priority information R used lo select the page to be replaced. Note that unlike a segment table, a page table conluins no block size information. As noted earlier, paging requires a simpler memory allocation system than lfr •lnn·m~nl pnonty ('ha1111cbit~ l'rcsencc hit --i l111KC Paic p 1111111c frmnc (' H A () I 0 (' () i: l I I II I I 0 Fh1un .UZ i\ 1m11r mhlr, Ml1MOMY OMUANlt.i\'rlClN J'P3 scgmcntution. since block size is not u fuctur in pujin11. On the other hunll, pages huve no low.icul significance; they do nut correspond to progrl.im ele• ments. It can he useful to regard segmentation us partitioning the loah:ul address space, while paging partitions the physical address sr,uce I 26 J. The two techniques can also be compared from the point of view of memory frug• mentation. l n systems with segmentation, holes of different sizes tend to pru• liferate throughout main memory; they can be eliminated only by the time-con- suming process of memory compaction (see Sec. 5.2.2). Unusable spuce he• tween occupied regions is called external.fi·agml'ntation. Since page frameN are contiguous, no external fragmentation occurs in paged systems. However, If A words are divided into p n-word pages, and k is not a multiple of 11, the lost page will not be filled. When this page is assigned to a page frame, part of the Pl-kllC frame is empty; this is called intcrnal.fi·agmentation. Paged segments Paging and segmentation can be combined in an 11ttcmi,1 IO gain the advantages of both. This is done by dividing each segment into i,11gc11, A word then has a logical address with three components: I. A segment address 2. A page address 3. A line address The memory map may then consist of a segment table and u set of pugc tuhlc11, one for each segment. The segment table contains for each segment 11ddrcK11 11 pointer to the base of the corresponding r,a11,c luhlc. The (luge lahlc is then 1111"ll in the usual way to determine the required physicul uddrcss. Thi,'j lechnlqu" 111 used in the IBM S/360 Model 67. Page size The page size S 1, has a signilicunt impact on holh ,'j(Ul'lli&I: utlll1,111lon and the memory-access rate. Let us firsl consider lhi: int111c:m.:c: of ,'t,, on tho space-utilization factor II introduced in Sec. 5.2.1. If .\'1, il'I 100 l111'A&t', llXl.'OM11lvo internal fragmentation results; if it is too small. lhc Pll'1C tuhli:H hi:i:0111, vor~ large and tend to reduce space utilization. A good vului: uf .\',, ~hould 1n:hlvv, a balance between these two difficulties. Lei S, denote lhi: IIVl'l'lllll: 111c:11mo111 size in words. If S., P .\,. it can be expected that lhc last PIIAIC IIKl'li1&11c:ll IO 11 segment contains, on the average, S1,/2 words. The size of the p111i1c 111hlr IIN· sociated with each segment is approximately S,/S,, words, 11Ns11min1,1 l'IIL'h enlry in the table is a word. Hence the memory spal.'e ovc1'hl:nd nssrn:inll'll with each segment is ~ _:h I .s, . 2 s,, The physkul memory space utilization 11 ,:an he defined ns II \' "-'-L .. S1 I S 1.;·1 , l ,, ~sr,·,, -o·--- JS. (It S1,I 1.,.6) .... ···m·eoMPUTIW '"""'IA I, ... The npllmum fH'iC sl1..e Sl/1"" muy he c.lcHnecJ "" the v11l11c of,\\, which mu.lmh!c11 11 or. equivltlently. which minimizes S. Dlffcrcntluiln11 S with rcNru.:cl tu.\\,. 
Figure 5.43 shows the space utilization u defined by Eq. (5.6) plotted against Ss for representative values of Sp.

Figure 5.43 Influence of page size Sp and segment size Ss on space utilization u.

The influence of page size on hit ratio is complex, depending on the program reference stream and the amount of main-memory space available. Let the logical address space of a program be a sequence of numbers A0, A1, A2, .... Let Ai be the logical address referenced at some point in time, and let Ai+d be the next address generated, where d is the "distance" between Ai and Ai+d. For example, if both addresses point to instructions, Ai+d points to the (d + 1)st instruction either preceding or following the instruction whose logical address is Ai. Let Sp be the page size, and suppose that an efficient replacement policy such as LRU is being used. The probability of Ai+d being in M1 is high if one of the following conditions is satisfied.

1. d is small compared with Sp, so that Ai and Ai+d are in the same page P. The probability of this being true increases with the page size.
2. d is large relative to Sp, but Ai+d is associated with a set of words that are frequently referenced. Ai+d is therefore likely to be in a page P' ≠ P which is also in M1. The probability of this being true tends to increase with the number of pages that can be stored in M1; it therefore tends to decrease with the size of Sp.
Thus H is influenced by two opposing forces as Sp is varied. The result is that when Sp is small, H increases with Sp. However, when Sp exceeds a certain value, H begins to decrease. Figure 5.44 shows some typical curves relating H and Sp for various main-memory capacities.

Figure 5.44 Influence of page size Sp on hit ratio H.

Simulation studies indicate that in large systems, the values of Sp yielding the maximum hit ratios can be as large as 1024 words, which may be much greater than the "optimum" page size given by Eq. (5.7). Since high H is important in achieving small tA (due to the relatively slow rates at which page swapping takes place), values of Sp that maximize H are preferred. The first computer with a paging system (the ATLAS) used a 512-word page. The IBM S/360 Model 67 uses pages of 1024 words.

5.2.5 File Organization

Introduction We now consider the manner in which information is stored in the secondary-memory system of a computer [16]. Secondary-memory devices are characterized by very large storage capacities and relatively slow access rates. The slow access rates reflect the fact that, in general, the access modes of these devices are serial or else semirandom direct access (which might also be called semiserial). The information stored in secondary memory is usually grouped into large sets of related items called files; for example, the payroll file of an organization contains information about all the employees of the organization: their names, addresses, wage rates, etc. Because they are usually very large, files must be carefully organized so that they can be efficiently accessed or modified. Rapid access methods such as random or associative processing are precluded by the essentially serial access modes of most secondary storage units. The individual items in a file are often of varying length; hence such files may not have a simple word-organized structure.

A file can be viewed as a set of smaller addressable units called records. A record can, in turn, be further subdivided into fields. Thus in a payroll file, the information about each employee may constitute a record. The individual items of information about the employee (his name, address, wage rate, etc.) form the fields. Every item that has to be accessed independently (the entire file, a record, or a field within a record) must be identified by means of a name called a key. The key therefore constitutes the logical address of the item in question. As in an associative memory, the keys are stored with the data they specify. There are three general ways in which the data can be accessed.

1. The file can be searched sequentially, comparing the given key to the stored keys until the desired one is found. This is normally done with tape files, which can only be accessed serially.
2. A memory address map, usually called a directory, can be used to map the key onto a physical storage address. This may be used when the memory permits random or semirandom access.
3. The physical address is generated by a simple transformation of keys according to some algorithm. Again this is suitable only for random or semirandom access memories.

At the present time the most important secondary storage device is probably the magnetic-disk unit. As discussed in Sec. 5.1.3, information is stored in a disk memory by means of circular tracks on a set of concentric rotating disks. A set of tracks with the same radius is referred to as a cylinder. The smallest addressable unit of storage will be assumed to be a record. The length of a record may be fixed or variable. A track may therefore store a fixed or variable number of records. Each record contains a key which is its logical address. It is convenient to view a disk unit as a two-dimensional storage device with the structure suggested by Fig. 5.45. A particular track may be accessed randomly by specifying a cylinder and a track number. The required record is then accessed serially by scanning the track until a record whose key matches the given key is found.

Figure 5.45 Storage organization in a magnetic-disk memory.

Sequential files The simplest type of file organization requires storing records in some sequence and performing all access and modification operations by serial processing. The record sequence may be arbitrary, in which case the organization is called unordered sequential. More commonly, the records are ordered by a key and stored in that order; this is called (ordered) sequential organization. This type of organization therefore has the structure of a linear list. Since records cannot be randomly accessed in most secondary-memory devices, the access time of any specific record is relatively long. Sequential files are most useful when all records must be accessed in sequence and processed. For example, a payroll file is typically processed in this manner to produce employees' paychecks. More generally, records that are processed in batch mode may be organized sequentially. If real-time access to any record is required, then other file organizations with shorter access time are more suitable. A second disadvantage of sequential organization is that it is necessary to move large amounts of data in order to make insertions or deletions. This organization is therefore inefficient if the file must be updated frequently. Updating can be simplified by leaving gaps between records or groups of records into which insertions can be made.

Magnetic-tape files are invariably organized sequentially. Magnetic-disk files are frequently sequential also. Figure 5.46a shows a possible file organization of this kind. The file consists of fixed-length records with the format (key, data) stored in ascending numerical order determined by the keys. The file is filled cylinder by cylinder and track by track.

Figure 5.46 A sequential disk file (a) before and (b) after inserting two records.
More generally, records that are processed in batch mode may be organized sequentially. If real-time access to any record is required, then other file organizations with shorter access time are more suitable. A sec• ond disadvantage of sequential organization is that it is necessary to move larae amounts of data in order to make insertions or deletions. This organization h1 therefore inefficient if the file must be updated frequently. Updating can be simplified by leaving gaps between records or groups of records into which in• sertions can be made. Magnetic-tape files are invariably organized sequentially. Magnetic-dl1k files are frequently sequential also. Figure 5.46a shows a possible file org11nlz1t• tion of this kind. The file consists of fixed-length records with the format key, data stored in ascending numerical order determined by the keys. The file is fllled cylinder by cylinder and track by track. Changes are made hy transferrina an 0 011 031 101 i'.2 Oh58! 061 09! 17! " ,:, 5 481 50i >, u 53, l 57: 6J1 0 01• I,,.,, ]v~9i'""' __ .5 48 1 >, u I 57 Tracks 111 211 !61 19• 27 (al Tracks 2 !Ji 2Ji 16i 251 I I. ,,.i 1 ...... 1 .!./i ['II (/>) 231 3(, 40 291 46 47 l . ~-·~·- .~-~· ~J Clwrtluw I rn,·k 'IJI ffflilt ,...-..--... ·-- L________,,- Ov1•11lnw I rn.-k •·1i1urt 1,46 I\ ~1111111•nti11l 1lisk lilc (11) hcforc 111111 th) nl'tcr inscrti11111wn l'l"CUl'ds, entire trnck Into main memory. mRklna the required ch11nyeN, 11ml then wrltln1& the informl:ltlon buck onto the 11ume track. To fucllltntc the insertion of new records, overflow areas arc included in every truck, 11nd an overflow truck iN included in every cylinder. New records may he inserted directly into a track until its overflow area is filled. Further insertions can be made using the overflow track. Instead of inserting an entire record into a track. a short pointer of the form key, overflow track is inserted in the correct position in the logical sequence of records. Figure 5 .46h shows how the records with keys 05 and 3 7 are placed on the disk file when pointers and the overflow track are used. The use of overflow areas per- mits insertions and deletions to be made rapidly without moving large amounts of data. Note that when pointers are introduced, the physical sequence of the records is no longer the same as the logical sequence. The file must be r,rocessed from time to time to eliminate the pointers and restore the records to their r,roper physical sequence. Indexed sequential files A popular method for organizing 1 disk files so that indi- vi~unl items can be accessed rapidly is called the indexed sequential organiza- tion. It is characterized by the use of directories or indexes which specify the locution of each record. The records are ordered sequentially by key as in 11cqm:ntial lile organization, so that records can also be accessed sequentially without using the directories. The directories, however, permit rapid semiran- llom m:ccss. The simplest kind of directory stores the entire physical address, specified hy u disk-unit number, a cylinder number, and a track number for every record. It thus consists of a list of items of the form record key, physical address A record is located by sequentially searching the directory until a matching key is found. If the number of records is very large, this type of directory is imprac- tical due to both its size and the time required to search it. ln such cases, sever- al levels of directories are employed where each directory provides a part of the r,hysical address. 
Figure 5.47 shows a typical three-level directory system. The first directory, called the unit directory, indicates the disk unit where each record can be found. Each disk unit stores a cylinder directory that specifies the cylinder address of every record in the unit. Finally, each cylinder contains a track directory indicating the track addresses of every record in the cylinder. The size of the directories can be minimized by storing only the highest record key associated with each level of storage (unit, cylinder, or track). This is done in Fig. 5.47. The unit directory contains one entry for each unit. It specifies, for example, that unit 0 stores all records with keys between 0 and 9999, while unit 1 stores all records with keys between 10,000 and 17,426. The structure of the cylinder and track directories is similar.

Figure 5.47 Multilevel indexed sequential file directory.

The physical address of a given record is found by first examining the unit directory. This points to the appropriate cylinder directory, which in turn points to a track directory. For example, suppose that record 0087 is to be accessed. The unit directory indicates that it can be found in unit 0. The cylinder directory for unit 0 indicates that the record is in cylinder 1. Finally, a search through the track directory for cylinder 1 of unit 0 reveals that the desired record is in track 3.

Let nu, nc, and nt denote the number of units, the number of cylinders per unit, and the number of tracks per cylinder, respectively. The number of possible physical (track) addresses is nu nc nt. Using the three-level directory of Fig. 5.47, we see that the maximum number of directory entries that must be scanned to locate a record is nu + nc + nt. The various directories are generally stored along with the data they define. Figure 5.48 shows how directories are often stored within a disk unit. The first track of cylinder 0 contains the cylinder directory for that unit. The first item stored in each cylinder is its track directory.

Figure 5.48 Storage of file directories in a disk memory unit.

Information in the disk unit is typically processed in the following way:

1. The required cylinder directory is read into main memory and searched to identify the cylinder storing the desired information.
2. The track directory stored in this cylinder is transferred to main memory and searched to determine the required track.
3. The contents of this track are read into main memory where they are searched to obtain the required records. The records can then be processed. If records are inserted or deleted, the cylinder and track directories are also updated.
4. All modified records and directories are transferred from main memory back to the disk unit.
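The highest-key convention makes each directory level a simple linear scan. The following C sketch, using invented miniature directory contents modeled on Fig. 5.47, locates the unit, cylinder, and track of a record key within the nu + nc + nt worst-case scans described above.

    /* Three-level highest-key directory lookup (after Fig. 5.47).
       Directory contents are illustrative miniatures, not real data. */
    #include <stdio.h>

    /* Return the index of the first entry whose highest key >= key. */
    static int scan(const int dir[], int n, int key)
    {
        for (int i = 0; i < n; i++)
            if (key <= dir[i])
                return i;
        return -1;                  /* key beyond the last entry */
    }

    int main(void)
    {
        int unit_dir[]  = {9999, 17426};         /* highest key per unit     */
        int cyl_dir0[]  = {42, 101, 157};        /* unit 0: per cylinder     */
        int trk_dir01[] = {62, 69, 81, 95, 101}; /* unit 0, cyl 1: per track */

        int key  = 87;
        int unit = scan(unit_dir, 2, key);       /* -> unit 0     */
        int cyl  = scan(cyl_dir0, 3, key);       /* -> cylinder 1 */
        int trk  = scan(trk_dir01, 5, key);      /* -> track 3    */
        printf("record %04d: unit %d, cylinder %d, track %d\n",
               key, unit, cyl, trk);
        return 0;
    }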
Linked-list files  A linked list is a set of items of information each of which has an address field called a link appended to it. The link contains the address of the next item in the list. By using linked lists a sequence of items can be stored in nonconsecutive physical locations. The logical sequence or order of the items is specified by the links. Figure 5.49a shows an example of a linked list in which the logical sequence of the items is A, B, D, and E. A file organization in which the records form a linked list is called a linked-list file.

[Figure 5.49: A linked-list file (a) before and (b) after inserting item C.]

Links greatly simplify the problem of making insertions and deletions in a list. Figure 5.49b shows how a new item C can be inserted into the list of Fig. 5.49a so that C is logically positioned between B and D. Only item B in the original list has to be accessed in order to alter its link. Thus it is not necessary to move large amounts of data to make insertions and deletions. On the other hand, extra memory space is needed to store the links. A further disadvantage of linked-list files is the difficulty of accessing a batch of records that are logically but not physically contiguous.

Random file organizations  Many so-called random file organizations have been devised, where the word random refers to the fact that there is no apparent logical connection between records. Records with consecutive keys are not necessarily in consecutive physical locations, nor are there links between the records. The physical address f(X) of record X is determined by processing its key X using some well-defined algorithm. Two techniques for doing this have been distinguished [22]:

1. Compression. The physical address is formed by selecting a subset of the characters forming the key and (possibly) rearranging them. An example of this approach is truncation of the key. Compression is particularly suitable for processing nonnumerical keys. It can be designed to produce physical addresses that resemble the keys.
2. Hashing. The key X is treated as a number and an arithmetic function f, called a hash function, is used to compute the physical address f(X). Many hash functions have been devised [12]. One technique is to divide X by a constant n and use the remainder as the address, i.e.,

       f(X) = X modulo n

The number of possible valid record keys is generally much larger than the number of physical addresses available. Thus two distinct keys may yield the same value of f(X). A good choice of f should generate addresses that are distributed uniformly over the available file storage area. If each addressable unit of storage, e.g., a track, can accommodate several records, then records with the same address are assigned sequentially to consecutive locations within the track. Once the track is filled, subsequent items with the same address must be assigned to an overflow area.

The function f which is used to determine where a record is written in a file is also used to retrieve that record, thus eliminating the need for directories. This type of addressing is very suitable for large files that must be accessed randomly in real time.
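A minimal C sketch of the division method follows; the file size (8 tracks of 4 records each) and the record keys are hypothetical. Records that hash to a full track are sent to an overflow area, as described above:

    #include <stdio.h>

    #define N_TRACKS  8      /* number of addressable tracks (assumed)  */
    #define TRACK_CAP 4      /* records that fit on one track (assumed) */

    /* The division hash of the text: f(X) = X modulo n. */
    static int f(int key) { return key % N_TRACKS; }

    int track_count[N_TRACKS];   /* records currently on each track      */
    int overflow_count;          /* records sent to the overflow area    */

    /* Store a record: records hashing to the same track are assigned
       sequentially to consecutive locations; once the track is full,
       further records go to the overflow area. */
    void store(int key)
    {
        int t = f(key);
        if (track_count[t] < TRACK_CAP)
            printf("key %d -> track %d, slot %d\n", key, t, track_count[t]++);
        else {
            overflow_count++;
            printf("key %d -> overflow area (track %d full)\n", key, t);
        }
    }

    int main(void)
    {
        int keys[] = { 105, 233, 41, 17, 9, 113, 225, 337, 449 };
        for (int i = 0; i < 9; i++)
            store(keys[i]);
        return 0;
    }

The sample keys were deliberately chosen so that many of them collide on track 1, illustrating why a nonuniform f, or an unlucky key distribution, drives records into the overflow area.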
Because there is no logical sequence among the records, batch processing of randomly organized files is difficult.

5.3 HIGH-SPEED MEMORIES

A major problem in achieving high-speed computation is the disparity in operating speeds between processors (CPUs, IOPs, etc.) and main memory M. Typically the cycle time of M is greater by a factor of 5 or so than the cycle time of a processor. Furthermore, several memory words may be required during a processor cycle. To eliminate processor idle time while waiting for memory accesses to be completed, special measures may be taken to increase the effective processor-memory interface bandwidth. Several methods are possible:

1. Decrease the memory access time by using a faster (and more expensive) technology for main memory.
2. Use a longer memory word.
3. Insert a fast cache memory between the processor and main memory to reduce access time.
4. Access more than one word during each memory cycle.

The design of cache memories is the topic of Sec. 5.3.2. We first describe a memory organization that permits more than one word to be accessed per memory cycle.

5.3.1 Interleaved Memories

In order to carry out m independent accesses simultaneously, main memory must be partitioned into m separate modules or memory banks M_0, M_1, ..., M_{m-1}, as shown in Fig. 5.50. Each module must be provided with its own addressing circuitry. If the physical buses between the processors and memory must be shared by the modules, an appropriate bus-control mechanism must also be introduced. Finally, sufficient buffer storage must be included in the processors to accommodate the increased flow of information. A modular memory organization is particularly useful in systems where several processors require access to a common memory, e.g., in the Burroughs B5000 system shown in Fig. 1.26. Different processors can access memory simultaneously provided that they reference separate modules.

[Figure 5.50: A modular memory organization. The CPU and IOP are connected to the memory modules through a memory switching network.]

In order for this memory organization to be utilized efficiently, the memory references generated by the processors should be distributed evenly among the m modules. Ideally, every member of a set of m consecutive memory references should address a separate module. Any set of m or fewer memory accesses, no two of which refer to the same module, can be carried out simultaneously. The processor-memory interface bandwidth is then m words per memory cycle.

Address interleaving  Let X_0, X_1, ..., X_{k-1} be k words that are known (or expected) to be required in sequence by a processor, for example, k consecutive instructions in a program. They will normally be assigned k consecutive physical addresses A_0, A_1, ..., A_{k-1} in main memory. The following rule can be employed to distribute these addresses among the memory modules.

Interleaving rule. Assign address A_i to module M_j if j = i (modulo m).

Thus A_0, A_m, A_2m, ... are assigned to M_0; A_1, A_{m+1}, A_{2m+1}, ... are assigned to M_1; etc. This technique for distributing addresses among memory modules is termed interleaving. The interleaving of addresses among m modules is called m-way interleaving. It is convenient to make m, the number of modules, a power of 2, say m = 2^p. Then the least significant p bits of every (binary) address immediately identify the module to which the address belongs.
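For m a power of 2, the interleaving rule amounts to splitting each binary address into a module number (the low-order p bits) and a word position within the module (the remaining bits). A small C illustration, with arbitrary parameter and address values:

    #include <stdio.h>

    int main(void)
    {
        const unsigned p = 2;           /* m = 2^p = 4 memory modules */
        const unsigned m = 1u << p;

        /* Interleaving rule: module = address modulo m.  With m a power
           of 2, the low-order p bits give the module number and the
           remaining bits give the word's position inside the module. */
        for (unsigned addr = 0; addr < 10; addr++) {
            unsigned module = addr & (m - 1);   /* addr modulo m     */
            unsigned offset = addr >> p;        /* addr divided by m */
            printf("address %2u -> module M%u, word %u\n",
                   addr, module, offset);
        }
        return 0;
    }

Consecutive addresses visit the modules round-robin, which is precisely the property that lets m consecutive instruction fetches proceed in one memory cycle.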
The efficiency with which any interleaved memory system can be used is highly dependent on the order in which memory addresses are generated; this order is clearly determined by the programs being executed. If two or more addresses require simultaneous access to the same module, then there is said to be memory interference or contention. The memory accesses in question cannot be executed simultaneously. In the worst case, if all addresses refer to the same module, the advantages of memory modularity are entirely lost.

Interleaving is frequently applied to program addresses in order to increase the rate at which instructions can be fetched from memory. The instructions of a program are normally assigned consecutive addresses and executed in the sequence in which they are written. Only program control instructions such as branch instructions cause a deviation in the execution sequence. Since the proportion of instructions that result in branching is often small, the CPU can reasonably assume that the current instruction will be followed by the instructions in the next consecutive instruction addresses. Thus the CPU can fetch instructions in advance and store them in an instruction buffer. With m-way interleaving, m consecutive instructions can be fetched during one memory cycle.

A performance model  A model for measuring the efficiency of interleaved memory systems has been studied by Burnett and Coffman [4] based on earlier work by Hellerman [11]. It is assumed that there are m independent memory modules and that the CPU maintains a "request queue" of addresses A_1, A_2, ..., A_q that it needs to access. Before each memory cycle, the request queue is scanned and a sequence called the request sequence A_1, A_2, ..., A_k of k <= m addresses is selected from the head of the queue. A_1, A_2, ..., A_k is the longest sequence with the property that no two of its members are in the same module. During the next memory cycle, these addresses are used to carry out k simultaneous memory accesses. The efficiency of this system clearly depends on the average length of the request sequences, the closer to m the better. Following Ref. 4, let p(k) denote the probability density function of the request sequence lengths, where k = 1, 2, ..., m. The mean value of k is denoted by b_m and is given by the equation

    b_m = Σ_{k=1}^{m} k p(k)        (5.8)

b_m is the average number of words that can be accessed per cycle and therefore represents the memory bandwidth.

Since p(k) depends on the programs being executed, p(k) and b_m may be difficult to determine. Consider the case where the request queue contains only instruction addresses. An instruction request queue can be characterized by the branching probability λ, defined as the probability that any given instruction causes a jump to a nonconsecutive address. p(k) can then be defined as follows:

    p(1) = λ
    p(k) = (1 − λ)^(k−1) λ    for 1 < k < m
    p(m) = (1 − λ)^(m−1)

Substituting into Eq. (5.8), we obtain

    b_m = λ + 2(1 − λ)λ + 3(1 − λ)^2 λ + ... + m(1 − λ)^(m−1)        (5.9)

It can easily be shown by induction on m that (5.9) implies that

    b_m = Σ_{i=0}^{m−1} (1 − λ)^i

which is a simple geometric progression. Hence

    b_m = (1 − (1 − λ)^m) / λ        (5.10)

[Figure 5.51: Memory bandwidth b_m as a function of branching probability λ, for several values of m.]
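The model is easy to evaluate numerically. The C sketch below computes b_m both by summing Eq. (5.8) with the p(k) just defined and by the closed form of Eq. (5.10); the two must agree, and the particular m and λ chosen are merely illustrative:

    #include <stdio.h>
    #include <math.h>

    /* Mean request-sequence length b_m by direct evaluation of Eq. (5.8). */
    double bm_sum(double lambda, int m)
    {
        double b = 0.0;
        for (int k = 1; k < m; k++)             /* p(k) = (1-l)^(k-1) l */
            b += k * pow(1.0 - lambda, k - 1) * lambda;
        b += m * pow(1.0 - lambda, m - 1);      /* p(m) = (1-l)^(m-1)   */
        return b;
    }

    /* The closed form of Eq. (5.10); requires lambda > 0. */
    double bm_closed(double lambda, int m)
    {
        return (1.0 - pow(1.0 - lambda, m)) / lambda;
    }

    int main(void)
    {
        int    m      = 8;
        double lambda = 0.1;        /* 10% of instructions branch */
        printf("b_m by summation:  %f\n", bm_sum(lambda, m));
        printf("b_m by Eq. (5.10): %f\n", bm_closed(lambda, m));
        return 0;
    }

For m = 8 and λ = 0.1 both expressions give b_m ≈ 5.7, i.e., interference already costs more than a quarter of the nominal eight-word bandwidth.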
It can be seen from (5.10) that b_m = 1 when λ = 1, while b_m = m, its maximum value, when λ = 0. Figure 5.51 shows b_m plotted as a function of λ for several values of m. These curves indicate that λ must be quite small for b_m to approach its maximum value.

5.3.2 Cache Memories

A cache is a small fast memory placed between a processor and main memory, as illustrated in Fig. 5.29b. The cache is then the fastest component in the memory hierarchy. It can be viewed as a buffer memory for the main memory, so that the cache M1 and main memory M2 form a two-level hierarchy. It is common to manage the three-level system of Fig. 5.29b as two largely independent two-level hierarchies (M1, M2) and (M2, M3). The main-secondary system (M2, M3) may be organized as a virtual memory system of the type examined in Sec. 5.2. The cache-main-memory system (M1, M2) is organized along essentially similar lines; hence (M1, M2) has many of the properties of a virtual memory system.

The performance goal of adding a cache memory to a computer is to make the average memory access time t_A seen by the processor as close as possible to that of the cache, t_A1. To achieve this, a high percentage of all memory references should be satisfied by the cache, i.e., the cache hit ratio should be close to 1. This is possible because of the locality of reference property of programs. Although small buffer memories were used to prefetch instructions in some early machines, cache memories did not become economically feasible until the advent of LSI semiconductor memories in the late 1960s. Historically, therefore, virtual memories preceded cache memories. Many of the techniques designed for virtual memory management have been applied to cache systems and have stimulated their development.

Distinctive features of cache memories  There are some important differences between the cache-main-memory hierarchy (M1, M2) and the main-secondary-memory hierarchy (M2, M3); these differences are summarized in Fig. 5.52.

Figure 5.52  Main differences between cache-main-memory and main-secondary-memory hierarchies.

    Typical access time ratio t_A2/t_A1:
        (M1, M2): 5/1;  (M2, M3): 1000/1
    Memory management system:
        (M1, M2): implemented by special hardware;  (M2, M3): mainly implemented by software
    Typical page size:
        (M1, M2): 1 to 16 words;  (M2, M3): 16 to 1024 words
    Access of processor to second level:
        (M1, M2): usually has direct access to M2;  (M2, M3): all access to M3 is via M2

Because it is higher in the memory hierarchy, the pair (M1, M2) functions at a higher speed than (M2, M3). The access time ratio t_A2/t_A1 is typically 10 or less, while t_A3/t_A2 is typically 100 or more. These differences in operating speed require (M1, M2) to be managed by high-speed logic circuits rather than software routines. (M2, M3), on the other hand, is controlled mainly by programs within the operating system. Thus while the (M2, M3) hierarchy may be transparent to the applications programmer but visible to the systems programmer, (M1, M2) is transparent to both. Another important difference is in the block size used. Communication within (M1, M2) is invariably paged, and the page size is much smaller than that used in (M2, M3).
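The performance goal can be quantified with the usual two-level average-access-time relation t_A = h t_A1 + (1 − h) t_A2, where h is the hit ratio. This formula is an assumption here, since the text states the goal only qualitatively; the access times used are those quoted for the ECLIPSE in Example 5.8 below:

    #include <stdio.h>

    int main(void)
    {
        /* Average access time seen by the processor under the assumed
           relation tA = h*tA1 + (1-h)*tA2.  Times follow the ECLIPSE
           example: bipolar cache, MOS main memory. */
        double tA1 = 200.0;             /* cache access time, ns       */
        double tA2 = 700.0;             /* main memory access time, ns */

        for (double h = 0.90; h <= 1.0001; h += 0.02) {
            double tA = h * tA1 + (1.0 - h) * tA2;
            printf("hit ratio %.2f -> average access time %.0f ns\n", h, tA);
        }
        return 0;
    }

Even at h = 0.90 the average access time (250 ns) is already much closer to the cache's 200 ns than to main memory's 700 ns, which is why a hit ratio near 1 makes the cache appear to hide main memory almost entirely.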
Finally, we note that the processor generally has direct access to both M1 and M2, so that Fig. 5.53 is a more accurate representation of the logical data paths in a cache system. For this reason, cache memories have also been called "look-aside" memories.

[Figure 5.53: Data flow in a cache-based system.]

Example 5.8  The cache system of the Data General Corporation ECLIPSE computer. ECLIPSE is a general-purpose machine with a 16-bit word size which was introduced in the early 1970s. It contains a cache memory that uses bipolar semiconductor technology with an access time of 200 ns. Main memory uses MOS technology with a 700-ns access time. The memory system is modular; each 8K module M2 of main memory has a 16-word cache M1 associated with it, as depicted in Fig. 5.54. The cache is divided into four pages of four words each. The addresses assigned to the cache are stored in an associative or content-addressable memory CAM. When a memory request is generated by the CPU, the address of the required word is sent to the CAM. If the CAM indicates that the requested word W is in the cache, W is transmitted from the cache to the CPU. If W is not in M1, its address is sent to M2, from which W is transmitted to the CPU. At the same time, a page of four consecutive words that includes W is sent to M1, where it replaces the least recently used page in M1. The LRU algorithm is implemented by special circuits which constantly monitor the cache usage.

[Figure 5.54: Cache system of the ECLIPSE computer, showing the memory address bus, LRU monitor, associative memory map, cache M1, main memory M2, and memory data bus.]

Address mapping methods  The addresses assigned to a cache are maintained in a hardware-implemented memory map. The fastest and most general type of map uses an associative memory which permits any word in main memory to be assigned to any location in the cache. Associative memories are expensive, so a number of more economical memory mapping methods have been proposed and implemented [2, 6]. One of the simplest memory mapping techniques is called direct mapping. Let M1 be divided into S1 = 2^k regions M1(0), M1(1), ..., M1(S1 − 1), each of which stores a block of n consecutive words. Main memory M2 is similarly divided into one-block regions M2(0), M2(1), ..., M2(S2 − 1). Each region M2(i) in M2 is mapped onto a fixed region M1(j) in M1. The address j is determined from i by the rule

    j = i (modulo S1)

For example, if S1 = 2, every even block in M2 is mapped onto M1(0), while every odd block in M2 is mapped onto M1(1). The hardware needed to implement direct mapping is very simple. The low-order k bits of each block address identify the cache region that may contain the block. These k bits therefore constitute the cache address. If there are 2^p words per block, then the low-order k + p bits of each effective address generated by the CPU constitute the cache address of the word in question. Thus, unlike associative mapping, there is no need to store the addresses of items in the cache. A disadvantage of direct mapping is that the cache hit ratio drops sharply if two or more frequently used blocks happen to map onto the same region in the cache. This possibility is minimized by the fact that such blocks are relatively far apart in the logical address space.
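In C, extracting the direct-mapped cache address from an effective address is just bit selection; the field widths and the sample address below are hypothetical:

    #include <stdio.h>

    int main(void)
    {
        /* Direct mapping with S1 = 2^k cache regions and 2^p words per
           block.  The low-order k+p bits of an effective address form
           the cache address of the word in question. */
        const unsigned k = 7, p = 2;        /* 128 regions, 4-word blocks */

        unsigned addr   = 0x3A95;           /* effective address from CPU */
        unsigned word   = addr & ((1u << p) - 1);     /* word within block */
        unsigned block  = addr >> p;                  /* block address     */
        unsigned region = block & ((1u << k) - 1);    /* j = i modulo S1   */
        unsigned high   = block >> k;   /* blocks sharing region j differ
                                           only in these high-order bits */

        printf("address 0x%X -> region %u, word %u (high-order part %u)\n",
               addr, region, word, high);
        return 0;
    }

Two blocks whose block addresses differ only in the high-order part contend for the same region, which is exactly the collision scenario the paragraph above warns about.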
There is a more general mapping method called set associative which includes pure associative and direct mapping as special cases. As in direct mapping, blocks in main memory are grouped into equivalence classes determined by their addresses. M2(i) and M2(j) are in the same equivalence class E if i = j (modulo S1'). The cache is divided into S1' regions M1'(0), M1'(1), ..., M1'(S1' − 1) called sets, each of which accommodates 2^s blocks. A block M2(i) in M2 is mapped into the set M1'(h) satisfying h = i (modulo S1'). Each set M1'(h) is controlled by a small associative memory, so that mapping within each set is associative. This permits several members of the same equivalence class E to be stored in the cache simultaneously, which is not possible with direct mapping. Set-associative mapping reduces to direct mapping when the set size 2^s = 1; it reduces to pure associative mapping when 2^s = S1, the number of blocks in the cache. Intermediate set sizes lead to mapping methods requiring an intermediate amount of associative hardware.

Updating main memory  As noted earlier, the CPU generally has direct physical access to main memory, so that when a required item is not found in the cache, the CPU can access it in main memory with minimal delay. The existence of direct communication links between the CPU and M2 permits some different schemes for updating M2 when information in M1 is altered. The same swapping approach used in virtual memory systems can be employed: an altered "dirty" block in the cache is tagged; when the block is to be replaced, it is first copied into main memory. A different approach is to write all information into both M1 and M2, even when the address in question is currently assigned to M1. This policy, called write-through, is easily implemented and ensures that M2 never contains obsolete information. This is a useful feature when M2 is being shared by several independent processors. On the other hand, write-through results in many "unnecessary" writes to M2 when the information in question is also written into M1.

5.3.3 Associative Memories

Introduction  Consider the table shown in Fig. 5.55, which is to be stored in a computer's memory. It consists of a list of records, each containing three (major) subfields: a person's name, an identification number, and an age.

Figure 5.55  A table to be stored in a computer's memory.

    Name       ID number   Age
    J. Smith   124         24
    J. Bond    007         40
    A. Jones   106         30
    R. Roe     002         19
    J. Doe     009         28

Most information storage and retrieval problems involve accessing certain subfields within a set of one or more records in answer to questions such as: "What are A. Jones's ID number and age?" If a conventional random-access memory is being used, it is necessary to specify exactly the physical address of the Jones entry in the table, e.g., by the instruction

    READ ROW 3

The address ROW 3 has no logical relationship to Jones; hence it can be viewed as an artificial construct which adds to programming complexity. An alternative approach is to search the entire table using the name field as an address. In such a system, the request for the Jones data would be in the form of an instruction such as

    READ NAME A. JONES

The conventional method of implementing this approach involves scanning all entries in the table sequentially and comparing their NAME fields to the given address, A. JONES, until a match is found.
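The sequential method can be sketched in a few lines of C. The table contents follow Fig. 5.55 as far as it is legible in the source:

    #include <stdio.h>
    #include <string.h>

    struct record { const char *name; const char *id; int age; };

    int main(void)
    {
        /* The table of Fig. 5.55. */
        struct record table[] = {
            { "J. Smith", "124", 24 },
            { "J. Bond",  "007", 40 },
            { "A. Jones", "106", 30 },
            { "R. Roe",   "002", 19 },
            { "J. Doe",   "009", 28 },
        };

        /* READ NAME A. JONES: scan the table sequentially, comparing the
           NAME field of each entry with the given address. */
        for (int i = 0; i < 5; i++)
            if (strcmp(table[i].name, "A. Jones") == 0) {
                printf("ID number %s, age %d\n", table[i].id, table[i].age);
                break;
            }
        return 0;
    }

Each probe touches one entry at a time; the associative memory described next performs all five comparisons at once.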
Sequential searching of this type is easily implemented with serial-access memories, but it is very slow. An associative memory eliminates this difficulty by simultaneously examining all entries in the table and selecting the one that matches the given address. It is clearly useful to be able to select other fields of the record to use as an address. For example,

    READ ID 106

uses the ID number field as an address and also accesses the Jones entry in the table. In general, an associative memory is one in which any stored item can be accessed directly by using the contents of the item in question, generally some specified subfield, as an address. Associative memories are also commonly known as content-addressable memories. The subfield chosen to address the memory is called the key. Items stored in an associative memory can be viewed as having the format

    KEY, DATA

where KEY is the address and DATA is the information to be accessed. For example, if a page table of the kind shown in Fig. 5.42 is placed in an associative memory, the page name forms the key; and the page frame, presence bit, etc., form the data.

[Figure 5.56: The structure of a word-organized associative memory, comprising a mask register, a storage cell array with key input and output registers, and match and select circuits.]

Word-organized associative memory  Figure 5.56 shows the structure of a simple associative memory. Each unit of stored information is a fixed-length word. Any subfield of the word may be chosen as the key. The desired key is specified by the mask register. The key is compared simultaneously with all stored words; those which match the key emit a match signal which enters a select circuit. The select circuit enables the data field to be accessed. If several entries have the same key, then the select circuit determines which data field is to be read out; it may, for example, read out all matching entries in some predetermined order. Since all words in the memory are required to compare their keys with the input key simultaneously, each must have its own match circuit. The match and select circuits make associative memories much more complex and expensive than conventional memories. The advent of LSI techniques has made associative memories economically feasible. However, cost considerations still limit them to applications where a relatively small amount of information must be accessed very rapidly, e.g., memory address mapping.

The logic circuit for a 1-bit associative memory cell is shown in Fig. 5.57. It comprises a flip-flop, a match circuit for comparing the flip-flop contents to an external data bit, and circuits for reading from and writing into the cell. The output match line M is normally 1. When the input enable line IE is set to 1, the input data bit ID is compared with the bit stored in the cell; if they agree, M remains at 1, otherwise M becomes 0. The bit on ID can be written into the flip-flop by applying 1 to both IE and the write enable line WE. The select line (address line) S is used to select the cell for either reading or writing. Cells of this type can easily be assembled into arrays.
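A word-organized associative search can be modeled in C by applying the mask and key to every stored word; the hardware does all comparisons in parallel, which the loop below merely simulates. The word width, mask, and contents are hypothetical:

    #include <stdio.h>

    #define WORDS 4

    int main(void)
    {
        /* Stored words; the mask register selects the key subfield,
           here the low byte of each 16-bit word. */
        unsigned store[WORDS] = { 0x1234, 0xAB06, 0x9906, 0x5578 };
        unsigned mask = 0x00FF;         /* compare low 8 bits only */
        unsigned key  = 0x0006;

        for (int w = 0; w < WORDS; w++) {
            /* Per-word match circuit: the masked bits must agree. */
            int match = ((store[w] ^ key) & mask) == 0;
            if (match)
                printf("word %d matches: 0x%04X\n", w, store[w]);
        }
        return 0;
    }

Note that two words match here; resolving such multiple matches is the job of the select circuit, discussed next.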
[Figure 5.57: A 1-bit associative memory cell: (a) logic diagram; (b) symbol. S = select (address), WE = write enable, IE = input enable, ID = input data, OD = output data, M = match.]

Figure 5.58 shows an associative memory with a capacity of two 2-bit words which is constructed from four such cells. Note that each column of this array represents a word, and that the M lines of all cells in the same word are connected by a wired-AND gate.

[Figure 5.58: A 2 × 2 bit associative memory array: (a) logic diagram; (b) symbol.]

In many applications, the key assigned to every word is unique. Any attempt to address the memory then results in either no match or a single match, indicated by a single M line in the 1 state. If the keys are not unique, then a multiple match may occur. Note that a multiple match could result from an error in the system, even when all keys are intended to be unique. An important function of the select circuit of Fig. 5.56 is to resolve multiple matches. It can do so by storing all match signals in a register, then scanning this register to access the matching words one by one.

Associative memory for variable-length data  We now describe a hypothetical memory organization for associative storage and retrieval of variable-length data based on a design by Lee and Paull [15]. The memory can be viewed as a one-dimensional array of complex associative storage cells, as shown in Fig. 5.59. Each cell is connected to a common set of data and control lines. In addition, a cell can communicate with its left and right neighbors. Information is stored in key-data form, where both the key and its associated data are of variable length and can be positioned anywhere within the cellular array. Each cell is capable of transferring data to and from the data bus in response to external control signals placed on the common control bus. The source of these control signals is a small processor which executes (micro)programs that specify the desired storage or retrieval operation.

[Figure 5.59: Associative memory for variable-length strings, showing the control processor with its control program and output buffer, the data and control buses, and the one-dimensional cell array.]

The cell array may be of arbitrary length; hence it is not possible to provide individual match lines which would immediately pinpoint matching cells. All cell match lines are combined into a single line which tells the processor when, but not where, a match occurs. Two special flip-flops M and C are included in each cell. They are used to mark cells so that read and write operations can be localized within the memory array. In addition, every cell contains a data register S used to store a single alphanumeric symbol. The instructions executed by the control processor have the format

    operation (condition)

The condition field of an instruction specifies the values that C and M must have in order for the operation to be performed. For example,

    write A (C = 1)

instructs all cells having C = 1 to store A in their S registers.
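Conditional broadcast execution of this kind is easy to model. The C sketch below (array size and marked cells are hypothetical) gives each cell an S register and the M and C flip-flops, and executes write A (C = 1):

    #include <stdio.h>

    #define CELLS 16

    /* One cell of a Lee-Paull style cellular memory: a symbol register S
       and the two marker flip-flops M and C. */
    struct cell { char s; int m, c; } mem[CELLS];

    /* write X (C = 1): the instruction is broadcast to every cell, but
       only cells whose C flip-flop is 1 store X in their S registers. */
    void write_if_c(char x)
    {
        for (int i = 0; i < CELLS; i++)     /* all cells act in parallel */
            if (mem[i].c == 1)
                mem[i].s = x;
    }

    int main(void)
    {
        mem[3].c = mem[9].c = 1;            /* cells marked previously */
        write_if_c('A');
        for (int i = 0; i < CELLS; i++)
            if (mem[i].s)
                printf("cell %2d: S = %c\n", i, mem[i].s);
        return 0;
    }

The markers thus localize an otherwise global operation, which is the central idea of the instruction set described next.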
An unconditional instruction such as write A is executed by every cell in the memory. Figure 5.60 lists the operations performed by the cells and the control processor. The "specified cells" are those that satisfy the condition, if any, appended to the operation.

Figure 5.60  Instruction set for the associative memory of Fig. 5.59.

    Input-output instructions:
        write X      Transfer symbol X to the S registers of the specified cells
        writeb       Transfer the contents of the output buffer to the S registers
                     of the specified cells
        read         Transfer the S register contents of the specified cells to the
                     output buffer (meaningful only if all the specified S registers
                     store the same data)
        match X      Place symbol X on the input data bus; set M = 1 in every
                     specified cell where S = X

    Cell-marking instructions:
        C ← a        Set C to a in every specified cell
        M ← a        Set M to a in every specified cell
        rightc ← a   Set C to a in the cell to the right of every specified cell
        leftc ← a    Set C to a in the cell to the left of every specified cell
        rightm ← a   Set M to a in the cell to the right of every specified cell
        leftm ← a    Set M to a in the cell to the left of every specified cell

    Program-control instructions:
        go to LABEL                Unconditional branch to instruction LABEL
        if X then go to LABEL      Compare symbol X to the output buffer contents;
                                   if they match, go to LABEL, otherwise go to the
                                   next consecutive instruction
        if MATCH then go to LABEL  If M = 1 in some cell, go to LABEL, otherwise go
                                   to the next consecutive instruction
        halt                       Stop program execution

In order to store or retrieve data, an appropriate program for the control processor must be written. Consider the problem of reading a specific data item from the memory. Assume that information is stored in the format

    ... # KEY1 $ DATA1 # KEY2 $ DATA2 # ...

where the symbols # and $ act as delimiters. The given key is applied character by character to the input data bus and the match instruction is issued. The C flip-flop in the cell to the right of each matching cell is marked using the instruction

    rightc ← 1 (M = 1)

The next match instruction has the condition C = 1 appended to it, which restricts matching cells to those that lie to the right of a previous match. In this way a complete key can be matched and its data field uniquely marked. Figure 5.61 shows a program implementing this approach where the key is AZ. The frequently used sequence of three instructions that sets C to 1 in the right neighbor of each matching cell has been defined to be the macroinstruction MARKRIGHTC.

Figure 5.61  A program for the memory of Fig. 5.59 that reads the data associated with the key AZ.

    Location   Instruction           Comment
    SEARCH:    match #               Mark the beginning of every key
               C ← 0                 }
               rightc ← 1 (M = 1)    }  Macroinstruction MARKRIGHTC
               M ← 0                 }
               match A (C = 1)
               MARKRIGHTC
               match Z (C = 1)
               MARKRIGHTC
               match $ (C = 1)
               MARKRIGHTC
    READ:      read (C = 1)
               if # then go to END
               M ← 1 (C = 1)
               MARKRIGHTC
               go to READ
    END:       halt

The state of the memory after the execution of the first few instructions in this program is shown in Fig. 5.62. For clarity, M and C cells in the 0 state are shown blank. It will be observed that initially many cells are marked by the presence of 1 in their M or C flip-flops. The number of marked cells diminishes rapidly as the key is read in. Ultimately, if the key is unique, only one cell is marked and readout can commence.

A number of practical limitations of this system can readily be discerned. The very great cell complexity probably makes its cost prohibitive. The management of space in such a memory is also difficult; for example, items no longer needed result in gaps of varying lengths appearing in the memory. Localizing such gaps for reuse is not easy. An obvious solution is to eliminate gaps by periodically compacting the stored information.
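The search procedure of Fig. 5.61 can be simulated in software. The C sketch below models only the S, M, and C registers and the match and MARKRIGHTC operations as reconstructed above; the read-out phase is simplified to a plain scan, and the stored string, key, and array size are hypothetical:

    #include <stdio.h>
    #include <string.h>

    #define CELLS 32

    char s[CELLS];                /* one symbol register per cell */
    int  m[CELLS], c[CELLS];      /* M and C marker flip-flops    */

    /* match X (C=1): set M = 1 in every cell whose condition holds and
       whose S register equals X.  The first, unconditional match of the
       program passes cond = 0. */
    void match(char x, int cond)
    {
        for (int i = 0; i < CELLS; i++)
            m[i] = (!cond || c[i]) && s[i] == x;
    }

    /* Macroinstruction MARKRIGHTC: C <- 0; rightc <- 1 (M=1); M <- 0. */
    void markrightc(void)
    {
        memset(c, 0, sizeof c);
        for (int i = 0; i + 1 < CELLS; i++)
            if (m[i])
                c[i + 1] = 1;
        memset(m, 0, sizeof m);
    }

    int main(void)
    {
        strcpy(s, "#AZ$ON SALE#BX$EMPTY#");  /* hypothetical contents */

        const char *key = "#AZ$";            /* delimiters plus key AZ */
        match(key[0], 0);                    /* mark key beginnings    */
        markrightc();
        for (int k = 1; key[k]; k++) {       /* match A, Z, $ under C=1 */
            match(key[k], 1);
            markrightc();
        }
        /* read (C=1), repeated until the next # (simplified to a scan) */
        for (int i = 0; i < CELLS && s[i]; ) {
            if (!c[i]) { i++; continue; }
            while (s[i] && s[i] != '#')
                putchar(s[i++]);
            break;
        }
        putchar('\n');                       /* prints: ON SALE */
        return 0;
    }

As in the text, each successive conditional match narrows the marked cells, until only the cell following the $ of the unique key AZ carries C = 1 and its data field can be read out.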
A virtual memory is a hierarchical memory, generally of two levels, which is managed by an operating system to appear like a single large memory to the applications programmer. This is achieved by automatically transferring blocks of information between M1 and the other levels of the hierarchy. The locality of reference property of programs makes it possible to ensure that data is generally in M1 when referenced by a processor. A fundamental measure of the performance of a virtual memory system is the hit ratio, the fraction of all memory references that are satisfied by M1. The performance of a virtual memory is highly program-dependent; hence simulation is the principal design tool.

Memory space is generally a limited resource of a computer and therefore must usually be shared by different programs. Dynamic allocation means varying the regions of memory assigned to programs while they are being executed. Main-memory space is allocated in contiguous blocks. Two allocation techniques have been distinguished: nonpreemptive and preemptive. Nonpreemptive methods assign space to incoming blocks only if an available region of sufficient size exists. Best-fit and first-fit are two possible allocation methods of this type. Preemptive methods can assign incoming blocks to occupied regions of M1 and thereby permit more efficient use of memory space. Blocks to be preempted are selected according to a replacement policy. Least recently used (LRU) and first-in first-out (FIFO) are perhaps the most widely used.