Ex parte Bodin et al., Appeal 2010-007040, Application 11/266,559 (P.T.A.B. Nov. 26, 2012)

UNITED STATES PATENT AND TRADEMARK OFFICE
____________

BEFORE THE PATENT TRIAL AND APPEAL BOARD
____________

Ex parte WILLIAM K. BODIN, DAVID JARAMILLO, JERRY W. REDMAN, and DERRAL C. THORSON
____________

Appeal 2010-007040
Application 11/266,559
Technology Center 2600
____________

Before JOHN A. JEFFERY, DENISE M. POTHIER, and JENNIFER L. McKEOWN, Administrative Patent Judges.

JEFFERY, Administrative Patent Judge.

DECISION ON APPEAL

Appellants appeal under 35 U.S.C. § 134(a) from the Examiner’s rejection of claims 1, 25, and 26. We have jurisdiction under 35 U.S.C. § 6(b). We affirm.

STATEMENT OF THE CASE

Appellants’ invention dynamically adjusts prosody for voice-rendering synthesized data. See generally Abstract. Claim 1 is illustrative:

1. A computer-implemented method for voice-rendering synthesized data comprising:
   retrieving synthesized data to be voice rendered;
   identifying, for the synthesized data to be voice rendered, a particular prosody setting including determining current voice characteristics of the user and selecting the particular prosody setting in dependence upon the current voice characteristics of the user;
   determining, in dependence upon the synthesized data to be voice rendered and the context information for the context in which the synthesized data is to be voice rendered, a section of the synthesized data to be rendered including determining the context information for the context in which the synthesized data is to be voice rendered, identifying in dependence upon the context information a section length, and selecting a section of the synthesized data to be rendered in dependence upon the identified section length;
   wherein identifying in dependence upon the context information a section length further comprises:
      identifying in dependence upon the context information a rendering time; and
      determining a section length to be rendered in dependence upon the prosody settings and the rendering time;
   rendering the section of the synthesized data in dependence upon the identified particular prosody setting.

THE REJECTIONS

1. The Examiner rejected claims 1, 25, and 26¹ under 35 U.S.C. § 102(b) as anticipated by Kibre (US 6,792,407 B2; Sept. 14, 2004). Ans. 3-5.²

2. The Examiner newly rejected claim 26 under 35 U.S.C. § 101 as directed to non-statutory subject matter. Ans. 6.

¹ Although the Examiner omits claims 25 and 26 from the statement of the rejection, the Examiner nonetheless includes these claims in the rejection’s discussion. Ans. 3. Accordingly, we presume that claims 25 and 26 are intended to be rejected and present the correct claim listing here for clarity. Accord App. Br. 6; Reply Br. 4.

² Throughout this opinion, we refer to (1) the Appeal Brief filed September 25, 2009 (“App. Br.”); (2) the Examiner’s Answer mailed December 30, 2009 (“Ans.”); and (3) the Reply Brief filed February 12, 2010 (“Reply Br.”).

THE § 101 REJECTION

The Examiner finds that claim 26 recites non-statutory subject matter since it “attempts to claim a machine readable medium” that can be interpreted as a magnetic carrier wave according to the Specification. Ans. 6. Appellants argue that the claim recites a computer-readable recording medium—not a carrier wave. Reply Br. 10.

ISSUE

Under § 101, has the Examiner erred in rejecting claim 26 by finding that it can include patent-ineligible transitory media?
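For orientation, the method steps recited in claim 1 above can be pictured as the short sketch that follows. It is purely illustrative: every function name, threshold, and value in it is hypothetical and is drawn neither from the application nor from the record.

# Illustrative sketch only: a toy walk-through of the steps recited in
# claim 1. Every name, threshold, and value here is hypothetical and is
# not drawn from the application, from Kibre, or from the record.

FAST_THRESHOLD_WPS = 2.5  # assumed cutoff (words per second) for a "fast" user

def voice_render(synthesized_text, user_speaking_rate_wps, rendering_time_s):
    # Identify a particular prosody setting in dependence upon the user's
    # current voice characteristics (here, just an observed speaking rate).
    prosody = "fast" if user_speaking_rate_wps > FAST_THRESHOLD_WPS else "slow"
    rate_wps = 3.0 if prosody == "fast" else 2.0

    # Identify, in dependence upon the context information, a rendering
    # time, and determine a section length from the prosody setting and
    # that rendering time (here, a word count).
    section_length_words = int(rate_wps * rendering_time_s)

    # Select a section of the synthesized data of the identified length.
    words = synthesized_text.split()
    section = " ".join(words[:section_length_words])

    # Render the section in dependence upon the identified prosody setting
    # (printing stands in for actual speech synthesis).
    print(f"[prosody={prosody}] {section}")

# Example: a quick-talking user with four seconds available for rendering.
voice_render(
    "aggregated email, calendar, and RSS content synthesized into plain text",
    user_speaking_rate_wps=3.2,
    rendering_time_s=4.0,
)

The point of the sketch is only the ordering of the recited steps: the prosody setting is chosen from the user’s current voice characteristics first, and the section length then depends on both that prosody setting and the rendering time.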
PRINCIPLES OF LAW

Signals are unpatentable under § 101. In re Nuijten, 500 F.3d 1346, 1355 (Fed. Cir. 2007). According to U.S. Patent & Trademark Office (USPTO) guidelines:

   A claim that covers both statutory and non-statutory embodiments . . . embraces subject matter that is not eligible for patent protection and therefore is directed to non-statutory subject matter. . . . For example, a claim to a computer readable medium that can be a compact disc or a carrier wave covers a non-statutory embodiment and therefore should be rejected under § 101 as being directed to non-statutory subject matter.

U.S. Patent & Trademark Office, Interim Examination Instructions for Evaluating Subject Matter Eligibility Under 35 U.S.C. § 101, at 2 (Aug. 2009), available at http://www.uspto.gov/web/offices/pac/dapp/opla/2009-08-25_interim_101_instructions.pdf.

The USPTO also notes the following:

   The broadest reasonable interpretation of a claim drawn to a computer readable medium . . . typically covers forms of non-transitory tangible media and transitory propagating signals per se in view of the ordinary and customary meaning of computer readable media, particularly when the specification is silent. When the broadest reasonable interpretation of a claim covers a signal per se, the claim must be rejected under 35 U.S.C. § 101 as covering non-statutory subject matter.

David J. Kappos, Subject Matter Eligibility of Computer Readable Media, 1351 Off. Gaz. Pat. Office 212 (Feb. 23, 2010) (citations omitted).

ANALYSIS

We will not sustain the Examiner’s rejection of claim 26 under § 101. As Appellants explain (Reply Br. 10), claim 26 unambiguously recites a computer-readable recording medium—media that is expressly distinguished from transmission media. Spec. 60:17-27. Although exemplary recording media broadly include magnetic disks, compact disks, magnetic tape, and “others as will occur to those of skill in the art” (Spec. 60:22-24 (emphasis added))—a non-limiting and open-ended list of examples—we nonetheless find that when read in context, the recited recording medium is limited to non-transitory recordable media consistent with the listed examples. As such, the recited recording medium does not encompass ineligible transitory media such as signals.

We are therefore persuaded that the Examiner erred in rejecting claim 26 under § 101.

THE ANTICIPATION REJECTION

The Examiner finds that Kibre’s method for voice-rendering synthesized data has every recited feature of independent claim 1 including retrieving synthesized data to be voice-rendered³—a feature which is said to correspond to data retrieved from “disparate data sources and types,” and is used to provide a “final rendering” to be synthesized from text to speech. Ans. 3, 6-7. According to the Examiner, Kibre also (1) determines a section of the synthesized data to be rendered depending on the synthesized data and context information, and (2) selects a section of the synthesized data depending on an identified section length as claimed. Ans. 4-5, 7-9.

³ Although Appellants do not hyphenate this term, we nonetheless hyphenate it for clarity and proper grammatical form.

Appellants argue that Kibre does not retrieve synthesized data to be voice-rendered, let alone determine a section of that data depending on the synthesized data to be rendered and context information as claimed. App. Br. 7-10; Reply Br. 6-9.
Appellants add that Kibre does not identify a section length depending upon the context information, much less select a section of the synthesized data depending upon the identified length. App. Br. 10-11; Reply Br. 9-10.

ISSUE

Under § 102, has the Examiner erred in rejecting claim 1 by finding that Kibre (1) retrieves synthesized data to be voice-rendered; (2) determines a section of the synthesized data depending on the synthesized data to be rendered and context information; (3) identifies a section length depending on the context information; and (4) selects a section of the synthesized data to be rendered depending on the identified length?

ANALYSIS

As noted above, the first disputed issue is whether Kibre retrieves synthesized data to be voice-rendered. As Appellants indicate (App. Br. 8), the Specification defines “synthesized data” as “aggregated data which has been synthesized into data of a uniform data type.” Spec. 9:1-2. The Specification also defines “aggregated data” as “the accumulation, in a single location, of data of disparate types.” Spec. 8:28-31 (emphasis added). Notably, “[d]isparate data types are data of different kind and form,” which can include differences in data structure, file format, transmission protocol, “and other distinctions as will occur to those of skill in the art.” Spec. 6:5-8 (emphasis added). Exemplary disparate data types include MP3 files, XML documents, email documents, “and so on as will occur to those of skill in the art.” Spec. 6:9-14 (emphasis added). We emphasize this permissive and open-ended language, for it underscores that the disparate data types that are synthesized into a uniform type to form synthesized data are hardly limited to the examples in the Specification.

Turning to Kibre, concatenative synthesizer 24 produces synthesized speech from text using recorded “snippets” (i.e., recorded sound units stored as allophones, diphones, and/or triphones)⁴ for particular speakers that are stored in a database 18. Kibre, Abstract; col. 1, ll. 9-13; col. 2, ll. 23-27; col. 4, ll. 61-65; Fig. 1. The snippet database is based on text 20 that is (1) acquired from a text selection technique shown in Figure 2, and (2) read by a speaker to provide the recorded snippets. Kibre, col. 5, ll. 35-42; Figs. 1-2. As shown in Figure 2, Kibre selects text from a variety of sources, including databases 31, digitized literature 33, electronic dictionaries 34, technical reports 32, etc. Kibre, col. 5, l. 63 – col. 6, l. 9; Fig. 2. This text is parsed into words and phrases that are analyzed to produce phonemes, which are, in turn, analyzed by sound analysis module 46 to produce sound units that are stored in a data structure. Kibre, col. 6, ll. 10-64; Fig. 2. Optimal set selection module 52 then (1) uses a “greedy” selection algorithm 54 to identify the smallest text subset containing all unit types needed to represent the database, and (2) provides an optimal set of words and phrases that is displayed to a speaker. Kibre, col. 7, ll. 9-23. The speaker then reads these optimal words and phrases while the speech is captured and digitized to develop the database. Kibre, col. 7, ll. 25-29.

⁴ See also Kibre, col. 7, l. 49 – col. 8, l. 2 (noting that snippets may be stored as samples of digitized recorded speech or parameterized).
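To make the “greedy” selection step concrete, the following minimal sketch shows that style of coverage selection: repeatedly take the sentence that adds the most still-uncovered unit types until every type is covered. The corpus and the toy sound_units() analysis are stand-ins for illustration only and do not reproduce Kibre’s actual sound analysis or data structures.

# Minimal sketch of a "greedy" coverage selection of the kind Kibre
# describes (col. 7, ll. 9-23): pick a small set of sentences whose sound
# units together cover every unit type found in the corpus. The corpus and
# the toy sound_units() analysis are hypothetical stand-ins, not Kibre's.

def sound_units(sentence):
    # Toy stand-in for sound analysis: treat adjacent letter pairs as "units."
    text = sentence.lower().replace(" ", "")
    return {text[i:i + 2] for i in range(len(text) - 1)}

def greedy_select(sentences):
    needed = set().union(*(sound_units(s) for s in sentences))
    chosen, covered = [], set()
    while covered != needed:
        # Greedily take the sentence adding the most still-uncovered units.
        best = max(sentences, key=lambda s: len(sound_units(s) - covered))
        chosen.append(best)
        covered |= sound_units(best)
    return chosen

corpus = ["the cat sat", "a dog ran far", "the dog sat near the cat"]
print(greedy_select(corpus))  # a small subset covering all unit types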
Although Kibre retrieves text data from different sources to develop the snippet database, as Appellants contend (App. Br. 8; Reply Br. 7), this text data is nonetheless derived from disparate data types even assuming, without deciding, that the textual sources (databases 31, digitized literature 33, electronic dictionaries 34, technical reports 32, etc.) are the same data type (e.g., the same type of text file, such as ASCII). Notably, the Examiner reasons that since Kibre uses not only text, but also voice, to provide the final rendering for text-to-speech synthesis, Kibre retrieves synthesized data to be voice-rendered as claimed. Ans. 7. This use of voice and text to personalize text-to-speech synthesis by updating stored snippets for new speakers is a key aspect of Kibre’s system. See Kibre, Abstract; col. 4, l. 66 – col. 5, l. 23; col. 7, l. 30 – col. 8, l. 23; Figs. 1-3. This update is based on iteratively comparing the new speaker’s snippets with those in the database—snippets that are derived from text as noted above. Kibre, col. 2, ll. 23-27; col. 5, ll. 6-23; Figs. 1, 3.

These snippets, then, fully meet “synthesized data” under Appellants’ definition since they result from aggregating disparate-type data from different documents and speakers that is synthesized into data of a uniform type, namely snippet data structures for storage and comparison. And as the Examiner indicates (Ans. 7), these snippets are to be voice-rendered, for they are retrieved and used by the concatenative synthesizer 24 to produce synthesized speech. See Kibre, col. 5, ll. 40-45; col. 7, ll. 25-32; Fig. 3.

Nor are we persuaded of error in the Examiner’s finding that Kibre determines a section of the synthesized data depending on the synthesized data to be rendered and context information for the context in which the data is to be voice-rendered as claimed. Ans. 4, 7-9. As the Examiner indicates (Ans. 8), Appellants define context information quite broadly as “data describing the context in which synthesized data is to be voice rendered” and provide various non-limiting examples including “other context information . . . as will occur to those of skill in the art.” Ans. 8 (quoting Spec. 47:19-23 (emphasis added)). Despite Appellants’ arguments to the contrary (App. Br. 8-10; Reply Br. 7-9), nothing in the claim or this open-ended description precludes Kibre’s using “context information” as the Examiner indicates (Ans. 7-9), for Kibre’s system determines the context not only of words and phrases, but also of sound units, which are ultimately used for voice rendering as noted above. See Kibre, col. 6, ll. 35-64; Fig. 2 (numerals 44, 48). Appellants’ contention that Kibre merely refers to the internal organization of the data itself and not the external voice-rendering environment (App. Br. 9-10; Reply Br. 8-9) is unavailing and not commensurate with the scope of the claim.
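To illustrate the “uniform data type” point in concrete terms, the sketch below shows one way snippet-like records keyed by sound unit and context could be stored and then concatenated for rendering. The dataclass, the keys, and the audio_ref placeholders are hypothetical; Kibre stores recorded or parameterized sound units, not the strings used here.

# Illustrative sketch only: snippet-like records of a uniform type, keyed
# by sound unit and its context, looked up and concatenated for rendering.
# The structure, keys, and audio_ref placeholders are hypothetical; they
# are not Kibre's actual data structures.

from dataclasses import dataclass

@dataclass(frozen=True)
class Snippet:
    unit: str       # e.g., a diphone label
    context: str    # the context in which the unit was recorded
    audio_ref: str  # placeholder for recorded or parameterized speech

database = {
    ("h-e", "word-initial"): Snippet("h-e", "word-initial", "rec_0001"),
    ("e-l", "medial"): Snippet("e-l", "medial", "rec_0002"),
    ("l-o", "word-final"): Snippet("l-o", "word-final", "rec_0003"),
}

def concatenate(unit_sequence):
    # Retrieve each stored snippet; a real synthesizer would join the audio.
    return [database[key].audio_ref for key in unit_sequence]

print(concatenate([("h-e", "word-initial"), ("e-l", "medial"), ("l-o", "word-final")]))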
Lastly, we are unpersuaded of error in the Examiner’s finding that Kibre (1) identifies a section length depending on the context information, and (2) selects a section of the synthesized data to be rendered depending on the identified length as claimed. Ans. 4, 10. As the Examiner indicates, Appellants define “section length” quite broadly as “typically implemented as a quantity of the synthesized content” and provide several non-limiting examples including (1) a particular number of (a) synthesized data bytes, (b) lines or paragraphs of text, or (c) chapters of content, or (2) “any other quantity of the synthesized content . . . as will occur to those of skill in the art.” Ans. 10 (quoting Spec. 58:3-10 (emphasis added)). The Examiner refers to various quantities in Kibre that correspond to this open-ended description of “section length,” including syllables, demi-syllables, and pairs of half-syllables that can constitute sound units depending on the nature of the synthesizer. Ans. 10 (citing Kibre, col. 6, ll. 51-54). Since these units represent a certain quantity of synthesized content, they constitute “section lengths” under Appellants’ definition. And since sections of synthesized data are selected for rendering depending at least partly on this length, we find no error in the Examiner’s reliance on Kibre in this regard. Appellants’ arguments regarding Kibre’s recording the context of words and phrases to generate a snippets database (App. Br. 10-11; Reply Br. 9-10) are unavailing and not commensurate with the scope of the claim.

We are therefore not persuaded that the Examiner erred in rejecting representative claim 1, or in rejecting claims 25 and 26, which were not separately argued with particularity.

CONCLUSION

The Examiner erred in rejecting claim 26 under § 101, but did not err in rejecting claims 1, 25, and 26 under § 102.

ORDER

The Examiner’s decision rejecting claims 1, 25, and 26 is affirmed.

No time period for taking any subsequent action in connection with this appeal may be extended under 37 C.F.R. § 1.136(a)(1)(iv).

AFFIRMED