Ex parte Gruenstein et al., Application No. 13/838,379 (P.T.A.B. Dec. 12, 2018)

UNITED STATES PATENT AND TRADEMARK OFFICE

Application No.: 13/838,379
Filing Date: 03/15/2013
First Named Inventor: Alexander H. Gruenstein
Attorney Docket No.: 16113-4437001
Confirmation No.: 7937
Examiner: MICHAEL N. OPSASNICK
Art Unit: 2658
Correspondent: FISH & RICHARDSON P.C., PO Box 1022, Minneapolis, MN 55440-1022
Notification Date: 12/14/2018 (electronic delivery)

BEFORE THE PATENT TRIAL AND APPEAL BOARD

Ex parte ALEXANDER H. GRUENSTEIN and PETAR ALEKSIC

Appeal 2017-002237
Application 13/838,379
Technology Center 2600

Before JEREMY J. CURCURI, JUSTIN BUSCH, and PHILLIP A. BENNETT, Administrative Patent Judges.

BUSCH, Administrative Patent Judge.

DECISION ON APPEAL

Pursuant to 35 U.S.C. § 134(a), Appellants appeal from the Examiner's decision to reject claims 1, 3-8, 10-15, and 17-21. Claim 22 is objected to as being dependent on a rejected base claim. Final Act. 1, 8. Oral arguments were heard on November 14, 2018. A transcript of the hearing was placed in the record. We have jurisdiction over the pending claims under 35 U.S.C. § 6(b). Claims 2, 9, and 16 were cancelled previously. We reverse.

CLAIMED SUBJECT MATTER

Appellants' invention generally relates to speech recognition and, more specifically, to mixed model speech recognition. Spec. ¶¶ 2, 18, Title.
Even more specifically, aspects of Appellants' invention relate to using both a client-side limited language model speech recognizer developed using user-specific data and a server-side speech recognizer model with a larger vocabulary developed independent of user-specific data. Spec. ¶ 18.

Claims 1, 8, and 15 are the independent claims. Claim 1 is illustrative and reproduced below, with the disputed limitations italicized:

1. A computer-implemented method comprising:

accessing audio data generated by a computing device based on audio input from a user, the audio data encoding one or more user utterances;

generating a first transcription of the utterances by performing speech recognition on the audio data using a first speech recognizer, wherein the first speech recognizer employs a language model that is based on user-specific data;

generating a second transcription of the utterances by performing speech recognition on the audio data using a second speech recognizer, wherein the second speech recognizer employs a language model independent of user-specific data;

determining that the second transcription of the utterances includes a term from a predefined set of one or more terms associated with actions that are performable by the computing device; and

based on determining that the second transcription of the utterance includes the term from the predefined set of one or more terms, providing an output of the first transcription of the utterance.

REJECTIONS

Claims 1, 7, 8, 14, 15, and 20 stand provisionally rejected on the ground of obviousness-type double patenting over claims 1, 3-9, 11-16, and 18-20 of U.S. Application No. 13/892,590. Final Act. 3.

Claims 1, 3-8, 10-15, and 17-20 stand rejected under 35 U.S.C. § 102 as anticipated by Phillips (US 2011/0066634 A1; Mar. 17, 2011). Final Act. 4-6.

Claim 21 stands rejected under 35 U.S.C. § 103 as obvious in view of Phillips and Ma (US 2011/0144996 A1; June 16, 2011). Final Act. 7.

ANALYSIS

THE PROVISIONAL OBVIOUSNESS-TYPE DOUBLE-PATENTING REJECTION

Although Appellants do not contest the provisional non-statutory (obviousness-type) double-patenting rejection of claims 1, 7, 8, 14, 15, and 20, see generally Appeal Br. 4-8; Reply Br. 1-3, we decline to reach this provisional rejection. We note U.S. Application No. 13/892,590 issued as U.S. Patent No. 9,058,805 on May 27, 2015, prior to the mailing date of the Office Action from which this Appeal is taken. We also note the Examiner included claims 2, 9, and 16 in this rejection, but these claims were canceled previously.

THE 35 U.S.C. § 102 REJECTION

Appellants argue the rejection of claims 1, 3-8, 10-15, and 17-20 under 35 U.S.C. § 102 as a group. See Appeal Br. 4-6; Reply Br. 1-3. Independent claims 8 and 15 recite limitations commensurate with those argued with respect to claim 1, and claims 3-7, 10-14, and 17-20 depend from, and incorporate the limitations of, claims 1, 8, and 15, respectively. Thus, we select independent claim 1 as representative of this group, and the remaining claims 3-8, 10-15, and 17-20 stand or fall with claim 1. 37 C.F.R. § 41.37(c)(1)(iv).

The Examiner finds Phillips discloses every limitation recited in independent claim 1. Final Act. 4-5. Specifically, the Examiner finds: (1) Phillips' language model output based on usage history in one pass of speech recognition discloses the recited output of the first transcription, Final Act. 4 (citing Phillips ¶ 60, Fig. 2); Ans. 7 (citing Phillips ¶ 66); (2) Phillips' acoustic model output and/or Phillips' "recognized/accepted text" disclose the recited second transcription, compare Final Act. 4-5 (finding "acoustic models to detect the sound" disclose the recited "generating a second transcription" (citing Phillips ¶¶ 10, 69)) and Ans. 3 (same) with Ans. 7 (finding "the ASR [(Automated Speech Recognition)] server analyzes recognized/accepted text (second transcript)" (citing Phillips ¶ 69)); (3) Phillips' "user action" discloses determining the second transcription includes a predefined term associated with an action, Final Act. 5 (citing Phillips ¶ 47, Fig. 7a); and (4) Phillips' use of "the combination of both models to generate a recognized result" discloses providing an output of the first transcription based on determining the second transcription includes a predefined term. Final Act. 5 (citing Phillips ¶ 66).

The Examiner finds the ASR server determines whether speech recognition results meet certain criteria, which "includes, among other items, client state information, recognized text, accepted text, timing information, user actions, and the like." Ans. 7 (citing Phillips ¶¶ 10, 174, 189-190). If the results meet the criteria, the ASR server outputs the first transcription without changing the language models. Ans. 7. If the results do not meet the criteria, the ASR server determines the language models need to be updated, and the output is different from the first transcription. Ans. 7.

Appellants acknowledge that Phillips discloses using client state information to select speech recognition models and alter words in speech recognition results depending on the application for which speech recognition is performed, e.g., "applying address-specific changes ... in a location-based application." Appeal Br. 5 (citing Phillips ¶ 66). Appellants also acknowledge Phillips discloses using the speech recognition output to determine and perform appropriate actions on a phone. Appeal Br. 5 (citing Phillips ¶ 103). Appellants argue, however, that nothing the Examiner cites in Phillips relates to providing an output of one transcription based on a determination that a different transcription includes a predefined term. Appeal Br. 5-6.
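To make the disputed limitations concrete, the control flow that claim 1 recites can be sketched as follows. This is an illustrative paraphrase only; the recognizer functions, their outputs, and the term set are hypothetical placeholders, not implementations or data from the record:

```python
# Illustrative sketch of the method of claim 1; all names and values
# are hypothetical placeholders, not anything disclosed in the record.

ACTION_TERMS = {"call", "text", "navigate"}  # hypothetical predefined set of
                                             # terms associated with device actions

def recognize_personalized(audio):
    """Stand-in for the first speech recognizer, whose language model
    is based on user-specific data (e.g., the user's contacts)."""
    return "call bob smith"  # hypothetical first transcription

def recognize_generic(audio):
    """Stand-in for the second speech recognizer, whose language model
    is independent of user-specific data."""
    return "call bob smith"  # hypothetical second transcription

def transcribe(audio):
    first = recognize_personalized(audio)   # first transcription
    second = recognize_generic(audio)       # second transcription
    # Disputed limitation: if the SECOND transcription contains a term
    # from the predefined set, provide an output of the FIRST transcription.
    if any(term in second.split() for term in ACTION_TERMS):
        return first
    return second  # claim 1 does not specify the contrary case; assumed here

print(transcribe(b"..."))  # -> "call bob smith"
```

The point of the sketch is only that the term check runs against one transcription while the output comes from the other, which is the arrangement the parties dispute below.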
We agree with Appellants that the Examiner has not demonstrated Phillips discloses the disputed limitations. In particular, we agree that Phillips does not disclose providing an output of a first transcription of utterances, which is generated employing a language model based on user-specific data, based on a determination that a different second transcription of the utterances, which is generated employing a language model independent of user-specific data, includes a predefined term associated with an action that is performable by a device.

As discussed above, the Examiner finds Phillips' output from using language models based on usage history for one pass discloses the recited "generating a first transcription ... by performing speech recognition ... using a first speech recognizer [that] employs a language model that is based on user-specific data." Final Act. 4 (citing Phillips ¶ 60, Fig. 2).

The Examiner's Mapping of the Second Transcription to Phillips' Acoustic Model Output

Even accepting the Examiner's findings that Phillips' use of language models for one pass discloses the recited generating a first transcription and Phillips' use of an acoustic model discloses the recited generating a second transcription, Appellants argue Phillips does not disclose providing an output of the transcription using the language model based on a determination that the transcription using the acoustic model includes a predefined term, as required by claim 1. Reply Br. 1-3. Appellants acknowledge Phillips discloses the possibility of rerunning speech recognition on new models when the current models are not appropriate. Reply Br. 2-3. Appellants argue, however, Phillips does not disclose outputting the results of the first transcription if it reruns the speech recognition with a new model. Reply Br. 2-3.
Rather, Phillips either outputs the results of the acoustic model (i.e., the output the Examiner maps to the second transcription) or Phillips combines the recognition outputs (transcriptions) of multiple language models. Reply Br. 1-2 (citing Phillips ¶ 70, Fig. 5a). Thus, Appellants argue, even assuming Phillips' speech recognition using a language model discloses the recited step of generating a first transcription and Phillips' rerunning speech recognition using an acoustic model discloses the recited step of generating a second transcription, Phillips does not disclose the disputed limitations because Phillips fails to disclose outputting the results of the language model after rerunning the speech recognition using an acoustic model. Reply Br. 2-3 ("Phillips has no disclosure of choosing the recognition output of a first run ... over the recognition output of a second run ... after the recognition is rerun based on new models.").

We agree with Appellants. We see nothing in Phillips' cited portions that discloses providing an output of the first speech recognition pass in response to determining the second transcription from the acoustic model includes a predefined term. On the contrary, Phillips discloses that, "if the current recognition models are not appropriate given the characteristics of the audio data and the client state information," the ASR "may load new or additional recognition models" and "rerun the recognition based on these new models." Phillips ¶ 69. Phillips, thus, does not disclose providing the output of the first pass when the ASR server determines the original recognition models were inappropriate. In fact, Phillips' disclosure suggests providing the results from rerunning the recognition using the new models and, presumably, discarding the old results as "not appropriate."
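The rerun behavior described in Phillips ¶ 69, as the Board reads it, can be sketched schematically for contrast with claim 1. All function names here are hypothetical illustrations, not Phillips' actual implementation:

```python
# Hedged sketch of the rerun behavior the Board attributes to Phillips
# (para. 69); all names are hypothetical illustrations, not Phillips' code.

def run_recognition(audio, models):
    """Stand-in for a recognition pass using a given model set."""
    return f"result from {models}"

def models_appropriate(audio, client_state):
    """Hypothetical appropriateness check on the current models
    (cf. Phillips para. 69)."""
    return False  # assume the models were found inappropriate

def phillips_style_asr(audio, client_state):
    result = run_recognition(audio, "current models")  # first pass
    if not models_appropriate(audio, client_state):
        # Per the Board's reading: Phillips loads new or additional
        # models and RERUNS recognition, so the first-pass result is
        # effectively discarded rather than output. Claim 1, by
        # contrast, outputs the first transcription in this situation.
        result = run_recognition(audio, "new models")
    return result

print(phillips_style_asr(b"...", {}))  # -> "result from new models"
```

The contrast is the crux of the Board's reasoning: in the sketch, the first-pass result never survives a rerun, whereas claim 1 requires providing the first transcription precisely when the second one contains an action term.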
The Examiner's Mapping of the Second Transcription to Phillips' Recognized/Accepted Text

Again accepting the Examiner's finding that Phillips' use of language models for one pass discloses the recited generating a first transcription, Appellants argue the Examiner errs in finding Phillips' "recognized/accepted text" discloses the recited second transcription. Reply Br. 1, 3. Specifically, Appellants argue Phillips' "recognized" text involves optionally applying a process to alter the output words based on client state information, and Phillips' "accepted" text is user-generated. Reply Br. 3 (citing Phillips ¶¶ 64, 66,¹ 117). Thus, Appellants argue Phillips' recognized and accepted text fail to disclose the recited step of generating a second transcription using a second speech recognizer because Phillips' recognized/accepted text is not the result of a speech recognizer employing a language model independent of user-specific data, as recited in claim 1. Reply Br. 3.

¹ Appellants erroneously cite paragraph 67, but the language quoted actually appears in paragraph 66.

We disagree with Appellants' characterization of the Examiner's findings with respect to Phillips' "recognized/accepted text." Although the Examiner provides little more than citations to paragraphs in Phillips, the particular paragraphs cited indicate Appellants' arguments do not address the particular text the Examiner identifies. See Ans. 7 ("Examiner notes that recognized/accepted text includes recognized commands (see Phillips, para 0010 for the command format input; para 0174, para 0189-0190, showing command elements and content elements)"). Nevertheless, regardless of the particular text the Examiner cites as disclosing the recited second transcription, the Examiner's findings are based on rerunning speech recognition using new models if Phillips' ASR determines the original models are inappropriate.
Thus, for the same reasons discussed above, we are persuaded the Examiner erred because Phillips' cited portions do not disclose providing an output of a first transcription based on a determination that a second transcription includes a predefined term.

Moreover, even considering the portions of Phillips additionally cited in the Answer, Phillips does not disclose the subject matter recited in claim 1. See Ans. 7 (citing Phillips ¶¶ 10, 174, 189-190). Paragraphs 10, 174, 189, and 190 of Phillips do, in fact, disclose generating a first transcription of audio data (of captured utterances) using a resident speech recognition facility, determining that the audio data includes a command, sending at least a portion of that audio data to a remote speech recognition facility, and presenting the output of the second speech recognition facility to the user. See Phillips ¶¶ 10, 174, 189-190. Thus, Phillips' resident speech recognition facility is similar to the recited second speech recognizer and Phillips' remote speech recognition facility is similar to the recited first speech recognizer because Phillips provides output of the remote speech recognition facility based on a determination that the output of the resident speech recognition facility includes a command (i.e., a "term from the predefined set of one or more terms"). However, to the extent Phillips' resident speech recognition facility and remote speech recognition facility disclose the recited second speech recognizer and first speech recognizer, respectively, Phillips does not disclose the particular characteristics of the recited speech recognizers. Specifically, Phillips, at most, suggests the remote speech recognition facility (similar to the recited first speech recognizer) uses a model based on user-specific data.
Phillips discloses "generating speech-to-text results utilizing the remote speech recognition facility based at least in part on the speech and on the information related to the mobile communication facility." Phillips ¶ 190 (emphasis added). Phillips further discloses transmitting the information related to the mobile communication facility, which "includes information about a command recognized by the resident speech recognition facility and information about contacts stored on the mobile communication facility," from the mobile communication facility to the remote speech recognition facility. Phillips ¶ 190. Thus, although Phillips may suggest to a person of ordinary skill in the art that the information related to the mobile communication facility is "user-specific data" and that such information is used to transcribe the audio data, Phillips does not explicitly disclose that the remote speech recognition facility uses a language model based on user-specific data.

Furthermore, claim 1 recites that "the second speech recognizer employs a language model independent of user-specific data." Even assuming Phillips discloses or suggests its remote speech recognition facility uses a language model based on user-specific data, this would lead to a conclusion that Phillips' resident speech recognition facility (similar to the recited second speech recognizer) also uses a language model based on user-specific data. That is, to the extent Phillips' cited portions are understood to disclose or suggest that Phillips' remote speech recognition facility language model is based on the information about contacts stored on the mobile communication facility, Phillips also discloses that the resident speech recognition facility uses a language model based on user-specific data. See Phillips ¶¶ 173-176, 189-190.
In particular, Phillips discloses that the resident speech recognition facility processes the speech "to recognize command elements and content elements, wherein the content elements include the contact name for at least one of a text message and an email message." Phillips ¶ 190. Phillips also discloses limiting processing on a mobile device and improving accuracy by processing on the device only the part of the speech that is most predictable, and processing the remainder of the speech remotely. Phillips ¶ 173; see Phillips ¶¶ 174-175. Notably, Phillips discloses that, because "name detection is done on the device, it is easier to manage the creation of user-chosen names without network interaction." Phillips ¶ 175 (emphasis added). Thus, although Phillips discloses providing output of one transcription based on a determination that a different transcription includes a predefined term, Phillips does not disclose the particular arrangement claimed.

Summary of Analysis of the 35 U.S.C. § 102 Rejection

For the reasons discussed above, we are persuaded the Examiner erred. Accordingly, we do not sustain the Examiner's rejection of independent claim 1. For similar reasons, we also do not sustain the Examiner's rejection of independent claims 8 and 15, which recite similar limitations, or of claims 3-7, 10-14, and 17-20, which depend from claims 1, 8, and 15, respectively.

THE 35 U.S.C. § 103 REJECTION

Appellants separately argue the patentability of claim 21, rejected under 35 U.S.C. § 103 as obvious in view of Phillips and Ma. Appeal Br. 6-7. Claim 21 depends directly from, and incorporates the limitations of, independent claim 1. The Examiner neither relies on Ma nor provides a rationale for modifying Phillips to cure the deficiencies identified above with respect to the disputed limitations of claim 1. Thus, we reverse the rejection of claim 21 for the same reasons discussed above with respect to claim 1.
DECISION

The Examiner's decision to reject claims 1, 3-8, 10-15, and 17-21 is reversed.

REVERSED