Ex parte Wolff et al., Appeal 2017-005023, Application 14/382,839 (P.T.A.B. Aug. 31, 2017)

UNITED STATES DEPARTMENT OF COMMERCE
United States Patent and Trademark Office
Address: COMMISSIONER FOR PATENTS, P.O. Box 1450, Alexandria, Virginia 22313-1450, www.uspto.gov

APPLICATION NO.: 14/382,839
FILING DATE: 09/04/2014
FIRST NAMED INVENTOR: Tobias Wolff
ATTORNEY DOCKET NO.: NUANCE-053PUS
CONFIRMATION NO.: 3919
EXAMINER: KIM, JONATHAN C
ART UNIT: 2659
NOTIFICATION DATE: 09/05/2017
DELIVERY MODE: ELECTRONIC

Correspondence (Customer No. 115788): Nuance, c/o Daly, Crowley, Mofford and Durkee, LLP, 354A Turnpike Street, Suite 301A, Canton, MA 02021-2714

Please find below and/or attached an Office communication concerning this application or proceeding. The time period for reply, if any, is set in the attached communication. Notice of the Office communication was sent electronically on the above-indicated "Notification Date" to the following e-mail address(es): ip.inbox@nuance.com, docketing@dc-m.com, amk@dc-m.com. PTOL-90A (Rev. 04/07)

UNITED STATES PATENT AND TRADEMARK OFFICE
BEFORE THE PATENT TRIAL AND APPEAL BOARD

Ex parte TOBIAS WOLFF, MARKUS BUCK, TIM HAULICK, and SUHADI

Appeal 2017-005023
Application 14/382,839
Technology Center 2600

Before JOHN A. JEFFERY, DENISE M. POTHIER, and JASON J. CHUNG, Administrative Patent Judges.

POTHIER, Administrative Patent Judge.

DECISION ON APPEAL

STATEMENT OF THE CASE

Appellants appeal under 35 U.S.C. § 134(a) from the Examiner's rejection of claims 1–10 and 18–30. Claims 11–17 have been canceled. App. Br. 13–14 (Claims App'x).1,2 We have jurisdiction under 35 U.S.C. § 6(b). We affirm-in-part. We present new grounds pursuant to 37 C.F.R. § 41.50(b).

1 Throughout this opinion, we refer to (1) the Final Action (Final Act.) mailed April 14, 2016, (2) the Appeal Brief (App. Br.) filed September 20, 2016, (3) the Examiner's Answer (Ans.) mailed December 2, 2016, and (4) the Reply Brief (Reply Br.) filed January 30, 2017.

2 The Final Action (Final Act. 1) and the Appeal Brief (App. Br. 1) mistakenly include canceled claims 11–17 as pending.

The Invention

Appellants' invention relates to a device, medium, and method which employ "a user dedicated, multi-mode, voice controlled interface using automatic speech recognition" (ASR). Spec. ¶ 1. The user interface is adapted to conduct a speech dialog with one or more possible speakers and includes a broad listening mode, which accepts speech inputs from the possible speakers without spatial filtering, and a selective listening mode, which limits speech inputs to a specific speaker using spatial filtering. The user interface switches listening modes in response to one or more switching cues. Id. ¶ 4. Switching cues include one or more words. Id. ¶ 5.

Claim 1 is reproduced below with emphasis:

1. A device for automatic speech recognition (ASR) comprising:
microphones;
a voice-control module coupled to the microphones, the voice-control module having a computer processor and a memory configured to:
interface with a user via automatic speech recognition (ASR) using at least one of the microphones including:
a broad listening mode to accept speech inputs from a number of possible speakers without spatial filtering; and
a selective listening mode to limit speech inputs to a specific one of the possible speakers using spatial filtering,
wherein the voice-control module switches between the broad listening mode and the selective listening mode in response to one or more switching cues; and
communicate to the specific one of the possible speakers that the voice-control module is in the selective listening mode for the specific one of the possible speakers.
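To make the claimed behavior concrete, the following is a minimal sketch, in Python, of the functional elements claim 1 recites. It is purely illustrative: the mode names, the cue words, and the announce mechanism are hypothetical assumptions, not drawn from the application's actual implementation.

```python
from enum import Enum, auto

class Mode(Enum):
    BROAD = auto()      # accept speech from all possible speakers, no spatial filtering
    SELECTIVE = auto()  # spatially filter toward one specific speaker

class VoiceControlModule:
    """Hypothetical sketch of claim 1's limitations; all names are illustrative."""

    def __init__(self, microphones):
        self.microphones = microphones
        self.mode = Mode.BROAD
        self.target_direction = None  # direction of the selected speaker, if any

    def on_switching_cue(self, cue_word, speaker_direction):
        # Switch between modes in response to a switching cue (e.g., a word).
        if self.mode is Mode.BROAD and cue_word == "activate":  # cue word assumed
            self.mode = Mode.SELECTIVE
            self.target_direction = speaker_direction
            # The disputed "communicate" limitation: tell the selected speaker
            # that the module is now in selective listening mode for him or her.
            self.announce("Selective listening mode: listening only to you.")
        elif self.mode is Mode.SELECTIVE and cue_word == "release":
            self.mode = Mode.BROAD
            self.target_direction = None

    def announce(self, message):
        # Any speaker-perceivable feedback could play this role in principle:
        # synthesized speech, a light, an on-screen message, and so on.
        print(message)
```

On this sketch, the dispute analyzed below reduces to whether anything in the cited prior art plays the role of the announce() step.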
The Examiner relies on the following as evidence of unpatentability:

MacTavish    US 2008/0162120 A1    July 3, 2008
Nagahama     US 2009/0055170 A1    Feb. 26, 2009
Mozer        US 2009/0204410 A1    Aug. 13, 2009
Hart         US 8,700,392 B1       Apr. 15, 2014
Tadashi      EP 1 400 814 A2       Mar. 24, 2004

The Rejections

Claims 4, 8, 9, 21, and 25–27 are rejected under 35 U.S.C. § 112(b) or second paragraph (pre-AIA) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor regards as the invention. Final Act. 4–5.

Claims 1–3, 6, 10, 18–20, 23, 27, and 28 are rejected under § 102(a) (pre-AIA) as being anticipated by Tadashi. Id. at 5–9.

Claims 4 and 21 are rejected under § 103 as being unpatentable over Tadashi and Mozer. Id. at 9.

Claims 5, 7, 22, 24, and 29 are rejected under § 103 as being unpatentable over Tadashi and Hart. Id. at 9–11.

Claims 8, 9, 25, and 26 are rejected under § 103 as being unpatentable over Tadashi and Nagahama. Id. at 11–13.

Claim 30 is rejected under § 103 as being unpatentable over Tadashi, Hart, and MacTavish. Id. at 13.

THE INDEFINITENESS REJECTION

Appellants recognize the indefiniteness rejection of claims 4, 8, 9, 21, and 25–27, but do not present any arguments. App. Br. 4. We, thus, summarily sustain this rejection. See Hyatt v. Dudas, 551 F.3d 1307, 1314 (Fed. Cir. 2008) (explaining that when appellants fail to contest a ground of rejection, the Board may affirm the rejection without considering its substantive merits); see also 37 C.F.R. § 41.37(c)(1)(iv); Manual of Patent Examining Procedure (MPEP) § 1205.02, 9th ed., Rev. 07.2015 (Nov. 2015) ("If a ground of rejection stated by the examiner is not addressed in the appellant's brief, appellant has waived any challenge to that ground of rejection and the Board may summarily sustain it.").

THE ANTICIPATION REJECTION

Regarding independent claim 1, the Examiner finds that Tadashi discloses all its elements, including a voice-control module that has memory configured to communicate to the specific one of the possible speakers that the voice-control module is in the selective listening mode for the specific one of the possible speakers. Final Act. 6 (citing Tadashi ¶ 43). The Examiner cites additional paragraphs in the Examiner's Answer as teaching the recited "configured to . . . communicate" limitation in claim 1. Ans. 3 (citing Tadashi ¶¶ 40–43, 48–49). Appellants argue Tadashi's paragraph 43 does not communicate anything to the certain utterer recognized by the system. App. Br. 5, 7.

Responding to the newly cited passages in the Examiner's Answer, Appellants contend one skilled in the art would not have known whether a user/utterer would have perceived the disclosed camera and support table movements based on Tadashi's discussion of camera drive controller 211 in paragraph 49. See Reply Br. 6–7. Based on this position, Appellants assert the camera and support table movements do not disclose inherently or necessarily communicating to a user that "the voice-control module is in the selective listening mode for the specific one of the possible speakers" as claimed. See id.

ISSUE

Under § 102(a),3 has the Examiner erred in rejecting claim 1 by finding that Tadashi discloses a "voice control module having . . . a memory configured to: . . . communicate to the specific one of the possible speakers that the voice-control module is in the selective listening mode for the specific one of the possible speakers"?

3 The Examiner rejected the claims based on 35 U.S.C. § 102(a) (pre-AIA). Notably, Tadashi published (i.e., March 24, 2004) more than a year before the effective filing date of the instant application and also qualifies as prior art under § 102(b).

ANALYSIS

Based on the record before us, we find error in the Examiner's rejection of independent claim 1 as presented. Tadashi discloses that, when a certain utterer states "start," recognition unit 300 recognizes this word as a keyword. Tadashi ¶¶ 40, 43. In response, only utterances from the certain utterer are emphasized and supplied to unit 300, and directional detector 201 determines the arrival direction of the input sound (e.g., step S4 in Fig. 3). Id. ¶¶ 41, 43, Fig. 3. Recognizing a keyword and supplying words to a recognition unit (e.g., 300) do not disclose a voice-control module that communicates to a speaker that the voice-control module is in the selective listening mode, which limits speech inputs to a specific one of the possible speakers as recited in claim 1. See id. ¶¶ 40–41, 43. Nor does Tadashi's teaching of focusing a microphone in the direction of the identified speaker, as discussed by the Examiner (see Ans. 2), sufficiently disclose a voice-control module having the ability to communicate to the specific speaker or utterer that the voice-control module is in the selective listening mode.

Although the Examiner cites paragraph 42 in Tadashi when discussing the selective listening mode (Final Act. 6; Ans. 2), the Examiner discusses different paragraphs when addressing the "communicate" limitation (Final Act. 6 (citing Tadashi ¶ 43); Ans. 2 (citing Tadashi ¶¶ 48–49)). In particular, the Examiner discusses Tadashi's camera movement as disclosing the limitation in dispute. Ans. 2 (citing Tadashi ¶¶ 48–49). The Examiner states "controlling the camera to the direction of the utterer is considered a visual form of communication to the utterer that it is ready to continue receive speech input from that speaker." Id. Based on these movements, the Examiner determines Tadashi discloses visually that the voice-control module is configured to communicate to the utterer that the module is in the selective listening mode as recited in claim 1. See id. We disagree.

Granted, Tadashi teaches camera drive controller 211 sets the camera view direction "to a certain utterer according to keyword utterance of the certain utterer." Tadashi ¶ 49.
Additionally, Tadashi discloses the camera is arranged on a support table, and the table can change horizontal and vertical directions of the support table using a drive unit. Id. ¶ 48. However, as noted by Appellants, Tadashi does not establish the support table and camera are necessarily viewable to the recited "specific one of the possible speakers." See Reply Br. 6–7. As such, although Tadashi may teach or suggest the possibility, or even the probability, that the support table and camera are viewable by the utterer (e.g., a specific one of the possible speakers to which speech inputs are limited), the record does not establish sufficiently that Tadashi's support table and camera are necessarily viewable and, thus, able to communicate to the utterer "the voice-control module is in the selective listening mode for the specific one of the possible speakers" as recited. See In re Robertson, 169 F.3d 743, 745 (Fed. Cir. 1999).

For the foregoing reasons, Appellants have persuaded us of error in the rejection of (1) independent claim 1, (2) independent claims 10 and 18, which recite commensurate limitations, and (3) dependent claims 2, 3, 6, 19, 20, 23, 27, and 28 for similar reasons.

THE OBVIOUSNESS REJECTIONS

The Examiner rejects claims 4, 5, 7–9, 21, 22, 24–26, 29, and 30 based on Tadashi and at least one other reference. Final Act. 9–13. The rejections do not rely on the additional references to teach the above-noted missing feature. See id. Accordingly, we do not sustain the rejections of claims 4, 5, 7–9, 21, 22, 24–26, 29, and 30 under § 103 as presented.

NEW GROUNDS OF REJECTION

Claims 1–3, 6, 10, 18–20, 23, 27, and 28 are newly rejected under (I) 35 U.S.C. § 102(b) as being anticipated by Tadashi, and (II) 35 U.S.C. § 103(a) as being unpatentable over Tadashi. Claims 4, 5, 7–9, 21, 22, 24–26, 29, and 30 are newly rejected under 35 U.S.C. § 103(a) based on Tadashi and at least one other reference. Final Act. 9–13.

I. ANTICIPATION REJECTION OVER TADASHI

Tadashi discloses multiple embodiments of a device for ASR, including "a first embodiment" (Tadashi ¶ 17 (3:39–43)) (i.e., Figures 1–3), "a second embodiment" (id. (3:44–45)) (i.e., Figure 4), and "a sixth embodiment" (id. (3:52–53)) (i.e., Figures 8–9). We address these embodiments below. For the undisputed limitations in independent claims 1, 10, and 18 (App. Br. 5–9), we adopt the Examiner's findings as our own. Final Act. 5–8.

The Sixth Embodiment — Figures 8–9

Tadashi teaches a sixth embodiment of an ASR device having microphones (e.g., a microphone array) and a voice-control module having a computer processor and memory (e.g., voice recognition unit 403 and directional setting apparatus 404). Tadashi ¶¶ 81–87, Fig. 9. This module in Tadashi is configured to interface with a user through ASR (e.g., through unit 403 and apparatus 404) using at least one microphone (id. ¶ 87), including a broad listening mode to accept speech inputs from a number of possible speakers (e.g., setting directivity in all directions and receiving words from driver 401 and passenger 402) (id. ¶¶ 86–87).
Additionally, Tadashi discloses a selective listening mode that limits speech inputs to a specific one of the possible speakers (e.g., driver 401) using spatial filtering (id. ¶¶ 88–89), and the voice-control module switches between the broad listening mode and the selective listening mode in response to one or more switching cues (e.g., uttering the key phrase "car navigation") until the specific speaker uses the releasing phrase (e.g., "thank you") (id. ¶¶ 88–93).

More specifically, Figure 9 demonstrates an example of how this embodiment switches between a broad listening mode (e.g., directivity in all directions and receiving inputs from driver 401 and passenger 402 shown in Figure 8) and a selective listening mode (e.g., forming a narrow directivity toward driver 401's direction). Id. ¶¶ 81–95. When driver 401 uses the word "hot" and key phrase "car navigation" (first and second entries in first column), the system is in a broad listening mode and "accept[s] speech inputs from a number of possible speakers without spatial filtering" as recited, as shown in the third column. Id. ¶¶ 85–88, Fig. 9. However, once the key phrase (e.g., "car navigation") is uttered and detected, the system switches to a more focused mode where the direction of the microphone array is on driver 401. Id. ¶¶ 88–89, Fig. 9. At this point, the system's module is in "a selective listening mode to limit speech inputs to a specific one of the possible speakers using spatial filtering, wherein the voice-control module switches between the broad listening mode and the selective listening mode in response to one or more switching cues" (e.g., using an activation phrase "car navigation") as recited in claim 1.

To illustrate, when driver 401 states "temperature down" (third entry in first column), the system recognizes this command (third entry in fourth column) and responds by lowering the selection temperature of the air conditioner (third entry in fifth column). Id. ¶¶ 88–90, Fig. 9. When passenger 402 states "cold" or "temperature up" (first and second entries in second column), the system does not respond (fourth and fifth entries in fifth column). Id. ¶¶ 91–92, Fig. 9. As such, Tadashi's response to driver 401's lower-temperature request (e.g., turning on a fan and changing the car's internal temperature) and Tadashi's failure to respond to passenger 402's request communicate aurally (e.g., fan sounds), through a temperature change to the specific speaker (e.g., driver 401), and by not responding to others' requests, that the voice-control module is in the selective listening mode for the specific one speaker as required by claim 1. See id. ¶¶ 88–90, Fig. 9. Once driver 401 utters "thank you" (fourth entry in first column), the system releases its directional command (sixth entry in fifth column) and recognition unit 300 accepts key phrases from all directions (seventh entry in third column). Id. ¶ 93, Fig. 9.

Figure 9 demonstrates yet a further example where passenger 402 gains control of the system and the direction of the microphone array is on passenger 402. Id. ¶ 94, Fig. 9. When in the "all the directions" mode (e.g., seventh entry of third column), passenger 402 utters key phrase "car navigation" (third entry in second column) and then gains control over the system to raise the temperature by stating "temperature up" (fourth entry in second column and eighth entry in third column). Id. Utterances of "hot" and "thank you" by driver 401 (fifth and sixth entries in first column) during this period are not recognized (ninth and tenth entries in fifth column). Id. ¶ 95, Fig. 9. Thus, Tadashi's Figure 9 provides another example of communicating (e.g., aurally through sounds and through a temperature change) to the specific speaker (e.g., passenger 402) that the voice-control module is in the selective listening mode for the specific one speaker as required by claim 1.
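The Fig. 9 dialogue maps naturally onto a small state machine. The sketch below is a hedged reconstruction rather than code from Tadashi: the key phrase "car navigation" and the releasing phrase "thank you" come from Tadashi ¶¶ 88–93, while the function names and the explicit per-mode vocabulary sets (which also illustrate the claim 2 discussion that follows) are illustrative assumptions.

```python
BROAD_VOCABULARY = {"car navigation"}  # broad mode recognizes only the key phrase
SELECTIVE_VOCABULARY = {"temperature down", "temperature up", "thank you"}

class CarNavigationASR:
    def __init__(self):
        self.selected_direction = None  # None => broad mode, directivity in all directions

    def hear(self, utterance, direction):
        if self.selected_direction is None:
            # Broad mode: "hot"/"cold" are ignored; only a key phrase switches modes.
            if utterance in BROAD_VOCABULARY:
                self.selected_direction = direction  # narrow directivity toward the utterer
        else:
            # Selective mode: ignore everyone except the selected utterer.
            if direction != self.selected_direction:
                return
            if utterance == "thank you":
                self.selected_direction = None       # release; back to all directions
            elif utterance in SELECTIVE_VOCABULARY:
                self.execute(utterance)              # e.g., lower the set temperature

    def execute(self, command):
        print("executing:", command)

# Walking the Fig. 9 example: the driver activates, commands, and releases;
# the passenger draws no response while the driver holds the directivity.
asr = CarNavigationASR()
asr.hear("hot", direction="driver")                # ignored: broad mode, not a key phrase
asr.hear("car navigation", direction="driver")     # switch to selective mode toward driver
asr.hear("temperature up", direction="passenger")  # ignored: wrong direction
asr.hear("temperature down", direction="driver")   # recognized and executed
asr.hear("thank you", direction="driver")          # released: broad mode again
```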
We apply a similar analysis discussed above concerning claim 1 to independent claims 10 and 18. See also Tadashi, Fig. 8.

As for claim 2, the above embodiment further demonstrates different vocabularies are used in the broad listening mode and the selective listening mode. That is, in the broad listening mode, the vocabulary is limited to a key phrase, such as "car navigation." Id. ¶¶ 88, 94, Fig. 9. In selective listening mode, different vocabulary is recognized, including "temperature down" and "thank you." Id. ¶¶ 90, 93, 94, Fig. 9. These findings are consistent with the disclosure, which states:

In broad listening mode the voice controlled user interface 100 uses a limited broad mode recognition vocabulary that includes a selective mode activation word. When the voice controlled user interface 100 detects the activation word, it enters a selective listening mode that uses spatial filtering to limit speech inputs to a specific speaker 102 in the room 101 using an extended selective mode recognition vocabulary.

Spec. ¶ 19, Fig. 1. Concerning these vocabularies, the Specification also states "[t]he different listening modes may also use different recognition vocabularies, for example, a limited vocabulary in broad listening mode and a larger recognition vocabulary in selective listening mode." Id. ¶ 17.

In the Reply Brief, Appellants argue that paragraph 98 supports the same keyword and vocabulary in both the broad and selective modes. Reply Br. 4–5 (citing Tadashi ¶¶ 81, 82, 85, 86, 98). Appellants then argue Tadashi discloses a single vocabulary, such that the vocabularies do not differ. But there is nothing in the disclosure excluding the same words (e.g., "car navigation") from being in both vocabularies. See Spec. ¶¶ 17, 19. Also, consistent with the disclosure, the above-mapped broad listening mode has a limited, broad mode recognition vocabulary (e.g., "car" and "navigation" making up the key phrase "car navigation") while the mapped selective listening mode has an extended vocabulary that includes different words and phrases from the broad mode recognition vocabulary (e.g., "temperature down" and "thank you" in some embodiments).

Claim 19 depends from claim 18 and is similar in scope to claim 2. We apply the same analysis to claim 19 as with claim 2 previously discussed.

The Examiner's findings regarding dependent claims 3, 6, 20, 23, 27, and 28 are undisputed. See generally App. Br. 4–9. We, therefore, adopt the Examiner's findings as our own. Final Act. 7–8. Notably, for this embodiment, Tadashi states "the same reference numbers are attached to the same constituents, and explanation will be omitted." Tadashi ¶ 60.

Accordingly, pursuant to our authority under 37 C.F.R. § 41.50(b), we reject claims 1–3, 6, 10, 18–20, 23, 27, and 28 under § 102(b) based on Tadashi (sixth embodiment).

II. OBVIOUSNESS REJECTION OVER TADASHI
A. The First Embodiment — Figures 1–3

As noted above, Tadashi discloses, when an utterer utters the word "start" (e.g., a keyword), (1) recognition unit 300 detects the phrase at step S3 and determines the direction of the sound at step S4, (2) directional detector 201 outputs the arrival direction setting signal to directional controller 203, and (3) only utterances of the certain utterer are emphasized and supplied to its voice recognition apparatus as the processing sound. Tadashi ¶¶ 40–43, Fig. 3. As such, these steps place Tadashi's system into "a selective listening mode to limit speech inputs to a specific one of the possible speakers using spatial filtering" as recited. See id. ¶¶ 40–43.

Tadashi further discloses, when in this selective listening mode, directional controller 203 outputs the processing sound, produced by adding the input sound and the directional property, in a direction of a certain utterer uttering the keyword at step S5. Id. ¶ 42, Fig. 3. By directing sounds to one user over another, Tadashi suggests to one skilled in the art that its voice-control module communicates to the specific utterer and others that the module is in a selective listening mode. As such, one skilled in the art would have recognized another aural technique taught by Tadashi for "communicat[ing] to the specific one of the possible speakers that the voice-control module is in the selective listening mode for the specific one of the possible speakers" as recited in claim 1. We apply a similar analysis to independent claims 10 and 18.

The Examiner's findings regarding dependent claims 3, 6, 20, 23, 27, and 28 are undisputed. See generally App. Br. 4–9. We, therefore, adopt the Examiner's findings as our own. Final Act. 7–8.

Accordingly, pursuant to our authority under 37 C.F.R. § 41.50(b), we reject claims 1, 3, 6, 10, 18, 20, 23, 27, and 28 under § 103(a) based on Tadashi (first embodiment).
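The "spatial filtering" running through this analysis, i.e., forming a directivity toward a certain utterer, is conventionally implemented with techniques such as delay-and-sum beamforming. Tadashi does not describe directional detector 201 or directional controller 203 at this level of detail, so the following sketch is only an assumed illustration of how a microphone array can emphasize sound arriving from one direction, not Tadashi's implementation.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # meters per second

def delay_and_sum(signals, mic_positions, look_direction, fs):
    """Emphasize sound arriving from look_direction (a unit vector pointing
    from the array toward the talker) by time-aligning channels before summing.

    signals:       (num_mics, num_samples) array of microphone samples
    mic_positions: (num_mics, 3) array of microphone positions in meters
    fs:            sampling rate in Hz
    """
    d = np.asarray(look_direction, dtype=float)
    d /= np.linalg.norm(d)
    # A plane wave from the look direction reaches mics with larger p.d first.
    advances = mic_positions @ d / SPEED_OF_SOUND  # per-mic arrival advance, seconds
    shifts = np.round((advances - advances.min()) * fs).astype(int)
    # Delay the early channels so every channel lines up (np.roll wraps around;
    # a real implementation would pad with zeros instead).
    aligned = np.stack([np.roll(s, k) for s, k in zip(signals, shifts)])
    # Coherent average: the look-direction signal adds in phase, while sound
    # from other directions stays misaligned and is attenuated.
    return aligned.mean(axis=0)
```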
Notably, for this embodiments, Tadashi states “the same reference numbers are attached to the same constituents, and explanation will be omitted.” Tadashi 146. 14 Appeal 2017-005023 Application 14/382,839 Accordingly, pursuant to our authority under 37 C.F.R. § 41.50(b), we reject claims 1, 3, 6, 10, 18, 20, 23, 27, and 28 under § 103(a) based on Tadashi (second embodiment). III. OBVIOUSNESS REJECTIONS BASED TADASHI AND AT LEAST ONE OTHER REFERENCE Dependent claims 4, 5, 7—9, 21, 22, 24—26, 29, and 30 are rejected under 35 U.S.C. § 103(a) based on Tadashi and at least one other reference. Final Act. 9—13. These claims are newly rejected over the same grounds and references presented in the Final Action, but applying the above analysis related to Tadashi (i.e., first, second, and sixth embodiments) and the disputed “communicate” limitation. For those findings and conclusions that remain undisputed (see generally App. Br. 4—9), we adopt the Examiner’s findings and conclusions as our own. Final Act. 9—13. DECISION We affirm the Examiner’s rejection of claims 4, 8, 9, 21, and 25—27 under 35 U.S.C. § 112, second paragraph. We reverse the Examiner’s rejections of (1) claims 1—3, 6, 10, 18—20, 23, 27, and 28 under 35 U.S.C. § 102 and (2) claims 4, 5, 7—9, 21, 22, 24—26, 29, and 30 under 35 U.S.C. § 103. We enter new grounds of rejection for claims 1—10 and 18—30. This decision contains a new ground of rejection pursuant to 37 C.F.R. § 41.50(b). Section 41.50(b) provides “[a] new ground of rejection pursuant to this paragraph shall not be considered final for judicial review.” Section 41.50(b) also provides: 15 Appeal 2017-005023 Application 14/382,839 When the Board enters such a non-final decision, the appellant, within two months from the date of the decision, must exercise one of the following two options with respect to the new ground of rejection to avoid termination of the appeal as to the rejected claims: (1) Reopen prosecution. Submit an appropriate amendment of the claims so rejected or new Evidence relating to the claims so rejected, or both, and have the matter reconsidered by the examiner, in which event the prosecution will be remanded to the examiner. The new ground of rejection is binding upon the examiner unless an amendment or new Evidence not previously of Record is made which, in the opinion of the examiner, overcomes the new ground of rejection designated in the decision. Should the examiner reject the claims, appellant may again appeal to the Board pursuant to this subpart. (2) Request rehearing. Request that the proceeding be reheard under § 41.52 by the Board upon the same Record. The request for rehearing must address any new ground of rejection and state with particularity the points believed to have been misapprehended or overlooked in entering the new ground of rejection and also state all other grounds upon which rehearing is sought. Further guidance on responding to a new ground of rejection can be found in the Manual of Patent Examining Procedure § 1214.01. AFFIRMED-IN-PART 37 C.F.R, $ 41.50(b) 16 Copy with citationCopy as parenthetical citation