Amazon Technologies, Inc., Appeal No. 2020-005358 (P.T.A.B. Nov. 2, 2021)

UNITED STATES PATENT AND TRADEMARK OFFICE
UNITED STATES DEPARTMENT OF COMMERCE
United States Patent and Trademark Office
Address: COMMISSIONER FOR PATENTS, P.O. Box 1450, Alexandria, Virginia 22313-1450, www.uspto.gov

Application No.: 14/752,128
Filing Date: 06/26/2015
First Named Inventor: Kenneth John Basye
Attorney Docket No.: P23108-US
Confirmation No.: 1147
Correspondence: Pierce Atwood LLP (Attn: Amazon Docketing), Attn: Patent Docketing (Amazon), 100 Summer Street, Suite 2250, Boston, MA 02110
Examiner: Jonathan C. Kim
Art Unit: 2655
Notification Date: 11/02/2021 (delivered electronically to patent@pierceatwood.com)

____________

BEFORE THE PATENT TRIAL AND APPEAL BOARD
____________

Ex parte KENNETH JOHN BASYE, ARTHUR RICHARD TOTH, and WILLIAM FOLWELL BARTON

Appeal 2020-005358
Application 14/752,128
Technology Center 2600
____________

Before RICHARD M. LEBOVITZ, MARC S. HOFF, and JOHNNY A. KUMAR, Administrative Patent Judges.

LEBOVITZ, Administrative Patent Judge.

DECISION ON APPEAL

The Examiner rejected claims 4, 9, 10, 13, 18, 19, 28, 29, and 32–42 under 35 U.S.C. § 103 as obvious. Pursuant to 35 U.S.C. § 134(a), Appellant1 appeals from the Examiner's decision to reject the claims. We have jurisdiction under 35 U.S.C. § 6(b).

We AFFIRM.

1 We use the word "Appellant" to refer to "applicant" as defined in 37 C.F.R. § 1.42. Appellant identifies the real party in interest as Amazon.com, Inc. Appeal Br. 2.
STATEMENT OF THE CASE

Claims 4, 9, 10, 13, 18, 19, 28, 29, and 32–42 stand rejected by the Examiner in the Final Office Action as follows:

Claims 4, 9, 10, 13, 18, 19, 28, 29, 32, 33, 35–38, and 40–42 under 35 U.S.C. § 103 as obvious in view of Mengibar (US 2015/0287410 A1, published Oct. 8, 2015) ("Mengibar"), Kalinli-Akbacak (US 2014/0112556 A1, published Apr. 24, 2014) ("Kalinli"), and Mozer et al. (US 2013/0183944 A1, published Jul. 18, 2013) ("Mozer"). Final Act. 6.

Claims 34 and 39 under 35 U.S.C. § 103 as obvious in view of Mengibar, Kalinli, Mozer, and Jain (US 2006/0085183 A1, published Apr. 20, 2006) ("Jain"). Final Act. 21.

Independent claim 4 is copied below (annotated with bracketed numbers for reference to the steps in the claim):

4. A computer-implemented method, comprising:
[1] receiving, from a first device, audio data corresponding to an utterance;
[2] determining, using at least one trained model, an indication representing paralinguistic features corresponding to the utterance;
[3] performing speech processing on the audio data to determine first natural language understanding (NLU) data corresponding to the utterance;
[4] determining, from among a plurality of command processors configured to process with respect to NLU data, a first command processor corresponding to the first NLU data;
[5] sending, to the first command processor, the first NLU data;
[6] sending, to the first command processor, the indication; and
[7] causing first data, responsive to the first NLU data and corresponding to the indication, to be sent to the first device for output.

CLAIM INTERPRETATION

We begin with claim interpretation. During prosecution, claims are given their broadest reasonable interpretation in light of the Specification. See In re Morris, 127 F.3d 1048, 1054 (Fed. Cir. 1997).

The claim is directed to a computer-implemented method.
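Before walking through the steps individually, the seven claimed steps can be read together as a dispatch pipeline. The following is a minimal illustrative sketch only; every name in it (speech_processing, MusicProcessor, etc.) is hypothetical and not drawn from the claim or the Specification, and the stub functions stand in for the trained model and NLU processing the claim recites.

```python
# Hypothetical sketch of the seven claimed steps; all names are
# illustrative, not taken from the claim or the Specification.

def speech_processing(audio_data):
    # [3] stand-in for ASR + NLU: derive "NLU data" (an intent) from audio
    return {"intent": "play_music", "query": audio_data}

def paralinguistic_indication(audio_data):
    # [2] stand-in for a trained model detecting, e.g., a whispered utterance
    return {"whisper": "quiet" in audio_data}

class MusicProcessor:
    intent = "play_music"
    def execute(self, nlu_data, indication):
        # [7] the output data is responsive to the NLU data and
        # corresponds to the indication: a whispered request is
        # answered with playback at reduced volume
        volume = "low" if indication["whisper"] else "normal"
        return f"playing music at {volume} volume"

class TimerProcessor:
    intent = "set_timer"
    def execute(self, nlu_data, indication):
        return "timer set"

def handle_utterance(audio_data, processors):
    indication = paralinguistic_indication(audio_data)  # step [2]
    nlu_data = speech_processing(audio_data)            # step [3]
    # [4] determine, from among the plurality of command processors,
    # the one corresponding to the first NLU data
    processor = next(p for p in processors if p.intent == nlu_data["intent"])
    # [5], [6] send the NLU data and the indication to that processor;
    # [7] its responsive output is returned for the first device
    return processor.execute(nlu_data, indication)

print(handle_utterance("quiet: play some jazz",
                       [MusicProcessor(), TimerProcessor()]))
```

The sketch is only meant to show the structure the Board's interpretation turns on: step [4] is a selection among multiple processors keyed to the content of the NLU data, and step [7] conditions the output on both the NLU data and the paralinguistic indication.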
In the first step [1], audio data corresponding to an utterance is received from a first device. In simple terms, a user speaks into a device, which is an utterance, and it is received by the device as audio data.

Paralinguistic features of the utterance are determined in step [2] of the claim. The Specification describes paralinguistic features as follows:

acoustic features such as speech tone / pitch, rate of change of pitch (first derivative of pitch), speed, prosody / intonation, resonance, energy / volume, hesitation, phrasing, nasality, breath, whether the speech includes a cough, sneeze, laugh or other non-speech articulation (which are commonly ignored by ASR [automatic speech recognition] systems), detected background audio / noises, distance between the user and a device, etc.

Spec. ¶ 49.

In step [3] of the claim, the method performs "speech processing on the audio data to determine first natural language understanding (NLU) data corresponding to the utterance." The Specification explains that NLU "is a field of computer science, artificial intelligence, and linguistics concerned with enabling computers to derive meaning from text input containing natural language." Spec. ¶ 13.

NLU process determines the meaning behind the text based on the individual words and then implements that meaning. NLU processing 260 interprets a text string to derive an intent or a desired action from the user as well as the pertinent pieces of information in the text that allow a device (e.g., device 110) to complete that action.

Spec. ¶ 30.

The claim refers to "NLU data," but this exact phrase does not appear in the Specification. The Specification, however, describes, with reference to Fig. 2, aspects of speech recognition and NLU processing. Spec. ¶¶ 22–46. The Specification discloses:

The output from the NLU processing (which may include tagged text, commands, etc.)
may then be sent to a command processor 290, which may be located on a same or separate server 120 as part of system 100. The destination command processor 290 may be determined based on the NLU output. For example, if the NLU output includes a command to play music, the destination command processor 290 may be a music playing application, such as one located on device 110 or in a music playing appliance, configured to execute a music playing command.

Spec. ¶ 47.

From the disclosure in Spec. ¶ 47, we interpret "NLU data" to be the output from the NLU processing, such as a command to execute an application. The command processor receives this data/command in step [5] and then acts on it in step [7]. Neither the claim nor the Specification places any restriction on the type or form of NLU data which is sent to the command processor that enables it to execute the command.

The fourth step [4] is "determining, from among a plurality of command processors configured to process with respect to NLU data, a first command processor corresponding to the first NLU data." The "command processor" is described in the Specification as "a component capable of acting on the utterance." Spec. ¶ 63. The Specification provides examples of a "command processor": "a query processor / search engine, music player, video player, calendaring application, email / messaging application, user interaction controller, personal assistant program, etc." Id.

The Specification explains that when the command processor is a music player, the NLU module sends the command processor "text and semantic indicators that the utterance included a request to play music" and then the command processor selects and plays suitable music from the user's catalog. Spec. ¶ 64.

Step [4] is "determining . . .
a first command processor corresponding to the first NLU data." The step also recites that the command processor is "configured to process with respect to NLU data." The Specification explains:

The destination command processor 290 may be determined based on the NLU output. For example, if the NLU output includes a command to play music, the destination command processor 290 may be a music playing application, such as one located on device 110 or in a music playing appliance, configured to execute a music playing command.

Spec. ¶ 47.

Based on the disclosure in the Specification as reproduced above, we interpret the determination of a command processor "corresponding to the first NLU data" which is "configured to process with respect to NLU data" to mean that a command processor is selected that is capable of acting on the utterance whose meaning is determined in step [3] of the claim. For example, when the "NLU data" is a command to play music, a command processor is selected which is configured to receive and act on a command to play music.

Once the command processor is determined in step [4], the NLU data and indication are sent to the command processor in steps [5] and [6]. In step [7], it is recited "causing first data, responsive to the first NLU data and corresponding to the indication, to be sent to the first device for output." There is no statement in the claim as to what device is "causing first data . . . to be sent to the first device for output." However, in the context of the claim, we interpret the command processor to be performing this function because step [7] is dependent on the first NLU data received by the command processor.

The "first data" is sent to a device for output. Appellant cites paragraphs 63–65 of the Specification for written description support for this limitation. Appeal Br. 7. The latter paragraphs refer to commands for playing music, changing volume, etc.
Thus, we interpret the broadly claimed "first data" to include data, such as music data, which is sent to a device for output, such as a speaker for output of the music data.

REJECTION BASED ON MENGIBAR, KALINLI, AND MOZER

The Examiner found that Mengibar discloses steps [1] to [3] of claim 4 of obtaining NLU data and an indication of paralinguistic features of an utterance, and then sending the data and indication to a command processor, which sends it to a device for output as in steps [5] to [7] of the claim. Final Act. 6. The Examiner also cited Kalinli for its disclosure of a method "for recognizing user's emotional state based on paralinguistic features." Id. at 9–10. The Examiner further found that Mozer also describes performing speech processing to obtain NLU data as in step [3] of the claim.

For step [4] of claim 4, the Examiner found that each of Mengibar and Mozer describes "determining, from among a plurality of command processors configured to process with respect to NLU data, a first command processor corresponding to the first NLU data." Final Act. 7, 11. The Examiner found that it would have been obvious to one of ordinary skill to determine a command processor as in step [4] of the claim "to provide a less cumbersome and confusing user interface for operating devices." Final Act. 13.

Appellant contends that Mengibar does not describe step [4] of claim 4. Appeal Br. 10–12. We agree with Appellant that the Examiner did not establish by a preponderance of the evidence that Mengibar describes all the elements of step [4]. However, this deficiency does not undermine the rejection because the Examiner also cited Mozer to meet the claim limitation. We thus turn to Mozer.

Appellant argues that Mozer "does not teach any process where it determines a destination of NLU data from among multiple potential destinations." Appeal Br. 13.
Instead, Appellant argues, "Mozer teaches sending a sonic (or ultrasonic or subsonic) [signal] to a control box to operate a piece of home equipment." Id. Appellant further argues that Mozer "does not teach or suggest selecting from command processors configured to process with respect to NLU data." Id. Appellant states that the control boxes of Mozer are not "command processors configured to process with respect to NLU data." Appeal Br. 13. Appellant argues "they are control boxes configured to process a sonic (or ultrasonic or subsonic) signal to control a home device. There is no teaching or suggestion that the control boxes of Mozer are configured to process with respect to NLU data." Id. at 14.

We begin with Figure 1 of Mozer, which summarizes the process disclosed in it. Figure 1 of Mozer is reproduced below:

Figure 1 of Mozer shows system 100, which "includes a number of control devices, designated as control boxes, or 'CNTRL BOX,' 119, 121, 123, 125, and 127. The system 100 can also include a local communication device 103 and a remote device 101." Mozer ¶ 27. Mozer explains that "control box 119 can be connected to television 105, control box 121 can be connected to music player 107," such as a stereo system. Id. The signal labeled 117 is "sonic, ultrasonic, or subsonic communication signals 117." Mozer ¶ 34.

The process of Mozer is as follows: The remote device 101 sends a voice signal to the local communication device 103. Mozer discloses that device 103 "can perform a voice recognition function to recognize the voice command contained in the voice signal." Mozer ¶ 30. Mozer also discloses that device 103 "translate[s] the voice command [from remote device 101] into a sonic signal and . . . transmit[s] that signal to one or more control boxes in the sonic network via sonic, ultrasonic, or subsonic communication signals 117." Id. ¶ 34.
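The dispatch described in these paragraphs of Mozer can be made concrete with a short sketch. Everything here is hypothetical illustration: the box addresses, the device names, and the keyword matching are invented for this example, and a Python dictionary merely stands in for Mozer's sonic/ultrasonic network signal carrying a control box's network address.

```python
# Hypothetical sketch of Mozer's control-box dispatch (Fig. 1, ¶¶ 31, 34, 60).
# Addresses and keyword matching are invented; a dict stands in for the
# sonic-network signal addressed to a specific control box.

CONTROL_BOXES = {           # each control box is wired to one household device
    "television": "box_119",
    "music_player": "box_121",
    "lamp": "box_123",
}

def to_sonic_signal(command, target_device):
    # the recognized voice command is translated into a signal carrying
    # the network address of the control box for the target device
    return {"address": CONTROL_BOXES[target_device], "payload": command}

def dispatch(voice_command):
    # determine, from the content of the voice command, which control
    # box controls the device the user wants to operate
    text = voice_command.lower()
    if "music" in text:
        return to_sonic_signal("power_on", "music_player")
    if "tv" in text or "television" in text:
        return to_sonic_signal("power_on", "television")
    return to_sonic_signal("power_on", "lamp")

print(dispatch("play some music"))
```

The point of the sketch is the selection step: the signal is routed to one control box chosen from several based on what the command asks for, which is the structure the Examiner mapped to step [4].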
The Examiner found that the control boxes described by Mozer have the same function as the command processors of claim 4. Final Act. 11–12.

With respect to speech processing, Mozer teaches:

The remote user operating remote device 101 may also speak complicated commands such as "Turn on the TV in thirty minutes, go to station 24, and record the program." Because recognition of such complex natural speech may be difficult or beyond the computational capabilities for the speech recognizer in local communication device 103, local communication device 103 can send the unrecognized voice command signal via mobile device 111 to a remote server for recognition and then act upon its response. Similarly, local communication device 103 can send the unrecognized voice command signal via sonic network signals 117 [to a] desktop, laptop, or server computer connected to one or more control boxes with the sonic network to process the more complex voice recognition functions. The computer can then rebroadcast using its connected control box the recognize[d] voice command to one or all of the connected control boxes to activate the selected or desired functionality of one or more household devices.

Mozer ¶ 43.

In other words, the speech processing to determine the NLU data of step [3] of claim 4 can be accomplished in Mozer by communication device 103, by a server, or by other devices which are capable of "more complex voice recognition functions." Id.

Once the speech processing is completed, Mozer discloses that the local communication device 103 sends "a sonic network control signal via the sonic network to one or more [of] the control boxes 119, 121, 123, 125, and 127," which is "based on the content of the voice command and any preferences that might be associated with the user who issues the command based on the voice pattern." Mozer ¶ 31.
Mozer explains:

Once the voice command is processed using the voice recognition functionality, one or more of the control boxes can determine one or more commands to issue in step 540. . . . In action 550, one or more of the control boxes, the local communication device or the secondary computer, can determine to which control boxes the command will be transmitted. In such embodiments, the command can be transmitted as an audio command in a sonic network communication with a specific network address for the control box connected to the household device the user would like to operate, in step 560.

Mozer ¶ 60 (emphasis added).

The latter disclosures in Mozer that the signal is sent to a control box "based on the content of the voice command" and that "the local communication device or the secondary computer, can determine to which control boxes the command will be transmitted" correspond to the requirement of step [4] of "determining, from among a plurality of command processors configured to process with respect to NLU data, a first command processor corresponding to the first NLU data" because the command is sent to a control box which is selected from among the group of control boxes based on the content of the command.

The command is a sonic signal. Mozer discloses that the "control devices can also include circuitry to perform actions in response to received sonic signals," such as operating "a power switch to turn-on or turn-off a lamp that is plugged into the control device." Mozer ¶ 26.

Appellant states that a "sonic signal is not NLU data." Reply Br. 6. Appellant argues:

It is simply a sonic signal to instruct a component. There is no teaching in Mozer that the sonic signal is NLU data (a representation of the meaning of the spoken utterance like an indication of an intent of the voice command or the like) as claimed.
While the sonic signal of Mozer may be generated in response to receipt of a spoken utterance, Mozer's sonic signal is not itself NLU data and thus the sonic signal of Mozer cannot be read against the claimed NLU data.

Reply Br. 7.

This argument does not persuade us that the Examiner erred. Speech processing is performed by device 103, a server, or other devices with speech processing capability. Mozer ¶¶ 30, 42 (see above discussion). Appellant has not provided any evidence to dispute the Examiner's finding that such a process is the same as [3] "performing speech processing on the audio data to determine first natural language understanding (NLU) data corresponding to the utterance." The result of the processing is a sonic signal. Mozer ¶ 30. The sonic signal acts as the command provided to the control box which, in turn, operates the device to which it is connected. Id. ¶ 26 (see supra at 10). The sonic signal is therefore NLU data.

As explained in the claim interpretation section, the "NLU data" is the output from the NLU processing. Spec. ¶ 47. It can serve as the command provided to the command processor to operate a device, such as a music player, and cause output from the device, such as music. In Mozer, the "NLU data" is the sonic signal because it is a result of the voice recognition and provides the command that is sent to the control box/command processor to operate the device. The inventor did not restrict the "NLU data" to a specific form or type, but left it open to any type of data created after NLU processing that could be provided to a command processor to execute the command. See supra at 4. The sonic signal does just that.

Appellant's contention that a sonic signal is not NLU data because it simply instructs a component (Reply Br.
7) ignores the claim language that the NLU data in the claim is sent to the same type of device (the command processor) as the control box in Mozer and does the same type of action in operating a device, such as a music player, to cause output of music data on the device (see step [7] of the claim).

Appellant states "[t]here is no teaching or suggestion that the control boxes of Mozer are configured to process with respect to NLU data" as recited in step [4] of the claim. Appeal Br. 13. This argument does not demonstrate Examiner error. We interpreted step [4] in light of the Specification to mean that a command processor is selected that is capable of acting on the utterance whose meaning is determined in step [3] of the claim, such as selecting a music player when the utterance is interpreted as a command to play music. This is the same thing described by Mozer. Mozer ¶ 60 (reproduced in full supra at 10) discloses "the local communication device or the secondary computer, can determine to which control boxes the command will be transmitted."

Appellant states "[t]here is no decision to select some component that can process with respect to NLU data." Appeal Br. 14. This statement ignores the disclosure in Mozer of "Determine device to which command will be transmitted." Mozer, Fig. 5, ¶ 60. Mozer explains:

In such embodiments, the command can be transmitted as an audio command in a sonic network communication with a specific network address for the control box connected to the household device the user would like to operate, in step 560.

Mozer ¶ 60 (also reproduced supra at 10).

In other words, if the command is to play music, then it is determined by Mozer which control box controls the music player (see Fig. 1 supra at 8) that the user wants to operate.
This step is the same as determining a command processor "configured to process with respect to the NLU data" because it is determined which control box can process the command to play music. There would be no point in determining what device to send the command to, other than selecting which control box controls the device that the user seeks to operate and which therefore is capable of executing the command associated with the NLU data, such as playing music when the command is to play music. It would have been obvious to direct the command to the control box controlling the music player because other control boxes and their associated devices would be unable to do so (such as the lamp or thermostat shown in Figure 1 of Mozer).

Appellant also argues that "the combination would not result in the claimed invention." Appeal Br. 15. Appellant states the Examiner did not provide motivation to combine the references. Id. The Examiner found that all steps of the claim are substantially described in Mengibar, and gave a reason to apply Mozer to Mengibar. Final Act. 6–13. Appellant makes the unsupported statement that the Examiner did not find all the elements of the claim in the cited references or provide a reason to combine the references, when the Examiner clearly articulated where each element of the claims is found and then provided a reason to combine them. Final Act. 10, 14.

For the foregoing reasons, the rejection of claim 4 is affirmed. Claim 13 recites substantially the same limitations as claim 4 and is affirmed for the same reasons. Claims 9, 10, 18, 19, 28, 29, and 32–42 are not argued separately and fall with claims 4 and 13.

CONCLUSION

In summary:

Claims Rejected | 35 U.S.C. § | Reference(s)/Basis | Affirmed | Reversed
4, 9, 10, 13, 18, 19, 28, 29, 32, 33, 35–38, 40–42 | 103 | Mengibar, Kalinli, Mozer | 4, 9, 10, 13, 18, 19, 28, 29, 32, 33, 35–38, 40–42 |
34, 39 | 103 | Mengibar, Kalinli, Mozer, Jain | 34, 39 |
Overall Outcome | | | 4, 9, 10, 13, 18, 19, 28, 29, 32–42 |

TIME PERIOD

No time period for taking any subsequent action in connection with this appeal may be extended under 37 C.F.R. § 1.136(a)(1)(iv).

AFFIRMED