Ex parte Aleksic et al., No. 13/923,545 (P.T.A.B. Aug. 29, 2017)

UNITED STATES PATENT AND TRADEMARK OFFICE
UNITED STATES DEPARTMENT OF COMMERCE
Address: COMMISSIONER FOR PATENTS, P.O. Box 1450, Alexandria, Virginia 22313-1450, www.uspto.gov

APPLICATION NO.: 13/923,545
FILING DATE: 06/21/2013
FIRST NAMED INVENTOR: Petar Aleksic
ATTORNEY DOCKET NO.: 16113-4780001
CONFIRMATION NO.: 7673

FISH & RICHARDSON P.C.
PO BOX 1022
MINNEAPOLIS, MN 55440-1022

EXAMINER: THOMAS-HOMESCU, ANNE L
ART UNIT: 2659
NOTIFICATION DATE: 08/31/2017
DELIVERY MODE: ELECTRONIC

Please find below and/or attached an Office communication concerning this application or proceeding. The time period for reply, if any, is set in the attached communication. Notice of the Office communication was sent electronically on the above-indicated “Notification Date” to the following e-mail address(es): PATDOCTC@fr.com

UNITED STATES PATENT AND TRADEMARK OFFICE
BEFORE THE PATENT TRIAL AND APPEAL BOARD

Ex parte PETAR ALEKSIC and XIN LEI

Appeal 2016-002183
Application 13/923,545
Technology Center 2600

Before JUSTIN BUSCH, JAMES W. DEJMEK, and JOYCE CRAIG, Administrative Patent Judges.

DEJMEK, Administrative Patent Judge.

DECISION ON APPEAL

Appellants appeal under 35 U.S.C. § 134(a) from a Final Rejection of claims 1–23 and 25. Appellants have canceled claim 24. App. Br. 16. We have jurisdiction over the remaining pending claims under 35 U.S.C. § 6(b). We affirm.

1 Appellants identify Google Inc. as the real party in interest. App. Br. 1.

STATEMENT OF THE CASE

Introduction

Appellants’ disclosed and claimed invention is directed to speech recognition and, more particularly, to performing video analysis based language model adaptation. Spec. ¶¶ 1, 16.
In a disclosed embodiment, a wearable computing device configured to perform speech recognition also uses additional inputs, including image data, to provide additional context for the user’s speech. Spec. ¶¶ 16–19.

Claim 1 is representative of the subject matter on appeal and is reproduced below with the disputed limitations emphasized in italics:

1. A computer-implemented method comprising:
receiving audio data obtained by a microphone of a wearable computing device, wherein the audio data encodes an utterance of a user;
receiving image data obtained by a camera of the wearable computing device;
identifying one or more image features based on the image data;
classifying the image data as pertaining to a particular activity, based at least on the one or more image features, wherein the particular activity is unrelated to providing an explicit user input to the wearable computing device;
selecting one or more terms associated with a language model used by a speech recognizer to generate transcriptions;
adjusting one or more probabilities associated with the language model that correspond to one or more of the selected terms based on the relevance of one or more of the selected terms to the particular activity; and
obtaining, as an output of the speech recognizer that uses the adjusted probabilities, a transcription of the user utterance.

The Examiner’s Rejections

1. Claims 1, 5, and 13 stand rejected under 35 U.S.C. § 112(a) as failing to comply with the written description requirement. Final Act. 5–6.

2. Claims 1–4, 12, 20, 21, and 25 stand rejected under 35 U.S.C. § 103(a) as being unpatentable over Hart et al. (US 8,700,392 B1; Apr. 15, 2014 (filed Sept. 10, 2010)) (“Hart”); King et al. (US 2011/0043652 A1; Feb. 24, 2011) (“King”); and Thorsen et al. (US 2012/0143605 A1; June 7, 2012) (“Thorsen”).2 Final Act. 6–16, 40–43.

3. Claims 5–11, 13–19, 22, and 23 stand rejected under 35 U.S.C.
§ 103(a) as being unpatentable over Hart and Thorsen. Final Act. 17–39.

2 We note the header for the rejection inadvertently recites claim 24 instead of claim 25. Final Act. 6. The body of the rejection, however, correctly identifies claim 25. Final Act. 16. Appellants have not asserted any prejudice due to the Examiner’s typographical error. Accordingly, we treat the error as harmless.

ANALYSIS3

3 Throughout this Decision, we have considered the Appeal Brief, filed June 3, 2015 (“App. Br.”); the Reply Brief, filed December 9, 2015 (“Reply Br.”); the Examiner’s Answer, mailed October 19, 2015 (“Ans.”); and the Final Office Action, mailed December 5, 2014 (“Final Act.”), from which this Appeal is taken.

Rejection under 35 U.S.C. § 112(a)

Independent claim 1 recites “classifying the image data as pertaining to a particular activity, . . . wherein the particular activity is unrelated to providing an explicit user input to the wearable computing device.” Independent claims 5 and 13 recite commensurate limitations. In rejecting claims 1, 5, and 13 under 35 U.S.C. § 112(a), the Examiner finds the Specification “does not make it necessary that the particular activity needs to be unrelated to an explicit user input.” Final Act. 5–6. Instead, the Examiner finds the Specification suggests instances wherein the user input is related to the particular activity.

Appellants assert the Specification provides adequate support for the disputed limitation. App. Br. 4 (citing Spec. ¶ 19). In particular, Appellants assert the activities described in the Specification, such as driving, running, shopping, or working on a computer, are unrelated to providing an explicit user input. App. Br. 4 (citing Spec. ¶ 19; Amend., filed Mar. 3, 2015).

To satisfy the written description requirement, the disclosure must reasonably convey to skilled artisans that Appellants possessed the claimed invention as of the filing date. See Ariad Pharms., Inc. v.
Eli Lilly & Co., 598 F.3d 1336, 1351 (Fed. Cir. 2010) (en banc). Specifically, the description must “clearly allow persons of ordinary skill in the art to recognize that [the inventor] invented what is claimed” and

the test requires an objective inquiry into the four corners of the specification from the perspective of a person of ordinary skill in the art. Based on that inquiry, the specification must describe an invention understandable to that skilled artisan and show that the inventor actually invented the invention claimed.

Ariad Pharms., Inc., 598 F.3d at 1351 (internal quotations and citations omitted).

In the Specification, Appellants describe various classifiers that may be used to analyze received image and/or other data, and can transmit information classifying the received data to a concept classifier engine. Spec. ¶ 18. Figure 1 is illustrative and is reproduced below.

[Figure 1: a block diagram of a system for performing video analysis based language model adaptation. Image data, audio data, motion data, and other sensor data are provided to an Image Classifier, a Motion Classifier, and an Other Classifier; their outputs feed a Concept Classifier Engine, which feeds a Language Model Lookup Engine. Concept language models and a general language model are combined by a Concept Interpolator, drawing terms from a knowledge base, into a final language model provided to the speech recognition system.]

Figure 1 of Appellants’ Specification illustrates an exemplary system for performing video analysis based language model adaptation. Spec. ¶ 12. As shown, image data is provided to Image Classifier (102). Image Classifier (102) “can classify the image data as pertaining to a type of location associated with the user, e.g., a beach, city, house, store or residence.” Spec. ¶ 33. Classification of the image data may be based on performing optical character recognition, feature matching, shape matching, or other image processing techniques. Spec. ¶ 33.
Motion Classifier (106) “can classify motion data as pertaining to a type of activity that the user is involved in, e.g., running, driving, walking, or another activity.” Spec. ¶ 39. “[M]otion data can be classified based on matching the motion data against one or more motion data signatures.” Spec. ¶ 40.

Paragraph 19 of the Specification discloses that, based on the information classifying the image and/or other data, the Concept Classifier Engine (110) can identify one or more concepts. Exemplary concepts include a particular type of location, a particular type of activity in which the user is engaged, identifying particular media in the environment of the user, or identifying other information used to determine the context of the utterance spoken by the user. Spec. ¶ 19. Thus, it is the output of the Concept Classifier Engine (110) that provides the identified concept. As set forth in Figure 1 of the Specification, the Concept Classifier Engine (110) may use classification output information from various classifier engines (e.g., Motion Classifier (106) and Image Classifier (102)) in determining the overall concept, which is then provided to the Language Model Lookup Engine (112).

Therefore, the Specification provides some support for “classifying image data as pertaining to a particular activity, based at least on the one or more image features,” as broadly recited in claim 1. However, we note the Specification does not recite that a particular activity in which the user is engaged is “unrelated” to explicit user input. Indeed, the term “unrelated” does not appear in the Specification.
Thus, on this record, we agree with the Examiner that the Specification lacks the necessary support for “classifying the image data as pertaining to a particular activity, based at least on the one or more image features, wherein the particular activity is unrelated to providing an explicit user input to the wearable computing device,” as claimed. (Emphasis added). Accordingly, we sustain the Examiner’s rejection of claims 1, 5, and 13 under 35 U.S.C. § 112(a).

Rejections under 35 U.S.C. § 103(a)

a. classifying the image data as pertaining to a particular activity

Appellants contend the Examiner erred in finding Hart classifies received image data as pertaining to a particular activity. App. Br. 4–7; Reply Br. 1–3. Instead, Appellants argue Hart merely identifies an object, location, or other such element in an image, but does not use the image captured by the user to identify an activity in which the user is engaged. App. Br. 5. For example, Appellants contend that if a book is identified in an image captured by Hart, the system (in Hart) does not classify the image as pertaining to a particular activity (e.g., reading a book), but instead limits the classification to the captured object (i.e., a book). App. Br. 5 (citing Hart, col. 3, l. 66–col. 4, l. 16).

Hart is generally directed to a computing device that can capture audio data and analyze the data to determine any speech information in the audio data. Hart, Abstract. Further, the computing device of Hart “can simultaneously capture image or video information which can be used to assist in analyzing the audio information.” Hart, Abstract; col. 3, l. 66–col. 4, l. 16. In a disclosed embodiment, an image is captured and an identification of the element is used to assist in determining speech content. Hart, col. 3, l. 66–col. 4, l. 16.
For example, if the captured image is of a book, the device may adjust the available device dictionary “to focus on, or narrow to, terms relating to books.” Hart, col. 4, ll. 4–10.

The Examiner finds Hart teaches, or reasonably suggests, classifying image data as pertaining to a particular activity. Final Act. 7–8 (citing Hart, col. 3, l. 66–col. 4, l. 16; col. 15, ll. 40–42). The Examiner explains Hart discloses capturing an image of a book and adjusting a dictionary (i.e., language model) based on the image data. Ans. 3. The Examiner notes this is consistent with the description of the image classifier in Appellants’ Specification. Ans. 4 (citing Spec. ¶ 33). The Examiner finds Hart performs a similar image analysis and, in the identified example, the image data of a book may pertain to the particular activity of “shopping for a book.” Ans. 4.

An obviousness analysis “need not seek out precise teachings directed to the specific subject matter of the challenged claim, for a court can take account of the inferences and creative steps that a person of ordinary skill in the art would employ.” KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 418 (2007). Here, we agree with the Examiner that, particularly given the broad claim language and disclosure of Appellants’ Specification for classifying image data as pertaining to a particular activity (as discussed above), Hart teaches, or reasonably suggests, classifying image data (e.g., a book) as pertaining to a particular activity (e.g., shopping for a book) based at least on the captured image data. Additionally, we note, similar to Appellants’ Specification, Hart also discloses using other inputs (e.g., location information) to provide additional context in suggesting a particular activity and selecting an appropriate language model (i.e., dictionary). See Hart, col. 4, ll. 12–16. Appellants do not persuasively rebut the Examiner’s findings. See Reply Br. 1–2.
Accordingly, we are unpersuaded of Examiner error.

b. adjusting one or more probabilities associated with the language model based on the relevance of a selected term(s) to the particular activity

Appellants contend the Examiner erred in finding Hart teaches adjusting one or more probabilities associated with a language model based on the relevance of a selected term to a particular activity. App. Br. 7–8; Reply Br. 3–5. Specifically, Appellants argue Hart teaches identifying an element from image capture data and then removing terms unrelated to the identified element from the device’s dictionary. App. Br. 7–8 (citing Hart, col. 3, l. 66–col. 4, l. 16). Appellants assert that removing terms from a dictionary does not adjust any probabilities associated with the language model. App. Br. 8. Appellants acknowledge the removal of terms from a dictionary (as in Hart) may cause the remaining terms to have a greater probability of being selected, but argue the system of Hart does not adjust any probabilities of the language model (i.e., dictionary); rather, the altered probabilities are “merely a consequence of removing terms from the dictionary.” Reply Br. 4.

As the Examiner explains, removing words from the active dictionary (i.e., language model) of Hart teaches, or reasonably suggests, assigning a weight (i.e., probability) of zero to these words to disable them. Ans. 9–10. The Examiner further finds the remaining words in the dictionary have a greater probability of being selected (because the number of available words has been reduced by removing certain words). Ans. 10. We agree with the Examiner’s findings and reasoning. Further, the Examiner’s findings and explanations are consistent with Appellants’ Specification. “In some instances, some words may be removed from a
“In some instances, some words may be removed from a 9 Appeal 2016-002183 Application 13/923,545 language model based on the one or more concepts, or can otherwise be omitted, e.g., by adjusting the probability associated with a term to zero.” Spec. 1 59. Accordingly, we are unpersuaded the Examiner erred in finding Hart teaches adjusting one or more probabilities associated with the language model, as claimed. c. selecting one or more terms associated with a language model used by a speech recognizer to generate transcription Appellants assert “while the cited portions of Hart disclose a method in which portions of transcribed speech are provided for display, the features of independent claim 1 are directed to selecting ‘terms associated with a language model used by a speech recognizer to generate transcriptions in order to modify probabilities associated with those terms before the speech recognizer is used to generate a transcription.” App. Br. 9 (citing Hart, col. 14,11. 23-28). We are unpersuaded of Examiner error because the claim language does not require modifying (i.e., adjusting) probabilities associated with the language model before the speech recognizer is used to generate a transcription. Accordingly, Appellants’ argument is not commensurate with the scope of claim 1 and, thus, does not demonstrate error in the Examiner’s rejection. See In re Self, 671 F.2d 1344, 1348 (CCPA 1982) (limitations not appearing in the claims cannot be relied upon for patentability). Further, as previously discussed, Hart teaches selecting terms associated with a language model. See, e.g., Hart, col. 4,11. 4—10, col. 17,11. 61—62. Additionally, we agree with the Examiner that “[i]n order for a transcription to be generated[,] a language model must be associated with the speech recognizer in order to generate a transcription.” Ans. 13. 
For the reasons discussed supra (in sections a through c), we are unpersuaded of Examiner error. Accordingly, we sustain the Examiner’s rejection of independent claim 1. For similar reasons, we sustain the Examiner’s rejection of independent claims 5 and 13, which recite similar limitations and were not argued separately. See App. Br. 4–10. Additionally, we sustain the Examiner’s rejections of claims 2–4, 6–12, 14–23, and 25, which depend therefrom and were not argued separately. See App. Br. 10.

DECISION

We affirm the Examiner’s decision rejecting claims 1–23 and 25.

No time period for taking any subsequent action in connection with this appeal may be extended under 37 C.F.R. § 1.136(a). See 37 C.F.R. § 41.50(f).

AFFIRMED