Ex parte Bagga et al., Appeal 2011-003826, Application 10/153,550 (P.T.A.B. Nov. 23, 2012)

UNITED STATES PATENT AND TRADEMARK OFFICE
UNITED STATES DEPARTMENT OF COMMERCE
United States Patent and Trademark Office
Address: COMMISSIONER FOR PATENTS, P.O. Box 1450, Alexandria, Virginia 22313-1450
www.uspto.gov

APPLICATION NO.: 10/153,550
FILING DATE: 05/21/2002
FIRST NAMED INVENTOR: Amit Bagga
ATTORNEY DOCKET NO.: 501020-A-US-NP(CM)
CONFIRMATION NO.: 8532

95158 7590 11/23/2012
Novak Druce + Quigg LLP - Avaya Inc.
Laurian Building
2810 Laurian Lane, Suite 200
Dunkirk, MD 20754

EXAMINER: AN, SHAWN S
ART UNIT: 2483
MAIL DATE: 11/23/2012
DELIVERY MODE: PAPER

Please find below and/or attached an Office communication concerning this application or proceeding. The time period for reply, if any, is set in the attached communication.

PTOL-90A (Rev. 04/07)

UNITED STATES PATENT AND TRADEMARK OFFICE
____________
BEFORE THE PATENT TRIAL AND APPEAL BOARD
____________
Ex parte AMIT BAGGA, JIANYING HU, and JIALIN ZHONG
____________
Appeal 2011-003826
Application 10/153,550
Technology Center 2400
____________
Before THU A. DANG, JAMES R. HUGHES, and GREGORY J. GONSALVES, Administrative Patent Judges.

GONSALVES, Administrative Patent Judge.

DECISION ON APPEAL

STATEMENT OF THE CASE

Appellants appeal under 35 U.S.C. § 134(a) from the final rejection of claims 1-4, 7-14, and 17-31 (App. Br. 2, 6-7). Claims 5, 6, 15, and 16 were objected to as being dependent upon a rejected base claim (Ans. 4, 13). We have jurisdiction under 35 U.S.C. § 6(b).

We affirm.

The Invention

Exemplary Claim 1 follows:

1.
A method, comprising:
    determining two or more combined similarity measures by using two or more text similarities of two or more audio-video scenes and two or more video similarities of the two or more audio-video scenes; and
    determining whether there are similar audio-video scenes in the two or more audio-video scenes by using the two or more combined similarity measures.

Claim 30 stands rejected under 35 U.S.C. § 101 as being directed to non-statutory subject matter (Ans. 4).

Claim 21 stands rejected under 35 U.S.C. § 112, second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which applicant regards as the invention (Ans. 4-5).

Claims 1-4, 7-14, 19, 30, and 31 stand rejected under 35 U.S.C. § 103(a) as being unpatentable over Rui (Yong Rui et al., Constructing Table-of-Content for Videos, ACM Multimedia Systems Journal 1998) in view of Hershtik (U.S. Patent No. 5,790,236) and Dimitrova (U.S. Patent No. 6,100,941) (Ans. 5-7).

Claims 17 and 18 stand rejected under 35 U.S.C. § 103(a) as being unpatentable over Rui, Hershtik, Dimitrova, and Ganapathy (U.S. Patent No. 6,411,953 B1) (Ans. 7).

Claim 20 stands rejected under 35 U.S.C. § 103(a) as being unpatentable over Rui, Hershtik, Dimitrova, and Girgensohn (U.S. Patent No. 6,807,306 B1) (Ans. 8).

Claim 21 stands rejected under 35 U.S.C. § 103(a) as being unpatentable over Rui, Hershtik, Dimitrova, Girgensohn, and Wilcox (U.S. Patent No. 5,889,523) (Ans. 8-9).

Claim 22 stands rejected under 35 U.S.C. § 103(a) as being unpatentable over Rui in view of Wang (U.S. Patent No. 5,802,361) (Ans. 9-10).

Claims 23-28 stand rejected under 35 U.S.C. § 103(a) as being unpatentable over Rui, Wang, Girgensohn, and Aalbersberg (U.S. Patent No. 5,293,552) (Ans. 10-12).

Claim 29 stands rejected under 35 U.S.C. § 103(a) as being unpatentable over Rui, Wang, Girgensohn, Aalbersberg, and Wilcox (Ans. 12-13).
FACTUAL FINDINGS

We adopt the Examiner's factual findings as set forth in the Answer (Ans. 3, et seq.).

ISSUES

Appellants' responses to the Examiner's positions present the following issues:

1. Did the Examiner err in concluding that the combination of Rui, Hershtik, and Dimitrova teaches or would have suggested "determining two or more combined similarity measures by using two or more text similarities of two or more audio-video scenes and two or more video similarities of the two or more audio-video scenes" (emphasis added), as recited in claim 1, and as similarly recited in independent claims 30 and 31?

2. Did the Examiner err in concluding that the combination of Rui and Wang teaches or would have suggested "combining a first metric and a second metric to create a third metric, wherein the first metric is based on comparisons of text segments from at least two audio-video scenes, and wherein the second metric is based on comparisons of video segments from the at least two audio-video scenes" (emphasis added), as recited in independent claim 22?

3. Did the Examiner err in concluding that the combination of Rui, Wang, Girgensohn, and Aalbersberg teaches or would have suggested "determining a plurality of text similarities by comparing text portions from the audio-video scenes," "determining a plurality of video similarities by comparing video portions from the audio-video scene," "determining a plurality of combined similarities by using the normalized text and video similarities;" and "clustering the audio-video scenes into a plurality of clusters by using the combined similarities" (emphasis added), as recited in independent claim 23?
ANALYSIS

Issue 1 – Obviousness Rejections of Claims 1, 30, and 31

Appellants contend that the Examiner erred in rejecting independent claims 1, 30, and 31 as obvious because the combination of Rui, Hershtik, and Dimitrova does not teach or suggest the claim limitation emphasized above (App. Br. 7-8). In support of their contention, Appellants argue that Dimitrova, instead of comparing text from two audio-video scenes, "compares the text 120 to one or more predefined brand and product names" (id. at 8). Appellants also argue that Hershtik does not suggest "determining two or more combined similarity measures by using two or more text similarities of two or more audio-video scenes" (id. at 9).

The Examiner noted, however, "that the claimed limitations did not specify determining at least two similarity measures by using at least two text similarities of audio-video scenes of each other" (Ans. 15) (emphasis omitted). Next, the Examiner concluded that Dimitrova "teaches determining at least two similarity measures (Fig. 11, 122, comparator) by using at least two text (120) similarities of audio-visual scenes" (id.) (emphasis omitted). The Examiner also concluded that Hershtik teaches "determining at least two combined similarity measures by using at least two audio differences of at least two audio visual scenes" (id. at 16) (emphasis omitted).

We agree with the Examiner's conclusions and underlying findings of fact.
As noted by the Examiner, claim 1 does not require comparing text from two audio-video scenes and, instead, merely recites determining similarity measures "by using two or more text similarities of two or more audio-video scenes." Dimitrova teaches determining similarity measures using text similarities from audio-visual scenes by disclosing the conversion of audio and video to text and the comparison of that text to brand and product names:

    Audio processor 100 will send the signal through a speech recognition processor 112 which will convert the sound into text 120. Similarly, video processor 114 will send input 52 through a segmentation and OCR device 118 which will convert the video into text 120. Closed captioning processor 116 will also produce text 120. Text 120 is then compared to a brand and product name bank 124 with a comparator 122.

(Dimitrova, col. 18, ll. 22-30 (emphasis omitted)).

Moreover, Hershtik teaches determining similarity measures using audio similarities from audio-visual scenes by disclosing the "comparison of a plurality of versions of a movie which differ at least in their audio channels" (Abstract). Finally, Rui suggests incorporating visual, audio, and text features into the same analysis framework (§ 6).

Accordingly, we find that the claim limitation of determining two or more similarity measures using text and video similarities is a combination of Dimitrova's teachings of text comparison, Hershtik's teachings of audio comparison, and Rui's teaching of incorporating audio and text features into the same framework that yields predictable results. KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398, 416 (2007). Accordingly, we find no error in the Examiner's rejections of independent claims 1, 30, and 31.
Issue 2 – Obviousness Rejection of Claim 22

Appellants contend that the Examiner erred in rejecting independent claim 22 as obvious because the combination of Rui and Wang does not teach or suggest the claim limitation from claim 22 emphasized above (App. Br. 9-11). In particular, Appellants argue that the Examiner erred in asserting that "Wang teaches determining a first metric being based on comparison of text segments from at least two audio-video scenes and that Wang teaches determining a plurality of text similarities by comparing text portions from the video scene" (id. at 10).

The Examiner concluded, however, that Wang "teaches determining a first metric being based on comparison of text segments from at least two audio-video scenes" (Ans. 16) (emphasis and citation omitted). The Examiner also found that Rui "discloses a second metric being based on comparison of video segments from at least two video scenes" (id.) (emphasis omitted).

We agree with the Examiner's conclusions and underlying findings of fact. Wang illustrates text segments associated with an audio-video scene including color, location, and texture (FIG. 9). Wang also discloses that "[a] low level analyzer 121 produces for each image, whether an individual image or part of video sequence, a number of side information files 115 each containing specific image data" (col. 8, ll. 21-24). Wang further discloses that a "high level analyzer 123 then analyzes 209 the side information files 115 to identify those images in the image database 113 that are most similar to the input image attributes" (col. 8, ll. 54-57). Rui discloses computing "the similarities between the current [video] shot and existing scenes" (§ 3.4, Procedure 4). Rui also suggests incorporating visual, audio, and text features into the same analysis framework (§ 6).
Accordingly, we find that the claim limitation of combining a first metric based on text comparisons and a second metric based on video comparisons is a combination of Wang's teachings of text comparison, Rui's teachings of video comparison, and Rui's teaching of incorporating video and text features into the same framework that yields predictable results. KSR, 550 U.S. at 416. Accordingly, we find no error in the Examiner's rejection of independent claim 22.

Issue 3 – Obviousness Rejection of Claim 23

Appellants contend that the Examiner erred in rejecting independent claim 23 as obvious because the combination of Rui and Wang does not teach or suggest the claim limitation from claim 23 emphasized above (App. Br. 11-12). The Examiner, however, did not base the rejection of claim 23 solely on Rui and Wang. Rather, the Examiner also included Girgensohn and Aalbersberg in the basis for the rejection (Ans. 10-11). In particular, the Examiner concluded that Girgensohn "teaches a method of selecting keyframes involving clustering all candidate frames (video scenes) into a hierarchical binary tree using a clustering algorithm (abs.), and Aalbersberg teaches normalizing the text similarities" (Ans. 17) (citation omitted).

We agree with the Examiner's conclusions and underlying findings of fact. Girgensohn discloses "clustering all candidate frames into a hierarchical binary tree using a hierarchical agglomerative clustering algorithm" (Abstract). In addition, Aalbersberg discloses that "the document term weight vector and the query term weight vector are lengthwise normalized" (col. 4, ll. 48-49). Moreover, as explained supra, Rui and Wang teach the combination of a text and video similarity metric.
Accordingly, we find that the claim limitations of clustering audio-video scenes using combined, normalized text and video similarities are a combination of known features from Rui, Wang, Girgensohn, and Aalbersberg that yields predictable results. KSR, 550 U.S. at 416. Accordingly, we find no error in the Examiner's rejection of independent claim 23.

We also find no error in the Examiner's obviousness rejections of the dependent claims on appeal (i.e., claims 2-4, 7-14, 17-21, and 24-29) because Appellants did not set forth any separate and distinct arguments for those claims in response to the Examiner's conclusions and underlying findings of fact (see App. Br. 15-22). Rather, Appellants merely restated the claim limitations and presented conclusory statements that they could not find any suggestion of the limitations in the prior art (id.).

We also sustain, pro forma, the Examiner's non-statutory subject matter rejection of claim 30 and indefiniteness rejection of claim 21 because Appellants did not set forth any argument relating to these rejections (see id. at 7).

DECISION

We affirm the Examiner's decision rejecting claims 1-4, 7-14, and 17-31 as unpatentable under 35 U.S.C. § 103(a). We also affirm the Examiner's rejections of claim 30 under 35 U.S.C. § 101 as being directed to non-statutory subject matter and claim 21 under 35 U.S.C. § 112, second paragraph, as being indefinite.

No time period for taking any subsequent action in connection with this appeal may be extended under 37 C.F.R. § 1.136(a)(1)(iv).

AFFIRMED

llw