Ex parte Weng et al., Appeal No. 2010-007042, Application No. 11/266,867 (P.T.A.B. Nov. 29, 2012)

UNITED STATES PATENT AND TRADEMARK OFFICE
UNITED STATES DEPARTMENT OF COMMERCE
United States Patent and Trademark Office
Address: COMMISSIONER FOR PATENTS, P.O. Box 1450, Alexandria, Virginia 22313-1450
www.uspto.gov

APPLICATION NO.: 11/266,867
FILING DATE: 11/03/2005
FIRST NAMED INVENTOR: Fuliang Weng
ATTORNEY DOCKET NO.: 11403/63
CONFIRMATION NO.: 1691
EXAMINER: GUERRA-ERAZO, EDGAR X
ART UNIT: 2659
MAIL DATE: 11/29/2012
DELIVERY MODE: PAPER

26646 7590 11/29/2012
KENYON & KENYON LLP
ONE BROADWAY
NEW YORK, NY 10004

Please find below and/or attached an Office communication concerning this application or proceeding. The time period for reply, if any, is set in the attached communication.

PTOL-90A (Rev. 04/07)

UNITED STATES PATENT AND TRADEMARK OFFICE

BEFORE THE PATENT TRIAL AND APPEAL BOARD

Ex parte FULIANG WENG and LIN ZHAO

Appeal 2010-007042
Application 11/266,867
Technology Center 2600

Before JOHN A. JEFFERY, DENISE M. POTHIER, and TRENTON A. WARD, Administrative Patent Judges.

JEFFERY, Administrative Patent Judge.

DECISION ON APPEAL

Appellants appeal under 35 U.S.C. § 134(a) from the Examiner's rejection of claims 1-18. We have jurisdiction under 35 U.S.C. § 6(b). We reverse.

STATEMENT OF THE CASE

Appellants' invention constructs a statistical model and incorporates Gaussian priors during feature selection and parameter optimization. See generally Abstract. Claim 6 is illustrative, with key disputed limitations emphasized:

6. A method for modeling spoken language for a conversational dialog system, comprising:
modeling dependency relations of the spoken language via a probabilistic dependency model;
incorporating Gaussian priors during feature selection and during parameter optimization;
parsing a sequence of words, the parsing including systematically searching through pairs of head words bottom-up using a chart parsing technique; and
at each step in the search, computing the probabilistic scores for each pair based on the probabilistic dependency model and keeping n best candidate pairs for each region;
wherein the dependency model is decomposed into a model for a first sub-region, a second sub-region, and a component which includes a last dependency relation that connects the first and second sub-regions, with an adjustment of mutual information between the last dependency relation and the first and second sub-regions.
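As background for the discussion that follows, the parsing limitation can be sketched in code. The Python fragment below is a minimal illustration of bottom-up, n-best chart parsing over pairs of head words of the general kind recited in claim 6; the names (dep_score, parse, n_best) and the toy scoring heuristic are illustrative assumptions, not Appellants' disclosed implementation or any cited reference's algorithm.

    from itertools import product

    def dep_score(head, dependent):
        # Toy stand-in for the claimed probabilistic dependency model:
        # returns a log-score for attaching `dependent` to `head`. A real
        # model would be trained (the claim recites Gaussian priors during
        # feature selection and parameter optimization).
        return -abs(len(head) - len(dependent)) - 1.0

    def parse(words, n_best=3):
        # chart[(i, j)] holds the n best (log_score, head_word) candidates
        # for the region words[i:j].
        chart = {}
        size = len(words)
        for i in range(size):  # length-1 regions: each word heads itself
            chart[(i, i + 1)] = [(0.0, words[i])]
        for span in range(2, size + 1):  # systematically grow regions bottom-up
            for i in range(size - span + 1):
                j = i + span
                candidates = []
                for k in range(i + 1, j):  # split into first and second sub-regions
                    for (ls, lh), (rs, rh) in product(chart[(i, k)], chart[(k, j)]):
                        # score each pair of head words under the last dependency
                        # relation connecting the sub-regions, in both directions
                        candidates.append((ls + rs + dep_score(lh, rh), lh))
                        candidates.append((ls + rs + dep_score(rh, lh), rh))
                # keep only the n best candidate pairs for this region
                chart[(i, j)] = sorted(candidates, reverse=True)[:n_best]
        return chart[(0, size)]

    print(parse("show me the shortest route".split()))

The claimed "adjustment of mutual information" would enter through the scoring of the last dependency relation that connects the two sub-regions, which the toy dep_score above does not attempt to model.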
THE REJECTIONS

1. The Examiner rejected claims 1-10, 17, and 18 under 35 U.S.C. § 103(a) as unpatentable over Andrew McCallum & Wei Li, Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons, 4 PROC. 7TH CONF. ON NAT. LANGUAGE LEARNING 188 (2003) ("McCallum"); Wei Li & Andrew McCallum, Rapid Development of Hindi Named Entity Recognition Using Conditional Random Fields and Feature Induction, 2 ACM TRANSACTIONS ON ASIAN LANGUAGE INFO. PROCESSING 290 (2003) ("Li"); and Stephen Clark & James R. Curran, Parsing the WSJ Using CCG and Log-Linear Models, 2 PROC. 42D ANN. MEETING ON ASS'N FOR COMPUTATIONAL LINGUISTICS (2004) ("Clark"). Ans. 3-14.[1]

2. The Examiner rejected claims 11 and 13-16 under 35 U.S.C. § 103(a) as unpatentable over Kuhn (EP 1 079 371 B1; Nov. 2, 2005), McCallum, and Clark. Ans. 14-19.

3. The Examiner rejected claim 12 under 35 U.S.C. § 103(a) as unpatentable over Kuhn, McCallum, and Stephanie Seneff, TINA: A Natural Language System for Spoken Language Applications, 18 COMPUTATIONAL LINGUISTICS 61 (1992) ("Seneff"). Ans. 19-20.

[1] Throughout this opinion, we refer to (1) the Final Rejection mailed March 4, 2009 ("Fin. Rej."); (2) the Appeal Brief filed September 10, 2009 ("App. Br."); (3) the Examiner's Answer mailed December 9, 2009 ("Ans."); and (4) the Reply Brief filed February 9, 2010 ("Reply Br.").

THE OBVIOUSNESS REJECTION OVER MCCALLUM, LI, AND CLARK

The Examiner finds that McCallum discloses a method for modeling spoken language with every recited feature of independent claim 6, including parsing a word sequence by systematically searching through pairs of head words bottom-up using a chart parsing technique, which is said to correspond to calculating the original posterior distribution over states given each token by dynamic programming without approximation. Ans. 4-5, 20-21. Although the Examiner acknowledges that McCallum does not (1) compute probabilistic scores for each pair based on a probabilistic dependency model, and (2) keep best candidate pairs for each region, the Examiner cites Li for teaching these features.[2] Ans. 5, 21-22. The Examiner also acknowledges that McCallum and Li do not decompose the dependency model and adjust mutual information between the last dependency relation and first and second sub-regions as claimed, but cites Clark for teaching these features. Ans. 6-7, 22-25. Based on these collective teachings, the Examiner concludes that claim 6 would have been obvious. Ans. 3-7, 20-25.

[2] Although the Examiner repeatedly refers to the Li reference as "Lin" (see, e.g., Ans. 5-6, 21-22), we deem this error harmless.

Appellants argue that the cited prior art does not teach or suggest parsing a word sequence by systematically searching through pairs of head words that characterize word sub-regions, let alone (1) computing probabilistic scores for each pair based on a probabilistic dependency model, and (2) keeping best candidate pairs for each region. App. Br. 5-7; Reply Br. 2. Appellants add that Clark does not adjust mutual information between the last dependency relation and first and second sub-regions as claimed—an adjustment that is said to involve measuring how much knowing one of two variables reduces uncertainty about the other. App. Br. 7; Reply Br. 3. According to Appellants, the Examiner's reliance on "maximum entropy" in this regard is misplaced since it merely measures a random variable's uncertainty and does not refer to the requisite mutual information measurement. Id.

ISSUE

Under § 103, has the Examiner erred in rejecting claim 6 by finding that McCallum, Li, and Clark collectively would have taught or suggested (1) parsing a word sequence by systematically searching through pairs of head words bottom-up using a chart parsing technique; (2) computing probabilistic scores for each pair based on a probabilistic dependency model and keeping best candidate pairs for each region; and (3) decomposing the dependency model and adjusting mutual information between the last dependency relation and first and second sub-regions?

ANALYSIS

We begin by noting an inconsistency in the Examiner's position and Appellants' characterization of that position.
According to Appellants, the Examiner allegedly admits that McCallum does not parse a word sequence by systematically searching through pairs of head words bottom-up using a chart parsing technique as claimed, and relies on Li for that feature. App. Br. 5-6; Reply Br. 2. But the Examiner's rejection unambiguously cites McCallum—not Li—for teaching this feature. Ans. 5; Fin. Rej. 3. Rather, Li was cited for teaching, at each step in the search, computing probabilistic scores for each pair based on a probabilistic dependency model, and keeping best candidate pairs for each region. Ans. 5, 21-22; Fin. Rej. 4. Appellants' arguments regarding Li's alleged failure to disclose the preceding parsing step that searches through pairs of head words therefore ignore the Examiner's reliance on McCallum for that feature.

Despite this inconsistency, we are nonetheless unconvinced of error in the Examiner's reliance on Li for teaching the recited computation for each pair of head words and keeping the best candidate pairs for each region as claimed. Ans. 5, 21-22 (citing Li §§ 2-3). Appellants argue that the recited head words characterize word sub-regions according to the Specification, and therefore are different from Li's word prefixes and suffixes. App. Br. 6 (citing Spec. 19:14-18). Accord Reply Br. 2. This argument, however, is not commensurate with the scope of the term "pairs of head words," which does not preclude prefixes and suffixes that characterize their associated sub-regions, at least to the extent that they are associated with the beginning and end of words.

Nevertheless, we are persuaded of error in the Examiner's reliance on Clark for teaching adjusting mutual information between the last dependency relation and first and second sub-regions. Ans. 6-7, 22-25. According to the Examiner, Clark's "maximum entropy supertagger" teaches this feature since "maximum entropy" is purportedly understood as "mutual information of the shared knowledge of two random variables." Ans. 24-25 (citing Clark §§ 6-7). Appellants, however, disagree with this characterization since "mutual information" is said to be commonly understood to measure information that two random variables share (i.e., it measures how much knowing one of two variables reduces uncertainty about the other). App. Br. 7; Reply Br. 3.

Although Appellants do not cite authority to support their position, we nonetheless find that the weight of the evidence on this record favors their position, at least to the extent that "mutual information" and "maximum entropy" are not equivalent as the Examiner seems to suggest. See Ans. 25. First, Appellants describe "maximum entropy" in the context of modeling in the Specification—a term that is distinct from the usage of "mutual information" elsewhere in the Specification. Compare Spec. 12:1-25 with Spec. 19:25–20:3. This differentiation suggests that the terms are not equivalent. Second, the terms have recognized meanings in the art that are different. "Mutual information" is an "information theoretic quantity representing the amount of information given by one random variable about another." COMPREHENSIVE DICTIONARY OF ELECTRICAL ENGINEERING 461 (Phillip A. Laplante ed. 2005). The same dictionary, however, defines "maximum entropy" as "a procedure that maximizes the entropy of a signal process." Id. at 427.[3]

[3] Accord Athanasios Papoulis, PROBABILITY, RANDOM VARIABLES, AND STOCHASTIC PROCESSES 500 (1984) (noting that entropy measures uncertainty about the occurrence or nonoccurrence of an event, and that this uncertainty is maximized with a 50% probability that the event will occur).
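The contrast between the two defined terms can be stated formally with the standard information-theoretic definitions (supplied here for illustration; the equations appear in neither the decision nor the cited dictionary):

    H(X) = -\sum_{x} p(x)\,\log p(x)

    I(X;Y) = \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} = H(X) - H(X \mid Y)

Entropy H(X) quantifies the uncertainty of a single random variable, whereas mutual information I(X;Y) quantifies how much knowing Y reduces uncertainty about X. A procedure that maximizes H is accordingly not a measurement of I(X;Y).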
This lexicographic distinction only further undercuts the Examiner's position that the terms are equivalent. Therefore, even assuming, without deciding, that Clark suggests some sort of adjustment involving maximum entropy between dependency relations and sub-regions as the Examiner seems to suggest (see Ans. 22-25), the Examiner's position is still problematic due to the recognized distinction between mutual information and maximum entropy.

We are therefore persuaded that the Examiner erred in rejecting (1) independent claim 6; (2) independent claim 18, which recites commensurate limitations; and (3) dependent claims 1-5, 7-10, and 17 for similar reasons. Since this issue is dispositive regarding our reversing the rejection of these claims, we need not address Appellants' other arguments regarding dependent claim 17. App. Br. 8; Reply Br. 4.

THE REJECTION OVER KUHN, MCCALLUM, AND CLARK

Since the Examiner similarly relies on Clark for teaching adjusting mutual information between the last dependency relation and first and second sub-regions (Ans. 17-18, 27)—a position that we find problematic for the reasons noted above—we reverse the Examiner's rejection of independent claim 11. We likewise reverse the Examiner's rejection of dependent claims 13-16 for similar reasons.[4]

[4] Notably, the Examiner omits (1) claim 12, from which claims 13-16 depend, from this rejection, and (2) Li from the statement of the rejection despite citing it in the rejection's discussion. Compare Ans. 14 with Ans. 16. But even if we were to deem these errors harmless, the Examiner's rejection is still problematic for the reasons noted in the opinion.

THE REJECTION OVER KUHN, MCCALLUM, AND SENEFF

We also reverse the Examiner's rejection of claim 12 over Kuhn, McCallum, and Seneff. Ans. 19-20. Notably, this claim depends from claim 11, which was rejected over different references, namely Kuhn, McCallum, and Clark as noted above, and the rejection is problematic for that reason alone. But even assuming, without deciding, that the Examiner intended to include Clark in this rejection, the Examiner's rejection would still be erroneous for the reasons noted previously.

CONCLUSION

The Examiner erred in rejecting claims 1-18 under § 103.

ORDER

The Examiner's decision rejecting claims 1-18 is reversed.

REVERSED