Trials@uspto.gov                                              Paper 10
571-272-7822                                   Date: January 11, 2021

UNITED STATES PATENT AND TRADEMARK OFFICE
_______________

BEFORE THE PATENT TRIAL AND APPEAL BOARD
_______________

AMAZON.COM, INC., AMAZON.COM SERVICES LLC (formerly AMAZON DIGITAL SERVICES LLC), PRIME NOW LLC, and WHOLE FOODS MARKET SERVICES, INC.,
Petitioner,

v.

FRESHUB, LTD.,
Patent Owner.
_______________

IPR2020-01147
Patent 10,213,810 B2
_______________

Before WILLIAM V. SAINDON, FRANCES L. IPPOLITO, and ERIC C. JESCHKE, Administrative Patent Judges.

JESCHKE, Administrative Patent Judge.

DECISION
Denying Institution of Inter Partes Review
35 U.S.C. § 314

I. BACKGROUND

Amazon.com, Inc., Amazon.com Services LLC, Prime Now LLC, and Whole Foods Market Services, Inc. (collectively, “Petitioner”) filed a Petition to institute an inter partes review of claims 1–29 (the “challenged claims”) of U.S. Patent No. 10,213,810 B2 (Ex. 1001, “the ’810 patent”). Paper 1 (“Pet.”). Freshub, Ltd. (“Patent Owner”) filed a Preliminary Response. Paper 6 (“Prelim. Resp.”). With our authorization (Paper 7), Petitioner filed a Preliminary Reply to Patent Owner’s Preliminary Response (Paper 8, “Prelim. Reply”) and Patent Owner filed a Preliminary Sur-reply to Petitioner’s Reply (Paper 9, “Prelim. Sur-reply”).

We have authority to determine whether to institute an inter partes review. See 35 U.S.C. § 314 (2018); 37 C.F.R. § 42.4(a) (2019) (“The Board institutes the trial on behalf of the Director.”). Section 314(a) of Title 35 of the United States Code provides that an inter partes review may not be instituted “unless . . . the information presented in the petition . . . shows that there is a reasonable likelihood that the petitioner would prevail with respect to at least 1 of the claims challenged in the petition.”

Upon consideration of the evidence and arguments in the Petition (including its supporting testimonial evidence) and the additional briefing, for the reasons below, we determine that the information presented does not show a reasonable likelihood that Petitioner would prevail with respect to at least one of the challenged claims. We thus deny institution of inter partes review.

A. Related Proceedings

The parties identify an active proceeding in the U.S. District Court for the Western District of Texas (“the Texas District Court”) involving the ’810 patent: Freshub, Inc. v. Amazon.com, Inc., No. 1:19-cv-00885-ADA (W.D. Tex.), filed June 24, 2019 (the “Texas Litigation”). Pet. 2; Paper 3 (Patent Owner’s Mandatory Notices), at 1. The Texas District Court issued a claim construction order on July 6, 2020. See Ex. 2002. The Texas Litigation also involves U.S. Patent No. 10,239,094 B2 (“the ’094 patent”), U.S. Patent No. 9,908,153 B2 (“the ’153 patent”), and U.S. Patent No. 10,232,408 B2 (“the ’408 patent”). Ex. 2003 (complaint in the Texas Litigation).

On the same day as the filing of the Petition in this proceeding, Petitioner also filed petitions for inter partes review of (1) claims 1–24 of the ’094 patent, in IPR2020-01144, (2) claims 1–11 of the ’153 patent, in IPR2020-01145, and (3) claims 1–30 of the ’408 patent, in IPR2020-01146. See Amazon.com, Inc. v. Freshub, Ltd., IPR2020-01144, Paper 1 (PTAB June 22, 2020); Amazon.com, Inc. v. Freshub, Ltd., IPR2020-01145, Paper 1 (PTAB June 22, 2020); Amazon.com, Inc. v.
Freshub, Ltd., IPR2020-01146, Paper 1 (PTAB June 22, 2020). Concurrently with the issuance of this Decision, the Board denies institution in IPR2020-01144, IPR2020-01145, and IPR2020-01146.

B. The ’810 Patent

The ’810 patent relates to a system for processing voice orders and presenting lists of items based on users’ verbal orders. Ex. 1001, 2:25–32, 2:59–60. The system uses a voice recording device to record a user’s spoken words regarding, e.g., product descriptions and verbally provided product orders, and create digital files from the recordings. Id. at 8:17–38. The system then uses voice recognition software to translate the digital files into text files. Id.

Figure 2, reproduced below, depicts a networked storage system that converts spoken language, including spoken orders, into a digital representation. Id. at 2:49–50, 12:11–19.

Figure 2 illustrates a networked storage system that converts spoken language into a digital representation. Ex. 1001, 2:49–50, 12:11–19.

The networked storage system illustrated in Figure 2 includes a computer system 202 that collects and stores information scanned from items stored in multiple storage units. Id. at 12:11–34. The computer system is coupled to a local scanner 204, a screen 206 (such as, e.g., a touch screen that can receive user inputs via finger and/or pen), and a microphone 203. Id. at 12:11–17. The microphone is coupled to a digitizer that converts spoken language into a digital representation. Id. at 12:17–19. Computer system 202, scanner 204, and screen 206 may be removably mounted to a refrigerator 208, or may be mounted on a wall, stand, or other supporting structure. Id. at 12:21–26. Scanners coupled to computer system 202 are configured to scan other storage units, such as another refrigerator 210 and a cabinet 212. Id. at 12:26–32. Each storage unit may also have its own associated computer system. Id. at 12:34–36.

Figure 8 of the ’810 patent, reproduced below, illustrates a method for processing a voice order. Id. at 2:59–60, 13:57–58.

Figure 8 illustrates a method for processing a voice order. Ex. 1001, 2:59–60, 13:57–58.

As illustrated in Figure 8, the method starts with a user’s verbally provided order (state 802). Id. at 13:57–14:8. To provide the order, the user may press a “record shopping list” control, pursuant to which the system prompts the user, via a display and/or via a spoken instruction, to verbally record a shopping list or speak the order. Id. The user may speak the order into microphone 203 illustrated in Figure 2. Id. at 14:9–10. The system then digitizes and records the spoken order in a file, and transmits the digitized verbal order to a remote system, such as remote system 214 illustrated in Figure 2 (state 804). Id. at 14:10–13. The remote system performs voice recognition on the order in order to interpret the spoken order, and converts the spoken order into text (state 806). Id. at 14:13–16. The remote system may use grammar constrained recognition and/or natural language recognition. Id. at 14:16–18. The remote system then transmits the text version of the order for display to the user so that the user can check if the text version is an accurate interpretation of the spoken order (state 808). Id. at 14:19–21. If the user determines that the order was not correctly translated, the user can provide a corrected order (e.g., via keyboard, or by speaking the order again) to the remote system. Ex. 1001, 14:22–27. The remote system then transmits the translated version of the order (i.e., the text version) to one or more providers (e.g., supermarkets, wholesale establishments, etc.) in order to receive quotes (state 810). Id. at 14:28–31. Upon receiving quotes from potential providers, the remote system transmits the quotes to the user (state 812). Id. at 14:35–37. Thereafter, the user selects a provider and authorizes placement of the order (state 814), and the remote system places the order with the selected provider (state 816). Id. at 14:37–40.
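To make the staged flow of Figure 8 easier to follow, the sketch below restates states 802–816 in code form. It is an editorial illustration only, not part of the ’810 patent’s disclosure or of the record in this proceeding; the ’810 patent discloses no source code, and all function and object names (e.g., record_spoken_order, speech_to_text) are hypothetical.

```python
# Illustrative restatement of the Figure 8 voice-order flow (states
# 802-816) described above. All names are hypothetical.

def process_voice_order(local_system, remote_system, user, providers):
    # State 802: the user verbally provides an order (e.g., into
    # microphone 203 of computer system 202).
    audio = local_system.record_spoken_order()

    # State 804: the spoken order is digitized, recorded in a file, and
    # transmitted to a remote system such as remote system 214.
    digitized_order = local_system.digitize(audio)

    # State 806: the remote system performs voice recognition (grammar
    # constrained and/or natural language) to convert the order to text.
    text_order = remote_system.speech_to_text(digitized_order)

    # State 808: the text version is returned for display so the user can
    # check the interpretation; an incorrect translation may be corrected
    # via keyboard or by speaking the order again.
    while not user.confirms(text_order):
        text_order = remote_system.speech_to_text(
            local_system.digitize(local_system.record_spoken_order()))

    # State 810: the remote system transmits the text version to one or
    # more providers (e.g., supermarkets) to obtain quotes.
    quotes = [provider.quote(text_order) for provider in providers]

    # States 812-816: the quotes are transmitted to the user, the user
    # selects a provider and authorizes placement, and the remote system
    # places the order with the selected provider.
    selected_provider = user.select_provider(quotes)
    return remote_system.place_order(text_order, selected_provider)
```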
C. Challenged Claims

Petitioner challenges claims 1–29, of which claims 1 and 17 are independent. Claims 2–16 depend from claim 1, and claims 18–29 depend from claim 17. Independent claims 1 and 17 are reproduced below, with bracketed numbers added:

   1. [1.0] A voice processing system comprising:
   a networks interface;
   a computer;
   [1.1] non-transitory memory that stores instructions that when executed by the computer cause the computer to perform operations comprising:
   [1.2] associate a unique identifier with a remote system configured to receive user spoken words,
   [1.3] the remote system comprising: a microphone, a wireless network interface, a voice output system, and a digitizer coupled to the microphone, wherein the digitizer is configured to convert spoken words into a digital representation;
   [1.4] download configuration data to the remote system;
   [1.5] receive, using the network interface, a digitized order of a user from the remote system;
   [1.6] translate at least a portion of the digitized order to text;
   [1.7] use the text, translated from the digitized order, to identify an item corresponding to the text description;
   [1.8] include the identified item in a set of one or more items associated with the user;
   [1.9] enable the set of items, including the identified item, to be displayed to the user via a user device different than the remote system; and
   [1.10] enable the set of items, including the identified item, to be provided to an item provider.

Ex. 1001, 14:48–15:7.[1]

   17. [17.0] A computer-implemented method, the method comprising:
   [17.1] associating a unique identifier with a remote system configured to receive user spoken words,
   [17.2] the remote system comprising: a microphone, a wireless network interface, a voice output system, and a digitizer coupled to the microphone, wherein the digitizer is configured to convert spoken words into a digital representation;
   [17.3] downloading configuration data to the remote system;
   [17.4] receiving over a network at a network interface a digitized order of a user from the remote system;
   [17.5] receiving, using the network interface, a digitized order of a user from the remote system;
   [17.6] translating at least a portion of the digitized order to text;
   [17.7] using the text, translated from the digitized order, to identify an item corresponding to the text description;
   [17.8] including the identified item in a set of one or more items associated with the user;
   [17.9] enabling the set of items, including the identified item, to be displayed to the user via a user device different than the remote system; and
   [17.10] enabling the set of items, including the identified item, to be provided to an item provider.

Ex. 1001, 15:66–16:23.

[1] We adopt Petitioner’s designations for the elements of the challenged claims. See Pet. 18–33, 37–39 (showing numerical designations for the language in claims 1 and 17). We apply these designations throughout this Decision.

D. Asserted Grounds of Unpatentability

Petitioner challenges claims 1–29 of the ’810 patent on the following grounds:

Claim(s) Challenged            | 35 U.S.C. § | Reference(s)/Basis
1, 12, 13, 15–17, 26, 28, 29   | 103         | Calderone[2], Ogasawara[3], Sanchez[4]
2–4, 7–11, 18, 21–25           | 103         | Calderone, Ogasawara, Sanchez, Partovi[5]
5, 19                          | 103         | Calderone, Ogasawara, Sanchez, Kuhn[6]
6, 20                          | 103         | Calderone, Ogasawara, Sanchez, Sichelman[7]
14, 27                         | 103         | Calderone, Ogasawara, Sanchez, Cooper[8]

[2] US 2001/0056350 A1, published December 27, 2001 (Ex. 1003, “Calderone”).
[3] US 6,543,052 B1, issued April 1, 2003 (Ex. 1004, “Ogasawara”).
[4] US 2002/0194604 A1, published December 19, 2002 (Ex. 1005, “Sanchez”).
[5] US 7,376,586 B1, issued May 20, 2008 (Ex. 1006, “Partovi”).
[6] US 6,553,345 B1, issued April 22, 2003 (Ex. 1020, “Kuhn”).
[7] US 2003/0235282 A1, published December 25, 2003 (Ex. 1008, “Sichelman”).
[8] US 6,757,362 B1, issued June 29, 2004 (Ex. 1007, “Cooper”).

Petitioner supports its challenges with a declaration from Dr. Dan R. Olsen, Jr. (Ex. 1002, “the Olsen Declaration” or “Olsen Decl.”), whom Petitioner has retained as an independent expert (id. ¶¶ 1, 10, 11).

II. DISCUSSION

A. The Level of Ordinary Skill in the Art

The level of ordinary skill in the art is “a prism or lens” through which we view the prior art and the claimed invention. Okajima v. Bourdeau, 261 F.3d 1350, 1355 (Fed. Cir. 2001). The person of ordinary skill in the art is a hypothetical person presumed to have known the relevant art at the time of the invention. In re GPAC Inc., 57 F.3d 1573, 1579 (Fed. Cir. 1995). In determining the level of ordinary skill in the art, we may consider certain factors, including the “type of problems encountered in the art; prior art solutions to those problems; rapidity with which innovations are made; sophistication of the technology; and educational level of active workers in the field.” Id. (internal quotation marks and citation omitted).

Petitioner contends that one of ordinary skill in the art at the time of the invention “would have [had] at least a Bachelor-level degree in computer science, computer engineering, electrical engineering, or a related field in computing technology, and two years of experience with automatic speech recognition and natural language understanding, or equivalent education, research experience, or knowledge.” Pet. 4 (citing Olsen Decl. ¶¶ 24–26). Patent Owner does not address or dispute Petitioner’s proposed definition of the level of ordinary skill in the art, which appears consistent with the record at this stage of the proceeding, including the prior art. See GPAC Inc., 57 F.3d at 1579. For purposes of this Decision, we adopt the definition of the level of ordinary skill in the art proposed by Petitioner.

B. Claim Construction

In inter partes reviews, the Board interprets claim language using the standard described in Phillips v. AWH Corp., 415 F.3d 1303 (Fed. Cir. 2005) (en banc). See 37 C.F.R. § 42.100(b). Under that standard, we generally give claim terms their ordinary and customary meaning, as would be understood by a person of ordinary skill in the art at the time of the invention, in light of the language of the claims, the specification, and the prosecution history. See Phillips, 415 F.3d at 1313–14.
Although extrinsic evidence, when available, may also be useful when construing claim terms under this standard, extrinsic evidence should be considered in the context of the intrinsic evidence. See id. at 1317–19.

Petitioner proposes (1) adopting the Texas District Court’s construction of “increase recognition accuracy” in claims 5 and 6 and (2) applying “plain and ordinary meanings to the remaining terms in the challenged claims.” Pet. 7. Patent Owner responds by stating that Petitioner “agrees that the plain and ordinary meaning of the terms applies here.” Prelim. Resp. 14.

At this stage of the proceeding and based on the current record, we do not discern a need to construe explicitly the term “increase recognition accuracy” or any other claim terms because doing so would have no effect on the analysis below. See Nidec Motor Corp. v. Zhongshan Broad Ocean Motor Co., 868 F.3d 1013, 1017 (Fed. Cir. 2017) (stating that “we need only construe terms ‘that are in controversy, and only to the extent necessary to resolve the controversy’”) (quoting Vivid Techs., Inc. v. Am. Sci. & Eng’g, Inc., 200 F.3d 795, 803 (Fed. Cir. 1999)).

C. Asserted Obviousness of Claims 1, 12, 13, 15–17, 26, 28, and 29 Based on Calderone, Ogasawara, and Sanchez

Petitioner asserts that claims 1, 12, 13, 15–17, 26, 28, and 29 of the ’810 patent are unpatentable under 35 U.S.C. § 103(a) based on Calderone, Ogasawara, and Sanchez. Pet. 15, 16–40. Patent Owner provides arguments specifically addressing this asserted ground. Prelim. Resp. 16–21. We first summarize aspects of the relied-upon references.

1. Calderone

Calderone describes a system and method for voice recognition near a wireline node of a network supporting cable television and/or video delivery. Ex. 1003, Title, Abstract. Calderone’s system provides speech recognition services to a collection of users over a network, user identification based upon the speech recognition over the network, and user identified speech contracting over the network for real-time auctions and contracting. Id. ¶ 39. Spoken commands from a cable subscriber are recognized and acted upon to control the delivery of entertainment and information services, such as Video On Demand (VOD), Pay Per View, Channel control, and on-line shopping. Id. ¶¶ 41, 539.

Calderone’s Figure 3, reproduced below, illustrates a system providing speech recognition services. Id. ¶ 60.

Figure 3 illustrates a remote control unit 1000 coupled to a set-top apparatus 1100 communicating via a wireline physical transport 1200, a distributor node 1300, and a high speed physical transport 1400, with one or more gateways 3100 and one or more server arrays 3200 of a server farm 3000. Ex. 1003 ¶ 60.

Remote control unit 1000 is fitted with a microphone that relays the subscriber’s speech commands to a central speech recognition engine. Id. ¶¶ 110–111. “The analog signs picked up from the microphone are converted to digital signals where they undergo additional processing before being transmitted to the speech recognition and identification engine located in the . . . centralized location.” Id. ¶ 115. Calderone’s Figures 20A and 20B further teach that the set-top apparatus 1100 may include computer 1150, remote interface 1130, network interface 1170, and memory 1160. Id. ¶¶ 268–275, Figs. 20A–B.
Calderone’s central speech recognition engine may process a multiplicity of received speech channels to create a multiplicity of identified speech content, and then responds to the identified speech content to create an identified speech content response, for each of the multiplicity of the identified speech contents. Ex. 1003 ¶¶ 217–218. Once a complete spoken request has been received, the speech input processor may use a sample’s source address identifying a user site to target the speech data to a specific speech processing processor. Id. ¶ 148. The speech engine determines the most likely spoken request based on statistical analysis, and may return a text string corresponding to the spoken request. Id. ¶ 162. Additionally, Calderone teaches that the speech recognition engine returns a result, and visual text corresponding to the recognized spoken request may be transmitted back to the set-top box. Id. ¶¶ 166–167. Software executing within the set-top box displays the text information. Id. ¶ 167. “By displaying the text of the possible recognition results, the user can easily select from the returned list.” Id. ¶ 168.

2. Ogasawara

Ogasawara discloses “an Internet shopping system hosted on a television-set-top-box combination including a remote controller with voice recognition capabilities.” Ex. 1004, 1:6–10, Abstract. The remote control unit includes a keypad and a microphone, and the set-top box includes voice recognition software and bar code recognition software to support the electronic shopping system. Id. Data are input to an Internet shopping Web program accessed through a Web browser associated with the set-top box. Id.

Ogasawara’s Figure 1, reproduced below, illustrates an electronic shopping system. Id. at 2:48–51.

Figure 1 illustrates an electronic shopping system including a television set, a set-top box, and a remote control unit. Ex. 1004, 2:48–51.

The electronic shopping system illustrated in Figure 1 includes set-top box 10, television 12, and remote control unit 14. Id. at 3:54–65. Set-top box 10 receives television signals for performing conventional television reception functions. Id. Remote control unit 14 is in communication with set-top box 10, and includes a keypad for allowing input of keypad data to set-top box 10, and a microphone for capturing voice data from the user. Id. at 4:13–38. The user may thus provide oral commands to the system during Internet shopping, to select purchase items. Id. at 4:28–38, 9:40–42. Ogasawara describes a purchasing process in which

   voice recognition is performed by converting the voice data to the corresponding character data. The extracted character data is then transferred to the transaction program [downloaded to the set-top box]. . . . The data input process continues until all necessary selections have been made by the user to complete an item selection . . . . The client purchase transaction program in the [set-top box] 10 is in communication with the server purchase transaction program on the Web server 72 [to which the set-top box’s tuner provides an Internet connection]. Upon client selection of an item, the server program retrieves information corresponding to the selected item from a Price Lookup (PLU) Table. In the described embodiment, all merchandise information is maintained in the PLU Table. The PLU Table is, in turn, stored and maintained in the Web server 72 database.

Ex. 1004, 9:30–10:5.
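Because the location of each operation in Ogasawara becomes significant in the analysis below, the following sketch summarizes the division of labor described in the passage above: voice-to-text conversion and the completion of an item selection occur at the set-top box, while the Web server retrieves merchandise information for the already-selected item from the PLU Table. This is an editorial illustration under those assumptions only; Ogasawara discloses no source code, and all names are hypothetical.

```python
# Illustrative sketch of the division of labor described in Ogasawara.
# All names are hypothetical.

class WebServer:
    """Back end: hosts the PLU Table with all merchandise information."""

    def __init__(self, plu_table):
        self.plu_table = plu_table

    def lookup(self, selected_item_id):
        # The server program retrieves information corresponding to the
        # already-selected item; the lookup is keyed on the selection,
        # not on the extracted character data itself.
        return self.plu_table[selected_item_id]


class SetTopBox:
    """Front end: voice recognition and item selection occur here."""

    def __init__(self, web_server, recognizer):
        self.web_server = web_server
        self.recognizer = recognizer

    def shop(self, voice_data, user_select):
        # Voice recognition software in the set-top box converts voice
        # data to character data (text) for the transaction program.
        character_data = self.recognizer(voice_data)
        # Data input continues until the user completes an item
        # selection at the set-top box interface.
        selected_item_id = user_select(character_data)
        # Only after the selection does the Web server retrieve the
        # corresponding PLU Table entry.
        return self.web_server.lookup(selected_item_id)
```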
3. Sanchez

Sanchez describes an interactive television virtual shopping cart that facilitates product purchases in an interactive television system. Ex. 1005, Abstract. Upon presentation of an advertisement, movie, or other television program in a programming stream, an indication such as an icon may be presented to a viewer indicating that product or service information is available. Id. The viewer may select the icon and store the corresponding product or service information in a virtual shopping cart or shopping list. Id. The viewer may also tune to a virtual channel and interact with the virtual shopping cart in order to add, delete, or initiate a purchase of products or services. Id. The viewer’s purchase requests may be conveyed via the Internet. Id.

4. Analysis

a. Independent Claims 1 and 17

For independent claims 1 and 17, Petitioner contends that the combination of Calderone, Ogasawara, and Sanchez satisfies each of the limitations. Pet. 16–33, 37–39. Patent Owner argues that neither Calderone nor Ogasawara discloses a “computer” that performs the “identify” steps in elements 1.7 and 17.7 (as identified above). See Prelim. Resp. 16–21. For the reasons below, we determine that the Petition does not show a reasonable likelihood that Petitioner would prevail in demonstrating that claims 1 and 17 would have been obvious based on Calderone, Ogasawara, and Sanchez.

(1) Relevant Requirements in Claims 1 and 17

We first discuss certain requirements of claims 1 and 17 relevant to the analysis below. Claim 1 recites a “voice-processing system comprising: a networks interface; a computer; [and] non-transitory memory that stores instructions that when executed by the computer cause the computer to perform” several recited “operations.” Ex. 1001, 14:48–53 (emphasis added). Among the operations recited as performed by “the computer” are (1) to “translate at least a portion of the digitized order to text” in element 1.6 (the “‘translate’ step”) and (2) to “use the text, translated from the digitized order, to identify an item corresponding to the text description” in element 1.7 (the “‘identify’ step”). Id. at 14:64–67.

Claim 17 recites a “computer-implemented method” with several recited steps, including essentially the same “translate” step and “identify” step as in claim 1, except with the gerund forms of the initial verbs. Compare Ex. 1001, 14:64–67 (claim 1, reciting “translate” and “use”), with id. at 16:14–16 (claim 17, reciting “translating” and “using”).[9] Thus, the “translate” step and “identify” step in claim 17 are, similar to claim 1, implemented by the “computer.”[10]

[9] The parties address the “translate” step in claim 1 and the “identify” step in claim 1 in the same manner as the similar steps in claim 17. See, e.g., Pet. 38 (stating that each of these steps in claim 17 is “identical to that of claim 1” except for the gerund form); Prelim. Resp. 16–21. We will refer to the limitations at issue in claim 17 as the “translate” step and the “identify” step as well.

[10] Although we refer to “the ‘computer,’” we need not and do not take a position as to whether either claim 1 or claim 17 requires that only one “computer” perform the recited steps.

In addition to the “computer,” both claims recite a “remote system” that is involved in certain steps of the “computer.” For example, claim 1 requires the “computer” to “download configuration data to the remote system” and to “receive, using the network interface, a digitized order of a user from the remote system.” Ex. 1001, 14:61–63. Claim 17 includes similar requirements. See id. at 16:8–10.

(2) The Contentions as to Claim 1

We turn now to Petitioner’s application of the relied-upon prior art to claim 1, and Patent Owner’s arguments in response.
For the “translate” step in claim 1, Petitioner states that “[t]he speech recognition engine of Calderone ‘return[s] a text string corresponding to the spoken request’ that was digitized.” Pet. 28 (quoting Ex. 1003 ¶ 162). Petitioner also states that “Calderone discloses that the ‘recognized text string’ may be displayed as ‘visual text corresponding to the recognized spoken request.’” Id. at 29 (quoting Ex. 1003 ¶¶ 166–167) (citing Olsen Decl. ¶ 107). Because the “computer” performs the “translate” step (as discussed above), Petitioner thus relies on the “speech recognition engine” (or “speech engine”)[11] in Calderone as to the recited “computer.” See also Pet. 20 (discussing element 1.1: “The computer that performs the ‘operations of the speech engine as shown in Fig. 10’ is ‘controlled by a program system including program steps residing in accessibly coupled memory.’”) (citing Ex. 1003 ¶¶ 220, 381); Ex. 1003 ¶ 220 (discussing Figure 10, and stating that “[t]he speech processing system may include at least one computer”), ¶ 381 (“Note that a single computer may perform the operations of the speech engine as shown in FIG. 10.”).

[11] Based on the disclosures in the cited paragraphs, we agree with Petitioner’s implicit position that Calderone uses at least the terms “speech recognition engine” and “speech engine” interchangeably. See Ex. 1003 ¶ 162 (referring to a “speech engine”), ¶ 166 (referring to a “speech recognition engine”).

For the “identify” step in claim 1, Petitioner provides this discussion:

   The Calderone system performs this function by responding to “identified speech content to create an identified speech content response.” Ex. 1003, [0217]-[0218]; Ex. 1002, ¶ 108. To respond to VOD, “Pay Per View” (“PPV”), or “on-line shopping” requests (Ex. 1003, [0041]) requires identifying content, i.e. items, corresponding to the text of those requests. Calderone discloses searching for matching “names of movies,” “program names,” and names or descriptions of products in response to user requests for VOD, PPV, and “on-line shopping” services. Id. at [0048], [0137], [0170], [0539]. Ogasawara similarly discloses using “extracted character data” to search for matching “text string[s] giving the brand or trade name of the product and including a generic description of the product.” Ex. 1004, 9:67-10:14. A [person of ordinary skill in the art] would have understood that Calderone and Ogasawara disclose using these matching names or descriptions to identify the actual item corresponding to the text of each user request, whether it is a movie, a program, or other merchandise. Ex. 1003, [0041], [0163], [539] (user can receive the movies/programs requested or obtain “shopping content”); Ex. 1004, 10:17-19 (the user can buy the item identified by the merchandise entry); Ex. 1002, ¶ 108. Thus, Calderone and Ogasawara disclose “use the text, translated from the digitized order, to identify an item corresponding to the text description.”

Pet. 29–30.

Patent Owner argues that the relied-upon combination of Calderone and Ogasawara does not disclose a “computer” that performs the “identify” step of claim 1.[12] See Prelim. Resp. 16–21.

[12] Although this ground involves Calderone, Ogasawara, and Sanchez, Petitioner only discusses Calderone and Ogasawara as to the “identify” step. See Pet. 29–30.
According to Patent Owner, the ’810 patent requires “that translation and identification both happen on the back-end system,” but “the combination of Calderone and Ogasawara has translation and identification happening in different places, such that they cannot disclose the claimed elements.” Id. at 20–21 (emphasis omitted); see also id. at 17 (arguing that “the process of identifying takes place on the user’s front-end system, not the back-end system as claimed”).

Although claim 1 does not recite a requirement as to a “back-end system,” for the reasons above (see supra § II.C.4.a.1), we agree with Patent Owner that the “translate” step and the “identify” step have to be performed by the “computer.” Moreover, on the current record, we agree with Patent Owner that Petitioner has not made a sufficient showing as to why, in the context of the combined system, the “identify” step would be performed in the identified “computer”—i.e., the speech recognition engine of Calderone—as required.

We first discuss the nature of the proposed combination of Calderone and Ogasawara as it relates to the “identify” step. Taking into consideration Petitioner’s briefing as a whole, including the discussion of the “identify” step (quoted in its entirety above), the initial discussion of the prior art relied upon in this ground (Pet. 16–17), and the additional authorized briefing, Petitioner has failed to clearly explain the precise nature of any proposed combination of Calderone and Ogasawara as to the “identify” step.[13] In the Petition, Petitioner appears to assert that Calderone alone discloses the “identify” step and that Ogasawara alone also discloses the “identify” step. See Pet. 29–30.

[13] Petitioner does, however, discuss certain aspects of the combination of prior art in the context of other elements, such as, for example, elements 1.1, 1.3, 1.4, and 1.8–1.10. See Pet. 21, 25, 27, 30, 32, 33.

We turn now to the additional briefing. In this proceeding, we authorized a Preliminary Reply by Petitioner and a Preliminary Sur-reply by Patent Owner, with Petitioner “limited to addressing Patent Owner’s discussion” of the factors in the Board’s precedential decision in Apple Inc. v. Fintiv, Inc., IPR2020-00019, Paper 11 at 5–6 (PTAB Mar. 20, 2020), from the Preliminary Response. See Paper 7 at 3. In the Preliminary Reply, Petitioner instead directly addressed—and cited to—the merits portion of the Preliminary Response. See Prelim. Reply 5–6 (citing Prelim. Resp. 16, 21) (discussing why “[t]he merits challenges raised by Patent Owner do not withstand scrutiny”). As an initial matter, Petitioner’s discussion of the merits is potentially outside the scope of the authorized preliminary reply, but we need not reach that issue because, for the reasons explained below, even if we consider Petitioner’s discussion, it is not persuasive. See Prelim. Resp. 10–11; Consolidated Trial Practice Guide 74 (Nov. 2019), https://www.uspto.gov/TrialPracticeGuideConsolidated (“Consolidated TPG”) (“Generally, a reply or sur-reply may only respond to arguments raised in the preceding brief. . . . While replies and sur-replies can help crystalize issues for decision, a reply or sur-reply that raises a new issue or belatedly presents evidence may not be considered.”); see also Prelim.
Sur-reply 6 (arguing that Petitioner’s “discussion of the merits of their Petition is outside the permitted scope of their Preliminary Reply and should be given no weight”).

Considering the positions stated by Petitioner in the additional briefing, the precise nature of any proposed combination of Calderone and Ogasawara as to the “identify” step is no clearer. In the Preliminary Reply, Petitioner states: “To the extent Patent Owner’s arguments are based on actions performed on front-end vs. back-end systems, these distinctions are immaterial as [one of ordinary skill in the art] would have known ‘that certain operations or processes could be distributed across different hardware and software components based on known technologies and system architectures at the time.’” Prelim. Reply 5 (quoting Olsen Decl. ¶ 53). As an initial matter, the quoted paragraph from Dr. Olsen’s Declaration was not cited in the Petition, which indicates that the implied proposed combination may be an improper new theory of unpatentability raised for the first time in reply. Consolidated TPG 73 (“Petitioner may not submit new evidence or argument in reply that it could have presented earlier, e.g. to make out a prima facie case of unpatentability.”). In addition, the generic statement by Dr. Olsen—provided in a summary section on the background of the technology (Olsen Decl. pp. 13–23)—provides no indication as to how or whether one of ordinary skill in the art would have combined the specific prior art relied on in this asserted ground. Olsen Decl. ¶ 53.

Although the precise nature of the proposed combination is unclear, in light of the overall outcome here, we address below Petitioner’s reliance on each of Calderone and Ogasawara in both the Petition and the Preliminary Reply.

We begin with Calderone. Petitioner first highlights paragraphs 217 and 218 of Calderone. See Pet. 29. Although these paragraphs describe processes in the identified “computer”—i.e., the speech recognition engine—as argued by Patent Owner, they do not disclose the “identify” step. See Prelim. Resp. 17. Referring to Figure 10, Calderone explains that the speech recognition engine processes “a multiplicity of the received identified speech channels to create a multiplicity of identified speech content” and “respond[s] to the identified speech content to create an identified speech content response, for each of the multiplicity of the identified speech contents.” Ex. 1003 ¶¶ 217–218. Petitioner implicitly argues that the disclosed “identified speech content response” satisfies the requirement to “identify an item,” such as a movie or video on demand (see Pet. 29–30), but Petitioner does not sufficiently explain its position. Instead, the record more strongly supports Patent Owner’s view that these paragraphs relate to speech recognition—i.e., translating speech to text as in the “translate” step. See Prelim. Resp. 17; Ex. 1003 ¶¶ 166–167 (“This system provides a mechanism by which the user receives rapid visual feedback regarding the recognition process. Soon after the speech recognition engine has returned a result, visual text corresponding to the recognized spoken request is displayed on the display, e.g. television, screen. This rapid visual feedback may be accomplished by transmitting the recognized text string back to the set-top box.”).
Relied-upon paragraphs 41, 48, 137, and 539 in Calderone together indicate that an objective of the system overall is to use speech recognition to, for example, “control the delivery of entertainment and information services, such as Video On Demand, Pay Per View, Channel control, on-line shopping, and the Internet.” See, e.g., Ex. 1003 ¶ 41. But, as argued by Patent Owner, these paragraphs do not address the specific requirements of the “identify” step. See Prelim. Resp. 17 (arguing that these paragraphs “reference speech recognition or accuracy improvement methodologies generally” but that none “suggest[s] identifying a corresponding content item associated with translated speech”). For example, none discloses the use of “text” translated from speech to identify a movie or other content, and none discloses that process taking place in the identified “computer”—i.e., the speech recognition engine.

Relied-upon paragraph 163 of Calderone discusses how the “recognition results” of translations from speech to text of different information can have either “low cost” or “high cost.” Ex. 1003 ¶ 163. As an example of “low cost” recognition results, Calderone discloses a “request to display listings for a particular movie,” whereas, as an example of “high cost” recognition results, Calderone discloses a “request to purchase a movie.” Id. To the extent Petitioner relies upon a selection by the user based on either of these results, however, that selection would be on set-top box 1100 or remote 1000 in Calderone (identified as the “remote system” in claim 1), not the speech recognition engine. See Pet. 22–24; Ex. 1003 ¶ 167 (“This rapid visual feedback may be accomplished by transmitting the recognized text string back to the set-top box. Software executing within the set-top box displays the text information in a special window on top or overlaying of the existing application display.”), ¶ 168 (“In cases where the recognition accuracy is particularly poor, and the speech engine returns several possible recognition results, this overlay display capability may be used to help refine the user’s query. By displaying the text of the possible recognition results, the user can easily select from the returned list.”). Moreover, to the extent Petitioner relies on some operation after the selection by the user, Petitioner has not sufficiently shown that the process takes place in the identified “computer”—i.e., the speech recognition engine.

Relied-upon paragraph 170 of Calderone discloses the use of “similarity searching”—i.e., searching for names of movie titles and actors, “which are only partially matched, or which resemble the recognized phrase, without requiring precise specification of the exact title or name.” Ex. 1003 ¶ 170. Even assuming that this process uses a “text string” returned from the speech recognition engine and then selected by the user (see id. ¶¶ 167–169), Petitioner has not sufficiently shown that this process takes place in the identified “computer”—i.e., the speech recognition engine—rather than another part of the system overall.

The statements in paragraph 108 of the Olsen Declaration, cited by Petitioner, essentially just restate the arguments as to the paragraphs of Calderone discussed above, but do not provide further support for those arguments. Compare Pet. 29–30, with Olsen Decl. ¶ 108; see also 37 C.F.R.
§ 42.65(a) (“Expert testimony that does not disclose the underlying facts or data on which the opinion is based is entitled to little or no weight.”).

We turn now to the Preliminary Reply, where Petitioner adds that “Calderone discloses matching translated text to text descriptions of items, e.g., searching a database for content matching movie titles or actor names.” Prelim. Reply 5 (citing Olsen ¶ 104; Ex. 1003 ¶¶ 41, 48, 170). Petitioner states, “[n]aturally, a corresponding item such as a movie or actor is identified (by the back-end system) based on the match.” Id. at 5–6 (citing Olsen ¶ 108; Ex. 1003 ¶ 163). For the same reasons discussed above, paragraphs 41, 48, 163, and 170 of Calderone do not provide sufficient support for Petitioner’s position. Moreover, paragraph 104 of the Olsen Declaration relates to element 1.5 rather than the “identify” step, and paragraph 108 (as discussed above) does not provide additional support for Petitioner’s position. Further, even assuming Petitioner is correct that an identification of an “item” such as a movie takes place in “the back-end system,” Petitioner would still have to demonstrate that the same identification takes place in the “computer” identified in the Petition—i.e., the speech recognition engine.

For these reasons, at this stage of the proceeding and on the current record, we determine that Petitioner has not made a sufficient showing that Calderone discloses the “identify” step.

We turn now to Petitioner’s reliance on aspects of Ogasawara as to the “identify” step. As quoted above, Petitioner argues that Ogasawara “discloses using ‘extracted character data’ to search for matching ‘text string[s] giving the brand or trade name of the product and including a generic description of the product.’” Pet. 29 (quoting Ex. 1004, 9:67–10:14). Petitioner also contends that one of ordinary skill in the art “would have understood that . . . Ogasawara disclose[s] using these matching names or descriptions to identify the actual item corresponding to the text of each user request, whether it is a movie, a program, or other merchandise.” Id. at 29–30 (citing, inter alia, Ex. 1004, 10:17–19; Olsen Decl. ¶ 108).

Patent Owner responds that Ogasawara does not perform the “identify” step because the use of text, translated from voice, to “identify an item” takes place by the user at the set-top box—akin to the “remote system” in claim 1—rather than on the back-end server, which Petitioner relies on as the recited “computer” as to Ogasawara. See Prelim. Resp. 18–19 (arguing that Ogasawara “discloses selections having been made by the user to complete an item selection, meaning that any item identification occurs on the front-end system (specifically, the set-top box), rather than the back-end system” (citing Ex. 1004, 9:62–64)). According to Patent Owner, “[a]fter an item selection is made on the front-end system, the server program on the back-end system searches the PLU Table based on the already-identified item selection to pull additional information.” Id. at 19–20.

Petitioner argues in its Preliminary Reply (though not in the Petition) that Ogasawara “discloses identifying a corresponding item (by the back-end system) when it ‘retrieves information corresponding to the selected item from a [PLU] Table’ which maintains ‘all merchandise information.’” Prelim. Reply 6 (quoting Ex. 1004, 10:1–4) (citing Olsen Decl. ¶ 108).
As an initial matter, as discussed above, Petitioner’s discussion of the merits is potentially outside the scope of the authorized preliminary reply. Regardless, even if we consider the positions stated by Petitioner in the additional briefing, Petitioner has not sufficiently shown that the back-end server’s retrieval of additional information corresponding to the selected item in Ogasawara meets the requirements of the “identify” step.

In the relied-upon embodiment in Ogasawara, the user speaks into a microphone in the remote control, and the voice data is converted to “extracted character data”—i.e., text—that is then transferred to the set-top box. See Ex. 1004, 9:48–51, 4:29–30 (“The remote control unit 14 also includes a microphone 32 for capturing voice data upon an utterance by the user.”), quoted at Pet. 11. After the input of any data, including any text, a user “complete[s] an item selection” using the set-top box interface. See Ex. 1004, 9:62–64. Then, “[u]pon client selection of an item, the server program retrieves information corresponding to the selected item from a Price Lookup (PLU) Table.” Id. at 9:67–10:2. The PLU Table is stored on a web server database, and includes an “entry specific to a particular piece of merchandise.” Id. at 10:4–9. A given merchandise entry “might further include” the “text string giving the brand or trade name of the product and including a generic description of the product” highlighted by Petitioner. Id. at 10:12–14, quoted at Pet. 29.

With this understanding of the relied-upon embodiment, we agree with Patent Owner that the lookup operation by the back-end server in Ogasawara does not “use” the identified “text”—i.e., the “extracted character data” (Ex. 1004, 9:50)—to “identify an item” as recited in the “identify” step. Instead, the item selection by the user at the set-top box is the process that “use[s]” the “text” to “identify an item.” See Ex. 1004, 9:62–64, cited at Prelim. Resp. 19; see also Prelim. Resp. 20 (“The downloaded transaction program on the set-top box is the only program that receives and uses the extracted character data derived from the user’s voice data.”).

In contrast, the record does not support Petitioner’s assertion that Ogasawara “discloses using ‘extracted character data’ to search for matching ‘text string[s] giving the brand or trade name of the product and including a generic description of the product.’” Pet. 29 (emphasis added). As argued by Patent Owner, this characterization conflates two distinct processes in Ogasawara: the use of text to select an item by the user on the set-top box and the later lookup by the back-end server of additional information, which may include the “text string[s] giving the brand or trade name of the product and including a generic description of the product” discussed by Petitioner (Pet. 29). See Prelim. Resp. 20 (arguing that Petitioner “conflates these distinct processes in Ogasawara to purport that the identification occurs on the back-end Web server, instead of the front-end system” (citing Pet. 29)); Ex. 1004, 10:12–14. In other words, contrary to Petitioner’s argument, the “extracted character data” is not used to “search for matching text string[s]” and thereby “identify an item” on the back-end server. Pet. 29 (quotations omitted); see Prelim.
Reply 6 (“Ogasawara likewise discloses identifying a corresponding item (by the back-end system) when it ‘retrieves information corresponding to the selected item from a [PLU] Table’ which maintains ‘all merchandise information.’” (quoting Ex. 1004, 10:1–4) (citing Olsen Decl. ¶ 108)).

In the Preliminary Reply, Petitioner argues that “Patent Owner is essentially arguing that a server retrieves information about an item from a database without identifying the item.” Prelim. Reply 6 (citing Prelim. Resp. 21). We disagree with Petitioner’s characterization of Patent Owner’s argument. Even assuming that the back-end server must “identify” a specific item (e.g., in the sense of being notified as to a specific item based on data sent from the set-top box (Ex. 1004, 9:67–10:2)) prior to retrieval of the additional information, here, Patent Owner argues that any such notification does not satisfy the “identify” step because the back-end server itself does not “use” the “text” as required. See Prelim. Resp. 20 (“The downloaded transaction program on the set-top box is the only program that receives and uses the extracted character data derived from the user’s voice data.”). For the reasons discussed above, we agree with Patent Owner’s position, as it is supported by Ogasawara. Dr. Olsen’s testimony does not persuade us otherwise, as it essentially restates Petitioner’s arguments as to Ogasawara, and does not provide further support. Compare Pet. 29–30, with Olsen Decl. ¶ 108; 37 C.F.R. § 42.65(a).

Moreover, even assuming that the two distinct processes in Ogasawara discussed above—the use of text to select an item by the user on the set-top box and the later lookup by the back-end server of additional information—together satisfy the “identify” step, Petitioner has not adequately explained why, in the context of the combination of Calderone and Ogasawara, these two processes would be performed in the identified “computer,” i.e., the speech recognition engine of Calderone. As discussed above, the precise nature of any proposed combination of Calderone and Ogasawara as to the “identify” step is not clear.

For these reasons, at this stage of the proceeding and on the current record, we also determine that Petitioner has not made a sufficient showing that Ogasawara discloses the “identify” step.

Accordingly, we determine, based on the current record, that the Petition does not show a reasonable likelihood that Petitioner would prevail in demonstrating that claim 1 would have been obvious based on Calderone, Ogasawara, and Sanchez.

(3) The Contentions as to Claim 17

We turn now to Petitioner’s application of the relied-upon prior art to claim 17. For the “computer” performing the “computer-implemented method” recited in the preamble, Petitioner identifies not only the speech recognition engine in Calderone (i.e., the “computer” relied on as to claim 1), but also the “modulator engine” and “content engine.” See Pet. 37 (“The speech processor computer interacts with other modules to prepare speech response content, including a modulator engine and a content engine, i.e., performing a computer-implemented method.” (citing Ex. 1003 ¶¶ 321, 322, 332; Olsen Decl. ¶ 123)).

For the “translate” step in claim 17 (in element 17.6), and for the “identify” step in claim 17 (in element 17.7), however, Petitioner relies on the same aspects of the prior art discussed above as to the “translate” step and the “identify” step in claim 1. See Pet.
38 (discussing element 17.6 (citing Pet. 28–29; Olsen ¶ 129) and discussing element 17.7 (citing Pet. 29–30; Olsen ¶ 130)). For the same reasons discussed above in the context of claim 1, Petitioner has not adequately shown that the speech recognition engine in Calderone performs the “identify” step in claim 17, based on the aspects of the prior art relied upon as to element 1.7 in claim 1.

Moreover, although paragraphs 321, 322, and 332 of Calderone, cited in the context of the preamble of claim 17 (Pet. 37), discuss interaction between the speech recognition engine, modulator engine, and content engine, Petitioner has not adequately explained why these three engines would, in the context of the combined system, perform the specific requirements of the “identify” step. Dr. Olsen’s testimony does not persuade us otherwise, as it essentially just restates Petitioner’s arguments, and does not provide further support. Compare Pet. 37, 38, with Olsen Decl. ¶¶ 123, 129, 130; 37 C.F.R. § 42.65(a).

For these reasons, we determine, based on the current record, that the Petition does not show a reasonable likelihood that Petitioner would prevail in demonstrating that claims 1 and 17 would have been obvious based on Calderone, Ogasawara, and Sanchez.

b. Dependent Claims 12, 13, 15, 16, 26, 28, and 29

Claims 12, 13, 15, and 16 depend from claim 1, and claims 26, 28, and 29 depend from claim 17. This asserted ground as to claims 12, 13, 15, 16, 26, 28, and 29 includes the same deficiency discussed in the prior section addressing claims 1 and 17 (see supra § II.C.4.a). Thus, we determine that the Petition does not show a reasonable likelihood that Petitioner would prevail with respect to the contention that claims 12, 13, 15, 16, 26, 28, and 29 would have been obvious based on Calderone, Ogasawara, and Sanchez.

D. Asserted Obviousness of Claims 2–11, 14, 18–25, and 27 Based on Calderone, Ogasawara, Sanchez, and Other Prior Art

Claims 2–11 and 14 depend from claim 1, and claims 18–25 and 27 depend from claim 17. Petitioner’s added reliance on Partovi, Kuhn, Sichelman, and Cooper does not remedy the deficiencies discussed above regarding the asserted obviousness of claims 1 and 17 based on Calderone, Ogasawara, and Sanchez (see supra § II.C.4.a). Thus, we determine that the Petition does not show a reasonable likelihood that Petitioner would prevail with respect to at least one of claims 2–11, 14, 18–25, and 27 as unpatentable based on Calderone, Ogasawara, Sanchez, and the other relied-upon prior art.

III. CONCLUSION

For the reasons above, we determine that the Petition does not show a reasonable likelihood that Petitioner would prevail with respect to at least one of challenged claims 1–29 of the ’810 patent.

IV. ORDER

Accordingly, it is hereby:

ORDERED that the Petition is denied as to all challenged claims, and no inter partes review is instituted.

FOR PETITIONER:
J. David Hadden
Saina Shamilov
Allen Wang
FENWICK & WEST LLP
dhadden@fenwick.com
sshamilov@fenwick.com
allen.wang@fenwick.com

FOR PATENT OWNER:
James Hannah
Jeffrey H. Price
Jonathan Caplan
KRAMER LEVIN NAFTALIS & FRANKEL LLP
jhannah@kramerlevin.com
jprice@kramerlevin.com
jcaplan@kramerlevin.com