Looking beyond ST.26: Is it time for patent offices to enter the bioinformatic age?

7th April 2023

http://ipkitten.blogspot.com/2023/04/looking-beyond-st26-is-it-time-for.html

There has been much discussion regarding the implementation of the new sequence listing requirement for patent applications, ST.26 (IPKat). This Kat proposes taking a step back from the legal minutiae of ST.26 implementation to ask a more fundamental question. In a world in which incalculable amounts of sophisticated sequence data are freely available, are the clunky processes necessary to input patent sequence data really fit for purpose?

The dual purpose of patent sequence listings

All patent applications containing sequence data in the claims, figures or description are required to include a sequence listing. The sequence listing provides each sequence in a prescribed format, together with a unique sequence identification number (SEQ ID NO:) and additional data such as the organism and the position of any unusual features in the sequence.

The sequence listing serves two purposes. First, the sequence listing is used by the patent office to search for the sequences disclosed in the patent application. The prescribed format of the sequence listing assists in the automation of this search. Importantly, all the sequences in the patent application must be included in the sequence listing, regardless of whether they are part of the invention or just relate to tool compounds used in the examples. If a sequence is disclosed in the specification, the sequence must be included in the sequence listing (with a few exceptions, such as very short sequences). This allows the patent office to search for all of the disclosed sequences in the application.

The second function of the sequence listing is to facilitate public access to the sequence information disclosed in patent applications. There is an understandable desire for the sequence data in patents, which may not be published elsewhere, to be searchable in public sequence databases such as GenBank. One of the purposes of the shift to the ST.26 sequence listing format was to facilitate better integration between patent sequence data and these public databases.

Introduction of ST.26

From 1 July 2022, the old international standard for sequence listings, ST.25, was replaced by the new ST.26 standard (IPKat). The shift to ST.26, and the introduction of the new WIPO software for preparing ST.26 sequence listings, had the aim of increasing public access to patent sequences. The process of preparing ST.26 sequence listings has some improvements over ST.25, but also some new disadvantages. One of the main issues with ST.26 is that the sequence listing is no longer submitted in a human-readable txt format, but instead as a complex XML file. In an effort to address this problem, WIPO has now introduced the ability to visualise ST.26 sequence listings in an internet browser directly from PATENTSCOPE, so that users no longer have to download the XML sequence listing and import it into WIPO Sequence.
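
Although the XML is hard on the eye, it is at least easy for a script to read. The following is a minimal sketch of how the sequences might be pulled out of a listing for inspection. It assumes the INSDSeq-style element names used by ST.26 (SequenceData, sequenceIDNumber, INSDSeq_sequence and so on), which should be checked against the standard itself before being relied upon. The sample document is invented and heavily abbreviated, and is not a valid filing.

```python
import xml.etree.ElementTree as ET

# Invented, heavily abbreviated sample. A real ST.26 file carries many more
# mandatory elements (applicant data, feature tables, etc.).
SAMPLE_XML = """<ST26SequenceListing>
  <SequenceData sequenceIDNumber="1">
    <INSDSeq>
      <INSDSeq_length>12</INSDSeq_length>
      <INSDSeq_moltype>DNA</INSDSeq_moltype>
      <INSDSeq_sequence>atgccgtaaggc</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>
  <SequenceData sequenceIDNumber="2">
    <INSDSeq>
      <INSDSeq_length>5</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_sequence>MKTAY</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>
</ST26SequenceListing>"""


def read_listing(xml_text):
    """Return {SEQ ID NO: {"moltype", "length", "sequence"}} from listing XML."""
    root = ET.fromstring(xml_text)
    listing = {}
    for entry in root.iter("SequenceData"):
        seq_id = int(entry.get("sequenceIDNumber"))
        listing[seq_id] = {
            "moltype": entry.findtext("INSDSeq/INSDSeq_moltype"),
            "length": int(entry.findtext("INSDSeq/INSDSeq_length")),
            "sequence": entry.findtext("INSDSeq/INSDSeq_sequence"),
        }
    return listing


listing = read_listing(SAMPLE_XML)
for seq_id, record in listing.items():
    # A cheap sanity check: the declared length should match the residues.
    ok = record["length"] == len(record["sequence"])
    print(seq_id, record["moltype"], record["sequence"], "length ok" if ok else "LENGTH MISMATCH")
```

Only the Python standard library is used here, so a plain-text view of a listing does not in principle depend on any special tooling.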

Further information about ST.26 and WIPO Sequence can be found at the WIPO Sequence and ST.26 Knowledge Base. Users can also subscribe to the WIPO sequence listing newsletter.

The risks of errors in patent sequence data

Sequence information is often the most important part of a patent. Many inventions in the biotech field will be defined in the claims by their sequences, and usually by a SEQ ID for that sequence, as provided in the sequence listing. Therefore, if the sequence listing is incorrect even by a single letter, then the claims of the patent may define completely different subject matter to what the applicant intended to claim. In some cases, this could mean that the patent does not cover the commercial embodiment of an invention.

The decision in T 1213/05 demonstrates the potentially fatal consequences of errors in a patent’s sequence data. In this case, the sequences provided in the priority document contained inadvertent errors. The Board of Appeal found that the priority claim for the corrected sequences in the European patent was therefore invalid. The Board of Appeal cited with approval the reasoning in T 70/05 that a priority claim to an incorrect sequence cannot be maintained, regardless of the reason for the mistakes, whether they arose from sequencing errors or typing errors.

The Board of Appeal in T 1213/05 also rejected the patentee’s argument that the skilled person’s knowledge of a certain margin of error in sequence data permitted some deviation between the sequence in the priority document and the sequence in the patent application claiming priority. For the Board of Appeal, the DNA sequences had to be identical to relate to “the same invention” and permit a valid priority claim. Claims directed to the corrected sequences were consequently found invalid in view of intervening prior art disclosing the sequences.

The decision in T 1213/05 shows that anything less than 100% accuracy in the sequence data can be fatal for a patent. In this context, it is worth bearing in mind that sequence data consists of a list of many strings of letters, where each string (“sequence”) can be thousands of letters long. Making a mistake in just one letter out of the potentially millions of letters in a sequence listing is both very easy to do and almost impossible to detect by eye. The burden on applicants in preparing sequence data for a patent application therefore does not just include the time and cost associated with preparing the sequence listing. In order to avoid potentially fatal errors in the sequence data, robust procedures for checking and validating the sequence listing are also necessary. However, the only way to effectively minimize errors in sequence listings that may contain hundreds or even thousands of sequences is to automate the process.
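
To give a flavour of what such automation might look like, the sketch below compares the sequences an applicant believes they are filing against those that actually ended up in the listing, and reports the position of the first differing letter. It is an illustration only. The function names and the sample data are hypothetical, and it assumes both sets of sequences have already been loaded into dictionaries keyed by SEQ ID NO. Ignoring case and whitespace is a design choice made here so that formatting differences are not reported as errors.

```python
def normalise(seq):
    """Drop whitespace and ignore case so that layout differences are not flagged."""
    return "".join(seq.split()).lower()


def first_mismatch(expected, actual):
    """Return the 1-based position of the first difference, or None if identical."""
    a, b = normalise(expected), normalise(actual)
    for pos, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return pos
    if len(a) != len(b):
        # One sequence is a truncated copy of the other.
        return min(len(a), len(b)) + 1
    return None


def check_listing(source, listing):
    """Compare two {SEQ ID NO: sequence} dicts and return a list of problems."""
    problems = []
    for seq_id in sorted(source.keys() | listing.keys()):
        if seq_id not in listing:
            problems.append(f"SEQ ID NO: {seq_id} missing from listing")
        elif seq_id not in source:
            problems.append(f"SEQ ID NO: {seq_id} has no source record")
        else:
            pos = first_mismatch(source[seq_id], listing[seq_id])
            if pos is not None:
                problems.append(f"SEQ ID NO: {seq_id} differs at position {pos}")
    return problems


# Hypothetical data: the last residue of SEQ ID NO: 2 was mistyped.
source = {1: "ATGCCGTAAGGC", 2: "MKTAY"}
listing = {1: "atgccgtaaggc", 2: "MKTAV"}
print(check_listing(source, listing))
# → ['SEQ ID NO: 2 differs at position 5']
```

A check of this kind is trivial for a computer and close to impossible for a human proofreader, which is rather the point.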

Automated processes for dealing with sequences – Lessons to be learnt from bioinformatics?

With the arrival of high throughput sequencing, it became necessary for the academic community to devise automated processes for dealing with vast quantities of sequence data and for uploading these sequences to public sequence databases. Compared with the automation tools used by bioinformaticians, the process of preparing and validating sequence listings for patent applications is exceedingly clunky.

In order to prepare an ST.26 sequence listing it is necessary to input each sequence and its features into the purpose-built WIPO Sequence tool. In contrast to the position under ST.25, it is possible to import multiple sequences for your ST.26 sequence listing at once, e.g. in FASTA format, instead of copying and pasting each individual sequence. However, the “features” of each sequence in a sequence listing, such as unusual amino acids, still have to be entered manually in WIPO Sequence. The manual process of adding features can take an extraordinary amount of time. The growth in next generation oligonucleotide technologies also means that there is an increasing amount of “unusual” sequence information that must be entered as features of the sequence.
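
For readers unfamiliar with FASTA, it is about as simple as a file format gets: a header line beginning with ">" that names the sequence, followed by one or more lines of residues. The sketch below is a bare-bones parser written for illustration, with a hypothetical function name and invented sample data. It is not how WIPO Sequence itself reads FASTA files, and it notably has nowhere to put feature information, which is precisely the gap described above.

```python
def parse_fasta(text):
    """Parse FASTA text into {name: sequence}, joining wrapped sequence lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without a name")
            # The name is the first word of the header; the rest is free text.
            name = fields[0]
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


# Invented sample data.
SAMPLE_FASTA = """>seq1 example nucleotide sequence
atgccg
taaggc
>seq2 example peptide
MKTAY
"""

print(parse_fasta(SAMPLE_FASTA))
# → {'seq1': 'atgccgtaaggc', 'seq2': 'MKTAY'}
```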

In contrast to WIPO Sequence, public sequence databases have purpose-built submission tools that facilitate the upload of vast quantities of annotated sequence information with little manual input. The submission tools for GenBank (e.g. BankIt), for example, allow automated input of sequence information in a format that includes feature information in the form of a 5-column Feature table.

There is thus a huge disconnect between the automation tools necessary for high-throughput processing of sequence data in academia, and the clunky tools available for preparing patent sequence listings, despite the similar aims of both processes. Aligning patent sequence data submission with the automated processes for submitting sequences to publicly available sequence databases would facilitate access to patent sequence data whilst simultaneously improving the process of sequence submission for applicants.
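
The 5-column Feature table is a plain tab-delimited text file. Each table opens with a ">Feature" line naming the sequence. Each feature then has a line giving its start, stop and feature key (columns 1 to 3), followed by lines giving a qualifier name and value (columns 4 and 5, preceded by three tabs). Because the format is so simple, annotations held in a spreadsheet or database can be written out by a few lines of code. The following is a sketch only: the function name, sequence identifier and feature data are invented for illustration.

```python
def feature_table(seq_id, features):
    """Render features as a tab-delimited 5-column feature table.

    Each feature is a dict with "start", "stop", "key" and an optional
    "qualifiers" dict mapping qualifier names to values.
    """
    lines = [f">Feature {seq_id}"]
    for feat in features:
        # Columns 1-3: start, stop, feature key.
        lines.append(f"{feat['start']}\t{feat['stop']}\t{feat['key']}")
        for qual, value in feat.get("qualifiers", {}).items():
            # Columns 4-5: qualifier name and value, after three empty columns.
            lines.append(f"\t\t\t{qual}\t{value}")
    return "\n".join(lines) + "\n"


# Invented annotations for a hypothetical 12-base sequence.
features = [
    {"start": 1, "stop": 12, "key": "CDS",
     "qualifiers": {"product": "example protein"}},
    {"start": 4, "stop": 4, "key": "misc_feature",
     "qualifiers": {"note": "modified base"}},
]

print(feature_table("seq1", features), end="")
```

Nothing here needs to be typed in by hand once the annotations exist in structured form, which is what makes bulk submission practical.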

Final thoughts

It seems to this Kat that the dual function of the sequence listing, first as a search tool for the patent office and second as a tool for increasing accessibility to patent sequence information, has resulted in a prescriptive sequence listing format that does not satisfactorily fulfill either purpose. Applicants are forced to submit lengthy sequence listings, in which only a small fraction of the sequences actually relate to the invention, using a manual process for inputting feature data that creates a substantial risk of sequence errors. Given that the tools for automating sequence submission to public sequence databases already exist, it seems to this Kat that a radical rethinking of how patent sequence data is submitted is called for. In the bioinformatic age, the present situation in which patent applicants are forced to manually input sequence data would be almost comical, if it didn’t have such potentially dire consequences for the accuracy of patent sequence data.

Further reading

ST. 26 sequence listings: A forward or backward step for ease of access to patent sequence data? (20 Feb 2022)

EPO under fire for its approach to ST.26 sequence listings (1 Aug 2022)

EPO responds to criticism over ST.26 implementation (9 Aug 2022)

Content reproduced from The IPKat as permitted under the Creative Commons Licence (UK).