Title: Efficient molecular encoding in multifunctional self-immolative urethanes
Authors: Samuel D. Dahlhauser et al.
Journal: Cell Reports Physical Science
Computers store data in binary code, represented by strings of 0s and 1s. Living systems, meanwhile, encode genes in quaternary code: they use four nucleobases to store the instructions for synthesizing proteins. A denser form of data storage might use a larger set of symbols, so that the same quantity of data fits in fewer characters. What would a hexadecimal (based on 16 characters) data storage method look like? In this paper, the Anslyn group at UT Austin explored using oligourethanes, which are biologically inspired polymers, as a hexadecimal system for data storage.
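To make the density argument concrete, here is a minimal Python sketch that counts how many symbols are needed to hold a given number of bits in different bases. The function name `symbols_needed` is made up for this illustration and does not come from the paper.

```python
def symbols_needed(n_bits: int, base: int) -> int:
    """Smallest number of base-`base` symbols that can hold n_bits bits."""
    if n_bits < 0 or base < 2:
        raise ValueError("n_bits must be >= 0 and base must be >= 2")
    target = 2 ** n_bits          # number of distinct values to represent
    count, capacity = 0, 1
    while capacity < target:      # integer arithmetic avoids float rounding
        capacity *= base
        count += 1
    return count


for base in (2, 4, 16):
    print(base, symbols_needed(8, base))
```

For one byte (8 bits), this gives 8 binary symbols, 4 quaternary symbols, or 2 hexadecimal symbols, since each hexadecimal symbol carries 4 bits.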
Oligourethanes are polymers similar to peptides; they have a backbone that includes an amide group and can have variable functional groups attached to the alpha carbon. However, instead of amino acids, their backbone consists of urethane units, which have an oxygen attached to the amide group. As a result, the backbone of an oligourethane is longer than that of a peptide, and each chain has a terminal alcohol group (Figure 1). Oligourethanes have several characteristics that make them appropriate for data storage. First, each monomer is introduced into the polymer chain by adding a β-amino alcohol. These compounds are highly variable and inexpensive to obtain, so it is relatively easy to find enough distinct monomers to serve as characters for the code. Second, they can easily be synthesized in a sequence-specific manner using solid-phase synthesis, a technique also used to produce peptides. In this method, the growing polymer chain remains attached to a resin as more units are added, so it’s easy to automate the steps and generate several sequence-defined chains in parallel. Thus, the data being encoded could be “written” easily. Third, in an earlier paper, the Anslyn group published a method to sequence the polymers by mass spectrometry, providing a straightforward way to “read” the code [1].
To sequence the polymers with that simple method, the group took advantage of the fact that the terminal unit of the oligourethanes has an alcohol group. In basic conditions, the alcohol group can attack the nearest amide group, triggering an intramolecular cyclization that releases the terminal unit in a process the paper refers to as “self-immolation” (Figure 2). The released unit’s mass can be measured via mass spectrometry, which identifies the monomer that was released in the self-immolation reaction. The reaction is relatively slow, so chains of different lengths are present once it has been carried out for a set amount of time. When a sample is run through an LC-MS instrument, the resulting spectrum has peaks corresponding to each monomer that was released in the self-immolation reaction and to the polymer chains of different lengths that were generated as each terminal unit was released. The former data can be used to identify the structure of each monomer in a chain, and the latter data can be used to determine the order of the urethane units (Figure 3). As a result, the sequence of an oligourethane can be identified with a single measurement by mass spectrometry.
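The read-out logic can be sketched in a few lines of Python. This is a simplified illustration, not the group's analysis pipeline: it assumes that every truncated chain shows up as one clean mass, and that each released unit lowers the chain mass by a fixed, monomer-specific amount. The masses in `UNIT_MASSES` and the names `match_unit` and `read_sequence` are hypothetical.

```python
# Hypothetical mass lost per released unit (Da); illustrative values only,
# not taken from the paper.
UNIT_MASSES = {"A": 101.05, "B": 115.06, "C": 129.08, "D": 163.06}


def match_unit(delta, tolerance=0.02):
    """Return the one monomer whose mass matches `delta` within `tolerance`."""
    hits = [name for name, mass in UNIT_MASSES.items()
            if abs(mass - delta) <= tolerance]
    if len(hits) != 1:
        raise ValueError(f"no unique monomer for mass difference {delta:.3f}")
    return hits[0]


def read_sequence(chain_masses, tolerance=0.02):
    """Infer monomers in order of release from the masses of the full-length
    chain and every truncated chain observed in one spectrum."""
    ordered = sorted(chain_masses, reverse=True)   # longest chain first
    return [match_unit(big - small, tolerance)
            for big, small in zip(ordered, ordered[1:])]
```

The difference between neighboring chain masses identifies the unit that was lost at each step, and sorting the chains from longest to shortest gives the order of the units starting from the alcohol terminus.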
Next, they needed to identify 16 monomers in order to generate a hexadecimal code. Eight of the monomers are depicted in Figure 4. They chose side chains whose masses could be differentiated through mass spectrometry and that would be stable in the basic conditions needed for the self-immolation reaction. To store data in English, they needed 31 unique characters: the 26 letters of the alphabet, spaces, and punctuation marks. Because 16 monomers cannot cover 31 characters one-to-one, each character was denoted either by a single urethane monomer or by a specific pair of urethane monomers.
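One way to fit 31 characters into 16 monomers is sketched below in Python. The assignment is hypothetical and is not the mapping used in the paper: the monomers are labeled with the hex digits 0 to F, 15 characters that are assumed to be common each get a single monomer, and the remaining 16 characters get the monomer F followed by a second monomer. The choice of four punctuation marks is also an assumption.

```python
MONOMERS = "0123456789ABCDEF"   # hypothetical labels for the 16 monomers
SINGLE = " etaoinshrdlcum"      # 15 characters, one monomer each (assumed)
PAIRED = "bfgjkpqvwxyz,;:."     # 16 characters, escape monomer plus one more
ESCAPE = MONOMERS[-1]           # "F" marks the start of a two-monomer pair

ENCODE = {ch: MONOMERS[i] for i, ch in enumerate(SINGLE)}
ENCODE.update({ch: ESCAPE + MONOMERS[i] for i, ch in enumerate(PAIRED)})
DECODE = {code: ch for ch, code in ENCODE.items()}


def encode(text):
    """Translate text into a string of monomer labels."""
    return "".join(ENCODE[ch] for ch in text.lower())


def decode(monomers):
    """Translate a string of monomer labels back into text."""
    out, i = [], 0
    while i < len(monomers):
        width = 2 if monomers[i] == ESCAPE else 1   # pairs start with F
        out.append(DECODE[monomers[i:i + width]])
        i += width
    return "".join(out)
```

Because no single-monomer code starts with F, a reader always knows whether the next character spans one monomer or two, so the monomer string decodes unambiguously.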
With a polymer that could be synthesized and sequenced easily, the group decided to test whether it could actually be used to store data. They chose to encode an excerpt from Jane Austen’s novel Mansfield Park: “If one scheme of happiness fails, human nature turns to another; if the first calculation is wrong, we make a second better: we find comfort somewhere.” The text was “written” by synthesizing 18 polymers with a maximum length of 10 monomers. The order in which the polymers would be “read” was determined by their positions in a 96-well plate. To show the effectiveness of their data storage method, they had an independent group “read” the data. The collaborating group was given the 96-well plate containing the polymers, along with instructions on how to carry out the sequencing reaction and convert the resulting mass spectra to English text (Figure 5). They were able to sequence each of the polymers accurately and decipher the quote that was given to them, thus showing that oligourethanes could be used for data storage.
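The plate layout can be pictured with a short Python sketch. It assumes that an encoded message is already available as a string of monomer labels, that it is cut into chains of at most 10 monomers, and that wells are filled and read in row-major order (A1, A2, ..., H12). The summary does not state the actual reading order, so that ordering and the names `write_plate` and `read_plate` are assumptions.

```python
from string import ascii_uppercase

# The 96 wells of a standard 8 x 12 plate in row-major order: A1 ... H12.
WELLS = [f"{row}{col}" for row in ascii_uppercase[:8] for col in range(1, 13)]


def write_plate(monomers, max_len=10):
    """Cut a monomer string into chains and assign each chain to a well."""
    chunks = [monomers[i:i + max_len]
              for i in range(0, len(monomers), max_len)]
    if len(chunks) > len(WELLS):
        raise ValueError("message does not fit on one 96-well plate")
    return dict(zip(WELLS, chunks))


def read_plate(plate):
    """Concatenate the chains in well order to recover the monomer string."""
    return "".join(plate[well] for well in WELLS if well in plate)


plate = write_plate("0123456789ABCDEF" * 11)   # 176 placeholder monomers
print(len(plate), read_plate(plate) == "0123456789ABCDEF" * 11)
```

With 176 placeholder monomers, the message occupies 18 wells, and reading the wells back in the same order restores the original string.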
As silicon becomes scarce and the world’s data-storage needs skyrocket, alternative materials with data-storage capabilities need to be developed. Oligourethanes are made of abundant elements and are thus easily renewable, making them an attractive candidate for this goal.
1. Dahlhauser, S. D.; et al. Sequencing of Sequence-Defined Oligourethanes via Controlled Self-Immolation. J. Am. Chem. Soc. 2020, 142 (6), 2744–2749.