Why do we need standardised Cell Line Names?

In the last decade, the number of human embryonic stem cell (ESC) and human induced pluripotent stem cell (iPSC) lines has increased dramatically. Because there are no guidelines for naming cell lines, the generator of a new line chooses its name individually. Consequently, the accelerated generation of stem cell lines has been accompanied by confusion and duplication of cell line names and cell identities. Names are not even used consistently in publications, and deviations from the original name are not rare, so it is increasingly unclear which cell lines were used in specific studies. In addition, semantic errors, for example inaccurate spelling or the use of special characters or spaces, can cause failed literature searches, and these errors propagate into references, registries and other databases.

To solve these issues and to follow up on the "Call for Standardized Naming and Reporting of Human ESC and iPSC Lines" (1), a tool was created that uses identifying information to generate a unique cell line name. We developed and implemented such a nomenclature for the European human pluripotent stem cell registry (hPSCreg), which needs to: 1) unambiguously identify a registered cell line; 2) allow tracing of subclones of a particular line; and 3) enable the assignment of different lines to a specific donor origin. As it is not feasible to generate this standardised name manually for the many thousands of lines available, an automatic tool was implemented that assigns a unique, traceable name to a cell line as soon as the nomenclature-relevant information is registered.

The Nomenclature

We broadly follow the recommendation of Luong et al. for naming human stem cell lines. The automatically generated, standardised name is limited to 16 characters. This restriction keeps the name readable and manageable, not only for stem cell banks and literature searches but also in daily lab use. All data required to generate the name are submitted to the hPSCreg.eu database during cell line registration, and the tool converts these data into a unique and understandable cell line name, answering the call for standardisation.

Table 1 gives an overview of the composition and structure of the generated cell line name. The components of the name are explained below and summarised in Table 2.

The section Automatic Generation of the Cell Line Names describes how the name is generated automatically, how the naming can be influenced and what to bear in mind during registration to ensure that a cell line is named uniquely.

Structure:    XXXXXX | i | 001 | -A | (-1)
Example:      UKB | i | 001 | -A | -1
Explanation:  Universitaets Klinikum Bonn | iPSC | first donor | donor's first cell line | subclone

Table 1: Structure and example of a cell line name. The name consists of the institution, the cell line type, the unique donor ID and a letter that is incremented with each additional cell line generated from the same donor. If a subclone has been generated, the name is additionally extended by a hyphen and the subclone number, starting with 1.

The name starts with the acronym of the institution that generated the cell line. The acronym consists of at least two and at most six uppercase letters; how it is defined is explained below. Including the generator in the name makes the origin of the cell line traceable and thus simplifies access to information on the provenance of the pluripotent cell line.

The institutional acronym is followed by the cell type: an "i" for induced or an "e" for embryonic pluripotent stem cell lines. This single letter allows induced and embryonic stem cell lines to be distinguished at a glance.

The most complex part of generating an exclusive cell line name is the donor ID: it must be unique for each donor, and the donor's identity must of course remain anonymous. The donor ID is tied to the institutional origin (generator) and consists of a three-character alphanumeric code that starts with the decimal numbers 001-999, as shown in the example (Table 1). After 999, the donor ID continues alphanumerically from 00A to ZZZ, skipping codes that were already used in the numeric range. With 36 possible characters (0-9 and A-Z) per position, this gives 36^3 - 1 = 46,655 different donor codes (only 000 is never used). This systematic numbering reflects the temporal order in which material from donors has been used by a generator's laboratory or institution to generate (or register) cell lines. How the donor ID is assigned is explained in detail below.
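The numbering scheme can be sketched as a mapping from a generator's n-th donor to its code. This is a minimal Python sketch, not hPSCreg's implementation: it assumes that the alphanumeric range is walked in base-36 order with digits before letters (00A, 00B, ..., 00Z, 01A, ...), skipping every purely numeric code. The helper names `to_base36`, `donor_codes` and `donor_code` are hypothetical.

```python
from __future__ import annotations

import string
from functools import lru_cache

# Assumed character order for the alphanumeric range: digits before uppercase letters.
ALPHABET = string.digits + string.ascii_uppercase


def to_base36(value: int, width: int) -> str:
    """Render a non-negative integer as a fixed-width base-36 string (0-9, A-Z)."""
    chars = []
    for _ in range(width):
        value, remainder = divmod(value, 36)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


@lru_cache(maxsize=None)
def donor_codes() -> tuple[str, ...]:
    """All donor codes in (assumed) assignment order: 001-999, then 00A-ZZZ."""
    numeric = [f"{n:03d}" for n in range(1, 1000)]
    alphanumeric = [
        code
        for code in (to_base36(v, 3) for v in range(36**3))
        # Skip 000-999: they are either already used above or never used (000).
        if not code.isdigit()
    ]
    return tuple(numeric + alphanumeric)


def donor_code(index: int) -> str:
    """Return the code of a generator's index-th donor (1-based)."""
    codes = donor_codes()
    if not 1 <= index <= len(codes):
        raise ValueError(f"donor index must be between 1 and {len(codes)}")
    return codes[index - 1]


print(donor_code(1), donor_code(999), donor_code(1000), donor_code(46655))  # → 001 999 00A ZZZ
```

Enumerating every code this way reproduces the count of 46,655 donor codes per generator stated above.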

The subsequent part, the clone number, is preceded by a hyphen and counts the cell lines generated from the same donor. It uses one or two letters (A-Z, then AA-ZZ) and is therefore currently limited to 26 + 26^2 = 702 cell lines per donor.
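A one- or two-letter clone code can be derived from the position of a line among all lines from the same donor. The sketch below assumes spreadsheet-column order (bijective base 26: A-Z, then AA, AB, ..., ZZ), which yields exactly 702 codes; the function name `clone_code` is hypothetical.

```python
def clone_code(line_number: int) -> str:
    """Map the n-th cell line from one donor (1-based) to its clone letters.

    Assumes bijective base-26 order: A-Z, then AA, AB, ..., ZZ.
    """
    if not 1 <= line_number <= 26 + 26**2:
        raise ValueError("only 702 cell lines per donor can be named")
    letters = ""
    n = line_number
    while n > 0:
        # Shift to 0-based before dividing so that 26 maps to "Z", not "A0".
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


print([clone_code(n) for n in (1, 2, 26, 27, 702)])  # → ['A', 'B', 'Z', 'AA', 'ZZ']
```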

The final element of the standardised name indicates whether a cell line is a subclone of another cell line. Subclones include, for example, transgenic lines and isogenic clones, or lines that arise spontaneously from stable karyotype changes during cultivation. To indicate a subclone, a hyphen followed by a one- or two-character alphanumeric subclone code is appended to the name. As with the donor ID, the decimal numbers 1-99 are used first, continuing alphanumerically with A-ZZ. To maximise the number of possible codes, single-character codes and codes starting with "0" (e.g. 1 and 01) are treated as different, but codes starting with "0" are used only after all other codes have been assigned. This leads to 1,330 possible subclone codes. For the example in Table 1, the first subclone derived from UKBi001-A would thus be named UKBi001-A-1. Because the subclone name is necessarily an extension of the parental cell line name, certain aspects need to be considered when registering a subclone; for a detailed description, please refer to the section below.
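The text fixes the size of the subclone code space but not the exact assignment order. One enumeration consistent with it takes all one- and two-character strings over 0-9 and A-Z (36 + 1,296 = 1,332) and removes "0" and "00", which leaves exactly 1,330 codes. The three-phase order below (1-99, then codes containing a letter, then codes with a leading "0") is an assumption, and `subclone_codes` is a hypothetical helper.

```python
import string

# Assumed character order: digits before uppercase letters.
ALPHABET = string.digits + string.ascii_uppercase


def subclone_codes() -> list[str]:
    """List all subclone codes in one assumed assignment order."""
    singles = list(ALPHABET)
    doubles = [a + b for a in ALPHABET for b in ALPHABET]
    # Phase 1: decimal numbers 1-99 without leading zeros.
    phase1 = [str(n) for n in range(1, 100)]
    # Phase 2: codes containing a letter that do not start with "0": A-Z, then 1A ... ZZ.
    phase2 = [c for c in singles + doubles if not c.isdigit() and not c.startswith("0")]
    # Phase 3: codes with a leading "0" (01-09, 0A-0Z), used only once all others are taken.
    phase3 = [c for c in doubles if c.startswith("0") and c != "00"]
    return phase1 + phase2 + phase3


codes = subclone_codes()
print(len(codes), codes[:3], codes[99], codes[-1])  # → 1330 ['1', '2', '3'] A 0Z
```

Whatever the real ordering is, the exclusion of only "0" and "00" is what makes the total come out at 1,330.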

Component | No. of characters | Example | Comments
Generator acronym | 2-6 | XXXXXX | Community wish
Cell line type | 1 | "e" or "i" | "e": embryonic stem cell; "i": induced pluripotent stem cell
Donor ID | 3 | 001 | Alphanumeric; limited to 46,655 donors per generator
Clone number | 2-3 | -A | Alphabetic; preceded by hyphen; limited to 702 lines per donor
Subclone identifier (only if applicable) | 0-3 | -1 | Alphanumeric; preceded by hyphen; limited to 1,330 subclones per line
Total | 8-16 | | Minimum length 8 (2-character institutional abbreviation, no subclone)

Table 2: Overview of the composition and structure of the nomenclature
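The structure in Table 2 translates directly into a pattern that splits a name into its components. This is a minimal sketch assembled from the table, not a validator published by hPSCreg; `NAME_PATTERN` and `parse_cell_line_name` are hypothetical names, and rejecting the codes 000, 0 and 00 is inferred from the stated counts.

```python
from __future__ import annotations

import re

# Pattern assembled from Table 2 (an assumption, not hPSCreg's own validator).
NAME_PATTERN = re.compile(
    r"(?P<generator>[A-Z]{2,6})"          # institutional acronym, 2-6 uppercase letters
    r"(?P<cell_type>[ie])"                # "i" = iPSC, "e" = ESC
    r"(?P<donor>[0-9A-Z]{3})"             # donor ID: 001-999, then 00A-ZZZ
    r"-(?P<clone>[A-Z]{1,2})"             # hyphen + clone letters A-ZZ
    r"(?:-(?P<subclone>[0-9A-Z]{1,2}))?"  # optional hyphen + subclone code
)


def parse_cell_line_name(name: str) -> dict[str, str | None]:
    """Split a standardised name into its components or raise ValueError."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a standardised cell line name: {name!r}")
    parts = match.groupdict()
    # 000, 0 and 00 are excluded by the counts 46,655 and 1,330 given above.
    if parts["donor"] == "000" or parts["subclone"] in ("0", "00"):
        raise ValueError(f"reserved code in {name!r}")
    return parts


print(parse_cell_line_name("UKBi001-A-1"))
# → {'generator': 'UKB', 'cell_type': 'i', 'donor': '001', 'clone': 'A', 'subclone': '1'}
```

Because every component has a fixed character class and length range, the pattern also enforces the overall length of 8 to 16 characters.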

Automatic Generation of the Cell Line Names

This section describes which fields and information in the hPSCreg.eu naming tool are used to automatically generate a unique, standardised cell line name.

Before a new stem cell line can be registered, the cell line type and the generator of the cell line have to be defined to start the registration form. This sets two crucial elements of the name: the generator and the cell line type, indicated by "i" or "e". The generator is the institution that generated the cell line and is represented in the first part of the name by a unique acronym. This acronym is limited to six characters and is defined by the person entering the new cell line; it is checked after submission to prevent duplication and to ensure unique names.

When registering a cell line derived from an already registered donor (e.g. a second line from the same donor) or a subclone, it is important to always select the generator of the existing line or of the parental cell line, respectively. Contact details and the department or working group involved can be entered separately and have no impact on the name. Although irrelevant for naming, this information enables users to contact the generator of a secondary line from a pre-existing donor if that generator differs from the generator of the parental cell line.

Following this procedure ensures that the correct donor ID for the generator can be determined. The donor ID is a three-character alphanumeric code that follows the cell type symbol. It is set by answering the registration question "Does a comparator cell line exist?". If the answer is "Comparator line" or "No comparator cell line", a new donor ID is generated by incrementing the ID of the latest cell line registered by the selected generator. If the answer is "Comparator line from the same donor", the donor ID is retained and the next clone number is assigned to the new cell line.

If the answer to the question "Does a comparator cell line exist?" is "Subclone", a parental cell line has to be selected. Subclones are named after their parental cell line, followed by a hyphen and the subclone number (Table 3).

Characteristic of the cell line naming | Example
Cell lines from different donors | UKBi001-A, UKBi002-A
Cell lines sharing one donor | UKBi001-A, UKBi001-B
Parental cell line and its subclone | UKBi001-A, UKBi001-A-1

Table 3: Characteristics of cell line naming and examples
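The decision rules for the question "Does a comparator cell line exist?" can be sketched as a small registration function that reproduces the examples in Table 3. This is a hypothetical, simplified illustration: `ToyRegistry` and its counters stand in for the registry database, only the answer strings come from the registration form, and the numbering ranges are deliberately truncated to donor IDs 001-999, clone letters A-Z and subclones 1-99.

```python
from __future__ import annotations

import string
from dataclasses import dataclass, field

NEW_DONOR = {"Comparator line", "No comparator cell line"}
SAME_DONOR = "Comparator line from the same donor"
SUBCLONE = "Subclone"


@dataclass
class ToyRegistry:
    """Hypothetical in-memory stand-in for the registry (not hPSCreg's data model)."""

    last_donor: dict[str, int] = field(default_factory=dict)       # generator -> last donor number
    lines_per_donor: dict[str, int] = field(default_factory=dict)  # e.g. "UKBi001" -> 2
    subclones: dict[str, int] = field(default_factory=dict)        # parental name -> count

    def _next_line(self, donor_prefix: str) -> str:
        count = self.lines_per_donor.get(donor_prefix, 0) + 1
        if count > 26:
            raise NotImplementedError("sketch only covers clone letters A-Z")
        self.lines_per_donor[donor_prefix] = count
        return f"{donor_prefix}-{string.ascii_uppercase[count - 1]}"

    def register(self, generator: str, cell_type: str, answer: str,
                 related: str | None = None) -> str:
        """Return the name of a new line; `related` is the comparator or parental line."""
        if answer in NEW_DONOR:
            # New donor: increment the latest donor ID of the selected generator.
            number = self.last_donor.get(generator, 0) + 1
            if number > 999:
                raise NotImplementedError("sketch only covers donor IDs 001-999")
            self.last_donor[generator] = number
            return self._next_line(f"{generator}{cell_type}{number:03d}")
        # Same donor or subclone: the generator of the existing line must be selected.
        donor_prefix = (related or "").split("-")[0]  # e.g. "UKBi001"
        if donor_prefix[:-4] != generator:
            raise ValueError("select the existing or parental line of this generator")
        if answer == SAME_DONOR:
            # Keep the donor ID and assign the next clone letter.
            return self._next_line(donor_prefix)
        if answer == SUBCLONE:
            # Parental name + hyphen + next subclone number.
            count = self.subclones.get(related, 0) + 1
            if count > 99:
                raise NotImplementedError("sketch only covers subclone numbers 1-99")
            self.subclones[related] = count
            return f"{related}-{count}"
        raise ValueError(f"unknown answer: {answer!r}")


registry = ToyRegistry()
print(registry.register("UKB", "i", "No comparator cell line"))                           # UKBi001-A
print(registry.register("UKB", "i", "No comparator cell line"))                           # UKBi002-A
print(registry.register("UKB", "i", "Comparator line from the same donor", "UKBi001-A"))  # UKBi001-B
print(registry.register("UKB", "i", "Subclone", "UKBi001-A"))                             # UKBi001-A-1
```

Checking that the selected generator matches the related line mirrors the requirement above to always select the generator of the existing or parental cell line.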

In summary, the naming system implemented in the hPSCreg.eu cell line registry theoretically allows 321,272,380 cell line generators to name 46,655 different donors each, with 702 different cell lines per donor and type (iPSC or ESC) and 1,330 subclones per cell line. This provides more than 87 billion cell line names per generator and over 28 quintillion (1 quintillion = 10^18) names altogether. Furthermore, the standardised name provides information about a cell line's provenance and allows traceability and comparability of the nomenclature-specific information. Finally, an unambiguous name will be most useful to support the work of national and international stem cell banks and registries worldwide and to promote transparency, consistency and accessibility of human pluripotent stem cell lines.
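The capacity figures can be reproduced from the limits of each component. In this sketch the totals assume that donor IDs and clone letters are counted separately for each cell type and that every line contributes its own name plus one name per subclone; counting subclone names only gives about 2.799 × 10^19, just under 28 quintillion.

```python
# Counting sketch for the capacity figures; character-set sizes per component.
LETTERS, ALPHANUMERIC = 26, 36

generators = sum(LETTERS**k for k in range(2, 7))        # acronyms of 2-6 uppercase letters
donors = ALPHANUMERIC**3 - 1                             # 001-999 and 00A-ZZZ; 000 is never used
lines_per_donor = LETTERS + LETTERS**2                   # A-Z, then AA-ZZ
subclones_per_line = ALPHANUMERIC + ALPHANUMERIC**2 - 2  # 1-2 characters except "0" and "00"
cell_types = 2                                           # "i" and "e"

# Assumption: each line contributes its own name plus one name per subclone.
names_per_line = 1 + subclones_per_line
per_generator = cell_types * donors * lines_per_donor * names_per_line
total = generators * per_generator

print(f"{generators:,} generators")
print(f"{per_generator:,} names per generator")
print(f"{total:.3e} names in total")
```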

Using the API to Generate Cell Line Names

hPSCreg provides a REST API for generating cell line names programmatically. The required data are sent as JSON, and users are identified via HTTP authentication.
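A client for such an API might look like the following minimal sketch using only the Python standard library. Everything specific here is an assumption: the endpoint URL is a placeholder, the payload field names are hypothetical, and HTTP Basic authentication is assumed because the text only says "HTTP authentication". The actual endpoint, fields and response format have to be taken from the hPSCreg API documentation.

```python
from __future__ import annotations

import base64
import json
import urllib.request

# Placeholder only: not the real hPSCreg endpoint.
HYPOTHETICAL_URL = "https://example.org/hypothetical-naming-endpoint"


def build_name_request(url: str, username: str, password: str,
                       payload: dict) -> urllib.request.Request:
    """Build (without sending) a JSON POST request using HTTP Basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def send_name_request(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Hypothetical payload mirroring the registration inputs described above.
request = build_name_request(
    HYPOTHETICAL_URL, "alice", "secret",
    {"generator": "UKB", "cell_type": "i", "comparator": "No comparator cell line"},
)
print(request.get_method(), request.full_url)
```

Basic credentials are only base64-encoded, not encrypted, so such a request should only ever be sent over HTTPS.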