To identify CTCs, we pooled the single-cell transcriptomes of all six patients. Shared nearest neighbor (SNN) modularity-based clustering grouped the resulting 9,659 cells into 14 distinct clusters (Fig. 1C). Jaccard index analysis and silhouette width confirmed high cluster stability and consistency, particularly for the CTC cluster (Supplementary Fig. 2A and B, Supplementary Table 3). Using reference-based cell type annotation against a reference data set with 'SingleR', together with canonical marker gene expression, we identified seven distinct epithelial cell clusters; the other cell types identified were megakaryocytes, neutrophils, natural killer (NK) cells, megakaryocyte/erythroid progenitors (MEPs), common myeloid progenitors/pro-myelocytes (CMPs), and granulocyte-monocyte progenitors (GMPs) (Fig. 1D and E, Supplementary Fig. 3A and B, Supplementary Table 4). Trajectory analysis, which captures dynamic changes in gene expression, placed the epithelial cells on a branch separate from the HPCs, confirming a distinct origin for these cells (Supplementary Fig. 3C and D). Additionally, differential expression analysis between HPCs and CTCs showed that epithelial cells expressed substantially higher levels of epithelial markers, including keratins, as well as of genes related to the cell cycle (CCND1) and proliferation (SFN).
S100A2, NQO1, and ID1 were also among the top DEGs; these genes are expressed in epithelial cells, including respiratory cells, and are associated with cancer progression [9, 10]. In contrast, HPCs showed higher expression of genes associated with hematopoietic lineages, such as MPO (myeloperoxidase), DEFA3 (neutrophil defensin 3), and HBA1 and HBA2 (hemoglobin subunits alpha 1 and 2; Supplementary Fig. 3E, Supplementary Table 5). We also specifically examined endothelial and fibroblast marker genes, as these cell types can be found in peripheral blood. None of these markers was detected in either the HPC or the CTC clusters, with the exception of COL1A1, which was expressed in a subset of cells in CTC cluster 6 (Supplementary Fig. 3F). Because cells in CTC cluster 6 also expressed EPCAM and KRT19, we considered them to be of epithelial origin (Fig. 1J). Inferring copy number variations (CNVs) in epithelial cells, with HPCs as the reference, revealed notable evidence of CNVs in the epithelial cells; we therefore refer to these cells as CTCs henceforth (Fig. 1F). In total, 3,363 NSCLC-CTCs were identified. Moreover, inferCNV analysis of the CTCs with healthy lung epithelial cells as the reference likewise showed increased CNVs in CTCs (Supplementary Fig. 3G).
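The CNV-based call above rests on a simple idea: genes in an amplified or deleted genomic region tend to be expressed, on average, above or below a non-malignant reference such as HPCs. A minimal, heavily simplified sketch of that idea follows, assuming log-normalized expression stored as Python dicts and genes already ordered by position on a single chromosome. The function names `infer_cnv_scores` and `cnv_burden`, the window size, and the toy values are hypothetical; this is not the inferCNV implementation, which adds further steps such as gene filtering, clipping of extreme values, and optional HMM-based CNV calling.

```python
from statistics import mean


def infer_cnv_scores(cells, reference_cells, gene_order, window=3):
    """Hypothetical toy CNV inference (not the inferCNV algorithm).

    cells, reference_cells: lists of dicts mapping gene -> log-normalized expression.
    gene_order: genes sorted by genomic position (one chromosome, for simplicity).
    window: number of neighboring genes averaged; assumed to be odd.
    """
    # Mean expression of each gene across the reference (non-malignant) cells.
    ref_mean = {g: mean(c[g] for c in reference_cells) for g in gene_order}
    half = window // 2
    profiles = []
    for cell in cells:
        # Expression relative to the reference; in log space this is a difference.
        rel = [cell[g] - ref_mean[g] for g in gene_order]
        # Moving average along the genome; the window is truncated at the ends.
        smoothed = [
            mean(rel[max(0, i - half): i + half + 1]) for i in range(len(rel))
        ]
        profiles.append(smoothed)
    return profiles


def cnv_burden(profile):
    # Mean squared deviation from zero; larger values mean more CNV-like signal.
    return mean(x * x for x in profile)


# Toy data (hypothetical values): a gain of genes g1-g3 in one cell only.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
reference = [{g: 1.0 for g in genes} for _ in range(3)]
normal_like = {g: 1.0 for g in genes}
gain_like = {g: (2.0 if g in ("g1", "g2", "g3") else 1.0) for g in genes}

profiles = infer_cnv_scores([normal_like, gain_like], reference, genes)
burdens = [cnv_burden(p) for p in profiles]
print(burdens)  # the gain-like cell scores higher than the normal-like cell
```

Smoothing over neighboring genes is what makes the signal CNV-like: a single highly expressed gene barely shifts the windowed average, whereas a contiguous block of up- or down-shifted genes does.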
Notably, the Human Primary Cell Atlas (HPCA) reference used with SingleR does not annotate megakaryocytes as such, but rather platelets (Supplementary Fig. 3A). Since most platelets were excluded during enrichment, we tested whether cluster 8 (Fig. 1C) instead consisted of platelet-coated CTCs. The DEGs between cluster 8 ('platelets') and HPCs showed no expression of epithelial marker genes such as keratins or EPCAM, contradicting this hypothesis (Supplementary Table 6). Hence, cluster 8 was classified as megakaryocytes.
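The reasoning used to reject the platelet-coated CTC hypothesis for cluster 8 can be expressed as a simple filter over a DEG table: if none of the significantly up-regulated genes is an epithelial marker, the cluster shows no transcriptomic evidence of attached epithelial cells. The sketch below assumes the DEG table is a list of (gene, log2 fold change, adjusted p-value) tuples; the function name `epithelial_hits`, the thresholds, the regular expression for keratin genes, and the toy tables are all hypothetical and are not taken from the study.

```python
import re

# Hypothetical marker definition: keratin genes (e.g. KRT8, KRT19, KRT6A) or EPCAM.
EPITHELIAL_PATTERN = re.compile(r"^(KRT\d+[A-Z]?|EPCAM)$")


def epithelial_hits(deg_table, min_log2fc=0.5, max_padj=0.05):
    """Return epithelial markers significantly up-regulated in the tested cluster.

    deg_table: iterable of (gene, log2_fold_change, adjusted_p_value) tuples, where a
    positive fold change means higher expression in the cluster than in the comparison
    group. Thresholds are illustrative defaults, not values from the study.
    """
    return sorted(
        gene
        for gene, log2fc, padj in deg_table
        if log2fc >= min_log2fc and padj <= max_padj and EPITHELIAL_PATTERN.match(gene)
    )


# Toy DEG tables (hypothetical values, for illustration only).
platelet_like = [("GENE_A", 3.2, 1e-20), ("GENE_B", 2.1, 1e-8), ("KRT18", -0.1, 0.8)]
epithelial_like = [("KRT19", 4.0, 1e-30), ("EPCAM", 2.5, 1e-12), ("GENE_C", 1.0, 0.2)]

print(epithelial_hits(platelet_like))    # [] -> no epithelial signal
print(epithelial_hits(epithelial_like))  # ['EPCAM', 'KRT19']
```

Requiring both a fold-change and an adjusted p-value cutoff avoids counting markers that are only marginally or non-significantly higher; in practice the cutoffs would follow the study's own DEG criteria.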