BMIRDS Datasets

Dartmouth Lung Cancer Histology Dataset

This dataset comprises 143 hematoxylin and eosin (H&E)-stained, formalin-fixed, paraffin-embedded (FFPE) whole-slide images of lung adenocarcinoma from the Department of Pathology and Laboratory Medicine at Dartmouth-Hitchcock Medical Center (DHMC). The dataset is de-identified and released with permission from the Dartmouth-Hitchcock Health (D-HH) Institutional Review Board (IRB). Each whole-slide image is labeled with the predominant pattern of lung adenocarcinoma, according to the consensus opinion of three pathologists at the Department of Pathology and Laboratory Medicine at DHMC: Drs. Laura Tafe, Yevgeniy Linnik, and Louis Vaickus. For more information about this dataset, please refer to “Pathologist-level Classification of Histologic Patterns on Resected Lung Adenocarcinoma Slides with Deep Neural Networks”.

Lung Adenocarcinoma Classification

Classification of histological patterns in lung adenocarcinoma is critical for determining tumor grade and treatment. However, this task is often challenging due to the heterogeneous nature of lung adenocarcinoma and the subjective criteria for evaluation. This dataset and its associated annotations aim to foster collaboration with the research community and facilitate developing and evaluating new methodologies for accurate histology image analysis in this domain. Classes in our dataset indicate the predominant histological pattern of each whole-slide image and are as follows:

  • Lepidic
  • Acinar
  • Papillary
  • Micropapillary
  • Solid

Sample Whole-Slide Images

[Figure: sample whole-slide images]

Dataset Description

The dataset includes:

  • DHMC_wsi_1.zip - (Images 1-39, 16.2 GB)
  • DHMC_wsi_2.zip - (Images 40-79, 13.18 GB)
  • DHMC_wsi_3.zip - (Images 80-119, 13.96 GB)
  • DHMC_wsi_4.zip - (Images 120-143, 6.7 GB)
  • MetaData.csv
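When only a few slides are needed, it can help to know which archive to download. The following is a minimal sketch that encodes the archive ranges and sizes listed above. The helper names `archive_for_image` and `total_size_gb` are hypothetical, and the lookup takes an image number rather than a file name, because this README does not state how image numbers appear in the file names inside each archive.

```python
# Archive name, first image number, last image number, and size in GB,
# copied from the list in this README.
ARCHIVES = [
    ("DHMC_wsi_1.zip", 1, 39, 16.2),
    ("DHMC_wsi_2.zip", 40, 79, 13.18),
    ("DHMC_wsi_3.zip", 80, 119, 13.96),
    ("DHMC_wsi_4.zip", 120, 143, 6.7),
]


def archive_for_image(image_number: int) -> str:
    """Return the zip archive whose listed range contains image_number."""
    for name, first, last, _size_gb in ARCHIVES:
        if first <= image_number <= last:
            return name
    raise ValueError(f"image number {image_number} is outside 1-143")


def total_size_gb() -> float:
    """Return the combined size of the four archives, rounded to 2 decimals."""
    return round(sum(size_gb for _, _, _, size_gb in ARCHIVES), 2)
```

The listed sizes add up to roughly 50 GB, which is worth checking against available disk space before downloading all four archives.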

Each zip file contains whole-slide images in .tif format. The slides were scanned with an Aperio AT2 whole-slide scanner at 20x or 40x magnification and converted to generic tiled pyramidal TIFF format using libvips. The list of scanned slides, along with their classes, magnification, and other details, is available in MetaData.csv.
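A quick way to inspect the metadata is to tally slides by one of its columns, for example by class or by magnification. The sketch below uses only the Python standard library. The function name `count_slides_by` is hypothetical, and the column name is passed in as a parameter because the exact header names in MetaData.csv are not listed in this README; check the first row of the file before choosing one.

```python
import csv
from collections import Counter
from typing import TextIO


def count_slides_by(handle: TextIO, column: str) -> Counter:
    """Count the rows of a CSV file grouped by the values in one column.

    `column` must match a header name in the file exactly. The real header
    names of MetaData.csv are an assumption left to the caller.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in {reader.fieldnames}")
    # Treat missing cells as empty strings so that short rows do not fail.
    return Counter((row[column] or "").strip() for row in reader)
```

To use it, open MetaData.csv with `open("MetaData.csv", newline="")` and pass the file handle together with the name of the column that holds the class labels. If the header names are as expected, the counts should sum to 143.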

Code Repository

DeepSlide, our open-source framework for histology image analysis in PyTorch, is available to develop deep learning models for whole-slide image classification.

Accessing Dataset

Please fill out the form below to receive the links to download the dataset by email.

Citation

If you use this dataset, please cite the corresponding paper:

Jason Wei, Laura Tafe, Yevgeniy Linnik, Louis Vaickus, Naofumi Tomita, Saeed Hassanpour, "Pathologist-level Classification of Histologic Patterns on Resected Lung Adenocarcinoma Slides with Deep Neural Networks", Scientific Reports, 9:3358 (2019).

FAQ

“I haven’t received any email after submitting the form.”

Please check your Junk/Spam folder in case the email was delivered there instead of your inbox.

If you still cannot find the email, please wait a few hours and submit the form again.

By default, the download links expire after 4 hours. If your links have expired, submit the form again to receive new ones, and finish downloading the data before they expire.




For inquiries, please contact us at BMIRDS.

If you are interested in histology image analysis, please check out other datasets from our group.