Dataset

The MMScan dataset tool provides access to the data required for each MMScan task.

Usage

Initialize the dataset for a specific task with:

from mmscan import MMScan

# (1) The dataset tool
# split is one of 'train', 'val', 'test'; task is 'MMScan-VG' or 'MMScan-QA'
my_dataset = MMScan(split='train', task='MMScan-VG')
# Access a specific sample
index = 0
print(my_dataset[index])
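Because each sample is a plain dictionary, a quick way to see what a given split and task actually return is to list each key with the type of its value. The following is a minimal sketch; `describe_sample` and `toy_sample` are hypothetical names introduced here for illustration, and the toy dictionary only stands in for `my_dataset[index]`.

```python
def describe_sample(sample):
    """Map each key of a sample dictionary to the type name of its value."""
    return {key: type(value).__name__ for key, value in sample.items()}


# Hypothetical stand-in for my_dataset[index]; real samples hold more keys.
toy_sample = {
    "scan_id": "scene_0",  # placeholder value, not a real scan ID
    "text": "the chair near the table",
    "target_id": [3],
}

for key, type_name in describe_sample(toy_sample).items():
    print(f"{key}: {type_name}")
```

With the package installed, the same helper could be applied as `describe_sample(my_dataset[0])`.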

Data Access

Each dataset item is a dictionary containing the elements listed below. Bold text marks elements directly related to the task; the remaining elements are auxiliary. [TS-V] marks an element that is visible in the test set; all other elements are invalid or not visible in the test set.

3D Modality
1. "pcds" ([TS-V], np.ndarray): Point cloud data with dimensions [n_points, 6(xyz+rgb)], representing the coordinates and color of each point.
2. "bboxes" (dict): Information about bounding boxes within the scan, structured as { object ID: { "type": object type (str), "bbox": 9 DoF box (np.ndarray) }}
3. "ori_pcds" ([TS-V], tuple[tensor]): Original point cloud data extracted from the .pth file.
4. "instance_labels" (np.ndarray): Instance ID assigned to each point in the point cloud.
5. "class_labels" (np.ndarray): Class ID assigned to each point in the point cloud.
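The 3D elements above can be combined, for example to count how many points belong to each annotated object. The sketch below assumes that the values in "instance_labels" use the same object IDs as the keys of "bboxes"; the source does not state this explicitly. The helper names and the toy sample are hypothetical, and plain lists stand in for the np.ndarray values.

```python
from collections import Counter


def instance_point_counts(instance_labels):
    """Count points per instance ID; works for lists and 1-D np.ndarray."""
    return Counter(int(label) for label in instance_labels)


def describe_instances(sample):
    """Return {object ID: (object type, number of points)} for each box.

    Assumption: "instance_labels" values match the keys of "bboxes".
    """
    counts = instance_point_counts(sample["instance_labels"])
    return {
        obj_id: (info["type"], counts.get(obj_id, 0))
        for obj_id, info in sample["bboxes"].items()
    }


# Hypothetical toy sample: five points, two annotated objects.
toy_sample = {
    "instance_labels": [1, 1, 2, 1, 0],
    "bboxes": {
        1: {"type": "chair", "bbox": [0.0] * 9},
        2: {"type": "table", "bbox": [0.0] * 9},
    },
}

print(describe_instances(toy_sample))
```

When "pcds" and "instance_labels" are NumPy arrays of matching length, the points of one object can be selected with a boolean mask such as `pcds[instance_labels == obj_id]`.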
Language Modality
- Category and ID info
  1. "sub_class": The category of the sample.
  2. "ID": The sample's ID.
  3. "scan_id": The scan's ID.
- For Visual Grounding task
  1. "target_id" (list[int]): IDs of target objects.
  2. "text" ([TS-V], str): Text used for grounding.
  3. "target" (list[str]): Text prompt to specify the target grounding object.
  4. "anchors" (list[str]): Types of anchor objects.
  5. "anchor_ids" (list[int]): IDs of anchor objects.
  6. "tokens_positive" (dict): Indices of positions where mentioned objects appear in the text.
- For Question Answering task
  1. "question" ([TS-V], str): The text of the question.
  2. "answers" (list[str]): List of possible answers.
  3. "object_ids" (list[int]): Object IDs referenced in the question.
  4. "object_names" (list[str]): Types of referenced objects.
  5. "input_bboxes_id" ([TS-V], list[int]): IDs of input bounding boxes.
  6. "input_bboxes" ([TS-V], list[np.ndarray]): Input 9-DoF bounding boxes.
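The ID lists in the language modality ("target_id", "anchor_ids", "object_ids") refer to objects in the scan, so they can be resolved against the "bboxes" dictionary from the 3D modality. The following is a minimal sketch under the assumption that these IDs are the same keys used in "bboxes"; `lookup_boxes` and the toy sample are hypothetical. Since "bboxes" is not marked [TS-V], this only applies to splits where it is available.

```python
def lookup_boxes(sample, id_key):
    """Resolve a list of object IDs to (ID, type, 9-DoF box) tuples.

    IDs without an entry in "bboxes" are skipped.
    Assumption: the IDs under `id_key` are keys of sample["bboxes"].
    """
    boxes = sample["bboxes"]
    return [
        (obj_id, boxes[obj_id]["type"], boxes[obj_id]["bbox"])
        for obj_id in sample[id_key]
        if obj_id in boxes
    ]


# Hypothetical toy Visual Grounding sample; lists stand in for np.ndarray boxes.
toy_sample = {
    "text": "the table next to the chair",
    "target_id": [2],
    "anchor_ids": [1, 7],  # 7 has no box in this toy scan
    "bboxes": {
        1: {"type": "chair", "bbox": [0.0] * 9},
        2: {"type": "table", "bbox": [1.0] * 9},
    },
}

print(lookup_boxes(toy_sample, "target_id"))
print(lookup_boxes(toy_sample, "anchor_ids"))
```

For a Question Answering sample, the same helper could be called with `id_key="object_ids"`.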
2D Modality
1. "img_path" ([TS-V], str): File path to the RGB image.
2. "depth_img_path" ([TS-V], str): File path to the depth image.
3. "intrinsic" ([TS-V], np.ndarray): Intrinsic parameters of the camera for RGB images.
4. "depth_intrinsic" ([TS-V], np.ndarray): Intrinsic parameters of the camera for depth images.
5. "extrinsic" ([TS-V], np.ndarray): Extrinsic parameters of the camera.
6. "visible_instance_id" (list): IDs of visible objects in the image.
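To check that a model relies only on inputs that exist at test time, the [TS-V] markers in the lists above can be collected into a set and used to filter a training or validation sample. This is a sketch, not part of the MMScan API: `TEST_VISIBLE_KEYS` and `keep_test_visible` are hypothetical names, and the code assumes that every listed element is a top-level key of the sample dictionary.

```python
# Keys marked [TS-V] in the lists above (transcribed by hand).
TEST_VISIBLE_KEYS = frozenset({
    "pcds", "ori_pcds",                          # 3D modality
    "text",                                      # Visual Grounding
    "question", "input_bboxes_id", "input_bboxes",  # Question Answering
    "img_path", "depth_img_path", "intrinsic",   # 2D modality
    "depth_intrinsic", "extrinsic",
})


def keep_test_visible(sample):
    """Return a copy of the sample restricted to [TS-V] keys.

    Assumption: all elements are top-level keys of the sample dictionary.
    """
    return {key: value for key, value in sample.items()
            if key in TEST_VISIBLE_KEYS}


# Hypothetical toy Question Answering sample.
toy_sample = {
    "question": "How many chairs are there?",
    "answers": ["two"],
    "input_bboxes_id": [4],
    "scan_id": "scene_0",  # placeholder value, not a real scan ID
}

print(keep_test_visible(toy_sample))
```

Here "answers" and "scan_id" are dropped because neither carries a [TS-V] marker in the lists above.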

