Dataset

The MMScan dataset tool provides access to the data required by each MMScan task.

Usage

Initialize the dataset for a specific task with:

from mmscan import MMScan

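# Create the dataset tool for one split and one task.
# split: 'train', 'val', or 'test'; task: 'MMScan-VG' or 'MMScan-QA'
my_dataset = MMScan(split='train', task='MMScan-VG')

# Access a specific sample by its integer index
index = 0
print(my_dataset[index])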
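
To check which elements a given sample actually carries, a small helper can be applied to my_dataset[index]. The following is a minimal sketch: summarize_sample and the mock values are hypothetical and not part of the MMScan API. It is exercised on a stand-in dictionary so that it runs without the dataset installed.

```python
def summarize_sample(sample):
    """Map each key of a sample dictionary to a short type/shape description."""
    summary = {}
    for key, value in sample.items():
        shape = getattr(value, "shape", None)
        if shape is not None:
            # Array-like values (e.g. np.ndarray) report their shape.
            summary[key] = f"{type(value).__name__}{tuple(shape)}"
        elif isinstance(value, (list, tuple, dict, str)):
            summary[key] = f"{type(value).__name__}(len={len(value)})"
        else:
            summary[key] = type(value).__name__
    return summary


# Hypothetical stand-in for my_dataset[index]; real samples hold more keys.
mock_sample = {"scan_id": "scene_0000", "target_id": [3, 7], "ID": 12}
for key, description in summarize_sample(mock_sample).items():
    print(f"{key}: {description}")
```

With the real dataset, the call would be summarize_sample(my_dataset[index]).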

Data Access

Each dataset item is a dictionary of elements. Bold text marks the key elements directly related to the task; the remaining elements are auxiliary. Elements tagged [TS-V] are visible in the test set; all other elements are invalid or not visible there.

  • 3D Modality

    1. "pcds" ([TS-V], np.ndarray): Point cloud data with dimensions [n_points, 6(xyz+rgb)], representing the coordinates and color of each point.

    2. "bboxes" (dict): Bounding boxes within the scan, structured as { object ID: { "type": object type (str), "bbox": 9-DoF box (np.ndarray) } }

    3. "ori_pcds" ([TS-V], tuple[tensor]): Original point cloud data extracted from the .pth file.

    4. "instance_labels" (np.ndarray): Instance ID assigned to each point in the point cloud.

    5. "class_labels" (np.ndarray): Class IDs assigned to each point in the point cloud.
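
The 3D elements above can be combined with plain NumPy indexing. The sketch below assumes that the six columns of "pcds" are ordered xyz first and rgb second, and that "instance_labels" has one entry per point; the helper names and the synthetic arrays are hypothetical stand-ins, not MMScan functions.

```python
import numpy as np


def split_point_cloud(pcds):
    """Split an [n_points, 6] array into xyz coordinates and rgb colors."""
    pcds = np.asarray(pcds)
    if pcds.ndim != 2 or pcds.shape[1] != 6:
        raise ValueError(f"expected shape [n_points, 6], got {pcds.shape}")
    # Assumption: columns 0-2 are xyz, columns 3-5 are rgb.
    return pcds[:, :3], pcds[:, 3:]


def points_of_instance(pcds, instance_labels, instance_id):
    """Return the rows of pcds whose per-point instance label equals instance_id."""
    mask = np.asarray(instance_labels) == instance_id
    return np.asarray(pcds)[mask]


# Synthetic stand-ins for sample["pcds"] and sample["instance_labels"].
mock_pcds = np.arange(24, dtype=np.float32).reshape(4, 6)
mock_instance_labels = np.array([0, 1, 1, 2])

xyz, rgb = split_point_cloud(mock_pcds)
instance_points = points_of_instance(mock_pcds, mock_instance_labels, 1)
print(xyz.shape, rgb.shape, instance_points.shape)
```

Since "instance_labels" is not tagged [TS-V], the per-instance selection only applies to splits where that element is available.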

  • Language Modality

    • Category and ID info

      1. "sub_class": The category of the sample.

      2. "ID": The sample's ID.

      3. "scan_id": The scan's ID.

    • For Visual Grounding task

      1. "target_id" (list[int]): IDs of target objects.

      2. "text" ([TS-V], str): Text used for grounding.

      3. "target" (list[str]): Text prompt to specify the target grounding object.

      4. "anchors" (list[str]): Types of anchor objects.

      5. "anchor_ids" (list[int]): IDs of anchor objects.

      6. "tokens_positive" (dict): Indices of positions where mentioned objects appear in the text.
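
For visual grounding, the ground-truth boxes of a sample can be looked up by joining "target_id" with "bboxes". The sketch below is illustrative only: collect_target_boxes is a hypothetical helper, the mock sample uses plain lists in place of np.ndarray boxes, and how the tool represents elements that are not visible in the test set (missing key or empty value) is an assumption, so both cases are handled.

```python
def collect_target_boxes(sample):
    """Return {object ID: 9-DoF box} for the grounding targets of one sample.

    Returns an empty dict when the annotations are absent, which is assumed
    to be the case on the test split.
    """
    bboxes = sample.get("bboxes") or {}
    target_ids = sample.get("target_id") or []
    return {
        obj_id: bboxes[obj_id]["bbox"]
        for obj_id in target_ids
        if obj_id in bboxes
    }


# Hypothetical stand-in for a training sample; boxes are lists of 9 floats here.
mock_vg_sample = {
    "text": "the chair next to the table",
    "target_id": [3],
    "bboxes": {
        3: {"type": "chair", "bbox": [0.0] * 9},
        5: {"type": "table", "bbox": [1.0] * 9},
    },
}

target_boxes = collect_target_boxes(mock_vg_sample)
print(sorted(target_boxes))
```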

    • For Question Answering task

      1. "question" ([TS-V], str): The text of the question.

      2. "answers" (list[str]): List of possible answers.

      3. "object_ids" (list[int]): Object IDs referenced in the question.

      4. "object_names" (list[str]): Types of referenced objects.

      5. "input_bboxes_id" ([TS-V], list[int]): IDs of input bounding boxes.

      6. "input_bboxes" ([TS-V], list[np.ndarray]): Input 9-DoF bounding boxes.
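
For question answering, a sample can be reduced to the fields a model consumes plus the reference answers when they exist. This is a minimal sketch under the same assumptions as above: build_qa_record and its output field names are hypothetical, and elements without the [TS-V] tag are treated as possibly missing or empty.

```python
def build_qa_record(sample):
    """Collect the model inputs and, when available, the reference answers."""
    answers = sample.get("answers") or []
    return {
        "ID": sample.get("ID"),
        "question": sample["question"],
        # Input boxes are tagged [TS-V], but default to an empty list anyway.
        "input_bboxes_id": list(sample.get("input_bboxes_id") or []),
        "reference_answers": list(answers),
        "has_ground_truth": bool(answers),
    }


# Hypothetical stand-ins for a training sample and a test sample.
mock_train_sample = {
    "ID": "qa_0",
    "question": "What is on the table?",
    "answers": ["a lamp", "lamp"],
    "input_bboxes_id": [5],
}
mock_test_sample = {"ID": "qa_1", "question": "How many chairs are there?"}

print(build_qa_record(mock_train_sample)["has_ground_truth"])
print(build_qa_record(mock_test_sample)["has_ground_truth"])
```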

  • 2D Modality

    1. "img_path" ([TS-V], str): File path to the RGB image.

    2. "depth_img_path" ([TS-V], str): File path to the depth image.

    3. "intrinsic" ([TS-V], np.ndarray): Intrinsic parameters of the camera for RGB images.

    4. "depth_intrinsic" ([TS-V], np.ndarray): Intrinsic parameters of the camera for depth images.

    5. "extrinsic" ([TS-V], np.ndarray): Extrinsic parameters of the camera.

    6. "visible_instance_id" (list): IDs of visible objects in the image.
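
The camera parameters can be used to relate the point cloud to an image. The sketch below shows a standard pinhole projection and rests on assumptions the text above does not state: that the extrinsic matrix is a 4x4 camera-to-world transform, and that the top-left 3x3 block of the intrinsic matrix is the pinhole matrix. If the dataset uses the opposite extrinsic convention, the inversion step should be dropped. project_points is a hypothetical helper, not an MMScan function.

```python
import numpy as np


def project_points(xyz_world, intrinsic, extrinsic):
    """Project world-space points to pixel coordinates with a pinhole model.

    Assumptions: extrinsic is a 4x4 camera-to-world matrix, and the top-left
    3x3 block of intrinsic is the pinhole matrix.
    Returns (uv, depth); points with depth <= 0 lie behind the camera and
    should be masked out by the caller.
    """
    xyz_world = np.asarray(xyz_world, dtype=np.float64)
    world_to_cam = np.linalg.inv(np.asarray(extrinsic, dtype=np.float64))
    ones = np.ones((len(xyz_world), 1))
    homogeneous = np.hstack([xyz_world, ones])
    xyz_cam = (world_to_cam @ homogeneous.T).T[:, :3]
    pinhole = np.asarray(intrinsic, dtype=np.float64)[:3, :3]
    pixels = (pinhole @ xyz_cam.T).T
    depth = pixels[:, 2]
    uv = pixels[:, :2] / depth[:, None]
    return uv, depth


# Synthetic stand-ins for sample["intrinsic"] and sample["extrinsic"].
mock_intrinsic = np.array([[500.0, 0.0, 320.0],
                           [0.0, 500.0, 240.0],
                           [0.0, 0.0, 1.0]])
mock_extrinsic = np.eye(4)  # camera placed at the world origin
mock_points = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0]])

uv, depth = project_points(mock_points, mock_intrinsic, mock_extrinsic)
print(uv, depth)
```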
