Dataset
The MMScan dataset tool provides access to the data required for the MMScan tasks (visual grounding and question answering).
Usage
Initialize the dataset for a specific task with:
from mmscan import MMScan
# (1) The dataset tool
# split is one of 'train', 'val', 'test'; task is one of 'MMScan-VG', 'MMScan-QA'
my_dataset = MMScan(split='train', task='MMScan-VG')
# Access a specific sample by integer index
index = 0
print(my_dataset[index])
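Because `split` and `task` each take one value from a fixed set, a small wrapper can catch typos before the dataset is constructed. The sketch below is only illustrative: `checked_dataset_kwargs` is a hypothetical helper, not part of `mmscan`, and the allowed values are copied from the options listed above.

```python
# Hypothetical helper (not part of mmscan): validate the arguments before
# passing them on to MMScan(**kwargs).
VALID_SPLITS = ("train", "val", "test")
VALID_TASKS = ("MMScan-VG", "MMScan-QA")


def checked_dataset_kwargs(split, task):
    """Return keyword arguments for MMScan after checking their values."""
    if split not in VALID_SPLITS:
        raise ValueError(f"split must be one of {VALID_SPLITS}, got {split!r}")
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    return {"split": split, "task": task}


kwargs = checked_dataset_kwargs("val", "MMScan-QA")
# With mmscan installed: my_dataset = MMScan(**kwargs)
```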
Data Access
Each dataset item is a dictionary with the elements listed below. Bold text marks elements directly related to the task; the rest are auxiliary. [TS-V] marks elements that are visible in the test set; all other elements are invalid or not visible there:
3D Modality
"pcds" ([TS-V], np.ndarray): Point cloud data with dimensions [n_points, 6(xyz+rgb)], representing the coordinates and color of each point.
"bboxes" (dict): Information about bounding boxes within the scan, structured as { object ID: { "type": object type (str), "bbox": 9 DoF box (np.ndarray) }}
"ori_pcds" ([TS-V], tuple[tensor]): Original point cloud data extracted from the .pth file.
"instance_labels" (np.ndarray): Instance ID assigned to each point in the point cloud.
"class_labels" (np.ndarray): Class IDs assigned to each point in the point cloud.
Language Modality
Category and ID info
"sub_class": The category of the sample.
"ID": The sample's ID.
"scan_id": The scan's ID.
For the Visual Grounding task
"target_id" (list[int]): IDs of target objects.
"text" ([TS-V], str): Text used for grounding.
"target" (list[str]): Text prompt to specify the target grounding object.
"anchors" (list[str]): Types of anchor objects.
"anchor_ids" (list[int]): IDs of anchor objects.
"tokens_positive" (dict): Indices of positions where mentioned objects appear in the text.
For the Question Answering task
"question" ([TS-V], str): The text of the question.
"answers" (list[str]): List of possible answers.
"object_ids" (list[int]): Object IDs referenced in the question.
"object_names" (list[str]): Types of referenced objects.
"input_bboxes_id" ([TS-V], list[int]): IDs of input bounding boxes.
"input_bboxes" ([TS-V], list[np.ndarray]): Input 9-DoF bounding boxes.
2D Modality
"img_path" ([TS-V], str): File path to the RGB image.
"depth_img_path" ([TS-V], str): File path to the depth image.
"intrinsic" ([TS-V], np.ndarray): Intrinsic parameters of the camera for RGB images.
"depth_intrinsic" ([TS-V], np.ndarray): Intrinsic parameters of the camera for depth images.
"extrinsic" ([TS-V], np.ndarray): Extrinsic parameters of the camera.
"visible_instance_id" (list): IDs of visible objects in the image.
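When writing inference code, it can help to restrict each item to the elements marked [TS-V], so that nothing depends on fields that are missing in the test set. A minimal sketch: `keep_test_visible` is a hypothetical helper, the key set is transcribed from the markers in the lists above, and keys without a marker (including the ID fields) are dropped because the markers are followed literally. The sample values, including the file path, are invented.

```python
# Keys marked [TS-V] in the element lists above.
TEST_VISIBLE_KEYS = frozenset({
    "pcds", "ori_pcds", "text", "question", "input_bboxes_id", "input_bboxes",
    "img_path", "depth_img_path", "intrinsic", "depth_intrinsic", "extrinsic",
})


def keep_test_visible(sample):
    """Keep only the entries that are marked as visible in the test set."""
    return {key: value for key, value in sample.items() if key in TEST_VISIBLE_KEYS}


# Toy stand-in for one dataset item; values are made up.
sample = {
    "text": "the chair next to the table",
    "target_id": [3],
    "img_path": "images/0001.jpg",
    "visible_instance_id": [3, 7],
    "scan_id": "scan_0",
}

print(sorted(keep_test_visible(sample)))
```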