I have been experimenting with the iPhone 3D Scanner App. Recent Pro-model iPhones include a LiDAR sensor that captures depth maps aligned with the regular camera images. I wanted to show how to take those RGB and depth frames, combine them into simple RGBD images, and use them for object classification.
The process is straightforward. I pick a few objects around the house and record a short scan of each one. Afterward, I crop the data to remove the unnecessary background, leaving only the object.
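To give an idea of what that cropping step looks like, here is a minimal sketch that crops an RGB frame and its depth map with the same bounding box. The file names and the `crop_frame` helper are placeholders of my own; the actual file layout depends on the export.

```python
from PIL import Image

def crop_frame(rgb_path, depth_path, box):
    """Crop an RGB frame and its depth map with the same bounding box.

    box is (left, upper, right, lower) in RGB pixel coordinates.
    """
    rgb = Image.open(rgb_path)        # e.g. frame_00000.jpg (placeholder name)
    depth = Image.open(depth_path)    # e.g. depth_00000.png (placeholder name)

    # The LiDAR depth map is usually lower resolution than the RGB frame,
    # so resize it to the RGB size first and crop both with the same box.
    depth = depth.resize(rgb.size, Image.NEAREST)
    return rgb.crop(box), depth.crop(box)
```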
The “All Data” export option gives me the raw camera frames, depth maps, and confidence images. Each depth map aligns with its corresponding RGB frame, forming a four-channel representation. Three channels hold the color information, and one carries the distance to the camera. This simple combination already adds a sense of shape that pure RGB cannot provide. For small experiments, it is more than enough.
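As a rough sketch of how such a frame can be assembled, the snippet below stacks an RGB image and its resized depth map into a single four-channel array. The `make_rgbd` helper, the file names, and the 5 m normalisation range are my own assumptions; the depth units in the export may differ (for example millimetres in a 16-bit PNG), so the scaling would need to be adjusted.

```python
import numpy as np
from PIL import Image

def make_rgbd(rgb_path, depth_path, max_depth_m=5.0):
    """Stack an RGB frame and its depth map into an (H, W, 4) RGBD array."""
    rgb_img = Image.open(rgb_path).convert("RGB")
    rgb = np.asarray(rgb_img, dtype=np.float32) / 255.0

    # Resize the depth map to match the RGB resolution before stacking.
    depth_img = Image.open(depth_path).resize(rgb_img.size, Image.NEAREST)
    depth = np.asarray(depth_img, dtype=np.float32)

    # Normalise depth to [0, 1]; clipping at a few metres is an assumption
    # that the scanned object sits close to the camera.
    depth = np.clip(depth / max_depth_m, 0.0, 1.0)

    return np.dstack([rgb, depth])   # channels: R, G, B, depth
```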
A compact CNN can then learn to classify the object based on these RGBD inputs. The goal is not to chase accuracy or build a classification benchmark. The value lies in hands-on learning. I collect the data myself, understand its noise and limitations, and train a model on data captured by my own camera. That experience teaches more than working with polished 3D objects downloaded from the internet.
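For a sense of scale, a network along these lines is enough: a few convolutional layers whose first layer simply takes four input channels instead of three. This is a minimal PyTorch sketch, not the exact model I trained; the layer sizes, class count, and input resolution are illustrative.

```python
import torch
import torch.nn as nn

class SmallRGBDNet(nn.Module):
    """A compact CNN that takes 4-channel RGBD input instead of plain RGB."""

    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):            # x: (batch, 4, H, W)
        x = self.features(x).flatten(1)
        return self.classifier(x)

# Quick smoke test with a dummy batch of RGBD crops.
model = SmallRGBDNet(num_classes=5)
logits = model(torch.randn(8, 4, 128, 128))
```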
Later, I will attach the GitHub repository with the supporting code.