Download the StreetVault data HERE*.
* This dataset is made publicly available under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).
StreetVault is a dataset tailored for the development and evaluation of privacy-aware AI vision systems in urban environments. Specifically, the dataset supports the training of object detection models on privacy-protected data acquired directly through a defocused optical system. The StreetVault dataset comprises paired high-quality and blurred street-level images captured in real-world conditions using a custom-built dual-camera embedded vision system.

The image acquisition setup consists of two synchronized cameras mounted side-by-side: one capturing sharp, high-resolution images, and the other capturing privacy-preserving defocused images. By employing controlled lens defocusing at the point of image acquisition, the system ensures that identifiable details, such as license plates and facial features, are inherently obscured before storage or transmission, thereby enforcing privacy-by-design principles at the sensor level.

StreetVault is structured into three subsets: sharp (S), lightly blurred (LB), and intensely blurred (IB) images, allowing the investigation of privacy-utility tradeoffs across different blur intensities. To facilitate supervised learning, object annotations (e.g., vehicle bounding boxes) are first generated semi-automatically on the sharp images using a pretrained detection model and then manually verified. The verified annotations are transferred to the corresponding blurred images via geometric alignment based on SIFT keypoint matching and RANSAC. This process enables robust cross-domain supervision between the sharp and blurred image domains.
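
The annotation-transfer step can be illustrated with a minimal sketch. It assumes the alignment between a sharp image and its blurred counterpart is modeled as a 3x3 homography H; the description above names only SIFT and RANSAC, so the transform model is an assumption. Such a matrix could be estimated, for example, with OpenCV's cv2.SIFT_create and cv2.findHomography using the cv2.RANSAC flag. The function names apply_homography and transfer_box are hypothetical and are not part of the dataset tooling.

```python
def apply_homography(H, x, y):
    """Map the point (x, y) through a 3x3 homography H given as nested lists."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return u, v


def transfer_box(H, box):
    """Transfer an axis-aligned box (x_min, y_min, x_max, y_max) from the
    sharp image to the blurred image.

    All four corners are warped, then the axis-aligned bounding box of the
    warped corners is returned, so the result stays a valid detection box.
    """
    x_min, y_min, x_max, y_max = box
    corners = [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]
    warped = [apply_homography(H, x, y) for x, y in corners]
    xs = [p[0] for p in warped]
    ys = [p[1] for p in warped]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical example: the blurred view is scaled by 2 and shifted by (10, -5).
H_example = [[2.0, 0.0, 10.0],
             [0.0, 2.0, -5.0],
             [0.0, 0.0, 1.0]]
print(transfer_box(H_example, (0, 0, 4, 3)))
```

Warping all four corners rather than only two opposite ones keeps the transferred box correct when the homography includes rotation or perspective distortion.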

The dataset includes over 4,000 annotated images (2,000 sharp, 1,000 LB, and 1,000 IB) captured at a busy city intersection. It is evaluated using multiple quality and privacy metrics, including PSNR, SSIM, LPIPS, and OCR success rate on license plates. Results confirm that sensitive content is effectively anonymized even under attempts at deblurring with state-of-the-art methods. The dataset supports benchmarking across a range of modern object detectors and blur intensities, offering a foundation for developing practical, privacy-preserving AI solutions for smart cities.
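
Two of the metrics named above are simple enough to sketch directly. The example below computes PSNR as 10 * log10(MAX^2 / MSE) for grayscale images stored as nested lists, and an OCR success rate defined here as the fraction of license plates whose OCR output exactly matches the ground-truth string. The exact-match definition, the function names, and the sample plate strings are assumptions for illustration; the evaluation protocol used for StreetVault may differ.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized
    grayscale images given as nested lists of pixel values."""
    if len(reference) != len(test):
        raise ValueError("images must have the same number of rows")
    squared_error = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        if len(ref_row) != len(test_row):
            raise ValueError("image rows must have the same length")
        for a, b in zip(ref_row, test_row):
            squared_error += (a - b) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def ocr_success_rate(predictions, ground_truth):
    """Fraction of plates read exactly right (assumed definition)."""
    if len(predictions) != len(ground_truth):
        raise ValueError("one prediction is required per ground-truth plate")
    if not ground_truth:
        raise ValueError("at least one plate is required")
    hits = sum(1 for p, g in zip(predictions, ground_truth) if p == g)
    return hits / len(ground_truth)


# Hypothetical values: a low PSNR and a low OCR success rate on a blurred
# image would both point toward stronger anonymization.
sharp = [[10, 20], [30, 40]]
blurred = [[12, 18], [33, 37]]
print(psnr(sharp, blurred))
print(ocr_success_rate(["AB123", ""], ["AB123", "CD456"]))
```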

We make the StreetVault dataset publicly available to facilitate further research in sensor-level anonymization and privacy-safe urban monitoring.