We present a hierarchical and robust algorithm addressing the problem of filtering and segmentation of 3-D acoustic images. This algorithm is based on the tensor voting, which is a unified computational framework for the inference of multiple salient structures. Unlike most previous approaches, no models or prior information of the underwater environment, nor the intensity information of acoustic images is considered in this algorithm. Salient structures and outlier noisy points are directly clustered in two steps according to both the density and the structural information of the input data. Our experimental trials show promising results, very robust with respect to the low computational complexity.