The proposed framework has the advantage of applying visual-semantic embedding that effectively addresses the semantic redundancy among labels; it also models the label co-occurrence without introducing additional subnets that are to be integrated in the CNN framework.