Query:
Scholar name: 柯逍 (Ke Xiao)
Abstract :
Enhancing the quality of underwater images is of great importance to the development of underwater operations. Existing underwater image enhancement methods are usually trained on paired underwater and reference images; in practice, however, reference images corresponding to underwater scenes are difficult to obtain, whereas unpaired high-quality underwater images or above-water images are much easier to collect. In addition, existing methods struggle to handle diverse distortion types simultaneously. To avoid the dependence on paired training data, further reduce the difficulty of acquiring training data, and cope with diverse underwater distortion types, this paper proposes an unpaired underwater image enhancement method based on a Frequency-Decomposed Generative Adversarial Network (FD-GAN), and on this basis designs a high/low-frequency dual-branch generator to reconstruct high-quality enhanced underwater images. Specifically, a feature-level wavelet transform is introduced to split features into low-frequency and high-frequency parts, which are processed separately within a cycle-consistent generative adversarial network. The low-frequency branch adopts an encoder-decoder structure combined with a low-frequency attention mechanism to enhance image color and brightness, while the high-frequency branch applies parallel high-frequency attention to each high-frequency component to restore image details. Experimental results on multiple standard underwater image datasets show that, whether using unpaired high-quality underwater images or additionally introducing some above-water images, the proposed method effectively generates high-quality enhanced underwater images and outperforms current mainstream underwater image enhancement methods in both effectiveness and generalization.
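The abstract does not give implementation details, so the following is only a minimal PyTorch sketch of the core idea: a feature-level Haar wavelet split that separates a feature map into one low-frequency and three high-frequency bands, followed by a toy dual-branch stub. The branch layers here are hypothetical placeholders for the paper's low-frequency attention encoder-decoder and parallel high-frequency attention.

```python
# Hedged sketch (not the authors' code): feature-level Haar split + dual-branch stub.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HaarFeatureSplit(nn.Module):
    """Split (B, C, H, W) features into LL (low-freq) and LH/HL/HH (high-freq) bands."""
    def __init__(self):
        super().__init__()
        ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
        lh = torch.tensor([[-0.5, -0.5], [0.5, 0.5]])
        hl = torch.tensor([[-0.5, 0.5], [-0.5, 0.5]])
        hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
        self.register_buffer("bank", torch.stack([ll, lh, hl, hh]).unsqueeze(1))  # (4, 1, 2, 2)

    def forward(self, x):
        b, c, h, w = x.shape
        weight = self.bank.repeat(c, 1, 1, 1)              # one 4-band filter set per channel
        out = F.conv2d(x, weight, stride=2, groups=c)      # (B, 4C, H/2, W/2)
        out = out.view(b, c, 4, h // 2, w // 2)
        return out[:, :, 0], out[:, :, 1:]                 # low-freq part, 3 high-freq bands

class DualBranchStub(nn.Module):
    """Toy stand-in for the high/low-frequency dual-branch generator."""
    def __init__(self, channels=64):
        super().__init__()
        self.split = HaarFeatureSplit()
        self.low = nn.Conv2d(channels, channels, 3, padding=1)     # color/brightness branch
        self.high = nn.Conv2d(3 * channels, 3 * channels, 3,
                              padding=1, groups=3)                 # parallel per-band branch

    def forward(self, feat):
        low, high = self.split(feat)
        b, c, _, h, w = high.shape
        high = high.permute(0, 2, 1, 3, 4).reshape(b, 3 * c, h, w)  # band-major layout
        return self.low(low), self.high(high).view(b, 3, c, h, w)

low_out, high_out = DualBranchStub()(torch.randn(1, 64, 128, 128))
```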
Keyword :
Wavelet transform; Underwater image enhancement; Attention mechanism; Generative adversarial network; High/low-frequency dual-branch generator
Cite:
GB/T 7714 | 牛玉贞 , 张凌昕 , 兰杰 et al. 基于分频式生成对抗网络的非成对水下图像增强 [J]. | 电子学报 , 2025 . |
MLA | 牛玉贞 et al. "基于分频式生成对抗网络的非成对水下图像增强" . | 电子学报 (2025) . |
APA | 牛玉贞 , 张凌昕 , 兰杰 , 许瑞 , 柯逍 . 基于分频式生成对抗网络的非成对水下图像增强 . | 电子学报 , 2025 . |
Abstract :
3D anomaly detection aims to solve the problem that image anomaly detection is greatly affected by lighting conditions. As commercial confidentiality and personal privacy become increasingly paramount, access to training samples is often restricted. To address these challenges, we propose a zero-shot 3D anomaly detection method. Unlike previous CLIP-based methods, the proposed method does not require any prompt and is capable of detecting anomalies on the depth modality. Furthermore, we also propose a pre-trained structural rerouting strategy, which modifies the transformer without retraining or fine-tuning for the anomaly detection task. Most importantly, this paper proposes an online voter mechanism that registers voters and performs majority voter scoring in a one-stage, zero-start and growth-oriented manner, enabling direct anomaly detection on unlabeled test sets. Finally, we also propose a confirmatory judge credibility assessment mechanism, which provides an efficient adaptation for possible few-shot conditions. Results on datasets such as MVTec3D-AD demonstrate that the proposed method can achieve superior zero-shot 3D anomaly detection performance, indicating its pioneering contributions within the pertinent domain.
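The mechanism itself is not specified in the abstract; the sketch below illustrates one plausible reading of a zero-start, growth-oriented "online voter" memory: each unlabeled test feature is scored by its nearest already-registered voters and, if it looks normal, is registered as a voter itself. The class name, `k`, and `register_threshold` are assumptions for illustration only.

```python
# Hedged sketch (not the paper's implementation): an online voter memory for
# anomaly scoring on an unlabeled test stream.
import numpy as np

class OnlineVoterBank:
    def __init__(self, k=5, register_threshold=0.5):
        self.voters = []                      # grows from zero during testing
        self.k = k
        self.register_threshold = register_threshold

    def score(self, feat):
        """Anomaly score = mean distance to the k nearest registered voters."""
        if not self.voters:
            return 0.0                        # nothing to compare against yet
        bank = np.stack(self.voters)
        d = np.linalg.norm(bank - feat, axis=1)
        k = min(self.k, len(d))
        return float(np.sort(d)[:k].mean())

    def observe(self, feat):
        s = self.score(feat)
        if s <= self.register_threshold:      # low score -> likely normal -> becomes a voter
            self.voters.append(feat)
        return s

# usage on a random stand-in stream of 100 patch features
bank = OnlineVoterBank(k=5, register_threshold=0.5)
scores = [bank.observe(f) for f in np.random.randn(100, 128).astype(np.float32)]
```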
Keyword :
Anomaly detection; Multimodal; Online voter mechanism; Pretrained model; Zero-shot
Cite:
GB/T 7714 | Zheng, Wukun , Ke, Xiao , Guo, Wenzhong . Zero-shot 3D anomaly detection via online voter mechanism [J]. | NEURAL NETWORKS , 2025 , 187 . |
MLA | Zheng, Wukun et al. "Zero-shot 3D anomaly detection via online voter mechanism" . | NEURAL NETWORKS 187 (2025) . |
APA | Zheng, Wukun , Ke, Xiao , Guo, Wenzhong . Zero-shot 3D anomaly detection via online voter mechanism . | NEURAL NETWORKS , 2025 , 187 . |
Abstract :
With the advancement of Vision–Language Pretraining, unified multimodal models have exhibited promising performance in various downstream tasks. However, existing models rely on a generic structure for knowledge extraction in the Visual Grounding (VG) task and do not fully leverage the consistency information between modalities, leading to challenges in generalization. To address this, we propose a Language–Image Consistency Augmentation and Distillation Network (CADN) based on the CLIP model. CADN alleviates overfitting by weighting the task loss according to consistency information within CLIP features. Furthermore, to retain consistency information from the pretrained model during training, we design a Consistency-Aware Self-Distillation Module (CASD), used as a converter after feature encoding. This module introduces additional loss functions supervised by the CLIP similarity matrix and self-attention weights to ensure that consistency information is restored in the features. Additionally, we propose Language-Enhanced Masked Attention (LEMA) to generate spatial-channel masks that guide cross-modal attention to adaptively select the regions and intensities of multimodal features. This improves the quality of decoding features, enabling query vectors to focus on semantic features relevant to the textual descriptions and thus improving model performance in the VG task. Experimental results not only demonstrate that the proposed model achieves superior performance on REC datasets, but also show, through visualization experiments, how the distribution of feature consistency information changes. These results may offer new insights and methodologies for the study and application of cross-modal consistency.
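The exact loss formulation of CASD is not given above; as a minimal sketch, one way to supervise task-branch features with a CLIP similarity matrix is to treat the frozen CLIP image-text similarities as soft targets for the task branch's own similarity matrix. The function and feature names below are hypothetical.

```python
# Hedged sketch (not the authors' code): consistency distillation against CLIP similarities.
import torch
import torch.nn.functional as F

def consistency_distill_loss(task_img_feat, task_txt_feat,
                             clip_img_feat, clip_txt_feat, tau=0.07):
    """Match the task-branch image-text similarity matrix to CLIP's (soft targets)."""
    def sim(a, b):
        a = F.normalize(a, dim=-1)
        b = F.normalize(b, dim=-1)
        return a @ b.t() / tau                                        # (B, B) similarity logits

    with torch.no_grad():
        target = sim(clip_img_feat, clip_txt_feat).softmax(dim=-1)    # frozen CLIP teacher
    student = sim(task_img_feat, task_txt_feat).log_softmax(dim=-1)
    return F.kl_div(student, target, reduction="batchmean")

# usage with random stand-in features (batch of 8 image-text pairs)
loss = consistency_distill_loss(torch.randn(8, 256), torch.randn(8, 256),
                                torch.randn(8, 512), torch.randn(8, 512))
```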
Keyword :
Referring expression comprehension; Self-distillation; Visual grounding
Cite:
GB/T 7714 | Ke, X. , Xu, P. , Guo, W. . Language–Image Consistency Augmentation and Distillation Network for visual grounding [J]. | Pattern Recognition , 2025 , 166 . |
MLA | Ke, X. et al. "Language–Image Consistency Augmentation and Distillation Network for visual grounding" . | Pattern Recognition 166 (2025) . |
APA | Ke, X. , Xu, P. , Guo, W. . Language–Image Consistency Augmentation and Distillation Network for visual grounding . | Pattern Recognition , 2025 , 166 . |
Abstract :
Action quality assessment (AQA) is a challenging vision task that requires discerning and quantifying subtle differences in actions from the same class. While recent research has made strides in creating fine-grained annotations for more precise analysis, existing methods primarily focus on coarse action segmentation, leading to limited identification of discriminative action frames. To address this issue, we propose a Vision-Language Action Knowledge Learning approach for action quality assessment, along with a multi-grained alignment framework to understand different levels of action knowledge. In our framework, prior knowledge, such as specialized terminology, is embedded into video-level, stage-level, and frame-level representations via CLIP. We further propose a new semantic-aware collaborative attention module to prevent confusing interactions and preserve textual knowledge in cross-modal and cross-semantic spaces. Specifically, we leverage the powerful cross-modal knowledge of CLIP to embed textual semantics into image features, which then guide action spatial-temporal representations. Our approach can be plugged into existing AQA methods, with or without frame-wise annotations. Extensive experiments and ablation studies show that our approach achieves state-of-the-art results on four public short- and long-term AQA benchmarks: FineDiving, MTL-AQA, JIGSAWS, and Fis-V.
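The collaborative attention module itself is not described in detail above; as a minimal sketch of the underlying idea of "textual semantics guiding spatio-temporal representations", a standard cross-attention block can inject CLIP text tokens into per-frame video features. The module name, dimensions, and the use of nn.MultiheadAttention are assumptions.

```python
# Hedged sketch (not the paper's module): text-guided cross-attention over frame features.
import torch
import torch.nn as nn

class TextGuidedFrameAttention(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, frame_feats, text_feats):
        # frame_feats: (B, T, D) per-frame features; text_feats: (B, L, D) CLIP text tokens
        guided, _ = self.attn(query=frame_feats, key=text_feats, value=text_feats)
        return self.norm(frame_feats + guided)   # residual: frames enriched with text semantics

# usage with stand-in tensors: 2 videos, 16 frames, 4 text tokens
out = TextGuidedFrameAttention()(torch.randn(2, 16, 512), torch.randn(2, 4, 512))
```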
Keyword :
Action quality assessment; Semantic-aware learning; Vision-language pre-training
Cite:
GB/T 7714 | Xu, H. , Ke, X. , Li, Y. et al. Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment [unknown]. |
MLA | Xu, H. et al. "Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment" [unknown]. |
APA | Xu, H. , Ke, X. , Li, Y. , Xu, R. , Wu, H. , Lin, X. et al. Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment [unknown]. |
Abstract :
Stereoscopic images typically consist of left and right views along with depth information. Assessing the quality of stereoscopic/3D images (SIQA) is often more complex than that of 2D images due to scene disparities between the left and right views and the intricate process of fusion in binocular vision. To address the problem of quality prediction bias on multi-distortion images, we investigate the visual physiology and the processing of visual information by the primary visual cortex of the human brain and propose a no-reference stereoscopic image quality assessment method. The method comprises an innovative end-to-end NR-SIQA neural network together with a picture-patch generation algorithm. The algorithm generates a saliency map by fusing the left and right views and then uses it to guide image cropping in the database. The proposed models are validated and compared on publicly available databases. The results show that the model and algorithm together outperform state-of-the-art NR-SIQA metrics on the LIVE 3D and WIVC 3D databases and achieve excellent results on specific noise metrics. Generalization experiments further demonstrate a certain degree of generality of the proposed model.
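The patch-generation algorithm is only described at a high level above; the sketch below shows one simple reading of saliency-guided cropping: fuse the two views, build a crude saliency map, and keep the most salient patches. The naive averaging fusion and gradient-magnitude saliency are placeholders for the paper's binocular-fusion-based saliency.

```python
# Hedged sketch (not the authors' algorithm): saliency-guided patch selection.
import numpy as np

def salient_patches(left, right, patch=64, k=8):
    fused = 0.5 * (left.astype(np.float32) + right.astype(np.float32))  # naive binocular fusion
    gy, gx = np.gradient(fused)
    saliency = np.hypot(gx, gy)                                         # gradient-magnitude saliency
    h, w = fused.shape
    scores, coords = [], []
    for y in range(0, h - patch + 1, patch):                            # non-overlapping grid
        for x in range(0, w - patch + 1, patch):
            scores.append(saliency[y:y + patch, x:x + patch].mean())
            coords.append((y, x))
    top = np.argsort(scores)[::-1][:k]                                  # k most salient cells
    return [(left[y:y + patch, x:x + patch], right[y:y + patch, x:x + patch])
            for y, x in (coords[i] for i in top)]

# usage on random grayscale stand-ins
patches = salient_patches(np.random.rand(256, 256), np.random.rand(256, 256))
```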
Keyword :
Image quality; Stereo image processing
Cite:
GB/T 7714 | Wang, Hanling , Ke, Xiao , Guo, Wenzhong et al. No-reference stereoscopic image quality assessment based on binocular collaboration [J]. | Neural Networks , 2024 , 180 . |
MLA | Wang, Hanling et al. "No-reference stereoscopic image quality assessment based on binocular collaboration" . | Neural Networks 180 (2024) . |
APA | Wang, Hanling , Ke, Xiao , Guo, Wenzhong , Zheng, Wukun . No-reference stereoscopic image quality assessment based on binocular collaboration . | Neural Networks , 2024 , 180 . |
Abstract :
Spatiotemporal action detection requires incorporating both spatial and temporal information from video. Current state-of-the-art approaches usually adopt a 2D CNN (Convolutional Neural Network) or 3D CNN architecture. However, due to the complexity of the network structures and of spatiotemporal information extraction, these methods are usually non-real-time and offline. To solve this problem, this paper proposes a real-time action detection method based on spatiotemporal interaction perception. First, the input video frames are shuffled to enhance temporal information. Since 2D or 3D backbone networks alone cannot effectively model spatiotemporal features, a multi-branch feature extraction network is proposed to extract features from different sources, and a multi-scale attention network is proposed to capture long-term temporal dependencies and spatial context information. Then, to fuse temporal and spatial features from two different sources, a new motion-saliency enhancement fusion strategy is proposed, which encodes the temporal and spatial features to guide their fusion and highlight more discriminative spatiotemporal features. Finally, action tube links are generated online from the frame-level detector results. The proposed method achieves accuracies of 84.71% and 78.4% on the two spatiotemporal action datasets UCF101-24 and JHMDB-21, and runs at 73 frames per second, outperforming state-of-the-art methods. In addition, to address the high inter-class similarity and easily confused hard samples in the JHMDB-21 dataset, this paper proposes a key-frame optical-flow action detection method based on action representation, which avoids generating redundant optical flow and further improves action detection accuracy.
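The fusion strategy is only summarized above; as a loose, minimal sketch of "each stream encodes a gate that guides the other", a cross-gating block over spatial and temporal feature maps could look like the following. All layer choices (1x1 convolutions, sigmoid gates, concatenation projection) are assumptions, not the paper's design.

```python
# Hedged sketch (not the paper's fusion): cross-gated fusion of spatial/temporal features.
import torch
import torch.nn as nn

class CrossGatedFusion(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.gate_t = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.gate_s = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.proj = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, spatial, temporal):               # both (B, C, H, W)
        spatial_enh = spatial * self.gate_t(temporal)   # temporal stream gates spatial features
        temporal_enh = temporal * self.gate_s(spatial)  # spatial stream gates temporal features
        return self.proj(torch.cat([spatial_enh, temporal_enh], dim=1))

fused = CrossGatedFusion()(torch.randn(2, 256, 14, 14), torch.randn(2, 256, 14, 14))
```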
Keyword :
Optical flows
Cite:
GB/T 7714 | Ke, Xiao , Miao, Xin , Guo, Wen-Zhong . Real-Time Action Detection Based on Spatio-Temporal Interaction Perception [J]. | Acta Electronica Sinica , 2024 , 52 (2) : 574-588 . |
MLA | Ke, Xiao et al. "Real-Time Action Detection Based on Spatio-Temporal Interaction Perception" . | Acta Electronica Sinica 52 . 2 (2024) : 574-588 . |
APA | Ke, Xiao , Miao, Xin , Guo, Wen-Zhong . Real-Time Action Detection Based on Spatio-Temporal Interaction Perception . | Acta Electronica Sinica , 2024 , 52 (2) , 574-588 . |
Abstract :
Light field imaging, which captures light information from every position in a scene, holds broad application prospects in fields such as electronic imaging, medical imaging, and virtual reality. Light field image quality assessment (LFIQA) aims to measure the quality of such images, yet current methods face significant challenges arising from the heterogeneity between the visual and textual modalities. To address these issues, this paper proposes a multi-modal light field image quality assessment model grounded in text-vision integration. Specifically, for the visual modality, we devise a multi-task model that effectively enriches the crucial representational features of light field images by incorporating an edge auto-thresholding algorithm. On the textual side, we identify noise categories in light field images by comparing input noise features with predicted noise features, thereby validating the importance of noise prediction in optimizing visual representations. Building upon these findings, we further introduce an optimized universal noise text configuration approach combined with an edge enhancement strategy, which notably enhances the accuracy and generalization capabilities of the baseline model in LFIQA. Additionally, ablation experiments are conducted to assess the contribution of each component to overall model performance, verifying the effectiveness and robustness of the proposed method. Experimental results demonstrate that our approach not only excels on public datasets such as Win5-LID and NBU-LF1.0 but also performs remarkably on fused datasets. Compared with state-of-the-art algorithms, our method achieves performance improvements of 2% and 6% on the two databases, respectively. The noise verification strategy and configuration method presented in this paper not only provide valuable insights for light field noise prediction tasks but can also serve as auxiliary tools for other noise prediction settings.
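The paper's edge auto-thresholding algorithm is not specified in the abstract; as a stand-in illustration of an automatic-threshold edge extractor, the sketch below derives Canny thresholds from Otsu's method, a common parameter-free heuristic. This is an assumption, not the authors' algorithm.

```python
# Hedged sketch (stand-in, not the paper's method): Otsu-derived Canny edge map.
import cv2
import numpy as np

def auto_edge_map(gray: np.ndarray) -> np.ndarray:
    """gray: uint8 (H, W) image (e.g. a sub-aperture view); returns a binary edge map."""
    otsu_thr, _ = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return cv2.Canny(gray, 0.5 * otsu_thr, otsu_thr)   # low/high thresholds from Otsu

edges = auto_edge_map(np.random.randint(0, 256, (128, 128), dtype=np.uint8))
```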
Keyword :
image enhancement; image quality assessment; light field images; multi-task mode; noise prediction; visual-textual model
Cite:
GB/T 7714 | Wang, H.-L. , Ke, X. , Jiang, A.-X. et al. Quality Assessment of Light Field Images Based on Contrastive Visual-Textual Model; [基于对比性视觉-文本模型的光场图像质量评估] [J]. | Acta Electronica Sinica , 2024 , 52 (10) : 3562-3577 . |
MLA | Wang, H.-L. et al. "Quality Assessment of Light Field Images Based on Contrastive Visual-Textual Model; [基于对比性视觉-文本模型的光场图像质量评估]" . | Acta Electronica Sinica 52 . 10 (2024) : 3562-3577 . |
APA | Wang, H.-L. , Ke, X. , Jiang, A.-X. , Guo, W.-Z. . Quality Assessment of Light Field Images Based on Contrastive Visual-Textual Model; [基于对比性视觉-文本模型的光场图像质量评估] . | Acta Electronica Sinica , 2024 , 52 (10) , 3562-3577 . |
Abstract :
Few-shot object detection achieves rapid detection of novel-class objects by training detectors with a minimal number of novel-class annotated instances. Transfer learning-based few-shot object detection methods have shown better performance than other methods such as meta-learning. However, when training with base-class data, the model may gradually become biased towards the characteristics of each category in the base-class data, which can reduce its learning ability during fine-tuning on novel classes and lead to further overfitting due to data scarcity. In this paper, we first find that the generalization performance of the base-class model has a significant impact on novel-class detection performance, and we propose a generalization feature extraction network framework to address this issue. The framework perturbs the base model during training to encourage it to learn generalized features and mitigates the impact of changes in object shape and size on overall detection performance, improving the generalization of the base model. Additionally, we propose a feature-level data augmentation method based on self-distillation to further enhance the overall generalization ability of the model. Our method achieves state-of-the-art results on both the COCO and PASCAL VOC datasets, with a 6.94% improvement on the PASCAL VOC 10-shot setting.
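Neither the perturbation nor the distillation loss is spelled out above; the sketch below shows one generic instance of the idea: perturb intermediate features (noise plus channel dropout) and let the clean branch teach the perturbed branch through a soft-label consistency loss. The function names, perturbation choices, and temperature are assumptions.

```python
# Hedged sketch (not the authors' code): feature-level augmentation + self-distillation loss.
import torch
import torch.nn.functional as F

def perturb_features(feat, noise_std=0.1, drop_p=0.1):
    """Feature-level augmentation: Gaussian noise plus channel dropout."""
    noisy = feat + noise_std * torch.randn_like(feat)
    return F.dropout2d(noisy, p=drop_p, training=True)

def self_distill_loss(head, feat, tau=2.0):
    """Clean branch (detached) teaches the perturbed branch via softened logits."""
    with torch.no_grad():
        teacher = (head(feat) / tau).softmax(dim=-1)
    student = (head(perturb_features(feat)) / tau).log_softmax(dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean") * tau * tau

# usage with a toy classification head over pooled (B, C, H, W) features
head = torch.nn.Sequential(torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
                           torch.nn.Linear(256, 21))
loss = self_distill_loss(head, torch.randn(4, 256, 32, 32))
```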
Keyword :
Adaptation models; Computational modeling; data augmentation; Data models; Feature extraction; few-shot learning; object detection; self-distillation; Shape; Training; Transfer learning
Cite:
GB/T 7714 | Ke, Xiao , Chen, Qiuqin , Liu, Hao et al. GFENet: Generalization Feature Extraction Network for Few-Shot Object Detection [J]. | IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY , 2024 , 34 (12) : 12741-12755 . |
MLA | Ke, Xiao et al. "GFENet: Generalization Feature Extraction Network for Few-Shot Object Detection" . | IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 34 . 12 (2024) : 12741-12755 . |
APA | Ke, Xiao , Chen, Qiuqin , Liu, Hao , Guo, Wenzhong . GFENet: Generalization Feature Extraction Network for Few-Shot Object Detection . | IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY , 2024 , 34 (12) , 12741-12755 . |
Abstract :
Spatiotemporal action detection relies on learning both the spatial and temporal information of a video. Currently, state-of-the-art CNN-based action detectors adopt 2D CNN or 3D CNN architectures and achieve remarkable results. However, due to the complexity of the network structures and of spatiotemporal information perception, these methods are usually non-real-time and offline. The main challenges of spatiotemporal action detection lie in designing an efficient detection architecture and effectively perceiving and fusing spatiotemporal features. Considering these issues, this paper proposes a real-time action detection method based on spatiotemporal cross perception. The method first shuffles the input video frames to enhance temporal information. Since 2D or 3D backbone networks alone cannot effectively model spatiotemporal features, a multi-branch feature extraction network based on spatiotemporal cross perception is proposed to extract features from different sources. To address the limited descriptive power of single-scale spatiotemporal features, a multi-scale attention network is proposed to learn long-term temporal dependencies and spatial context information. For the fusion of temporal and spatial features from two different sources, a new motion-saliency enhancement fusion strategy is proposed, which encodes and cross-maps temporal and spatial information to guide the fusion between the two feature streams and highlight more discriminative spatiotemporal representations. Finally, action tube links are computed online from the frame-level detector results. The proposed method achieves accuracies of 84.71% and 78.4% on the two spatiotemporal action datasets UCF101-24 and JHMDB-21, outperforming existing state-of-the-art methods, and runs at 73 frames per second. In addition, to address the high inter-class similarity and easily confused hard samples in the JHMDB-21 dataset, this paper proposes a key-frame optical-flow action detection method based on action representation, which avoids generating redundant optical flow and further improves action detection accuracy.
Keyword :
Multi-scale attention; Real-time action detection; Spatio-temporal cross perception
Cite:
GB/T 7714 | 柯逍 , 缪欣 , 郭文忠 . 基于时空交叉感知的实时动作检测方法 [J]. | 电子学报 , 2024 . |
MLA | 柯逍 et al. "基于时空交叉感知的实时动作检测方法" . | 电子学报 (2024) . |
APA | 柯逍 , 缪欣 , 郭文忠 . 基于时空交叉感知的实时动作检测方法 . | 电子学报 , 2024 . |