C'est LaVIE
Visual Inference & Evaluation
Group Seminar
Cross-Modal Transferable Image-to-Video Attack on Video Quality Metrics
SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images
RadGPT: Constructing 3D Image-Text Tumor Datasets
Boosting Image Quality Assessment Through Efficient Transformer Adaptation with Local Feature Enhancement
Dog-IQA: Standard-Guided Zero-Shot MLLM for Mix-Grained Image Quality Assessment
Aligning Large Multimodal Models with Factually Augmented RLHF
VideoSAGE: Video Summarization with Graph Representation Learning
EraseDraw: Learning to Insert Objects by Erasing Them from Images
Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare
VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging
CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model
Enhancing Spatiotemporal Disease Progression Models via Latent Diffusion and Prior Knowledge
Blind Image Quality Assessment for Pathological Microscopic Image Under Screen and Immersion Scenarios
Deep Model Reference: Simple yet Effective Confidence Estimation for Image Classification
Multimodal Action Quality Assessment
Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model
PointLLM: Empowering Large Language Models to Understand Point Clouds
A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment
Q-Ground: Image Quality Grounding with Large Multi-modality Models
Medical SAM 2: Segment Medical Images as Video via Segment Anything Model 2
LAR-IQA: A Lightweight, Accurate, and Robust No-Reference Image Quality Assessment Model
Your Diffusion Model is Secretly a Zero-Shot Classifier
ExpertAF: Expert Actionable Feedback from Video
PC2: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction
CausalPC: Improving the Robustness of Point Cloud Classification by Causal Effect Identification
Rich Human Feedback for Text-to-Image Generation
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
pix2gestalt: Amodal Segmentation by Synthesizing Wholes
Magic Mamba
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
Scaling Vision with Sparse Mixture of Experts
From Feline Classification to Skills Evaluation: A Multitask Learning Framework for Evaluating Micro Suturing Neurosurgical Skills
SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation
Understanding Zero-Shot Adversarial Robustness for Large-Scale Models
Comparison of No-Reference Image Quality Models via MAP Estimation in Diffusion Latents
Ferret: Refer and Ground Anything Anywhere at Any Granularity
Disruptive Autoencoders: Leveraging Low-level Features for 3D Medical Image Pre-training
Keep Your Eye on the Best: Contrastive Regression Transformer for Skill Assessment in Robotic Surgery
Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild
DiffPoint: Single and Multi-view Point Cloud Reconstruction with ViT Based Diffusion Model
Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning
Scalable Diffusion Models with Transformers
PECoP: Parameter Efficient Continual Pretraining for Action Quality Assessment
Zero-1-to-3: Zero-shot One Image to 3D Object
Multimodal Optimal Transport-based Co-Attention Transformer with Global Structure Consistency for Survival Prediction
Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation
Objects do not disappear: Video object detection by single-frame object location anticipation
MedLSAM: Localize and Segment Anything Model for 3D CT Images
Test Time Adaptation for Blind Image Quality Assessment
CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning
UMIFormer: Mining the Correlations between Similar Tokens for Multi-View 3D Reconstruction
Attention Discriminant Sampling for Point Clouds
Learning from Noisy Pseudo Labels for Semi-Supervised Temporal Action Localization
Coarse-to-Fine Amodal Segmentation with Shape Prior
DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models
Blindly Assess Image Quality in the Wild Guided by A Self-Adaptive Hyper Network
Deep Evidential Regression
MA-SAM: Modality-agnostic SAM Adaptation for 3D Medical Image Segmentation
Segment and Track Anything
MARS: Model-agnostic Biased Object Removal without Additional Supervision for Weakly-Supervised Semantic Segmentation
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
Blind image quality assessment based on progressive multi-task learning
Soft-Landing Strategy for Alleviating the Task Discrepancy Problem in Temporal Action Localization Tasks
RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model
E-LPIPS: Robust Perceptual Image Similarity via Random Transformation Ensembles
SegGPT: Segmenting Everything In Context
Volumetric memory network for interactive medical image segmentation
Semi-Supervised Authentically Distorted Image Quality Assessment with Consistency-Preserving Dual-Branch Convolutional Neural Network
3D Cinemagraphy from a Single Image
Decomposed Cross-modal Distillation for RGB-based Temporal Action Detection
Perceptual Attacks of No-Reference Image Quality Models with Human-in-the-Loop
Knowledge-Guided Blind Image Quality Assessment with Few Training Samples
PIDNet: A Real-time Semantic Segmentation Network Inspired by PID Controllers
Images Speak in Images: A Generalist Painter for In-Context Visual Learning
Sibling-Attack: Rethinking Transferable Adversarial Attacks against Face Recognition
Image Quality Assessment using Semi-Supervised Representation Learning
Segment Anything
FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment
Unsupervised Pre-training for Temporal Action Localization Tasks
Towards Implicit Text-Guided 3D Shape Generation: Supplementary Material
HCSC: Hierarchical Contrastive Selective Coding
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
PatchDCT: Patch Refinement for High Quality Instance Segmentation
No Reference Opinion Unaware Quality Assessment of Authentically Distorted Images
Towards Certifying ℓ∞ Robustness using Neural Networks with ℓ∞-dist Neurons
Instance Shadow Detection
Transductive Semi-Supervised Deep Learning using Min-Max Features
V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation
Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos
Hyperbolic Image Segmentation
Image Quality Assessment using Contrastive Learning
[Writing Tips] Appreciating Abstracts and Introductions
Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation
Fast and Unsupervised Action Boundary Detection for Action Segmentation
Natural Color Fool: Towards Boosting Black-box Unrestricted Attacks
Theme-Aware Semi-Supervised Image Aesthetic Quality Assessment
Uncertainty-Guided Probabilistic Transformer for Complex Action Recognition
ObjectBox: From Centers to Boxes for Anchor-Free Object Detection
Learning to Refactor Action and Co-occurrence Features for Temporal Action Localization
Shift-tolerant Perceptual Similarity Metric
Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment
Deep Evidential Regression
In Defense of Online Models for Video Instance Segmentation
The Dimpled Manifold Model of Adversarial Examples in Machine Learning
Modeling Localness for Self-Attention Networks
Single-View 3D Object Reconstruction from Shape Priors in Memory
Class Semantic-based Attention for Action Detection
Optimism in the Face of Adversity: Understanding and Improving Deep Learning Through Adversarial Robustness
Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification
MaD-DLS: Mean and Deviation of Deep and Local Similarity for Image Quality Assessment
E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation
Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation
A Simple Semi-Supervised Learning Framework for Object Detection
FreeSOLO: Learning to Segment Objects without Annotations
Deep Self-Dissimilarities as Powerful Visual Fingerprints
Temporal Action Detection with Multi-level Supervision
An Image Patch is a Wave: Quantum Inspired Vision MLP
Dual Adversarial Network: Toward Real-world Noise Removal and Noise Generation
Reducing Flipping Errors in Deep Neural Networks
Vision-Language Pre-Training with Triple Contrastive Learning
Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework
Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization
Weakly Supervised Instance Segmentation using Class Peak Response
First-order Adversarial Vulnerability of Neural Networks and Input Dimension
Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment
Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion
End-to-End Video Instance Segmentation with Transformers
Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields
Unsupervised Domain Adaptation in Semantic Segmentation
Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization
Pixel Difference Networks for Efficient Edge Detection
Image Quality Assessment: Unifying Structure and Texture Similarity
Cubemap-Based Perception-Driven Blind Quality Assessment for 360-degree Images
Railroad is not a Train: Saliency as Pseudo-pixel Supervision for Weakly Supervised Semantic Segmentation
A Weakly Supervised Amodal Segmenter with Boundary Uncertainty Estimation
Perceptual Adversarial Robustness: Defense Against Unseen Threat Models
GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-efficient Medical Image Recognition
Weakly-Supervised Semantic Segmentation via Sub-category Exploration
Learning to Resize Images for Computer Vision Tasks
Topology-Imbalance Learning for Semi-Supervised Node Classification
MUSIQ: Multi-scale Image Quality Transformer
Video Self-Stitching Graph Network for Temporal Action Localization
Self-Supervised 3D Mesh Reconstruction from Single Images
DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort
LOWKEY: Leveraging Adversarial Attacks to Protect Social Media Users From Facial Recognition
Blind Omnidirectional Image Quality Assessment Based on Structure and Natural Features
Pre-Trained Image Processing Transformer
Rich features for perceptual quality assessment of UGC videos
BoxInst: High-Performance Instance Segmentation with Box Annotations
Instance Similarity Learning for Unsupervised Feature Representation
Enriching Local and Global Contexts for Temporal Action Localization
Prototype Completion with Primitive Knowledge for Few-Shot Learning
Feature Importance-aware Transferable Adversarial Attacks
Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning
ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation
Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild
MDETR: Modulated Detection for End-to-End Multi-Modal Understanding
Temporal Query Networks for Fine-Grained Video Understanding
Recent work
Hierarchical Recurrent Deep Fusion Using Adaptive Clip Summarization for Sign Language Translation
Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation
Patch-VQ: ‘Patching Up’ the Video Quality Problem
Extreme Rotation Estimation using Dense Correlation Volumes
Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting
Point2Skeleton: Learning Skeletal Representations from Point Clouds
TDN: Temporal Difference Networks for Efficient Action Recognition
The Devil is in the Boundary: Exploiting Boundary Representation for Basis-based Instance Segmentation
CO2: Consistent Contrast for Unsupervised Visual Representation Learning
Emerging Properties in Self-Supervised Vision Transformers
Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers
Progress Report: Recommended Tools
Feature Selection for Zero-Shot Gesture Recognition
Detecting and Mapping Video Impairments
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
Hallucinated-IQA: No-Reference Image Quality Assessment
Quantifying Visual Image Quality: A Bayesian View
Distilling Knowledge via Knowledge Review
Transferable Perturbations of Deep Feature Distributions
Personality-Assisted Multi-Task Learning for Generic and Personalized Image Aesthetics Assessment
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Learning Continuous Image Representation with Local Implicit Image Function
A survey on visual transformer
Depth and Amodal Segmentation
Unlearnable Examples: Making Personal Data Unexploitable
Learning Transferable Visual Models From Natural Language Supervision
Blind Image Quality Assessment with Active Inference
Conditional Convolutions for Instance Segmentation
Disentangled Non-Local Networks
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
RankIQA: Learning From Rankings for No-Reference Image Quality Assessment
Towards Open World Object Detection
Peer Collaborative Learning for Online Knowledge Distillation
Weakly-Supervised Action Localization by Generative Attention Modeling
Modeling Multi-Label Action Dependencies for Temporal Action Localization
Restricting the Flow: Information Bottlenecks for Attribution
Unsupervised Multi-Modal Image Registration via Geometry Preserving
Continual Learning for Blind Image Quality Assessment
Stabilized Medical Image Attack
Recent Progress on Self-Supervised Representation Learning
Amodal Segmentation Based on Visible Region Segmentation and Shape Prior
The VC Dimension
Unsupervised Deep Homography: A Fast and Robust Homography Estimation Model
Rethinking softmax cross entropy loss for adversarial robustness
Learning via Uniform Convergence
Joint Semantic Segmentation and Boundary Detection using Iterative Pyramid Contexts
Temporal Pyramid Network for Action Recognition
PAC-Learning
Growing Neural Cellular Automata
RIRNet: Recurrent-In-Recurrent Network for Video Quality Assessment
An Unsupervised Information-Theoretic Perceptual Quality Metric
Agree to Disagree: Adaptive Ensemble Knowledge Distillation in Gradient Space
A Mathematical Theory of Evidence
Correlating Edge, Pose with Parsing
Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation
Adversarial Weight Perturbation Helps Robust Generalization
Memory-augmented Dense Predictive Coding for Video Representation Learning
Appearance-Preserving 3D Convolution for Video-based Person Re-identification
Synthesize then Compare: Detecting Failures and Anomalies for Semantic Segmentation
ArcFace: Additive Angular Margin Loss for Deep Face Recognition
Understanding the Role of Individual Units in a Deep Network
Object Instance Annotation with Deep Extreme Level Set Evolution
Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation
Self-Attention: From Image Recognition to Image Segmentation
Beyond Vision: A Multimodal Recurrent Attention Convolutional Neural Network for Unified Image Aesthetic Prediction Tasks
High-frequency Component Helps Explain the Generalization of Convolutional Neural Networks
An Asymmetric Modeling for Action Assessment
ECCV 2020 Segmentation Theme
Collaborative Video Object Segmentation by Foreground-Background Integration
End-to-End Object Detection with Transformers
A Unified Framework of Surrogate Loss by Refactoring and Interpolation
Adv-watermark: A Novel Watermark Perturbation for Adversarial Examples
[Survey] Meta-Learning
Dual Super-Resolution Learning for Semantic Segmentation
SRFlow: Learning the Super-Resolution Space with Normalizing Flow
Closed-loop Matters: Dual Regression Networks for Single Image Super-Resolution
Exploring Self-attention for Image Recognition
Efficient Semantic Video Segmentation with Per-frame Inference
Attacks Which Do Not Kill Training Make Adversarial Learning Stronger
FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding
Learning Instance Occlusion for Panoptic Segmentation
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning
Understanding SSIM
Image Processing Using Multi-Code GAN Prior
RetinaTrack: Online Single Stage Joint Detection and Tracking
Not All Areas Are Equal: Transfer Learning for Semantic Segmentation via Hierarchical Region Selection
Second-Order Provable Defenses against Adversarial Attacks
Learning Fast and Robust Target Models for Video Object Segmentation
SpeedNet: Learning the Speediness in Videos
Intra- and Inter-Action Understanding via Temporal Action Parsing
Gradient Centralization: A New Optimization Technique for Deep Neural Networks
Visualizing the Invisible: Occluded Vehicle Segmentation and Recovery
Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy
Multi-Modal Domain Adaptation for Fine-Grained Action Recognition
Destruction and Construction Learning for Fine-grained Image Recognition
Blurry Video Frame Interpolation
Spatially Transformed Adversarial Examples
SC4D: A Sparse 4D Convolutional Network for Skeleton-Based Action Recognition
Circle Loss: A Unified Perspective of Pair Similarity Optimization
MOPT: Multi-Object Panoptic Tracking
Deep Unfolding Network for Image Super-Resolution
Difficulty-Aware Attention Network with Confidence Learning for Medical Image Segmentation
Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations
Diagnosing Error in Object Detectors
BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition
Feedback Graph Convolutional Network for Skeleton-based Action Recognition
How Useful is Self-Supervised Pretraining for Visual Tasks?
See the Sound, Hear the Pixels
PolarMask: Single Shot Instance Segmentation with Polar Representation
RANet: Ranking Attention Network for Fast Video Object Segmentation
PSENet: Psoriasis Severity Evaluation Network
GIQA: Generated Image Quality Assessment
Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation
From Patches to Pictures (PaQ-2-PiQ): Mapping the Perceptual Space of Picture Quality
Functional Adversarial Attacks
PF-Net: Point Fractal Network for 3D Point Cloud Completion
DynamoNet: Dynamic Action and Motion Network
Deep Snake for Real-Time Instance Segmentation
Spatial-Temporal Relation Networks for Multi-Object Tracking
Part-Level Graph Convolutional Network for Skeleton Based Action Recognition
Unsupervised Learning for Real-World Super-Resolution
Adversarial Feedback Loop
Attention Based Glaucoma Detection: A Large-scale Database and CNN Model
SlowFast Networks for Video Recognition
Libra R-CNN: Towards Balanced Learning for Object Detection
How Does Batch Normalization Help Optimization?
Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses
What’s important to boost performance in deep learning
Anchor Diffusion for Unsupervised Video Object Segmentation
Collaborative Learning of Semi-Supervised Segmentation and Classification for Medical Images
Trust Region Based Adversarial Attack on Neural Networks
Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation
Tracking Without Bells and Whistles
DeepCO3: Deep Instance Co-segmentation by Co-peak Search and Co-saliency Detection
Noise2Void: Learning Denoising from Single Noisy Images
Action Recognition from Single Timestamp Supervision in Untrimmed Videos
Momentum Contrast for Unsupervised Visual Representation Learning
Efficient Parameter-free Clustering Using First Neighbor Relations
DeepUSPS: Deep Robust Unsupervised Saliency Prediction With Self-Supervision
Cyclic Guidance for Weakly Supervised Joint Detection and Segmentation
Efficient Online Multi-Person 2D Pose Tracking with Recurrent Spatio-Temporal Affinity Fields
Multi-person Articulated Tracking with Spatial and Temporal Embeddings
ComDefend: An Efficient Image Compression Model to Defend Adversarial Examples
Second-order Attention Network for Single Image Super-Resolution
Image Aesthetic Assessment Based on Pairwise Comparison – A Unified Approach to Score Regression, Binary Classification, and Personalization
Learning Semantics-aware Distance Map with Semantics Layering Network for Amodal Instance Segmentation
A Style-Based Generator Architecture for Generative Adversarial Networks
SeGAN: Segmenting and Generating the Invisible
Action Assessment by Joint Relation Graphs
Learning Temporal Action Proposals With Fewer Labels
Quality Assessment of In-the-Wild Videos
RankSRGAN: Generative Adversarial Networks with Ranker for Image Super-Resolution
Adaptive Pyramid Context Network for Semantic Segmentation
Surgical Skill Assessment on In-Vivo Clinical Data via the Clearness of Operating Field
Weakly Supervised Energy-Based Learning for Action Segmentation
SparseFool: A Few Pixels Make a Big Difference
An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition
A General and Adaptive Robust Loss Function
Graph convolutional tracking
Visual Attention Consistency under Image Transforms for Multi-Label Image Classification
Timeception for Complex Action Recognition
Efficient Video Classification Using Fewer Frames
ScratchDet: Training Single-Shot Object Detectors from Scratch
Learning Loss for Active Learning
Do Better ImageNet Models Transfer Better?
Weakly Supervised Image Classification through Noise Regularization
“Double-DIP”: Unsupervised Image Decomposition via Coupled Deep-Image-Priors
Neural Task Graphs: Generalizing to Unseen Tasks from a Single Video Demonstration
Defense Against Adversarial Images using Web-Scale Nearest-Neighbor Search
Adversarial Attacks Beyond the Image Space
Collaborative Global-Local Networks for Memory-Efficient Segmentation
Accel: A Corrective Fusion Network for Efficient Semantic Segmentation on Video
Pose2Seg: Detection Free Human Instance Segmentation
UPSNet: A Unified Panoptic Segmentation Network
LaSO: Label-Set Operations networks for multi-label few-shot learning
TraPHic: Trajectory Prediction in Dense and Heterogeneous Traffic Using Weighted Interactions
Actional-Structural Graph Convolutional Networks for Skeleton-based Action Recognition
Skin Lesion Classification in Dermoscopy Images Using Synergic Deep Learning
(1) A Relation-Augmented Fully Convolutional Network for Semantic Segmentation in Aerial Scenes; (2) Co-occurrent Features in Semantic Segmentation
Not All Areas Are Equal: Transfer Learning for Semantic Segmentation via Hierarchical Region Selection
Learning Correspondence from the Cycle-consistency of Time
SiamRPN++
Bag of Tricks for Image Classification with Convolutional Neural Networks
Making Convolutional Networks Shift-Invariant Again
Complement Objective Training
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment
DAVANet: Stereo Deblurring with View Aggregation
Towards Robust Detection of Adversarial Examples
Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers
On the Use of Deep Learning for Blind Image Quality Assessment
Fast Online Object Tracking and Segmentation: A Unifying Approach
Learning Deep Compositional Grammatical Architectures for Visual Recognition
ExFuse: Enhancing Feature Fusion for Semantic Segmentation
The Pros and Cons: Rank-aware Temporal Attention for Skill Determination in Long Videos
SFNet: Learning Object-aware Semantic Correspondence
Data augmentation using learned transforms for one-shot medical image segmentation
A Comparative Study for Single Image Blind Deblurring
Graph CNNs with Motif and Variable Temporal Block for Skeleton-based Action Recognition
Deep Video Quality Assessor: From Spatio-temporal Visual Sensitivity to a Convolutional Neural Aggregation Network
CCNet: Criss-Cross Attention for Semantic Segmentation
DenseASPP for Semantic Segmentation in Street Scenes
Eliminating Background-Bias for Robust Person Re-identification
Application-Driven No-Reference Quality
R-FCN: Object Detection via Region-based Fully Convolutional Networks
ActionVLAD: Learning spatio-temporal aggregation for action classification
MoNet: Deep Motion Exploitation for Video Object Segmentation
A Constrained Deep Neural Network for Ordinal Regression
DeepFool: a simple and accurate method to fool deep neural networks
Modeling Surgical Technical Skill Using Expert Assessment for Automated Computer Rating
Self-Ensembling Attention Networks: Addressing Domain Shift for Semantic Segmentation
ImageNet-Trained CNNs Are Biased Towards Texture; Increasing Shape Bias Improves Accuracy and Robustness
Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles
Dual Attention Network for Scene Segmentation
A Benchmark for Automatic Visual Classification of Clinical Skin Disease Images
Clinical Skin Lesion Diagnosis using Representations Inspired by Dermatologist Criteria
An Information-Theoretic Definition of Similarity
Generating Images with Perceptual Similarity Metrics based on Deep Networks
Spatio-Temporal Graph Routing for Skeleton-based Action Recognition
Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation
Towards Robust Interpretability with Self-explaining Neural Networks
Trajectory Convolution for Action Recognition
Videos as Space-Time Region Graphs
Recurrent Autoregressive Networks for Online Multi-Object Tracking
Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks
High Performance Visual Tracking with Siamese Region Proposal Network
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
Fast Video Object Segmentation by Reference-Guided Mask Propagation
What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets
A New Representation of Skeleton Sequences for 3D Action Recognition
Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition
Learning Deep Features for Discriminative Localization
Networks and the Best Approximation Property
Temporal Deformable Residual Networks for Action Segmentation in Videos
Bi-box Regression for Pedestrian Detection and Occlusion Estimation
Instance Segmentation and Tracking with Cosine Embeddings and Recurrent Hourglass Networks
Person Re-identification with Deep Similarity-Guided Graph Neural Network
Collaborative Deep Reinforcement Learning for Multi-Object Tracking
Video quality assessment accounting for temporal visual masking of local flicker
Deep Reinforcement Learning for Surgical Gesture Segmentation and Classification
Generate To Adapt: Aligning Domains using Generative Adversarial Networks
Direction-aware Spatial Context Features for Shadow Detection and Removal
Online Multi-Object Tracking with Dual Matching Attention Networks
Fusing Crowd Density Maps and Visual Object Trackers for People Tracking in Crowd Scenes
Eigen-Distortions of Hierarchical Representations
Gaussian Process