【计算机视觉】24-Object Detection

文章目录

  • 24-Object Detection
    • 1. Introduction
    • 2. Methods
      • 2.1 Sliding Window
      • 2.2 R-CNN: Region-Based CNN
      • 2.3 Fast R-CNN
      • 2.4 Faster R-CNN: Learnable Region Proposals
      • 2.5 Results of objects detection
    • 3. Summary
    • Reference

24-Object Detection

1. Introduction

  1. Task Definition

    Input: Single RGB Image

    Output: A set of detected objects;

    For each object predict:

    • Category label (from fixed, known set of categories)

    • Bounding box(four numbers: x, y, width, height)

  2. Challenges

    • Multiple outputs: Need to output variable numbers of objects per image
    • Multiple types of output: Need to predict ”what” (category label) as well as “where” (bounding box)
    • Large images: Classification works at 224x224; need higher resolution for detection, often ~800x600
  3. Detecting a single object

    image-20231120145632741

    With two branches, outputting label, and box

    Problem: Images can have more than one object! And if we use multiple single object detection, it will decrease the efficiency.

2. Methods

2.1 Sliding Window

Apply a CNN to many different crops of the image, CNN classifies each crop as an object or background:

image-20231120150748738

Problem: Need too many calculations

  • Consider an image of size H*W and a box of size h*w
  • Total possible boxes: ∑ h = 1 H ∑ w = 1 W ( W − w + 1 ) ( H − h + 1 ) = H ( H + 1 ) 2 W ( W + 1 ) 2 \sum_{h=1}^{H}\sum_{w=1}^{W}(W-w+1)(H-h+1)=\frac{H(H+1)}{2}\frac{W(W+1)}{2} h=1Hw=1W(Ww+1)(Hh+1)=2H(H+1)2W(W+1)
  • 800 x 600 image has ~58M boxes! No way we can evaluate them all.

2.2 R-CNN: Region-Based CNN

  1. Region Proposals(Selective Search)

    Selective Search is a region proposal algorithm used in object detection. It is based on computing hierarchical grouping of similar regions based on color, texture, size and shape compatibility.

    Selective Search starts by over-segmenting the image based on intensity of the pixels using a graph-based segmentation method by Felzenszwalb and Huttenlocher.

    image-20231120213007261

    Selective Search algorithm takes these oversegments as initial input and performs the following steps

    1. Add all bounding boxes corresponding to segmented parts to the list of regional proposals
    2. Group adjacent segments based on similarity
    3. Go to step 1

    At each iteration, larger segments are formed and added to the list of region proposals. Hence we create region proposals from smaller segments to larger segments in a bottom-up approach.

    As for the calculation of similarity measures based on color, texture, size and shape compatibility, please refer to Selective Search for Object Detection (C++ / Python) | LearnOpenCV

  2. Architecture of the network

    image-20231120214110598

    On two thousand selected regions, we narrow them down to the size required for classification, and after passing through the convolutional network, we output the category along with the box offset

  3. Steps

    1. Run region proposal method to compute ~2000 region proposals
    2. Resize each region to 224x224 and run independently through CNN to predict class scores and bbox transform
    3. Use scores to select a subset of region proposals to output (Many choices here: threshold on background, or per-category? Or take top K proposals per image?)
    4. Compare with ground-truth boxes
  4. Details(Focus on step3 and 4)

    1. Intersection over Union (IoU)
      I o U = Area of Intersection Area of Union IoU=\frac{\color{yellow}{\text{Area of Intersection}}}{\color{purple}{\text{Area of Union}}} IoU=Area of UnionArea of Intersection
      在这里插入图片描述

    2. Non-Max Suppression (NMS)

      • Select next highest-scoring box

      • Eliminate lower-scoring boxes(Comparing the highest-scoring box to all the others ) with IoU > threshold (e.g. 0.7)

      • If any boxes remain, GOTO 1

      Problem: NMS may eliminate ”good” boxes when objects are highly overlapping:

在这里插入图片描述

  1. Mean Average Precision (mAP)

    Use the gif to understand it(but I only have the final image):

在这里插入图片描述 For example, the mAP in COCO dataset is 0.4.

  1. Problem: Very slow! Need to do ~2k forward passes for each image!

    Solution: Run CNN before warping!

2.3 Fast R-CNN

  1. Architecture:

    image-20231120151757798
    • Most of the computation happens in the backbone network; this saves work for overlapping region proposals

    • Per-Region network is relatively lightweight

  2. The concrete architecture in Alexnet and Resnet:

    image-20231120152141617 image-20231120152156583
  3. Details:

    How to crop features?

    image-20231120222841764

    In this process, there are two errors:

    img

    如下图,假设输入图像经过一系列卷积层下采样32倍后输出的特征图大小为8x8,现有一 RoI 的左上角和右下角坐标(x, y 形式)分别为(0, 100) 和 (198, 224),映射至特征图上后坐标变为(0, 100 / 32)和(198 / 32,224 / 32),由于像素点是离散的,因此向下取整后最终坐标为(0, 3)和(6, 7),这里产生了第一次量化误差。

    假设最终需要将 RoI 变为固定的2x2大小,那么将 RoI 平均划分为2x2个区域,每个区域长宽分别为 (6 - 0 + 1) / 2 和 (7 - 3 + 1) / 2 即 3.5 和 2.5,同样,由于像素点是离散的,因此有些区域的长取3,另一些取4,而有些区域的宽取2,另一些取3,这里产生了第二次量化误差。

  4. RoI Align in Mask R-CNN

在这里插入图片描述

Notice: RoI Align needs to set a hyperparameter to represent the number of sampling points in each region, which is usually 4.

  1. Speed

    It has an enormous increase from R-CNN. But we can find that region proposals costs lots of time.

2.4 Faster R-CNN: Learnable Region Proposals

  1. Architecture:

    Insert Region Proposal Network (RPN) to predict proposals from feature
    在这里插入图片描述

  2. Details:

在这里插入图片描述

At each point, predict whether the corresponding anchor contains an object. And we use logistic regression to express the error. predict scores with conv layer

  1. Evaluation

在这里插入图片描述

  1. Improvement

    Faster R-CNN is a Two-stage object detector:

    But we want to design the structure of end to end, eliminating the second stage. So we change the function of region proposal network to predict the class label.
    在这里插入图片描述

2.5 Results of objects detection

在这里插入图片描述

  • Two-stage method (Faster R-CNN) gets the best accuracy but are slower.
  • Single-stage methods (SSD) are much faster but don’t perform as well
  • Bigger backbones improve performance, but are slower
  • Diminishing returns for slower methods

在这里插入图片描述

These results are a few years old …since then GPUs have gotten faster, and we’ve improved performance with many tricks:

  • Train longer!
  • Multiscale backbone: Feature
    Pyramid Networks
  • Better backbone: ResNeXt
  • Single-Stage methods have improved
  • Very big models work better
  • Test-time augmentation pushes
    numbers up
  • Big ensembles, more data, etc

3. Summary

Reference

[1] RoI Pooling 系列方法介绍(文末附源码) - 知乎 (zhihu.com)

[2] Selective Search for Object Detection (C++ / Python) | LearnOpenCV

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.rhkb.cn/news/199071.html

如若内容造成侵权/违法违规/事实不符,请联系长河编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

Android Studio常见问题

Run一直是上次的apk 内存占用太大,导致闪退

基于Python3的scapy解析SSL报文

scapy对于SSL的支持个人觉得不太好,至少在构造报文方面没有HTTP或者DNS这种常见的报文有效方便,但是scapy对于SSL的解析还是可以的。下面我们以一个典型的HTTPS的报文为例,展示scapy解析SSL报文。 一:解析ClientHello报文 from sc…

【zabbix监控三】zabbix之部署代理服务器

一、部署代理服务器 分布式监控的作用: 分担server的几种压力解决多机房之间的网络延时问题 1、搭建proxy主机 1.1 关闭防火墙,修改主机名 systemctl disbale --now firewalld setenforce 0 hostnamectl set-hostname zbx-proxy su1.2 设置zabbix下…

Docker 可视化面板 ——Portainer

Portainer 是一个非常好用的 Docker 可视化面板,可以让你轻松地管理你的 Docker 容器。 官网:Portainer: Container Management Software for Kubernetes and Docker 【Docker系列】超级好用的Docker可视化工具——Portainer_哔哩哔哩_bilibili 环境 …

服务注册发现 springcloud netflix eureka

文章目录 前言角色(三个) 工程说明基础运行环境工程目录说明启动顺序(建议):运行效果注册与发现中心服务消费者: 代码说明服务注册中心(Register Service)服务提供者(Pro…

三十二、W5100S/W5500+RP2040树莓派Pico<UPnP示例>

文章目录 1 前言2 简介2 .1 什么是UPnP?2.2 UPnP的优点2.3 UPnP数据交互原理2.4 UPnP应用场景 3 WIZnet以太网芯片4 UPnP示例概述以及使用4.1 流程图4.2 准备工作核心4.3 连接方式4.4 主要代码概述4.5 结果演示 5 注意事项6 相关链接 1 前言 随着智能家居、物联网等…

ESP32-BLE基础知识

一、存储模式 两种存储模式: 大端存储:低地址存高字节,如将0x1234存成[0x12,0x34]。小端存储:低地址存低字节,如将0x1234存成[0x34,0x12]。 一般来说,我们看到的一些字符串形式的数字都是大端存储形式&a…

服务器端请求伪造(SSRF)

概念 SSRF(Server-Side Request Forgery,服务器端请求伪造) 是一种由攻击者构造形成的由服务端发起请求的一个安全漏洞。一般情况下,SSRF是要攻击目标网站的内部系统。(因为内部系统无法从外网访问,所以要把目标网站当做中间人来…

Flutter 中在单个屏幕上实现多个列表

今天,我将提供一个实际的示例,演示如何在单个页面上实现多个列表,这些列表可以水平排列、网格格式、垂直排列,甚至是这些常用布局的组合。 下面是要做的: 实现 让我们从创建一个包含产品所有属性的产品模型开始。 …

Android描边外框stroke边线、rotate旋转、circle圆形图的简洁通用方案,基于Glide与ShapeableImageView,Kotlin

Android描边外框stroke边线、rotate旋转、circle圆形图的简洁通用方案,基于Glide与ShapeableImageView,Kotlin 利用ShapeableImageView专门处理圆形和外框边线的特性,通过Glide加载图片装载到ShapeableImageView。注意,因为要描边…

⑩⑤【DB】详解MySQL存储过程:变量、游标、存储函数、循环,判断语句、参数传递..

个人简介:Java领域新星创作者;阿里云技术博主、星级博主、专家博主;正在Java学习的路上摸爬滚打,记录学习的过程~ 个人主页:.29.的博客 学习社区:进去逛一逛~ MySQL存储过程 1. 介绍2. 使用3. 变量①系统变…

每天学习一点点之 Spring Web MVC 之抽象 HandlerInterceptor 实现常用功能(限流、权限等)

背景 这里介绍一下本文的背景(废话,可跳过)。上周有个我们服务的调用方反馈某个接口调用失败率很高,排查了一下,发现是因为这个接口被我之前写的一个限流器给拦截了,随着我们的服务接入了 Sentinel&#x…

MTK Pump Express 快速充电原理分析

1 MTK PE 1.1 原理 在讲正文之前,我们先看一个例子。 对于一块电池,我们假设它的容量是6000mAh,并且标称电压是3.7V,换算成Wh(瓦时)为单位的值是22.3Wh(6000mAh*3.7V);普通的充电器输出电压电流是5V2A(10W)&#xff0c…

RK3588平台开发系列讲解(项目篇)嵌入式AI的学习步骤

文章目录 一、嵌入式AI的学习步骤1.1、入门Linux1.2、入门AI 二、瑞芯微嵌入式AI2.1、瑞芯微的嵌入式AI关键词2.2、AI模型部署流程 沉淀、分享、成长,让自己和他人都能有所收获!😄 📢 本篇将给大家介绍什么是嵌入式AI。 一、嵌入…

Docker与Kubernetes结合的难题与技术解决方案

文章目录 1. **版本兼容性**技术解决方案 2. **网络通信**技术解决方案 3. **存储卷的管理**技术解决方案 4. **安全性**技术解决方案 5. **监控和日志**技术解决方案 6. **扩展性与自动化**技术解决方案 7. **多集群管理**技术解决方案 结语 🎈个人主页&#xff1a…

编程刷题网站以及实用型网站推荐

1、牛客网在线编程 牛客网在线编程https://www.nowcoder.com/exam/oj?page1&tab%E8%AF%AD%E6%B3%95%E7%AF%87&topicId220 2、力扣 力扣https://leetcode.cn/problemset/all/ 3、练码 练码https://www.lintcode.com/ 4、PTA | 程序设计类实验辅助教学平台 PTA | 程…

【Java 进阶篇】Ajax 实现——原生JS方式

大家好,欢迎来到这篇关于原生 JavaScript 中使用 Ajax 实现的博客!在前端开发中,我们经常需要与服务器进行数据交互,而 Ajax(Asynchronous JavaScript and XML)是一种用于创建异步请求的技术,它…

云原生专栏丨基于服务网格的企业级灰度发布技术

灰度发布(又名金丝雀发布)是指在黑与白之间,能够平滑过渡的一种发布方式。在其上可以进行A/B testing,即让一部分用户继续用产品特性A,一部分用户开始用产品特性B,如果用户对B没有什么反对意见,…

广州华锐互动VRAR:VR教学楼地震模拟体验增强学生防震减灾意识

在当今社会,地震作为一种自然灾害,给人们的生活带来了巨大的威胁。特别是在学校这样的集体场所,一旦发生地震,后果将不堪设想。因此,加强校园安全教育,提高师生的防震减灾意识和能力,已经成为了…

springboot中动态api如何设置

1.不需要编写controller 等mvc层,通过接口动态生成api。 这个问题,其实很好解决,以前编写接口,是要写controller,需要有 RestController RequestMapping("/test1") public class xxxController{ ApiOperat…