[4] RCNN Object Detection

by linguana 2021. 4. 12. 19:56

www.pyimagesearch.com/2020/07/13/r-cnn-object-detection-with-keras-tensorflow-and-deep-learning/

R-CNN object detection with Keras, TensorFlow, and Deep Learning - PyImageSearch

Recap

Selective search (hereafter, the SS algorithm) → replaces the image pyramid and sliding window
Pretrained model → classification
Confidence filtering & NMS (non-maxima suppression)

This approach worked fairly well, but it left a few things unresolved.

How do we train an object detector on a custom dataset?
How do we train that network using the SS algorithm?
How does using the SS algorithm affect object detection at inference time?

This tutorial sets out to answer these questions.

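Contents:

Steps for implementing an RCNN object detector
A look at the example dataset and the project structure
The configuration file and IoU (Intersection over Union), a utility function for measuring object detection accuracy
Building the object detection dataset with the SS algorithm (combined with some post-processing logic, this tells us whether a region of the input image contains the object we are looking for)

The regions obtained this way serve as training data: we use them to fine-tune a MobileNet pre-trained on ImageNet so that it can classify and detect the objects in our dataset. (The results will also be visualized.)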

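A. Steps for implementing an RCNN object detector

<Figure 1. Overall process of implementing an RCNN object detector> 1. Build an object detection dataset with the SS algorithm 2. Fine-tune a classification model on that dataset 3. During inference, run the SS algorithm on the input image 4. Use the trained model to make a prediction for each region proposal 5. Apply NMS 6. Return the final object detection results

Implementing an RCNN object detector is a complex, multi-step process.

Prerequisites: (1) how the SS algorithm works, (2) the role of region proposals in an object detector, (3) how to fine-tune a model

The RCNN object detector can be implemented by following the six steps shown in the figure above.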

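B. A look at the example dataset

Figure 2. The raccoon dataset provided by Dat Tran (dat-tran.com)

The dataset has 200 images containing 217 raccoons in total. (A single image can contain more than one raccoon.)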

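C. Project structure

The files under the raccoons folder (annotations, images) were created by Dat Tran and should not be confused with the files under the dataset folder, which build_dataset.py generates. The contents of the dataset folder are prepared for fine-tuning a MobileNet V2 model into a raccoon classifier. (raccoons files: http://github.com/datitran/raccoon_dataset)

The pyimagesearch folder contains the following modules:

config.py: script that stores the configuration settings
iou.py: computes IoU, the metric used to evaluate the object detector
nms.py: runs the NMS algorithm to remove overlapping boxes around an object

These files support the following Python scripts:

build_dataset.py: reads the files under the raccoons folder and builds the files under the dataset folder
fine_tune_rcnn.py: trains the raccoon classification model through fine-tuning
detect_object_rcnn.py: puts all the pieces together to perform basic R-CNN object detection. Its main components are the SS algorithm and classification. (The code in this tutorial does not implement an end-to-end R-CNN in which the SS algorithm is built into the model.)

The NMS algorithm is not explained here; see the following link instead: Non-Maximum Suppression for Object Detection in Python - PyImageSearch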

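D. Configuration (config) file

The configuration file holds the key constants and settings.

Open config.py in the pyimagesearch module:

config.py: paths for accessing Dat Tran's original images

This sets the paths to the files in the raccoons folder. Continue with the following:

config.py: paths for the data that the dataset builder script will produce

This sets the paths to the files that will live under the dataset folder: the raccoon folder, which holds images containing a raccoon, and the no_raccoon folder, which holds images without one. The build_dataset.py script uses these paths.

Next, set the maximum number of region proposals from the SS algorithm to use for training and for inference.

config.py: MAX_PROPOSALS is the maximum number of region proposals used for training; MAX_PROPOSALS_INFER is the maximum number used for inference

Next, set the number of images to generate for the dataset.

config.py: MAX_POSITIVE limits the images with a raccoon; MAX_NEGATIVE limits the images without one (both apply per input image)

Finally, set the constants the model needs.

config.py: INPUT_DIMS is the image size; MODEL_PATH is the h5 output path; ENCODER_PATH is the pickle output path; MIN_PROBA is the minimum confidence required to count as a positive prediction

The image size is set to (224, 224), the size MobileNet expects (line 28); the output paths for the raccoon classifier (h5 file) and the label encoder are set (lines 31-32); and the minimum confidence that a prediction must reach to be treated as containing a raccoon is set (line 36).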

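E. IoU (Intersection over Union), a utility function for measuring object detection accuracy

<Figure 3> An example image in which a traffic sign is detected. The predicted bounding box is red and the ground-truth box is green. Our goal is to compute IoU, the ratio by which these two boxes overlap.

To measure how good the bounding boxes drawn by our object detector are, we use a metric called IoU.

IoU is a fraction: the numerator is the area where the predicted bounding box and the ground-truth box overlap (their intersection), and the denominator is the area covered by the two boxes together (their union).

IoU is therefore simply a ratio. (The name says it all: Intersection over Union.) This metric lets us measure the accuracy of an object detector. For more on IoU, see the following link: Intersection over Union (IoU) for object detection - PyImageSearch

The following code implements IoU (iou.py under the pyimagesearch directory):

The compute_iou function takes two parameters (boxA, boxB). boxA is the ground-truth box and boxB is the predicted bounding box. The order of the parameters does not affect the result. (Note: boxA consists of (startX, startY, endX, endY), and so does boxB.)

First, the (x, y) coordinates of the top-left and bottom-right corners of the intersection rectangle are computed (lines 3-6). From these coordinates we get the intersection area (line 9, the numerator of the formula) and the area of each of the two boxes (lines 13-14), and from those the IoU (line 19).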
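
As a quick check of the formula, the sketch below mirrors the compute_iou logic and evaluates it on three hand-picked cases. The box coordinates are made-up values for illustration, and the +1 terms assume inclusive pixel coordinates, as in the iou.py listing.

```python
def compute_iou(boxA, boxB):
    # corners of the intersection rectangle
    xA = max(boxA[0], boxB[0])
    yA = max(boxA[1], boxB[1])
    xB = min(boxA[2], boxB[2])
    yB = min(boxA[3], boxB[3])

    # intersection area (0 when the boxes do not overlap)
    inter = max(0, xB - xA + 1) * max(0, yB - yA + 1)

    # area of each box
    areaA = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
    areaB = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)

    # union = areaA + areaB - intersection
    return inter / float(areaA + areaB - inter)

gt = (0, 0, 9, 9)          # 10 x 10 pixels (illustrative values)
pred = (5, 5, 14, 14)      # same size, shifted by 5 pixels

iou_partial = compute_iou(gt, pred)                # 25 / (100 + 100 - 25)
iou_same = compute_iou(gt, gt)                     # identical boxes
iou_disjoint = compute_iou(gt, (20, 20, 29, 29))   # no overlap

print(iou_partial, iou_same, iou_disjoint)
```

The partial-overlap case shows why a box that looks "half right" can still score low: the intersection is only 25 pixels while the union is 175.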

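F. Implementing the RCNN object detector

(Step 1) Building the dataset for object detection

<Figure 3. Building the dataset> 1. Take the path to the raccoons image data as input 2. Loop over the images in the dataset 2a. Load the input image 2b. Load the annotation file and parse the bounding boxes 3. Run the SS algorithm on the input image 4. Find the proposed bounding boxes that sufficiently overlap the ground-truth boxes 5. Save the regions that overlap the ground truth and the regions that do not

Before building the RCNN object detector, we first have to build the dataset. (This is the first step in Figure 1.)

The build_dataset.py script carries out the process in Figure 3 to create the dataset.

It imports the modules from the subfolder (lines 2-3) and brings in the tools it needs (BeautifulSoup, imutils, cv2, and so on).

Next, the two output directories, one for regions with a raccoon and one for regions without, are created (lines 10-13):

The list of images under raccoons/images is stored in the imagePaths variable (line 16).

Two variables that track the number of saved images with and without a raccoon are initialized (lines 20-21). Then we loop over imagePaths:

Lines 31-34 use the image path to locate the annotation XML file. (Reference: COCO and Pascal VOC data format for Object detection | by Renu Khandelwal | Towards Data Science) This XML file holds the ground-truth object location labels for the current image.

Lines 38-39 load and parse the XML.

The gtBoxes list will hold the ground-truth object locations for the image (line 40).

The first information extracted from the XML file is the image width and height (lines 43-44).

Next, the bounding box coordinates are read from the <object> elements in the XML file.

This loop:

extracts the label and the bounding box coordinates from the XML file (lines 49-53),
makes sure the bounding box coordinates do not fall outside the image, truncating them if they do (lines 57-60), and
updates the list of ground-truth bounding boxes (line 63).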
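
The truncation step in the list above can be sketched in isolation. This is a minimal illustration, not the tutorial's code: clamp_box is a hypothetical helper name, and the image size and raw coordinates are made-up values.

```python
def clamp_box(box, width, height):
    # keep every coordinate inside the image boundaries
    (x_min, y_min, x_max, y_max) = box
    return (max(0, x_min), max(0, y_min),
            min(width, x_max), min(height, y_max))

# two annotated boxes that partly fall outside a 650 x 417 image
raw_boxes = [(-5, 10, 120, 90), (30, 40, 700, 500)]

gt_boxes = []
for raw in raw_boxes:
    gt_boxes.append(clamp_box(raw, width=650, height=417))

print(gt_boxes)
```

Clamping matters because the boxes are later used to slice the image array and to compute IoU, and out-of-range coordinates would distort both.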

At this point we load the image and run the SS algorithm on it.

The image is loaded from the dataset (line 66), the SS algorithm is run to find region proposals (lines 70-73), and the results are added to the proposedRects list (lines 74-80).
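
The SS algorithm reports each region as (x, y, w, h), while compute_iou expects corner coordinates, so the rectangles are converted before they are compared. A minimal sketch of that conversion follows; to_corners and the sample rectangles are hypothetical, and the cap of 2 stands in for config.MAX_PROPOSALS.

```python
def to_corners(rect):
    # (x, y, w, h) -> (startX, startY, endX, endY)
    (x, y, w, h) = rect
    return (x, y, x + w, y + h)

# stand-in for the rectangles returned by selective search
rects = [(10, 20, 30, 40), (0, 0, 5, 5), (100, 50, 20, 10)]

proposed_rects = [to_corners(r) for r in rects]

# only the first N proposals are examined later on
max_proposals = 2
examined = proposed_rects[:max_proposals]

print(proposed_rects)
print(examined)
```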

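At this stage we have (1) the ground-truth bounding boxes and (2) the region proposals generated by the SS algorithm. We can now apply IoU to determine which region proposals overlap a ground-truth bounding box sufficiently.

The code uses the following variables:

positiveROIs: the number of region proposals for the current image that (1) overlap a ground-truth bounding box sufficiently and (2) are saved to the path given by config.POSITVE_PATH (the constant is spelled this way in config.py)
negativeROIs: the number of region proposals for the current image that (1) have an IoU below 5% with the ground-truth bounding box and do not lie fully inside it and (2) are saved to the path given by config.NEGATIVE_PATH

Both variables are initialized on lines 84-85.

Starting on line 88, we loop over the region proposals generated by the SS algorithm (up to 2,000 of them, according to the value set in the config file). Inside this loop:

the coordinates of the bounding box proposed by the SS algorithm are unpacked (line 90),
a loop over the ground-truth bounding boxes is run (line 93),
the IoU of the two boxes is computed, which is used to decide whether the region proposal is a positive ROI or a negative ROI (line 96), and
roi and outputPath are then initialized for the next steps (lines 100-101).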

Within this loop, let's check whether the proposedRect and gtBox pair is a positive ROI:

If the region proposal passes the IoU > 70% criterion and the loop has not yet reached config.MAX_POSITIVE (line 105), the following happens:

the positive roi is extracted with NumPy slicing (line 108),
the outputPath to which the ROI will be written is built (lines 109-111), and
the counters are incremented (lines 114-115).

To decide whether the proposedRect and gtBox pair is truly a negative ROI, we first check for a full overlap:

If the proposed bounding box (proposedRect) lies entirely inside the ground-truth bounding box (gtBox), we call it a fullOverlap. Lines 119-122 check for this case.
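
The full-overlap test can be sketched as a small function. The likely motivation, stated here as an assumption, is that a tiny proposal inside the ground-truth box has a low IoU yet still shows part of the raccoon, so it should not be saved as a negative example. is_fully_inside and the coordinates below are hypothetical.

```python
def is_fully_inside(prop, gt):
    # True when the proposal lies entirely within the ground-truth box
    (pStartX, pStartY, pEndX, pEndY) = prop
    (gStartX, gStartY, gEndX, gEndY) = gt
    return (pStartX >= gStartX and pStartY >= gStartY and
            pEndX <= gEndX and pEndY <= gEndY)

gt_box = (50, 50, 250, 250)

inside_patch = is_fully_inside((100, 100, 120, 120), gt_box)   # small patch of the object
straddling = is_fully_inside((200, 200, 300, 300), gt_box)     # crosses the border
far_away = is_fully_inside((0, 0, 40, 40), gt_box)             # background

print(inside_patch, straddling, far_away)
```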

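With that in place, we can check whether the proposedRect and gtBox pair is a negative ROI:

The conditional on lines 127-128 checks that:

there is no fullOverlap,
the IoU is sufficiently small (below 5%), and
the limit on negative ROIs for the current image has not been reached.

If all of these conditions hold, the code:

extracts the negative ROI (line 131),
sets the path where the negative ROI will be saved (lines 132-134), and
increments the negative counters (lines 137-138).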
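
Putting the positive and negative rules side by side, the labeling decision can be sketched as one function. label_proposal is a hypothetical name; the thresholds (0.7 and 0.05), the limits (30 and 10), and the `<=` comparisons follow the build_dataset.py and config.py listings.

```python
MAX_POSITIVE = 30
MAX_NEGATIVE = 10

def label_proposal(iou, full_overlap, positive_count, negative_count):
    # positive: strong overlap with the ground truth
    if iou > 0.7 and positive_count <= MAX_POSITIVE:
        return "raccoon"
    # negative: almost no overlap and not a patch inside the object
    if not full_overlap and iou < 0.05 and negative_count <= MAX_NEGATIVE:
        return "no_raccoon"
    # everything in between is ambiguous and is skipped
    return None

print(label_proposal(0.85, False, 0, 0))
print(label_proposal(0.01, False, 0, 0))
print(label_proposal(0.01, True, 0, 0))
print(label_proposal(0.30, False, 0, 0))
```

Proposals with an IoU between 5% and 70% are written to neither folder, which keeps ambiguous crops out of the training data.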

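We are now at the final step of building the dataset: writing the current roi to the appropriate path.

If the ROI and its path are not None (line 141), the ROI is resized to the CNN input size and written to disk (lines 145-147).

Keep in mind that each ROI's outputPath is built from config.POSITVE_PATH (or config.NEGATIVE_PATH) and the current totalPositive (or totalNegative).

The ROI is therefore saved under dataset/raccoon or dataset/no_raccoon.

Building the dataset (running build_dataset.py)

We are now ready to prepare the data for the RCNN object detector. Open a terminal and run the build_dataset.py script:

Running the SS algorithm on the 200 images took 5 minutes 42 seconds.

Checking the raccoon and no_raccoon subfolders under the dataset folder shows that 1,560 images were generated for raccoon and 2,200 for no_raccoon.

The example above was run on a Mac, so ls was used. On Windows, use dir to check.

Here are a few examples.

On the left of <Figure 6>, the no_raccoon class contains only those images generated by the SS algorithm that do not overlap a ground-truth bounding box significantly. The right of <Figure 6> shows raccoon images.

Looking through them, you will notice that some of these images resemble each other and some are close to duplicates (for example, images (1,3) and (3,5) of raccoon). This is intended behavior.

Recall that the SS algorithm tries to identify image regions where the object we are looking for could be. It is therefore quite likely to output several similar images from similar regions.

Depending on your needs, you can keep all of these similar images (as this tutorial does) or add extra logic that filters out images that overlap heavily. (Try this yourself.)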
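
One possible way to filter near-duplicate proposals, offered only as a sketch and not as the tutorial's method, is to keep a proposal only if its IoU with every proposal already kept stays below a threshold. drop_near_duplicates, the 0.9 threshold, and the sample rectangles are all assumptions made for illustration.

```python
def iou(a, b):
    # IoU of two (startX, startY, endX, endY) boxes, inclusive pixels
    xA, yA = max(a[0], b[0]), max(a[1], b[1])
    xB, yB = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, xB - xA + 1) * max(0, yB - yA + 1)
    areaA = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    areaB = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / float(areaA + areaB - inter)

def drop_near_duplicates(rects, max_iou=0.9):
    # greedy filter: a rect survives only if it is not almost
    # identical to one that was kept earlier
    kept = []
    for rect in rects:
        if all(iou(rect, k) < max_iou for k in kept):
            kept.append(rect)
    return kept

rects = [(10, 10, 110, 110),    # first proposal
         (11, 10, 111, 110),    # shifted by one pixel: near duplicate
         (200, 200, 260, 260)]  # different region

unique = drop_near_duplicates(rects)
print(unique)
```

The comparison is quadratic in the number of proposals, which is acceptable for a few thousand rectangles per image but worth keeping in mind for larger sets.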

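(Step 2) Fine-tuning the classification model

With the dataset built, we are ready to fine-tune a CNN classifier so that it can tell the two classes apart.

Combining this classifier with the SS algorithm gives us an RCNN object detector.

For this tutorial, we fine-tune the MobileNet V2 CNN, which has already been trained on the 1,000 classes of the ImageNet dataset. If transfer learning and fine-tuning are unfamiliar, the following are recommended reading: Transfer Learning with Keras and Deep Learning - PyImageSearch, Fine-tuning with Keras and Deep Learning - PyImageSearch

Fine-tuning MobileNet gives us a classifier that can distinguish the raccoon and no_raccoon classes.

When you are ready, open fine_tune_rcnn.py and get started:

Import the required packages. Each one is used as follows:

config: the file that defines the file paths and constants
ImageDataGenerator: used for data augmentation
MobileNetV2: a model built into Keras. To fine-tune it, we load weights pre-trained on ImageNet, cut off the top of the model, replace it with a new head, and train until performance is good.
tensorflow.keras.layers: provides the layers for the new head that replaces the top of MobileNet V2
Adam: the optimizer, used here as an alternative to SGD (Stochastic Gradient Descent)
LabelBinarizer & to_categorical: used to one-hot encode the class labels
train_test_split: splits the dataset into a training set and a test set
classification_report: summarizes the model's evaluation results as statistics
matplotlib: the tool for plotting the accuracy/loss curves

With the imports done, parse the command line arguments and set the hyperparameters:

The --plot argument defines the path to the accuracy/loss plot (lines 27-30).

Next, we set the training hyperparameters: the initial learning rate, the number of training epochs, and the batch size (lines 34-36).

Loading the dataset is straightforward:

Recall that our new data is stored under the path defined by config.BASE_PATH. Line 41 collects all imagePaths under the base path and its subdirectories.

From there we build the data and labels lists (lines 42-43) by looping over imagePaths:

the class label of each image is extracted directly from its path (line 48),
the image is loaded and preprocessed with the input dimensions MobileNet V2 expects (lines 51-53), and
the image and label are appended to the data and labels lists (lines 56-57).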
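
The label extraction works because build_dataset.py writes every crop into a folder named after its class. A minimal sketch of that step, with made-up file names:

```python
import os

def label_from_path(image_path):
    # the second-to-last path component is the class folder
    return image_path.split(os.path.sep)[-2]

# hypothetical paths in the layout produced by build_dataset.py
image_paths = [
    os.path.sep.join(["dataset", "raccoon", "0.png"]),
    os.path.sep.join(["dataset", "no_raccoon", "17.png"]),
]

labels = [label_from_path(p) for p in image_paths]
print(labels)
```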

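A few more steps are needed to prepare the data:

The code here:

converts the data and labels lists to NumPy arrays (lines 60-61),
one-hot encodes the labels (lines 64-66),
splits the data into a training set and a test set (lines 70-71), and
initializes a data augmentation object, which randomly transforms the images to improve the model's generalization (lines 74-81).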
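
To make the one-hot step concrete, here is a plain-Python sketch of the result that LabelBinarizer followed by to_categorical produces for two classes. With only two classes, LabelBinarizer on its own yields a single 0/1 column, which is why the listing applies to_categorical afterwards. one_hot is a hypothetical stand-in and does not call either library.

```python
def one_hot(labels):
    # classes are kept in sorted order, as LabelBinarizer does
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}

    encoded = []
    for label in labels:
        row = [0.0] * len(classes)
        row[index[label]] = 1.0
        encoded.append(row)
    return classes, encoded

classes, encoded = one_hot(["raccoon", "no_raccoon", "raccoon"])
print(classes)
print(encoded)
```

Because the classes are sorted, raccoon ends up in column 1, which matches the column that detect_object_rcnn.py reads the raccoon probability from.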

With the data ready, let's fine-tune MobileNet V2:

Fine-tuning MobileNet V2 involves the following:

load MobileNet pre-trained on the ImageNet dataset, leaving off the fully connected layers
build a new fully connected head
attach the new head to the MobileNet base to form our model
freeze the MobileNet base so that it is not trained

Let's review what this code block does. The base of our model, MobileNet, has pre-trained weights, and these are frozen. Only the head of the model is trained. The head is a Softmax classifier with two outputs, corresponding to the raccoon and no_raccoon classes.
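
A back-of-the-envelope calculation shows how small the trainable head is. The sketch assumes the standard MobileNetV2 (width multiplier 1.0), whose convolutional base reduces the input resolution by a factor of 32 and ends with 1,280 feature channels; the layer sizes (7 x 7 pooling, Dense 128, Dense 2) come from the fine_tune_rcnn.py listing.

```python
input_size = 224          # INPUT_DIMS in config.py
total_stride = 32         # assumed downsampling factor of the base
feature_channels = 1280   # assumed channels of the last base layer

# spatial size of the feature map leaving the frozen base
spatial = input_size // total_stride

# AveragePooling2D(pool_size=(7, 7)) collapses the 7 x 7 map to 1 x 1
pooled = spatial // 7

# Flatten turns the pooled map into a vector
flat_units = pooled * pooled * feature_channels

def dense_params(n_in, n_out):
    # weights plus biases of a fully connected layer
    return n_in * n_out + n_out

head_params = dense_params(flat_units, 128) + dense_params(128, 2)
print(spatial, flat_units, head_params)
```

Only these head parameters receive gradient updates while the base is frozen, which is one reason a handful of epochs can be enough.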

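So far we have loaded the data, initialized the data augmentation object, and prepared the model for fine-tuning. Now it is time to fine-tune the model.

First, the model is compiled with the Adam optimizer and binary crossentropy loss.

※ Note: if you use this code to train on three or more classes, (1) use "categorical_crossentropy" as the loss (lines 109-110) and (2) adjust the number of outputs of the Softmax classifier to match the number of classes (line 95).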
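
The reason the loss has to change for three or more classes can be checked numerically. The sketch below implements both losses in plain Python under the assumption that the binary loss is averaged over the output units and ignoring the numerical clipping a real framework applies; the probability vectors are made-up values.

```python
import math

def categorical_crossentropy(y_true, y_pred):
    # only the true class contributes
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

def binary_crossentropy(y_true, y_pred):
    # each output is treated as its own yes/no problem, then averaged
    terms = [-(t * math.log(p) + (1 - t) * math.log(1 - p))
             for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)

# two-class softmax output: the two losses coincide
bce2 = binary_crossentropy([0.0, 1.0], [0.2, 0.8])
cce2 = categorical_crossentropy([0.0, 1.0], [0.2, 0.8])

# three-class softmax output: they no longer agree
bce3 = binary_crossentropy([0.0, 0.0, 1.0], [0.1, 0.2, 0.7])
cce3 = categorical_crossentropy([0.0, 0.0, 1.0], [0.1, 0.2, 0.7])

print(bce2, cce2)
print(bce3, cce3)
```

With two softmax outputs the probabilities are p and 1 - p, so both formulas reduce to the negative log of the true-class probability; with more classes that symmetry disappears.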

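Training runs on lines 114-119. Since the release of TensorFlow 2.0, the fit method can handle data augmentation generators. (Previously the fit_generator method was used.) For details on the two methods, see: How to use Keras fit and fit_generator (a hands-on tutorial) - PyImageSearch

Once training finishes, the model can be evaluated on the test set.

Line 123 makes predictions on the test set, and line 127 finds, for each test image, the index of the label with the highest predicted probability.

A classification_report is then printed to the terminal for statistical analysis (lines 130-131).

Next, save the trained model and the label encoder.

Line 135 saves the model. With TensorFlow 2.0 and later, setting save_format="h5" explicitly is recommended (HDF5 format).

The label encoder is saved in pickle format (lines 139-141).

Finally, plot the accuracy/loss curves from the training history:

matplotlib handles the visualization (lines 144-154). The final plot is saved to the path set by the --plot command line argument.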

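Running the fine-tuning

Run the following in the terminal:

The model reaches about 99 percent accuracy.

As the plot shows, training went well.

Looking back, we have completed two of the six steps.

We have (1) built the dataset with the SS algorithm and (2) fine-tuned the classifier on that dataset, so steps 3-6 come next.

All that remains is to use the trained model to make predictions on new images.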

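(Step 3) Making predictions on new images

Open detect_object_rcnn.py:

If you have followed along this far, the rest of the code holds little that is new, but non_max_suppression, which appears on line 2, deserves attention. For a detailed explanation of NMS, see: Non-Maximum Suppression for Object Detection in Python - PyImageSearch

The script takes the path to the input image through the --image command line argument (lines 14-17).

Now let's (1) load the model, (2) load the image, and (3) run the SS algorithm:

First, the fine-tuned raccoon model and its label binarizer are loaded (lines 21-22).

Then the input image is loaded and resized (lines 25-26).

Next, the SS algorithm is run on the image to generate region proposals (lines 31-34).

Now we extract and preprocess the proposal ROIs.

First, the proposals list, which will hold the ROIs, and the boxes list, which will hold the (x, y) coordinates of the bounding boxes, are initialized (lines 38-39).

We loop over the region proposal bounding boxes generated by the SS algorithm (line 43), and inside the loop each roi is extracted with NumPy slicing and preprocessed, as in build_dataset.py (lines 47-54).

The roi and its (x, y) coordinates are appended to the proposals and boxes lists (lines 57-58).

Next, classify all the proposals:

Lines 61-62 convert proposals and boxes to NumPy arrays (float32 and int32, respectively).

The model then makes predictions on the proposals (line 67).

To summarize, we ran the classifier on the region proposals produced by the SS algorithm in order to perform object detection. boxes holds the locations (coordinates) of the bounding boxes in the original input --image (which may or may not contain a raccoon). The remaining code marks the predicted raccoon bounding boxes and their labels on the original image.

Discard the no_raccoon results and keep only the raccoon results:

To do this, the code:

extracts all predictions that are positive for raccoon (lines 72-73),
uses the indexes to extract the bounding boxes and class label probabilities (lines 77-78), and
filters the results by enforcing a minimum probability (lines 82-84).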
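
The two filtering stages in the list above can be sketched without NumPy. The probabilities and boxes are made-up values; MIN_PROBA and the class order follow the config.py listing and the sorted order of the label binarizer.

```python
MIN_PROBA = 0.99
classes = ["no_raccoon", "raccoon"]

# hypothetical softmax outputs, one row per region proposal
proba = [[0.020, 0.980],
         [0.001, 0.999],
         [0.900, 0.100],
         [0.004, 0.996]]
boxes = [(10, 10, 60, 60), (12, 11, 64, 62),
         (200, 50, 240, 90), (15, 9, 58, 66)]

kept = []
for box, p in zip(boxes, proba):
    # index of the most probable class (argmax)
    best = max(range(len(p)), key=lambda i: p[i])

    # keep only confident raccoon predictions
    if classes[best] == "raccoon" and p[best] >= MIN_PROBA:
        kept.append((box, p[best]))

print(kept)
```

The first proposal is classified as raccoon but is dropped anyway because 0.98 falls short of the 0.99 threshold, which illustrates how strict MIN_PROBA is.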

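Let's visualize the result before applying NMS:

In the loop over the bounding boxes and probabilities (line 90), the code:

unpacks the bounding box coordinates (line 92),
draws the rectangle for the bounding box (lines 93-94), and
writes the label and probability as text at the top-left corner of the bounding box (lines 95-98).

Let's visualize the result after applying NMS:

NMS is applied to remove the multiple overlapping rectangles around an object (line 104).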
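
To show the idea behind NMS, here is a minimal greedy sketch in plain Python. It is a common score-based variant and differs from the nms.py listing, which sorts by the bottom-right y-coordinate and measures overlap relative to each candidate's own area. The function names, the 0.3 threshold, and the sample boxes and scores are assumptions for illustration.

```python
def iou(a, b):
    # IoU of two (startX, startY, endX, endY) boxes, inclusive pixels
    xA, yA = max(a[0], b[0]), max(a[1], b[1])
    xB, yB = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, xB - xA + 1) * max(0, yB - yA + 1)
    areaA = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    areaB = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / float(areaA + areaB - inter)

def nms(boxes, scores, iou_thresh=0.3):
    # visit boxes from the highest score to the lowest
    order = sorted(range(len(boxes)), key=lambda i: scores[i],
                   reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # drop every remaining box that overlaps the picked one too much
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep

boxes = [(10, 10, 60, 60), (12, 12, 62, 62), (200, 50, 240, 90)]
scores = [0.995, 0.999, 0.992]

kept = nms(boxes, scores)
print(kept)
```

The first two boxes cover nearly the same region, so only the higher-scoring one survives, while the third box is untouched because it overlaps neither.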

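Lines 107-119 draw the bounding boxes, labels, and probabilities.

This completes the implementation of a basic R-CNN object detector!

Making predictions on new images

Run the following from the command line:

The result looks like this:

The same can be run on the remaining raccoon_02.jpg and raccoon_03.jpg.

Code listings: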

1. config.py

# import the necessary packages
import os

# define the base path to the *original* input dataset and then use
# the base path to derive the image and annotations directories
ORIG_BASE_PATH = "raccoons"
ORIG_IMAGES = os.path.sep.join([ORIG_BASE_PATH, "images"])
ORIG_ANNOTS = os.path.sep.join([ORIG_BASE_PATH, "annotations"])

# define the base path to the *new* dataset after running our dataset
# builder scripts and then use the base path to derive the paths to
# our output class label directories
BASE_PATH = "dataset"
POSITVE_PATH = os.path.sep.join([BASE_PATH, "raccoon"])
NEGATIVE_PATH = os.path.sep.join([BASE_PATH, "no_raccoon"])

# define the number of max proposals used when running selective
# search for (1) gathering training data and (2) performing inference
MAX_PROPOSALS = 2000
MAX_PROPOSALS_INFER = 200

# define the maximum number of positive and negative images to be
# generated from each image
MAX_POSITIVE = 30
MAX_NEGATIVE = 10

# initialize the input dimensions to the network
INPUT_DIMS = (224, 224)

# define the path to the output model and label binarizer
MODEL_PATH = "raccoon_detector.h5"
ENCODER_PATH = "label_encoder.pickle"

# define the minimum probability required for a positive prediction
# (used to filter out false-positive predictions)
MIN_PROBA = 0.99

2. iou.py

def compute_iou(boxA, boxB):
	# determine the (x, y)-coordinates of the intersection rectangle
	xA = max(boxA[0], boxB[0])
	yA = max(boxA[1], boxB[1])
	xB = min(boxA[2], boxB[2])
	yB = min(boxA[3], boxB[3])
    
	# compute the area of intersection rectangle
	interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)
    
	# compute the area of both the prediction and ground-truth
	# rectangles
	boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
	boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)
    
	# compute the intersection over union by taking the intersection
	# area and dividing it by the sum of prediction + ground-truth
	# areas - the intersection area
	iou = interArea / float(boxAArea + boxBArea - interArea)
    
	# return the intersection over union value
	return iou

3. build_dataset.py

# import the necessary packages
from pyimagesearch.iou import compute_iou
from pyimagesearch import config
from bs4 import BeautifulSoup
from imutils import paths
import cv2
import os

# loop over the output positive and negative directories
for dirPath in (config.POSITVE_PATH, config.NEGATIVE_PATH):
	# if the output directory does not exist yet, create it
	if not os.path.exists(dirPath):
		os.makedirs(dirPath)
        
# grab all image paths in the input images directory
imagePaths = list(paths.list_images(config.ORIG_IMAGES))

# initialize the total number of positive and negative images we have
# saved to disk so far
totalPositive = 0
totalNegative = 0

# loop over the image paths
for (i, imagePath) in enumerate(imagePaths):
	# show a progress report
	print("[INFO] processing image {}/{}...".format(i + 1,
		len(imagePaths)))
        
	# extract the filename from the file path and use it to derive
	# the path to the XML annotation file
	filename = imagePath.split(os.path.sep)[-1]
	filename = filename[:filename.rfind(".")]
	annotPath = os.path.sep.join([config.ORIG_ANNOTS,
		"{}.xml".format(filename)])
        
	# load the annotation file, build the soup, and initialize our
	# list of ground-truth bounding boxes
	contents = open(annotPath).read()
	soup = BeautifulSoup(contents, "html.parser")
	gtBoxes = []
    
	# extract the image dimensions
	w = int(soup.find("width").string)
	h = int(soup.find("height").string)
    
	# loop over all 'object' elements
	for o in soup.find_all("object"):
		# extract the label and bounding box coordinates
		label = o.find("name").string
		xMin = int(o.find("xmin").string)
		yMin = int(o.find("ymin").string)
		xMax = int(o.find("xmax").string)
		yMax = int(o.find("ymax").string)
        
		# truncate any bounding box coordinates that may fall
		# outside the boundaries of the image
		xMin = max(0, xMin)
		yMin = max(0, yMin)
		xMax = min(w, xMax)
		yMax = min(h, yMax)
        
		# update our list of ground-truth bounding boxes
		gtBoxes.append((xMin, yMin, xMax, yMax))
        
	# load the input image from disk
	image = cv2.imread(imagePath)
    
	# run selective search on the image and initialize our list of
	# proposed boxes
	ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
	ss.setBaseImage(image)
	ss.switchToSelectiveSearchFast()
	rects = ss.process()
	proposedRects= []
    
	# loop over the rectangles generated by selective search
	for (x, y, w, h) in rects:
		# convert our bounding boxes from (x, y, w, h) to (startX,
		# startY, endX, endY)
		proposedRects.append((x, y, x + w, y + h))
        
	# initialize counters used to count the number of positive and
	# negative ROIs saved thus far
	positiveROIs = 0
	negativeROIs = 0
    
	# loop over the maximum number of region proposals
	for proposedRect in proposedRects[:config.MAX_PROPOSALS]:
		# unpack the proposed rectangle bounding box
		(propStartX, propStartY, propEndX, propEndY) = proposedRect
        
		# loop over the ground-truth bounding boxes
		for gtBox in gtBoxes:
			# compute the intersection over union between the two
			# boxes and unpack the ground-truth bounding box
			iou = compute_iou(gtBox, proposedRect)
			(gtStartX, gtStartY, gtEndX, gtEndY) = gtBox
            
			# initialize the ROI and output path
			roi = None
			outputPath = None
            
			# check to see if the IOU is greater than 70% *and* that
			# we have not hit our positive count limit (note: `<=` lets
			# up to MAX_POSITIVE + 1 positives through per image)
			if iou > 0.7 and positiveROIs <= config.MAX_POSITIVE:
				# extract the ROI and then derive the output path to
				# the positive instance
				roi = image[propStartY:propEndY, propStartX:propEndX]
				filename = "{}.png".format(totalPositive)
				outputPath = os.path.sep.join([config.POSITVE_PATH,
					filename])
                    
				# increment the positive counters
				positiveROIs += 1
				totalPositive += 1
                
			# determine if the proposed bounding box falls *within*
			# the ground-truth bounding box
			fullOverlap = propStartX >= gtStartX
			fullOverlap = fullOverlap and propStartY >= gtStartY
			fullOverlap = fullOverlap and propEndX <= gtEndX
			fullOverlap = fullOverlap and propEndY <= gtEndY
            
			# check to see if there is not full overlap *and* the IoU
			# is less than 5% *and* we have not hit our negative
			# count limit (note: `<=` lets up to MAX_NEGATIVE + 1
			# negatives through per image)
			if not fullOverlap and iou < 0.05 and \
				negativeROIs <= config.MAX_NEGATIVE:
				# extract the ROI and then derive the output path to
				# the negative instance
				roi = image[propStartY:propEndY, propStartX:propEndX]
				filename = "{}.png".format(totalNegative)
				outputPath = os.path.sep.join([config.NEGATIVE_PATH,
					filename])
                    
				# increment the negative counters
				negativeROIs += 1
				totalNegative += 1
                
			# check to see if both the ROI and output path are valid
			if roi is not None and outputPath is not None:
				# resize the ROI to the input dimensions of the CNN
				# that we'll be fine-tuning, then write the ROI to
				# disk
				roi = cv2.resize(roi, config.INPUT_DIMS,
					interpolation=cv2.INTER_CUBIC)
				cv2.imwrite(outputPath, roi)

4. fine_tune_rcnn.py

# import the necessary packages
from pyimagesearch import config
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import pickle
import os

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--plot", type=str, default="plot.png",
	help="path to output loss/accuracy plot")
args = vars(ap.parse_args())

# initialize the initial learning rate, number of epochs to train for,
# and batch size
INIT_LR = 1e-4
EPOCHS = 5
BS = 32

# grab the list of images in our dataset directory, then initialize
# the list of data (i.e., images) and class labels
print("[INFO] loading images...")
imagePaths = list(paths.list_images(config.BASE_PATH))
data = []
labels = []

# loop over the image paths
for imagePath in imagePaths:
	# extract the class label from the parent directory name
	label = imagePath.split(os.path.sep)[-2]
    
	# load the input image (224x224) and preprocess it
	image = load_img(imagePath, target_size=config.INPUT_DIMS)
	image = img_to_array(image)
	image = preprocess_input(image)
    
	# update the data and labels lists, respectively
	data.append(image)
	labels.append(label)
    
# convert the data and labels to NumPy arrays
data = np.array(data, dtype="float32")
labels = np.array(labels)

# perform one-hot encoding on the labels
lb = LabelBinarizer()
labels = lb.fit_transform(labels)
labels = to_categorical(labels)

# partition the data into training and testing splits using 80% of
# the data for training and the remaining 20% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels,
	test_size=0.20, stratify=labels, random_state=42)
    
# construct the training image generator for data augmentation
aug = ImageDataGenerator(
	rotation_range=20,
	zoom_range=0.15,
	width_shift_range=0.2,
	height_shift_range=0.2,
	shear_range=0.15,
	horizontal_flip=True,
	fill_mode="nearest")

# load the MobileNetV2 network, ensuring the head FC layer sets are
# left off
baseModel = MobileNetV2(weights="imagenet", include_top=False,
	input_tensor=Input(shape=(224, 224, 3)))
    
# construct the head of the model that will be placed on top of the
# the base model
headModel = baseModel.output
headModel = AveragePooling2D(pool_size=(7, 7))(headModel)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(128, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="softmax")(headModel)

# place the head FC model on top of the base model (this will become
# the actual model we will train)
model = Model(inputs=baseModel.input, outputs=headModel)

# loop over all layers in the base model and freeze them so they will
# *not* be updated during the first training process
for layer in baseModel.layers:
	layer.trainable = False
    
# compile our model
print("[INFO] compiling model...")
opt = Adam(learning_rate=INIT_LR)
model.compile(loss="binary_crossentropy", optimizer=opt,
	metrics=["accuracy"])
    
# train the head of the network
print("[INFO] training head...")
H = model.fit(
	aug.flow(trainX, trainY, batch_size=BS),
	steps_per_epoch=len(trainX) // BS,
	validation_data=(testX, testY),
	validation_steps=len(testX) // BS,
	epochs=EPOCHS)
    
# make predictions on the testing set
print("[INFO] evaluating network...")
predIdxs = model.predict(testX, batch_size=BS)

# for each image in the testing set we need to find the index of the
# label with corresponding largest predicted probability
predIdxs = np.argmax(predIdxs, axis=1)

# show a nicely formatted classification report
print(classification_report(testY.argmax(axis=1), predIdxs,
	target_names=lb.classes_))
    
# serialize the model to disk
print("[INFO] saving raccoon detector model...")
model.save(config.MODEL_PATH, save_format="h5")

# serialize the label encoder to disk
print("[INFO] saving label encoder...")
f = open(config.ENCODER_PATH, "wb")
f.write(pickle.dumps(lb))
f.close()

# plot the training loss and accuracy
N = EPOCHS
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, N), H.history["accuracy"], label="train_acc")
plt.plot(np.arange(0, N), H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig(args["plot"])

5. detect_object_rcnn.py

# import the necessary packages
from pyimagesearch.nms import non_max_suppression
from pyimagesearch import config
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.models import load_model
import numpy as np
import argparse
import imutils
import pickle
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
	help="path to input image")
args = vars(ap.parse_args())

# load our fine-tuned model and label binarizer from disk
print("[INFO] loading model and label binarizer...")
model = load_model(config.MODEL_PATH)
lb = pickle.loads(open(config.ENCODER_PATH, "rb").read())

# load the input image from disk
image = cv2.imread(args["image"])
image = imutils.resize(image, width=500)

# run selective search on the image to generate bounding box proposal
# regions
print("[INFO] running selective search...")
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(image)
ss.switchToSelectiveSearchFast()
rects = ss.process()

# initialize the list of region proposals that we'll be classifying
# along with their associated bounding boxes
proposals = []
boxes = []

# loop over the region proposal bounding box coordinates generated by
# running selective search
for (x, y, w, h) in rects[:config.MAX_PROPOSALS_INFER]:
	# extract the region from the input image, convert it from BGR to
	# RGB channel ordering, and then resize it to the required input
	# dimensions of our trained CNN
	roi = image[y:y + h, x:x + w]
	roi = cv2.cvtColor(roi, cv2.COLOR_BGR2RGB)
	roi = cv2.resize(roi, config.INPUT_DIMS,
		interpolation=cv2.INTER_CUBIC)
        
	# further preprocess the ROI
	roi = img_to_array(roi)
	roi = preprocess_input(roi)
    
	# update our proposals and bounding boxes lists
	proposals.append(roi)
	boxes.append((x, y, x + w, y + h))
    
# convert the proposals and bounding boxes into NumPy arrays
proposals = np.array(proposals, dtype="float32")
boxes = np.array(boxes, dtype="int32")
print("[INFO] proposal shape: {}".format(proposals.shape))

# classify each of the proposal ROIs using fine-tuned model
print("[INFO] classifying proposals...")
proba = model.predict(proposals)

# find the index of all predictions that are positive for the
# "raccoon" class
print("[INFO] applying NMS...")
labels = lb.classes_[np.argmax(proba, axis=1)]
idxs = np.where(labels == "raccoon")[0]

# use the indexes to extract all bounding boxes and associated class
# label probabilities associated with the "raccoon" class
boxes = boxes[idxs]
proba = proba[idxs][:, 1]

# further filter indexes by enforcing a minimum prediction
# probability be met
idxs = np.where(proba >= config.MIN_PROBA)
boxes = boxes[idxs]
proba = proba[idxs]

# clone the original image so that we can draw on it
clone = image.copy()

# loop over the bounding boxes and associated probabilities
for (box, prob) in zip(boxes, proba):
	# draw the bounding box, label, and probability on the image
	(startX, startY, endX, endY) = box
	cv2.rectangle(clone, (startX, startY), (endX, endY),
		(0, 255, 0), 2)
	y = startY - 10 if startY - 10 > 10 else startY + 10
	text= "Raccoon: {:.2f}%".format(prob * 100)
	cv2.putText(clone, text, (startX, y),
		cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)
        
# show the output image *before* running NMS
cv2.imshow("Before NMS", clone)

# run non-maxima suppression on the bounding boxes (the
# non_max_suppression in nms.py returns the kept boxes themselves,
# not their indexes)
pickedBoxes = non_max_suppression(boxes, 0.1)

# loop over the bounding boxes that survived NMS
for pickedBox in pickedBoxes:
	# find the row of this box in `boxes` to look up its probability
	i = np.where((boxes == pickedBox).all(axis=1))[0][0]

	# draw the bounding box, label, and probability on the image
	(startX, startY, endX, endY) = pickedBox
	cv2.rectangle(image, (startX, startY), (endX, endY),
		(0, 255, 0), 2)
	y = startY - 10 if startY - 10 > 10 else startY + 10
	text= "Raccoon: {:.2f}%".format(proba[i] * 100)
	cv2.putText(image, text, (startX, y),
		cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)
        
# show the output image *after* running NMS
cv2.imshow("After NMS", image)
cv2.waitKey(0)

6. nms.py

# import the necessary packages
import numpy as np

#  Felzenszwalb et al.
def non_max_suppression(boxes, overlapThresh):
	# if there are no boxes, return an empty list

	if len(boxes) == 0:
		return []
        
	# initialize the list of picked indexes
	pick = []
    
	# grab the coordinates of the bounding boxes
	x1 = boxes[:,0]
	y1 = boxes[:,1]
	x2 = boxes[:,2]
	y2 = boxes[:,3]
    
	# compute the area of the bounding boxes and sort the bounding
	# boxes by the bottom-right y-coordinate of the bounding box
	area = (x2 - x1 + 1) * (y2 - y1 + 1)
	idxs = np.argsort(y2)
    
	# keep looping while some indexes still remain in the indexes
	# list
	while len(idxs) > 0:
		# grab the last index in the indexes list, add the index
		# value to the list of picked indexes, then initialize
		# the suppression list (i.e. indexes that will be deleted)
		# using the last index
		last = len(idxs) - 1
		i = idxs[last]
		pick.append(i)
		suppress = [last]

		# loop over all indexes in the indexes list
		for pos in range(0, last):
			# grab the current index
			j = idxs[pos]
            
			# find the largest (x, y) coordinates for the start of
			# the bounding box and the smallest (x, y) coordinates
			# for the end of the bounding box
			xx1 = max(x1[i], x1[j])
			yy1 = max(y1[i], y1[j])
			xx2 = min(x2[i], x2[j])
			yy2 = min(y2[i], y2[j])
            
			# compute the width and height of the bounding box
			w = max(0, xx2 - xx1 + 1)
			h = max(0, yy2 - yy1 + 1)
            
			# compute the ratio of overlap between the computed
			# bounding box and the bounding box in the area list
			overlap = float(w * h) / area[j]

			# if there is sufficient overlap, suppress the
			# current bounding box
			if overlap > overlapThresh:
				suppress.append(pos)
                
		# delete all indexes from the index list that are in the
		# suppression list
		idxs = np.delete(idxs, suppress)
        
	# return only the bounding boxes that were picked
	return boxes[pick]

Non-Maximum Suppression for Object Detection in Python - PyImageSearch

# import the necessary packages
import numpy as np

# Malisiewicz et al.
def non_max_suppression_fast(boxes, overlapThresh):
	# if there are no boxes, return an empty list
	if len(boxes) == 0:
		return []
        
	# if the bounding boxes are integers, convert them to floats --
	# this is important since we'll be doing a bunch of divisions
	if boxes.dtype.kind == "i":
		boxes = boxes.astype("float")
        
	# initialize the list of picked indexes	
	pick = []
    
	# grab the coordinates of the bounding boxes
	x1 = boxes[:,0]
	y1 = boxes[:,1]
	x2 = boxes[:,2]
	y2 = boxes[:,3]
    
	# compute the area of the bounding boxes and sort the bounding
	# boxes by the bottom-right y-coordinate of the bounding box
	area = (x2 - x1 + 1) * (y2 - y1 + 1)
	idxs = np.argsort(y2)
    
	# keep looping while some indexes still remain in the indexes
	# list
	while len(idxs) > 0:
		# grab the last index in the indexes list and add the
		# index value to the list of picked indexes
		last = len(idxs) - 1
		i = idxs[last]
		pick.append(i)
        
		# find the largest (x, y) coordinates for the start of
		# the bounding box and the smallest (x, y) coordinates
		# for the end of the bounding box
		xx1 = np.maximum(x1[i], x1[idxs[:last]])
		yy1 = np.maximum(y1[i], y1[idxs[:last]])
		xx2 = np.minimum(x2[i], x2[idxs[:last]])
		yy2 = np.minimum(y2[i], y2[idxs[:last]])
        
		# compute the width and height of the bounding box
		w = np.maximum(0, xx2 - xx1 + 1)
		h = np.maximum(0, yy2 - yy1 + 1)
        
		# compute the ratio of overlap
		overlap = (w * h) / area[idxs[:last]]
        
		# delete all indexes from the index list that have an overlap
		# greater than the threshold, along with the picked index
		idxs = np.delete(idxs, np.concatenate(([last],
			np.where(overlap > overlapThresh)[0])))
            
	# return only the bounding boxes that were picked using the
	# integer data type
	return boxes[pick].astype("int")

(Faster) Non-Maximum Suppression in Python - PyImageSearch

cf> In the annotations folder, when an image contains two raccoons, its XML file holds two boxes in the following format.
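
As an illustration of such a file, the sketch below parses a two-object annotation in the Pascal VOC layout. The file name, image size, and coordinates are made-up values, and the parsing uses the standard library's xml.etree.ElementTree instead of the BeautifulSoup calls in build_dataset.py.

```python
import xml.etree.ElementTree as ET

# hypothetical annotation with two raccoons (values are illustrative)
SAMPLE = """<annotation>
  <filename>raccoon-example.jpg</filename>
  <size><width>650</width><height>417</height><depth>3</depth></size>
  <object>
    <name>raccoon</name>
    <bndbox><xmin>81</xmin><ymin>88</ymin><xmax>322</xmax><ymax>408</ymax></bndbox>
  </object>
  <object>
    <name>raccoon</name>
    <bndbox><xmin>340</xmin><ymin>60</ymin><xmax>610</xmax><ymax>400</ymax></bndbox>
  </object>
</annotation>"""

root = ET.fromstring(SAMPLE)

# image dimensions
width = int(root.find("size/width").text)
height = int(root.find("size/height").text)

# one ground-truth box per <object> element
gt_boxes = []
for obj in root.findall("object"):
    bndbox = obj.find("bndbox")
    box = tuple(int(bndbox.find(tag).text)
                for tag in ("xmin", "ymin", "xmax", "ymax"))
    gt_boxes.append(box)

print(width, height, gt_boxes)
```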

License: Attribution, NonCommercial, NoDerivatives (CC BY-NC-ND)
