Implementing the YOLO v3 Paper with PyTorch #2

yusukaid's IT note

Implementing the YOLO v3 Paper with PyTorch #2 - Implementing the Network's Forward Pass

yusukaid__ 2022. 9. 27. 04:38

※ This post is a translation of, and study notes on, the blog post linked below.

https://blog.paperspace.com/how-to-implement-a-yolo-object-detector-in-pytorch/


Continuing from the previous post, we keep writing darknet.py.

Defining the Network

# Define the network
class Darknet(nn.Module):
    def __init__(self, cfgfile):
        super(Darknet, self).__init__()
        self.blocks = parse_cfg(cfgfile)
        self.net_info, self.module_list = create_modules(self.blocks)

To build the architecture in PyTorch we use the nn.Module class. We subclass nn.Module and name the class Darknet. The network is initialized with three members: blocks, net_info, and module_list.

Implementing the Forward Pass

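The forward pass of the network is implemented by overriding the forward method of the nn.Module class. forward serves two purposes:

Compute the output
Transform the output detection feature maps into a form that is easier to process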

    def forward(self, x, CUDA):
        modules = self.blocks[1:]
        outputs = {}   # cache each layer's output for the route and shortcut layers

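forward takes three arguments: self, the input x, and CUDA, a flag that enables GPU acceleration of the forward pass.

The first element of self.blocks is the net block, which is not part of the forward pass, so we iterate over self.blocks[1:].
Route and shortcut layers need the output maps of earlier layers, so we declare the dict outputs, which holds the output feature map of every layer keyed by its layer index.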

        write = 0
        for i, module in enumerate(modules):        
            module_type = (module["type"])

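We iterate over the blocks and use the same index i to look up the matching module in module_list, which holds the network's modules in the same order.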

Convolutional and Upsample Layers

            if module_type == "convolutional" or module_type == "upsample":
                x = self.module_list[i](x)

If the module is a convolutional or upsample layer, the forward pass simply applies the module to x.

Route Layer / Shortcut Layer

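            elif module_type == "route":
                layers = module["layers"]
                layers = [int(a) for a in layers]

                # A positive value is an absolute layer index; turn it into an offset relative to i
                if (layers[0]) > 0:
                    layers[0] = layers[0] - i

                if len(layers) == 1:
                    x = outputs[i + (layers[0])]

                else:
                    if (layers[1]) > 0:
                        layers[1] = layers[1] - i

                    map1 = outputs[i + layers[0]]
                    map2 = outputs[i + layers[1]]
                    # Concatenate the two feature maps along the channel dimension
                    x = torch.cat((map1, map2), 1)

            elif  module_type == "shortcut":
                from_ = int(module["from"])
                # Add the previous layer's output to the output of the layer from_ steps back
                x = outputs[i-1] + outputs[i+from_]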

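When two feature maps have to be concatenated, we call torch.cat with the second argument set to 1, because the feature maps are joined along the depth (channel) dimension. In PyTorch, convolutional feature maps have the shape B x C x H x W, so the channels sit at dimension 1.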
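As a minimal sketch of the two merge operations in the code above, the snippet below concatenates two feature maps along the channel dimension, as a route layer does, and adds two same-shaped maps element-wise, as a shortcut layer does. The tensor shapes are illustrative assumptions, not values taken from this post.

```python
import torch

# Illustrative shapes only: batch of 1, 26x26 feature maps.
map1 = torch.zeros(1, 256, 26, 26)   # e.g. an upsampled feature map
map2 = torch.ones(1, 512, 26, 26)    # e.g. an earlier layer's output

# Route with two layers: concatenate along dim 1 (channels).
# Batch size, height and width must match; the channel counts add up.
routed = torch.cat((map1, map2), 1)
print(routed.shape)    # torch.Size([1, 768, 26, 26])

# Shortcut: element-wise addition, so both shapes must be identical.
prev = torch.full((1, 256, 26, 26), 2.0)
shortcut = map1 + prev
print(shortcut.shape)  # torch.Size([1, 256, 26, 26])
```

A route layer changes the channel count, while a shortcut layer leaves the shape unchanged.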

YOLO (Detection Layer)

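The output of YOLO is a convolutional feature map that holds the bounding box attributes along its depth. Each cell predicts B boxes, and each box has 5 + C attributes (4 box coordinates, 1 objectness score, and C class scores), so the depth is B x (5 + C).

YOLO v3 detects at three scales, so the prediction maps have different dimensions. We still want to apply the same output-processing operations to all of them, which is awkward while their shapes differ. To solve this, we define the function predict_transform.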
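To make the shapes concrete, here is a small sketch under the usual YOLO v3 COCO settings. These are assumptions rather than values stated in this post: a 416x416 input, strides of 32, 16 and 8, 3 anchors per scale, and 80 classes. The helper name detection_shapes is hypothetical.

```python
def detection_shapes(inp_dim, strides, num_anchors, num_classes):
    """Return (depth, grid_size, num_boxes) for each detection scale."""
    depth = num_anchors * (5 + num_classes)   # B x (5 + C)
    shapes = []
    for stride in strides:
        grid_size = inp_dim // stride
        num_boxes = grid_size * grid_size * num_anchors
        shapes.append((depth, grid_size, num_boxes))
    return shapes

# Assumed settings: 416x416 input, COCO (80 classes), 3 anchors per scale.
shapes = detection_shapes(416, strides=(32, 16, 8), num_anchors=3, num_classes=80)
for depth, grid_size, num_boxes in shapes:
    print(f"{depth} x {grid_size} x {grid_size} -> {num_boxes} boxes")

total_boxes = sum(num_boxes for _, _, num_boxes in shapes)
print(total_boxes)  # 10647
```

The three maps share the same depth but differ in grid size, which is why each one is flattened into rows of box attributes before they are combined.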

util.py

from __future__ import division

import torch 
import torch.nn as nn
import torch.nn.functional as F 
from torch.autograd import Variable
import numpy as np
import cv2

Import the required libraries.

def predict_transform(prediction, inp_dim, anchors, num_classes, CUDA = True):

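predict_transform takes five parameters. In order, they are the prediction (the layer output), the input image dimension, anchors, num_classes, and the CUDA flag.

The function takes a detection feature map and turns it into a 2D tensor per image. Each row of that tensor holds the attributes of one bounding box; the rows are ordered cell by cell, with the boxes of one cell in consecutive rows (the original post illustrates this order with a figure).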

    batch_size = prediction.size(0)
    stride =  inp_dim // prediction.size(2)
    grid_size = inp_dim // stride
    bbox_attrs = 5 + num_classes
    num_anchors = len(anchors)
    
    prediction = prediction.view(batch_size, bbox_attrs*num_anchors, grid_size*grid_size)
    prediction = prediction.transpose(1,2).contiguous()
    prediction = prediction.view(batch_size, grid_size*grid_size*num_anchors, bbox_attrs)

The code above performs this transformation.
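The reshaping steps above can be checked on a random tensor. The sketch below assumes a 13x13 map with 3 anchors and 80 classes (illustrative values) and confirms where the attributes of one box end up after the view / transpose / view sequence.

```python
import torch

batch_size, num_anchors, num_classes, grid_size = 1, 3, 80, 13
bbox_attrs = 5 + num_classes

# Random stand-in for a detection feature map of shape B x (A * attrs) x G x G.
prediction = torch.rand(batch_size, bbox_attrs * num_anchors, grid_size, grid_size)

out = prediction.view(batch_size, bbox_attrs * num_anchors, grid_size * grid_size)
out = out.transpose(1, 2).contiguous()
out = out.view(batch_size, grid_size * grid_size * num_anchors, bbox_attrs)
print(out.shape)  # torch.Size([1, 507, 85])

# Row layout: the box for anchor `a` at cell (row y, column x) lands in
# row (y * grid_size + x) * num_anchors + a.
y, x, a = 4, 7, 2
row = (y * grid_size + x) * num_anchors + a
same = torch.equal(out[0, row], prediction[0, a * bbox_attrs:(a + 1) * bbox_attrs, y, x])
print(same)  # True
```

The call to contiguous() is needed because view cannot reinterpret the memory of a transposed tensor directly.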

    anchors = [(a[0]/stride, a[1]/stride) for a in anchors]

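The anchor dimensions are given relative to the height and width attributes of the net block. Those attributes describe the input image, which is larger than the detection map by a factor of stride. We therefore divide the anchors by the stride of the detection feature map, as in the code above.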
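A short worked example of this scaling, assuming a 416x416 input, a 13x13 detection map, and the three largest anchors from the standard yolov3.cfg. None of these values appear in this post; they are chosen only for illustration.

```python
# Assumed example values: 416x416 input, a 13x13 detection map, and the
# three largest anchors from the standard yolov3.cfg (in input-image pixels).
inp_dim = 416
grid_size = 13
anchors = [(116, 90), (156, 198), (373, 326)]

stride = inp_dim // grid_size          # 32 input pixels per grid cell
scaled = [(w / stride, h / stride) for (w, h) in anchors]

print(stride)  # 32
print(scaled)  # [(3.625, 2.8125), (4.875, 6.1875), (11.65625, 10.1875)]
```

After the division the anchors are measured in grid cells, the same unit as the predictions on the detection map.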

    # Apply sigmoid to the center x, y coordinates and the object confidence
    prediction[:,:,0] = torch.sigmoid(prediction[:,:,0])
    prediction[:,:,1] = torch.sigmoid(prediction[:,:,1])
    prediction[:,:,4] = torch.sigmoid(prediction[:,:,4])

In line with the theory covered in post #0, this code applies the sigmoid to the center coordinates and the object confidence.

    # Add the center offsets
    grid = np.arange(grid_size)
    a,b = np.meshgrid(grid, grid)

    x_offset = torch.FloatTensor(a).view(-1,1)
    y_offset = torch.FloatTensor(b).view(-1,1)

    if CUDA:
        x_offset = x_offset.cuda()
        y_offset = y_offset.cuda()

    x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1,num_anchors).view(-1,2).unsqueeze(0)

    prediction[:,:,:2] += x_y_offset

This code adds the grid offsets to the predicted center coordinates.
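To see what the offset tensor looks like, the sketch below rebuilds it for a tiny 2x2 grid with 3 anchors (illustrative sizes). It uses torch.tensor(..., dtype=torch.float32) instead of torch.FloatTensor, which is a substitution made here for clarity, and omits the CUDA branch.

```python
import numpy as np
import torch

grid_size, num_anchors = 2, 3   # tiny illustrative sizes

grid = np.arange(grid_size)
a, b = np.meshgrid(grid, grid)  # a: column (x) index, b: row (y) index

x_offset = torch.tensor(a, dtype=torch.float32).view(-1, 1)
y_offset = torch.tensor(b, dtype=torch.float32).view(-1, 1)

# One (x, y) pair per cell, repeated once per anchor, with a leading batch dim.
x_y_offset = (
    torch.cat((x_offset, y_offset), 1)
    .repeat(1, num_anchors)
    .view(-1, 2)
    .unsqueeze(0)
)

print(x_y_offset.shape)   # torch.Size([1, 12, 2])
print(x_y_offset[0, :6])  # three rows of [0., 0.], then three rows of [1., 0.]
```

The rows follow the same cell-by-cell, anchor-by-anchor order as the reshaped prediction tensor, so the addition broadcasts over the batch dimension.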

    # Apply the log-space transform to the height and width
    anchors = torch.FloatTensor(anchors)

    if CUDA:
        anchors = anchors.cuda()

    anchors = anchors.repeat(grid_size*grid_size, 1).unsqueeze(0)
    prediction[:,:,2:4] = torch.exp(prediction[:,:,2:4])*anchors

This code applies the anchors to the bounding box dimensions.

    # Apply sigmoid to the class scores
    prediction[:,:,5: 5 + num_classes] = torch.sigmoid((prediction[:,:, 5 : 5 + num_classes]))

This code applies the sigmoid to the class scores.

    # Rescale the detection map to the size of the input image
    prediction[:,:,:4] *= stride
    
    return prediction

Finally, the detection map has to be rescaled to the size of the input image. The bounding box attributes are expressed in units of the feature map, so we multiply them by stride. The function then returns prediction.
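Putting the steps of predict_transform together for a single box, a scalar sketch might look like the following. The function decode_box and its parameter names are hypothetical, and the example values (cell (6, 6) of a 13x13 map, stride 32, anchor 116x90) are assumptions chosen for illustration.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def decode_box(tx, ty, tw, th, cx, cy, anchor_w, anchor_h, stride):
    """Decode one raw prediction into (x, y, w, h) in input-image pixels.

    cx, cy are the grid cell's column and row; anchor_w, anchor_h are the
    anchor size in input-image pixels (before dividing by stride).
    """
    bx = (sigmoid(tx) + cx) * stride
    by = (sigmoid(ty) + cy) * stride
    bw = math.exp(tw) * (anchor_w / stride) * stride
    bh = math.exp(th) * (anchor_h / stride) * stride
    return bx, by, bw, bh

# Assumed example: raw outputs of 0 in cell (6, 6) of a 13x13 map, stride 32.
box = decode_box(0.0, 0.0, 0.0, 0.0, cx=6, cy=6, anchor_w=116, anchor_h=90, stride=32)
print(box)  # (208.0, 208.0, 116.0, 90.0)
```

A raw output of 0 places the center in the middle of its cell and leaves the anchor size unchanged, which is a convenient sanity check.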

Back to the YOLO Layer

            elif module_type == 'yolo':
                anchors = self.module_list[i][0].anchors
                # Get the input dimension
                inp_dim = int (self.net_info["height"])

                # Get the number of classes
                num_classes = int (module["classes"])

                # Transform the output with predict_transform
                x = x.data
                x = predict_transform(x, inp_dim, anchors, num_classes, CUDA)
                if not write:              # the collector has not been initialized yet
                    detections = x
                    write = 1

                else:
                    detections = torch.cat((detections, x), 1)

            outputs[i] = x

        return detections

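With the outputs transformed, the detection maps from the three scales can be concatenated into one tensor. We cannot initialize an empty tensor and then concatenate a non-empty tensor of a different shape to it, so we delay initializing the collector (the tensor that holds the detections) until the first detection map is available. Each subsequent detection map is then concatenated to it.

Inside forward, write = 0 means the collector has not been initialized yet, and write = 1 means it has. After the loop, the function returns detections.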
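A small sketch of this lazy-collector pattern, using zero tensors as stand-ins for the three transformed detection maps. The shapes assume a 416x416 input, 3 anchors per scale and 80 classes, none of which is stated in this post. The list-based variant at the end is a common alternative, not the code used in this series.

```python
import torch

# Stand-ins for predict_transform outputs at the three scales
# (assumed shapes: 416x416 input, 3 anchors per scale, 80 classes).
maps = [torch.zeros(1, n, 85) for n in (507, 2028, 8112)]

# Pattern used in forward: delay creating the collector until the first map.
write = 0
for x in maps:
    if not write:
        detections = x
        write = 1
    else:
        detections = torch.cat((detections, x), 1)
print(detections.shape)  # torch.Size([1, 10647, 85])

# Alternative (not the code in this series): gather in a list, concatenate once.
detections_alt = torch.cat(maps, 1)
print(torch.equal(detections, detections_alt))  # True
```

Both versions concatenate along dimension 1, the box dimension, so every row of the result is still one bounding box with its attributes.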
