Implementing the YOLO v3 Paper with PyTorch #5

사냥꾼의 IT 노트 (Hunter's IT Notes)

Implementing the YOLO v3 Paper with PyTorch #5 - input, output

가면 쓴 사냥꾼 2022. 9. 27. 06:12

※ This post is a translation and study of the blog post below.

https://blog.paperspace.com/how-to-implement-a-yolo-object-detector-in-pytorch/

Tutorial on implementing YOLO v3 from scratch in PyTorch (blog.paperspace.com)

We are at the final stage. In this chapter we read images, run the prediction, and draw the bounding boxes. All of the code goes in detector.py.

Importing the required libraries

from __future__ import division
import time
import torch 
import torch.nn as nn
from torch.autograd import Variable
import numpy as np
import cv2 
from util import *
import argparse
import os 
import os.path as osp
from darknet import Darknet
import pickle as pkl
import pandas as pd
import random

We also import the functions we worked hard to write in util and darknet.

Creating the command-line arguments

def arg_parse():
    """
    Parse arguments to the detect module
    
    """
    
    parser = argparse.ArgumentParser(description='YOLO v3 Detection Module')
   
    parser.add_argument("--images", dest = 'images', help = 
                        "Image / Directory containing images to perform detection upon",
                        default = "imgs", type = str)
    parser.add_argument("--det", dest = 'det', help = 
                        "Image / Directory to store detections to",
                        default = "det", type = str)
    parser.add_argument("--bs", dest = "bs", help = "Batch size", default = 1)
    parser.add_argument("--confidence", dest = "confidence", help = "Object Confidence to filter predictions", default = 0.5)
    parser.add_argument("--nms_thresh", dest = "nms_thresh", help = "NMS Threshold", default = 0.4)
    parser.add_argument("--cfg", dest = 'cfgfile', help = 
                        "Config file",
                        default = "cfg/yolov3.cfg", type = str)
    parser.add_argument("--weights", dest = 'weightsfile', help = 
                        "weightsfile",
                        default = "yolov3.weights", type = str)
    parser.add_argument("--reso", dest = 'reso', help = 
                        "Input resolution of the network. Increase to increase accuracy. Decrease to increase speed",
                        default = "416", type = str)
    
    return parser.parse_args()
    
args = arg_parse()
images = args.images
batch_size = int(args.bs)
confidence = float(args.confidence)
nms_thesh = float(args.nms_thresh)
start = 0
CUDA = torch.cuda.is_available()

Since detector.py is the file that ultimately runs the detector, it helps to have command-line arguments that make it convenient to run. We use Python's argparse module for this.

images: specifies the input image or a directory of images
det: directory in which to save the detections
reso: input resolution of the network, used to balance speed against accuracy
cfg: configuration file
weights: weights file
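
As a quick illustration of how these flags behave, the sketch below rebuilds a reduced version of the parser and feeds it an explicit argument list instead of reading sys.argv. The helper name build_parser and the sample values are assumptions made for this example only.

```python
import argparse

def build_parser():
    # Hypothetical helper: a trimmed-down copy of arg_parse() for illustration.
    parser = argparse.ArgumentParser(description="YOLO v3 Detection Module")
    parser.add_argument("--images", dest="images", default="imgs", type=str)
    parser.add_argument("--bs", dest="bs", default=1)
    parser.add_argument("--confidence", dest="confidence", default=0.5)
    parser.add_argument("--reso", dest="reso", default="416", type=str)
    return parser

# Passing a list simulates: python detector.py --images dog.jpg --bs 2
args = build_parser().parse_args(["--images", "dog.jpg", "--bs", "2"])

# Values given on the command line arrive as strings unless a type is set,
# which is why detector.py casts them with int() / float() afterwards.
batch_size = int(args.bs)
confidence = float(args.confidence)
print(args.images, batch_size, confidence, args.reso)
```

Flags that are not passed simply keep their defaults, so running the detector with no arguments at all uses the imgs directory, a batch size of 1, and a resolution of 416.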

Loading the COCO class names and the network

num_classes = 80
classes = load_classes("data/coco.names")

This implementation, like the paper, uses the COCO dataset, which has 80 classes. Download the class names file from

https://raw.githubusercontent.com/ayooshkathuria/YOLO_v3_tutorial_from_scratch/master/data/coco.names

and place it in a data folder created inside the detector directory.

def load_classes(namesfile):
    # Read one class name per line; the file is closed automatically.
    with open(namesfile, "r") as fp:
        names = fp.read().split("\n")[:-1]
    return names

This function is defined in util.py. It returns a list of class name strings, so the position of each name in the list is the index of the corresponding COCO class.
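
To see what load_classes gives back, here is a minimal sketch that runs the same logic on a three-line stand-in file. The file name sample.names and its contents are made up for the example; the real coco.names has 80 lines.

```python
import os
import tempfile

def load_classes(namesfile):
    # Same logic as the util.py function: one class name per line.
    with open(namesfile, "r") as fp:
        return fp.read().split("\n")[:-1]

# Write a tiny stand-in names file into a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.names")
    with open(path, "w") as fp:
        fp.write("person\nbicycle\ncar\n")
    classes = load_classes(path)

print(classes)     # ['person', 'bicycle', 'car']
print(classes[2])  # car
```

The [:-1] slice drops the empty string that split leaves after the final newline, so the file is expected to end with a newline.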

# Set up the neural network
print("Loading network.....")
model = Darknet(args.cfgfile)
model.load_weights(args.weightsfile)
print("Network successfully loaded")

model.net_info["height"] = args.reso
inp_dim = int(model.net_info["height"])
assert inp_dim % 32 == 0 
assert inp_dim > 32

# Use the GPU if one is available
if CUDA:
    model.cuda()


# Set the model to evaluation mode
model.eval()

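After initializing the network, we load the weights into it. The input resolution must be a multiple of 32 and larger than 32, which the two assertions check. If a CUDA device is available, the model is moved to the GPU, which makes inference considerably faster. Finally, the model is put in evaluation mode.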
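
The resolution checks are easier to appreciate with numbers. YOLO v3 predicts at three scales with strides 32, 16, and 8, using three anchor boxes per grid cell, so the resolution has to divide cleanly by 32. The sketch below counts the candidate boxes per image for a given resolution; the function name num_candidate_boxes is hypothetical and is not part of detector.py.

```python
def num_candidate_boxes(inp_dim, strides=(32, 16, 8), anchors_per_cell=3):
    # Hypothetical helper, not part of detector.py.
    assert inp_dim % 32 == 0, "resolution must be a multiple of 32"
    assert inp_dim > 32, "resolution must be larger than 32"
    total = 0
    for stride in strides:
        grid = inp_dim // stride          # grid cells per side at this scale
        total += grid * grid * anchors_per_cell
    return total

for reso in (320, 416, 608):
    print(reso, num_candidate_boxes(reso))
```

A higher resolution yields more candidate boxes to score and filter, which fits the trade-off described in the --reso help text: increase it for accuracy, decrease it for speed.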

Reading the input images

read_dir = time.time()

try:
    imlist = [osp.join(osp.realpath('.'), images, img) for img in os.listdir(images)]
except NotADirectoryError:
    imlist = []
    imlist.append(osp.join(osp.realpath('.'), images))
except FileNotFoundError:
    print ("No file or directory with the name {}".format(images))
    exit()

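read_dir is not a module but a checkpoint: a timestamp used to measure time. We record several such checkpoints to time each stage of detection.
The path of the input image, or of every image in the input directory, is stored in imlist.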
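
The try/except above treats a directory and a single file differently. A minimal sketch of that behavior follows, using empty placeholder files in a temporary directory. The wrapper name list_images is hypothetical, and sorted is added here only to make the order deterministic.

```python
import os
import os.path as osp
import tempfile

def list_images(images):
    # Hypothetical wrapper around the try/except logic used in detector.py.
    try:
        return [osp.join(osp.realpath("."), images, img)
                for img in sorted(os.listdir(images))]
    except NotADirectoryError:
        # os.listdir fails on a plain file, so treat it as a single image.
        return [osp.join(osp.realpath("."), images)]

with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.jpg", "b.jpg"):
        open(osp.join(tmp, name), "w").close()   # empty placeholder files
    from_dir = list_images(tmp)
    from_file = list_images(osp.join(tmp, "a.jpg"))

print(len(from_dir), len(from_file))
```

Note that every entry in the directory ends up in the list, so the input directory should contain only images.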

if not os.path.exists(args.det):
    os.makedirs(args.det)

If the directory for saving the detections, given by the det flag, does not exist, this code creates it.

load_batch = time.time()
loaded_ims = [cv2.imread(x) for x in imlist]

We use OpenCV to load the images.

def letterbox_image(img, inp_dim):
    '''Resize the image with an unchanged aspect ratio, using padding'''
    img_w, img_h = img.shape[1], img.shape[0]
    w, h = inp_dim
    new_w = int(img_w * min(w/img_w, h/img_h))
    new_h = int(img_h * min(w/img_w, h/img_h))
    resized_image = cv2.resize(img, (new_w,new_h), interpolation = cv2.INTER_CUBIC)
    
    canvas = np.full((inp_dim[1], inp_dim[0], 3), 128)

    canvas[(h-new_h)//2:(h-new_h)//2 + new_h,(w-new_w)//2:(w-new_w)//2 + new_w,  :] = resized_image
    
    return canvas

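This function resizes the input image while keeping its aspect ratio and pads the remaining area with the color (128, 128, 128). A second conversion step is also needed: OpenCV loads an image as a numpy array in BGR channel order with shape Height x Width x Channels, whereas PyTorch expects RGB order in the form (Batches x Channels x Height x Width). To convert the numpy array into PyTorch's input format, we define the function above together with prep_image in util.py.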

def prep_image(img, inp_dim):
    """
    Prepare an image for input to the neural network.
    
    Returns a tensor of shape (1, 3, inp_dim, inp_dim)
    """
    img = (letterbox_image(img, (inp_dim, inp_dim)))
    img = img[:,:,::-1].transpose((2,0,1)).copy()
    img = torch.from_numpy(img).float().div(255.0).unsqueeze(0)
    return img
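
To make the letterbox arithmetic concrete, the sketch below computes only the resized size and the padding offsets, without touching any pixels. The helper name letterbox_geometry is hypothetical; it mirrors the formulas in letterbox_image for a square network input.

```python
def letterbox_geometry(img_w, img_h, inp_dim):
    # Hypothetical helper mirroring the arithmetic in letterbox_image.
    scale = min(inp_dim / img_w, inp_dim / img_h)
    new_w, new_h = int(img_w * scale), int(img_h * scale)
    pad_x = (inp_dim - new_w) // 2   # gray columns on the left
    pad_y = (inp_dim - new_h) // 2   # gray rows on top
    return new_w, new_h, pad_x, pad_y

# A 640x480 image fed to a 416x416 network
print(letterbox_geometry(640, 480, 416))  # (416, 312, 0, 52)
```

The landscape image fills the full width and gets 52 gray rows above and below, which is why the predicted boxes later need both an offset and a scale correction.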

# PyTorch tensors for the images
im_batches = list(map(prep_image, loaded_ims, [inp_dim for x in range(len(imlist))]))
# List containing the dimensions of the original images
im_dim_list = [(x.shape[1], x.shape[0]) for x in loaded_ims]
im_dim_list = torch.FloatTensor(im_dim_list).repeat(1,2)

# Create the batches
leftover = 0
if (len(im_dim_list) % batch_size):
    leftover = 1

if batch_size != 1:
    num_batches = len(imlist) // batch_size + leftover            
    im_batches = [torch.cat((im_batches[i*batch_size : min((i +  1)*batch_size,
                        len(im_batches))]))  for i in range(num_batches)]  

write = 0


if CUDA:
    im_dim_list = im_dim_list.cuda()

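Besides the prepared images, we keep a list of the original images and a list of their original dimensions (im_dim_list). After that, the code above groups the prepared images into batches.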
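
The leftover logic can be checked on plain Python lists. In this sketch the helper make_batches is hypothetical and uses list slices where detector.py uses torch.cat on tensors.

```python
def make_batches(items, batch_size):
    # Hypothetical helper mirroring the leftover logic with plain lists.
    leftover = 1 if len(items) % batch_size else 0
    num_batches = len(items) // batch_size + leftover
    return [items[i * batch_size : min((i + 1) * batch_size, len(items))]
            for i in range(num_batches)]

images = ["img0", "img1", "img2", "img3", "img4"]
print(make_batches(images, 2))  # [['img0', 'img1'], ['img2', 'img3'], ['img4']]
```

Five images with a batch size of 2 give two full batches plus one smaller final batch, so no image is dropped.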

Detection Loop

start_det_loop = time.time()
for i, batch in enumerate(im_batches):
    # load the image batch
    start = time.time()
    if CUDA:
        batch = batch.cuda()
    with torch.no_grad():
        prediction = model(Variable(batch), CUDA)

    prediction = write_results(prediction, confidence, num_classes, nms_conf = nms_thesh)

    end = time.time()

    if type(prediction) == int:

        for im_num, image in enumerate(imlist[i*batch_size: min((i +  1)*batch_size, len(imlist))]):
            im_id = i*batch_size + im_num
            print("{0:20s} predicted in {1:6.3f} seconds".format(image.split("/")[-1], (end - start)/batch_size))
            print("{0:20s} {1:s}".format("Objects Detected:", ""))
            print("----------------------------------------------------------")
        continue

    prediction[:,0] += i*batch_size    # convert the index within the batch to the index in imlist

    if not write:                      # output has not been initialized yet
        output = prediction  
        write = 1
    else:
        output = torch.cat((output,prediction))

    for im_num, image in enumerate(imlist[i*batch_size: min((i +  1)*batch_size, len(imlist))]):
        im_id = i*batch_size + im_num
        objs = [classes[int(x[-1])] for x in output if int(x[0]) == im_id]
        print("{0:20s} predicted in {1:6.3f} seconds".format(image.split("/")[-1], (end - start)/batch_size))
        print("{0:20s} {1:s}".format("Objects Detected:", " ".join(objs)))
        print("----------------------------------------------------------")

    if CUDA:
        torch.cuda.synchronize()

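We iterate over the batches, generate the predictions, and concatenate the prediction tensors of all the images on which detection was performed.

For each batch, we measure the time from receiving the input to producing the output of the write_results function.
The batch index attribute is converted so that it represents the index of the image in imlist, the list holding the addresses of all images.
We print the time taken for each detection as well as the objects detected in each image.
If the output of write_results is int(0), meaning there are no detections, we use continue to skip the rest of the loop body.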
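
The index conversion is the least obvious of these steps, so here is a minimal sketch on plain lists. Each row is reduced to [index_in_batch, class_index], and the helper to_global_index is hypothetical; in detector.py the same shift is done on the first column of the prediction tensor.

```python
def to_global_index(batch_rows, batch_number, batch_size):
    # Hypothetical helper: each row starts with the image index inside the batch.
    return [[row[0] + batch_number * batch_size] + row[1:] for row in batch_rows]

batch_size = 2
# Detections from the second batch (batch_number = 1)
rows = [[0, 16], [1, 1], [1, 7]]
global_rows = to_global_index(rows, batch_number=1, batch_size=batch_size)
print(global_rows)  # [[2, 16], [3, 1], [3, 7]]

# Grouping by image now works across all batches.
classes_for_image_3 = [cls for idx, cls in global_rows if idx == 3]
print(classes_for_image_3)  # [1, 7]
```

After the shift, the first column can be used directly to look up the image path in imlist.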

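torch.cuda.synchronize() makes the CPU wait until the CUDA kernels queued so far have finished. Without it, a CUDA call returns control to the CPU as soon as the GPU job is queued, well before the job is complete (asynchronous execution), which can make the measured times misleading.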

Drawing bounding boxes on the images

try:
    output
except NameError:
    print ("No detections were made")
    exit()

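Before drawing the bounding boxes, we use the code above to check whether any detection was made at all. If output was never created, there is nothing to draw and the program exits.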

# Transform the bounding box coordinates
im_dim_list = torch.index_select(im_dim_list, 0, output[:,0].long())

scaling_factor = torch.min(inp_dim/im_dim_list,1)[0].view(-1,1)


output[:,[1,3]] -= (inp_dim - scaling_factor*im_dim_list[:,0].view(-1,1))/2
output[:,[2,4]] -= (inp_dim - scaling_factor*im_dim_list[:,1].view(-1,1))/2

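The predictions in the output tensor conform to the input size of the network, not to the original size of the images. So before drawing the bounding boxes, we transform the corner attributes of each bounding box to the original dimensions of the image.

The predictions were also obtained on the padded image, so the coordinates are first shifted to account for the padding borders.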

# Undo the rescaling to get coordinates on the original image
output[:,1:5] /= scaling_factor

In letterbox_image we resized the image by a scaling factor. We undo this rescaling to get the bounding box coordinates on the original image.
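
Both corrections, removing the padding offset and undoing the resize, can be followed on a single box. The sketch below uses plain floats; the helper to_original_coords and the sample box are assumptions for illustration, while detector.py applies the same arithmetic to whole tensor columns.

```python
def to_original_coords(box, img_w, img_h, inp_dim):
    # Hypothetical helper; box is (x1, y1, x2, y2) in network-input pixels.
    scale = min(inp_dim / img_w, inp_dim / img_h)
    pad_x = (inp_dim - scale * img_w) / 2
    pad_y = (inp_dim - scale * img_h) / 2
    x1, y1, x2, y2 = box
    # Remove the padding offset, then undo the resize.
    return ((x1 - pad_x) / scale, (y1 - pad_y) / scale,
            (x2 - pad_x) / scale, (y2 - pad_y) / scale)

# A box predicted on the 416x416 letterboxed version of a 640x480 image
box = to_original_coords((104, 156, 312, 260), img_w=640, img_h=480, inp_dim=416)
print([round(v) for v in box])  # [160, 160, 480, 320]
```

Here the scale is 0.65 and the vertical padding is 52 pixels, so the y coordinates lose 52 before everything is divided by 0.65.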

# Clip the bounding boxes to the image boundaries
for i in range(output.shape[0]):
    output[i, [1,3]] = torch.clamp(output[i, [1,3]], 0.0, im_dim_list[i,0])
    output[i, [2,4]] = torch.clamp(output[i, [2,4]], 0.0, im_dim_list[i,1])
    
    
output_recast = time.time()
class_load = time.time()
colors = pkl.load(open("pallete", "rb"))

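If an image contains many bounding boxes, drawing them in varied colors makes them easier to tell apart. For this, download the pickled color file (pallete) into the detector folder; the last line above loads it.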

draw = time.time()


def write(x, results):
    c1 = tuple(x[1:3].cpu().int().tolist())  # plain Python ints for OpenCV
    c2 = tuple(x[3:5].cpu().int().tolist())
    img = results[int(x[0])]
    cls = int(x[-1])
    color = random.choice(colors)
    label = "{0}".format(classes[cls])
    cv2.rectangle(img, c1, c2,color, 3)
    t_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_PLAIN, 1 , 1)[0]
    c2 = c1[0] + t_size[0] + 3, c1[1] + t_size[1] + 4
    cv2.rectangle(img, c1, c2,color, -3)
    cv2.putText(img, label, (c1[0], c1[1] + t_size[1] + 4), cv2.FONT_HERSHEY_PLAIN, 1, [225,255,255], 1)
    return img

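This defines the function that draws a bounding box. It picks a random color from colors to draw the rectangle, and also draws a filled rectangle at the top-left corner with the class name written on it. The negative thickness (-3) passed to the second cv2.rectangle call produces a filled rectangle.

To change the thickness of the box outline, simply change the 3 passed to the first cv2.rectangle call. Any negative value in the second call only means "filled".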

list(map(lambda x: write(x, loaded_ims), output))

This modifies the images inside loaded_ims in place, one detection at a time.

det_names = pd.Series(imlist).apply(lambda x: "{}/det_{}".format(args.det,x.split("/")[-1]))

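Each detection image is saved under the name of the original image prefixed with 'det_'.
The line above creates the list of paths where the detection images will be saved.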
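
The naming rule can be tried without pandas. This sketch swaps split("/") for os.path.basename, which also handles the platform's own path separator; the helper name detection_path and the sample path are assumptions for illustration.

```python
import os.path as osp

def detection_path(det_dir, image_path):
    # Hypothetical helper: prefix the file name with "det_" inside det_dir.
    return "{}/det_{}".format(det_dir, osp.basename(image_path))

print(detection_path("det", "/home/user/imgs/dog.jpg"))  # det/det_dog.jpg
```

Only the file name is kept, so two input images with the same name in different folders would overwrite each other in the output directory.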

list(map(cv2.imwrite, det_names, loaded_ims))


end = time.time()

Finally, the images with the detections drawn on them are written to the paths in det_names.

Printing the time summary

print("SUMMARY")
print("----------------------------------------------------------")
print("{:25s}: {}".format("Task", "Time Taken (in seconds)"))
print()
print("{:25s}: {:2.3f}".format("Reading addresses", load_batch - read_dir))
print("{:25s}: {:2.3f}".format("Loading batch", start_det_loop - load_batch))
print("{:25s}: {:2.3f}".format("Detection (" + str(len(imlist)) +  " images)", output_recast - start_det_loop))
print("{:25s}: {:2.3f}".format("Output Processing", class_load - output_recast))
print("{:25s}: {:2.3f}".format("Drawing Boxes", end - draw))
print("{:25s}: {:2.3f}".format("Average time_per_img", (end - load_batch)/len(imlist)))
print("----------------------------------------------------------")

At the end of detector.py we print a summary showing how long each part of the code took to run. This is useful when comparing how different hyperparameters affect the speed of the detector. The main hyperparameters are:

batch size
objectness confidence
NMS threshold

These hyperparameters can be set through the command-line flags (--bs, --confidence, --nms_thresh) when running detector.py.

This is the last chapter of code implementation. The next post will be the final chapter of the v3 series.
